
# MOD300 – Topic 2: Supervised learning – Machines versus human models

In this notebook we look at the Ebola epidemic in West Africa and compare a few
different supervised learning approaches. The idea is to see how simple
machine-learning models (linear regression, polynomial regression, a small
neural network and an LSTM) can be used to model the time evolution of the
number of new Ebola cases in three countries:

- **Guinea**
- **Liberia**
- **Sierra Leone**

This topic builds on Project 2, Exercise 5, where we used a mechanistic
SEZR-type compartment model to describe the same data. Here we ignore the
mechanistic structure and instead treat the task as a purely data-driven
regression problem: the models see only time (or, for the LSTM, past case
counts) as input and predict the number of new cases.

The tasks are:

* **Task 0** – Reproduce the basic data plots for the three countries
  (new cases + cumulative cases).
* **Task 1** – Fit a straight line using linear regression and inspect the fit.
* **Task 2** – Fit a “better” regression model (here: polynomial regression).
* **Task 3** – Train a small feed-forward neural network (MLP) on the data.
* **Task 4** – Train a simple LSTM for time-series prediction.
* **Task 5** – Discuss and summarise what we see, and reflect on when machine
  learning is useful compared to mechanistic models.


In [None]:

import numpy as np
import matplotlib.pyplot as plt

from ebola_ml import (
    load_ebola_data,
    plot_data_and_cumulative,
    fit_linear_regression,
    plot_linear_regression,
    fit_polynomial_regression,
    plot_polynomial_regression,
    fit_mlp_regressor,
    plot_mlp_regression,
    fit_lstm,
    plot_lstm_results,
)

# Data file for each country
country_files = {
    "Guinea": "ebola_cases_guinea.dat",
    "Liberia": "ebola_cases_liberia.dat",
    "Sierra Leone": "ebola_cases_sierra_leone.dat",
}

# Days, new cases and cumulative cases, stored per country
country_data = {}
for name, fname in country_files.items():
    days, new_cases, cumulative = load_ebola_data(fname)
    country_data[name] = {
        "days": days,
        "new_cases": new_cases,
        "cumulative": cumulative,
    }

print("Loaded data for:", ", ".join(country_data.keys()))



## Task 0 – Reproduce the Ebola plots from Project 2

In Project 2 we plotted the number of new cases per day together with the
cumulative number of cases. Here we reproduce the same style of figures for
the three countries using the helper `plot_data_and_cumulative` from
`ebola_ml.py`.


In [None]:

for name, data in country_data.items():
    days = data["days"]
    new_cases = data["new_cases"]
    cumulative = data["cumulative"]
    plot_data_and_cumulative(days, new_cases, cumulative, name)
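
The cumulative curve carries no extra information beyond the daily counts: assuming each entry of `new_cases` counts the cases reported since the previous data point, the cumulative series is simply its running sum. The sketch below uses plain NumPy and made-up numbers; the helper names `cumulative_from_new` and `new_from_cumulative` are hypothetical and not part of `ebola_ml.py`.

```python
import numpy as np

def cumulative_from_new(new_cases):
    """Running total: C[i] = new_cases[0] + ... + new_cases[i]."""
    return np.cumsum(np.asarray(new_cases, dtype=float))

def new_from_cumulative(cumulative):
    """Inverse operation: increments of a cumulative series.
    Prepending 0 keeps the first value as the first count."""
    return np.diff(np.asarray(cumulative, dtype=float), prepend=0.0)

# Made-up daily counts, not Ebola data
new = np.array([3, 5, 0, 8, 2])
cum = cumulative_from_new(new)
print(cum)
print(new_from_cumulative(cum))
```

Comparing `cumulative_from_new(new_cases)` with the loaded `cumulative` array is a cheap consistency check on the data files; a constant offset could, for example, come from cases counted before the first data point.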



## Task 1 – Linear regression

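We now fit a **straight line** to the daily number of new cases as a function
of time. A line can only increase or decrease monotonically, so it cannot
follow an epidemic curve that rises, peaks and declines; we do not expect a
good fit, but it serves as a baseline for the more flexible models below.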


In [None]:

for name, data in country_data.items():
    days = data["days"]
    new_cases = data["new_cases"]

    lin_model = fit_linear_regression(days, new_cases)
    plot_linear_regression(days, new_cases, lin_model, name, "New cases per day")

    a = lin_model.coef_[0]
    b = lin_model.intercept_
    print(f"{name}: linear model y = {a:.4f} * t + {b:.2f}")
    print("  -> The straight line clearly cannot capture the peak and decay of the epidemic.\n")
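
For a single input variable, the least-squares line has a closed form: a = cov(t, y) / var(t) and b = mean(y) − a·mean(t). Judging by the `coef_` and `intercept_` attributes, `fit_linear_regression` presumably wraps scikit-learn's `LinearRegression`, which solves the same least-squares problem. The NumPy sketch below is a hypothetical stand-in (`fit_line_least_squares` is not part of `ebola_ml.py`), applied to a made-up symmetric epidemic curve.

```python
import numpy as np

def fit_line_least_squares(t, y):
    """Closed-form least squares for y ≈ a*t + b (one input variable)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_c = t - t.mean()  # centred time
    a = np.sum(t_c * (y - y.mean())) / np.sum(t_c ** 2)
    b = y.mean() - a * t.mean()
    return float(a), float(b)

# Sanity check on an exact line y = 2t + 1: recovers a = 2, b = 1
print(fit_line_least_squares([0, 1, 2, 3], [1, 3, 5, 7]))

# Made-up, symmetric "epidemic" curve peaking at day 100 (not real data)
t = np.arange(200, dtype=float)
y = 50.0 * np.exp(-((t - 100.0) / 25.0) ** 2)
a, b = fit_line_least_squares(t, y)
print(f"slope a = {a:.4f}, intercept b = {b:.2f}, mean(y) = {y.mean():.2f}")
```

On the symmetric test curve the slope comes out close to zero and the line sits near the average level: the rising and falling halves of the epidemic pull the slope in opposite directions.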



## Task 2 – Better regression model (polynomial regression)

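As a more flexible model we use **polynomial regression** of degree 4,
y ≈ β₀ + β₁t + β₂t² + β₃t³ + β₄t⁴. This is still linear regression, only on
the expanded inputs t, t², t³, t⁴, so the coefficients are again found by
least squares.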


In [None]:

for name, data in country_data.items():
    days = data["days"]
    new_cases = data["new_cases"]

    poly_model, poly = fit_polynomial_regression(days, new_cases, degree=4)
    plot_polynomial_regression(days, new_cases, poly_model, poly, name, "New cases per day")

    print(f"{name}: polynomial regression of degree 4 used.\n")
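
Polynomial regression amounts to least squares on the design matrix with columns 1, t, t², t³, t⁴. The sketch below builds that matrix directly with NumPy. One practical detail it illustrates is rescaling time to [−1, 1] before taking powers: with raw day numbers t⁴ grows very fast (t = 300 already gives 8.1·10⁹), which makes the least-squares problem badly conditioned. Whether `fit_polynomial_regression` rescales in this way is not shown here, and the function `fit_polynomial` is hypothetical.

```python
import numpy as np

def fit_polynomial(t, y, degree=4):
    """Least-squares polynomial fit on a time axis rescaled to [-1, 1].

    Returns the coefficients (in the rescaled variable s) and a
    prediction function that accepts times in the original units.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_min, t_max = t.min(), t.max()

    def rescale(t_values):
        t_values = np.atleast_1d(np.asarray(t_values, dtype=float))
        return 2.0 * (t_values - t_min) / (t_max - t_min) - 1.0

    # Design matrix with columns 1, s, s^2, ..., s^degree
    X = np.vander(rescale(t), degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(t_new):
        return np.vander(rescale(t_new), degree + 1, increasing=True) @ coef

    return coef, predict

# Made-up data from an exact quadratic, which a degree-4 fit must reproduce
t = np.arange(0, 101, dtype=float)
y = 1.0 + 2.0 * t - 0.01 * t ** 2
coef, predict = fit_polynomial(t, y, degree=4)
print(np.max(np.abs(predict(t) - y)))
```

A degree-4 polynomial has at most three turning points and is not constrained to be non-negative, so it can produce negative case counts, particularly near the ends of the data range and when extrapolating.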



## Task 3 – Feed–forward neural network (MLP)

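Next we try a small **neural network** using scikit-learn's `MLPRegressor`,
via the helper `fit_mlp_regressor`: two hidden layers with 20 neurons each
(`hidden_layers=(20, 20)`), trained for at most 2000 iterations
(`max_iter=2000`). Like the regression models it maps the day number to the
number of new cases, but it can represent a much more flexible curve.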


In [None]:

for name, data in country_data.items():
    days = data["days"]
    new_cases = data["new_cases"]

    mlp = fit_mlp_regressor(days, new_cases, hidden_layers=(20, 20), max_iter=2000)
    plot_mlp_regression(days, new_cases, mlp, name, "New cases per day")

    print(f"{name}: MLP with two hidden layers (20, 20) trained.\n")
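
Assuming the helper keeps scikit-learn's defaults, `MLPRegressor` uses ReLU activations in the hidden layers and an identity output for regression, so the fitted function is a piecewise-linear curve in t. The NumPy sketch below shows the forward pass of such a 1 → 20 → 20 → 1 network with random, untrained weights; training (done by default with the Adam optimiser in scikit-learn) is omitted. It also standardises the input, since scikit-learn's documentation recommends scaling inputs for MLPs; whether `fit_mlp_regressor` does so is an assumption. All function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def standardize(x):
    """Zero mean, unit variance; returns the scaled array and (mean, std)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma, (mu, sigma)

def init_mlp(layer_sizes, rng):
    """Random weights for a fully connected net, e.g. (1, 20, 20, 1)."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def mlp_forward(params, x):
    """ReLU hidden layers followed by an identity (linear) output layer."""
    h = np.asarray(x, dtype=float).reshape(-1, 1)
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W_out, b_out = params[-1]
    return (h @ W_out + b_out).ravel()

days = np.arange(0, 200, dtype=float)  # made-up time axis
days_scaled, (mu, sigma) = standardize(days)
params = init_mlp((1, 20, 20, 1), rng)
y_hat = mlp_forward(params, days_scaled)

n_params = sum(W.size + b.size for W, b in params)
print(y_hat.shape, n_params)
```

The network has 1·20 + 20 + 20·20 + 20 + 20·1 + 1 = 481 parameters, compared with five coefficients for the degree-4 polynomial. Because training starts from random weights, repeated fits can also differ unless `random_state` is fixed in `MLPRegressor`.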



## Task 4 – LSTM model for time–series prediction

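Finally we use an **LSTM (Long Short-Term Memory) network**, a type of
recurrent neural network designed for sequential data such as time series.
Unlike the models above, it does not receive the day number as input:
`fit_lstm` is given only the series of new cases and a window length, so each
prediction is based on the preceding values of the series.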
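
What distinguishes an LSTM from a plain recurrent network is its cell state c, updated through an input gate i, a forget gate f and an output gate o together with a candidate value g: c_t = f ⊙ c_{t−1} + i ⊙ g and h_t = o ⊙ tanh(c_t). The NumPy sketch below implements one such time step and runs it over a short made-up sequence with random weights. It is illustrative only: the stacked-weight layout and gate order are choices made here, and the function names are hypothetical rather than taken from Keras or `ebola_ml.py`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step.

    x: input (n_in,); h, c: hidden and cell state (n_hidden,)
    W: (n_in, 4*n_hidden), U: (n_hidden, 4*n_hidden), b: (4*n_hidden,)
    Stacked gate order used here: input, forget, candidate, output.
    """
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_new = f * c + i * g          # keep part of the old memory, add new content
    h_new = o * np.tanh(c_new)     # gated view of the memory
    return h_new, c_new

def run_lstm(sequence, W, U, b, n_hidden):
    """Feed a 1-D sequence through the cell; return the final hidden state."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for value in sequence:
        h, c = lstm_step(np.array([value], dtype=float), h, c, W, U, b)
    return h

rng = np.random.default_rng(0)
n_hidden = 8
W = rng.normal(0.0, 0.1, size=(1, 4 * n_hidden))
U = rng.normal(0.0, 0.1, size=(n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
h_last = run_lstm([0.1, 0.3, 0.2, 0.5], W, U, b, n_hidden)
print(h_last.shape)
```

In a trained network, a dense layer on top of the final hidden state would turn it into the one-step-ahead prediction.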


In [None]:

window = 10  # length of the input window passed to fit_lstm
epochs = 10  # number of training passes over the data

for name, data in country_data.items():
    new_cases = data["new_cases"].astype(float)

    try:
        lstm_model, X_lstm, y_lstm, y_pred_lstm = fit_lstm(new_cases, window=window, epochs=epochs, verbose=0)
        plot_lstm_results(y_lstm, y_pred_lstm, name)
        print(f"{name}: LSTM trained with window={window}, epochs={epochs}.\n")
    except RuntimeError as e:
        print(f"{name}: could not train LSTM ({e}).\n")
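
Since `fit_lstm` takes only the series and a window length and returns `X_lstm` and `y_lstm`, it presumably builds supervised pairs by sliding a window over the series. A minimal NumPy sketch of that step under this assumption (the name `make_windows` is hypothetical), producing the (samples, timesteps, features) layout that Keras recurrent layers expect:

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (X, y) pairs for one-step-ahead prediction.

    X[k] = series[k : k + window]   (the previous `window` values)
    y[k] = series[k + window]       (the value to predict)
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([series[k:k + window] for k in range(n_samples)])
    y = series[window:]
    return X[..., np.newaxis], y  # add a feature axis: (samples, window, 1)

X, y = make_windows(np.arange(15), window=10)
print(X.shape, y.shape)  # (5, 10, 1) (5,)
```

In this layout the first `window` values never appear as targets, so the predictions cover only `new_cases[window:]`. Because neighbouring windows overlap, a fair forecasting test would hold out the last part of the series rather than shuffling the samples.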



## Task 5 – Discussion and conclusions

(Write your own reflections here: compare the fits from Tasks 1–4 and discuss
when machine learning is useful compared to the mechanistic SEZR model from
Project 2.)
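
To make the comparison quantitative, the predictions from Tasks 1–4 can be scored with the same error measures. The helpers below (hypothetical names, plain NumPy) compute the root-mean-square error and the coefficient of determination R² for several models at once. These are in-sample scores on the training data, so they measure how closely each model follows the observed curve, not how well it forecasts.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the data (cases per day)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: 1 is a perfect fit, 0 is no better than the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def score_models(y_true, predictions):
    """predictions: dict mapping model name -> array aligned with y_true."""
    return {name: {"rmse": rmse(y_true, p), "r2": r_squared(y_true, p)}
            for name, p in predictions.items()}

# Made-up numbers, only to show the call
y_true = np.array([0.0, 2.0, 5.0, 3.0, 1.0])
scores = score_models(y_true, {
    "mean only": np.full_like(y_true, y_true.mean()),
    "perfect": y_true.copy(),
})
for name, s in scores.items():
    print(f"{name:10s} RMSE = {s['rmse']:.3f}  R^2 = {s['r2']:.3f}")
```

Predictions must be aligned with `y_true`; for the LSTM this means scoring `y_pred_lstm` against `y_lstm` rather than against the full `new_cases` array.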
