![](../docs/ae_logo.png "Adapt & Enable")
# AE workshop 2023 - Data science

## Part 2 - Time Series Forecasting

We've explored the data and got a feel for it. Let's move on, and see if we can build a model that predicts the temperature for future points in time! 

First things first, let's import what we need.

In [1]:
import numpy as np
import pandas as pd
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from statsforecast import StatsForecast
from statsforecast.models import *
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
    explained_variance_score,
)

pd.options.plotting.backend = "plotly"
import warnings
from tqdm import TqdmExperimentalWarning

warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.simplefilter(action="ignore", category=TqdmExperimentalWarning)
import plotly.io as pio

pio.templates.default = "plotly_white"

ae_orange = "#FD9129"
ae_orange2 = "#FFD580"
ae_gold = "#FFD700"



### Read data

Let's load the data we saved from the previous step. If the next cell fails, *make sure you ran through the entire [EDA notebook](./1_EDA.ipynb)*.

In [2]:
df = pd.read_csv("../data/cleaned.csv")
df.index = df.date
df = df.drop(columns=["date"])
df.head()

| date | temperature | uncertainty | month | year |
|---|---|---|---|---|
| 1750-01-01 | 3.034 | 3.574 | 1 | 1750 |
| 1750-02-01 | 3.083 | 3.702 | 2 | 1750 |
| 1750-03-01 | 5.626 | 3.076 | 3 | 1750 |
| 1750-04-01 | 8.49 | 2.451 | 4 | 1750 |
| 1750-05-01 | 11.573 | 2.072 | 5 | 1750 |


Alright, looks good! Let's get started. 

### Train-val-test split

First, as with any machine learning algorithm, we'll want to define on which data we'll **train** the model, and which part we'll use to **validate** whether our model is any good. In addition, we should define a hold-out **test** set. Validation splits can be used to tweak our model (and its hyperparameters), but the test set serves to yield a final benchmark for our model's performance. We'll skip the test set for this workshop, just this once.

We'll do this in a simple way, setting aside the final segments of the data for validation. It's good to be aware of slightly more complex strategies, such as the use of [sklearn's TimeSeriesSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html). Since regular randomized cross-validation won't work for time series (you can figure out why, right?), other validation strategies were cooked up, such as *rolling window cross-validation* or *walk-forward validation*. The latter is implemented in `TimeSeriesSplit`.

![TimeSeriesSplit](https://scikit-learn.org/stable/_images/sphx_glr_plot_cv_indices_013.png)
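
To make the idea concrete, here is a minimal sketch of `TimeSeriesSplit` on a toy array of 12 time steps. The toy data and the choice of `n_splits=3` are assumptions for illustration only; we don't use them anywhere else in this notebook.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy series of 12 consecutive time steps (illustrative only, not the workshop data)
toy = np.arange(12)

# Each fold trains on an expanding window and validates on the block right after it
tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(toy))

for fold, (train_idx, val_idx) in enumerate(folds):
    # Every training index lies before every validation index: no peeking into the future
    print(f"Fold {fold}: train={train_idx.tolist()} val={val_idx.tolist()}")
```

Note how the validation block always comes after the training block. That ordering is exactly what a shuffled split would break.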

In [3]:
# Helper function to generate splits
def train_val_test_split(data: pd.DataFrame, train=0.7, val=0.15, test=0.15):
    # Make sure our splits add up
    total = train + test + val
    # Compare with a tolerance: floating-point sums rarely equal 1 exactly
    if not np.isclose(total, 1):
        print(
            f"Train, validation and test portions don't add up to 1! Currently the total is {total}. Rebalancing."
        )
        train /= total
        val /= total
        test /= total

    # Calculate row counts
    n_rows = len(data)
    train_rows = int(train * n_rows)
    val_rows = int(val * n_rows)
    test_rows = n_rows - (train_rows + val_rows)

    # Slice up the data
    train_data = data.iloc[:train_rows]
    val_data = data.iloc[train_rows : train_rows + val_rows]
    test_data = data.iloc[train_rows + val_rows :]

    print(
        f"Training split:   {train_rows} data points\tfrom {train_data.index.min()} till {train_data.index.max()}"
    )
    print(
        f"Validation split: {val_rows} data points\tfrom {val_data.index.min()} till {val_data.index.max()}"
    )
    print(
        f"Test split:       {test_rows} data points\tfrom {test_data.index.min()} till {test_data.index.max()}"
    )

    return train_data, val_data, test_data

In [4]:
train, val, test = train_val_test_split(df, train=0.8, val=0.2, test=0)

Training split:   2544 data points	from 1750-01-01 till 1962-12-01
Validation split: 636 data points	from 1963-01-01 till 2015-12-01
Test split:       0 data points	from nan till nan


### 1. Statistical modelling: ARIMA (and friends)

We'll first try and fit an **A**uto**r**egressive **I**ntegrated **M**oving **A**verage model. That's a mouthful, we know. This statistical model is quite common in the industry (e.g., in stock price forecasting). 

* It's *autoregressive* because it uses past values of a variable to predict its future values (i.e., it regresses onto itself). 
* It has a *moving average* component: it models the current value as a weighted combination of past forecast errors.
* Finally, it's *integrated* in the sense that it uses differencing to make the data stationary -- that is, it subtracts earlier values from later ones to remove trends (and, in the seasonal variant, repeating seasonal patterns).
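
Differencing is easiest to understand on a toy example. The sketch below builds a synthetic monthly series (an assumption, not the workshop data) with a linear trend and a yearly cycle, and shows how differencing flattens both.

```python
import numpy as np

# Synthetic monthly series: 10 years of linear trend plus a 12-month cycle
t = np.arange(120)
trend = 0.05 * t
season = 3.0 * np.sin(2 * np.pi * t / 12)
series = trend + season

# First difference, y[t] - y[t-1]: a linear trend becomes a constant step
trend_diff = np.diff(trend)

# Seasonal difference, y[t] - y[t-12]: the yearly cycle cancels out,
# and the linear trend again reduces to a constant (12 * 0.05)
seasonal_diff = series[12:] - series[:-12]

print("Spread of the raw series:        ", series.std())
print("Spread after seasonal difference:", seasonal_diff.std())
```

After the seasonal difference the series is (numerically) constant, which is about as stationary as it gets. Real data is messier, but the mechanism is the same.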

Let's fit one! Here, we will use a class called `AutoARIMA`, which automatically finds the best ARIMA model based on an information criterion.

In [30]:
# Many of these packages are picky about the column names, so here's a helper function to prep the data
def prep_df(df: pd.DataFrame) -> pd.DataFrame:
    df = df.reset_index()
    df = df.rename(columns={"date": "ds", "temperature": "y"})
    df["unique_id"] = 1
    df = df[["unique_id", "ds", "y"]]

    return df

In [31]:
train_sf = prep_df(train)
val_sf = prep_df(val)

In [32]:
season_length = 12  # Monthly data
horizon = len(val)  # Predict the length of the validation df

# Include the models you imported
models = [
    # This is a baseline model
    HistoricAverage(),
    # This is a better model
    AutoARIMA(season_length=season_length),
    # YOU CAN ADD MODELS HERE
]

# Instantiate the StatsForecast class as sf
sf = StatsForecast(df=train_sf, models=models, freq="MS", n_jobs=-1)

# Forecast for the defined horizon
forecast = sf.forecast(horizon)

In [33]:
forecast.head()

| unique_id | ds | HistoricAverage | AutoARIMA |
|---|---|---|---|
| 1 | 1963-01-01 | 8.203653 | 2.947784 |
| 1 | 1963-02-01 | 8.203653 | 3.428498 |
| 1 | 1963-03-01 | 8.203653 | 5.24926 |
| 1 | 1963-04-01 | 8.203653 | 8.438313 |
| 1 | 1963-05-01 | 8.203653 | 11.231355 |


As you can see, the forecast data frame now has an `AutoARIMA` column with forecast values, corresponding to the validation dates.

If you add other models to the list, their forecasts will show up here as extra columns. **Have a look [here](https://github.com/Nixtla/statsforecast) if you're curious, or just want to play around.**
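
To build some intuition for what a baseline does, here is a small NumPy sketch of two simple forecasters: one that always predicts the historical mean (the idea behind `HistoricAverage`) and one that repeats the last observed season (the idea behind `SeasonalNaive`, which you could add to the list above). The function names and the toy data are hypothetical, and these are re-implementations for illustration, not the library's actual code.

```python
import numpy as np


def historic_average_forecast(y, horizon):
    """Predict the mean of the history for every future step."""
    return np.full(horizon, np.mean(y))


def seasonal_naive_forecast(y, horizon, season_length):
    """Repeat the last observed season until the horizon is filled."""
    last_season = np.asarray(y)[-season_length:]
    repeats = int(np.ceil(horizon / season_length))
    return np.tile(last_season, repeats)[:horizon]


# Toy series with a cycle of length 4 (illustrative only)
toy = np.array([1.0, 5.0, 9.0, 5.0] * 3)

avg_forecast = historic_average_forecast(toy, horizon=6)
naive_forecast = seasonal_naive_forecast(toy, horizon=6, season_length=4)

print(avg_forecast)
print(naive_forecast)
```

A seasonal baseline like this is often much harder to beat than a flat average on strongly seasonal data, which makes it a useful extra yardstick.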

Before we calculate performance metrics, let's visualize our predictions!

In [34]:
base_plots = []
prediction_plots = []

# Plot our training data
base_plots.append(
    go.Scatter(
        x=train.index,
        y=train.temperature,
        mode="lines",
        name="training",
        line=dict(color=ae_gold),
    )
)

# Plot our observed validation values
base_plots.append(
    go.Scatter(
        x=val.index,
        y=val.temperature,
        mode="lines",
        name="observed",
        line=dict(color="grey"),
    )
)

# Plot the prediction of every model (skip the first column, which holds the dates)
for model in forecast.columns[1:]:
    prediction_plots.append(
        go.Scatter(
            x=forecast.ds, y=forecast[model], mode="lines", name=f"predicted_{model}"
        )
    )

# Display
fig = go.Figure(data=base_plots + prediction_plots)
fig.update_layout(title="Forecasting in action 🚀")

fig.show()

Not bad! Let's calculate some performance metrics.

In [None]:
# Helper function to give us some basic time series forecasting performance metrics
def forecast_metrics(y_true, y_pred):
    # Collect metrics in a dictionary
    metrics = {}

    metrics["MSE"] = mean_squared_error(y_true, y_pred)
    metrics["MAE"] = mean_absolute_error(y_true, y_pred)
    metrics["RMSE"] = np.sqrt(metrics["MSE"])
    metrics["R-squared"] = r2_score(y_true, y_pred)
    metrics["EV"] = explained_variance_score(y_true, y_pred)

    return metrics
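
If you're wondering what these metrics actually compute, the sketch below works them out by hand with NumPy on four made-up values (an assumption for illustration). The formulas follow the standard definitions that the scikit-learn functions above implement.

```python
import numpy as np

# Made-up observations and predictions (illustrative only)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_true - y_pred

mse = np.mean(errors**2)  # mean squared error
mae = np.mean(np.abs(errors))  # mean absolute error
rmse = np.sqrt(mse)  # same unit as the target

# R-squared: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum(errors**2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Explained variance: like R-squared, but ignores a constant bias in the errors
ev = 1 - np.var(errors) / np.var(y_true)

print({"MSE": mse, "MAE": mae, "RMSE": rmse, "R-squared": r2, "EV": ev})
```

In this toy case the predictions are biased (they are too high on average), which is why the explained variance comes out higher than R-squared. When the two differ noticeably for a model, a systematic offset is a likely culprit.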

In [None]:
# These models were implemented
models = forecast.columns[1:]

# Let's calculate and store metrics for every one of them
results = dict()
for model in models:
    results[model] = forecast_metrics(val.temperature, forecast[model])

In [None]:
results_df = pd.DataFrame(results)
results_df

In [None]:
# Helper function to visualize performance
def visualize_metrics(results_df: pd.DataFrame, baseline=True):
    # Don't mess with our original data
    viz_df = results_df.copy()

    # Get rid of baseline if not requested
    if not baseline:
        del viz_df["HistoricAverage"]

    # Create subplot grid with one column per metric
    fig = make_subplots(rows=1, cols=viz_df.shape[0], horizontal_spacing=0.1)

    # Loop through each row and add a bar chart to the corresponding subplot
    for idx, (name, row) in enumerate(viz_df.iterrows()):
        fig.add_trace(go.Bar(x=row.index, y=row.values, name=name), row=1, col=idx + 1)
        # Give the child a name
        fig.update_yaxes(title_text=name, row=1, col=idx + 1)

    # Update layout and show the figure
    fig.update_layout(
        height=400, width=1200, title="Model comparison", showlegend=False
    )
    fig.show()

In [None]:
visualize_metrics(results_df)

Clearly, the ARIMA approach blows the baseline model out of the water. Good! Now let's try another model: Facebook's [Prophet](https://facebook.github.io/prophet/)!

### 2. Prophet

> Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

Prophet is a special case of the Generalized Additive Model. Whereas ARIMA tries to build a formula for future values as a function of past values, Prophet tries to detect “change points”; you can think of Prophet as curve-fitting.
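
For reference, the additive decomposition behind Prophet is usually written as follows. The notation is taken from the Prophet paper by Taylor and Letham, not from this notebook.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here $g(t)$ is the trend (piecewise, with the "change points" mentioned above), $s(t)$ captures the periodic seasonality, $h(t)$ models holiday effects, and $\varepsilon_t$ is the error term that the model does not explain.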

In [None]:
from prophet import Prophet

In [None]:
model = Prophet()
model.fit(train_sf)

In [None]:
future = model.make_future_dataframe(periods=len(val), freq="MS")
forecast = model.predict(future)[-len(val) :]

In [None]:
# Plot prophet's prediction
prophet_plot = [
    go.Scatter(x=forecast.ds, y=forecast.yhat, mode="lines", name="predicted_prophet")
]

# Display
fig = go.Figure(data=base_plots + prediction_plots + prophet_plot)
fig.update_layout(title="Forecasting in action (part 2) 🚀")

fig.show()

In [None]:
results["Prophet"] = forecast_metrics(val.temperature, forecast.yhat)
results_df = pd.DataFrame(results)
results_df

In [None]:
visualize_metrics(results_df, baseline=True)

Interestingly, it appears that Prophet has a small edge in performance. Inspect the plot: as time progresses, the ARIMA model is content to just keep oscillating around a certain value, and fails to capture the upward trend well. Prophet does better at predicting the later time points. Pass `baseline=False` to `visualize_metrics` to drop the baseline model (HistoricAverage) and get a better look.

For the hell of it, let's go bonkers with some out-of-the-box deep learning: [NeuralProphet](https://neuralprophet.com). Overkill, yay!

### 3. NeuralProphet

> NeuralProphet is an easy to learn framework for interpretable time series forecasting. NeuralProphet is built on PyTorch and combines Neural Network and traditional time-series algorithms, inspired by Facebook Prophet and AR-Net.

In [None]:
from neuralprophet import NeuralProphet

In [None]:
train_neural = train_sf.drop(columns=["unique_id"])
overkill_model = NeuralProphet()
overkill_model.fit(train_neural)

In [None]:
future = overkill_model.make_future_dataframe(
    train_neural, periods=len(val), n_historic_predictions=len(train_neural)
)
forecast = overkill_model.predict(future)[-len(val) :]

In [None]:
forecast

In [None]:
# Plot NeuralProphet's prediction
neural_plot = [
    go.Scatter(
        x=forecast.ds, y=forecast.yhat1, mode="lines", name="predicted_neural_prophet"
    )
]

# Display
fig = go.Figure(data=base_plots + prediction_plots + neural_plot + prophet_plot)
fig.update_layout(title="Forecasting in action (part 3) 🚀")

fig.show()

Looks pretty similar, right? Hard to tell if we gained anything. What does the data tell us?

In [None]:
results["Neural Prophet"] = forecast_metrics(val.temperature, forecast.yhat1)
results_df = pd.DataFrame(results)
results_df

In [None]:
visualize_metrics(results_df, baseline=False)

You'll have to zoom in again, but it looks like we reduced our error a little further. Is it worth it? That depends on the time and compute we have at hand. The AutoARIMA model was actually quite expensive on both fronts. Prophet, in comparison, did fairly well! NeuralProphet did even better, but took a while longer to fit. Use the right tool for the job!

In [None]:
# TODO: We'll go with Neural Prophet -- package the model (to pickle?)