![](../docs/ae_logo.png "Adapt & Enable")
# AE workshop 2023 - Data science

## Part 2 - Time Series Forecasting

We've explored the data and got a feel for it. Let's move on, and see if we can build a model that predicts the temperature for future points in time! 

First things first, let's import what we need.

In [None]:
import numpy as np
import pandas as pd
from plotly import offline
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from statsforecast import StatsForecast
from statsforecast.models import *
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
    explained_variance_score,
)
import mlflow
from pathlib import Path

pd.options.plotting.backend = "plotly"

import warnings
from tqdm import TqdmExperimentalWarning

warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.simplefilter(action="ignore", category=TqdmExperimentalWarning)

from tqdm import tqdm

import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = "iframe"

ae_orange = "#FD9129"
ae_orange2 = "#FFD580"
ae_gold = "#FFD700"

!mkdir -p ../mlruns

In [None]:
mlflow.set_tracking_uri(Path("../mlruns").absolute().as_uri())
print(mlflow.get_tracking_uri())

In [None]:
experiment_name = "ae_forecasting_workshop"
try:
    experiment_id = mlflow.create_experiment(
        name=experiment_name,
    )
except Exception as e:
    print(e)
    experiment_id = mlflow.get_experiment_by_name(experiment_name).experiment_id
print(f"Experiment id: {experiment_id}")

### Read data

Let's load the data we saved from the previous step. If the next cell fails, *make sure you ran through the entire [EDA notebook](./1_EDA.ipynb)*.

In [None]:
df = pd.read_csv("../data/cleaned.csv")
df = df.set_index("date")
df.head()

Alright, looks good! Let's get started. 

### Train-val-test split
When we train models, we need a way to know if they're worth a damn. 

<span style="color:#FD9129">**How would you approach this?**</span>

<span style="color:#FD9129">**Time series are a special case. Any idea why?**</span>

<span style="color:#FD9129">**How can we mitigate this issue?**</span>
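
One common answer to the last question is to make sure validation data always comes *after* the training data in time, and optionally to repeat that over several growing windows. Below is a minimal sketch of such expanding-window splits. The function name `expanding_window_splits` and its parameters are made up for illustration and are not used elsewhere in this notebook.

```python
from typing import Iterator, Tuple


def expanding_window_splits(
    n_rows: int, n_folds: int = 3, val_size: int = 12
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, val_idx) pairs where validation always follows training."""
    for fold in range(n_folds):
        # Later folds push the validation block closer to the end of the data
        val_end = n_rows - (n_folds - 1 - fold) * val_size
        val_start = val_end - val_size
        if val_start <= 0:
            raise ValueError("Not enough rows for the requested folds")
        # Train on everything before the validation block
        yield range(0, val_start), range(val_start, val_end)


for train_idx, val_idx in expanding_window_splits(n_rows=60, n_folds=3, val_size=12):
    print(f"train rows 0..{train_idx[-1]}, validate rows {val_idx[0]}..{val_idx[-1]}")
```

The helper in the next cell does the simplest version of this: a single chronological cut.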

In [None]:
# Helper function to generate splits
def train_val_test_split(data: pd.DataFrame, train=0.7, val=0.15, test=0.15):
    # Make sure our splits add up
    total = train + test + val
    if not np.isclose(total, 1):
        print(
            f"Train, validation and test portions don't add up to 1! Currently the total is {total}. Rebalancing."
        )
        train /= total
        val /= total
        test /= total

    # Calculate row counts
    n_rows = len(data)
    train_rows = int(train * n_rows)
    val_rows = int(val * n_rows)
    test_rows = n_rows - (train_rows + val_rows)

    # Slice up the data
    train_data = data.iloc[:train_rows]
    val_data = data.iloc[train_rows : train_rows + val_rows]
    test_data = data.iloc[train_rows + val_rows :]

    print(
        f"Training split:   {train_rows} data points\tfrom {train_data.index.min()} till {train_data.index.max()}"
    )
    print(
        f"Validation split: {val_rows} data points\tfrom {val_data.index.min()} till {val_data.index.max()}"
    )
    print(
        f"Test split:       {test_rows} data points\tfrom {test_data.index.min()} till {test_data.index.max()}"
    )

    return train_data, val_data, test_data

In [None]:
train, val, test = train_val_test_split(df, train=0.8, val=0.2, test=0)

We need metrics to quantify the performance of our models. 

<span style="color:#FD9129">**In our case, what could be good metrics?**</span>

Here's some inspiration: https://scikit-learn.org/stable/modules/model_evaluation.html
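
If you want a starting point, here is a rough sketch of three common choices (MAE, RMSE and R²) written out with plain NumPy so you can see what they compute. The name `forecast_metrics_sketch` is hypothetical, chosen so it doesn't clash with the helper you'll fill in below. scikit-learn's `mean_absolute_error`, `mean_squared_error` and `r2_score` compute the same quantities.

```python
import numpy as np


def forecast_metrics_sketch(y_true, y_pred) -> dict:
    """Return a few common error metrics as plain floats."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    # Residual and total sums of squares, needed for R²
    ss_res = float(np.sum(errors**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))

    return {
        "mae": float(np.mean(np.abs(errors))),  # average absolute miss, in the data's units
        "rmse": float(np.sqrt(ss_res / errors.size)),  # penalizes large misses more
        "r2": 1 - ss_res / ss_tot,  # 1 is perfect, 0 is no better than the mean
    }


m = forecast_metrics_sketch([1, 2, 3, 4, 5, 7, 8, 9, 11], [1, 3, 3, 4, 5, 6, 7, 9, 11])
print(m)
```

Returning plain floats keeps the dictionary easy to pass on to `mlflow.log_metrics` later.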

In [None]:
# Helper function to give us some basic time series forecasting performance metrics
def forecast_metrics(y_true, y_pred):
    
    # Collect metrics in a dictionary
    metrics = {}
    
    # Add your own metrics!
    # https://scikit-learn.org/stable/modules/model_evaluation.html

    #metrics["my_metric"] = some_metric_calculation_function(y_true, y_pred)
    
    return metrics

In [None]:
# Let's see if they work
a = [1,2,3,4,5,7,8,9,11]
b = [1,3,3,4,5,6,7,9,11]
print(forecast_metrics(a,b))

In [None]:
# Helper function to visualize performance
def visualize_metrics(results_df: pd.DataFrame, baseline=True):
    # Don't mess with our original data
    viz_df = results_df.copy()

    # Get rid of baseline if not requested
    if not baseline:
        del viz_df["HistoricAverage"]

    # Create subplot grid with one column per metric
    fig = make_subplots(rows=1, cols=viz_df.shape[0], horizontal_spacing=0.1)

    # Loop through each row and add a bar chart to the corresponding subplot
    for idx, (name, row) in enumerate(viz_df.iterrows()):
        fig.add_trace(go.Bar(x=row.index, y=row.values, name=name), row=1, col=idx + 1)
        # Give the child a name
        fig.update_yaxes(title_text=name, row=1, col=idx + 1)

    # Update layout and show the figure
    fig.update_layout(
        height=400, width=1200, title="Model comparison", showlegend=False
    )
    fig.show()

### 1. Statistical modelling: ARIMA (and friends)

We'll first try and fit an **A**uto**r**egressive **I**ntegrated **M**oving **A**verage model. That's a mouthful, we know. This statistical model is quite common in the industry (e.g., in stock price forecasting).

* It's *autoregressive* because it uses past values of a variable to predict its future values (i.e., it regresses onto itself).
* It uses a *moving average* of past forecast errors to help predict future values.
* Finally, it's *integrated* in the sense that it uses differencing to make the data stationary -- that is, it removes trends (and, in the seasonal variant, seasonal patterns) to try and obtain a stationary signal.
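
To make the "integrated" and "autoregressive" parts a bit more tangible, here is a toy sketch in plain NumPy. It uses a synthetic series (not the workshop data), differences it once, and fits an AR(1) model by least squares. The moving-average part is left out for brevity, so this is an illustration of the ideas, not a full ARIMA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy series: a linear upward trend plus noise (synthetic, for illustration only)
t = np.arange(200)
y = 0.5 * t + rng.normal(scale=1.0, size=t.size)

# "Integrated": first-order differencing removes the linear trend
dy = np.diff(y)

# "Autoregressive": regress each differenced value on the previous one (AR(1))
X = np.column_stack([np.ones(dy.size - 1), dy[:-1]])
intercept, phi = np.linalg.lstsq(X, dy[1:], rcond=None)[0]

# One-step-ahead forecast: predict the next difference, then undo the differencing
next_diff = intercept + phi * dy[-1]
next_value = y[-1] + next_diff
print(f"mean of differences: {dy.mean():.2f}, forecast for t=200: {next_value:.1f}")
```

After differencing, the series hovers around the slope of the original trend, which is what makes it easier to model.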

Let's fit one! Here, we will use a class called `AutoARIMA`, which automatically finds the best ARIMA model based on an information criterion.
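
What does "based on an information criterion" mean in practice? The sketch below picks the order of a pure AR model by minimizing the AIC, computed as `n * log(RSS / n) + 2 * k` (the Gaussian AIC up to a constant). It is a simplified stand-in, assuming least-squares fits on synthetic data. `AutoARIMA` searches over more components than just the AR order, and the helper `ar_aic` is made up for this example.

```python
import numpy as np


def ar_aic(y: np.ndarray, p: int, max_p: int) -> float:
    """AIC of an AR(p) model fit by least squares, assuming Gaussian errors."""
    target = y[max_p:]  # same targets for every p, so the AICs are comparable
    lags = [y[max_p - k : len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(target.size)] + lags)
    coef = np.linalg.lstsq(X, target, rcond=None)[0]
    rss = float(np.sum((target - X @ coef) ** 2))
    n, k = target.size, p + 1  # k counts the intercept plus p lag coefficients
    return n * np.log(rss / n) + 2 * k


# Synthetic AR(2) series, for illustration only
rng = np.random.default_rng(42)
noise = rng.normal(size=500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + noise[t]

candidates = range(1, 6)
scores = {p: ar_aic(y, p, max_p=max(candidates)) for p in candidates}
best_p = min(scores, key=scores.get)
print(scores)
print(f"Order with the lowest AIC: {best_p}")
```

The `2 * k` term is the penalty for complexity: an extra lag only wins if it reduces the residual error enough to pay for itself.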

In [None]:
# Many of these packages are picky about the column names, so here's a helper function to prep the data
def prep_df(df: pd.DataFrame) -> pd.DataFrame:
    df = df.reset_index()
    df = df.rename(columns={"date": "ds", "temperature": "y"})
    df["unique_id"] = 1
    df = df[["unique_id", "ds", "y"]]

    return df

In [None]:
train_sf = prep_df(train)
val_sf = prep_df(val)

<span style="color:#FD9129">*Before we go into model fitting, a word on experiment tracking!*</span>

In [None]:
season_length = 12  # Monthly data
horizon = len(val)  # Predict the length of the validation df

# Include the models you imported
models = [
    # This is a baseline model
    HistoricAverage(),
    # This is a better model
    AutoARIMA(season_length=season_length),
    # YOU CAN ADD MODELS HERE
]

# We'll store metrics here
metrics = dict()

# We'll keep track of our forecasts for visualization
forecasts = dict()

# Let's loop through the models
for m in tqdm(models, desc="Fitting models"):
    
    # Start an MLflow run for each model
    with mlflow.start_run(experiment_id=experiment_id, run_name=str(m)):
        
        # Instantiate StatsForecast with just this one model
        model = StatsForecast(df=train_sf, models=[m], freq="MS", n_jobs=-1)

        # Forecast for the defined horizon
        forecasts[str(m)] = model.forecast(horizon)

        # Let's calculate and store metrics for every one of them
        metrics[str(m)] = forecast_metrics(val.temperature, forecasts[str(m)][str(m)])

        # Log the metrics
        mlflow.log_metrics(metrics[str(m)])

        # Log the model TODO: this is probably wrong, use pyfunc instead
        mlflow.sklearn.log_model(model, "model")

In [None]:
forecasts

As you can see, the `forecasts` dict now maps each model name to its forecast for the validation dates.

If you add another model to the list, it'll show up here as an extra key.

<span style="color:#FD9129">**Have a look [here](https://github.com/Nixtla/statsforecast) if you're curious, or just want to play around.**</span>

Before we calculate performance metrics, let's visualize our predictions!

In [None]:
base_plots = []
prediction_plots = []

# Plot our training data
base_plots.append(
    go.Scatter(
        x=train.index,
        y=train.temperature,
        mode="lines",
        name="training",
        line=dict(color=ae_gold),
    )
)

# Plot our observed validation values
base_plots.append(
    go.Scatter(
        x=val.index,
        y=val.temperature,
        mode="lines",
        name="observed",
        line=dict(color="grey"),
    )
)

# Plot each model's prediction
for model, forecast in forecasts.items():
    prediction_plots.append(
        go.Scatter(
            x=forecast.ds, y=forecast[model], mode="lines", name=f"predicted_{model}"
        )
    )

# Display everything together
fig = go.Figure(data=base_plots + prediction_plots)
fig.update_layout(title="Forecasting in action 🚀")

fig.show(renderer="iframe")

Not bad! Let's calculate some performance metrics.

In [None]:
results_df = pd.DataFrame(metrics)
results_df

In [None]:
visualize_metrics(results_df)

Clearly, the ARIMA approach blows the baseline model out of the water. Good! Now let's try another model: Facebook's [Prophet](https://facebook.github.io/prophet/)!

### 2. Prophet

> Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

Prophet is a special case of the Generalized Additive Model. Whereas ARIMA tries to build a formula for future values as a function of past values, Prophet tries to detect “change points”; you can think of Prophet as curve-fitting.
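
To get a feel for the "curve-fitting" view, here is a toy additive model in plain NumPy: a linear trend plus Fourier terms for a yearly cycle, fit by least squares on synthetic monthly data. This is only a sketch of the general idea. It has no change points or holiday effects, it is not how Prophet is implemented, and `design_matrix` is a made-up helper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy monthly series: linear trend + yearly cycle + noise (synthetic)
months = np.arange(120)
y = 10 + 0.02 * months + 5 * np.sin(2 * np.pi * months / 12) + rng.normal(scale=0.5, size=120)


def design_matrix(t: np.ndarray, period: float = 12.0, n_harmonics: int = 2) -> np.ndarray:
    """Columns: intercept, linear trend, and Fourier terms for the seasonality."""
    cols = [np.ones_like(t, dtype=float), t.astype(float)]
    for h in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * h * t / period))
        cols.append(np.cos(2 * np.pi * h * t / period))
    return np.column_stack(cols)


# Fit on the first 96 months, then extrapolate the curve over the last 24
coef = np.linalg.lstsq(design_matrix(months[:96]), y[:96], rcond=None)[0]
forecast = design_matrix(months[96:]) @ coef
mae = float(np.mean(np.abs(y[96:] - forecast)))
print(f"estimated trend per month: {coef[1]:.3f}, holdout MAE: {mae:.2f}")
```

Notice that nothing here looks at the previous value of the series: the forecast is just the fitted curve evaluated at future time points, which is the key contrast with ARIMA.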

In [None]:
from prophet import Prophet

In [None]:
# We'll use the same train and val dataframes as before
model = Prophet()

# Start an MLflow run
with mlflow.start_run(experiment_id=experiment_id, run_name="Prophet"):
    # Fit the model
    model.fit(train_sf)

    # Log the model
    mlflow.prophet.log_model(model, artifact_path="prophet")

    # Forecast for the defined horizon
    future = model.make_future_dataframe(periods=horizon, freq="MS")
    forecast = model.predict(future)[-len(val) :]

    # Calculate metrics
    metrics["Prophet"] = forecast_metrics(val.temperature, forecast.yhat)

    # Log the metrics
    mlflow.log_metrics(metrics["Prophet"])

In [None]:
# Plot prophet's prediction
prophet_plot = [
    go.Scatter(x=forecast.ds, y=forecast.yhat, mode="lines", name="predicted_prophet")
]

# Display
fig = go.Figure(data=base_plots + prediction_plots + prophet_plot)
fig.update_layout(title="Forecasting in action (part 2) 🚀")

fig.show()

In [None]:
results_df = pd.DataFrame(metrics)
results_df

In [None]:
visualize_metrics(results_df, baseline=True)

Interestingly, it appears that Prophet has a small edge in performance. Inspect the plot: it looks like, as time progresses, the ARIMA model is content to just keep oscillating around a certain value, and fails to capture the upwards trend well. Prophet does better in predicting the later time points. Get rid of the baseline model (HistoricAverage) to get a better look.

For the hell of it, let's go bonkers with some out-of-the-box deep learning: [NeuralProphet](https://neuralprophet.com). Overkill, yay!

### 3. NeuralProphet

> NeuralProphet is an easy to learn framework for interpretable time series forecasting. NeuralProphet is built on PyTorch and combines Neural Network and traditional time-series algorithms, inspired by Facebook Prophet and AR-Net.

In [None]:
from neuralprophet import NeuralProphet

In [None]:
# Adapt our training data to fit the expected format
train_neural = train_sf.drop(columns=["unique_id"])

# Initialize model
overkill_model = NeuralProphet()

# Start a new run
with mlflow.start_run(experiment_id=experiment_id, run_name="Neural Prophet"):
    # Fit the model
    overkill_model.fit(train_neural)

    # Forecast for the defined horizon
    future = overkill_model.make_future_dataframe(
        train_neural, periods=len(val), n_historic_predictions=len(train_neural)
    )
    forecast = overkill_model.predict(future)[-len(val) :]

    # Calculate metrics
    metrics["Neural Prophet"] = forecast_metrics(val.temperature, forecast.yhat1)

    # Log the metrics
    mlflow.log_metrics(metrics["Neural Prophet"])

In [None]:
# Plot NeuralProphet's prediction
neural_plot = [
    go.Scatter(
        x=forecast.ds, y=forecast.yhat1, mode="lines", name="predicted_neural_prophet"
    )
]

# Display
fig = go.Figure(data=base_plots + prediction_plots + neural_plot + prophet_plot)
fig.update_layout(title="Forecasting in action (part 3) 🚀")

fig.show()

Looks pretty similar, right? Hard to tell if we gained anything. What does the data tell us?

In [None]:
# The NeuralProphet metrics were already stored during the run above
results_df = pd.DataFrame(metrics)
results_df

In [None]:
visualize_metrics(results_df, baseline=False)

You'll have to zoom in again, but it looks like we reduced our error a little further. Was it worth it? That depends on the time and compute we have at hand. The AutoARIMA model was actually quite expensive on both fronts. Prophet, in comparison, did fairly well! NeuralProphet did even better, but took a while longer to fit. Use the right tool for the right job!
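
If you'd like to put numbers on the "time" side of that trade-off, one option is a small timing helper like the sketch below. The `timed` context manager and the `fit_seconds` dictionary are made up for illustration, and the loop inside the `with` block is only a stand-in for a real model fit.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Store the wall-clock duration of the wrapped block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


fit_seconds = {}

with timed("toy_model", fit_seconds):
    # Stand-in for something like model.fit(...)
    total = sum(i * i for i in range(100_000))

print(fit_seconds)
```

Inside an MLflow run, the measured value could be logged next to the error metrics with `mlflow.log_metric`, so runs can be compared on both accuracy and cost.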