## AutoARIMA

- Univariate
- The auto-ARIMA process searches for the best-fitting parameters for an ARIMA model and settles on a single fitted ARIMA model. It accepts only univariate series.
- AR (autoregressive model) + MA (moving average model) -> ARMA
- ARIMA: AutoRegressive Integrated Moving Average (an ARMA model applied to a differenced series)
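
To make the acronym concrete, here is a minimal sketch in plain Python (not Darts or StatsForecast code) of how the three parts fit together. It simulates an ARMA(1, 1) process, integrates it once to obtain an ARIMA(1, 1, 1)-style series, and shows that first-order differencing recovers the ARMA part. The function names and the coefficient values `phi=0.6` and `theta=0.3` are illustrative assumptions.

```python
import random


def simulate_arma(n, phi, theta, seed=0):
    """Simulate ARMA(1, 1): x[t] = phi * x[t-1] + e[t] + theta * e[t-1]."""
    rng = random.Random(seed)
    series, prev_x, prev_e = [], 0.0, 0.0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)                    # white-noise shock
        x = phi * prev_x + e + theta * prev_e      # AR term + MA term
        series.append(x)
        prev_x, prev_e = x, e
    return series


def integrate(values):
    """Cumulative sum: the 'I' step, the inverse of first-order differencing."""
    out, total = [], 0.0
    for v in values:
        total += v
        out.append(total)
    return out


def difference(values):
    """First-order differencing: y[t] - y[t-1]."""
    return [b - a for a, b in zip(values, values[1:])]


arma = simulate_arma(200, phi=0.6, theta=0.3)  # illustrative coefficients
arima = integrate(arma)                        # non-stationary, d = 1
recovered = difference(arima)                  # back to the ARMA series (minus the first point)
```

An auto-ARIMA search chooses the AR order, the differencing order, and the MA order automatically instead of fixing them by hand as done here.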

In [1]:
from darts import TimeSeries
from darts.models import StatsForecastAutoARIMA
from darts.utils.statistics import plot_hist
from darts.metrics import mape
import pandas as pd
import numpy as np
from darts import concatenate
from darts.utils.timeseries_generation import datetime_attribute_timeseries as dt_attr
from darts.dataprocessing.transformers import Scaler


In [2]:
df = pd.read_csv('../../../data/prepared/df_energy_climate_2020.csv')

In [3]:
df['datetime'] = pd.to_datetime(df['datetime'])

In [8]:
df['day_of_week'] = df['datetime'].dt.dayofweek

In [None]:
df = df.drop(['date', 'day_of_week', 'time', 'month'], axis=1)

In [None]:
# Create a TimeSeries, specifying the time and value columns
series = TimeSeries.from_dataframe(
    df, 
    time_col="datetime", 
    value_cols='energy_price',
    freq='H'
)

splitting_point = (int(len(series)*0.20))

# Hold out the last 20% of the series as a validation set
train, val = series[:-splitting_point], series[-splitting_point:]
# train, val = series.split_before(0.75)

In [None]:
model = StatsForecastAutoARIMA()

In [None]:
model.fit(train)

forecast = model.predict(len(val))
print(f'model {model} obtains MAPE: {mape(val, forecast):.2f}%')
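
For reference, the MAPE printed above is, under its usual definition, the mean of |actual - predicted| / |actual| expressed as a percentage. The sketch below is a plain-Python illustration of that definition, not the Darts implementation; `mape_sketch` and the sample values are hypothetical. It also shows that the metric is not symmetric, so the actual series should be passed first.

```python
def mape_sketch(actual, predicted):
    """Mean absolute percentage error, in percent (illustrative helper)."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


actual = [100.0, 200.0, 400.0]     # hypothetical observed prices
predicted = [110.0, 180.0, 400.0]  # hypothetical forecasts

print(f"actual first:    {mape_sketch(actual, predicted):.2f}%")
print(f"predicted first: {mape_sketch(predicted, actual):.2f}%")  # a different value
```

Because each error is divided by the actual value, observations at or near zero make the score undefined or very large, which is worth keeping in mind for a price series.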

## Historical forecasts
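Historical forecasts simulate the predictions the model would have produced at successive points in the past, each time using only the data available up to that point.

Backtesting: a general method for assessing how well a model would have performed on historical data.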

In [None]:
historical_fcast = model.historical_forecasts(
    series,
    start=0.80,
    forecast_horizon=168,
    stride=5,
    overlap_end=False,
    verbose=True
)

In [None]:
series.plot(label='data')
historical_fcast.plot(label='backtest 1 week ahead (AutoARIMA)')

In [None]:
# The actual series goes first, the predictions second
print(f'MAPE = {mape(series, historical_fcast):.2f}%')

## Backtest
`backtest()` repeatedly builds a training set from the beginning of `series`. It trains the model on that training set, emits a forecast of length `forecast_horizon`, and then moves the end of the training set forward by `stride` time steps. Each forecast is scored with `metric`, and the scores are combined by `reduction`.
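
The loop described above can be sketched in plain Python. This is only an illustration of the expanding-window idea, not how Darts implements it: `naive_forecast` (repeat the last value) stands in for the fitted model, and `mape_sketch`, `expanding_window_backtest`, and the toy data are hypothetical. The loop stops once a full horizon no longer fits inside the series, which mirrors `overlap_end=False` in the earlier cell.

```python
def naive_forecast(train, horizon):
    """Repeat the last observed value; a stand-in for a fitted model."""
    return [train[-1]] * horizon


def mape_sketch(actual, predicted):
    """Mean absolute percentage error, in percent (illustrative helper)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def expanding_window_backtest(values, start, horizon, stride):
    """Return one error score per forecast origin."""
    errors = []
    train_end = start
    # Stop once a full horizon no longer fits inside the series
    while train_end + horizon <= len(values):
        train = values[:train_end]                       # always starts at index 0
        actual = values[train_end:train_end + horizon]   # the points being forecast
        forecast = naive_forecast(train, horizon)
        errors.append(mape_sketch(actual, forecast))
        train_end += stride                              # grow the training set
    return errors


values = [float(v) for v in range(10, 30)]  # 20 hypothetical observations
errors = expanding_window_backtest(values, start=16, horizon=2, stride=1)
print(len(errors), "forecasts scored")
```

A smaller `stride` yields more forecast origins and a smoother error estimate, at the cost of refitting the model more often.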

In [None]:
raw_errors = model.backtest(
    series,
    start=0.80,
    forecast_horizon=168,
    stride=5,
    reduction=None,
    metric=mape,
    verbose=True
)

In [None]:
plot_hist(
    raw_errors,
    bins=np.arange(0, max(raw_errors) + 1, 1),  # + 1 so the largest error falls inside the last bin
    title='individual backtest error scores (histogram)'
)

In [None]:
median_error = model.backtest(
    series,
    start=0.8,
    forecast_horizon=168,
    stride=5,
    reduction=np.median,
    metric=mape,
    verbose=True
)

In [None]:
print(f'Median error (MAPE) over all historical forecasts: {median_error:.2f}%')

In [None]:
average_error = model.backtest(
    series,
    start=0.8,
    forecast_horizon=168,
    stride=5,
    reduction=np.mean,
    metric=mape,
    verbose=True
)

In [None]:
print(f'Average error (MAPE) over all historical forecasts: {average_error:.2f}%')
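
The three `backtest` calls above differ only in `reduction`, and each one refits the model at every step. Assuming `reduction` simply aggregates the per-forecast scores, the median and mean could instead be computed from the scores already stored in `raw_errors`. The sketch below shows the idea on a hypothetical list of scores (`raw_scores` is made-up data, not output from this notebook).

```python
import statistics

# Hypothetical per-forecast MAPE scores, standing in for raw_errors
raw_scores = [4.2, 5.1, 3.8, 30.0, 4.6]

median_score = statistics.median(raw_scores)
mean_score = statistics.mean(raw_scores)

print(f"Median error (MAPE): {median_score:.2f}%")
print(f"Average error (MAPE): {mean_score:.2f}%")
```

A single large score pulls the mean well above the median, which is why it can be useful to report both.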