[Time series] Forecasting
=========

Forecasting involves predicting the behavior of a variable that typically has some **stochastic** element.

A key distinction relates to whether we are forecasting based on **extrapolating a single variable**, or using a **multivariate** approach to improve the forecasting model.

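A key concept is **periodicity** or **seasonality**: periodic, repeating patterns in the data that must be accounted for. A common approach is to decompose the series additively into seasonal, trend and residual components, as done for example by **STL** (Seasonal and Trend decomposition using Loess):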

$y(t) = S(t) + T(t) + R(t)$


Terminology:
-----------
 - **Trend**: the long-term increase or decrease in the data, once the seasonal variation is set aside
 - **Seasonality**: variation that repeats with a fixed, known period (e.g. weekly, monthly or yearly patterns)
 - **Cyclic**: patterns that are not of a fixed frequency. An example is the ups and downs of the "business cycle".
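
To make the terms concrete, here is a minimal sketch that builds a synthetic monthly series from the three additive components. The amplitudes, the series length and the noise level are arbitrary assumptions chosen for illustration; they are not taken from the data used later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
n_months = 120                   # ten years of monthly data (assumed length)
t = np.arange(n_months)

trend = 80.0 + 0.2 * t                            # T(t): slow linear growth
seasonal = 5.0 * np.sin(2 * np.pi * t / 12)       # S(t): repeats every 12 months
residual = rng.normal(scale=1.0, size=n_months)   # R(t): the stochastic element

y = seasonal + trend + residual                   # y(t) = S(t) + T(t) + R(t)
```

A cyclic component would differ from `seasonal` in that its period would not be fixed at a known value such as 12.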

Forecasting Tools:
-------
 - Decomposition models:
 - Smoothing models:
 - **(S)ARIMA models (Auto-Regressive Integrated Moving Average)**: 
 - **TBATS**: Trigonometric, Box-Cox transform, ARMA errors, Trend and Seasonal Components. Able to account for multiple seasonalities (e.g. weekly and yearly simultaneously)
 - **NNETAR (Neural Network AutoRegression)**
 - **LSTMs**: typically not used to forecast a single series on its own; they tend to work well when a lot of input data is available.
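
As an illustration of the simplest kind of smoothing model, the sketch below implements simple exponential smoothing by hand. The function name, the initialisation (first level set to the first observation) and the example numbers are assumptions made for this sketch; library implementations may initialise and tune `alpha` differently.

```python
def simple_exp_smoothing(values, alpha):
    """Return the smoothed level after each observation.

    level[0] = values[0]                       (assumed initialisation)
    level[t] = alpha * values[t] + (1 - alpha) * level[t - 1]
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if len(values) == 0:
        raise ValueError("values must not be empty")
    levels = [float(values[0])]
    for value in values[1:]:
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels


orders = [10.0, 12.0, 11.0, 13.0]   # made-up numbers, not the elecequip data
levels = simple_exp_smoothing(orders, alpha=0.5)

# Simple exponential smoothing gives a flat forecast:
# every future step is predicted to equal the last smoothed level.
forecast = levels[-1]
```

A larger `alpha` makes the level follow recent observations more closely, while a smaller one averages over a longer history.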

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 15})

In [None]:
df = pd.read_csv('elecequip.csv')

fig,ax = plt.subplots(figsize=(16, 6))
# Raw series, its centred 12-month rolling mean, and the month-to-month change
ax.plot(df['Index'],df['x'],label='data')
ax.plot(df['Index'],df['x'].rolling(window=12,center=True,min_periods=1).mean(),label='rolling average',color='red')
ax.plot(df['Index'],df['x'].diff().fillna(0),label='first difference')

ax.set_xlabel('year')
ax.set_ylabel('new orders index')
ax.legend()

In [None]:
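# Average the data by calendar month
# 'Index' is a decimal year, so its fractional part times 12 gives a 0-based month number.
# Round to an integer so that floating-point noise does not split one month into several groups.
df['Month'] = ((df['Index'] % 1) * 12).round().astype(int)
df_byMonth = df.groupby('Month').mean()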

fig,ax = plt.subplots(figsize=(8, 6))
ax.plot(df_byMonth.index,df_byMonth['x'])
ax.set_xlabel('Month')
ax.set_ylabel('mean new orders index')
df_byMonth

Seasonal Decompose
=======

This function applies a centred moving average to the data to extract the long-term trend, and then (for the additive model) subtracts this trend from the data. The seasonal component is the mean of the detrended data at each position in the period, here each calendar month. Finally, the residual is what remains after both the trend and the seasonal component have been subtracted.

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

result = seasonal_decompose(df['x'], model='additive', period=12)

fig, (ax1,ax2,ax3,ax4) = plt.subplots(4,1, figsize=(10,20))
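ax1.plot(result.observed); ax1.set_xlabel('observed')
ax2.plot(result.trend, label='seasonal_decompose trend'); ax2.set_xlabel('trend')
# Compare with the simple rolling average plotted earlier
ax2.plot(df['x'].rolling(window=12,center=True,min_periods=1).mean(), label='rolling average'); ax2.legend()
ax3.plot(result.resid); ax3.set_xlabel('resid')
# Show two periods of the seasonal component
ax4.plot(result.seasonal[:24]); ax4.set_xlabel('seasonal')
# Overlay the monthly means from the previous cell, shifted by the overall mean to centre them near zero
ax4.plot(df_byMonth['x']-df['x'].mean())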
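
The steps described above can be reproduced by hand, which helps to see what each component means. The following is a minimal sketch of a classical additive decomposition on a synthetic series with a known structure. It is not the statsmodels implementation, although to my understanding it follows the same recipe; the synthetic data and all variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with a known structure (not the elecequip data).
t = np.arange(120)
true_seasonal = 5.0 * np.sin(2 * np.pi * t / 12)
y = pd.Series(80.0 + 0.2 * t + true_seasonal)

period = 12

# 1. Trend: centred moving average. For an even period a "2x12" average is used
#    (a 12-term mean followed by a 2-term mean) so that the window is symmetric.
#    The result is shifted back by half a period to centre it on each point.
trend = (
    y.rolling(window=period).mean()
     .rolling(window=2).mean()
     .shift(-(period // 2))
)

# 2. Detrend (additive model): subtract the trend from the data.
detrended = y - trend

# 3. Seasonal: mean of the detrended values at each position in the cycle,
#    centred so that the seasonal component averages to zero over one period.
position = np.arange(len(y)) % period
seasonal_means = detrended.groupby(position).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=y.index)

# 4. Residual: whatever is left after removing trend and seasonal component.
resid = y - trend - seasonal
```

Because the moving average needs a full window, `trend` and `resid` are undefined (NaN) for the first and last six points of the series.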

Autocorrelation
=========

See also the Hida Challenge description.

In [None]:
fig,ax = plt.subplots(figsize=(16, 6))
pd.plotting.autocorrelation_plot(df['x'],ax=ax,label='autocorrelation of data')
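
To clarify what such a plot shows, here is a minimal sketch of the usual sample autocorrelation at a single lag, applied to a synthetic series with a 12-step period. The function name and the test series are assumptions made for illustration; the plotting helper above may differ in details such as normalisation.

```python
import numpy as np


def sample_autocorrelation(x, lag):
    """Sample autocorrelation of x at the given non-negative lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    centred = x - x.mean()
    c0 = np.sum(centred ** 2) / n                          # variance (lag 0)
    ck = np.sum(centred[: n - lag] * centred[lag:]) / n    # autocovariance at this lag
    return ck / c0


# Synthetic series with a period of 12 steps (not the elecequip data).
t = np.arange(120)
series = np.sin(2 * np.pi * t / 12)

r0 = sample_autocorrelation(series, 0)     # always 1 by construction
r6 = sample_autocorrelation(series, 6)     # half a period: strongly negative
r12 = sample_autocorrelation(series, 12)   # full period: strongly positive
```

For seasonal data, peaks in the autocorrelation at multiples of the period are a quick way to confirm the periodicity before choosing a value such as `period=12`.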

Bibliography
-------

 - Hyndman & Athanasopoulos, **Forecasting: Principles and Practice**, https://otexts.com/fpp2/
 - Davide Burba, https://towardsdatascience.com/an-overview-of-time-series-forecasting-models-a2fa7a358fcb