## Time Series Forecasting models (ARIMA and Exponential Smoothing)

ARIMA and exponential smoothing are the two most widely used approaches to time series forecasting, and they complement each other: exponential smoothing models describe the trend and seasonality in the data, while ARIMA models describe the autocorrelations in the data.
![image.png](attachment:814ef416-5208-444a-9690-d5fdbb385f94.png)

In [None]:
import warnings
import itertools
import pandas as pd
import numpy as np
import statsmodels.api as sm # has Arima model 
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')

In [None]:
# parse_dates=True only affects the index; name the column to parse it as dates
time_series = pd.read_csv("timeseries_rev.csv", parse_dates=["date"])
time_series.head()

In [None]:
# set date column datatype
time_series['date'] = pd.to_datetime(time_series['date'])
## set the date as index
time_series = time_series.set_index('date')

In [None]:
monthly_series = time_series.total_revenue.resample('M').sum()
monthly_series.head()

In [None]:
plt.figure(figsize = (6,2))
monthly_series.plot();

In [None]:
# decompose the series into trend, seasonal and residual components using statsmodels
components = sm.tsa.seasonal_decompose(monthly_series)
plt.rcParams["figure.figsize"] = (10,6)
components.plot();

In [None]:
trend = components.trend
trend.head()

In [None]:
seasonality = components.seasonal
seasonality.head()

In [None]:
remainder = components.resid
remainder.head()

In [None]:
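# Checking for stationarity: compare the series with its 12-month rolling mean and standard deviation
plt.figure(figsize=(8,3))
monthly_series.plot(label='monthly revenue')
monthly_series.rolling(window=12).mean().plot(label='rolling mean (12 months)')
monthly_series.rolling(window=12).std().plot(label='rolling std (12 months)')
plt.legend();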

Looking at the graph above, the rolling mean appears roughly constant and the rolling standard deviation changes only slightly, which suggests that the series is stationary. We can confirm this with the Augmented Dickey-Fuller (ADF) test.

In [None]:
ad_fuller_test = sm.tsa.stattools.adfuller(monthly_series, autolag="AIC")
ad_fuller_test
# null hypothesis: the series has a unit root (data is not stationary)
# alternate hypothesis: the series is stationary
# test: if the p-value is less than 0.05, we reject the null hypothesis in favor of the alternate

The second value in the ADF test output above is the p-value. It is less than 0.05, so we reject the null hypothesis and conclude that our data (monthly_series) is stationary, which agrees with the graph above.

In [None]:
## ACF and PACF plots
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(monthly_series);

In [None]:
plot_pacf(monthly_series);  # trailing semicolon keeps the notebook from drawing the figure twice

In [None]:
### Moving average model with order 1
model_ma = sm.tsa.statespace.SARIMAX(monthly_series, order=(0,0,1)) 
results = model_ma.fit() 
results.aic

In [None]:
### Autoregressive model with order 1
model_ar = sm.tsa.statespace.SARIMAX(monthly_series, order=(1,0,0))
results_ar1 = model_ar.fit() 
results_ar1.aic

In [None]:
### Autoregressive model with order 2
model_ar = sm.tsa.statespace.SARIMAX(monthly_series, order=(2,0,0))
results_ar2 = model_ar.fit() 
results_ar2.aic

In [None]:
## ARMA model 
model_arma = sm.tsa.statespace.SARIMAX(monthly_series, order=(1,0,1))
results_arma = model_arma.fit() 
results_arma.aic

In [None]:
### ARIMA model
model_arima = sm.tsa.statespace.SARIMAX(monthly_series, order=(1,1,1))
results_arima = model_arima.fit() 
results_arima.aic

### ARIMA Diagnostics

Judging by the AIC values computed above, the best model we have so far is the ARIMA(1,1,1) model, so we inspect its residual diagnostics.

In [None]:
results_arima.plot_diagnostics(figsize=(15, 12));

### ARIMA Grid Search 

In [None]:
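p = d = q = P = D = Q = range(0, 3)
S = 12  # seasonal period: 12 months
# order the product as (p, d, q, P, D, Q) so the names match how the tuples are split below
combinations = list(itertools.product(p, d, q, P, D, Q))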

In [None]:
arima_order = [(x[0], x[1], x[2]) for x in combinations]

In [None]:
seasonal_order = [(x[3], x[4], x[5], S) for x in combinations]

In [None]:
results_data = pd.DataFrame(columns=['p', 'd', 'q', 'P', 'D', 'Q', 'AIC'])
## length of combinations
len(combinations)

In [None]:
for i in range(len(combinations)):
    try:
        model = sm.tsa.statespace.SARIMAX(monthly_series, order=arima_order[i],
                                          seasonal_order=seasonal_order[i])
        result = model.fit(disp=False)  # disp=False silences the optimizer output
    except Exception:
        # skip parameter combinations that cannot be fitted
        continue
    # record p, d, q, then P, D, Q (without S), then the AIC
    results_data.loc[i] = [*arima_order[i], *seasonal_order[i][:3], result.aic]

In [None]:
results_data
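
Once the grid search has filled `results_data`, the candidate with the lowest AIC can be picked by sorting. The sketch below shows the mechanics on a small placeholder table; `demo_results` and its AIC values are invented purely for illustration and are not results from this notebook.

```python
import pandas as pd

# Placeholder grid-search output (AIC values are made up for illustration)
demo_results = pd.DataFrame(
    {
        "p": [0, 1, 1], "d": [1, 1, 0], "q": [1, 1, 1],
        "P": [0, 1, 0], "D": [1, 1, 0], "Q": [1, 0, 1],
        "AIC": [512.3, 498.7, 530.1],
    },
    dtype=object,  # mimics the object columns produced by filling an empty DataFrame
)

# Make AIC numeric, then rank candidates from lowest to highest AIC
demo_results["AIC"] = pd.to_numeric(demo_results["AIC"])
ranked = demo_results.sort_values("AIC").reset_index(drop=True)

# Rebuild the order tuples of the best candidate (seasonal period S = 12)
best = ranked.iloc[0]
best_order = (int(best["p"]), int(best["d"]), int(best["q"]))
best_seasonal_order = (int(best["P"]), int(best["D"]), int(best["Q"]), 12)
print(best_order, best_seasonal_order)
```

The resulting tuples can be passed as `order` and `seasonal_order` to `SARIMAX` to refit the selected model.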