# Content #
## [1. Introduction](#1) ##
### [1.1. Time series](#1-1) ###
### [1.2. Stationarity vs Seasonality](#1-2) ###
## [2. Time Series Stationarity](#2) ##
### [2.1. Augmented Dickey-Fuller test](#2-1) ###
## [3. Time Series Trend](#3) ##
### [3.1. Dealing with trends](#3-1) ###
#### [3.1.1. Estimating trend](#3-1-1) ####
#### [3.1.2. Eliminating trend](#3-1-2) ####
## [4. Trend and seasonality](#4) ##
### [4.1. Decomposing](#4-1) ###
## [5. Forecasting Time Series](#5) ##
### [5.1. ARIMA Models](#5-1) ###
### [5.2. SARIMAX Models](#5-2) ###
## [6. Conclusions](#6) ##
## [7. References](#7) ##

 <a id="1"></a> <br>
# 1. Introduction #
 <a id="1-1"></a> <br>
   ## 1.1. Time series ##
A time series is a chronological sequence of data. It has four components, listed below:
   * Level: The average value of the series.
   * Trend: The long-term direction of the series; it shows whether the series goes up, goes down or remains constant.
   * Seasonality: Shows whether any pattern repeats across seasons (periods).
   * Noise: White noise introduced during sampling, or random factors such as politics, natural disasters, strikes, etc.

It's important to note that the trend and seasonality components are optional, since a series can be stationary.
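
As a rough illustration of the four components, the sketch below builds a synthetic additive series with NumPy. All of the numbers (level, slope, period of 12, noise scale) are assumptions chosen for the example and have nothing to do with the DJIA data used later.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is reproducible
t = np.arange(120)                      # 120 hypothetical time steps

level = 10.0                                       # average value of the series
trend = 0.05 * t                                   # slow upward drift
seasonality = 2.0 * np.sin(2 * np.pi * t / 12)     # pattern repeating every 12 steps
noise = rng.normal(scale=0.5, size=t.size)         # random disturbance

series = level + trend + seasonality + noise       # additive combination of all four
stationary_series = level + noise                  # same level and noise, no trend or seasonality
```

Plotting `series` next to `stationary_series` shows how the optional components change the shape of the data.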
   <a id="1-2"></a> <br>
   ## 1.2. Stationarity vs Seasonality ##
As mentioned, stationarity is a property of a time series whose statistical behaviour stays constant over time, that is, a series with no trend and no seasonality, while seasonality refers to periodic fluctuations. To analyze a time series and avoid spurious regressions, it's necessary to remove the trend and seasonality from our data.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # Plotting

# Read data and make Date column index
djia = pd.read_csv("../input/DJIA_table.csv", parse_dates=['Date'], index_col='Date')
print(djia.head())

In [None]:
ts = djia['Open']
ts = ts.head(100)
plt.plot(ts)

<a id="2"></a> <br>
# 2. Time Series Stationarity #

This section presents a method for testing stationarity. The following code is taken from [here](https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/).
<a id="2-1"></a> <br>
## 2.1. Augmented Dickey-Fuller test ##
This test starts from the null hypothesis that a unit root exists in an autoregressive model of the series; the alternative is that it doesn't. The importance of this test lies in avoiding spurious regressions, so that the model fits the data as well as possible.
The next graph offers a second, visual way to evaluate whether a series has a unit root: it shows how the rolling mean and standard deviation vary over time. A mean or standard deviation that drifts over time suggests that the series is not stationary.

The Augmented Dickey-Fuller test says that if the test statistic is below a critical value (equivalently, the p-value is below the corresponding significance level), we can reject the null hypothesis, meaning the series is likely stationary. Otherwise, we can't reject the null hypothesis, meaning the series may have a unit root.
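
To make the decision rule concrete, here is a minimal NumPy sketch of the plain (non-augmented) Dickey-Fuller regression on two synthetic series. It is a simplification, not what `adfuller` does internally: it uses no lagged difference terms, the helper name `dickey_fuller_stat` is made up for this example, and the critical value of -2.86 is the approximate large-sample 5% value for a regression with a constant.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_{t-1} + error."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # left-hand side: first differences
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)     # ordinary least squares
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # covariance of the estimates
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
white_noise = rng.normal(size=300)               # stationary by construction
random_walk = np.cumsum(rng.normal(size=300))    # has a unit root by construction

CRITICAL_5PCT = -2.86   # assumed approximate 5% critical value (constant, no trend)

for name, y in [("white noise", white_noise), ("random walk", random_walk)]:
    stat = dickey_fuller_stat(y)
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: statistic={stat:.2f} -> {verdict}")
```

The more negative the statistic, the stronger the evidence against a unit root; the real `adfuller` call additionally reports a p-value and the 1%, 5% and 10% critical values.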

In [None]:
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):
    # Rolling statistics over a 12-observation window
    rollmean = timeseries.rolling(window=12).mean()
    rollstd = timeseries.rolling(window=12).std()

    # Plot the series together with its rolling statistics
    plt.plot(timeseries, color="blue", label="Original")
    plt.plot(rollmean, color="red", label="Rolling Mean")
    plt.plot(rollstd, color="black", label="Rolling Std")
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # Perform Dickey-Fuller test:
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    # Append the critical values (1%, 5%, 10%) to the summary
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

In [None]:
test_stationarity(ts)

<a id="3"></a> <br>
# 3. Time Series Trend #
The trend indicates whether the series goes up or down, so it is a component that directly affects the series. As a component, it carries important information, such as the direction of the series. In this case, we want to make predictions, which requires a stationary series, so the trend has to be removed.
<a id="3-1"></a> <br>
## 3.1. Dealing with trends ##
This section covers several methods to estimate the trend and to remove it: linear regression, moving average and decomposition. They are evaluated and compared to find the best result.
<a id="3-1-1"></a> <br>
### 3.1.1. Estimating trend ###
Estimating the trend is required to approximate the behaviour of a time series. For forecasting, however, the trend has to be removed, since a series with a trend is non-stationary.

Next, we're going to compare methods for trend estimation and decide which one gives better results. Note that a log transformation is applied first for simplicity.

In [None]:
import matplotlib.gridspec as gridspec

ts_log = np.log(ts)
fig = plt.figure(constrained_layout = True)
gs_1 = gridspec.GridSpec(2, 3, figure = fig)
ax_1 = fig.add_subplot(gs_1[0, :])
ax_1.plot(ts_log)
ax_1.set_xlabel('time')
ax_1.set_ylabel('data')
plt.title('Logged time series')

ax_2 = fig.add_subplot(gs_1[1, :])
ax_2.plot(ts)
# Label the second axes (the original relabelled ax_1 by mistake)
ax_2.set_xlabel('time')
ax_2.set_ylabel('data')
plt.title('Original time series')

#### Linear regression ####
It gives us a linear trend estimate. The Python implementation requires the sklearn library. To fit a linear regression, separate the X and Y axes and add a second dimension to each, since sklearn expects 2-D arrays.

In [None]:
from sklearn import linear_model

# Move the Date index into a column so the row number can serve as X
ts_wi = ts_log.reset_index()
# sklearn expects 2-D arrays, so add a second dimension to both axes
train_y = ts_wi['Open'].values[:, np.newaxis]
train_x = ts_wi.index.values[:, np.newaxis]
regr = linear_model.LinearRegression()
regr.fit(train_x, train_y)
pred = regr.predict(train_x)
# Fitted line against the logged series
plt.plot(ts_wi.Date, pred)
plt.plot(ts_log)
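
The same idea can be sketched without sklearn. The example below fits a straight line with `np.polyfit` on a made-up series (a line plus a bounded wiggle, an assumption for illustration only) and subtracts it to obtain the detrended values.

```python
import numpy as np

t = np.arange(50, dtype=float)
y = 3.0 + 0.2 * t + np.sin(t)        # hypothetical series: line plus a bounded wiggle

slope, intercept = np.polyfit(t, y, deg=1)   # least-squares straight line
trend = intercept + slope * t                # estimated linear trend
detrended = y - trend                        # what remains after removing it

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Because the fit includes an intercept, the detrended values average out to zero; whether they are also stationary still has to be checked with the test above.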

#### Moving Average ####
The moving average also works for trend estimation. It smooths the series and keeps only the representative data.

The period was selected after repeated observations, concluding that, for this case, 12 gives better results.

In [None]:
mov_average = ts_log.rolling(12).mean()
plt.plot(mov_average)
plt.plot(ts_log)

<a id="3-1-2"></a> <br>
### 3.1.2. Eliminating trend ###
Removing the trend is an important part of time series forecasting, since time series are time-dependent and stationarity is necessary for regression to work.

This task requires a method to estimate the trend, as done before, and subtracting this component from the time series as a first step towards stationarity.

Using the moving average, we get the following results.

According to the Dickey-Fuller test, the test statistic is below the 5% critical value, so we can be 95% confident that our data is stationary.

In [None]:
ts_log_mov_av_diff = ts_log - mov_average
#ts_log_mov_av_diff.head(12)
ts_log_mov_av_diff.dropna(inplace=True)

test_stationarity(ts_log_mov_av_diff)
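
The `dropna` call in the cell above is needed because a trailing window of 12 has no value for the first 11 observations. The small sketch below shows this on a hypothetical, perfectly linear series, where subtracting the moving average leaves a constant offset.

```python
import pandas as pd

s = pd.Series([2.0 * i for i in range(24)])   # hypothetical series with slope 2
ma = s.rolling(window=12).mean()              # trailing mean of the last 12 points
detrended = (s - ma).dropna()                 # the leading undefined values are dropped

print(ma.isna().sum())        # number of undefined leading values
print(detrended.unique())     # what is left after removing a purely linear trend
```

On real data the remainder is not constant, but the same mechanics apply: the trailing average lags behind the series, and the difference between the two is what gets tested for stationarity.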

On the other hand, with linear regression the test statistic is above the 10% critical value, so we can't reject the null hypothesis, meaning that the results can be improved.

In [None]:
ts_log_mov_reg_diff = ts_log - pred[:,0]
#ts_log_mov_av_diff.head(12)
ts_log_mov_reg_diff.dropna(inplace=True)

test_stationarity(ts_log_mov_reg_diff)

Finally, there is another method: differencing, in which we subtract the previous observation from the current one. It is also possible to increase the order of differencing by differencing the first difference again, and so on for higher orders. Whether to increase the order of differencing depends on the following rules (you can see the full set of rules on this [page](http://people.duke.edu/~rnau/arimrule.htm)):
* If the lag-1 autocorrelation in the ACF is zero or negative, or the autocorrelations are all small and patternless, then the series doesn't need a higher order of differencing.
* The optimal order of differencing is often the one at which the standard deviation is lowest.
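
The second rule can be tried out on synthetic data. The sketch below, which assumes a random walk with drift rather than the DJIA series, compares the standard deviation after zero, one and two rounds of differencing.

```python
import numpy as np

rng = np.random.default_rng(1)
y = np.cumsum(0.1 + rng.normal(size=500))   # hypothetical random walk with drift

# Standard deviation of the series after d rounds of differencing.
stds = {d: np.diff(y, n=d).std() for d in range(3)}
best_d = min(stds, key=stds.get)

for d, s in stds.items():
    print(f"d={d}: std={s:.3f}")
print("order with the lowest standard deviation:", best_d)
```

Differencing once removes the wandering level, while differencing a second time only adds noise, which is why the standard deviation rises again at the higher order.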

In [None]:
from statsmodels.graphics import tsaplots as tsa
ts_log_diff = ts_log - ts_log.shift(1)
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = tsa.plot_acf(ts_log_diff.iloc[13:], lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = tsa.plot_pacf(ts_log_diff.iloc[13:], lags=40, ax=ax2)

Once we decide which order of differencing best fits our data, we perform the stationarity test to validate it. As the results show, the test statistic is below the 1% critical value, so we're 99% confident that the time series is stationary.

In [None]:
ts_log_diff.dropna(inplace=True)
test_stationarity(ts_log_diff)

<a id="4"></a> <br>
# 4. Trend and seasonality #

Seasonality is removed in the same way as the trend. Decomposing is a technique used to remove the trend and seasonal components from a time series.

For practicality, the time series is not differenced here, so the plots show all the components.

<a id="4-1"></a> <br>
## 4.1. Decomposing ##
As mentioned, seasonal decomposition is the fastest way to remove the trend and seasonality components from a time series to make it stationary.

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose
# Recent statsmodels versions take the seasonal period as `period` (formerly `freq`)
decomposition = seasonal_decompose(ts_log, period=4, model='additive')

trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

plt.subplot(411)
plt.plot(ts_log, label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal,label='Seasonality')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()
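
For intuition about what an additive decomposition computes, here is a rough hand-rolled analogue in NumPy on a noise-free synthetic series. It is a sketch of the classical approach (centered moving average for the trend, per-position averages for the seasonal effect), not the statsmodels implementation, and the seasonal `pattern` and the period of 4 are assumptions for the example.

```python
import numpy as np

period = 4
pattern = np.array([1.0, -1.0, 2.0, -2.0])      # hypothetical seasonal effect, sums to zero
t = np.arange(40)
y = 5.0 + 0.5 * t + pattern[t % period]         # linear trend plus seasonality, no noise

# Centered moving average for an even period: half weight on the two end points.
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
trend = np.full(y.shape, np.nan)
half = period // 2
trend[half:-half] = np.convolve(y, weights, mode="valid")

detrended = y - trend
# Average the detrended values at each position within the period.
seasonal_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
seasonal_means -= seasonal_means.mean()         # force the effects to sum to zero
seasonal = seasonal_means[t % period]
residual = y - trend - seasonal                 # undefined (NaN) at the two edges
```

With no noise in the input, the estimated seasonal effects match `pattern` and the interior residual is zero; on real data the residual is the part that gets tested for stationarity.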

The results of the test show that the residual of the series is stationary: the test statistic is below the 1% critical value, meaning we are 99% confident that the series is stationary.

In [None]:
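# Note: the decomposition residual is left commented out;
# the differenced log series is the one tested and modelled from here on.
#ts_decompose = residual
ts_decompose = ts_log_diff
ts_decompose.dropna(inplace=True)
test_stationarity(ts_decompose)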

<a id="5"></a> <br>
# 5. Forecasting Time Series #

Once the series is stationary, forecasting can be done. The basic ARIMA model has only a non-seasonal part, but it can be extended to handle seasonal data. Selecting the best model involves analyzing the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Also, please note the following rules. The complete version of these rules can be consulted [here](http://people.duke.edu/~rnau/arimrule.htm).
* If PACF shows *"a sharp cutoff and/or the lag-1 autocorrelation is positive, (...) then consider adding one or more AR terms to the model"*.
* If ACF shows *"a sharp cutoff and/or the lag-1 autocorrelation is negative, (...) then consider adding an MA term to the model."*
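
As a sketch of what a "sharp cutoff" in the ACF looks like, the example below computes the sample autocorrelation of a simulated MA(1) series. The helper `sample_acf`, the coefficient 0.8 and the series length are assumptions for illustration; the band `1.96 / sqrt(n)` is the usual approximate 95% limit for a lag with no correlation.

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = y @ y
    return np.array([1.0 if k == 0 else (y[k:] @ y[:-k]) / denom
                     for k in range(nlags + 1)])

rng = np.random.default_rng(7)
e = rng.normal(size=5001)
y = e[1:] - 0.8 * e[:-1]          # hypothetical MA(1) series: y_t = e_t - 0.8 * e_{t-1}

acf = sample_acf(y, nlags=10)
band = 1.96 / np.sqrt(len(y))     # approximate 95% band for "no correlation"

for k, r in enumerate(acf):
    flag = "significant" if k > 0 and abs(r) > band else ""
    print(f"lag {k}: {r:+.3f} {flag}")
```

The lag-1 autocorrelation is clearly negative and the later lags stay close to zero, which is the pattern the second rule associates with adding an MA term.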

In [None]:
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = tsa.plot_acf(ts_decompose, lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = tsa.plot_pacf(ts_decompose, lags=40, ax=ax2)

<a id="5-1"></a> <br>
## 5.1. ARIMA Models ##

ARIMA models can be separated into nonseasonal and seasonal. The nonseasonal ones are classified as an $ARIMA(p,d,q)$ model, where
* $p$ is the number of autoregressive terms,
* $d$ is the number of nonseasonal differences needed for stationarity,
* $q$ is the number of lagged forecast errors in the prediction equation.

In general, the forecast equation can be written as $\hat{y}_t=\mu+\phi_1 y_{t-1}+\cdots+\phi_p y_{t-p}-\theta_1 e_{t-1}-\cdots-\theta_q e_{t-q}$. The moving average parameters are represented by $\theta$, $\phi$ denotes the autoregressive coefficients and $\mu$ represents a constant value.
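
For the simplest case, $p = 1$ and $q = 0$, the equation reduces to $\hat{y}_t=\mu+\phi_1 y_{t-1}$. The sketch below simulates such a series with made-up parameters and recovers $\mu$ and $\phi_1$ by ordinary least squares. This is only an illustration of the equation; statsmodels estimates ARIMA models differently (by maximum likelihood by default).

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu_true, phi_true = 2000, 1.0, 0.6       # hypothetical parameters
y = np.zeros(n)
for t in range(1, n):
    y[t] = mu_true + phi_true * y[t - 1] + rng.normal()

# Regress y_t on a constant and y_{t-1} to estimate mu and phi_1.
X = np.column_stack([np.ones(n - 1), y[:-1]])
(mu_hat, phi_hat), *_ = np.linalg.lstsq(X, y[1:], rcond=None)

next_forecast = mu_hat + phi_hat * y[-1]    # one-step-ahead forecast from the equation
print(f"mu={mu_hat:.3f}, phi_1={phi_hat:.3f}, forecast={next_forecast:.3f}")
```

Adding MA terms would bring the lagged forecast errors $e_{t-1}, \dots, e_{t-q}$ into the equation, which can no longer be fitted with a single linear regression.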

Selecting the correct number of AR and MA terms for an ARIMA model can be done in two different ways. One is to try many combinations with software until one fits the time series well; the other is to look at the ACF and PACF to get an approximation. Note that more sophisticated alternatives exist.

Here, the AR and MA terms are selected based on the previous ACF and PACF plots. The number of AR terms can be determined if lag *k* is more significant than the higher-order lags, that is, the PACF cuts off at lag *k*; there would then be *k* AR terms. The same applies to MA terms and the ACF: if lag *k* is more significant than the higher-order lags, there would be *k* MA terms.

After many tests, the best results from the ARIMA model were obtained with $p = 1$, $d = 0$ and $q = 0$.

In [None]:
# statsmodels.tsa.arima_model was removed in recent statsmodels releases;
# the current ARIMA class lives in statsmodels.tsa.arima.model
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Raw time series
fig = plt.figure(constrained_layout=True)
gs = gridspec.GridSpec(2, 1, figure=fig)
ax = fig.add_subplot(gs[0, :])
ax.plot(ts_decompose)
ax.set_xlabel('time [months]')
ax.set_ylabel('data')
ax.set_title('Logged data')

# Fit an ARIMA(1, 0, 0) model to the stationary series
model = ARIMA(ts_decompose, order=(1, 0, 0))
res = model.fit()  # the current fit() has no `disp` argument

# In-sample fitted values
ax2 = fig.add_subplot(gs[1, :])
ax2.plot(res.fittedvalues)
ax2.set_xlabel('time [months]')
ax2.set_ylabel('data')
ax2.set_title('ARIMA model')

print('ARIMA RMSE: %.6f'% np.sqrt(sum((res.fittedvalues-ts_decompose)**2)/len(ts)))

<a id="5-2"></a> <br>
## 5.2. SARIMAX Models ##
The SARIMAX model is an extended version of ARIMA that considers a seasonal component. It is composed of the $p$, $d$ and $q$ parameters, as in the ARIMA model, but it adds a seasonal order $P$, $D$, $Q$ and an $X$ component for exogenous (explanatory) variables. It may be used when the residual is expected to exhibit a seasonal trend or pattern.

In [None]:
mod = SARIMAX(ts_decompose, trend='n', order=(1,1,0), seasonal_order=(3,0,3,4))
resSARIMAX = mod.fit()
pred = resSARIMAX.predict()

# Raw time serie
fig = plt.figure(constrained_layout=True) 
gs = gridspec.GridSpec(2, 1, figure=fig)
ax = fig.add_subplot(gs[0, :])
ax.plot(ts_decompose)
ax.set_xlabel('time [months]')
ax.set_ylabel('data')
ax.set_title('Logged data')

ax2 = fig.add_subplot(gs[1, :])
ax2.plot(pred)
ax2.set_xlabel('time [months]')
ax2.set_ylabel('data')
ax2.set_title('SARIMAX model')
print('SARIMAX RMSE: %.6f'% np.sqrt(sum((pred-ts_decompose)**2)/len(ts)))

<a id="6"></a> <br>
# 6. Conclusions #
As noted, the ARIMA model has an advantage over SARIMAX: it is easier to implement and its best-fitting parameters are easier to find, but it can't deal with seasonal data as well as SARIMAX. While SARIMAX fits better, it's more complicated to find the correct parameters. There are, however, many algorithms that help reach the best results, such as Kalman filters, the likelihood of an ARMA process and the Hyndman-Khandakar algorithm (used by R to perform the automatic ARIMA function).

The graph below compares the ARIMA and SARIMAX models. SARIMAX gives a better approximation of the series, leaving aside the RMSE of 215.1053.

In [None]:
predictions_SARIMA_diff = pd.Series(pred, copy=True)
predictions_SARIMA_diff_cumsum = predictions_SARIMA_diff.cumsum()
predictions_SARIMA_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
predictions_SARIMA_log = predictions_SARIMA_log.add(predictions_SARIMA_diff_cumsum,fill_value=0)
predictions_SARIMA = np.exp(predictions_SARIMA_log)

predictions_ARIMA_diff = pd.Series(res.fittedvalues, copy=True)
predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()
predictions_ARIMA_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum,fill_value=0)
predictions_ARIMA = np.exp(predictions_ARIMA_log)
plt.plot(predictions_ARIMA)
print('ARIMA RMSE: %.4f'% np.sqrt(sum((predictions_ARIMA-ts)**2)/len(ts)))

plt.plot(ts)
plt.plot(predictions_SARIMA)
print('SARIMA RMSE: %.4f'% np.sqrt(sum((predictions_SARIMA-ts)**2)/len(ts)))
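
The cell above maps the model output back to the original scale by undoing the differencing (cumulative sum) and then the log (exponential). The round trip can be checked on a tiny example; the four prices below are made up for illustration.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 121.0, 108.9])   # hypothetical prices
log_prices = np.log(prices)
log_diff = log_prices.diff().dropna()               # the kind of series the models are fitted on

# Undo the differencing with a cumulative sum, then undo the log with exp.
restored_log = pd.Series(log_prices.iloc[0], index=prices.index)
restored_log = restored_log.add(log_diff.cumsum(), fill_value=0)
restored = np.exp(restored_log)

rmse = np.sqrt(np.mean((restored - prices) ** 2))   # error on the original scale
print(restored.round(4).tolist(), rmse)
```

When the exact differences are fed back in, the original values are recovered and the error is negligible. With model predictions in place of `log_diff`, the cumulative sum also accumulates the prediction errors, which is one reason the RMSE on the original scale can be large.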

<a id="7"></a> <br>
# 7. References #
* Jain, A. (2016) A comprehensive beginner’s guide to create a Time Series Forecast (with Codes in Python). Consulted from: https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/

* Hyndman, R. and Athanasopoulos, G. (2018) Forecasting: Principles and Practice. Consulted from: https://otexts.org/fpp2/seasonal-arima.html#fig:euretail3

* Abu, S. (2016) Seasonal ARIMA with Python. Consulted from: http://www.seanabu.com/2016/03/22/time-series-seasonal-ARIMA-model-in-python/

* Nau, R. (2018) Statistical forecasting: notes on regression and time series analysis. Consulted from: https://people.duke.edu/~rnau/arimrule.htm