# Moving Averages - EWMA, Seasonal Decomposition and Holt Winters

Welcome to the first notebook. We start with explanations and implementations of some introductory, but still very useful, tools for univariate and multivariate time series analysis. We will explore moving averages and exponentially weighted moving averages (EWMA), touch on seasonal decomposition of a time series using the naive decomposition module of statsmodels, decompose a series into trend and cycle components using the Hodrick-Prescott (HP) filter (useful for visualising short-term fluctuations), and finally discuss the Holt-Winters triple exponential smoothing model, a powerful model in its own right for smoother time series.

This discussion will come in handy in the second notebook, which outlines general autoregressive linear models, i.e., ARIMA and GARCH.

## Data import and Imputations

In [1]:
import datetime
import time
import itertools
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from math import sqrt
from matplotlib import rcParams

import statsmodels.api as sm
from statsmodels.datasets import macrodata
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose, STL
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.filters.hp_filter import hpfilter
import statsmodels.tsa.stattools as stools

from sklearn.metrics import mean_squared_error as MSE, r2_score, mean_absolute_percentage_error as MAPE

In [10]:
df = pd.read_csv('data/AC_data.csv')
df = df.rename(columns={'0': 'Datetime'})

# Parse the timestamp column so it can later serve as a DatetimeIndex
df['Datetime'] = pd.to_datetime(df['Datetime'])
df

,Datetime,AC 1,AC 2,AC 3,AC 4,AC 5,AC 6,AC 7,AC 8,AC 9,AC 10,AC 11,AC 12,AC 13,AC 14,AC 15,AC 16,AC 17,AC 18
0,2019-08-01 00:00:00,7.518632,8.788315,0.000000,0.000000,2.617045,4.079041,2.782276,4.624447,5.222060,2.151238,1.585072,0.560373,3.142941,2.749470,5.417774,4.113460,3.305072,6.735981
1,2019-08-01 00:01:00,,,,,,,,,,,,,,,,,,
2,2019-08-01 00:02:00,7.426114,8.940615,0.000000,0.000000,2.581625,3.781231,2.529366,5.057423,5.349465,2.414715,2.168184,1.818730,3.085110,2.720484,3.302422,3.986483,3.220588,6.379500
3,2019-08-01 00:03:00,,,,,,,,,,,,,,,,,,
4,2019-08-01 00:04:00,7.052986,9.161103,0.000000,0.000000,2.592095,3.800127,2.332304,6.322521,3.995392,2.237114,3.345624,2.310409,3.132799,2.676861,3.539026,3.797881,3.131560,6.363475
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87835,2019-09-30 23:55:00,0.000000,6.122385,2.192198,2.083315,1.046250,0.000000,3.668421,3.006311,3.614301,1.860847,5.019769,3.154221,3.648026,2.439526,4.062719,1.854352,3.027539,7.259989
87836,2019-09-30 23:56:00,,,,,,,,,,,,,,,,,,
87837,2019-09-30 23:57:00,1.209176,6.152532,2.211421,0.000000,1.448103,0.000000,3.151248,2.871690,3.417942,2.185493,5.174168,2.772349,3.786657,2.060023,4.057525,1.870886,2.991946,7.134647
87838,2019-09-30 23:58:00,,,,,,,,,,,,,,,,,,


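The readings arrive at 2-minute intervals, with an empty (all-NaN) row in between, as the output above shows. We therefore resample to 2-minute bins and sum within each bin. We chose this over dropna because dropna led to the removal of some data.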

In [12]:
df.set_index('Datetime', drop=True, inplace=True)
df = df.resample('2Min').sum()
df

Datetime,AC 1,AC 2,AC 3,AC 4,AC 5,AC 6,AC 7,AC 8,AC 9,AC 10,AC 11,AC 12,AC 13,AC 14,AC 15,AC 16,AC 17,AC 18
2019-08-01 00:00:00,7.518632,8.788315,0.000000,0.000000,2.617045,4.079041,2.782276,4.624447,5.222060,2.151238,1.585072,0.560373,3.142941,2.749470,5.417774,4.113460,3.305072,6.735981
2019-08-01 00:02:00,7.426114,8.940615,0.000000,0.000000,2.581625,3.781231,2.529366,5.057423,5.349465,2.414715,2.168184,1.818730,3.085110,2.720484,3.302422,3.986483,3.220588,6.379500
2019-08-01 00:04:00,7.052986,9.161103,0.000000,0.000000,2.592095,3.800127,2.332304,6.322521,3.995392,2.237114,3.345624,2.310409,3.132799,2.676861,3.539026,3.797881,3.131560,6.363475
2019-08-01 00:06:00,6.665446,9.065626,0.000000,0.000000,2.575639,3.772891,2.596200,5.805132,3.553778,1.878356,2.737645,2.510972,3.090007,2.666604,4.607439,3.062610,3.063953,6.127366
2019-08-01 00:08:00,6.674838,9.096130,0.000000,0.000000,2.021472,3.155697,2.845417,6.315060,3.042244,1.914650,2.452500,2.171791,3.097202,2.710572,5.332696,2.640150,2.864680,5.662474
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2019-09-30 23:50:00,0.000000,6.089533,2.322677,1.422932,0.000000,0.000000,3.729901,3.747301,3.578350,1.714157,5.146438,3.253012,2.464426,2.829701,3.693877,1.992432,3.029462,7.202612
2019-09-30 23:52:00,0.000000,6.046765,2.195754,1.861175,0.813454,0.000000,3.666080,3.017628,3.600397,1.547953,4.379771,3.144666,3.045807,2.693068,4.090679,1.944854,3.048858,7.348333
2019-09-30 23:54:00,0.000000,6.122385,2.192198,2.083315,1.046250,0.000000,3.668421,3.006311,3.614301,1.860847,5.019769,3.154221,3.648026,2.439526,4.062719,1.854352,3.027539,7.259989
2019-09-30 23:56:00,1.209176,6.152532,2.211421,0.000000,1.448103,0.000000,3.151248,2.871690,3.417942,2.185493,5.174168,2.772349,3.786657,2.060023,4.057525,1.870886,2.991946,7.134647
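
The resampling step above can be illustrated on a toy series. This is a minimal sketch, not the notebook's data: the frame `toy` and its values are made up for illustration. It shows that the default `sum()` skips NaN within a bin, but also that a bin containing only NaN becomes 0.0, which may hide genuine gaps; passing `min_count=1` keeps such bins as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one AC column: a row every minute, some of them empty.
idx = pd.date_range("2019-08-01 00:00", periods=6, freq="min")
toy = pd.DataFrame({"AC 1": [7.5, np.nan, 7.4, np.nan, np.nan, np.nan]}, index=idx)

# Default sum: NaN is skipped inside each 2-minute bin,
# and a bin holding only NaN is reported as 0.0.
summed = toy.resample("2min").sum()

# min_count=1 requires at least one valid value per bin,
# so an all-NaN bin stays NaN and the gap remains visible.
summed_strict = toy.resample("2min").sum(min_count=1)

# dropna() instead removes the empty rows and leaves an irregular index.
dropped = toy.dropna()

print(summed)
print(summed_strict)
print(dropped)
```

If the zeros produced for empty bins matter for the later analysis, the `min_count=1` variant followed by an explicit imputation step may be the safer choice.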


In [14]:
null_data = df.isnull().sum()
print(null_data)

AC 1     0
AC 2     0
AC 3     0
AC 4     0
AC 5     0
AC 6     0
AC 7     0
AC 8     0
AC 9     0
AC 10    0
AC 11    0
AC 12    0
AC 13    0
AC 14    0
AC 15    0
AC 16    0
AC 17    0
AC 18    0
dtype: int64


In [15]:
df.describe()

,AC 1,AC 2,AC 3,AC 4,AC 5,AC 6,AC 7,AC 8,AC 9,AC 10,AC 11,AC 12,AC 13,AC 14,AC 15,AC 16,AC 17,AC 18
count,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0,43920.0
mean,2.367883,3.850482,2.556306,2.383379,1.415096,2.506499,2.964527,2.536176,3.165096,1.530288,2.392445,2.686915,4.260306,2.902244,4.027397,2.786662,4.199191,6.433688
std,2.27726,2.553006,1.49453,1.800963,0.771625,1.660726,1.486085,1.511464,1.537768,0.692674,1.560585,1.444706,1.865855,1.466128,2.031103,1.614024,1.15976,2.354335
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,2.093627,1.692456,1.238621,0.964796,1.515302,2.17641,1.748428,2.280023,1.110545,1.575432,1.936731,3.243123,1.839066,3.012571,1.865833,3.728744,4.000272
50%,2.081219,3.779912,2.480197,2.300987,1.376681,2.656643,2.99617,2.601814,3.072282,1.50046,2.342875,2.719753,4.208208,2.77875,3.991747,2.665496,4.607949,7.120929
75%,3.653354,5.451999,3.434028,3.470321,1.867081,3.612421,3.890272,3.489748,4.088733,1.960576,3.248267,3.561838,5.373939,3.818686,5.252323,3.688627,4.957075,8.308771
max,11.194789,11.147141,8.430909,9.8795,5.495358,9.929291,9.356776,9.331506,10.026747,4.62503,10.363151,9.887306,11.670892,9.054969,10.349998,10.65443,10.077851,11.979021


## Stationarity

In [None]:

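def kpss_test(timeseries):
    """Return "Yes" if KPSS does not reject level stationarity at the 5% level, else "No"."""
    # KPSS null hypothesis: the series is stationary around a constant (regression="c").
    kpsstest = stools.kpss(timeseries, regression="c", nlags="auto")
    kpss_output = pd.Series(
        kpsstest[0:3], index=["Test Statistic", "p-value", "Lags Used"]
    )
    for key, value in kpsstest[3].items():
        kpss_output["Critical Value (%s)" % key] = value

    # A p-value at or above 0.05 means the null of stationarity is not rejected.
    return "Yes" if kpss_output["p-value"] >= 0.05 else "No"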

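def adf_test(timeseries):
    """Return "Yes" if ADF rejects a unit root at the 5% level, else "No"."""
    # ADF null hypothesis: the series has a unit root (is non-stationary).
    dftest = stools.adfuller(timeseries, autolag="AIC")
    dfoutput = pd.Series(
        dftest[0:4],
        index=[
            "Test Statistic",
            "p-value",
            "#Lags Used",
            "Number of Observations Used",
        ],
    )
    for key, value in dftest[4].items():
        dfoutput["Critical Value (%s)" % key] = value

    # A p-value below 0.05 rejects the unit root, so the series is treated as stationary.
    return "No" if dfoutput["p-value"] >= 0.05 else "Yes"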
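
The two helpers above each answer "is the series stationary?" with "Yes" or "No", but ADF and KPSS test opposite null hypotheses, so their verdicts can disagree. The sketch below shows one common way to read the two answers together, following the four-case convention described in the statsmodels documentation. The function name `interpret_stationarity` is hypothetical and not part of the original notebook.

```python
def interpret_stationarity(adf_result, kpss_result):
    """Combine the "Yes"/"No" verdicts of adf_test and kpss_test (hypothetical helper)."""
    verdicts = {
        # Both tests agree.
        ("Yes", "Yes"): "stationary",
        ("No", "No"): "non-stationary",
        # KPSS indicates stationarity, ADF does not: stationary around a trend.
        ("No", "Yes"): "trend-stationary (consider detrending)",
        # ADF indicates stationarity, KPSS does not: stationary after differencing.
        ("Yes", "No"): "difference-stationary (consider differencing)",
    }
    return verdicts[(adf_result, kpss_result)]


# Example with hard-coded verdicts, so the snippet runs without any data.
print(interpret_stationarity("Yes", "No"))
```

On the real data this could be called per column, for example as `interpret_stationarity(adf_test(df['AC 1']), kpss_test(df['AC 1']))`.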