# ARIMA and SARIMA models

**1) Import the packages and libraries**

In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np, pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
import seaborn as sn
from statsmodels.tsa.stattools import acf

**2) Import the data**

The data covers the period from 26/01/2021 to 31/01/2021.

Nomenclature:

- `df25`, `df34`, `df64`: electrical consumption of ap.25, ap.34 and ap.64
- `df25w`, `df34w`, `df64w`: domestic hot water (DHW) consumption of the same ap.

In each `pd.read_csv` call, insert the location of the file on your computer.

In [None]:
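# Replace each placeholder with the location of the file on your computer.
# Assumption: the first column of each file holds the timestamps and a 'Value' column the readings,
# so index_col=0 + parse_dates=True give the DatetimeIndex needed for date slicing and resampling.
start, end = '2021-01-26 00:00:00', '2021-01-31 00:00:00'

#Energy consumption
df25 = pd.read_csv('PATH_TO_AP25_ENERGY_FILE', index_col=0, parse_dates=True).loc[start:end]
df34 = pd.read_csv('PATH_TO_AP34_ENERGY_FILE', index_col=0, parse_dates=True).loc[start:end]
df64 = pd.read_csv('PATH_TO_AP64_ENERGY_FILE', index_col=0, parse_dates=True).loc[start:end]

#DHW
df25w = pd.read_csv('PATH_TO_AP25_DHW_FILE', index_col=0, parse_dates=True).loc[start:end]
df34w = pd.read_csv('PATH_TO_AP34_DHW_FILE', index_col=0, parse_dates=True).loc[start:end]
df64w = pd.read_csv('PATH_TO_AP64_DHW_FILE', index_col=0, parse_dates=True).loc[start:end]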
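The date slicing in the cell above and the later `.Value` and `.resample` calls only work if each file is read with its timestamps as a `DatetimeIndex`. A minimal sketch, assuming a hypothetical CSV layout with a timestamp column (called `Time` here, an assumed name) and a `Value` column; the readings are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical file content: a timestamp column ('Time' is an assumed name) and 'Value'
csv_text = """Time,Value
2021-01-25 23:50:00,120.0
2021-01-26 00:00:00,130.0
2021-01-28 12:00:00,150.0
2021-01-31 00:00:00,110.0
2021-01-31 00:10:00,115.0
"""

# index_col + parse_dates turn the timestamp column into a DatetimeIndex
df = pd.read_csv(io.StringIO(csv_text), index_col='Time', parse_dates=True)

# Label-based slicing on a DatetimeIndex keeps both end points
df = df.loc['2021-01-26 00:00:00':'2021-01-31 00:00:00']
print(df)
```

Because the end point is inclusive, a reading stamped exactly `2021-01-31 00:00:00` is kept and adds one extra hour after hourly resampling.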

**3) Resample the data to a 1-hour time step**



In [None]:
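#Energy consumption: hourly mean of each series
# ('h' is the hourly alias preferred by recent pandas versions; older code uses 'H')
s25 = df25.Value.resample('h').mean()
s34 = df34.Value.resample('h').mean()
s64 = df64.Value.resample('h').mean()

#DHW: hourly mean of each series
s25w = df25w.Value.resample('h').mean()
s34w = df34w.Value.resample('h').mean()
s64w = df64w.Value.resample('h').mean()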
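Hourly resampling with `.mean()` groups the readings into 1-hour bins, each labelled by the start of its hour, and averages the readings inside each bin; hours without any reading become `NaN`. A small sketch with synthetic 30-minute readings (made-up values) shows the effect:

```python
import pandas as pd

# Synthetic readings every 30 minutes (values are made up for illustration)
idx = pd.date_range('2021-01-26 00:00', periods=6, freq='30min')
readings = pd.Series([100.0, 200.0, 300.0, 500.0, 50.0, 150.0], index=idx)

# Hourly mean: each hour is labelled by its start and averages the readings inside it
hourly = readings.resample('h').mean()
print(hourly)
```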

**4) Correlation analysis**

In [None]:
#Energy
matriz_corre = {'ap.25' : s25,
                'ap.34' : s34,
                'ap.64' : s64}
df_matriz = pd.DataFrame(matriz_corre)
df1_matriz = df_matriz.corr()
sn.heatmap(df1_matriz, annot=True, vmax=1, vmin=-1, center=0, cmap='vlag')
plt.show()

#DHW
matriz_corre_w = {'ap.25' : s25w,
                'ap.34' : s34w,
                'ap.64' : s64w}
df_matriz_w = pd.DataFrame(matriz_corre_w)
df1_matriz_w = df_matriz_w.corr()
sn.heatmap(df1_matriz_w, annot=True, vmax=1, vmin=-1, center=0, cmap='vlag')
plt.show()
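`DataFrame.corr()` computes the Pearson correlation coefficient by default, $r = \frac{\sum (x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum (x_i-\bar x)^2 \sum (y_i-\bar y)^2}}$, using the hours where both series have values. The sketch below, on two made-up series, checks that the formula and pandas agree:

```python
import numpy as np
import pandas as pd

# Two made-up hourly series (illustration only)
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 1.0, 4.0, 3.0, 6.0])

# Pearson correlation written out from its definition
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Same coefficient from pandas (method='pearson' is the default)
r_pandas = pd.DataFrame({'x': x, 'y': y}).corr().loc['x', 'y']
print(round(r_manual, 4), round(r_pandas, 4))
```

Values close to 1 in the heatmaps mean that two ap. tend to rise and fall together hour by hour; values near 0 mean no linear relationship.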

**5) Visualize the data of the 3 ap.**

In [None]:
#Energy
plt.figure(figsize=(15,6))
plt.plot(s25, label='ap.25', lw=3 ) 
plt.plot(s34, label='ap.34', lw=3 )
plt.plot(s64, label='ap.64', lw=3 )
plt.xlabel('Date')
plt.ylabel('Power (W)')
plt.title('Energy consumption', fontsize=15)
plt.legend(loc='upper left', fontsize=20)
plt.show()

#DHW
plt.figure(figsize=(15,6))
plt.plot(s25w, label='ap.25', lw=3 ) 
plt.plot(s34w, label='ap.34', lw=3 )
plt.plot(s64w, label='ap.64', lw=3 )
plt.xlabel('Date')
plt.ylabel('Flow rate (l/s)')
plt.title('DHW consumption', fontsize=15)
plt.legend(loc='upper left', fontsize=20)
plt.show()

**6) Select the ap. (energy or DHW consumption)**

As an example, the energy consumption of ap.25 is analyzed below.



In [None]:
s = s25   # e.g. use s25w for the DHW consumption of ap.25

**7) Check whether the data is stationary using the augmented Dickey-Fuller test**

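The null hypothesis of the augmented Dickey-Fuller (ADF) test is that the series has a unit root, i.e. is not stationary. If the p-value is below 0.05, the null hypothesis is rejected and the data is usually considered stationary.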


In [None]:
from statsmodels.tsa.stattools import adfuller

# Augmented Dickey-Fuller test on the series (missing hours removed)
result = adfuller(s.dropna())
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
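For reference, with its default settings (`regression='c'`, lag length chosen with `autolag='AIC'`), `adfuller` estimates a regression of the following form and tests whether $\gamma = 0$:

$$\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t$$

$$H_0:\ \gamma = 0 \ \text{(unit root, not stationary)} \qquad H_1:\ \gamma < 0 \ \text{(stationary)}$$

where $\Delta y_t = y_t - y_{t-1}$. The ADF statistic printed above is the t-statistic of $\hat\gamma$; it follows the Dickey-Fuller distribution rather than the usual t distribution, which is why the p-value reported by `adfuller` should be used instead of a standard t table.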

**8) Split the data into training and test sets**

An 80:20 ratio is used:

- 80% training
- 20% testing

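In this case there are 120 hourly values, so the first 96 are used for training and the last 24 for testing. The split is chronological (no shuffling), so the model is always tested on hours that come after the training period.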

In [None]:
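n_train = int(len(s) * 0.8)   # 80% of the values: 96 of 120
train = s.iloc[:n_train]      # first 80% (chronological order)
test = s.iloc[n_train:]       # last 20%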
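The chronological split can be wrapped in a small function that adapts to the length of the series (for example if the chosen period yields 121 hourly values instead of 120). A minimal sketch on a synthetic series; the helper name `chronological_split` is hypothetical:

```python
import numpy as np
import pandas as pd

def chronological_split(series, train_ratio=0.8):
    """Split a time series into train/test sets without shuffling (hypothetical helper)."""
    n_train = int(len(series) * train_ratio)   # 120 * 0.8 -> 96
    return series.iloc[:n_train], series.iloc[n_train:]

# Synthetic hourly series covering five days (120 made-up values)
idx = pd.date_range('2021-01-26 00:00', periods=120, freq='h')
s_demo = pd.Series(np.arange(120, dtype=float), index=idx)

train_demo, test_demo = chronological_split(s_demo)
print(len(train_demo), len(test_demo))   # 96 24
print(train_demo.index[-1], test_demo.index[0])
```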

**9) Create the model using AUTO ARIMA**

AUTO ARIMA (`pmdarima.auto_arima`) automatically selects the order (p, d, q) of the ARIMA model.

The inputs are the training data (`train`) and `d`, the order of differencing:

- `d=0`: the data is stationary and is used as is
- `d>0`: the data is not stationary and is differenced `d` times

With `d=None`, AUTO ARIMA selects `d` itself (by default with a KPSS unit-root test), but in our experience it usually returns `d=0` even when the data is not stationary. We therefore strongly recommend testing both `d=0` and `d=1` (usually enough).
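Differencing once (`d=1`) replaces each value by its change from the previous hour, $y'_t = y_t - y_{t-1}$. A random walk is the textbook non-stationary series, and a single difference turns it back into its stationary increments. A sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic random walk: cumulative sum of white-noise increments (non-stationary)
increments = rng.normal(loc=0.0, scale=1.0, size=120)
walk = np.cumsum(increments)

# First difference (d=1): y'_t = y_t - y_{t-1}; one value is lost at the start
diff1 = np.diff(walk)

print(np.allclose(diff1, increments[1:]))   # True: differencing recovers the increments
```

On a pandas Series the equivalent is `s.diff().dropna()`; running the ADF test from step 7 on the differenced series is a quick way to check whether `d=1` is enough.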


In [None]:
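! pip install pmdarima
import pmdarima as pm

# Search over ARIMA orders; trace=True prints every candidate model tried.
# d=None lets auto_arima choose d; also run it with d=0 and with d=1.
model_auto = pm.auto_arima(train, d=None, trace=True)
print(model_auto.summary())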
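With its default settings, `auto_arima` runs a stepwise search over candidate (p, d, q) orders and keeps the one with the lowest Akaike information criterion (the `AIC` value printed for each candidate when `trace=True`):

$$\mathrm{AIC} = 2k - 2\ln\hat{L}$$

where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood. Lower is better: the $2k$ term penalizes extra AR or MA terms that do not improve the fit enough. AIC comparisons between models with different orders of differencing are generally considered unreliable, which is another reason to compare the `d=0` and `d=1` models on their forecast errors in step 10.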

The model summary reports the most suitable ARIMA order (p, d, q).

In `model_auto1 = ARIMA(train, order=(0, 1, 0))`, replace (0, 1, 0) with the (p, d, q) order chosen by AUTO ARIMA.

In [None]:
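from statsmodels.tsa.arima.model import ARIMA

# Replace (0, 1, 0) with the (p, d, q) order selected by AUTO ARIMA
model_auto1 = ARIMA(train, order=(0, 1, 0))
fitted_auto1 = model_auto1.fit()

# Point forecast for every hour of the test set (24 values here)
fc_auto = fitted_auto1.forecast(steps=len(test))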


`fc_auto` is the forecast; in this case it contains 24 values, one for each hour of the test set (the 20% used for testing).

Plot of the training data, test data and forecast (ARIMA model)

In [None]:
plt.figure(figsize=(12,5), dpi=100)
plt.plot(train, label='training') 
plt.plot(test, label='test') 
plt.plot(fc_auto, label='forecast ARIMA') 
plt.title('Forecast vs training and testing')
plt.xlabel('Date')
plt.ylabel('Power (W)')
plt.legend(loc='upper left', fontsize=8)
plt.show()
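With the placeholder order `(0, 1, 0)`, the forecast in this plot is a flat line. That order is a random walk: statsmodels' `ARIMA` includes no constant by default when the model is integrated (`d > 0`), so every point forecast equals the last training value. A numpy sketch of that forecast; the helper `random_walk_forecast` is hypothetical, not a statsmodels function:

```python
import numpy as np

def random_walk_forecast(train_values, steps):
    """Point forecast of an ARIMA(0, 1, 0) model without constant: repeat the last value."""
    last = np.asarray(train_values, dtype=float)[-1]
    return np.full(steps, last)

train_demo = [120.0, 135.0, 128.0, 140.0]   # made-up training values
fc_demo = random_walk_forecast(train_demo, steps=24)
print(fc_demo[:3])
```

A flat forecast has zero variance, so the correlation used in step 10 is undefined for it (`np.corrcoef` returns `nan` with a runtime warning).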

**10) Model evaluation**

Three evaluation measures are used:

- MAPE (mean absolute percentage error)
- MAE (mean absolute error)
- Correlation between the forecast and the test data

In [None]:
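def forecast_accuracy_auto(forecast, actual):
    """Return MAPE (as a fraction), MAE and Pearson correlation of a forecast."""
    forecast, actual = np.asarray(forecast, dtype=float), np.asarray(actual, dtype=float)
    mape = np.mean(np.abs(forecast - actual) / np.abs(actual))
    mae = np.mean(np.abs(forecast - actual))
    corr = np.corrcoef(forecast, actual)[0, 1]
    return {'mape': mape, 'mae': mae, 'corr': corr}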

forecast_accuracy_auto (fc_auto, test)
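MAPE divides by the observed value, so it is undefined (infinite) for hours where the actual consumption is 0, which can happen in DHW series. A variant of the function above that computes MAPE only over the non-zero hours is sketched below; this masking is a swapped-in choice, not part of the original method, and the name `forecast_accuracy_nonzero` is hypothetical:

```python
import numpy as np

def forecast_accuracy_nonzero(forecast, actual):
    """MAPE (fraction, over non-zero actual values only), MAE and Pearson correlation."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    error = forecast - actual
    nonzero = actual != 0                     # MAPE is undefined where the actual value is 0
    mape = np.mean(np.abs(error[nonzero]) / np.abs(actual[nonzero]))
    mae = np.mean(np.abs(error))              # MAE still uses every hour
    corr = np.corrcoef(forecast, actual)[0, 1]
    return {'mape': mape, 'mae': mae, 'corr': corr}

result_demo = forecast_accuracy_nonzero([2.0, 4.0, 6.0], [1.0, 5.0, 0.0])   # made-up values
print(result_demo)
```

Multiply `mape` by 100 to read it as a percentage.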

Repeat the modelling and evaluation with `d=0` and with `d=1`, and compare the two models.

The model is selected based on three criteria:

- Analysis of the errors

- Correlation

- Visual analysis of the graphs
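Collecting the metrics of each candidate in one table makes the `d=0` / `d=1` comparison easier to read. A sketch in which made-up forecasts stand in for the two fitted models (labels and values are placeholders, not results); lower MAPE and MAE and a correlation closer to 1 favour a candidate, while the plots remain the third criterion:

```python
import numpy as np
import pandas as pd

def metrics(forecast, actual):
    """MAPE (fraction), MAE and Pearson correlation, as in step 10."""
    forecast, actual = np.asarray(forecast, dtype=float), np.asarray(actual, dtype=float)
    return {
        'mape': np.mean(np.abs(forecast - actual) / np.abs(actual)),
        'mae': np.mean(np.abs(forecast - actual)),
        'corr': np.corrcoef(forecast, actual)[0, 1],
    }

actual = [100.0, 120.0, 110.0, 130.0]           # made-up test values
candidates = {                                  # placeholder forecasts of the two models
    'ARIMA d=0': [105.0, 115.0, 112.0, 125.0],
    'ARIMA d=1': [90.0, 130.0, 100.0, 140.0],
}

# One row per candidate, sorted so the lowest MAE comes first
table = pd.DataFrame({name: metrics(fc, actual) for name, fc in candidates.items()}).T
table = table.sort_values('mae')
print(table)
```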