<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Visualizations-to-choose-our-model:-ACF-(AR)-&amp;-PACF-(MA)" data-toc-modified-id="Visualizations-to-choose-our-model:-ACF-(AR)-&amp;-PACF-(MA)-0.1"><span class="toc-item-num">0.1&nbsp;&nbsp;</span>Visualizations to choose our model: ACF (AR) &amp; PACF (MA)</a></span></li></ul></li><li><span><a href="#Predictive-models" data-toc-modified-id="Predictive-models-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Predictive models</a></span></li><li><span><a href="#FORECASTING:-temperatures" data-toc-modified-id="FORECASTING:-temperatures-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>FORECASTING: temperatures</a></span><ul class="toc-item"><li><span><a href="#Reading-the-data" data-toc-modified-id="Reading-the-data-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Reading the data</a></span></li><li><span><a href="#Seasonality" data-toc-modified-id="Seasonality-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Seasonality</a></span></li><li><span><a href="#ARIMA-model" data-toc-modified-id="ARIMA-model-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>ARIMA model</a></span></li><li><span><a href="#Into-the-future:-forecasting-🔮" data-toc-modified-id="Into-the-future:-forecasting-🔮-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Into the future: forecasting 🔮</a></span></li></ul></li><li><span><a href="#Further-Resources" data-toc-modified-id="Further-Resources-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Further Resources</a></span></li></ul></div>

In [None]:
import pandas as pd
import numpy as np

import datetime
from dateutil.relativedelta import relativedelta

# Viz mantra
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
sns.set_context('poster')
sns.set(rc={'figure.figsize': (16., 9.)})
sns.set_style('whitegrid')

import plotly.express as px
import plotly.graph_objects as go

# Statsmodel
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.api import SimpleExpSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose


# Scikit learn
from sklearn.linear_model import LinearRegression #python3 -m pip install scikit-learn

import warnings
warnings.filterwarnings('ignore')

In [None]:
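def import_and_clean ():
    """Load the a10 (diabetes drug sales) dataset with a datetime index."""
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/a10.csv")
    df.index = pd.to_datetime(df.date)
    # Attach the inferred frequency to the index instead of discarding it
    df.index.freq = pd.infer_freq(df.index)
    return df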

In [None]:
df = import_and_clean ()
df

### Visualizations to choose our model: ACF (AR) & PACF (MA)

[ACF & PACF](https://towardsdatascience.com/interpreting-acf-and-pacf-plots-for-time-series-forecasting-af0d6db4061c)

`plot_acf(df.Series)`

- What is the X axis?
- What is the Y axis?
- What is the first line?
- What does the line at x=12 correspond to?
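
To make the two axes concrete, here is a minimal sketch that computes the autocorrelation by hand with NumPy. The `autocorr` helper and the synthetic series that repeats every 12 steps are assumptions for illustration, not the dataset loaded above, and `plot_acf` may differ in small details.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x with itself shifted by `lag` steps."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    if lag == 0:
        return 1.0  # a series is perfectly correlated with itself
    return float(np.sum(centered[lag:] * centered[:-lag]) / np.sum(centered ** 2))

# Synthetic "monthly" series: a pattern that repeats every 12 steps
t = np.arange(120)
series = np.sin(2 * np.pi * t / 12)

# X axis of the ACF plot = lag, Y axis = correlation at that lag
acf_values = [autocorr(series, lag) for lag in range(25)]
for lag in (0, 6, 12):
    print(f"lag {lag:2d}: {acf_values[lag]:+.2f}")
```

The first bar (lag 0) is always 1. In this sketch lag 12 is strongly positive because the pattern repeats after 12 steps, and lag 6 is strongly negative because the series is then half a cycle out of phase.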

In [None]:
# x = number of lags, months of lag; 
# y = how much it is correlated

In [None]:
def plotting_acf (df):
    plot_acf(df.value);
    plt.title("Autocorrelation of Diabetes drugs: value of correlation across different # of lags", size=20)
    plt.axvline(x=12, c="y", linestyle="--", label="one year")
    plt.axvline(x=16, c="r", linestyle="--")
    plt.legend();

In [None]:
plotting_acf (df)

In [None]:
from statsmodels.tsa.stattools import pacf

`plot_pacf(df.Series)`

- What is the X axis?
- What is the Y axis?
- What is the first line?
- What does the line at x=12 correspond to?
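
One way to read the partial autocorrelation at lag k is as the coefficient on the k-th lag when the series is regressed on its first k lags, so the effect of the shorter lags is already accounted for. The sketch below assumes that regression view; `partial_autocorr` is a hypothetical helper, and `plot_pacf` uses a different estimator by default, so its numbers will differ slightly.

```python
import numpy as np

def partial_autocorr(x, lag):
    """Lag-`lag` partial autocorrelation, estimated by least squares.

    Regress x_t on x_{t-1}, ..., x_{t-lag} (plus an intercept) and return
    the coefficient on x_{t-lag}.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column j holds the series shifted back by j+1 steps
    lagged = np.column_stack([x[lag - j - 1 : n - j - 1] for j in range(lag)])
    design = np.column_stack([np.ones(n - lag), lagged])
    coefs, *_ = np.linalg.lstsq(design, x[lag:], rcond=None)
    return float(coefs[-1])

# Simulated AR(1) series: only the previous value matters directly
rng = np.random.default_rng(0)
phi = 0.7
noise = rng.normal(size=2000)
ar1 = np.zeros(2000)
for i in range(1, 2000):
    ar1[i] = phi * ar1[i - 1] + noise[i]

pacf_values = [partial_autocorr(ar1, lag) for lag in (1, 2, 3)]
print(np.round(pacf_values, 2))
```

For this simulated series the lag-1 value should land near `phi`, while lags 2 and 3 should be close to zero: once the previous value is known, older values add little.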

In [None]:
def plotting_partial (df):
    plot_pacf(df.value)
    plt.title("Partial autocorrelation of Diabetes drugs: value of correlation across different # of lags", size=20)
    # Label only one of the two marker lines so the legend shows a single entry
    plt.axvline(x=2, c="r", linestyle="--", label="lags before entering shaded area")
    plt.axvline(x=4, c="r", linestyle="--")
    plt.legend();

In [None]:
plotting_partial (df)

In [None]:
# Stationary
    # MA 
    # AR

# ARMA


# Non-stationary
    # AR I MA

## Predictive models

"All models are wrong, but some are useful" - George E. P. Box, statistician

![GeorgeEPBox_%28cropped%29.jpg](attachment:GeorgeEPBox_%28cropped%29.jpg)

- **Scenario 1**
    - ACF: gradual decrease
    - PACF: sharp drop
    
    Model: AR (dependent on previous values)
    
- **Scenario 2**
    - ACF: sharp drop
    - PACF: gradual decrease
    
    Model: MA (dependent on errors)

- **Scenario 3**
    - ACF: gradual decrease
    - PACF: gradual decrease
    
    Model: ARMA (combination)
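
The first two scenarios can be checked on simulated data. The sketch below assumes an AR(1) and an MA(1) process, both with coefficient 0.7 (arbitrary illustrative values), and compares their sample ACFs; `sample_acf` is a hypothetical helper, not a statsmodels function.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    denom = np.sum(c ** 2)
    return np.array([1.0] + [np.sum(c[k:] * c[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(42)
n = 5000
eps = rng.normal(size=n)

# AR(1): each value depends on the previous value
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.7 * ar[t - 1] + eps[t]

# MA(1): each value depends on the previous error
ma = eps.copy()
ma[1:] += 0.7 * eps[:-1]

ar_acf = sample_acf(ar, 4)
ma_acf = sample_acf(ma, 4)
print("AR(1) ACF:", np.round(ar_acf, 2))
print("MA(1) ACF:", np.round(ma_acf, 2))
```

The AR(1) autocorrelations should shrink gradually from lag to lag (Scenario 1), while the MA(1) autocorrelations should be clearly non-zero at lag 1 and then drop to roughly zero (Scenario 2).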


![image.png](attachment:image.png)

-  **AR Model (p)**: AR stands for AutoRegressive. It is the simplest of these models. An autoregressive model predicts the current value of the time series from its own past values. For example, it suits a series where "if we know today's prices, we can make an approximate prediction of tomorrow's prices", which is exactly what autocorrelation measures. The order `p` is the number of past periods the model uses. AR models do not work well if the data is not stationary. <br>

- **MA Model (q)**: Moving average models predict the current value of the time series from past residuals (forecast errors). A moving average model of order one considers only the residual of the previous period.<br>

- **ARMA Models (p, q)**: The combination of the AR and MA models. An ARMA model has two orders `(p,q)`, where `p` is the order of the **autoregressive** part and `q` is the order of the **moving average** part.<br>

- **ARIMA Model (p, d, q)**: An ARMA model applied to a differenced version of the time series, where `d` is the order of integration. That is, an ARIMA model of order (p,d,q) differences the original series d times and then fits an ARMA(p,q) model to the differenced series. The objective of the differencing is to obtain a stationary series, since ARMA models perform worse on non-stationary series.<br>

- **SARMA and SARIMA Models (P, D, Q)**: Remember that seasonality occurs when certain patterns appear periodically, for example, something that repeats every year. In that case a simple AR model will not describe the data well. Why? Because it only considers the most recent periods to estimate the current one, not the value from the same season one cycle earlier. Seasonal models add a second set of orders `(P, D, Q)` that act at multiples of the seasonal period.<br>

- **MAX models**: MAX models use exogenous information to explain the endogenous variable (the series we are studying). There are MAX versions of all the models: ARMAX, ARIMAX, SARMAX and SARIMAX.

- Facebook Prophet: https://facebook.github.io/prophet/docs/quick_start.html

- `(p, d, q)`
    - `p`: The number of **lag observations** included in the model: how far back I go for my new observation
    - `d`: The **degree of differencing**: how many times the series is differenced to make it stationary (mean does not change), for example when the data has a trend
            `diff()` for d=1, `diff().diff()` for d=2, `diff().diff().diff()` for d=3
    - `q`: The size of the **moving average window**: how many past forecast errors the model uses

[diff](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html)
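
As a small illustration of what the degree of differencing does, the sketch below applies `diff()` to toy series (made-up numbers, not the datasets in this notebook). The last line uses the `periods` argument of `diff` for a seasonal difference, which is the kind of differencing the seasonal `D` order refers to.

```python
import numpy as np
import pandas as pd

t = np.arange(10)
linear = pd.Series(3 * t + 5)     # straight-line trend
quadratic = pd.Series(t ** 2)     # curved trend

# d=1: one diff() turns a linear trend into a constant
once = linear.diff().dropna()
# d=2: a quadratic trend needs two rounds of differencing
twice = quadratic.diff().diff().dropna()

# Seasonal differencing: subtract the value one full cycle (4 steps) earlier
seasonal = pd.Series(np.tile([1, 4, 2, 8], 3))
deseasoned = seasonal.diff(4).dropna()

print(once.unique(), twice.unique(), deseasoned.unique())
```

Each differenced series collapses to a single constant value, which is what "the mean does not change" looks like in the simplest case.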

`(AR, I, MA)`

| Params | p | d | q |
|---|---|---|---|
| Acronym | AR | I | MA |
| Meaning | lags | differencing | errors |
| Visualization | PACF | - | ACF |

![image.png](attachment:image.png)

`Y = (Auto-Regressive Parameters) + (Moving Average Parameters)`
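
The expression above can be sketched for the smallest case, an ARMA(1,1), as a recursion over a sequence of errors. The function name `arma11` and the coefficient values are assumptions chosen for illustration.

```python
import numpy as np

def arma11(errors, phi, theta):
    """Build y_t = phi * y_{t-1} + theta * e_{t-1} + e_t from a sequence of errors."""
    errors = np.asarray(errors, dtype=float)
    y = np.zeros_like(errors)
    for t in range(len(errors)):
        ar_part = phi * y[t - 1] if t > 0 else 0.0          # depends on the previous value
        ma_part = theta * errors[t - 1] if t > 0 else 0.0   # depends on the previous error
        y[t] = ar_part + ma_part + errors[t]
    return y

# A single shock at t=0 shows how each part carries it forward
response = arma11([1.0, 0.0, 0.0, 0.0], phi=0.5, theta=0.4)
print(response)
```

The moving average part only affects the step right after the shock, while the autoregressive part keeps passing a shrinking share of it on to every later step.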

In [None]:
# The base is autoregression & moving average
# The models grow in complexity but they share the same base
# They take parameters: p, d, q
# There are criteria to choose one or the other: trends & seasonality
# Mostly you'll be working with ARMA/ARIMA

## FORECASTING: temperatures

[The docs](https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMA.html)

In [None]:
from statsmodels.tsa.arima.model import ARIMA

from random import random

# 1. Data
# 2. Fit/train:
    # finding the expression that draws that line
       # maximize: the points that it can describe
    
# 3. Predict

### Reading the data

In [None]:
df = pd.read_csv('datasets/weather_data.csv')
df.head(3)

In [None]:
df.index = pd.DatetimeIndex(df["month"])

### Seasonality

Decompose the time series

- Seasons
- General trend
- The rest
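
To show what these three pieces mean, here is a hand-rolled additive decomposition in pandas. It is a rough sketch of the classical moving-average approach on a synthetic monthly series (an assumption for illustration), and `seasonal_decompose` may differ in its details.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend + yearly cycle
index = pd.date_range("2015-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(0.1 * t + np.sin(2 * np.pi * t / 12), index=index)

# General trend: centered 2x12 moving average, which averages out one full year
trend = series.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasons: average of the detrended values for each calendar month
detrended = series - trend
seasonal = detrended.groupby(detrended.index.month).transform("mean")

# The rest: whatever trend and seasons do not explain
residual = series - trend - seasonal
print(residual.dropna().abs().max())
```

The moving average cannot be computed for the first and last six months, so the trend and the residual are missing there. Because this toy series has no noise, the residual is essentially zero everywhere else.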

In [None]:
decomp = sm.tsa.seasonal_decompose(df["temperature"])
decomp.plot();

### ARIMA model

- **AR**: Autoregressive model
- **I**: Integrated: we don't model the values themselves, we model the differences between consecutive values
- **MA**: Moving average model

- `(p, d, q)`
    - `p`: The number of lag observations included in the model
    - `d`: The degree of differencing: how many times we difference the data to make it stationary (mean does not change)
        `diff()` for d=1, `diff().diff()` for d=2, `diff().diff().diff()` for d=3
    - `q`: The size of the moving average window: how many past forecast errors the model uses


- Visualization
   - Autocorrelation `acf` plot
   - Partial autocorrelation: `pacf` plot


`acf`

$$x_t = \delta + \phi_1 x_{t-1} + w_t$$
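
The equation above can be iterated directly. The sketch below uses a hypothetical helper `simulate_ar1` with arbitrary values for δ and φ₁, and switches the noise term off to show where the series settles.

```python
import numpy as np

def simulate_ar1(delta, phi, noise, x0=0.0):
    """Iterate x_t = delta + phi * x_{t-1} + w_t for each w_t in `noise`."""
    x = [x0]
    for w in noise:
        x.append(delta + phi * x[-1] + w)
    return np.array(x)

# With the noise switched off, the series settles at delta / (1 - phi)
path = simulate_ar1(delta=2.0, phi=0.5, noise=np.zeros(50))
print(path[:4], path[-1])
```

With |φ₁| < 1 the level δ / (1 − φ₁) is the long-run mean of the process; adding noise makes the series fluctuate around that level instead of sitting on it.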

In [None]:
plot_acf(df.temperature);
plt.title("Autocorrelation: value of correlation across different # of lags", size=20)
plt.axvline(x=1, c="r", linestyle="--")
plt.axvline(x=3, c="g", linestyle="--")
plt.axvline(x=6, c="g", linestyle="--")

`pacf`: $y_{t,1} = \phi_{1,1}\, y_{t-1} + \epsilon_t$

In [None]:
plot_pacf(df.temperature)
plt.title("Partial autocorrelation: value of correlation across different # of lags", size=20)
plt.axvline(x=2, c="r", linestyle="--");

- **Scenario 1**
    - ACF: gradual decrease
    - PACF: sharp drop
    
    Model: AR (dependent on previous values)
    
- **Scenario 2**
    - ACF: sharp drop
    - PACF: gradual decrease
    
    Model: MA (dependent on errors)

- **Scenario 3**
    - ACF: gradual decrease
    - PACF: gradual decrease
    
    Model: ARMA (combination)

`2. fit the model`

In [None]:
model = ARIMA(df["temperature"], order = (6, 0, 2), freq="MS").fit()

In [None]:
model.mae # mean absolute error

In [None]:
model.mse # mean squared error
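
For reference, both error measures are easy to compute by hand. The numbers below are made up for illustration; how statsmodels treats the first few in-sample residuals may differ from this plain average.

```python
import numpy as np

actual = np.array([10.0, 12.0, 15.0, 11.0])
predicted = np.array([11.0, 12.0, 13.0, 14.0])
errors = actual - predicted          # [-1, 0, 2, -3]

mae = np.mean(np.abs(errors))        # mean absolute error: (1 + 0 + 2 + 3) / 4
mse = np.mean(errors ** 2)           # mean squared error: (1 + 0 + 4 + 9) / 4
print(mae, mse)
```

The squared version punishes the single large miss (3 degrees off) much more than the absolute version does, which is the usual reason for looking at both.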

In [None]:
model.summary()

### Into the future: forecasting 🔮

`3. predict`

In [None]:
df["forecasting_arima"] = model.predict(start=len(df["temperature"])-20, end=len(df["temperature"])-1)

In [None]:
df[["forecasting_arima", "temperature"]].plot();

`creating new date-data points`

In [None]:
start = datetime.datetime.strptime('2019-01-01', '%Y-%m-%d')
# months= (plural) adds x months; month= (singular) would overwrite the month instead
date_list = [start+relativedelta(months=x) for x in range(0,12)]
future = pd.DataFrame(index=date_list, columns=df.columns)
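
An alternative sketch for the same step uses `pd.date_range`, which builds the month-start dates in one call. The single `temperature` column is an assumption for illustration; the cell above reuses `df.columns` instead.

```python
import pandas as pd

# 12 month-start timestamps beginning January 2019
future_index = pd.date_range(start="2019-01-01", periods=12, freq="MS")

# Empty frame to be filled with forecasts later
future = pd.DataFrame(index=future_index, columns=["temperature"])
print(future_index[0], future_index[-1])
```

The resulting index carries the `MS` (month start) frequency, matching the `freq="MS"` passed to `ARIMA` earlier.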

`concatenating them to the original df`

In [None]:
forecast_df = pd.concat([df, future], axis=0)
forecast_df

`plotting the past and the future`

In [None]:
forecast_df["future_prediction"] = model.predict(start=35, end=47)
forecast_df[["future_prediction", "temperature"]].plot();

# RECAP



[Prophet](https://medium.com/mlearning-ai/time-series-forecasting-445e2dde194c)

[Pycaret](https://towardsdatascience.com/time-series-forecasting-with-pycaret-regression-module-237b703a0c63)

## Further Resources
- [Pycaret](https://pycaret.org/)
- [Prophet](https://facebook.github.io/prophet/docs/quick_start.html)
- [Auto ARIMA](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.auto_arima.html)
- [Detailed step by step recognition of model](https://people.duke.edu/~rnau/arimrule.htm)
- [Detailed step by step of time series exploration](https://www.machinelearningplus.com/time-series/time-series-analysis-python/)
- [Modeling with ARIMA](https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/)
