# Time Series Forecasting - Part 1

<center>
<img src="./assets/timser.gif" width="500">
</center>

In this series, we dive deep into the world of __Time Series Forecasting__.

Mastering this domain requires familiarity with the mathematical concepts and theory behind it.

In this notebook, we begin our journey by exploring two of the fundamental concepts in Time Series Analysis -

- __Hodrick-Prescott Filter__
- __ETS Decomposition__

We will be using the following libraries to illustrate some of our code examples -
- [Statsmodels](https://www.statsmodels.org/)
- [Pandas](https://pandas.pydata.org/)
- [Numpy](https://numpy.org/)
- [Plotly](https://plotly.com/)

__Let's get started!__

First, let's install and import the necessary libraries.

In [109]:
%pip install pandas numpy plotly statsmodels --quiet

Note: you may need to restart the kernel to use updated packages.


In [110]:
import pandas as pd
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter
from statsmodels.tsa.seasonal import seasonal_decompose
import plotly.express as px
from plotly.subplots import make_subplots

### Properties of Time Series Data

Time Series Data has some characteristic properties. Let's take a look at some plots and discuss a few important terms.

#### Trends

<center>
<img src="./assets/trends.png" width="500">
</center>

Three types of Trends - 
<ul><li>Upward</li><li>Horizontal/Stationary</li><li>Downward</li></ul>

#### Seasonality

<center>
<img src="./assets/seasonality.jpeg" width="500">
<p>Patterns that repeat at a fixed interval <br> (in this example, every three months)</p>
</center>

#### Cyclical

<center>
<img src="./assets/cyclical.png" width="750">
<p>Fluctuations that recur without a fixed period</p>
</center>

#### Components of Time Series Data

<center>
<img src="./assets/components.png" width="500">
</center>
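
To make these terms concrete, here is a minimal sketch that builds a synthetic monthly series from the four components. All numbers are made up for illustration and are not taken from the images above.

```python
import numpy as np

# Hypothetical monthly series: 10 years of synthetic data
rng = np.random.default_rng(seed=0)
t = np.arange(120)

trend = 100 + 0.5 * t                       # steady upward trend
seasonal = 10 * np.sin(2 * np.pi * t / 12)  # repeats exactly every 12 months
# Smoothed random shocks: slow swings that do not repeat on a fixed schedule
cyclical = 15 * np.convolve(rng.normal(size=t.size), np.ones(18) / 18, mode="same")
noise = rng.normal(scale=2.0, size=t.size)  # irregular component

# Additive combination of the components
series = trend + seasonal + cyclical + noise
```

Plotting `series` next to each component shows how a single observed line can hide several distinct patterns.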

### Hodrick-Prescott Filter

The Hodrick-Prescott Filter separates a time-series $y_t$ into a trend component $\tau_t$ and a cyclical component $c_t$

\begin{align}
y_t = \tau_t + c_t
\end{align}

The trend and cyclical components are determined by minimizing the following quadratic loss function, where $\lambda$ is the _smoothing_ parameter.

\begin{align}
\min_{\tau} \sum_{t=1}^{T} {c_t}^2 + \lambda \sum_{t=3}^{T} [(\tau_t - \tau_{t-1}) - (\tau_{t-1} - \tau_{t-2})]^2
\end{align}

The $\lambda$ value controls how strongly variations in the growth rate of the trend component are penalized: the larger $\lambda$ is, the smoother the resulting trend.
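
Because the loss is quadratic in the trend, setting its gradient to zero gives a linear system, $(I + \lambda D^\top D)\,\tau = y$, where $D$ is the second-difference matrix. The sketch below solves that system with plain NumPy. The helper `hp_filter_dense` is a hypothetical name used only for illustration. It builds a dense matrix, so it is meant for short series and for understanding, not as a replacement for a library routine.

```python
import numpy as np

def hp_filter_dense(y, lamb=1600.0):
    """Return (cycle, trend) by solving the HP first-order condition directly."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference operator: row i computes tau[i] - 2*tau[i+1] + tau[i+2]
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Zero gradient of the loss gives (I + lamb * D'D) tau = y
    trend = np.linalg.solve(np.eye(n) + lamb * D.T @ D, y)
    return y - trend, trend

# A perfectly linear series has zero second differences,
# so the penalty vanishes and the trend is the series itself
line = 3.0 + 0.5 * np.arange(40)
cycle, trend = hp_filter_dense(line)
```

On the straight line, `cycle` is zero up to rounding error. Calling the helper on noisy data with a small and a large `lamb` shows the trend getting smoother as the penalty grows.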

Commonly recommended values of $\lambda$ depend on the frequency of the data -

- __Quarterly Data__ - 1600 (_Default in Statsmodels_)
- __Annual Data__ - 6.25
- __Monthly Data__ - 129,600
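
The three values above are related: taking 1600 as the quarterly baseline, the annual and monthly values follow from scaling by the fourth power of the frequency ratio (the Ravn-Uhlig rule). A small sketch, where `hp_lambda` is a hypothetical helper name and not part of any library:

```python
def hp_lambda(periods_per_year, quarterly_lambda=1600.0):
    """Scale the quarterly smoothing parameter by the fourth power of the frequency ratio."""
    return quarterly_lambda * (periods_per_year / 4) ** 4

annual = hp_lambda(1)     # 1600 * (1/4)**4
quarterly = hp_lambda(4)  # 1600 * 1**4
monthly = hp_lambda(12)   # 1600 * 3**4
```

This reproduces 6.25 for annual data and 129,600 for monthly data. For other frequencies the same formula gives a starting point, to be checked against the plots rather than trusted blindly.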

Let's implement the Hodrick-Prescott Filter using the Statsmodels Library

In [111]:
# Read in data
gdp = pd.read_csv("https://github.com/LOST-STATS/lost-stats.github.io/raw/source/Time_Series/Data/GDPC1.csv")

# Convert date column to be of data type datetime64
gdp['DATE'] = pd.to_datetime(gdp['DATE'])

# Set the DATE as the index
gdp.set_index('DATE', inplace=True)
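
The three steps above can also be folded into the read itself. The sketch below shows the idea on a small inline sample (the first five rows of the series) so that it runs without network access. The variable names `csv_text` and `sample` are illustrative only.

```python
import io
import pandas as pd

# First five rows of the GDPC1 series, inlined as a stand-in for the remote file
csv_text = """DATE,GDPC1
1947-01-01,2033.061
1947-04-01,2027.639
1947-07-01,2023.452
1947-10-01,2055.103
1948-01-01,2086.017
"""

# parse_dates converts the column to datetime64 and index_col makes it the index
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE"], index_col="DATE")

# Declaring the quarterly frequency makes the spacing of observations explicit
sample = sample.asfreq("QS")
```

Setting the frequency is optional here, but it documents the intent and would expose any missing quarters as `NaN` rows.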

In [112]:
gdp.shape

(292, 1)

In [113]:
gdp.head()

DATE,GDPC1
1947-01-01,2033.061
1947-04-01,2027.639
1947-07-01,2023.452
1947-10-01,2055.103
1948-01-01,2086.017


In [114]:
# Plot the GDP Trends
fig = px.line(x=gdp.index, y=gdp['GDPC1'], title='Time Series', labels={'x':'Date', 'y':'GDP'})

# Display the Chart
fig.show()

In [115]:
# hpfilter returns a (cycle, trend) tuple, which we unpack into two Series
# lamb=1600 is the default and the usual choice for quarterly data
gdp_cycle, gdp_trend = hpfilter(gdp['GDPC1'], lamb=1600)
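
The order of the returned tuple is easy to mix up, so a quick sanity check can help. The sketch below uses a synthetic quarterly series (hypothetical data, not the GDP file) to confirm that the two pieces add back to the input and that the trend is much smoother than the raw data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.filters.hp_filter import hpfilter

# Synthetic quarterly series: a linear rise plus random noise
idx = pd.date_range("2000-01-01", periods=80, freq="QS")
rng = np.random.default_rng(seed=1)
y = pd.Series(np.linspace(100, 200, 80) + rng.normal(scale=3.0, size=80), index=idx)

# Cycle comes first, trend second
cycle, trend = hpfilter(y, lamb=1600)

# The two components should sum to the original series
reconstruction_error = float(np.abs(y - (cycle + trend)).max())

# Mean absolute second difference as a rough measure of how jagged a series is
roughness_raw = float(y.diff().diff().abs().mean())
roughness_trend = float(trend.diff().diff().abs().mean())
```

If the names were swapped by accident, `roughness_trend` would come out larger than `roughness_raw`, which makes the mistake easy to spot.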

In [116]:
# Plot the Trend Component
fig = px.line(x=gdp_trend.index, y=gdp_trend, title='Trend Component', labels={'x':'Date', 'y':'GDP'})

# Display the Chart
fig.show()

In [117]:
# Plot the Cyclical Component
fig = px.line(x=gdp_cycle.index, y=gdp_cycle, title='Cyclical Component', labels={'x':'Date', 'y':'Deviation'})

# Display the chart
fig.show()

In [118]:
# Overlapping the Trend with the Original Time Series Data
fig = px.line(gdp, title='Time Series Analysis')

# Overlay the trend component on top of the original series
fig.add_scatter(x=gdp_trend.index, y=gdp_trend, name='GDP Trends')

# Display the chart
fig.show()

### ETS Decomposition

<center>
<img src="./assets/ets.png" width="400">
</center>

As its name suggests, ETS (Error, Trend, Seasonality) Decomposition breaks a Time Series down into three separate components:
- __Trend__ - General Growth Pattern
- __Seasonality__ - Repeating Patterns
- __Error__ (__Residual__) - Whatever Trend and Seasonality do not explain, also known as Noise

The Hodrick-Prescott Filter explored above can be seen as a simpler relative of ETS Decomposition: it splits a series into only two parts, a trend and a cycle, with no separate seasonal component.

Two main types of ETS Models -
- __Additive Model__ - Use when the Trend is roughly _linear_ and the size of the Seasonal swings stays _constant_ over time.
- __Multiplicative Model__ - Use when the Trend increases or decreases at a non-linear rate, or when the Seasonal swings grow or shrink with the level of the series.
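
The difference between the two models is easiest to see on synthetic data. In the sketch below, the growth rate and seasonal factors are made-up values chosen for illustration.

```python
import numpy as np

t = np.arange(48)                                    # hypothetical: 12 years of quarterly data
trend = 100 * 1.03 ** t                              # level grows 3% per quarter
seasonal_factor = np.tile([0.9, 1.0, 1.2, 0.9], 12)  # quarterly multipliers

# Multiplicative: the seasonal swing grows along with the level
multiplicative = trend * seasonal_factor

# Additive: the seasonal swing keeps the same absolute size
additive = trend + 10 * (seasonal_factor - 1)

# Taking logs turns the product into a sum, so a multiplicative series
# can be handled with additive tools on the log scale
log_parts = np.log(trend) + np.log(seasonal_factor)
```

Here `np.log(multiplicative)` equals `log_parts`, which is why a log transform is a common alternative to fitting a multiplicative model directly.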

In [119]:
# Plot the GDP Trends
fig = px.line(x=gdp.index, y=gdp['GDPC1'], title='Time Series', labels={'x':'Date', 'y':'GDP'})

# Display the Chart
fig.show()

In [120]:
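# Multiplicative decomposition; the period (4 for quarterly data) is inferred from the DatetimeIndex
result = seasonal_decompose(gdp['GDPC1'], model='multiplicative')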
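
To see what a classical multiplicative decomposition does under the hood, here is a simplified sketch of the textbook procedure using only pandas and NumPy. The function `classical_multiplicative` is a hypothetical helper written for illustration. It assumes an even period and is not a copy of the Statsmodels implementation, which may differ in details.

```python
import numpy as np
import pandas as pd

def classical_multiplicative(y, period=4):
    """Textbook classical decomposition for an even period (simplified sketch)."""
    # Centered moving average: half weight on the two end points, full weight in between
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = y.rolling(window=period + 1, center=True).apply(
        lambda w: np.dot(w, weights), raw=True
    )
    # The ratio of data to trend isolates seasonal and irregular variation
    detrended = y / trend
    # Average the ratios by position in the cycle, then rescale to mean 1
    position = np.arange(len(y)) % period
    indices = detrended.groupby(position).mean()
    indices = indices / indices.mean()
    seasonal = pd.Series(indices.to_numpy()[position], index=y.index)
    # Whatever is left after dividing out trend and seasonality
    resid = y / (trend * seasonal)
    return trend, seasonal, resid

# Synthetic quarterly series with known seasonal factors (made-up values)
idx = pd.date_range("2000-01-01", periods=40, freq="QS")
true_factors = np.tile([0.9, 1.0, 1.2, 0.9], 10)
y = pd.Series((100 + 2.0 * np.arange(40)) * true_factors, index=idx)

trend, seasonal, resid = classical_multiplicative(y, period=4)
```

The recovered `seasonal` values land close to the true factors, and `resid` stays near 1 wherever the moving average is defined. The first and last two trend values are `NaN` because the centered window does not fit at the edges.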

In [121]:
# Create subplots with four rows and one column
fig = make_subplots(rows=4, cols=1, subplot_titles=['Original Time Series', 'Trend Component', 'Seasonal Component', 'Residual Component'])

# Add the original time series subplot
fig.add_trace(px.line(x=result.observed.index, y=result.observed, color_discrete_sequence=['black']).data[0], row=1, col=1)

# Add the trend component subplot
fig.add_trace(px.line(x=result.trend.index, y=result.trend, color_discrete_sequence=['blue']).data[0], row=2, col=1)

# Add the seasonal component subplot
fig.add_trace(px.line(x=result.seasonal.index, y=result.seasonal, color_discrete_sequence=['green']).data[0], row=3, col=1)

# Add the residual component subplot
fig.add_trace(px.line(x=result.resid.index, y=result.resid, color_discrete_sequence=['red']).data[0], row=4, col=1)

# Update layout
fig.update_layout(height=1000, title='ETS Decomposition Components', showlegend=False)

# Show the figure
fig.show()
