# Basic Time Series Forecasting Techniques

[Link to the video](https://www.youtube.com/watch?v=_mBWlAA4Am4&list=PLKmQjl_R9bYd32uHImJxQSFZU5LPuXfQe&index=9)

In [7]:
import sympy as sym
import pandas as pd
import numpy as np
import plotly.express as px
from statsmodels.graphics.tsaplots import plot_pacf
sym.init_printing()
from IPython.display import display, Math
import matplotlib.pyplot as plt
import plotly.graph_objects as go

## Intro

Forecasting is a wide domain with numerous applications in almost every industry. As a result, the range of forecasting models is also very large, and each model has its own pros and cons.

In this article, I want to go over some basic and simple forecasting models. Despite their simplicity, these models can offer good results in practice and provide a good basis to iterate from.

## Average Forecast

The first model we will consider is the average forecast. This model simply assumes that all future values are equal to the mean of all the previous observations:

![forecasting](../images/forecasting1.png)

Where $h$ is the number of time steps ahead we are forecasting, $T$ is the length of the time series, $y_t$ is the observed value at time $t$, and $\bar{y}$ is the mean of the observed values. For this model we must have some past data available to compute the forecast.
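As a quick sanity check of the definition, here is a minimal sketch on a toy series. The helper name `average_forecast` and the toy values are made up for illustration and are not part of the airline example.

```python
import numpy as np

def average_forecast(history, horizon):
    """Return `horizon` forecasts, each equal to the mean of `history`.

    Hypothetical helper for illustration only.
    """
    history = np.asarray(history, dtype=float)
    if history.size == 0:
        raise ValueError("need at least one past observation")
    # Every future step gets the same value: the mean of all past observations
    return np.full(horizon, history.mean())

# Toy series (made-up values, not the airline data)
toy = [10, 12, 14, 16]
print(average_forecast(toy, horizon=3))  # [13. 13. 13.]
```

Note that the forecast is flat: it does not depend on how far ahead we look.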

In [13]:
# Read in the data
data = pd.read_csv('../data/airline.csv')
data.head()

Unnamed: 0,Month,#Passengers
0,1949-01,112
1,1949-02,118
2,1949-03,132
3,1949-04,129
4,1949-05,121


In [15]:
data['Month'] = pd.to_datetime(data['Month'])

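# Split the data into train (first 80%) and test (last 20%) sets.
# .copy() makes each split an independent DataFrame, so adding columns
# later does not trigger pandas' SettingWithCopyWarning.
n_test = int(len(data) * 0.2)
train = data.iloc[:-n_test].copy()
test = data.iloc[-n_test:].copy()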


In [16]:
def plot_func(forecast, title):
    """Function to plot the forecasts."""
    fig = go.Figure()
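    fig.add_trace(go.Scatter(x=train['Month'], y=train['#Passengers'], name='Train'))
    # Plot the held-out actuals so the forecast can be compared against them
    fig.add_trace(go.Scatter(x=test['Month'], y=test['#Passengers'], name='Test'))
    fig.add_trace(go.Scatter(x=test['Month'], y=forecast, name='Forecast'))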
    fig.update_layout(template="simple_white", font=dict(size=18), title_text=title,
                      width=650, title_x=0.5, height=400, xaxis_title='Date',
                      yaxis_title='Passenger Volume')

    return fig.show()

In [18]:
# Average forecast
test['mean_forecast'] = train['#Passengers'].mean()




In [20]:
test.head()

Unnamed: 0,Month,#Passengers,mean_forecast
116,1958-09-01,404,242.232759
117,1958-10-01,359,242.232759
118,1958-11-01,310,242.232759
119,1958-12-01,337,242.232759
120,1959-01-01,360,242.232759


In [19]:
plot_func(test['mean_forecast'], 'Average Forecast')

## Naive Forecasting

The second model, naive forecasting, sets every future forecast equal to the most recent observed value:

$ y_{T+h} = y_T $

This model is a common benchmark for any forecast and is often used for stock market and financial data because of their erratic nature. The naive model is also known as the random-walk-without-drift model.
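The rule can be sketched on a toy series as follows. The helper name `naive_forecast` and the values are hypothetical and used only for illustration.

```python
import numpy as np

def naive_forecast(history, horizon):
    """Return `horizon` forecasts, each equal to the last value of `history`.

    Hypothetical helper for illustration only.
    """
    history = np.asarray(history, dtype=float)
    if history.size == 0:
        raise ValueError("need at least one past observation")
    # Carry the most recent observation forward unchanged
    return np.full(horizon, history[-1])

# Toy series (made-up values, not the airline data)
toy = [10, 12, 14, 16]
print(naive_forecast(toy, horizon=3))  # [16. 16. 16.]
```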

In [21]:
# Naive forecast
test['naive_forecast'] = train['#Passengers'].iloc[-1]
plot_func(test['naive_forecast'], 'Naive Forecast')



## Seasonal Naive Forecasting

The third method is an extension of the naive method, but this time the forecast is equal to the most recent observed value in the same season. Hence, it is known as the seasonal naive model. For example, the forecast for the next first quarter is equal to the previous year's first-quarter value. This model is useful when we have a clear and large seasonal variation in our time series.

![seasonal_naive](../images/seasonal_naive.png)

Where $m$ is the seasonal period of the data. For monthly data with a yearly seasonality $m=12$, for quarterly data $m=4$, and for weekly data $m=52$.
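To make the role of $m$ concrete, here is a minimal sketch that works on a plain array rather than on dates. It assumes the usual formulation in which the forecast for step $h$ repeats the value observed one full season before it; the helper name `seasonal_naive_forecast` and the toy values are hypothetical.

```python
import numpy as np

def seasonal_naive_forecast(history, horizon, m):
    """Repeat the last observed season of length `m` over the forecast horizon.

    Hypothetical helper for illustration only.
    """
    history = np.asarray(history, dtype=float)
    if history.size < m:
        raise ValueError("need at least one full season of history")
    last_season = history[-m:]
    # Step h (1-based) reuses position (h - 1) mod m of the last observed season
    return np.array([last_season[(h - 1) % m] for h in range(1, horizon + 1)])

# Two "years" of made-up quarterly data, so m = 4
toy = [1, 2, 3, 4, 5, 6, 7, 8]
print(seasonal_naive_forecast(toy, horizon=6, m=4))  # [5. 6. 7. 8. 5. 6.]
```

Once the horizon exceeds one season, the same pattern simply repeats, which is why this model cannot follow a trend.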

In [22]:
# Seasonal naive forecast
train['month_number'] = pd.DatetimeIndex(train['Month']).month
test['month_number'] = pd.DatetimeIndex(test['Month']).month

# For each test month, reuse the most recent training value from the same calendar month
snaive_fc = []
for row_idx, row in test.iterrows():
    month = row['month_number']
    forecast = train['#Passengers'].loc[train['month_number'] == month].iloc[-1]
    snaive_fc.append(forecast)

plot_func(snaive_fc, 'Seasonal Naive Forecast')



As our data has quite an obvious and large seasonal component, the seasonal naive model performs pretty well. However, it hasn't fully captured the trend of the data, as we expect passenger volumes to increase over time.

## Drift Model

The final model we will consider is the drift model. This is also an extension of the naive forecast, where we let the prediction increase or decrease linearly with the time step, $h$, scaled by the average historical trend:

![drift](../images/drift.png)

This is basically just drawing a straight line from the first to the last point and extending it forwards through time. However, this is where the issue lies: the model will always either increase or decrease through time, which is often not the case in real-life scenarios.
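The straight-line interpretation can be checked with a minimal sketch on a toy series. The helper name `drift_forecast` and the values are hypothetical; the slope is taken as the average change per step between the first and last observations, as described above.

```python
import numpy as np

def drift_forecast(history, horizon):
    """Extend the line through the first and last observations of `history`.

    Hypothetical helper for illustration only.
    """
    history = np.asarray(history, dtype=float)
    if history.size < 2:
        raise ValueError("need at least two observations to estimate a slope")
    # Average change per time step over the whole history
    slope = (history[-1] - history[0]) / (history.size - 1)
    # Horizon steps h = 1, 2, ..., horizon
    steps = np.arange(1, horizon + 1)
    return history[-1] + steps * slope

# Toy series (made-up values): slope = (16 - 10) / 3 = 2
toy = [10, 13, 11, 16]
print(drift_forecast(toy, horizon=3))  # [18. 20. 22.]
```

Only the first and last values affect the slope; the observations in between (13 and 11 here) have no influence on the forecast.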

In [23]:
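# Drift forecast
# Average change per time step in the training data (slope of the line from first to last point)
constant = (train['#Passengers'].iloc[-1] - train['#Passengers'].iloc[0]) / (len(train) - 1)
# Forecast horizon h = 1, 2, ..., so the first forecast is one step past the last observation
test['h'] = range(1, len(test) + 1)
test['drift_forecast'] = train['#Passengers'].iloc[-1] + test['h'] * constant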

plot_func(test['drift_forecast'], 'Drift Forecast')



