# Tutorial 03 tasks

For this week's tasks, please reuse the AirPassengers dataset with the same train/test split ratio used in Tutorial 03.

## 1. Implement the simple naive forecasting method in Section 3 using the `shift()` method of the `Series` class.

Let's begin our script by importing the necessary libraries:

In [1]:
# Import some external libraries
import pandas as pd                # For data manipulation
import matplotlib.pyplot as plt    # For visualization
import numpy as np                 # For linear algebra

Then read the data file into a `DataFrame` object and use the `Month` column as the index of the `DataFrame`:

In [2]:
data = pd.read_csv('AirPassengers.csv')       # Read data from a csv file
data['Month'] = pd.to_datetime(data['Month']) # Convert the data in the Month column to datetime
data.set_index('Month', inplace=True)         # Indexing the DataFrame by the Month column

# Extract Passengers column
ts = data['Passengers']

Implement the train/test split step for the AirPassengers data. We use the observations from 1949 to 1959 as the in-sample data and those from 1960 as the out-of-sample data.

In [3]:
# Extract data for in-sample period (1949-1959)
ts_in = ts['1949':'1959']
T_in = len(ts_in) 

# Extract data for out-of-sample period (1960)
ts_out = ts['1960']
T_out = len(ts_out)

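Implement the simple naive forecasting method using the `shift()` method of the `Series` object. Shifting the series by one period places each observation at the following month, so the shifted value at time $t$ is the simple naive forecast for time $t$.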

In [5]:
ts.shift(1)

Month
1949-01-01      NaN
1949-02-01    112.0
1949-03-01    118.0
1949-04-01    132.0
1949-05-01    129.0
              ...  
1960-08-01    622.0
1960-09-01    606.0
1960-10-01    508.0
1960-11-01    461.0
1960-12-01    390.0
Name: Passengers, Length: 144, dtype: float64
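To make the effect of `shift()` easier to see, here is a minimal sketch on a small made-up series. The name `toy` and its values are hypothetical and are not part of the AirPassengers data; the sketch only illustrates how `shift(1)` lines up each observation with the next period.

```python
import pandas as pd

# Hypothetical toy series with a monthly index (not the AirPassengers data)
toy = pd.Series(
    [10, 12, 15, 11],
    index=pd.date_range("2000-01-01", periods=4, freq="MS"),
    name="y",
)

# shift(1) moves every value one period forward, so the entry at time t
# holds the observation from time t-1, i.e. the simple naive forecast for t
toy_naive = toy.shift(1)

# The first entry has no predecessor and becomes NaN,
# which is also why the shifted series has a float dtype
print(pd.DataFrame({"y": toy, "naive": toy_naive}))
```

The index is left unchanged by `shift()`, which is what allows the shifted series to be placed next to the true values in the same `DataFrame`.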

In [7]:
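# Define forecast horizon (one-step ahead)
h = 1

# One-step-ahead simple naive forecasts for the out-of-sample period:
# the forecast for month t is the observation from month t-h
ts_one_step_naive1 = ts.shift(h).iloc[-T_out:]

# It is useful to put true and forecast out-of-sample data in the same DataFrame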
ts_forecast_df = pd.DataFrame(ts_out)
ts_forecast_df['Pred_one_step_naive1'] = ts_one_step_naive1
ts_forecast_df

| Month | Passengers | Pred_one_step_naive1 |
|---|---|---|
| 1960-01-01 | 417 | 405.0 |
| 1960-02-01 | 391 | 417.0 |
| 1960-03-01 | 419 | 391.0 |
| 1960-04-01 | 461 | 419.0 |
| 1960-05-01 | 472 | 461.0 |
| 1960-06-01 | 535 | 472.0 |
| 1960-07-01 | 622 | 535.0 |
| 1960-08-01 | 606 | 622.0 |
| 1960-09-01 | 508 | 606.0 |
| 1960-10-01 | 461 | 508.0 |


## 2. Implement the seasonal naive forecasting method in Section 4 without using the for loop (multiple solutions).

There are many possible ways to answer the question. Here is an example:

In [8]:
# Define the seasonal period
M = 12

# Compute one-step-ahead forecast 
ts_one_step_naive2 = ts.shift(M)[-T_out:]

# It is useful to put true and forecast out-of-sample data in the same DataFrame
ts_forecast_df['Pred_one_step_naive2'] = ts_one_step_naive2
ts_forecast_df

| Month | Passengers | Pred_one_step_naive1 | Pred_one_step_naive2 |
|---|---|---|---|
| 1960-01-01 | 417 | 405.0 | 360.0 |
| 1960-02-01 | 391 | 417.0 | 342.0 |
| 1960-03-01 | 419 | 391.0 | 406.0 |
| 1960-04-01 | 461 | 419.0 | 396.0 |
| 1960-05-01 | 472 | 461.0 | 420.0 |
| 1960-06-01 | 535 | 472.0 | 472.0 |
| 1960-07-01 | 622 | 535.0 | 548.0 |
| 1960-08-01 | 606 | 622.0 | 559.0 |
| 1960-09-01 | 508 | 606.0 | 463.0 |
| 1960-10-01 | 461 | 508.0 | 407.0 |
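Since the task allows multiple solutions, the sketch below shows one possible loop-free alternative next to the `shift(M)` approach: slicing the underlying NumPy array one season back. It runs on a hypothetical 36-month series (`toy`) rather than the AirPassengers data, and the names `pred_shift` and `pred_slice` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

M = 12  # seasonal period (monthly data)

# Hypothetical monthly series of 36 observations, standing in for `ts`
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
toy = pd.Series(np.arange(36, dtype=float), index=idx)

# Hold out the last 12 months as the out-of-sample period
T_out = 12
T_in = len(toy) - T_out
toy_out = toy.iloc[T_in:]

# Solution A: shift the whole series by one season, keep the out-of-sample part
pred_shift = toy.shift(M).iloc[-T_out:]

# Solution B: take the values observed M periods before each out-of-sample date
# directly from the underlying array and attach the out-of-sample index
start = T_in - M
pred_slice = pd.Series(toy.to_numpy()[start:start + T_out], index=toy_out.index)

# Both solutions should give the same forecasts
print(np.allclose(pred_shift.to_numpy(), pred_slice.to_numpy()))
```

The `shift()` version is usually easier to read because the alignment with the dates is handled by pandas; the slicing version makes the index arithmetic explicit.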


## 3. Repeat the comparison in Section 5 using different predictive measures, e.g. MAD and MAPE.

We define functions to compute the MAD and MAPE measures for the arrays of true and forecast values.

In [19]:
# Define a function to calculate the MAD of two series
# Inputs are NumPy arrays (or pandas Series sharing the same index)
def MAD(y_true, y_pred):
    mad_out = np.mean(np.abs(y_true - y_pred))
    return mad_out

# Define a function to calculate the MAPE of two series
# Inputs are NumPy arrays (or pandas Series sharing the same index)
def MAPE(y_true, y_pred):
    # Ignore observations whose true value is zero to avoid division by zero
    y_true_non_zero = y_true[y_true != 0]
    y_pred_non_zero = y_pred[y_true != 0]
    
    mape_out = np.mean(np.abs((y_true_non_zero-y_pred_non_zero)/y_true_non_zero))
    return mape_out
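As a quick sanity check of the two measures, the following sketch evaluates them on three made-up observations where the result can be verified by hand. The lowercase functions `mad` and `mape` restate the definitions above only so that the snippet runs on its own; the input values are hypothetical.

```python
import numpy as np

def mad(y_true, y_pred):
    # Mean of the absolute forecast errors
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # Mean absolute error relative to the true value,
    # skipping observations whose true value is zero
    mask = y_true != 0
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask]))

# Hypothetical values chosen so the arithmetic is easy to follow
y_true = np.array([100.0, 200.0, 0.0])
y_pred = np.array([110.0, 180.0, 5.0])

# MAD uses all three errors: (10 + 20 + 5) / 3
print(mad(y_true, y_pred))

# MAPE drops the third observation: (10/100 + 20/200) / 2
print(mape(y_true, y_pred))
```

Note that the zero observation still contributes to the MAD but is excluded from the MAPE, so the two measures are computed over different numbers of observations whenever the true series contains zeros.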

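We can then use the two functions defined above to compute the MAD and MAPE of both forecasts. Note that `MAPE` returns a fraction rather than a percentage, so a value of 0.09 corresponds to 9%.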

In [20]:
print('MAD (Simple naive)  : {:.2f}'.format(MAD(ts_out,ts_one_step_naive1)))
print('MAD (Seasonal naive): {:.2f}'.format(MAD(ts_out,ts_one_step_naive2)))

MAD (Simple naive)  : 45.25
MAD (Seasonal naive): 47.83


In [21]:
print('MAPE (Simple naive)  : {:.2f}'.format(MAPE(ts_out,ts_one_step_naive1)))
print('MAPE (Seasonal naive): {:.2f}'.format(MAPE(ts_out,ts_one_step_naive2)))

MAPE (Simple naive)  : 0.09
MAPE (Seasonal naive): 0.10
