# Introduction
Welcome to this graded Python exercise.

You will apply what you learned in the **Time Series Analysis** module to a real-world problem in airline management.

## Problem Statement
In this exercise, you will apply time series modelling techniques to passenger count data that an airline company has collected over the years. The company wishes to use this historical data to forecast future passenger counts so that it can optimise its budgeting and resource management processes.

The problem statement for this exercise can be summarised as follows:
> Given historical passenger count data of an airline company, forecast its future passenger counts.

By studying the forecast, the company can plan effectively for future demand.

## Data Description
You have been provided with a data set containing monthly passenger count data of the airline company for the period 1949 to 1960. The data set has the following attributes:

<table>
  <tr>
    <th> Attributes </th>
    <th> Description </th>
  </tr>
  <tr>
    <td> Month </td>
    <td> The month for which passenger count was recorded (in yyyy-mm format) </td>
  </tr>
  <tr>
    <td> Passengers </td>
    <td> Number of air passengers in a particular month </td>
  </tr>

</table>

## Outline
In this exercise, you will:
- Prepare the data for time series modelling
- Forecast passenger count using the following models:
  - Linear regression
  - Naive
  - Simple average
  - Simple moving average
  - Simple exponential smoothing
  - Holt's
  - Holt-Winters' additive
  - Holt-Winters' multiplicative
  - Autoregressive (AR)
  - Moving average (MA)
  - Autoregressive moving average (ARMA)
  - Autoregressive integrated moving average (ARIMA)
  - Seasonal autoregressive integrated moving average (SARIMA)

You will analyse the performance of these models using root mean squared error (RMSE) and mean absolute percentage error (MAPE).
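
As a minimal sketch of the two metrics, the following plain-Python functions compute RMSE and MAPE on a handful of made-up numbers. The function names `rmse` and `mape` and the sample values are illustrative assumptions, not part of the exercise; the notebook itself computes the same quantities with `numpy` and `sklearn`.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)

# Made-up values for illustration, not taken from the airline data set
actual = [100, 200, 400]
predicted = [110, 180, 400]

print(round(rmse(actual, predicted), 2))  # 12.91
print(round(mape(actual, predicted), 2))  # 6.67
```

RMSE is expressed in the units of the series (passengers), whereas MAPE is a unit-free percentage, which is why the two are often reported together.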

# Part 1 - Setup and Data Preparation
In this section, you will:
- Import necessary packages for executing the code
- Load the data
- Prepare the data for further analysis

In [None]:
# Import 'numpy' and 'pandas' for working with numbers and dataframes
import numpy as np
import pandas as pd

# Import 'pyplot' from 'matplotlib' and 'seaborn' for visualisations
from matplotlib import pyplot as plt
import seaborn as sns

# Import 'seasonal_decompose' from 'statsmodels' for seasonal decomposition of time series
from statsmodels.tsa.seasonal import seasonal_decompose

# Import 'LinearRegression' from 'sklearn' for building regression models
from sklearn.linear_model import LinearRegression

# Import 'SimpleExpSmoothing' from 'statsmodels' for simple exponential smoothing
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Import 'ExponentialSmoothing' from 'statsmodels' for exponential smoothing
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Import 'mean_squared_error' from 'sklearn' for error computations
from sklearn.metrics import mean_squared_error

# Import 'plot_acf' from 'statsmodels' to compute and visualise the autocorrelation function (ACF) for the time series
from statsmodels.graphics.tsaplots import plot_acf

# Import 'plot_pacf' from 'statsmodels' to compute and visualise the partial autocorrelation function (PACF) for the time series
from statsmodels.graphics.tsaplots import plot_pacf

# Import the 'boxcox' method from 'scipy' to implement the Box-Cox transformation
from scipy.stats import boxcox

# Import 'ARIMA' from 'statsmodels' for building autoregressive models
from statsmodels.tsa.arima.model import ARIMA

# Import 'SARIMAX' from 'statsmodels' for building seasonal autoregressive models
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Import and execute method for suppressing warnings
import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore', ConvergenceWarning)

In [None]:
# Load the data and take a look at it using the '.head()' method
df = pd.read_csv('AirPassengers.csv')
df.##### CODE HERE #####

In [None]:
# View specifics of the data frame using the '.info()' method
df.##### CODE HERE #####

In [None]:
# Convert the 'Month' feature to the 'datetime' data type using the 'pd.to_datetime()' function
df['Month'] = ##### CODE HERE #####

In [None]:
# View specifics of the data frame
df.##### CODE HERE #####

In [None]:
# Ensure that the data are ordered chronologically using the '.sort_values()' method
df.##### CODE HERE #####

In [None]:
# Set the index of the data frame to 'Month' using the '.set_index()' method
df.##### CODE HERE #####

In [None]:
# Take a look at the data using the '.head()' method
df.##### CODE HERE #####

In [None]:
# Plot the time series data
plt.figure(figsize = (14, 6))
sns.lineplot(data = df, x = 'Month', y = 'Passengers', marker = 'o', color = 'blue')
plt.title('Passenger Count');

In [None]:
# Take a look at the shape of the data using the '.shape' attribute
df.##### CODE HERE #####

In [None]:
# Split the data into training and testing data sets
train_len = 120
df_train = df[:train_len] # first 120 months as training set
df_test = df[train_len:] # last 24 months as out-of-time test set

In [None]:
# Plot the time series data with the train-test split
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', marker = 'o', color = 'blue')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', marker = 'o', color = 'green')
plt.title('Passenger Count');

# Part 2 - Simple Time Series Models
In this part of the exercise, you will fit basic models to the data and analyse their performance using RMSE and MAPE values. You will build the following models:
- Linear regression
- Naive
- Simple average
- Simple moving average
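
The two simplest baselines in this list can be sketched in a few lines. The example below is an illustration on a made-up series, assuming a flat forecast over a hypothetical horizon of three steps: the naive method repeats the last training value, and the simple average method repeats the mean of all training values. The variable names are assumptions for this sketch only.

```python
# Made-up training series, not taken from the airline data set
train = [10, 12, 14, 13, 16]
horizon = 3  # hypothetical number of future steps to forecast

# Naive method: every future value equals the last observed value
naive_forecast = [train[-1]] * horizon

# Simple average method: every future value equals the mean of the training data
average_forecast = [sum(train) / len(train)] * horizon

print(naive_forecast)    # [16, 16, 16]
print(average_forecast)  # [13.0, 13.0, 13.0]
```

Both forecasts are constant, so neither can follow a trend or a seasonal pattern; they are mainly useful as reference points for the other models.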

### Subpart 1 - Linear Regression Method
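
For intuition, a linear regression on a time series treats the time step index as the only predictor. The sketch below fits a trend line by ordinary least squares in plain Python on made-up values; the helper name `fit_trend` is hypothetical, and the exercise itself uses `LinearRegression` from `sklearn` for this step.

```python
def fit_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    steps = list(range(n))
    t_mean = sum(steps) / n
    y_mean = sum(values) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in zip(steps, values))
    variance = sum((t - t_mean) ** 2 for t in steps)
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return slope, intercept

# Made-up series that lies exactly on the line y = 3 + 2t
train = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_trend(train)

# Extend the fitted line two steps beyond the training data
forecast = [intercept + slope * t for t in range(len(train), len(train) + 2)]
print(slope, intercept)  # 2.0 3.0
print(forecast)          # [11.0, 13.0]
```

Because the model only sees the time index, it captures the overall trend but none of the seasonal variation.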

In [None]:
# Create the independent variable for the linear regression model
linreg_X = ##### CODE HERE #####

In [None]:
# Convert the training variables into 2D arrays using the '.reshape()' method
linreg_X = linreg_X.##### CODE HERE #####
linreg_y = np.array(df_train['Passengers']).##### CODE HERE #####

In [None]:
# Create and fit a linear regression model to the training data using the 'LinearRegression()' and the '.fit()' methods
linreg_model = ##### CODE HERE #####
linreg_model.##### CODE HERE #####

In [None]:
# Create the independent variable for the complete data set and reshape it accordingly
linreg_X_all = ##### CODE HERE #####
linreg_X_all = linreg_X_all.##### CODE HERE #####

In [None]:
# Generate the complete regression line including both the training and testing data using the '.predict()' method
y_pred_lr = linreg_model.##### CODE HERE #####

In [None]:
# Convert the predictions into a 1D array using the '.reshape()' method
y_pred_lr = y_pred_lr.##### CODE HERE #####

In [None]:
# Visualise the time series data and the predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', label = 'Train', marker = 'o', color = 'blue')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', label = 'Test', marker = 'o', color = 'green')
sns.lineplot(data = df, x = 'Month', y = y_pred_lr, label = 'Predictions', marker = 'o', color = 'purple')
plt.legend(loc = 'best')
plt.title('Linear Regression Method');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
y_pred_lr_list = y_pred_lr[train_len:]

rmse = np.sqrt(mean_squared_error(y_true = df_test['Passengers'], y_pred = y_pred_lr_list))
mape = np.mean(np.abs(df_test['Passengers'] - y_pred_lr_list) / df_test['Passengers']) * 100

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df = pd.DataFrame(index = [0],
                              data = {'Model': 'Linear Regression', 'RMSE': rmse, 'MAPE': mape})

performance_df.set_index(keys = 'Model', inplace = True)

performance_df

### Subpart 2 - Naive Method

In [None]:
# Generate predictions for the test data for the naive method
y_pred_n = ##### CODE HERE #####

In [None]:
# Visualise the time series data and the predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', label = 'Train', marker = 'o', color = 'blue')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', label = 'Test', marker = 'o', color = 'green')
sns.lineplot(data = df, x = 'Month', y = y_pred_n, label = 'Predictions', marker = 'o', color = 'purple')
plt.legend(loc = 'best')
plt.title('Naive Method');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
y_pred_n_list = [y_pred_n] * len(df_test)

rmse = np.sqrt(mean_squared_error(y_true = df_test['Passengers'], y_pred = y_pred_n_list))
mape = np.mean(np.abs(df_test['Passengers'] - y_pred_n_list) / df_test['Passengers']) * 100

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'Naive', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

### Subpart 3 - Simple Average Method

In [None]:
# Generate predictions for the test data using the simple average method
y_pred_sa = ##### CODE HERE #####

In [None]:
# Visualise the time series data and the predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', label = 'Train', marker = 'o', color = 'blue')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', label = 'Test', marker = 'o', color = 'green')
sns.lineplot(data = df, x = 'Month', y = y_pred_sa, label = 'Predictions', marker = 'o', color = 'purple')
plt.legend(loc = 'best')
plt.title('Simple Average Method');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
y_pred_sa_list = [y_pred_sa] * len(df_test)

rmse = np.sqrt(mean_squared_error(y_true = df_test['Passengers'], y_pred = y_pred_sa_list))
mape = np.mean(np.abs(df_test['Passengers'] - y_pred_sa_list) / df_test['Passengers']) * 100

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'Simple Average', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

### Subpart 4 - Simple Moving Average Method
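
As a rough illustration of the idea, the sketch below forecasts with the mean of the last few training observations and holds that value flat over the forecast horizon. The function name `moving_average_forecast` and the sample values are assumptions for this example; in the notebook the same idea is expressed with the pandas rolling-window methods.

```python
def moving_average_forecast(train, window, horizon):
    """Forecast every future step as the mean of the last `window` training values."""
    if window > len(train):
        raise ValueError("window must not exceed the number of training values")
    last_window = train[-window:]
    return [sum(last_window) / window] * horizon

# Made-up training series, not taken from the airline data set
train = [10, 12, 14, 13, 16]
forecast = moving_average_forecast(train, window=3, horizon=2)

# The last three values are 14, 13 and 16, so each forecast is 43 / 3
print(forecast)
```

A smaller window reacts faster to recent changes, while a larger window smooths out more noise.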

In [None]:
# Generate predictions for the complete data using the simple moving average method with a window size of 3
# Note: Use the '.rolling()', '.mean()' and '.shift()' methods accordingly
ma_window = ##### CODE HERE #####
y_pred_sma = ##### CODE HERE #####
y_pred_sma = ##### CODE HERE #####
y_pred_sma[train_len:] = ##### CODE HERE #####

In [None]:
# Visualise the time series data and the predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', label = 'Train', marker = 'o', color = 'blue')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', label = 'Test', marker = 'o', color = 'green')
sns.lineplot(data = df, x = 'Month', y = y_pred_sma, label = 'Predictions', marker = 'o', color = 'purple')
plt.legend(loc = 'best')
plt.title('Simple Moving Average Method');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
y_pred_sma_list = [y_pred_sma[train_len]] * len(df_test)

rmse = np.sqrt(mean_squared_error(y_true = df_test['Passengers'], y_pred = y_pred_sma_list))
mape = np.mean(np.abs(df_test['Passengers'] - y_pred_sma_list) / df_test['Passengers']) * 100

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'Simple Moving Average', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

# Part 3 - Exponential Time Series Models
In this part of the exercise, you will fit exponential smoothing models to the data. The models that you will build are:
- Simple exponential smoothing
- Holt's exponential smoothing
- Holt-Winters' exponential smoothing

### Subpart 1 - Simple Exponential Smoothing
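
The level update behind simple exponential smoothing can be sketched by hand. The example below assumes a fixed smoothing parameter `alpha` and initialises the level with the first observation; both are simplifying assumptions, since a library fit with optimisation enabled would normally estimate these values from the data. The function name `ses_forecast` is hypothetical.

```python
def ses_forecast(train, alpha, horizon):
    """Simple exponential smoothing with a fixed alpha and a flat forecast."""
    level = train[0]  # assumption: start the level at the first observation
    for value in train[1:]:
        # New level is a weighted mix of the new observation and the old level
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon

# Made-up training series, not taken from the airline data set
train = [10.0, 20.0, 30.0]
forecast = ses_forecast(train, alpha=0.5, horizon=2)
print(forecast)  # [22.5, 22.5]
```

The forecast is the final level repeated, which is why the summary cell in this subpart expands a single predicted value across the whole test period.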

In [None]:
# Create a model instance for the training data using the 'SimpleExpSmoothing()' method
model = ##### CODE HERE #####

In [None]:
# Fit the model object on the training data using the '.fit()' method
# Note: Set 'optimized' to 'True'
model = model.##### CODE HERE #####

In [None]:
# Obtain predictions for the test data using the '.forecast()' method
y_pred_ses = model.##### CODE HERE #####

In [None]:
# Visualise the time series data and the predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', label = 'Train', marker = 'o', color = 'blue')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', label = 'Test', marker = 'o', color = 'green')
sns.lineplot(data = df, x = 'Month', y = y_pred_ses, label = 'Predictions', marker = 'o', color = 'purple')
plt.legend(loc = 'best')
plt.title('Simple Exponential Smoothing Method');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
y_pred_ses_list = [np.asarray(y_pred_ses)[0]] * len(df_test) # first forecast value, selected by position

rmse = np.sqrt(mean_squared_error(y_true = df_test['Passengers'], y_pred = y_pred_ses_list))
mape = np.mean(np.abs(df_test['Passengers'] - y_pred_ses_list) / df_test['Passengers']) * 100

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'Simple Exponential Smoothing', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

### Subpart 2 - Holt's Method
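
Holt's method extends simple exponential smoothing with a second smoothed component for the trend. The sketch below uses fixed smoothing parameters and a simple initialisation (first value as the level, first difference as the trend); these choices, the parameter values and the function name `holt_forecast` are assumptions for illustration, not the fitted values a library would produce.

```python
def holt_forecast(train, alpha, beta, horizon):
    """Holt's linear trend method with fixed smoothing parameters."""
    level = train[0]             # assumption: initial level is the first value
    trend = train[1] - train[0]  # assumption: initial trend is the first difference
    for value in train[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # The h-step-ahead forecast extends the last level along the last trend
    return [level + step * trend for step in range(1, horizon + 1)]

# Made-up series that rises by exactly 2 per step
train = [10.0, 12.0, 14.0, 16.0]
forecast = holt_forecast(train, alpha=0.5, beta=0.5, horizon=3)
print(forecast)  # [18.0, 20.0, 22.0]
```

Unlike the flat forecasts of the earlier methods, this forecast continues the trend, but it still has no seasonal component.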

In [None]:
# Create a model instance for the training data using the 'ExponentialSmoothing()' method
# Note: Set 'seasonal_periods' to 12, 'trend' to 'additive' and 'seasonal' to 'None'
model = ##### CODE HERE #####

In [None]:
# Fit the model object on the training data using the '.fit()' method
# Note: Set 'optimized' to 'True'
model = model.##### CODE HERE #####

In [None]:
# Obtain predictions for the test data using the '.forecast()' method
y_pred_hes = model.##### CODE HERE #####

In [None]:
# Visualise the time series data and the predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', label = 'Train', marker = 'o', color = 'blue')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', label = 'Test', marker = 'o', color = 'green')
sns.lineplot(data = df_test, x = 'Month', y = y_pred_hes, label = 'Predictions', marker = 'o', color = 'purple')
plt.legend(loc = 'best')
plt.title('Holt\'s Method');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
y_pred_hes_list = y_pred_hes

rmse = np.sqrt(mean_squared_error(y_true = df_test['Passengers'], y_pred = y_pred_hes_list))
mape = np.mean(np.abs(df_test['Passengers'] - y_pred_hes_list) / df_test['Passengers']) * 100

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'Holt\'s', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

### Subpart 3 - Holt-Winters' Additive Method

In [None]:
# Create a model instance for the training data using the 'ExponentialSmoothing()' method
# Note: Set 'seasonal_periods' to 12, 'trend' to 'additive' and 'seasonal' to 'additive'
model = ##### CODE HERE #####

In [None]:
# Fit the model object on the training data using the '.fit()' method
# Note: Set 'optimized' to 'True'
model = model.##### CODE HERE #####

In [None]:
# Obtain predictions for the test data using the '.forecast()' method
y_pred_hwa = model.##### CODE HERE #####

In [None]:
# Visualise the time series data and the predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', label = 'Train', marker = 'o', color = 'blue')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', label = 'Test', marker = 'o', color = 'green')
sns.lineplot(data = df_test, x = 'Month', y = y_pred_hwa, label = 'Predictions', marker = 'o', color = 'purple')
plt.legend(loc = 'best')
plt.title('Holt-Winters\' Additive Method');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
y_pred_hwa_list = y_pred_hwa

rmse = np.sqrt(mean_squared_error(y_true = df_test['Passengers'], y_pred = y_pred_hwa_list))
mape = np.mean(np.abs(df_test['Passengers'] - y_pred_hwa_list) / df_test['Passengers']) * 100

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'Holt-Winters\' Additive', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

### Subpart 4 - Holt-Winters' Multiplicative Method

In [None]:
# Create a model instance for the training data using the 'ExponentialSmoothing()' method
# Note: Set 'seasonal_periods' to 12, 'trend' to 'additive' and 'seasonal' to 'multiplicative'
model = ##### CODE HERE #####

In [None]:
# Fit the model object on the training data using the '.fit()' method
# Note: Set 'optimized' to 'True'
model = model.##### CODE HERE #####

In [None]:
# Obtain predictions for the test data using the '.forecast()' method
y_pred_hwm = model.##### CODE HERE #####

In [None]:
# Visualise the time series data and the predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', label = 'Train', marker = 'o', color = 'blue')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', label = 'Test', marker = 'o', color = 'green')
sns.lineplot(data = df_test, x = 'Month', y = y_pred_hwm, label = 'Predictions', marker = 'o', color = 'purple')
plt.legend(loc = 'best')
plt.title('Holt-Winters\' Multiplicative Method');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
y_pred_hwm_list = y_pred_hwm

rmse = np.sqrt(mean_squared_error(y_true = df_test['Passengers'], y_pred = y_pred_hwm_list))
mape = np.mean(np.abs(df_test['Passengers'] - y_pred_hwm_list) / df_test['Passengers']) * 100

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'Holt-Winters\' Multiplicative', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

# Part 4 - Autoregressive Models
In this part of the exercise, you will fit autoregressive models. You will build the following models:
- Autoregressive (AR)
- Moving average (MA)
- Autoregressive moving average (ARMA)
- Autoregressive integrated moving average (ARIMA)
- Seasonal autoregressive integrated moving average (SARIMA)

### Subpart 0 - Transformation and Differencing
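
With a lambda of 0, the Box-Cox transformation reduces to the natural logarithm, so the two transformations in this subpart and their inverses can be traced by hand. The sketch below uses made-up values and plain Python to show the round trip: log, first difference, cumulative sum plus the first logged value, then exponentiation. All variable names here are illustrative assumptions.

```python
import math

# Made-up values that grow by 10% per step, not taken from the airline data set
passengers = [100.0, 110.0, 121.0]

# Forward: Box-Cox with lambda = 0 (natural log), then first-order differencing
logged = [math.log(value) for value in passengers]
diffed = [current - previous for previous, current in zip(logged, logged[1:])]

# Reverse the differencing: cumulative sum, starting from the first logged value
restored_log = [logged[0]]
for difference in diffed:
    restored_log.append(restored_log[-1] + difference)

# Reverse the log transformation by exponentiating
restored = [math.exp(value) for value in restored_log]

print(diffed)    # both differences are approximately log(1.1)
print(restored)  # approximately the original values
```

Differencing shortens the series by one value, which is why the first logged value has to be added back when the transformation is reversed.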

In [None]:
# Apply the Box-Cox transformation on the training data using the 'boxcox' method
# Note: Convert the resulting list into a Pandas Series with a suitable index
# Note: Use 'lmbda = 0'
df_boxcox = ##### CODE HERE #####

In [None]:
# Apply the differencing transformation on the Box-Cox transformed training data
df_boxcox_diff = ##### CODE HERE #####

### Subpart 1 - Autoregressive (AR) Method
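
An AR(p) model predicts the next value as a constant plus a weighted sum of the previous p values, and multi-step forecasts feed earlier predictions back in as inputs. The sketch below illustrates that recursion with hypothetical coefficients; the function name `ar_forecast` and all numbers are assumptions, and `const` here is the intercept of the recursion, which libraries may report differently (for example as a process mean), so fitted parameters should not be copied across without checking.

```python
def ar_forecast(history, const, coefs, horizon):
    """Recursive AR(p) forecast; coefs[0] multiplies the most recent value."""
    values = list(history)
    order = len(coefs)
    forecasts = []
    for _ in range(horizon):
        # Most recent value first, to line up with coefs
        recent = values[-order:][::-1]
        next_value = const + sum(c * v for c, v in zip(coefs, recent))
        forecasts.append(next_value)
        values.append(next_value)  # later steps reuse this prediction
    return forecasts

# Hypothetical AR(2) coefficients and a made-up two-value history
history = [1.0, 2.0]
forecast = ar_forecast(history, const=1.0, coefs=[0.5, 0.25], horizon=2)
print(forecast)  # [2.25, 2.625]
```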

In [None]:
# Use the 'plot_pacf' method to look at the partial autocorrelation values for the training data for various lag orders
# Note: The 'plot_pacf' method assumes stationarity of time series, so use the 'df_boxcox_diff' data
# Note: Since differencing results in a missing value at the beginning of 'df_boxcox_diff', we must exclude it
##### CODE HERE #####

In [None]:
# Fit an AR model to the transformed training data with lag order 7 and view its optimal parameter values
# Note: Higher lag orders can be used at the cost of model complexity, but for this exercise, kindly use 'p = 7'
# Note: ARIMA(7, 0, 0) = AR(7)
ar_model = ARIMA(##### CODE HERE #####)
ar_model = ar_model.##### CODE HERE #####
ar_model.params

In [None]:
# Obtain predictions from the AR model for the testing data indices using the '.predict()' method
ar_model_preds = ar_model.##### CODE HERE #####

In [None]:
# Append 'ar_model_preds' to 'df_boxcox_diff' to prepare the data for inverse transformation
df_boxcox_diff_preds = ##### CODE HERE #####

In [None]:
# Reverse the differencing transformation that was done on the data using the '.cumsum()' method on 'df_boxcox_diff_preds'
# Note: Remember to add the constant 'df_boxcox[0]' to all the values as well
df_boxcox_preds = ##### CODE HERE #####
df_boxcox_preds = ##### CODE HERE #####

In [None]:
# Reverse the Box-Cox transformation that was done on the data by exponentiating the values in 'df_boxcox_preds'
df_preds = ##### CODE HERE #####

In [None]:
# Plot the time series data with the train-test split and the testing data predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', marker = 'o', color = 'blue', label = 'Train')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', marker = 'o', color = 'green', label = 'Test')
sns.lineplot(x = df_preds.index[train_len:], y = df_preds.values[train_len:], marker = 'o', color = 'purple', label = 'Predictions')
plt.title('Passenger Count');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
rmse = ##### CODE HERE #####
mape = ##### CODE HERE #####

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'AR', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

### Subpart 2 - Moving Average (MA) Method
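
The autocorrelation plot used in this subpart shows, for each lag, how strongly the series correlates with a shifted copy of itself; for an MA(q) process the theoretical autocorrelation cuts off after lag q, which is why the plot helps in choosing the order. The sketch below computes the sample autocorrelation by hand on made-up values; the function name `autocorrelation` is an assumption for this example.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation at the given lag, normalised by the lag-0 sum of squares."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((value - mean) ** 2 for value in values)
    numerator = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator

# Made-up series, not taken from the airline data set
series = [1.0, 2.0, 3.0, 4.0, 5.0]

print(autocorrelation(series, 0))  # 1.0
print(autocorrelation(series, 1))  # 0.4
print(autocorrelation(series, 2))  # -0.1
```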

In [None]:
# Use the 'plot_acf' method to look at the autocorrelation values for the training data for various lag orders
# Note: The 'plot_acf' method assumes stationarity of time series, so use the 'df_boxcox_diff' data
# Note: Since differencing results in a missing value at the beginning of 'df_boxcox_diff', we must exclude it
##### CODE HERE #####

In [None]:
# Fit an MA model to the transformed training data with lag order 4 and view its optimal parameter values
# Note: Higher lag orders can be used at the cost of model complexity, but for this exercise, kindly use 'q = 4'
# Note: ARIMA(0, 0, 4) = MA(4)
ma_model = ARIMA(##### CODE HERE #####)
ma_model = ma_model.##### CODE HERE #####
ma_model.params

In [None]:
# Obtain predictions from the MA model for the testing data indices using the '.predict()' method
ma_model_preds = ma_model.##### CODE HERE #####

In [None]:
# Append 'ma_model_preds' to 'df_boxcox_diff' to prepare the data for inverse transformation
df_boxcox_diff_preds = ##### CODE HERE #####

In [None]:
# Reverse the differencing transformation that was done on the data using the '.cumsum()' method on 'df_boxcox_diff_preds'
# Note: Remember to add the constant 'df_boxcox[0]' to all the values as well
df_boxcox_preds = ##### CODE HERE #####
df_boxcox_preds = ##### CODE HERE #####

In [None]:
# Reverse the Box-Cox transformation that was done on the data by exponentiating the values in 'df_boxcox_preds'
df_preds = ##### CODE HERE #####

In [None]:
# Plot the time series data with the train-test split and the testing data predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', marker = 'o', color = 'blue', label = 'Train')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', marker = 'o', color = 'green', label = 'Test')
sns.lineplot(x = df_preds.index[train_len:], y = df_preds.values[train_len:], marker = 'o', color = 'purple', label = 'Predictions')
plt.title('Passenger Count');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
rmse = ##### CODE HERE #####
mape = ##### CODE HERE #####

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'MA', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

### Subpart 3 - Autoregressive Moving Average (ARMA) Method

In [None]:
# Fit an ARMA model to the transformed training data with lag orders 'p = 7' and 'q = 4' and view its optimal parameter values
# Note: ARIMA(7, 0, 4) = ARMA(7, 4)
arma_model = ARIMA(##### CODE HERE #####)
arma_model = arma_model.##### CODE HERE #####
arma_model.params

In [None]:
# Obtain predictions from the ARMA model for the testing data indices using the '.predict()' method
arma_model_preds = arma_model.##### CODE HERE #####

In [None]:
# Append 'arma_model_preds' to 'df_boxcox_diff' to prepare the data for inverse transformation
df_boxcox_diff_preds = ##### CODE HERE #####

In [None]:
# Reverse the differencing transformation that was done on the data using the '.cumsum()' method on 'df_boxcox_diff_preds'
# Note: Remember to add the constant 'df_boxcox[0]' to all the values as well
df_boxcox_preds = ##### CODE HERE #####
df_boxcox_preds = ##### CODE HERE #####

In [None]:
# Reverse the Box-Cox transformation that was done on the data by exponentiating the values in 'df_boxcox_preds'
df_preds = ##### CODE HERE #####

In [None]:
# Plot the time series data with the train-test split and the testing data predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', marker = 'o', color = 'blue', label = 'Train')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', marker = 'o', color = 'green', label = 'Test')
sns.lineplot(x = df_preds.index[train_len:], y = df_preds.values[train_len:], marker = 'o', color = 'purple', label = 'Predictions')
plt.title('Passenger Count');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
rmse = ##### CODE HERE #####
mape = ##### CODE HERE #####

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'ARMA', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

### Subpart 4 - Autoregressive Integrated Moving Average (ARIMA) Method

In [None]:
# Fit an ARIMA model to the transformed training data with lag orders 'p = 7' and 'q = 4' and view its optimal parameter values
# Note: Since differencing is integrated, the endogenous variable to be used here is 'df_boxcox'
# Note: Use a differencing order of 1
arima_model = ARIMA(##### CODE HERE #####)
arima_model = arima_model.##### CODE HERE #####
arima_model.params

In [None]:
# Obtain predictions from the ARIMA model for the testing data indices using the '.predict()' method
arima_model_preds = arima_model.##### CODE HERE #####

In [None]:
# Append 'arima_model_preds' to 'df_boxcox' to prepare the data for inverse transformation
df_boxcox_preds = ##### CODE HERE #####

In [None]:
# Reverse the Box-Cox transformation that was done on the data by exponentiating the values in 'df_boxcox_preds'
df_preds = ##### CODE HERE #####

In [None]:
# Plot the time series data with the train-test split and the testing data predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', marker = 'o', color = 'blue', label = 'Train')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', marker = 'o', color = 'green', label = 'Test')
sns.lineplot(x = df_preds.index[train_len:], y = df_preds.values[train_len:], marker = 'o', color = 'purple', label = 'Predictions')
plt.title('Passenger Count');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
rmse = ##### CODE HERE #####
mape = ##### CODE HERE #####

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'ARIMA', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df

### Subpart 5 - Seasonal Autoregressive Integrated Moving Average (SARIMA) Method
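
The seasonal part of a SARIMA model with a seasonal differencing order of 1 works on the series after each value has had the value one full season earlier subtracted from it. The sketch below shows that seasonal differencing step on a made-up series with a period of 3; the function name `seasonal_difference` and the values are assumptions for illustration only.

```python
def seasonal_difference(values, period):
    """Subtract from each value the value observed one full season earlier."""
    return [values[t] - values[t - period] for t in range(period, len(values))]

# Made-up series: a pattern of length 3 that shifts up by 2 in every cycle
series = [10, 20, 30, 12, 22, 32, 14, 24, 34]
diffed = seasonal_difference(series, period=3)
print(diffed)  # [2, 2, 2, 2, 2, 2]
```

The repeating pattern disappears after seasonal differencing, leaving only the cycle-to-cycle change, and the result is shorter than the input by one period.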

In [None]:
# Fit a SARIMA model to the transformed training data with lag orders 'p = 7' and 'q = 4' and view its optimal parameter values
# Note: Since differencing is integrated, the endogenous variable to be used here is 'df_boxcox'
# Note: Use a differencing order of 1
# Note: Use seasonal parameters of 'P = 0', 'D = 1', 'Q = 0', 'm = 3'
sarima_model = SARIMAX(##### CODE HERE #####)
sarima_model = sarima_model.##### CODE HERE #####
sarima_model.params

In [None]:
# Obtain predictions from the SARIMA model for the testing data indices using the '.predict()' method
sarima_model_preds = sarima_model.##### CODE HERE #####

In [None]:
# Append 'sarima_model_preds' to 'df_boxcox' to prepare the data for inverse transformation
df_boxcox_preds = ##### CODE HERE #####

In [None]:
# Reverse the Box-Cox transformation that was done on the data by exponentiating the values in 'df_boxcox_preds'
df_preds = ##### CODE HERE #####

In [None]:
# Plot the time series data with the train-test split and the testing data predictions
plt.figure(figsize = (14, 6))
sns.lineplot(data = df_train, x = 'Month', y = 'Passengers', marker = 'o', color = 'blue', label = 'Train')
sns.lineplot(data = df_test, x = 'Month', y = 'Passengers', marker = 'o', color = 'green', label = 'Test')
sns.lineplot(x = df_preds.index[train_len:], y = df_preds.values[train_len:], marker = 'o', color = 'purple', label = 'Predictions')
plt.title('Passenger Count');

In [None]:
# Summarise the performance of the model on the test data using RMSE and MAPE
rmse = ##### CODE HERE #####
mape = ##### CODE HERE #####

rmse = np.round(rmse, 2)
mape = np.round(mape, 2)

performance_df_temp = pd.DataFrame(index = [0],
                                   data = {'Model': 'SARIMA', 'RMSE': rmse, 'MAPE': mape})

performance_df_temp.set_index(keys = 'Model', inplace = True)

performance_df = pd.concat(objs = [performance_df, performance_df_temp])

performance_df