### Prof. Pedram Jahangiry

To edit this notebook, first save a copy to your own Google Drive. Start by opening the notebook in Colab 👇

<a href="https://colab.research.google.com/github/PJalgotrader/Deep_forecasting-USU/blob/main/Lectures%20and%20codes/Module%204-%20ARIMA/Module4-ARIMA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> 



![logo](https://upload.wikimedia.org/wikipedia/commons/4/44/Huntsman-Wordmark-with-USU-Blue.gif#center) 


## 🔗 Links

[![linkedin](https://img.shields.io/badge/LinkedIn-0A66C2?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/pedram-jahangiry-cfa-5778015a)

[![Youtube](https://img.shields.io/badge/youtube_channel-1DA1F2?style=for-the-badge&logo=youtube&logoColor=white&color=FF0000)](https://www.youtube.com/channel/UCNDElcuuyX-2pSatVBDpJJQ)

[![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/PedramJahangiry.svg?style=social&label=Follow%20%40PedramJahangiry)](https://twitter.com/PedramJahangiry)


---


# Module 4: ARIMA models

In this module, we cover the basics of ARIMA (AutoRegressive Integrated Moving Average) models, a commonly used statistical method for time series forecasting. Our focus will be on understanding the underlying concepts and components of ARIMA models, as well as how to implement them in practice.

We start by discussing the properties of time series data and the need for a statistical model to capture its behavior. Next, we delve into the components of ARIMA models - autoregression, integration, and moving average - and their role in capturing patterns and making predictions based on past values.

We also cover the process of making time series data stationary and selecting the appropriate ARIMA parameters (p, d, q) based on autocorrelation and partial autocorrelation plots. Finally, we demonstrate how to fit ARIMA models to time series data and make predictions using Python packages such as sktime and PyCaret.
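
As a compact reference, the three components described above can be written in standard textbook notation. The symbols $c$, $\phi_i$, $\theta_j$, $\varepsilon_t$ and the backshift operator $B$ are notation introduced here for illustration; they do not appear elsewhere in the notebook.

$$
\begin{aligned}
\text{AR}(p):\quad & y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t \\
\text{MA}(q):\quad & y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \\
\text{I}(d):\quad & y'_t = (1 - B)^d y_t, \qquad B\,y_t = y_{t-1} \\
\text{ARIMA}(p,d,q):\quad & \Bigl(1 - \sum_{i=1}^{p} \phi_i B^i\Bigr)(1 - B)^d y_t = c + \Bigl(1 + \sum_{j=1}^{q} \theta_j B^j\Bigr)\varepsilon_t
\end{aligned}
$$

Here $\varepsilon_t$ is assumed to be white noise, and $d$ is the number of times the series is differenced before the AR and MA terms are applied.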

Documentation: 

1. **PyCaret 3.0**: https://pycaret.readthedocs.io/en/latest/index.html
2. **sktime** : https://www.sktime.org/en/stable/api_reference/forecasting.html

# Installation

Follow the steps here: https://pycaret.gitbook.io/docs/get-started/installation


In [1]:
#only if you want to run it in Google Colab: 
# for this chapter, we can install the light version of PyCaret as below. 

#!pip install pycaret

In [2]:
# if you got a warning that you need to "RESTART RUNTIME", go ahead and press that button. 

# let's double-check the PyCaret version: 
from pycaret.utils import version
version()

'3.2.0'

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Importing Dataset

In [4]:
from pycaret.datasets import get_data
airline = get_data('airline')

Period
1949-01    112.0
1949-02    118.0
1949-03    132.0
1949-04    129.0
1949-05    121.0
Freq: M, Name: Number of airline passengers, dtype: float64

In [5]:
# or alternatively, 
df = pd.read_csv("https://raw.githubusercontent.com/PJalgotrader/Deep_forecasting-USU/main/data/airline_passengers.csv", index_col="Month")
df.head()

Unnamed: 0_level_0,Passengers
Month,Unnamed: 1_level_1
1949-01,112
1949-02,118
1949-03,132
1949-04,129
1949-05,121


In [6]:
# When working with pandas, first convert the index to datetime and then to period. This is required for compatibility with other packages.
df.index

Index(['1949-01', '1949-02', '1949-03', '1949-04', '1949-05', '1949-06',
       '1949-07', '1949-08', '1949-09', '1949-10',
       ...
       '1960-03', '1960-04', '1960-05', '1960-06', '1960-07', '1960-08',
       '1960-09', '1960-10', '1960-11', '1960-12'],
      dtype='object', name='Month', length=144)

In [7]:
df.index = pd.to_datetime(df.index).to_period('M')
df.index

PeriodIndex(['1949-01', '1949-02', '1949-03', '1949-04', '1949-05', '1949-06',
             '1949-07', '1949-08', '1949-09', '1949-10',
             ...
             '1960-03', '1960-04', '1960-05', '1960-06', '1960-07', '1960-08',
             '1960-09', '1960-10', '1960-11', '1960-12'],
            dtype='period[M]', name='Month', length=144)
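
To make the two-step conversion explicit, here is a minimal sketch on a toy frame. The frame `toy` and the intermediate names are hypothetical and used only for illustration; the values are the first three rows of the dataset shown above.

```python
import pandas as pd

# Toy frame with string month labels, mimicking the CSV index above.
toy = pd.DataFrame({"Passengers": [112, 118, 132]},
                   index=["1949-01", "1949-02", "1949-03"])
toy.index.name = "Month"

as_datetime = pd.to_datetime(toy.index)   # step 1: strings -> DatetimeIndex (month-start timestamps)
as_period = as_datetime.to_period("M")    # step 2: timestamps -> PeriodIndex with monthly frequency

toy.index = as_period
print(toy.index.dtype)                    # period[M]
print(as_datetime[0], "->", as_period[0])
```

A timestamp marks a single instant, whereas a period represents the whole month, which is the more natural label for monthly totals.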

Setting up PyCaret Experiment:

In [8]:
from pycaret.time_series import *

In [9]:
exp = TSForecastingExperiment()
exp.setup(data = df, target='Passengers' ,  fh = 12, coverage=0.95)

Unnamed: 0,Description,Value
0,session_id,220
1,Target,Passengers
2,Approach,Univariate
3,Exogenous Variables,Not Present
4,Original data shape,"(144, 1)"
5,Transformed data shape,"(144, 1)"
6,Transformed train set shape,"(132, 1)"
7,Transformed test set shape,"(12, 1)"
8,Rows with missing values,0.0%
9,Fold Generator,ExpandingWindowSplitter


<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x221f98cda50>

In [10]:
exp.check_stats()

Unnamed: 0,Test,Test Name,Data,Property,Setting,Value
0,Summary,Statistics,Transformed,Length,,144.0
1,Summary,Statistics,Transformed,# Missing Values,,0.0
2,Summary,Statistics,Transformed,Mean,,280.298611
3,Summary,Statistics,Transformed,Median,,265.5
4,Summary,Statistics,Transformed,Standard Deviation,,119.966317
5,Summary,Statistics,Transformed,Variance,,14391.917201
6,Summary,Statistics,Transformed,Kurtosis,,-0.364942
7,Summary,Statistics,Transformed,Skewness,,0.58316
8,Summary,Statistics,Transformed,# Distinct Values,,118.0
9,White Noise,Ljung-Box,Transformed,Test Statictic,"{'alpha': 0.05, 'K': 24}",1606.083817


* Ljung-Box null hypothesis: The autocorrelations of the time series for lags 1 through K are all equal to zero (the series is white noise). A small p-value indicates that the series is autocorrelated.
* ADF null hypothesis: The time series has a unit root, meaning it is non-stationary. A small p-value rejects the null and favors stationarity.
* KPSS null hypothesis: The time series is stationary around a deterministic trend (or simply stationary if no trend is included in the test equation). A large p-value is consistent with stationarity.
* Shapiro-Wilk null hypothesis: The sample comes from a normally distributed population. A large p-value is consistent with normality.
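
To see where a Ljung-Box value such as the one in the table comes from, the statistic can be computed by hand. This is a minimal NumPy sketch, not PyCaret's implementation: the function `ljung_box_q` and the two synthetic series are assumptions made for illustration, and K = 24 mirrors the setting shown in the table.

```python
import numpy as np

def ljung_box_q(x, K):
    """Ljung-Box Q statistic: n(n+2) * sum_{k=1..K} rho_k^2 / (n - k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    total = 0.0
    for k in range(1, K + 1):
        rho_k = np.sum(xc[k:] * xc[:-k]) / denom   # sample autocorrelation at lag k
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

rng = np.random.default_rng(0)
noise = rng.normal(size=144)        # synthetic white noise
trended = np.arange(144) + noise    # synthetic series with a strong trend

CRIT_24 = 36.415  # chi-square 95% critical value with 24 degrees of freedom (standard table value)
print("white noise Q:", ljung_box_q(noise, 24))
print("trended Q:    ", ljung_box_q(trended, 24), "(reject white noise if Q >", CRIT_24, ")")
```

Under the null hypothesis Q is approximately chi-square distributed with K degrees of freedom, so a value far above the critical value, as for the trended series, is strong evidence of autocorrelation.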

In [11]:
exp.plot_model(plot='train_test_split')

---
---
# ARIMA models:

## Selecting p and q: 
Remember, to select p and q, we must first make the data stationary! That's why we plot the differenced series. 

**Difference plotting using orders:**

In [12]:
exp.plot_model(plot="diff", data_kwargs={"order_list": [1,2], "acf": True, "pacf": True})


**Difference Plot Using Lags:**

For example, for a time series with monthly periodicity, using lags=[1, 12] applies a standard first difference (lag 1) to handle the trend, followed by a seasonal difference (lag 12) to account for seasonal dependence.
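
The effect of the two differences can be checked on a noiseless toy series. This is a sketch under stated assumptions: the series `y` below is synthetic (a linear trend plus a fixed 12-month pattern), not the airline data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend plus a pattern that repeats every 12 months.
months = pd.period_range("2000-01", periods=48, freq="M")
seasonal_pattern = np.tile([0, 2, 5, 3, 1, 6, 9, 9, 4, 2, 0, 3], 4)
y = pd.Series(10 + 2.0 * np.arange(48) + seasonal_pattern, index=months)

d1 = y.diff(1)        # first difference: removes the linear trend, keeps the seasonal jumps
d1_12 = d1.diff(12)   # seasonal difference of the result: removes the repeating pattern

print(d1_12.dropna().abs().max())   # 0.0 for this noiseless toy series
print(d1_12.isna().sum())           # 13 observations are lost to differencing
```

On real data the result is not exactly zero, but the remaining series should look far closer to stationary, which is what the ACF and PACF panels are then read from.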

In [13]:
exp.plot_model(plot="diff", data_kwargs={"lags_list": [[1, 12]], "acf": True, "pacf": True})


Based on the above plot, SARIMA(1,1,0)(1,1,1,12) seems to be a good starting point. 

However, to compare the performance of the different components of an ARIMA model, let's construct three more models. That gives us four models to compare, plus two benchmarks: 
1. AR(1)
2. MA(1)
3. ARIMA(1,1,1)
4. SARIMA(1,1,0)(1,1,1,12)
5. Random walk: ARIMA(0,1,0) with no constant
6. Random walk with drift: ARIMA(0,1,0) with constant
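
The two benchmarks have simple closed-form forecasts, sketched below. The helper `random_walk_forecast` is hypothetical, and the drift is taken as the mean of the first differences (the least-squares estimate of the constant), which may differ slightly from the maximum-likelihood value the library reports.

```python
import numpy as np

def random_walk_forecast(y, h, drift=False):
    """h-step-ahead point forecasts from ARIMA(0,1,0), with or without a constant."""
    y = np.asarray(y, dtype=float)
    # Mean of the first differences equals (last - first) / (n - 1).
    c = (y[-1] - y[0]) / (len(y) - 1) if drift else 0.0
    steps = np.arange(1, h + 1)
    return y[-1] + c * steps

y = [112, 118, 132, 129, 121]   # first five airline observations shown earlier

print(random_walk_forecast(y, 3))               # 121 at every horizon
print(random_walk_forecast(y, 3, drift=True))   # 123.25, 125.5, 127.75
```

Without a constant the forecast is a flat line at the last observed value; with a constant it is a straight line extending the average historical change.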

---
### ARIMA

In [14]:
ar1 = exp.create_model('arima', order = (1,0,0), seasonal_order=(0,0,0,12), with_intercept=True, cross_validation=False) 
# by default, with_intercept=True, so we don't need to set it manually. 

Unnamed: 0,MASE,RMSSE,MAE,RMSE,MAPE,SMAPE,R2
Test,3.376,3.746,102.7986,129.4174,0.1972,0.2297,-2.0236


In [15]:
ma1 = exp.create_model('arima', order = (0,0,1), seasonal_order=(0,0,0,12), with_intercept= True, cross_validation=False)

Unnamed: 0,MASE,RMSSE,MAE,RMSE,MAPE,SMAPE,R2
Test,6.6833,6.4226,203.5061,221.8905,0.4119,0.5336,-7.8881


In [16]:
arima111= exp.create_model('arima', order = (1,1,1), seasonal_order=(0,0,0,12) , with_intercept= True, cross_validation=False)

Unnamed: 0,MASE,RMSSE,MAE,RMSE,MAPE,SMAPE,R2
Test,1.999,2.3844,60.8693,82.3772,0.1166,0.1256,-0.225


In [17]:
sarima110111= exp.create_model('arima', order = (1,1,0), seasonal_order=(1,1,1,12) , with_intercept= True, cross_validation=False)

Unnamed: 0,MASE,RMSSE,MAE,RMSE,MAPE,SMAPE,R2
Test,0.586,0.6625,17.8427,22.8893,0.0402,0.0388,0.9054


In [18]:
rw= exp.create_model('arima', order = (0,1,0), seasonal_order=(0,0,0,12) , with_intercept= False, cross_validation=False)
# Remember, a random walk forecast is equivalent to the naive forecaster, so this also works: exp.create_model('naive', cross_validation=False)

Unnamed: 0,MASE,RMSSE,MAE,RMSE,MAPE,SMAPE,R2
Test,2.4959,2.9807,76.0,102.9765,0.1425,0.1612,-0.9143


In [19]:
rwwd= exp.create_model('arima', order = (0,1,0), seasonal_order=(0,0,0,12) , with_intercept= True, cross_validation=False)

Unnamed: 0,MASE,RMSSE,MAE,RMSE,MAPE,SMAPE,R2
Test,2.1776,2.6822,66.3079,92.6664,0.1242,0.1381,-0.5502


In [20]:
my_models = [rw, rwwd, ar1, ma1, arima111, sarima110111]
my_model_lables = ['Random Walk', 'Random Walk with drift', 'AR(1)', 'MA(1)', 'ARIMA(1,1,1)', 'SARIMA(1,1,0)(1,1,1,12)']

In [21]:
exp.compare_models(my_models, cross_validation=False)

Unnamed: 0,Model,MASE,RMSSE,MAE,RMSE,MAPE,SMAPE,R2,TT (Sec)
5,ARIMA,0.586,0.6625,17.8427,22.8893,0.0402,0.0388,0.9054,0.39
4,ARIMA,1.999,2.3844,60.8693,82.3772,0.1166,0.1256,-0.225,0.05
1,ARIMA,2.1776,2.6822,66.3079,92.6664,0.1242,0.1381,-0.5502,0.01
0,ARIMA,2.4959,2.9807,76.0,102.9765,0.1425,0.1612,-0.9143,0.01
2,ARIMA,3.376,3.746,102.7986,129.4174,0.1972,0.2297,-2.0236,0.02
3,ARIMA,6.6833,6.4226,203.5061,221.8905,0.4119,0.5336,-7.8881,0.02


---


In [22]:
sarima110111.summary()

0,1,2,3
Dep. Variable:,y,No. Observations:,132.0
Model:,"SARIMAX(1, 1, 0)x(1, 1, [1], 12)",Log Likelihood,-445.735
Date:,"Tue, 20 Feb 2024",AIC,901.469
Time:,11:28:38,BIC,915.365
Sample:,01-31-1949,HQIC,907.112
,- 12-31-1959,,
Covariance Type:,opg,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
intercept,1.2230,1.882,0.650,0.516,-2.465,4.911
ar.L1,-0.2615,0.094,-2.791,0.005,-0.445,-0.078
ar.S.L12,-0.9997,0.165,-6.068,0.000,-1.323,-0.677
ma.S.L12,0.9922,2.486,0.399,0.690,-3.881,5.865
sigma2,97.6533,222.515,0.439,0.661,-338.468,533.775

0,1,2,3
Ljung-Box (L1) (Q):,0.02,Jarque-Bera (JB):,0.14
Prob(Q):,0.89,Prob(JB):,0.93
Heteroskedasticity (H):,1.59,Skew:,-0.01
Prob(H) (two-sided):,0.15,Kurtosis:,3.17


**Exercise**: Write down the equation for the fitted SARIMA(1,1,0)(1,1,1,12) model.
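
As a starting point for the exercise, the general multiplicative SARIMA$(p,d,q)(P,D,Q)_m$ form in backshift notation ($B\,y_t = y_{t-1}$) is given below. The symbols are standard textbook notation rather than something defined in this notebook, and the sign convention is assumed to match the one used in the summary table.

$$
\phi_p(B)\,\Phi_P(B^{m})\,(1-B)^{d}\,(1-B^{m})^{D}\,y_t = c + \theta_q(B)\,\Theta_Q(B^{m})\,\varepsilon_t
$$

$$
\begin{aligned}
\phi_p(B) &= 1 - \phi_1 B - \dots - \phi_p B^{p}, & \Phi_P(B^{m}) &= 1 - \Phi_1 B^{m} - \dots - \Phi_P B^{Pm}, \\
\theta_q(B) &= 1 + \theta_1 B + \dots + \theta_q B^{q}, & \Theta_Q(B^{m}) &= 1 + \Theta_1 B^{m} + \dots + \Theta_Q B^{Qm}.
\end{aligned}
$$

For this model $m = 12$. Under the assumed convention, `ar.L1` plays the role of $\phi_1$, `ar.S.L12` of $\Phi_1$, `ma.S.L12` of $\Theta_1$, and `intercept` of $c$; substituting the orders and estimates is left as the exercise.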

---
#### Plotting models

In [23]:
exp.plot_model(sarima110111  , plot='forecast', data_kwargs={'fh':20, 'labels':['SARIMA(1,1,0)(1,1,1,12)']})

In [24]:
exp.plot_model(sarima110111, plot='diagnostics')

In [25]:
# let's test the stationarity of the residuals for the SARIMA(1,1,0)(1,1,1,12) model:
exp.check_stats(sarima110111, test = 'adf')


Unnamed: 0,Test,Test Name,Data,Property,Setting,Value
0,Stationarity,ADF,Residual,Stationarity,{'alpha': 0.05},True
1,Stationarity,ADF,Residual,p-value,{'alpha': 0.05},0.0
2,Stationarity,ADF,Residual,Test Statistic,{'alpha': 0.05},-11.606002
3,Stationarity,ADF,Residual,Critical Value 1%,{'alpha': 0.05},-3.481682
4,Stationarity,ADF,Residual,Critical Value 5%,{'alpha': 0.05},-2.884042
5,Stationarity,ADF,Residual,Critical Value 10%,{'alpha': 0.05},-2.57877


In [26]:
exp.check_stats(sarima110111, test = 'kpss')


Unnamed: 0,Test,Test Name,Data,Property,Setting,Value
0,Stationarity,KPSS,Residual,Trend Stationarity,{'alpha': 0.05},True
1,Stationarity,KPSS,Residual,p-value,{'alpha': 0.05},0.1
2,Stationarity,KPSS,Residual,Test Statistic,{'alpha': 0.05},0.030694
3,Stationarity,KPSS,Residual,Critical Value 10%,{'alpha': 0.05},0.119
4,Stationarity,KPSS,Residual,Critical Value 5%,{'alpha': 0.05},0.146
5,Stationarity,KPSS,Residual,Critical Value 2.5%,{'alpha': 0.05},0.176
6,Stationarity,KPSS,Residual,Critical Value 1%,{'alpha': 0.05},0.216


In [27]:
my_models

[ARIMA(order=(0, 1, 0), seasonal_order=(0, 0, 0, 12), with_intercept=False),
 ARIMA(order=(0, 1, 0), seasonal_order=(0, 0, 0, 12)),
 ARIMA(seasonal_order=(0, 0, 0, 12)),
 ARIMA(order=(0, 0, 1), seasonal_order=(0, 0, 0, 12)),
 ARIMA(order=(1, 1, 1), seasonal_order=(0, 0, 0, 12)),
 ARIMA(order=(1, 1, 0), seasonal_order=(1, 1, 1, 12))]

In [28]:
my_model_lables

['Random Walk',
 'Random Walk with drift',
 'AR(1)',
 'MA(1)',
 'ARIMA(1,1,1)',
 'SARIMA(1,1,0)(1,1,1,12)']

In [29]:
exp.plot_model(ar1, plot='forecast', data_kwargs={'fh':36})

In [30]:
exp.plot_model(ma1, plot='forecast', data_kwargs={'fh':36})

In [31]:
exp.plot_model(my_models, plot='forecast', data_kwargs={'fh':36, 'labels':my_model_lables})

In [32]:
exp.plot_model(my_models, plot='insample', data_kwargs={'labels':my_model_lables})

---
### Auto ARIMA
https://www.sktime.org/en/stable/api_reference/auto_generated/sktime.forecasting.arima.AutoARIMA.html

sktime's AutoARIMA is a wrapper around the pmdarima implementation for fitting Auto-(S)ARIMA(X) models. The auto-ARIMA algorithm seeks to identify the optimal parameters for an ARIMA model, settling on a single fitted ARIMA model. This process is based on the commonly used R function forecast::auto.arima.

Auto-ARIMA works by conducting differencing tests (i.e., Kwiatkowski–Phillips–Schmidt–Shin, Augmented Dickey-Fuller or Phillips–Perron) to determine the order of differencing, d, and then fitting models within the ranges defined by start_p, max_p, start_q and max_q. If the seasonal option is enabled, auto-ARIMA also seeks to identify the optimal P and Q hyper-parameters after conducting the Canova-Hansen test to determine the optimal order of seasonal differencing, D.

To find the best model, auto-ARIMA optimizes for a given information_criterion, one of (‘aic’, ‘aicc’, ‘bic’, ‘hqic’, ‘oob’) (Akaike Information Criterion, Corrected Akaike Information Criterion, Bayesian Information Criterion, Hannan-Quinn Information Criterion, or “out of bag” for validation scoring, respectively), and returns the ARIMA model that minimizes the value.
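
The core idea, "fit several candidate orders and keep the one with the smallest information criterion", can be sketched in a few lines. This is a deliberately simplified stand-in and not pmdarima's algorithm: it considers pure AR(p) models only, fits them by ordinary least squares, searches exhaustively rather than stepwise, and skips the differencing tests. The functions `fit_ar_ols` and `select_ar_order` and the simulated series are hypothetical.

```python
import numpy as np

def fit_ar_ols(y, p, max_p):
    """Fit AR(p) with intercept by least squares; return the Gaussian AIC."""
    y = np.asarray(y, dtype=float)
    target = y[max_p:]                       # same targets for every candidate order
    cols = [np.ones(len(target))]            # intercept column
    cols += [y[max_p - k: len(y) - k] for k in range(1, p + 1)]   # lagged values y_{t-k}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    n = len(target)
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    n_params = p + 2                         # intercept, p AR coefficients, error variance
    return 2 * n_params - 2 * loglik

def select_ar_order(y, max_p=5):
    """Return the order in 0..max_p with the smallest AIC, plus all AIC values."""
    aics = {p: fit_ar_ols(y, p, max_p) for p in range(max_p + 1)}
    return min(aics, key=aics.get), aics

# Simulated AR(2) series, used here only to exercise the search.
rng = np.random.default_rng(42)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

best_p, aics = select_ar_order(y)
print("selected order:", best_p)
for p, value in aics.items():
    print(p, round(value, 2))
```

All candidates are fitted on the same target observations so that their AIC values are comparable, which is the same reason auto-ARIMA fixes d and D before searching over p, q, P and Q.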


In [33]:
auto_arima = exp.create_model('auto_arima', cross_validation=False)

Unnamed: 0,MASE,RMSSE,MAE,RMSE,MAPE,SMAPE,R2
Test,0.4893,0.5365,14.8982,18.5365,0.031,0.0309,0.938


In [34]:
# getting auto_arima's hyperparameters
auto_arima.get_params()

{'D': None,
 'alpha': 0.05,
 'concentrate_scale': False,
 'd': None,
 'enforce_invertibility': True,
 'enforce_stationarity': True,
 'error_action': 'warn',
 'hamilton_representation': False,
 'information_criterion': 'aic',
 'max_D': 1,
 'max_P': 2,
 'max_Q': 2,
 'max_d': 2,
 'max_order': 5,
 'max_p': 5,
 'max_q': 5,
 'maxiter': 50,
 'measurement_error': False,
 'method': 'lbfgs',
 'mle_regression': True,
 'n_fits': 10,
 'n_jobs': 1,
 'offset_test_args': None,
 'out_of_sample_size': 0,
 'random': False,
 'random_state': 220,
 'scoring': 'mse',
 'scoring_args': None,
 'seasonal': True,
 'seasonal_test': 'ocsb',
 'seasonal_test_args': None,
 'simple_differencing': False,
 'sp': 12,
 'start_P': 1,
 'start_Q': 1,
 'start_p': 2,
 'start_params': None,
 'start_q': 2,
 'stationary': False,
 'stepwise': True,
 'test': 'kpss',
 'time_varying_regression': False,
 'trace': False,
 'trend': None,
 'update_pdq': True,
 'with_intercept': True}

In [35]:
auto_arima.summary()

0,1,2,3
Dep. Variable:,y,No. Observations:,132.0
Model:,"SARIMAX(3, 0, 0)x(0, 1, 0, 12)",Log Likelihood,-447.843
Date:,"Tue, 20 Feb 2024",AIC,905.686
Time:,11:28:46,BIC,919.623
Sample:,01-31-1949,HQIC,911.346
,- 12-31-1959,,
Covariance Type:,opg,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
intercept,5.5341,2.007,2.757,0.006,1.600,9.468
ar.L1,0.7049,0.095,7.393,0.000,0.518,0.892
ar.L2,0.2574,0.131,1.968,0.049,0.001,0.514
ar.L3,-0.1434,0.107,-1.338,0.181,-0.354,0.067
sigma2,101.0969,12.818,7.887,0.000,75.974,126.220

0,1,2,3
Ljung-Box (L1) (Q):,0.0,Jarque-Bera (JB):,2.83
Prob(Q):,0.96,Prob(JB):,0.24
Heteroskedasticity (H):,1.41,Skew:,-0.14
Prob(H) (two-sided):,0.29,Kurtosis:,3.7


In [36]:
# recall, 
sarima110111.summary()

0,1,2,3
Dep. Variable:,y,No. Observations:,132.0
Model:,"SARIMAX(1, 1, 0)x(1, 1, [1], 12)",Log Likelihood,-445.735
Date:,"Tue, 20 Feb 2024",AIC,901.469
Time:,11:28:46,BIC,915.365
Sample:,01-31-1949,HQIC,907.112
,- 12-31-1959,,
Covariance Type:,opg,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
intercept,1.2230,1.882,0.650,0.516,-2.465,4.911
ar.L1,-0.2615,0.094,-2.791,0.005,-0.445,-0.078
ar.S.L12,-0.9997,0.165,-6.068,0.000,-1.323,-0.677
ma.S.L12,0.9922,2.486,0.399,0.690,-3.881,5.865
sigma2,97.6533,222.515,0.439,0.661,-338.468,533.775

0,1,2,3
Ljung-Box (L1) (Q):,0.02,Jarque-Bera (JB):,0.14
Prob(Q):,0.89,Prob(JB):,0.93
Heteroskedasticity (H):,1.59,Skew:,-0.01
Prob(H) (two-sided):,0.15,Kurtosis:,3.17


Comparing the output of our own SARIMA and auto ARIMA, our sarima110111 produces a better (smaller) AIC on the training set (901.469 vs. 905.686). However, based on R2, auto_arima is the better model on the test set (0.938 vs. 0.9054). We continue to work with our own SARIMA model because we understand it better :)
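
The AIC values in the two summaries can be reproduced from the reported log-likelihoods with the textbook formula AIC = 2k - 2 ln(L). In this small sketch, k = 5 is the number of rows in each coefficient table (including sigma2); the helper `aic` is defined here for illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Log-likelihoods read off the two summary tables above.
aic_sarima = aic(-445.735, 5)   # our SARIMA(1,1,0)(1,1,1,12)
aic_auto = aic(-447.843, 5)     # auto_arima's SARIMAX(3,0,0)x(0,1,0,12)

print(round(aic_sarima, 3), round(aic_auto, 3))
print("SARIMA has the smaller AIC:", aic_sarima < aic_auto)
```

Both models estimate the same number of parameters, so the AIC gap comes entirely from the difference in log-likelihood. Note that the two models use different differencing orders, so their likelihoods are not computed on identical transformed data and the comparison is only indicative.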

In [37]:
exp.plot_model([sarima110111, auto_arima], plot='forecast', data_kwargs={'fh':36, 'labels':['SARIMA(1,1,0)(1,1,1,12)', 'Auto ARIMA']})

## In-sample performance metrics? 

In [38]:
# recall, our forecasting horizon was 12 months.
df.index[:-12] # train set index 

PeriodIndex(['1949-01', '1949-02', '1949-03', '1949-04', '1949-05', '1949-06',
             '1949-07', '1949-08', '1949-09', '1949-10',
             ...
             '1959-03', '1959-04', '1959-05', '1959-06', '1959-07', '1959-08',
             '1959-09', '1959-10', '1959-11', '1959-12'],
            dtype='period[M]', name='Month', length=132)

In [39]:
df.head()

Unnamed: 0_level_0,Passengers
Month,Unnamed: 1_level_1
1949-01,112
1949-02,118
1949-03,132
1949-04,129
1949-05,121


In [40]:
predictions = df.copy()
predictions['y_pred']= sarima110111.predict(df.index)
predictions['residuals']= sarima110111.predict_residuals(df[['Passengers']] )
predictions

Unnamed: 0_level_0,Passengers,y_pred,residuals
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1949-01,112,,
1949-02,118,112.483095,5.516905
1949-03,132,118.485068,13.514932
1949-04,129,132.484309,-3.484309
1949-05,121,129.485025,-8.485025
...,...,...,...
1960-08,606,624.421791,-18.421791
1960-09,508,525.863664,-17.863664
1960-10,461,471.708288,-10.708288
1960-11,390,428.065515,-38.065515


In [41]:
predictions.dropna(inplace=True)
predictions

Unnamed: 0_level_0,Passengers,y_pred,residuals
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1949-02,118,112.483095,5.516905
1949-03,132,118.485068,13.514932
1949-04,129,132.484309,-3.484309
1949-05,121,129.485025,-8.485025
1949-06,135,121.484990,13.515010
...,...,...,...
1960-08,606,624.421791,-18.421791
1960-09,508,525.863664,-17.863664
1960-10,461,471.708288,-10.708288
1960-11,390,428.065515,-38.065515


In [42]:
from sklearn.metrics import r2_score, mean_absolute_percentage_error

In [43]:
r2_score(predictions.Passengers, predictions.y_pred)

0.9882704738658122

In [44]:
mean_absolute_percentage_error(predictions.Passengers, predictions.y_pred)

0.03984002477519679

---
## Predict Model

The predict_model() function generates predictions from a trained model. When data is None, it predicts on the holdout set.

Note: the model we carry forward here is our SARIMA(1,1,0)(1,1,1,12).


In [45]:
holdout_pred = exp.predict_model(sarima110111)

Unnamed: 0,Model,MASE,RMSSE,MAE,RMSE,MAPE,SMAPE,R2
0,ARIMA,0.586,0.6625,17.8427,22.8893,0.0402,0.0388,0.9054


## Finalize Model

This function trains a given estimator on the entire dataset, including the holdout set.

Model finalization is the last step in the experiment. The finalize_model() function refits the chosen model on the complete dataset, including the test/hold-out sample, so that the model deployed in production has been trained on all available data.

In [46]:
final_model = exp.finalize_model(sarima110111)

In [47]:
final_model

---
## Final prediction on unseen data

The predict_model() function is also used to predict on the unseen dataset.

In [48]:
exp.plot_model(plot='train_test_split')

In [49]:
exp.plot_model(final_model, plot='forecast', data_kwargs={'fh':24, 'labels':['SARIMA(1,1,0)(1,1,1,12)']})

In [50]:
unseen_predictions = exp.predict_model(final_model, fh=24)
unseen_predictions

Unnamed: 0,y_pred
1961-01,447.6891
1961-02,423.4427
1961-03,458.4676
1961-04,496.7219
1961-05,509.1409
1961-06,568.2553
1961-07,655.8859
1961-08,641.8929
1961-09,547.2144
1961-10,497.6758


## Save Model

This function saves the transformation pipeline and trained model object into the current working directory as a pickle file for later use.

In [51]:
exp.save_model(final_model, 'best_arima_model')

Transformation Pipeline and Model Successfully Saved


(ForecastingPipeline(steps=[('forecaster',
                             TransformedTargetForecaster(steps=[('model',
                                                                 ForecastingPipeline(steps=[('forecaster',
                                                                                             TransformedTargetForecaster(steps=[('model',
                                                                                                                                 ARIMA(order=(1,
                                                                                                                                              1,
                                                                                                                                              0),
                                                                                                                                       seasonal_order=(1,
                                                

## Load model

This function loads a previously saved pipeline.



In [52]:
my_model = load_model('best_arima_model')

Transformation Pipeline and Model Successfully Loaded


In [53]:
my_model

# Done!