# Introduction to Time Series Forecasting with `sktime`


### Unit Convenor & Lecturer {-}
[George Milunovich](https://www.georgemilunovich.com)  
[george.milunovich@mq.edu.au](mailto:george.milunovich@mq.edu.au)

### References {-}


1. Various open-source material as provided in the URLs throughout this notebook

### Overview {-}

- Introduction to Time Series data
- How is Time Series forecasting different
- Univariate vs Multivariate Time Series
- Autocorrelations and Partial Autocorrelations
- White noise, MA, AR and ARMA models
- Time Series forecasting libraries in Python
- Introduction to `sktime` library
- Some Basic Univariate Time Series Models
- Predicting Airline Passenger Dataset

---

## Time Series Data

So far we have been studying prediction problems using mainly cross-sectional datasets
- A **cross-sectional dataset** refers to data collected at a single point in time or over a very short period of time
    - Provides a "snapshot" of various variables at that particular moment
- Examples:
    - House prices sold in a suburb on a given day
    - Used cars listed for sale on an online marketplace
    - Etc.


**What is a Time Series?**

- A time series is a sequence of data points recorded or measured at successive points in time, often at regular intervals
- This type of data is particularly useful in statistical, economic, and business contexts where monitoring changes over time is crucial

**Why Study Time Series**
- Predict future values from historical data
- Understand and quantify how underlying patterns evolve over time
- Make informed strategic decisions to improve business operations or respond to changes in the environment  

**Time Series Examples**
- Economics: Forecasting stock prices, economic indicators, and market trends
- Healthcare: Predicting disease outbreaks, patient admissions, and resource requirements
- Retail: Estimating future product demand to optimise inventory management and reduce costs
- Energy: Forecasting demand and supply to manage resources efficiently and plan for future energy needs

---

## Distinguishing Time Series Forecasting from Cross-Sectional Forecasting

- **Cross-Sectional Forecasting**
    - Data collected on multiple variables at a single point in time or over a very short period
    - The data points are *not in a sequence* but are instead independent observations across various subjects or entities (e.g. Airbnb properties listed for rent within a suburb)
    - Forecasting relies on the relationship between the target and explanatory variables measured **at the same time period**  



- **Time Series Forecasting**
    - Focuses on data that is collected over time and is inherently sequential   
    - Primary interest is in **predicting future values based on past and present data points**   
    - Forecasting relies on the temporal (time) dependence/patterns between observations  
    - The past and current values are used to predict future values, considering trends, seasonality, cycles, and other time-related patterns
    - Past values of the target variable as well as past values of the explanatory variables can be utilized to build predictive models

<div style="text-align: center;">
<img src="images/week11_1.png" alt="Drawing" style="width: 400px;"/>  
</div>

<hr style="width:30%;margin-left:0">


**Univariate vs Multivariate Time Series**

- **Univariate Time Series**
    - When we use a single time-dependent variable
    - Analyse and forecast data based on past values of that single variable alone
    - Suitable for situations where the interest is in predicting future values of a single series and
        - The series itself contains all the necessary information for making such predictions, or
        - There are no other relevant variables available
    - $\hat{y}_{t+1} = f(y_t, y_{t-1}, y_{t-2}, \dots)$ where $f$ is some function of past and present values, and $t$ denotes the time period 


- **Multivariate Time Series**
    - When we analyse and predict the behavior of a time series based on multiple variables
    - The future value of a variable is predicted not only from its own past values but also from the past values of other related variables
    - Used when the prediction performance can potentially improve by incorporating the information from additional variables  
    - For instance, predicting economic growth might benefit from considering variables like
        - Inflation rate
        - Unemployment rate
        - Industrial production, etc.
    - $\hat{y}_{t+1} = f(y_t, y_{t-1}, y_{t-2}, \dots, x_t, x_{t-1}, x_{t-2}, \dots, z_t, z_{t-1}, z_{t-2}, \dots)$  where $f$ is some function of past and present values 
     

We will only consider univariate models in this Introduction to Time Series
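
As a minimal sketch of the univariate setup $\hat{y}_{t+1} = f(y_t, y_{t-1}, \dots)$, the snippet below arranges a short, made-up series into a table of its own lagged values using the `pandas` method `shift`. The numbers and the column names (`y_lag1`, `y_lag2`) are illustrative assumptions and are not part of the unit's datasets.

```python
import pandas as pd

# A short, made-up series (illustrative values only)
y = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0], name="y")

# Each row pairs the current value with its own past values
lags = pd.DataFrame({
    "y": y,
    "y_lag1": y.shift(1),  # y_{t-1}
    "y_lag2": y.shift(2),  # y_{t-2}
}).dropna()                # the first two rows have no complete history

print(lags)
```

Any univariate model in this notebook can be thought of as a particular choice of $f$ applied to rows like these.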

<div style="text-align: center;">
<img src="images/week11_2.png" alt="Drawing" style="width: 400px;"/>  
</div>

<hr style="width:50%;margin-left:auto">

---


## Some Basic Concepts in Time Series Data Analysis

- In time series forecasting, the future value of the target variable depends on its past and present data points
    - Linear dependence, e.g. Autoregressive - AR(1) model $\hat{y}_{t} = 1 + 0.7y_{t-1}$
    - Nonlinear dependence, e.g. Logistic Map model $\hat{y}_{t} = 0.5y_{t-1}(1 - y_{t-1})$
- How do we measure such dependence?

<span style="background-color: yellow">**Concept 1: Autocorrelations** </span>
- Autocorrelation is the correlation between a random variable and its own past values
- If a time series is correlated with its past values, its values are not independent of each other over time and there is a pattern/trend in the data  
- This information is vital for time series forecasting because it can be used to improve the accuracy of the models
- Autocorrelation between $y_t$ and $y_{t-k}$ is defined as
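    - $\tau_k = \text{corr}(y_t, y_{t-k}) = \frac{\text{cov}(y_t, y_{t-k})}{\sqrt{\text{var}(y_t)\text{var}(y_{t-k})}}=\frac{\text{cov}(y_t, y_{t-k})}{\text{var}(y_t)}$
    - The last equality holds when the variance is constant over time, i.e. $\text{var}(y_t)=\text{var}(y_{t-k})$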

*Note*
- Autocorrelations lie in the $[-1, 1]$ interval, i.e. $\tau_k\in[-1,1]$
- Autocorrelations measure linear dependence only (nonlinear dependence may go undetected)
- Sample autocorrelations are random variables - they depend on the sample data from which they are computed
    - We need to consider a statistical test / confidence interval around sample values to draw conclusions about true population parameters
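
To make the definition concrete, here is a minimal sketch that computes a sample autocorrelation directly with NumPy, assuming constant variance so that the lag-$k$ autocovariance is divided by the overall variance. The helper name `sample_autocorr`, the seed and the two example series are illustrative assumptions.

```python
import numpy as np

def sample_autocorr(y, k):
    """Sample autocorrelation at lag k: lag-k autocovariance divided by the variance."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()                      # deviations from the sample mean
    return float(np.sum(d[k:] * d[:len(d) - k]) / np.sum(d ** 2))

rng = np.random.default_rng(0)            # seed chosen arbitrarily
trend = np.arange(200, dtype=float)       # a strongly persistent series
noise = rng.normal(size=200)              # independent draws, no memory

print("trend, lag 1:", round(sample_autocorr(trend, 1), 3))
print("noise, lag 1:", round(sample_autocorr(noise, 1), 3))
```

The value for the noise series is itself random, which is why a confidence interval is needed before concluding that an autocorrelation differs from zero.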
 
<span style="background-color: yellow">**Concept 2: Autocorrelation Function (ACF)** </span>
- Compute and plot autocorrelations for different lags (time intervals), e.g. compute all $\tau_k$ for $k=0,1,2,3,\dots$
- ACF shows the amount of memory in a time series
- Note that for lag 0 we have $\tau_0 = \text{corr}(y_t, y_{t-0})=\text{corr}(y_t, y_{t})=1$



<span style="background-color: yellow">**Concept 3: White Noise** </span>
- A white noise process $x_t$ has the following characteristics:
1. Zero Mean: The mean of the white noise process is zero $E[x_t] = 0$
2. Constant Variance: The variance of the weak white noise process is the same for all periods $t$, i.e. $\text{Var}(x_t) =\text{Var}(x_{t-1}) = \dots = \sigma^2$
3. Uncorrelated over time: The autocorrelation function (ACF) is zero for any non-zero lag, meaning that there is no linear dependence between white noise values at different times
    - $\text{Corr}(x_t, x_{t-s})=\tau_s = 0 \quad \text{for any} \; s \neq 0$
- Note: white noise is unpredictable using its past values since the future values are uncorrelated with past values  
   

<span style="background-color: yellow">**Concept 4: Autoregressive Model of Order $p$ - AR$(p)$** </span>
- White noise processes are used to create more complicated models
- An AR$(p)$ model is a process where the current value $y_t$ depends on its own $p$ lags (past values)  
- $y_t = c + \phi_1 y_{t-1}+ \phi_2 y_{t-2} + \dots +  \phi_p y_{t-p} + \epsilon_t$ for some value $p$ 
    - Here $\epsilon_t$ is assumed to be a white noise process
- Some potential AR models (examples):
    - AR(1): $y_{t} = 1 + 0.7y_{t-1} + \epsilon_t$
    - AR(2): $y_t = 0.5y_{t-1} - 0.3y_{t-2} + \epsilon_t$
    - AR(3): $y_t = 0.4 y_{t-1} - 0.2 y_{t-2} + 0.1 y_{t-3} + \epsilon_t$
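
The AR(1) example above can be simulated by applying its equation one period at a time. The sketch below is one possible way to do this with NumPy; the seed, the sample size and the choice to start the series at $c/(1-\phi)$ (the long-run mean of a stationary AR(1)) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
c, phi, n = 1.0, 0.7, 5000       # AR(1): y_t = 1 + 0.7 y_{t-1} + eps_t

eps = rng.normal(size=n)         # white noise shocks
y = np.empty(n)
y[0] = c / (1 - phi)             # start at the long-run mean c / (1 - phi)
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + eps[t]

print("sample mean:", y.mean())
print("long-run mean c / (1 - phi):", c / (1 - phi))
```

With a constant of 1 and $\phi = 0.7$ the series fluctuates around roughly 3.33 rather than around 1, because each value inherits 70% of the previous one.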

---

## Example: ACFs of White Noise and AR(1)

### Generate Synthetic Data

- Generate 500 synthetic observations of
    1. White noise - $x_t$
    2. AR(1) process $y_{t} = 0.7y_{t-1} + \epsilon_t$ (the simulation below sets the constant to zero, $c = 0$)
- Plot their ACFs with 20 lags
- Determine which variable exhibits time dependence (memory) suitable for making predictions based on their ACFs


```

import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
import numpy as np
import pandas as pd


# Generating white noise
np.random.seed(42)  # for reproducibility
x = np.random.normal(size=500) # white noise


# Generating AR(1) process
ar = np.array([1, -0.7])  # AR(1) with phi=0.7
ma = np.array([1])
AR1_process = ArmaProcess(ar, ma) 
ar1_data = AR1_process.generate_sample(nsample=500)


# Create subplots
fig, axs = plt.subplots(2, 1, figsize=(10, 8))

# Plot the white noise in the first subplot
axs[0].plot(x)
axs[0].set_title('White Noise')
axs[0].set_ylabel('Value')

# Plot the AR(1) process in the second subplot
axs[1].plot(ar1_data, color='orange')
axs[1].set_title('AR(1) Process')
axs[1].set_ylabel('Value')
axs[1].set_xlabel('Time')

# Display the plots
plt.tight_layout()
plt.show()

```


### Plot ACFs

```
from statsmodels.graphics.tsaplots import plot_acf

fig, axes = plt.subplots(2, 1, figsize=(12, 8))

plot_acf(x, ax=axes[0], lags=20, title='ACF of White Noise $(x_t)$')
axes[0].set_ylim(-1.1, 1.1)  # Set y-axis limits
axes[0].set_xticks(np.arange(0, 21, 1))  # Set x-ticks at increments of 1
axes[0].grid(True, color='#D3D3D3')  

plot_acf(ar1_data, ax=axes[1], lags=20, title='ACF of AR(1) Model $(y_t)$')
axes[1].set_ylim(-1.1, 1.1)  # Set y-axis limits
axes[1].set_xticks(np.arange(0, 21, 1))  # Set x-ticks at increments of 1
axes[1].grid(True, color='#D3D3D3')  

axes[0].set_ylabel(r'$\tau_k$', rotation = 0)
axes[1].set_ylabel(r'$\tau_k$', rotation = 0)
axes[1].set_xlabel('Lag $(k)$')

plt.tight_layout()
plt.show()
```



### Interpreting the ACF
- The 95% confidence intervals are indicated by the blue shaded area
- **White Noise**: The ACF plot for the white noise series $x_t$ shows no signs of time series predictability
    - No statistically significant (outside of the CI) ACFs for lags 1, 2, ..., 20
- **AR(1)**: The ACF for the given AR(1) shows about 7 statistically significant autocorrelations, indicating time series memory
    - Why is it not only the first autocorrelation that is significant, i.e. why is $\text{Corr}(y_t, y_{t-k})\ne0$ for lags beyond $k = 1$?

### Repeated Substitution

Consider the AR(1) process for 3 time periods, $t, t-1$ and $t-2$

 \begin{align*}
y_t &= c + \phi y_{t-1} + \epsilon_t \\
y_{t-1} &= c + \phi y_{t-2} + \epsilon_{t-1} \\
y_{t-2} &= c + \phi y_{t-3} + \epsilon_{t-2}
\end{align*}

- From the first equation we see that $y_t$ depends on $y_{t-1}$
    - This explains their correlation $\text{Corr}(y_t, y_{t-1})$
- If we substitute the equation for $y_{t-1}$ into the equation for $y_t$ we get
    - $y_t = c + \phi (c + \phi y_{t-2} + \epsilon_{t-1}) + \epsilon_t= c + \phi c + \phi^2 y_{t-2} + \phi \epsilon_{t-1} + \epsilon_t$
    - Now we see that $y_t$ also depends on $y_{t-2}$, but with the coefficient $\phi^2$
    - This explains the non-zero correlation $\text{Corr}(y_t, y_{t-2})$
- Such repeated substitutions show that $y_t$ depends on all its previous values, but with decreasing magnitudes when $|\phi|<1$
    - This explains the non-zero $\text{Corr}(y_t, y_{t-k})$ for all $k=1, 2, 3, \dots$
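
A standard consequence of this substitution argument is that a stationary AR(1) has autocorrelation $\phi^k$ at lag $k$. The sketch below checks this numerically on simulated data; the seed, the sample size and the use of `np.corrcoef` on lagged copies of the series (rather than the estimator used by `plot_acf`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
phi, n = 0.7, 20000

# Simulate a zero-mean AR(1): y_t = phi * y_{t-1} + eps_t
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

# Compare sample autocorrelations with the theoretical values phi**k
for k in range(1, 6):
    sample = np.corrcoef(y[k:], y[:-k])[0, 1]
    print(f"lag {k}: sample = {sample:.3f}, phi**k = {phi ** k:.3f}")
```

Because $0.7^k$ shrinks slowly, several lags stay outside the confidence band even though only $y_{t-1}$ appears in the model equation.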



<span style="background-color: yellow">**Concept 5: Partial Autocorrelation Function - PACF**</span>
- Measure of the relationship between a time series variable and its own lagged values, after accounting for the effects of the intervening lags 
    - Correlation between $y_t$ and $y_{t-k}$ after controlling for the effects of $y_{t-1},y_{t-2},\dots,y_{t-(k-1)}$ 
    - From the equation for AR(1) we see that $y_t$ depends directly only on $y_{t-1}$ 
    - Hence, for an AR(1), $y_t$ does NOT depend on $y_{t-2}$ once we control for $y_{t-1}$
    - Etc...

### Using ACF and PACF to identify an AR(p) model
- We can use ACF and PACF to identify an AR model and determine its order $p$ 
- For an AR(p) model:
    - PACF cuts off sharply to zero after $p$ lags
        - E.g. if only the first two PACFs are statistically significant then it's an AR(2)
    - ACF: Gradual decay towards zero
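
One way to see what "controlling for the intervening lags" means is to compute the lag-$k$ partial autocorrelation as the coefficient on $y_{t-k}$ in a regression of $y_t$ on $y_{t-1}, \dots, y_{t-k}$. The sketch below does this with NumPy on a simulated AR(2) that uses the example coefficients 0.5 and -0.3 from above. The helper name `pacf_by_regression`, the seed and the sample size are illustrative assumptions, and `plot_pacf` may use a different estimation method by default.

```python
import numpy as np

def pacf_by_regression(y, k):
    """Coefficient on y_{t-k} in an OLS regression of y_t on a constant and y_{t-1}, ..., y_{t-k}."""
    y = np.asarray(y, dtype=float)
    target = y[k:]
    # Column j holds y_{t-j} for j = 1, ..., k, aligned with the target
    lagged = [y[k - j:len(y) - j] for j in range(1, k + 1)]
    X = np.column_stack([np.ones(len(target))] + lagged)
    coefs = np.linalg.lstsq(X, target, rcond=None)[0]
    return float(coefs[-1])

rng = np.random.default_rng(5)   # arbitrary seed
n = 20000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + eps[t]   # AR(2)

for k in range(1, 5):
    print(f"PACF at lag {k}: {pacf_by_regression(y, k):.3f}")
```

For this AR(2) the values at lags 1 and 2 should be clearly non-zero, while those at lags 3 and 4 should be close to zero, which is the cut-off pattern described above.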

### Plotting PACF 

We'll now plot the PACF for the white noise and AR(1) models we created above


```
from statsmodels.graphics.tsaplots import plot_pacf

fig, axes = plt.subplots(2, 1, figsize=(12, 8))

plot_pacf(x, ax=axes[0], lags=20, title='PACF of White Noise $(x_t)$')
axes[0].set_ylim(-1.1, 1.1)  # Set y-axis limits
axes[0].set_xticks(np.arange(0, 21, 1))  # Set x-ticks at increments of 1
axes[0].grid(True, color='#D3D3D3')  

plot_pacf(ar1_data, ax=axes[1], lags=20, title='PACF of AR(1) Model $(y_t)$')
axes[1].set_ylim(-1.1, 1.1)  # Set y-axis limits
axes[1].set_xticks(np.arange(0, 21, 1))  # Set x-ticks at increments of 1
axes[1].grid(True, color='#D3D3D3')  

plt.tight_layout()
plt.show()

```


### Interpretation of the PACF

- We can see that for the white noise $x_t$ none of the PACF coefficients are statistically significant (for $k>0$)
- For the AR(1) data the PACF cuts off to zero after the first lag, therefore we have an AR(1) model
    - We now know that $p=1$ and we can estimate an AR(1) model and make predictions using it


 
```


from statsmodels.tsa.ar_model import AutoReg


# Estimating the AR(1) model
model = AutoReg(ar1_data, lags=1)  # lags=1 for AR(1)
model_fit = model.fit()

# Print the estimated parameters
print(model_fit.summary())

# Plot the original data and the fitted values
# The fitted values are shorter than the data (no lag exists for the first observation),
# so they are aligned with the end of the sample
fitted = model_fit.fittedvalues
plt.plot(ar1_data, label="Original Data")
plt.plot(np.arange(len(ar1_data) - len(fitted), len(ar1_data)), fitted, label="Fitted/Predicted Values", color='red')
plt.title('AR(1) Model Fit')
plt.legend()
plt.show()
```
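
Once $c$ and $\phi$ are estimated, forecasts beyond the sample can be produced by iterating the fitted equation forward, feeding each forecast back in as the next lag. The sketch below illustrates the idea with a plain least-squares fit in NumPy instead of `AutoReg`; the simulated data, the seed and the 5-step horizon are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)   # arbitrary seed
n = 2000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + eps[t]   # simulated AR(1) with phi = 0.7, c = 0

# Estimate c and phi by regressing y_t on a constant and y_{t-1}
X = np.column_stack([np.ones(n - 1), y[:-1]])
coefs = np.linalg.lstsq(X, y[1:], rcond=None)[0]
c_hat, phi_hat = coefs

# Iterate the fitted equation: y_hat_{T+h} = c_hat + phi_hat * y_hat_{T+h-1}
horizon = 5
forecasts = []
last = y[-1]
for _ in range(horizon):
    last = c_hat + phi_hat * last
    forecasts.append(last)

print("estimated c and phi:", c_hat, phi_hat)
print("forecasts:", forecasts)
```

As the horizon grows, the forecasts move towards the estimated long-run mean, reflecting the fading memory of the AR(1).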

<span style="background-color: yellow">**Concept 6: Moving Average Models - MA(q)**</span>
- Besides AR(p) models we also have MA(q) models
- MA(q) model depends on $q$ values of the past error terms
    - $y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q}$
- MA processes typically have shorter memory than AR models
- Some MA examples
    - MA(1): $y_t = 1.5 + \epsilon_t + 0.7 \epsilon_{t-1}$
    - MA(2): $y_t = 2.0 + \epsilon_t + 0.6 \epsilon_{t-1} - 0.3 \epsilon_{t-2}$
    - MA(3): $y_t = 0.5 + \epsilon_t + 0.8 \epsilon_{t-1} + 0.4 \epsilon_{t-2} - 0.2 \epsilon_{t-3}$
- Using ACF and PACF to identify an MA(q) model
    - ACF: Cuts off sharply after lag q
    - PACF: Gradually decays
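
The cut-off in the ACF can be checked on simulated data. The sketch below generates the MA(1) example above with NumPy and prints the first three sample autocorrelations, together with the textbook lag-1 value $\theta_1/(1+\theta_1^2)$ for an MA(1). The seed and sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)   # arbitrary seed
mu, theta, n = 1.5, 0.7, 20000

eps = rng.normal(size=n + 1)
# MA(1): y_t = mu + eps_t + theta * eps_{t-1}
y = mu + eps[1:] + theta * eps[:-1]

for k in range(1, 4):
    r = np.corrcoef(y[k:], y[:-k])[0, 1]
    print(f"lag {k}: sample autocorrelation = {r:.3f}")

print("theoretical lag-1 value:", theta / (1 + theta ** 2))
```

Only the lag-1 value should be clearly different from zero, because $y_t$ and $y_{t-2}$ share no error terms.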

<span style="background-color: yellow">**Concept 7: Autoregressive Moving Average Models - ARMA(p,q)**</span>

- The ARMA(p,q) model combines AR(p) and MA(q) components, making it a very flexible time series model
    - $y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q}$
- Examples:
    - ARMA(1,1): $y_t = 1.0 + 0.5 y_{t-1} + \epsilon_t + 0.3 \epsilon_{t-1}$
    - ARMA(2,1): $y_t = -0.5 + 0.6 y_{t-1} - 0.4 y_{t-2} + \epsilon_t + 0.2 \epsilon_{t-1}$
    - ARMA(2, 2): $y_t = 0.8 + 0.7 y_{t-1} - 0.2 y_{t-2} + \epsilon_t + 0.5 \epsilon_{t-1} - 0.1 \epsilon_{t-2}$
- For an ARMA(p,q) model we have
    - ACF: Gradual decay, influenced by both AR and MA terms
    - PACF: Gradual decay, with no clear cut-off
    - Difficult to identify from ACF and PACF only


### Identification of MA(q), AR(p) and ARMA(p,q) Models From Data

- We can use a combination of ACF, PACF and Information Criteria (not covered in our unit) to identify the type of model (AR, MA, or ARMA) and the orders $p$ and $q$
    - Diagnostic: In a well chosen model the residuals should behave like white noise 
- Details are outside the scope of this unit
- Below we will look at some Python libraries that select the best ARMA-type model automatically

---
---

## Practical Forecasting of Time Series Data in Python

There are a number of statistical Python libraries for doing Time Series analysis and forecasting. Some of them are as follows:


- Nixtla
    - [https://www.nixtla.io/](https://www.nixtla.io/)
    - [https://github.com/nixtla](https://github.com/nixtla)
- GluonTS
    - [https://ts.gluon.ai/stable/index.html](https://ts.gluon.ai/stable/index.html)
    - [https://github.com/awslabs/gluonts](https://github.com/awslabs/gluonts)
- Merlion
    - [https://opensource.salesforce.com/Merlion/v1.0.0/index.html](https://opensource.salesforce.com/Merlion/v1.0.0/index.html)
    - [https://github.com/salesforce/Merlion](https://github.com/salesforce/Merlion)
- pmdarima
    - [http://alkaline-ml.com/pmdarima/](http://alkaline-ml.com/pmdarima/)
    - [https://github.com/alkaline-ml/pmdarima](https://github.com/alkaline-ml/pmdarima)
- statsmodels
    - [https://www.statsmodels.org/stable/index.html](https://www.statsmodels.org/stable/index.html)
    - [https://github.com/statsmodels/statsmodels/](https://github.com/statsmodels/statsmodels/)
- sktime
    - [https://www.sktime.net/en/stable/](https://www.sktime.net/en/stable/)
    - [https://github.com/sktime/sktime](https://github.com/sktime/sktime)
- Many other new libraries

---


## Using and Installing `sktime` Library

Click on `File`-> `New` -> `Terminal` 

and run the following commands:

```
pip install sktime

```


```
pip install pmdarima
```

```
pip install prophet
```

### Importing and Plotting Datasets from `sktime`

- We are going to look at a famous time series dataset
    - Monthly totals of international airline passengers (in thousands) from 1949 to 1960


```
import warnings
warnings.filterwarnings("ignore") # hide warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sktime.utils.plotting import plot_series
from sktime.datasets import load_airline


y = load_airline()

plot_series(y)
plt.show()
print(y)
```

---

### "Fixing" `sktime`

If the above code does NOT work, we need to fix a small bug that is present in some versions of `sktime`    
- If the code works ignore these instructions

1. Open cmd terminal
    - start -> type `cmd` 
2. In cmd terminal type `start C:\Users\USERNAME\AppData\Local\anaconda3\Lib\site-packages\sktime\datatypes\_adapter\`
   - Replace USERNAME with your own user name
3. Make a backup copy of "dask_to_pd.py" (just copy and paste it)
4. Open "dask_to_pd.py" in notepad
5. Search for ('ctrl'+f) 'import dask'
6. add `import dask.dataframe` immediately below `import dask`
   - make sure the indent is aligned
7. It should look like this
   <img src="images/fix-sktime.png" alt="Drawing" style="width: 400px;"/>
   
8. Restart the kernel in Jupyter


---

- `sktime` relies on `pandas` objects (`Series` and `DataFrame`); the index of `y` here is a `PeriodIndex` from `pandas`
    - If importing new data from a `pandas` `DataFrame` we should set the index to `PeriodIndex` (as in this example) so that it works with `sktime` 

```
type(y.index)
```
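
As a minimal sketch of preparing your own data, the snippet below builds a small monthly `pandas` `Series` with ordinary dates and converts its index to a `PeriodIndex` with `to_period`. The values, the series name `sales` and the start date are made up for illustration.

```python
import pandas as pd

# Made-up monthly values with an ordinary DatetimeIndex (month-start dates)
dates = pd.date_range("2020-01-01", periods=6, freq="MS")
s = pd.Series([5.0, 6.0, 7.0, 6.5, 8.0, 9.0], index=dates, name="sales")

# Convert the index to monthly periods
s.index = s.index.to_period("M")

print(type(s.index))
print(s)
```

After the conversion each observation is labelled by its month (e.g. `2020-01`) rather than by a specific day.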


---

## Some Simple Univariate Time Series Models in `sktime`


- Note that in time series analysis, a **season** refers to a repeating pattern or cycle in the data that occurs at regular intervals over time.
    - This concept is crucial for analysing data that exhibits predictable fluctuations due to seasonal influences
    - Examples: day-of-the-week seasonality, weekly, monthly, quarterly, annual seasonality, etc.
 
      
We will evaluate how several time series models perform on the airline passenger data

- `NaiveForecaster` - makes forecasts using simple strategies
    - https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.forecasting.naive.NaiveForecaster.html
    - We can set `strategy` parameter to the following values
        - 'last' - prediction will be the last value in the training series when sp is 1 ($\hat{y}_{t+1}=y_t$).
            - When sp is not 1, last value of each season in the last window will be forecasted for each season   
        - 'mean' - forecast the mean of last window of training series when sp is 1 ($\hat{y}_{t+1}=\frac{1}{t}\sum_{k=1}^{t} y_{k}$).
            - When sp is not 1, mean of all values in a season from last window will be forecasted for each season
        - 'drift' - forecast by fitting a line between the first and last point of the window and extrapolating it into the future.
- `AutoETS` implements various types of exponential smoothing
    - There are many types of exponential smoothing
    - A simple type is where forecasts are calculated using weighted averages
    - The weights decrease exponentially as observations come from further in the past
        - Smallest weights are associated with the oldest observations
    -  $\hat{y}_{t+1}=\alpha y_t +\alpha (1-\alpha) y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + \dots \quad$  where $0\le \alpha \le 1$
- `AutoARIMA` - AutoRegressive Integrated Moving Average models
    - [https://otexts.com/fpp3/arima.html](https://otexts.com/fpp3/arima.html)
    - [https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.arima.AutoARIMA.html](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.arima.AutoARIMA.html)
- `Prophet`
    - **Not Examinable** 
    - Prophet is open source software released by Facebook’s Core Data Science team
    - [https://facebook.github.io/prophet/](https://facebook.github.io/prophet/)
    - [https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.forecasting.fbprophet.Prophet.html](https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.forecasting.fbprophet.Prophet.html)
- `ThetaForecaster` - this method performed particularly well in many time series competitions, and it is actually equivalent to simple exponential smoothing with drift
    - **Not Examinable** 
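
To make the simplest of these rules concrete, the sketch below computes one-step-ahead forecasts by hand with NumPy for the naive strategies and for simple exponential smoothing. It illustrates the formulas only and is not how `sktime` implements them; for example, `AutoETS` chooses its parameters and initial values itself. The data, the helper name `ses_forecast`, the value $\alpha = 0.5$ and the choice to initialise the level at the first observation are all assumptions.

```python
import numpy as np

# Made-up series with a repeating pattern of length 4
y = np.array([10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0])
T, sp = len(y), 4

# Naive 'last' (sp = 1): repeat the final observation
f_last = y[-1]

# Naive 'mean' (sp = 1): average of all training observations
f_mean = y.mean()

# Naive 'drift': extend the line through the first and last points by h steps
h = 1
f_drift = y[-1] + h * (y[-1] - y[0]) / (T - 1)

# Seasonal naive 'last' (sp = 4): the next value copies the value one season ago
f_seasonal_last = y[T - sp]

def ses_forecast(y, alpha):
    """One-step forecast from simple exponential smoothing, level initialised at y[0]."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

alpha = 0.5   # smoothing parameter chosen only for illustration
f_ses = ses_forecast(y, alpha)

print(f_last, f_mean, f_drift, f_seasonal_last, f_ses)
```

The recursion in `ses_forecast` produces the same exponentially declining weights as the weighted-average formula above, with the remaining weight placed on the initial level.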
 
---

## Model Comparisons on the Airline Passenger Dataset


To compare model performance we will do the following:
- Use the **holdout method** (week 7 lecture material) to fit and evaluate the performance of alternative models
    - We will fit the models to the Training dataset
    - Select the best model on the Validation set
    - Evaluate performance and the extent of overfitting on the Test dataset
- Performance Criteria
    1. Mean Squared Error $\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$
    2. Mean Absolute Error $\text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|$
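
As a small worked example of the two criteria, the sketch below computes them directly with NumPy on three made-up actual values and forecasts (the numbers are purely illustrative).

```python
import numpy as np

y_true = np.array([100.0, 110.0, 120.0])   # made-up actual values
y_pred = np.array([98.0, 113.0, 120.0])    # made-up forecasts

errors = y_true - y_pred                   # 2, -3, 0
mse = np.mean(errors ** 2)                 # squaring penalises large errors more heavily
mae = np.mean(np.abs(errors))              # measured in the same units as the data

print("MSE:", mse)
print("MAE:", mae)
```

Here the MSE is $(4 + 9 + 0)/3$ and the MAE is $(2 + 3 + 0)/3$, so the single error of 3 contributes more than twice as much to the MSE as the error of 2.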

<div style="text-align: center;">
<img src="images/week11_3.png" alt="Drawing" style="width: 400px;"/>  
</div>

---

### Step 1: Split the data into Training, Validation and Test Datasets

- Create Training, Test and Validation Datasets
- Use `temporal_train_test_split` which splits the data into sequential train and test subsets
    - [https://www.sktime.net/en/v0.11.4/api_reference/auto_generated/sktime.forecasting.model_selection.temporal_train_test_split.html](https://www.sktime.net/en/v0.11.4/api_reference/auto_generated/sktime.forecasting.model_selection.temporal_train_test_split.html)
- `test_size` parameter can be either
    - a float (fraction of sample)
    - an integer (number of observations)


```

from sktime.forecasting.base import ForecastingHorizon
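try:
    from sktime.split import temporal_train_test_split  # location in newer sktime versions
except ImportError:
    from sktime.forecasting.model_selection import temporal_train_test_split  # older sktime versions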


# ------------- create test dataset ------------------
y_train_refit, y_test = temporal_train_test_split(y, test_size=12)  # test set will have 12 observations
fh_test = ForecastingHorizon(y_test.index, is_relative=False)
print('train_refit shape', y_train_refit.shape)

print('test shape', y_test.shape)
print('test index', fh_test)


# ------------- create training and validation datasets ------------------
y_train, y_validate = temporal_train_test_split(y_train_refit, test_size=12)
fh_validate = ForecastingHorizon(y_validate.index, is_relative=False)
print('train shape', y_train.shape)

print('validate shape', y_validate.shape)
print('validate index', fh_validate)

```
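
The two calls above amount to cutting the series at fixed positions without shuffling. The sketch below reproduces the same three blocks with plain `pandas` slicing on a stand-in series of the same length and dates as the airline data; the values and the name `y_demo` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the airline data: 144 monthly observations, 1949-01 to 1960-12
index = pd.period_range("1949-01", periods=144, freq="M")
y_demo = pd.Series(np.arange(144, dtype=float), index=index)

n_test, n_validate = 12, 12
train = y_demo.iloc[: -(n_test + n_validate)]             # everything before the last 24 months
validate = y_demo.iloc[-(n_test + n_validate): -n_test]   # the 12 months before the test block
test = y_demo.iloc[-n_test:]                              # the final 12 months

print(len(train), len(validate), len(test))
print(validate.index[0], test.index[0])
```

Keeping the blocks in time order means every model is evaluated only on observations that come after the data it was fitted to.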

---

### Step 2: List Forecasting Models in a Dictionary

```

from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.ets import AutoETS
from sktime.forecasting.arima import AutoARIMA
from sktime.forecasting.fbprophet import Prophet
from sktime.forecasting.theta import ThetaForecaster

forecasters = {}   # a dictionary of forecasters

# ----------------------------- MODELS WHERE WE DON'T USE SEASONALITY -----------------------
forecasters['f_naive_last_no_sp'] = NaiveForecaster(strategy = 'last', sp = 1)  # last value without seasonal periodicity
forecasters['f_naive_mean_no_sp'] = NaiveForecaster(strategy = 'mean', sp = 1)  # forecast the mean value
forecasters['f_naive_drift_no_sp'] = NaiveForecaster(strategy = 'drift', sp = 1)  # forecast line trend
forecasters['f_auto_ets_no_sp'] = AutoETS(auto=True, sp=1, n_jobs=-1)  # automatic exponential smoothing no seasonality
forecasters['f_auto_arima_no_sp'] = AutoARIMA(sp=1)   # automatic ARIMA no seasonality
forecasters['f_theta_no_sp'] = ThetaForecaster(sp=1)  


# ----------------------------- MODELS WHERE WE USE SEASONALITY -----------------------
forecasters['f_naive_last_with_sp'] = NaiveForecaster(strategy = 'last', sp = 12)  # last value in each season
forecasters['f_naive_mean_with_sp'] = NaiveForecaster(strategy = 'mean', sp = 12)  # forecast the mean value of seasons
forecasters['f_naive_drift_with_sp'] = NaiveForecaster(strategy = 'drift', sp = 12)  # forecast line trend
forecasters['f_auto_ets_with_sp'] = AutoETS(auto=True, sp=12, n_jobs=-1)
forecasters['f_auto_arima_with_sp'] = AutoARIMA(sp=12)   # automatic ARIMA with seasonality
forecasters['f_theta_with_sp'] = ThetaForecaster(sp=12)    
forecasters['f_prophet_sp_multiplicative']= Prophet(seasonality_mode='multiplicative')   
forecasters['f_prophet_sp_additive']= Prophet(seasonality_mode='additive')   

forecasters

```

---
### Step 3: Fit the Models, Make Predictions of the Validation Dataset, Plot Forecasts

- Fit all the Models
- Predict Validation Dataset
- Plot Forecasts
- Compare Forecasts
- Select Best Model

  
```
predictions_validate = {}

for forecaster in forecasters:
    forecasters[forecaster].fit(y_train)
    predictions_validate[forecaster] = forecasters[forecaster].predict(fh_validate) 

df_validate = pd.DataFrame(predictions_validate)

plot_output = plot_series(y, *[df_validate[col] for col in df_validate.columns], labels = ['y'] + list(df_validate.columns))


plt.show()

df_validate




```


---

### Step 4: Compute Performance Criteria and Compare Models


- Compute MSE and MAE
- Rank and select best model

```
from sktime.performance_metrics.forecasting import MeanSquaredError, MeanAbsoluteError

mse = MeanSquaredError()
mae = MeanAbsoluteError()

dict_mse = {}
dict_errors = {}

for forecaster in predictions_validate:
    # Calculate error metric for each forecaster
    forecaster_errors = {
        'mse': mse(y_validate, predictions_validate[forecaster]),
        'mae': mae(y_validate, predictions_validate[forecaster])
    }
    
    # Store the errors in the dictionary
    dict_errors[forecaster] = forecaster_errors

df_errors_validation = pd.DataFrame.from_dict(dict_errors, orient='index') # convert to a DF
df_errors_sorted_validation = df_errors_validation.sort_values('mse', ascending=True) # Sort errors by MSE
df_errors_sorted_validation.insert(0, 'Rank', np.arange(1, len(df_errors_sorted_validation)+1)) # add rankings

# Displaying the sorted DataFrame

print('\n Forecast Rankings on Validation Dataset According to MSE \n')
print(df_errors_sorted_validation)
```

---

### Step 5: Refit the models to the entire (expanded) Training dataset & evaluate on Test dataset

- Refit the models to the entire training dataset
- Re-evaluate performance on the test dataset
- Assess the consistency of ranking and the extent of overfitting

1. Refit the models

```
predictions_test = {}

for forecaster in forecasters:
    forecasters[forecaster].fit(y_train_refit)
    predictions_test[forecaster] = forecasters[forecaster].predict(fh_test) 

df_test = pd.DataFrame(predictions_test)

plot_output = plot_series(y, *[df_test[col] for col in df_test.columns], labels = ['y'] + list(df_test.columns))


plt.show()

df_test
```

2. Compare Predictions

```
dict_mse = {}
dict_errors = {}

for forecaster in predictions_test:
    # Calculate error metric for each forecaster
    forecaster_errors = {
        'mse': mse(y_test, predictions_test[forecaster]),
        'mae': mae(y_test, predictions_test[forecaster])
    }
    
    # Store the errors in the dictionary
    dict_errors[forecaster] = forecaster_errors

df_errors_test = pd.DataFrame.from_dict(dict_errors, orient='index') # convert to a DF
df_errors_sorted_test = df_errors_test.sort_values('mse', ascending=True) # Sort errors by MSE
df_errors_sorted_test.insert(0, 'Rank', np.arange(1, len(df_errors_sorted_test)+1)) # add rankings

# Displaying the sorted DataFrame

print('\n Forecast Rankings on Test Dataset According to MSE \n')
print(df_errors_sorted_test)

```


- Print the validation results again for comparison

```
print('\n Forecast Rankings on Validation Dataset According to MSE \n')
print(df_errors_sorted_validation)
```

---

### Conclusions

- Model rankings change but the best models remain on the top of the list
    - Prophet, ARIMA, ETS (all with seasonality) rank high on the list
- Holdout method is not the best way to evaluate performance (as discussed in week 7)
    - Cross-validation, Rolling and Expanding Forecast Windows are better at selecting optimal models
    - These are more complicated techniques and left for more advanced units on time series forecasting
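
For readers curious about what an expanding forecast window looks like, here is a minimal NumPy sketch that evaluates a seasonal naive forecast over several training windows of increasing length. It is only an illustration of the idea and not the `sktime` implementation; the function name `expanding_window_mae`, the made-up data and the window sizes are assumptions.

```python
import numpy as np

def expanding_window_mae(y, initial, horizon, sp):
    """Average MAE of a seasonal naive forecast over expanding training windows."""
    y = np.asarray(y, dtype=float)
    fold_maes = []
    end = initial
    while end + horizon <= len(y):
        train, actual = y[:end], y[end:end + horizon]
        # Seasonal naive: repeat the last observed season (requires horizon <= sp)
        forecast = train[-sp:][:horizon]
        fold_maes.append(np.mean(np.abs(actual - forecast)))
        end += horizon   # grow the training window by one block
    return float(np.mean(fold_maes)), len(fold_maes)

# Made-up monthly series: a repeating 12-month pattern plus a linear trend
t = np.arange(72)
y_demo = (t % 12) + 0.5 * t

avg_mae, n_folds = expanding_window_mae(y_demo, initial=48, horizon=12, sp=12)
print("folds:", n_folds, "average MAE:", avg_mae)
```

Averaging the error over several folds makes the comparison less dependent on one particular 12-month block than the single holdout split used above.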