This Google Colab notebook walks through time series forecasting with PyCaret on the "airline" dataset, focusing on a univariate forecast without exogenous variables.

### Colab Notebook: Time Series Forecasting - Univariate (Airline Dataset)

#### 1. **Introduction**

Time series forecasting is a powerful technique used to predict future values based on past observations. It is widely applied in various domains such as finance, sales, inventory management, and environmental monitoring, where understanding trends and making future predictions are critical. In time series forecasting, data points are captured at consistent intervals over time, allowing models to identify patterns such as seasonality, trends, and cycles.

In this notebook, we will focus on univariate time series forecasting, which predicts future values of a single variable based solely on its own history. Unlike forecasting with exogenous variables, which brings in additional external predictors, the univariate setup needs only the target series, which makes it a good fit when external factors are unknown or unavailable.
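To make "based solely on its own history" concrete, here is a minimal sketch of a seasonal naive forecast, one of the simplest univariate methods. It is not part of PyCaret; the function name `seasonal_naive_forecast` and the toy quarterly series are hypothetical and used only for illustration.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future step with the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    # Repeat the last observed season as many times as the horizon requires.
    return [last_season[step % season_length] for step in range(horizon)]


# Hypothetical quarterly series covering two years (not the airline data).
history = [10, 14, 20, 12, 11, 15, 22, 13]
forecast = seasonal_naive_forecast(history, season_length=4, horizon=6)
print(forecast)  # [11, 15, 22, 13, 11, 15]
```

The forecast uses nothing but past values of the same series, which is exactly what distinguishes the univariate setting.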

We will use PyCaret, an open-source, low-code machine learning library that simplifies the process of model training and evaluation. PyCaret's time series module provides a streamlined approach to building, comparing, and deploying forecasting models with minimal code. For this demonstration, we will use PyCaret’s built-in "airline" dataset, which consists of monthly totals of international airline passengers from 1949 to 1960. The goal is to predict the number of passengers for the next 12 months.

This notebook will guide you through the entire process of setting up the time series environment, selecting and training models, forecasting future values, and evaluating model performance, all with PyCaret's built-in tools. By the end, you will have hands-on experience with time series forecasting and a clear understanding of how to apply these techniques to real-world scenarios.


#### 2. **Installation and Setup**

In [2]:
# Install PyCaret
!pip install pycaret

Collecting pycaret
  Downloading pycaret-3.3.2-py3-none-any.whl.metadata (17 kB)
Collecting scipy<=1.11.4,>=1.6.1 (from pycaret)
  Downloading scipy-1.11.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting joblib<1.4,>=1.2.0 (from pycaret)
  Downloading joblib-1.3.2-py3-none-any.whl.metadata (5.4 kB)
Collecting scikit-learn>1.4.0 (from pycaret)
  Downloading scikit_learn-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting pyod>=1.1.3 (from pycaret)
  Downloading pyod-2.0.2.tar.gz (165 kB)
  Preparing metadata (setup.py) ... done
Collecting category-encoders>=2.4.0 (from pycaret)
  Downloading category_encoders-2.6.3-py2.py3-none-any.whl.metadata (8

#### 3. **Importing Required Libraries**

In [8]:
# Importing necessary libraries
import pandas as pd
from pycaret.datasets import get_data
from pycaret.time_series import *
# Optional object-oriented entry point; the cells below use the functional API instead
e = TSForecastingExperiment()

#### 4. **Loading the Dataset**
   - Explanation: PyCaret provides built-in datasets like the "airline" dataset, which represents monthly totals of international airline passengers from 1949 to 1960.

In [9]:
# Load the built-in "airline" dataset
data = get_data('airline')
# Display the first few rows of the dataset
data.head()

| Period | Number of airline passengers |
|---|---|
| 1949-01 | 112.0 |
| 1949-02 | 118.0 |
| 1949-03 | 132.0 |
| 1949-04 | 129.0 |
| 1949-05 | 121.0 |


#### 5. **Data Preparation**
   - Explanation: Basic checks to ensure the data is clean, correctly indexed by date, and ready for modeling.

In [11]:
# Convert the index to timestamps only if it is still a PeriodIndex (keeps the cell safe to re-run)
if isinstance(data.index, pd.PeriodIndex):
    data.index = data.index.to_timestamp()
# Check for missing values
data.isnull().sum()

0
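
A count of zero null values only confirms that the rows present contain data; it does not reveal months that are absent from the index altogether. The sketch below shows one way to check for such gaps using plain `YYYY-MM` labels. The helper `missing_months` and the sample labels are hypothetical, and the airline data itself is not expected to have gaps.

```python
def missing_months(period_labels):
    """Return the 'YYYY-MM' labels absent between the first and last label."""

    def to_ordinal(label):
        year, month = label.split("-")
        return int(year) * 12 + (int(month) - 1)

    def to_label(ordinal):
        return f"{ordinal // 12:04d}-{ordinal % 12 + 1:02d}"

    seen = {to_ordinal(label) for label in period_labels}
    if not seen:
        return []
    # Walk every month between the earliest and latest label and keep the absent ones.
    return [to_label(o) for o in range(min(seen), max(seen) + 1) if o not in seen]


# Hypothetical index with March deliberately dropped.
labels = ["1949-01", "1949-02", "1949-04", "1949-05"]
print(missing_months(labels))  # ['1949-03']
```

An empty result means the monthly index is contiguous, which is what most forecasting models assume.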

#### 6. **Model Setup**
   - Explanation: Set up the time series environment with PyCaret, specifying key parameters such as the forecast horizon and the seasonal period.

In [13]:
# Initialize the PyCaret setup for time series forecasting
ts_setup = setup(
    data=data,
    fh=12,  # Forecast horizon (how many periods ahead you want to predict)
    session_id=123,  # Random seed for reproducibility
    seasonal_period="M",  # Monthly data (seasonal period of 12)
)

| | Description | Value |
|---|---|---|
| 0 | session_id | 123 |
| 1 | Target | Number of airline passengers |
| 2 | Approach | Univariate |
| 3 | Exogenous Variables | Not Present |
| 4 | Original data shape | (144, 1) |
| 5 | Transformed data shape | (144, 1) |
| 6 | Transformed train set shape | (132, 1) |
| 7 | Transformed test set shape | (12, 1) |
| 8 | Rows with missing values | 0.0% |
| 9 | Fold Generator | ExpandingWindowSplitter |
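
The setup summary reports 132 training rows, 12 test rows, and an expanding-window fold generator. The sketch below illustrates the idea behind both: the last `fh` observations are held out, and cross-validation folds grow the training window forward in time. This is a simplified illustration, not PyCaret's or sktime's implementation; the helper names are hypothetical, the choice of 3 folds is an assumption, and the exact fold boundaries PyCaret uses may differ.

```python
def holdout_split(n_obs, fh):
    """Split positions 0..n_obs-1 into a training range and a final test range of length fh."""
    if not 0 < fh < n_obs:
        raise ValueError("fh must be positive and smaller than n_obs")
    return range(0, n_obs - fh), range(n_obs - fh, n_obs)


def expanding_window_folds(n_train, fh, n_folds):
    """Build folds whose training window always starts at 0 and grows by fh each fold."""
    folds = []
    for k in range(n_folds, 0, -1):
        val_end = n_train - (k - 1) * fh
        val_start = val_end - fh
        if val_start <= 0:
            raise ValueError("not enough training data for the requested folds")
        folds.append((range(0, val_start), range(val_start, val_end)))
    return folds


train, test = holdout_split(n_obs=144, fh=12)
print(len(train), len(test))  # 132 12

# n_folds=3 is an assumption made for this illustration.
for fit_idx, val_idx in expanding_window_folds(len(train), fh=12, n_folds=3):
    print(len(fit_idx), val_idx.start, val_idx.stop)
# 96 96 108
# 108 108 120
# 120 120 132
```

Every validation window lies strictly after its training window, so no future information leaks into model fitting.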



#### 7. **Model Training and Selection**
   - Explanation: Comparing different forecasting models to choose the best one based on performance metrics.

In [14]:
# Compare different models to find the best performing one
best_model = compare_models()

| | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| exp_smooth | Exponential Smoothing | 0.5852 | 0.6105 | 17.1926 | 20.1633 | 0.0435 | 0.0439 | 0.8918 | 0.1 |
| ets | ETS | 0.5931 | 0.6212 | 17.4165 | 20.5102 | 0.044 | 0.0445 | 0.8882 | 0.18 |
| et_cds_dt | Extra Trees w/ Cond. Deseasonalize & Detrending | 0.6632 | 0.7313 | 19.5494 | 24.1881 | 0.0486 | 0.0486 | 0.8449 | 0.63 |
| huber_cds_dt | Huber w/ Cond. Deseasonalize & Detrending | 0.6813 | 0.7866 | 20.0334 | 25.967 | 0.0491 | 0.0499 | 0.8113 | 0.36 |
| arima | ARIMA | 0.683 | 0.6735 | 20.0069 | 22.2199 | 0.0501 | 0.0507 | 0.8677 | 0.19 |
| lr_cds_dt | Linear w/ Cond. Deseasonalize & Detrending | 0.7004 | 0.7702 | 20.6084 | 25.4401 | 0.0509 | 0.0514 | 0.8215 | 0.8733 |
| ridge_cds_dt | Ridge w/ Cond. Deseasonalize & Detrending | 0.7004 | 0.7703 | 20.6086 | 25.4405 | 0.0509 | 0.0514 | 0.8215 | 0.3467 |
| en_cds_dt | Elastic Net w/ Cond. Deseasonalize & Detrending | 0.7029 | 0.7732 | 20.6816 | 25.5362 | 0.0511 | 0.0516 | 0.8201 | 0.34 |
| lasso_cds_dt | Lasso w/ Cond. Deseasonalize & Detrending | 0.7048 | 0.7751 | 20.7373 | 25.6005 | 0.0512 | 0.0517 | 0.8193 | 0.3567 |
| llar_cds_dt | Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending | 0.7048 | 0.7751 | 20.7366 | 25.6009 | 0.0512 | 0.0517 | 0.8192 | 0.3433 |
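
To help read the metric columns, here is a minimal sketch of how several of them are commonly defined. These are textbook formulas written in plain Python, not PyCaret's own code; the library's implementations may differ in detail, and the toy numbers are hypothetical. RMSSE is omitted for brevity.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed as a fraction."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE, expressed as a fraction."""
    return sum(2 * abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)) / len(y_true)


def mase(y_true, y_pred, y_train, sp=1):
    """MAE scaled by the in-sample MAE of a naive forecast lagged by sp periods."""
    naive_errors = [abs(y_train[i] - y_train[i - sp]) for i in range(sp, len(y_train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(y_true, y_pred) / scale


# Hypothetical toy values, not taken from the airline results.
y_train = [10, 12, 14, 16, 18, 20]
y_true = [22, 24]
y_pred = [21, 26]

print(mae(y_true, y_pred))            # 1.5
print(mase(y_true, y_pred, y_train))  # 0.75
```

Under this definition, a MASE below 1 means the model's error is smaller than the in-sample error of the naive benchmark, so lower values in the table indicate better models.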



#### 8. **Finalizing the Model**
   - Explanation: Finalize the best model by refitting it on the entire dataset, including the 12-month hold-out set, so that predictions start right after the last observation.

In [15]:
# Finalize the best model for forecasting
final_model = finalize_model(best_model)

#### 9. **Forecasting**
   - Explanation: Forecast the next 12 months (the horizon set in `setup`) using the finalized model.

In [16]:
# Forecast the next 12 periods
forecast = predict_model(final_model)
# Display the forecasted values
print(forecast)

           y_pred
1961-01  445.2424
1961-02  418.2253
1961-03  465.3098
1961-04  494.9512
1961-05  505.4759
1961-06  573.3127
1961-07  663.5964
1961-08  654.9040
1961-09  546.7610
1961-10  488.4468
1961-11  415.7235
1961-12  460.3778


#### 10. **Visualizing the Forecast**
   - Explanation: Visualize the predicted values compared to the actual historical data.

In [18]:
# Plot the forecast
plot_model(final_model, plot='forecast')


#### 11. **Evaluating Model Performance**
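   - Explanation: Review the model with `evaluate_model`, which opens an interactive widget for browsing diagnostic plots. The error metrics themselves (MAE, RMSE, MAPE, and others) are the cross-validated scores reported by `compare_models` in step 7.

In [None]:
# Open the interactive evaluation widget for the final model
evaluate_model(final_model)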