# Univariate time series forecasting


Below we train a model on a univariate time series to forecast an airline's number of passengers. The code is based on the official PyCaret documentation (https://github.com/pycaret/pycaret) and on the article by Moez Ali (https://moez-62905.medium.com/).

# Part I

### Installing libraries

In [1]:
!pip install pycaret==2.2.3
!pip install -U scikit-learn==0.23.2

Collecting pycaret==2.2.3
  Downloading pycaret-2.2.3-py3-none-any.whl (249 kB)
Collecting scikit-plot
  Downloading scikit_plot-0.3.7-py3-none-any.whl (33 kB)
Collecting scikit-learn==0.23.2
  Downloading scikit_learn-0.23.2-cp38-cp38-win_amd64.whl (6.8 MB)
Installing collected packages: scikit-learn, scikit-plot, pycaret
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 0.23.1
    Uninstalling scikit-learn-0.23.1:
      Successfully uninstalled scikit-learn-0.23.1
  Attempting uninstall: pycaret
    Found existing installation: pycaret 2.1.2
    Uninstalling pycaret-2.1.2:
      Successfully uninstalled pycaret-2.1.2
Successfully installed pycaret-2.2.3 scikit-learn-0.23.2 scikit-plot-0.3.7


Requirement already up-to-date: scikit-learn==0.23.2 in c:\users\administrator\anaconda3\lib\site-packages (0.23.2)


In [None]:
!pip install plotly==5.1.0 
!pip install plotly-express==0.4.1

In [13]:
import pycaret
import sklearn
print(pycaret.__version__)
print(sklearn.__version__)

2.3.1
0.23.2


## Importing data

In [34]:
# read csv file
import pandas as pd
import numpy as np

# forward slash avoids the invalid '\A' escape sequence and works on every OS
data = pd.read_csv('data/AirPassengers.csv')
data['Date'] = pd.to_datetime(data['Date'])
data.head()

Unnamed: 0,Date,Passengers
0,1949-01-01,112
1,1949-02-01,118
2,1949-03-01,132
3,1949-04-01,129
4,1949-05-01,121



## Data analysis

In [16]:
# create 12 month moving average
data['MA12'] = data['Passengers'].rolling(12).mean()
data

Unnamed: 0,Date,Passengers,MA12
0,1949-01-01,112,
1,1949-02-01,118,
2,1949-03-01,132,
3,1949-04-01,129,
4,1949-05-01,121,
...,...,...,...
139,1960-08-01,606,463.333333
140,1960-09-01,508,467.083333
141,1960-10-01,461,471.583333
142,1960-11-01,390,473.916667
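The empty `MA12` values in the first rows follow from how a rolling window works: a 12-month window has no result until 12 observations are available. The sketch below shows the same behavior on a toy series with a window of 3; the values and the variable names are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Toy series standing in for the monthly passenger counts (illustrative values).
toy = pd.Series([10, 20, 30, 40, 50])

# A window of 3 needs 3 observations, so the first 2 results are NaN.
ma3 = toy.rolling(3).mean()

# min_periods=1 averages whatever is available so far instead of returning NaN.
ma3_partial = toy.rolling(3, min_periods=1).mean()

print(ma3.tolist())
print(ma3_partial.tolist())
```

Whether to keep the leading gaps or fill them with partial averages is a presentation choice; here the moving average is only used for plotting, so the gaps are harmless.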


In [17]:
# plot the data and MA
import plotly.express as px
fig = px.line(data, x="Date", y=["Passengers", "MA12"], template = 'plotly_dark')
fig.show()

## Data transformation

In [18]:
# extract month and year from dates
data['Month'] = data['Date'].dt.month
data['Year'] = data['Date'].dt.year

# create a sequence of numbers
data['Series'] = np.arange(1,len(data)+1)

# drop unnecessary columns and re-arrange
data.drop(['Date', 'MA12'], axis=1, inplace=True)
data = data[['Series', 'Year', 'Month', 'Passengers']] 

# check the head of the dataset
data.head()

Unnamed: 0,Series,Year,Month,Passengers
0,1,1949,1,112
1,2,1949,2,118
2,3,1949,3,132
3,4,1949,4,129
4,5,1949,5,121
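The same three features (`Series`, `Year`, `Month`) are needed for any period the model should score, including dates beyond the observed data. One possible way to keep that logic in a single place is a small helper; `make_features` and its `start` parameter are hypothetical names introduced for this sketch, not part of PyCaret or of the original notebook.

```python
import numpy as np
import pandas as pd

def make_features(dates, start=1):
    """Build Series/Year/Month columns for a DatetimeIndex (hypothetical helper)."""
    return pd.DataFrame({
        # running counter that encodes the trend
        'Series': np.arange(start, start + len(dates)),
        # calendar features extracted from the dates
        'Year': np.asarray(dates.year),
        'Month': np.asarray(dates.month),
    })

history = pd.date_range('1949-01-01', '1960-12-01', freq='MS')
future = pd.date_range('1961-01-01', '1965-01-01', freq='MS')

hist_features = make_features(history)
# continue the counter right after the last historical row
future_features = make_features(future, start=len(history) + 1)

print(hist_features.head())
print(future_features.head())
```

Deriving `start` from the length of the history avoids hard-coding the first future index.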


### Splitting the data into train and test sets

In [19]:
# split data into train-test set
train = data[data['Year'] < 1960]
test = data[data['Year'] >= 1960]
# check shape
train.shape, test.shape


((132, 4), (12, 4))

### Data preprocessing

In [20]:
# import the regression module
from pycaret.regression import *

# initialize setup
s = setup(data = train, test_data = test, target = 'Passengers',
          fold_strategy = 'timeseries', numeric_features = ['Year', 'Series'],
          fold = 3, transform_target = True, session_id = 123)

Unnamed: 0,Description,Value
0,session_id,123
1,Target,Passengers
2,Original Data,"(132, 4)"
3,Missing Values,False
4,Numeric Features,2
5,Categorical Features,1
6,Ordinal Features,False
7,High Cardinality Features,False
8,High Cardinality Method,
9,Transformed Train Set,"(132, 13)"
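The `fold_strategy = 'timeseries'` setting asks for cross-validation folds that respect chronological order, so that a model is never validated on data older than what it was trained on. The sketch below illustrates the general idea with an expanding window over the 132 training rows and 3 folds. The helper `expanding_window_splits` is hypothetical, and the exact fold boundaries PyCaret uses internally may differ.

```python
def expanding_window_splits(n_samples, n_folds):
    """Yield (train_indices, validation_indices) pairs in chronological order."""
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # train on everything up to train_end, validate on the next block
        yield range(0, train_end), range(train_end, train_end + fold_size)

splits = list(expanding_window_splits(n_samples=132, n_folds=3))
for train_idx, val_idx in splits:
    print(f"train 0..{train_idx[-1]}  validate {val_idx[0]}..{val_idx[-1]}")
```

Each fold trains on a longer prefix of the series than the previous one, which is what distinguishes this scheme from a shuffled k-fold split.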


# Part II

## Training the regression model

In [21]:
best = compare_models(sort = 'MAE')

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
lr,Linear Regression,22.398,923.874,28.2856,0.5621,0.0878,0.0746,1.0667
lar,Least Angle Regression,22.398,923.8666,28.2855,0.5621,0.0878,0.0746,0.0133
huber,Huber Regressor,22.4184,891.3113,27.9303,0.599,0.0879,0.0749,0.02
br,Bayesian Ridge,22.4783,932.2165,28.5483,0.5611,0.0884,0.0746,0.0133
ridge,Ridge Regression,23.1975,1003.936,30.0408,0.5258,0.0933,0.0764,0.8467
lasso,Lasso Regression,38.4188,2413.5109,46.8468,0.0882,0.1473,0.1241,0.8633
en,Elastic Net,40.6486,2618.8759,49.4048,-0.0824,0.1563,0.1349,0.0133
omp,Orthogonal Matching Pursuit,44.3054,3048.2658,53.8613,-0.4499,0.1713,0.152,0.0133
xgboost,Extreme Gradient Boosting,46.7192,3791.0476,59.9683,-0.5515,0.1962,0.1432,0.1067
gbr,Gradient Boosting Regressor,49.3197,3925.4366,60.5087,-0.5759,0.2002,0.1511,0.03
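To make the columns of the comparison table easier to read, the following sketch computes MAE, MSE, RMSE and MAPE by hand using their standard definitions. The function `regression_metrics` and the sample numbers are made up for illustration; they are not PyCaret internals and do not reproduce the values in the table.

```python
import math

def regression_metrics(actual, predicted):
    """Standard error metrics for paired actual/predicted values (illustrative)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n    # mean absolute error
    mse = sum(e * e for e in errors) / n     # mean squared error
    rmse = math.sqrt(mse)                    # same units as the target
    # mean absolute percentage error, expressed as a fraction
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {'MAE': mae, 'MSE': mse, 'RMSE': rmse, 'MAPE': mape}

metrics = regression_metrics(actual=[100, 200, 400], predicted=[110, 190, 380])
print(metrics)
```

MAE weighs every error linearly, while MSE and RMSE penalize large errors more heavily, so a ranking sorted by MAE can differ from one sorted by RMSE.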


In [22]:
prediction_holdout = predict_model(best)

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Linear Regression,25.0713,972.2656,31.1812,0.8245,0.0692,0.0571


# Part III

## Time series forecasting

In [28]:
# generate predictions on the original dataset
predictions = predict_model(best, data=data)

# add a date column in the dataset
predictions['Date'] = pd.date_range(start='1949-01-01', end = '1960-12-01', freq = 'MS')
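Assigning a date range as a new column only works when its length matches the number of rows, so a quick check can be useful before the assignment. The sketch below assumes the 144 monthly rows seen earlier (132 train plus 12 test); `n_rows` is a stand-in for `len(predictions)`.

```python
import pandas as pd

n_rows = 144  # assumed number of rows: 132 train + 12 test

# 'MS' means month start, so each timestamp is the first day of a month.
dates = pd.date_range(start='1949-01-01', end='1960-12-01', freq='MS')

# Equivalent form that derives the end date from the row count.
dates_by_periods = pd.date_range(start='1949-01-01', periods=n_rows, freq='MS')

print(len(dates), dates[0].date(), dates[-1].date())
print(dates.equals(dates_by_periods))
```

The `periods=` form may be the safer choice when the row count is known but the last date is not.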


### Visualizing the results

In [29]:
# line plot
fig = px.line(predictions, x='Date', y=["Passengers", "Label"], template = 'plotly_dark')
# add a vertical rectangle for test-set separation
fig.add_vrect(x0="1960-01-01", x1="1960-12-01", fillcolor="grey", opacity=0.25, line_width=0)
fig.show()

In [24]:
final_best = finalize_model(best)

## Forecasting on new data

In [30]:
future_dates = pd.date_range(start = '1961-01-01', end = '1965-01-01', freq = 'MS')
future_df = pd.DataFrame()

In [31]:
future_df['Month'] = [i.month for i in future_dates]
future_df['Year'] = [i.year for i in future_dates]    
# continue the counter right after the last observed row (144), i.e. from 145
future_df['Series'] = np.arange(len(data) + 1, len(data) + 1 + len(future_dates))
future_df.head()

Unnamed: 0,Month,Year,Series
0,1,1961,145
1,2,1961,146
2,3,1961,147
3,4,1961,148
4,5,1961,149


In [26]:
predictions_future = predict_model(final_best, data=future_df)
predictions_future.head()

Unnamed: 0,Month,Year,Series,Label
0,1,1961,145,486.278046
1,2,1961,146,482.207642
2,3,1961,147,550.486145
3,4,1961,148,535.186584
4,5,1961,149,538.923767


In [32]:
concat_df = pd.concat([data,predictions_future], axis=0)

concat_df_i = pd.date_range(start='1949-01-01', end = '1965-01-01', freq = 'MS')
concat_df.set_index(concat_df_i, inplace=True)

In [33]:
concat_df.head()

Unnamed: 0,Series,Year,Month,Passengers,Label
1949-01-01,1,1949,1,112.0,
1949-02-01,2,1949,2,118.0,
1949-03-01,3,1949,3,132.0,
1949-04-01,4,1949,4,129.0,
1949-05-01,5,1949,5,121.0,
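The empty `Label` values and the float-formatted `Passengers` values in the historical rows are a consequence of how `pd.concat` aligns columns: a column missing from one frame is filled with `NaN`, which also turns an integer column into floats. A toy sketch with made-up numbers (the frames `history` and `forecast` are illustrative only):

```python
import pandas as pd

# Historical rows have the observed target but no prediction column.
history = pd.DataFrame({'Series': [1, 2], 'Passengers': [112, 118]})
# Forecast rows have a prediction column but no observed target.
forecast = pd.DataFrame({'Series': [3, 4], 'Label': [125.0, 130.0]})

# Rows are stacked; columns are matched by name and gaps become NaN.
combined = pd.concat([history, forecast], axis=0, ignore_index=True)

print(combined)
print(combined.dtypes)
```

When plotted, the two columns therefore cover disjoint date ranges, with observed values first and forecasts afterwards.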


In [27]:
fig = px.line(concat_df, x=concat_df.index, y=["Passengers", "Label"], template = 'plotly_dark')
fig.show()