# Model Training: Inflation Prediction

This notebook focuses on training two predictive models for inflation using time series and multivariate data: 

1. **AutoARIMA**: A time series model that automatically selects the ARIMA orders (and, here, the seasonal orders) by searching for the combination that minimizes an information criterion (AIC by default). This approach is univariate: it uses only the history of the inflation target.
2. **Multiple Linear Regression**: A multivariate regression model that leverages all stationary regressors to predict inflation.

### Objectives:
- Compare the performance of the two models on the test set using Root Mean Squared Error (RMSE) as the evaluation metric.
- Select the best-performing model for deployment in an MLOps pipeline.
- Save the chosen model for further use and integration into the production system.

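For reference, with $n$ test observations $y_t$ and forecasts $\hat{y}_t$, the two error measures are defined below. Since the square root is monotonically increasing, ranking models by RMSE gives the same order as ranking them by MSE; RMSE has the practical advantage of being expressed in the same units as the target (`inflation_diff`).

$$
\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
$$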
### Methodology:
- **Data Preparation**: Load the stationary dataset and split it chronologically (without shuffling) into training (80%) and test (20%) sets.
- **Model Training**: Train both AutoARIMA and multiple linear regression models.
- **Evaluation**: Assess model performance using the test set.
- **Model Selection**: Choose the model with the lowest RMSE for deployment.
- **Output**: Save the selected model for future predictions.

This notebook prioritizes simplicity and clarity, focusing on foundational modeling steps to ensure a smooth integration into the broader MLOps framework. Future iterations may include advanced modeling techniques or periodic retraining mechanisms.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Loading the dataset:
data = pd.read_csv("../data/external/stationary_data.csv", index_col=0, parse_dates=True)

# Separate the independent variables (X) from the target variable (y):
target = "inflation_diff"
X = data.drop(columns=[target])
y = data[target]

# Split into train (first 80%) and test (last 20%) sets.
# shuffle=False preserves the time order; random_state has no effect without shuffling.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

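With `shuffle=False`, `train_test_split` simply cuts the data at one position: the first rows go to training and the last `test_size` fraction (rounded up) goes to testing. The minimal sketch below reproduces that cut with plain positional slicing, which makes the temporal ordering explicit. `chronological_split` is a hypothetical helper, not part of scikit-learn, and the small synthetic monthly series stands in for `stationary_data.csv`.

```python
import math

import numpy as np
import pandas as pd


def chronological_split(X, y, test_size=0.2):
    """Hypothetical helper: split X and y at a single cut point, keeping time order.

    Sizes the test set the way scikit-learn does for a float test_size
    (number of test rows rounded up).
    """
    n_test = math.ceil(test_size * len(X))
    n_train = len(X) - n_test
    return X.iloc[:n_train], X.iloc[n_train:], y.iloc[:n_train], y.iloc[n_train:]


# Synthetic monthly data standing in for the real stationary dataset.
index = pd.date_range("2000-01-01", periods=11, freq="MS")
X_demo = pd.DataFrame({"x1": np.arange(11.0)}, index=index)
y_demo = pd.Series(np.arange(11.0) * 2, index=index, name="inflation_diff")

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
print(len(X_tr), len(X_te))
# Every training date precedes every test date, so no future data leaks into training.
print(X_tr.index.max() < X_te.index.min())
```

Slicing by position rather than by date keeps the helper independent of the index frequency; it only assumes the rows are already sorted in time, as `parse_dates=True` with a date index suggests for this dataset.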

In [2]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import numpy as np

from pmdarima import auto_arima
from sklearn.metrics import mean_squared_error

# Fitting an AutoARIMA over the target variable:
arima_model = auto_arima(y_train, seasonal=True, m=12, trace=True, error_action='ignore', suppress_warnings=True)

# Predicting:
arima_forecast = arima_model.predict(n_periods=len(y_test))

# RMSE:
arima_rmse = np.sqrt(mean_squared_error(y_test, arima_forecast))
print(f"AutoARIMA RMSE: {arima_rmse:.4f}")


Performing stepwise search to minimize aic
 ARIMA(2,0,2)(1,0,1)[12] intercept   : AIC=374.895, Time=15.39 sec
 ARIMA(0,0,0)(0,0,0)[12] intercept   : AIC=402.174, Time=0.06 sec
 ARIMA(1,0,0)(1,0,0)[12] intercept   : AIC=399.746, Time=1.57 sec
 ARIMA(0,0,1)(0,0,1)[12] intercept   : AIC=397.606, Time=2.56 sec
 ARIMA(0,0,0)(0,0,0)[12]             : AIC=400.252, Time=0.03 sec
 ARIMA(2,0,2)(0,0,1)[12] intercept   : AIC=374.635, Time=10.83 sec
 ARIMA(2,0,2)(0,0,0)[12] intercept   : AIC=376.138, Time=5.28 sec
 ARIMA(2,0,2)(0,0,2)[12] intercept   : AIC=inf, Time=16.41 sec
 ARIMA(2,0,2)(1,0,0)[12] intercept   : AIC=373.471, Time=13.44 sec
 ARIMA(2,0,2)(2,0,0)[12] intercept   : AIC=inf, Time=20.17 sec
 ARIMA(2,0,2)(2,0,1)[12] intercept   : AIC=372.779, Time=22.94 sec
 ARIMA(2,0,2)(2,0,2)[12] intercept   : AIC=371.893, Time=19.84 sec
 ARIMA(2,0,2)(1,0,2)[12] intercept   : AIC=inf, Time=21.17 sec
 ARIMA(1,0,2)(2,0,2)[12] intercept   : AIC=inf, Time=18.84 sec
 ARIMA(2,0,1)(2,0,2)[12] intercept   : A

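`auto_arima` runs a stepwise search (as the first trace line states) and keeps the candidate with the lowest AIC, where $\mathrm{AIC} = 2k - 2\ln\hat{L}$ for $k$ estimated parameters and maximized likelihood $\hat{L}$. Lines reported as `AIC=inf` are candidates the search could not use (typically failed fits), so they can never be chosen. The sketch below parses a few of the trace lines printed above and applies that rule. Because the trace shown is cut off, it only illustrates the selection logic and is not necessarily the order `auto_arima` finally chose; the fitted model's orders can be inspected directly via `arima_model.order` and `arima_model.seasonal_order`.

```python
import math
import re

# A few of the trace lines printed by auto_arima above (copied verbatim).
trace = """\
 ARIMA(2,0,2)(1,0,1)[12] intercept   : AIC=374.895, Time=15.39 sec
 ARIMA(0,0,0)(0,0,0)[12] intercept   : AIC=402.174, Time=0.06 sec
 ARIMA(0,0,0)(0,0,0)[12]             : AIC=400.252, Time=0.03 sec
 ARIMA(2,0,2)(0,0,2)[12] intercept   : AIC=inf, Time=16.41 sec
 ARIMA(2,0,2)(2,0,1)[12] intercept   : AIC=372.779, Time=22.94 sec
 ARIMA(2,0,2)(2,0,2)[12] intercept   : AIC=371.893, Time=19.84 sec
"""

# Capture the model label (text before the colon) and its AIC value.
pattern = re.compile(r"^\s*(ARIMA\S+(?:\s+intercept)?)\s*:\s*AIC=([^,]+),", re.MULTILINE)

# float("inf") parses to math.inf, so failed candidates keep an infinite AIC.
candidates = {label: float(aic) for label, aic in pattern.findall(trace)}

failed = [label for label, aic in candidates.items() if math.isinf(aic)]
best_label = min(candidates, key=candidates.get)
print(f"Lowest AIC among these lines: {best_label} (AIC={candidates[best_label]})")
print(f"Candidates with AIC=inf: {failed}")
```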
In [3]:
from sklearn.linear_model import LinearRegression

# Model Training
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Model predicting:
linear_forecast = linear_model.predict(X_test)

# RMSE:
linear_rmse = np.sqrt(mean_squared_error(y_test, linear_forecast))
print(f"Linear Regression RMSE: {linear_rmse:.4f}")

Linear Regression RMSE: 12.7477

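By default `LinearRegression` fits ordinary least squares with an intercept: it finds the coefficients $\beta$ that minimize $\lVert y - X\beta \rVert^2$ after a column of ones is prepended to $X$. As a minimal sketch on synthetic data (not the notebook's dataset), the same kind of estimate can be recovered with NumPy's least-squares solver, which can be handy for sanity-checking `linear_model.intercept_` and `linear_model.coef_`. The variable names below are chosen so they do not overwrite the notebook's `X` and `y`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regressors and a target generated from known coefficients plus small noise.
n, p = 200, 3
X_syn = rng.normal(size=(n, p))
true_intercept, true_coef = 0.5, np.array([1.0, -2.0, 0.3])
y_syn = true_intercept + X_syn @ true_coef + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the first estimated parameter is the intercept.
X_design = np.column_stack([np.ones(n), X_syn])
beta, *_ = np.linalg.lstsq(X_design, y_syn, rcond=None)

intercept_hat, coef_hat = beta[0], beta[1:]
print(round(intercept_hat, 2), np.round(coef_hat, 2))
```

The recovered values land close to the generating coefficients because the noise is small relative to the sample size.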

In [4]:
# Select the model with the lower test-set RMSE:
if arima_rmse < linear_rmse:
    best_model = arima_model
    best_model_name = "AutoARIMA"
else:
    best_model = linear_model
    best_model_name = "Linear Regression"

print(f"Best model: {best_model_name}")

Best model: AutoARIMA

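The `if`/`else` above handles exactly two candidates. If more models are added in future iterations, one option is to keep each model's RMSE in a dictionary and take the minimum. The sketch below assumes the scores have already been computed; `select_best` is a hypothetical helper, and the scores in the example are placeholders for illustration only, not results from this notebook.

```python
def select_best(scores):
    """Hypothetical helper: return (name, score) of the model with the lowest error.

    `scores` maps a model name to its test-set RMSE (lower is better).
    """
    if not scores:
        raise ValueError("no candidate models to compare")
    name = min(scores, key=scores.get)
    return name, scores[name]


# Placeholder scores for illustration only, not results from this notebook.
example_scores = {"AutoARIMA": 1.0, "Linear Regression": 2.0, "Another model": 1.5}
best_name, best_score = select_best(example_scores)
print(f"Best model: {best_name} (RMSE={best_score:.4f})")
```

In this notebook it would be called as `select_best({"AutoARIMA": arima_rmse, "Linear Regression": linear_rmse})`. On an exact tie, `min` returns the first name in insertion order, whereas the `if`/`else` above falls through to Linear Regression.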

In [6]:
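import joblib
from pathlib import Path

# Save the best model (create ../models/ first if it does not exist):
model_dir = Path("../models")
model_dir.mkdir(parents=True, exist_ok=True)
joblib.dump(best_model, model_dir / f"{best_model_name.lower().replace(' ', '_')}.pkl")
print(f"{best_model_name} saved to ../models/")

AutoARIMA saved to ../models/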

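Downstream code in the MLOps pipeline will need to load the file back; for this run the save cell writes `../models/autoarima.pkl`. A minimal round-trip sketch with `joblib.dump` and `joblib.load` follows. `model_path` is a hypothetical helper that rebuilds the same file name as the save cell, and a temporary directory plus a plain dictionary stand in for `../models/` and the fitted model. For the saved AutoARIMA object, forecasting after loading would use `predict(n_periods=...)`, as in the training cell.

```python
import tempfile
from pathlib import Path

import joblib


def model_path(model_dir, model_name):
    """Hypothetical helper: same naming rule as the save cell (lowercase, spaces -> underscores)."""
    return Path(model_dir) / f"{model_name.lower().replace(' ', '_')}.pkl"


with tempfile.TemporaryDirectory() as tmp:
    path = model_path(tmp, "Linear Regression")
    # Stand-in for a fitted model; any picklable object round-trips the same way.
    stand_in_model = {"name": "Linear Regression", "coef": [1.0, -2.0]}
    joblib.dump(stand_in_model, str(path))
    restored = joblib.load(str(path))
    print(path.name, restored == stand_in_model)
```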