# Autoregressive Integrated Moving Average (ARIMA)

1. **Import Packages**: The required packages are imported: pandas, numpy, itertools, statsmodels, sklearn.metrics, and math.

2. **Load Data**: Training and testing data are loaded from CSV files.

3. **Data Preprocessing**: The features and target are concatenated into one dataframe each for training and testing. The 'date' column is converted to datetime format, set as the index, converted to a daily period index, and sorted. Each dataframe is then split back into features (X) and the target variable (y), which is `unit_sales`.

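4. **Grid Search Function**: A function named `grid_search_arima` is defined to search for the best parameters (p, d, q) for the ARIMA model. It fits every candidate order on the training target and keeps the order whose forecast has the lowest RMSE against the test target.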

5. **Grid Search**: The grid search function is called with ranges for p, d, and q values to find the best order.

    - `p_values`: The range of values for the AR (AutoRegressive) parameter. It is set to range from 0 to 4, inclusive. This parameter represents the number of lag observations included in the model, also called the lag order.

    - `d_values`: The range of values for the I (Integrated) parameter. It is set to range from 0 to 2, inclusive. This parameter represents the number of times that the raw observations are differenced, also called the degree of differencing.

    - `q_values`: The range of values for the MA (Moving Average) parameter. It is set to range from 0 to 4, inclusive. This parameter represents the number of lagged forecast errors included in the model, also called the order of the moving average.

6. **Train ARIMA Model**: An ARIMA model is trained with the best order found from the grid search.

7. **Forecasting**: The fitted ARIMA model forecasts as many steps ahead as there are rows in the test set.

8. **Model Evaluation**: The model's performance is evaluated using mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and the R-squared score. These metrics are printed to the console.
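
To make the search space in step 5 concrete, here is a minimal sketch that enumerates the candidate orders and illustrates what the differencing parameter `d` does. The helper `difference` and the list `toy_series` are hypothetical names introduced only for illustration; they are not part of the notebook.

```python
from itertools import product

# Candidate orders, mirroring the ranges described in step 5.
p_values = range(5)  # 0..4
d_values = range(3)  # 0..2
q_values = range(5)  # 0..4
candidate_orders = list(product(p_values, d_values, q_values))
print(len(candidate_orders))  # 5 * 3 * 5 = 75 orders to fit


def difference(series, d):
    """Apply first-order differencing d times (hypothetical helper)."""
    values = list(series)
    for _ in range(d):
        # Replace each value with its change from the previous value
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


toy_series = [1, 4, 9, 16, 25]  # hypothetical data, not from the dataset
print(difference(toy_series, 1))  # [3, 5, 7, 9]
print(difference(toy_series, 2))  # [2, 2, 2]
```

Each round of differencing shortens the series by one observation and removes one level of trend, which is why small values of `d` are usually enough.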

# Import packages
import pandas as pd
import numpy as np
from itertools import product
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from math import sqrt

# Load the split data from csv
X_train = pd.read_csv("X_train.csv")
X_test = pd.read_csv("X_test.csv")
y_train = pd.read_csv("y_train.csv")
y_test = pd.read_csv("y_test.csv")

# Combine X and y into a single dataframe (read_csv already returns dataframes)
train = pd.concat([X_train, y_train], axis=1)
test = pd.concat([X_test, y_test], axis=1)

# Convert date column to datetime and set it as the index
train["date"] = pd.to_datetime(train["date"])
test["date"] = pd.to_datetime(test["date"])
train.set_index("date", inplace=True)
test.set_index("date", inplace=True)
train.index = pd.DatetimeIndex(train.index).to_period("D")
test.index = pd.DatetimeIndex(test.index).to_period("D")
train = train.sort_index()
test = test.sort_index()

# Split the dataset into features (X) and the target variable (y)
X_train = train.drop(columns=["unit_sales"])
X_test = test.drop(columns=["unit_sales"])
y_train = train["unit_sales"]
y_test = test["unit_sales"]


# Define a function for grid search
def grid_search_arima(X_train, y_train, p_values, d_values, q_values):
    """
    Perform grid search to find the best ARIMA order for time series forecasting.

    Each candidate order is fit on y_train and scored by the RMSE of its
    forecast against the module-level y_test, so X_test and y_test must be
    defined before this function is called.

    Parameters:
    - X_train (array-like): Training data for the exogenous variables
      (currently unused; the model is fit on y_train alone).
    - y_train (array-like): Training data for the target variable.
    - p_values (iterable): Values to try for the AR parameter.
    - d_values (iterable): Values to try for the differencing parameter.
    - q_values (iterable): Values to try for the MA parameter.

    Returns:
    - best_order (tuple): The order (p, d, q) with the lowest RMSE, or None
      if every candidate failed to fit.
    """

    best_rmse = float("inf")
    best_order = None

    for p, d, q in product(p_values, d_values, q_values):
        order = (p, d, q)
        try:
            arima_model = ARIMA(y_train, order=order)
            arima_fit = arima_model.fit()
            y_pred = arima_fit.forecast(steps=len(X_test))
            rmse = np.sqrt(mean_squared_error(y_test, y_pred))
            if rmse < best_rmse:
                best_rmse = rmse
                best_order = order
        except Exception:
            # Skip orders that fail to fit or forecast
            continue

    return best_order


# Define ranges for p, d, and q values
p_values = range(5)
d_values = range(3)
q_values = range(5)

# Perform grid search
best_order = grid_search_arima(X_train, y_train, p_values, d_values, q_values)

# Train ARIMA model with the best parameters
model = ARIMA(y_train, order=best_order)

# Fit the ARIMA model
model_fitted = model.fit()

# Make predictions on the test set
predictions = model_fitted.forecast(steps=len(X_test))

# Calculate the evaluation metrics
mse = mean_squared_error(y_test, predictions)
rmse = sqrt(mse)
mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

# Print the evaluation metrics
print(f"Model: {model.__class__.__name__}")
print(f"- Best Parameters: {best_order}")
print(f"- MSE: {mse}")
print(f"- RMSE: {rmse}")
print(f"- MAE: {mae}")
print(f"- R2 Score: {r2}")

  warn('Non-invertible starting MA parameters found.'
  warn('Non-stationary starting autoregressive parameters'

Model: ARIMA
- Best Parameters: (0, 0, 0)
- MSE: 372.09933140358584
- RMSE: 19.289876396793886
- MAE: 7.801479291950249
- R2 Score: -1.5717690651229077e-06
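
The selected order (0, 0, 0) has no AR terms, no MA terms, and no differencing, so the fitted model amounts to a constant level plus noise and its forecast is a flat line. An R2 score near zero is what a constant forecast produces. The sketch below, which uses hypothetical numbers rather than the sales data, computes R2 by hand to show why; the function name `r2` and both forecasts are illustrative assumptions.

```python
def r2(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [2.0, 4.0, 6.0, 8.0]  # hypothetical actual values (mean is 5.0)

# A flat forecast equal to the mean of the actual values scores exactly 0,
# because the residual and total sums of squares coincide.
r2_mean = r2(y_true, [5.0] * 4)

# A flat forecast slightly off that mean scores just below 0 (about -0.05).
r2_shifted = r2(y_true, [5.5] * 4)

print(r2_mean, r2_shifted)
```

A training-set mean rarely matches the test-set mean exactly, which is consistent with a score that is very slightly negative rather than exactly zero.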
