
# ARIMAX Models

ARIMAX stands for Autoregressive Integrated Moving Average with eXogenous inputs. It extends the ARIMA model by adding external factors as additional regressors.

## Incorporating Exogenous Variables - X

In ARIMAX, the ‘X’ refers to the exogenous variables: external factors or covariates that may influence the time series but are not part of it. For instance, if you were predicting demand for umbrellas, rainfall could be an exogenous variable. Unlike the endogenous series, exogenous variables are not forecast by the model; they are supplied as inputs to help improve the accuracy of its predictions.

## Mathematical Notation

In mathematical terms, an ARIMAX(1, 1, 1) model with a single exogenous variable is written as:

$$
\Delta P_t = c + \phi_1 \Delta P_{t-1} + \theta_1 \varepsilon_{t-1} + \beta X_t + \varepsilon_t
$$

Where:
- $ \Delta P_t = P_t - P_{t-1} $ is the differenced time series value at time $ t $.
- $ c $ is a constant.
- $ \phi_1 $ is the coefficient for the first lag of the differenced series.
- $ \theta_1 $ is the coefficient for the first lag of the error term.
- $ \beta $ is the coefficient for the exogenous variable $ X_t $.
- $ \varepsilon_t $ is the error term at time $ t $; its estimated values are the residuals of the fitted model.
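
To make the recursion concrete, here is a minimal sketch that simulates a series directly from the equation above using NumPy. The function name `simulate_arimax_111`, the parameter values, and the choice of a white-noise exogenous input are all assumptions made for illustration.

```python
import numpy as np


def simulate_arimax_111(x, c=0.0, phi=0.5, theta=-0.3, beta=2.0, seed=0):
    """Simulate P_t from the ARIMAX(1, 1, 1) equation (illustrative values only)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    eps = rng.normal(size=n)  # epsilon_t, the error term
    dP = np.zeros(n)          # Delta P_t, the differenced series (Delta P_0 = 0)
    for t in range(1, n):
        dP[t] = c + phi * dP[t - 1] + theta * eps[t - 1] + beta * x[t] + eps[t]
    P = dP.cumsum()           # integrate the differences to recover the level P_t
    return P, dP, eps


# Hypothetical exogenous input: 200 draws of white noise
x = np.random.default_rng(1).normal(size=200)
P, dP, eps = simulate_arimax_111(x)
print(P[:5])
```

Differencing `P` recovers `dP`, which is the "I" (integrated) part of the model: the AR, MA, and exogenous terms act on the differences, not on the level.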

## Practical Considerations

1. The exogenous variable $ X $ can be:
   - A time-varying measurement (e.g., temperature, economic indicators).
   - A categorical variable (e.g., day of the week, event occurrence).
   - A Boolean value (e.g., presence or absence of an event).
   - A combination of several different factors.

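2. ARIMAX requires a value of every exogenous variable for every period you model. This includes the forecast horizon: to predict future periods, you must supply future values of $ X $, either known in advance (e.g., calendar effects) or forecast separately.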

3. When incorporating exogenous variables, it’s crucial to check that they genuinely help explain the series and are not just noise. Adding irrelevant X variables overfits the model and can lead to poor predictive performance.

## Implementation

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Set the random seed for reproducibility
np.random.seed(42)

# Lag-polynomial coefficients in the form expected by arma_generate_sample
ar_params = np.array([0.5])   # AR(1) parameter
ma_params = np.array([-0.3])  # MA(1) parameter
ar = np.r_[1, -ar_params]  # add zero-lag and negate AR params
ma = np.r_[1, ma_params]   # add zero-lag for MA params

# Generate 100 observations of the dependent variable from an ARIMA(1, 1, 1) process:
# simulate the ARMA(1, 1) differences, then integrate them with a cumulative sum
y_diff = arma_generate_sample(ar, ma, nsample=100)
y = y_diff.cumsum()

# Generate a synthetic exogenous variable (e.g., temperature, economic index).
# It is simulated independently of y, so its coefficient is not expected to be significant.
x = np.random.normal(loc=0, scale=1, size=100).cumsum()

# Fit an ARIMAX model (ARIMA(1, 1, 1) order plus the exogenous variable)
model = SARIMAX(y, order=(1, 1, 1), exog=x)
results = model.fit(disp=False)  # disp=False suppresses the optimizer's iteration log

# Summarize the model results
print(results.summary())
```

In this example:
- `y` represents the dependent variable simulated to follow an ARIMA process.
- `x` is the exogenous variable we have generated, which could represent any external factor that might affect `y`.
- We fit an ARIMAX model to `y` with `x` as the exogenous input.
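- The `SARIMAX` class is used because it is the general-purpose class in `statsmodels` that covers ARIMA, seasonal ARIMA, and models with exogenous regressors. Note that `SARIMAX` estimates the `exog` effect as a regression with ARIMA errors, which is closely related to, but not identical to, the equation shown earlier.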

## Conclusion


In summary, ARIMAX is an ARIMA model with the added ability to include external factors that might influence the time series you are modeling. When those factors have a real effect on the series, and their values are available for the periods being forecast, including them can produce more accurate predictions.