# ARIMA Modelling

An ARIMA (AutoRegressive Integrated Moving Average) model is a statistical model for time series forecasting. It combines three components: an autoregressive (AR) part, which regresses the series on its own past values; an integrated (I) part, which differences the series to remove trends; and a moving average (MA) part, which models dependence on past forecast errors. The forecast therefore relies only on the historical values of the target series.
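For reference, the three components can be written as a single equation. This is the standard lag-operator form of an ARIMA(p, d, q) model; the symbols are conventional notation and are not defined elsewhere in this notebook:

$$
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) (1 - L)^d \, y_t = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) \varepsilon_t
$$

Here $L$ is the lag operator ($L y_t = y_{t-1}$), $p$ is the AR order with coefficients $\phi_i$, $d$ is the order of differencing, $q$ is the MA order with coefficients $\theta_j$, $c$ is a constant and $\varepsilon_t$ is white noise.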

## Stationarity

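The AR and MA components of ARIMA assume a stationary series, that is, one whose mean, variance and autocorrelation structure do not change over time. If the data is not stationary, it must first be made stationary. This is usually done by differencing: replacing each value with its change from the previous value.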

The right order of differencing is the minimum order needed to obtain a near-stationary series that fluctuates around a well-defined mean and whose ACF (autocorrelation function) plot decays to zero fairly quickly. If the autocorrelations remain positive for many lags (10 or more), the series needs further differencing. Conversely, if the lag-1 autocorrelation is strongly negative (roughly -0.5 or below), the series is probably over-differenced.
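The heuristics above can be illustrated on synthetic data. The sketch below compares the sample autocorrelations of a random walk, its first difference and its second difference. `sample_acf` is a small hypothetical helper written for this example (a hand-rolled sample ACF using only NumPy), and the data is simulated rather than taken from the LOB dataset.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation of x for lags 0..nlags (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array(
        [np.sum(x[k:] * x[:len(x) - k]) / denom for k in range(nlags + 1)]
    )

# Simulated data: white noise accumulated into a random walk
rng = np.random.default_rng(1)
noise = rng.normal(size=2000)

walk = np.cumsum(noise)      # non-stationary: needs one difference
diff1 = np.diff(walk)        # differenced once: back to white noise
diff2 = np.diff(walk, n=2)   # differenced twice: over-differenced

for name, series in [("original", walk), ("d=1", diff1), ("d=2", diff2)]:
    acf_vals = sample_acf(series, nlags=10)
    print(f"{name:>8}: lag-1 = {acf_vals[1]: .2f}, lag-10 = {acf_vals[10]: .2f}")
```

The undifferenced walk should still show a large positive autocorrelation at lag 10, the once-differenced series should show autocorrelations near zero, and the twice-differenced series should show a lag-1 autocorrelation close to -0.5, which is the over-differencing signature described above.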

In [1]:
# Import required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

In [2]:
# Read in the limit order book (LOB) data
lob_df = pd.read_csv('data/output/EDA_lob_output_data_sample.csv')

# Preview the first five rows
lob_df.head()

Unnamed: 0,Timestamp,Exchange,Bid,Ask,Date,Mid_Price
0,0.0,Exch0,[],[],2025-01-02,
1,0.279,Exch0,"[[1, 6]]",[],2025-01-02,
2,1.333,Exch0,"[[1, 6]]","[[800, 1]]",2025-01-02,400.5
3,1.581,Exch0,"[[1, 6]]","[[799, 1]]",2025-01-02,400.0
4,1.643,Exch0,"[[1, 6]]","[[798, 1]]",2025-01-02,399.5


In [4]:
# Sort chronologically: by date first, then by timestamp within each date
lob_df.sort_values(by=['Date','Timestamp'], ascending=True, inplace=True)