# Notebook 03A — Diagnostics & Model Justification

**Adaptive Pair Trading Project**  
Ayush Arora (MQMS2404)

---

### Why this notebook exists

This notebook is added **after Notebook 03 (Spread Construction)** in response to supervisor feedback.

It answers three questions:
1. Are the series stationary or non-stationary?
2. Do covariates (such as the overall market) influence the co-movement of the pair?
3. Is ARCH/GARCH modeling actually required?

No prior results or code are modified.

## Cell 1: Import required libraries

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import yfinance as yf
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import het_arch

## Cell 2: Load outputs from previous notebooks

We load the cleaned price data and the spread constructed earlier.

In [4]:
# Forward slashes keep the paths portable and avoid the invalid '\s' escape sequence
prices = pd.read_csv('data/prices.csv', index_col=0, parse_dates=True)
spread = pd.read_csv('data/spread_tatasteel_hindalco.csv', index_col=0, parse_dates=True).iloc[:, 0]

prices.head(), spread.head()

(                 ABB.NS  ADANIENT.NS  ADANIPORTS.NS  AMBUJACEM.NS  \
 Date                                                                
 2015-01-01  1131.143555    70.761368     301.772858    195.773010   
 2015-01-02  1119.410767    71.107956     301.584015    198.523697   
 2015-01-05  1119.671143    72.284912     305.786499    198.265808   
 2015-01-06  1109.502930    71.736145     303.944946    190.959244   
 2015-01-07  1093.510986    71.100731     303.236633    189.197098   
 
             APOLLOHOSP.NS  ASHOKLEY.NS  ASIANPAINT.NS  AUROPHARMA.NS  \
 Date                                                                   
 2015-01-01    1084.179443    21.544609     684.931519     532.220825   
 2015-01-02    1086.056763    22.124655     708.611450     534.770630   
 2015-01-05    1089.667969    23.947659     708.565674     534.510925   
 2015-01-06    1057.216675    23.429758     691.651550     513.663208   
 2015-01-07    1065.498047    24.714153     705.548523     521.808777 

## Cell 3: Stationarity check — Prices vs Spread

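If the pair is cointegrated, each price series should be non-stationary (the ADF test fails to reject a unit root) while the spread should be stationary (the ADF test rejects a unit root).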

In [5]:
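def adf_test(series, name):
    """Run the Augmented Dickey-Fuller test and print the statistic and p-value."""
    # adfuller returns (statistic, p-value, lags used, n obs, critical values, icbest)
    stat, pvalue, *_ = adfuller(series.dropna())
    print(f"{name}: ADF = {stat:.4f}, p-value = {pvalue:.6f}")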

adf_test(prices['TATASTEEL.NS'], 'TATASTEEL Price')
adf_test(prices['HINDALCO.NS'], 'HINDALCO Price')
adf_test(spread, 'Spread')

TATASTEEL Price: ADF = -0.6029, p-value = 0.870315
HINDALCO Price: ADF = -0.4836, p-value = 0.895186
Spread: ADF = -4.6887, p-value = 0.000088
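
To make the mechanics behind these numbers concrete, here is a minimal sketch of the regression that underlies the test, run on simulated data rather than the project's prices. It assumes a plain Dickey-Fuller regression with a constant and no lagged differences (`adfuller` adds lagged differences and computes proper p-values), and it uses the approximate asymptotic 5% critical value of -2.86. The helper name `df_tstat` and the simulated series are hypothetical and only serve as an illustration.

```python
import math
import random


def df_tstat(y):
    """t-statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t  (no lagged differences)."""
    x = y[:-1]                                    # lagged level y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # first difference y_t - y_{t-1}
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    gamma = sxy / sxx                             # OLS slope
    alpha = md - gamma * mx                       # OLS intercept
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)           # standard error of the slope
    return gamma / se


rng = random.Random(0)
walk, ar1 = [0.0], [0.0]
for _ in range(2000):
    walk.append(walk[-1] + rng.gauss(0, 1))       # random walk: has a unit root
    ar1.append(0.9 * ar1[-1] + rng.gauss(0, 1))   # AR(1) with phi = 0.9: mean-reverting

CRIT_5PCT = -2.86  # approximate asymptotic 5% critical value (constant, no trend)
t_walk, t_ar1 = df_tstat(walk), df_tstat(ar1)
print(f"random walk:    t = {t_walk:.2f}, reject unit root: {t_walk < CRIT_5PCT}")
print(f"mean-reverting: t = {t_ar1:.2f}, reject unit root: {t_ar1 < CRIT_5PCT}")
```

A strongly negative statistic, as reported for the spread above, is what rejects the unit root; a statistic close to zero, as reported for the two price series, does not.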


## Cell 4: Market covariate check (NIFTY)

We regress daily changes in the spread on daily NIFTY 50 returns to test whether the spread is driven by overall market movements.

In [6]:
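# Daily NIFTY 50 closing levels over the sample period
nifty = yf.download('^NSEI', start='2015-01-01', end='2025-01-01')['Close']
# First difference of the spread (a change in levels, not a percentage return)
spread_ret = spread.diff().dropna()
# Daily percentage return of the index
nifty_ret = nifty.pct_change().dropna()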

df = pd.concat([spread_ret, nifty_ret], axis=1).dropna()
df.columns = ['spread_ret', 'nifty_ret']

X = sm.add_constant(df['nifty_ret'])
market_model = sm.OLS(df['spread_ret'], X).fit()
market_model.summary()



| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable | spread_ret | R-squared | 0.002 |
| Model | OLS | Adj. R-squared | 0.002 |
| Method | Least Squares | F-statistic | 5.838 |
| Date | Fri, 16 Jan 2026 | Prob (F-statistic) | 0.0158 |
| Time | 07:43:59 | Log-Likelihood | -4252.1 |
| No. Observations | 2458 | AIC | 8508.0 |
| Df Residuals | 2456 | BIC | 8520.0 |
| Df Model | 1 | | |
| Covariance Type | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0011 | 0.028 | 0.039 | 0.969 | -0.053 | 0.055 |
| nifty_ret | -6.3518 | 2.629 | -2.416 | 0.016 | -11.507 | -1.197 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus | 930.545 | Durbin-Watson | 2.026 |
| Prob(Omnibus) | 0.0 | Jarque-Bera (JB) | 36325.171 |
| Skew | 1.09 | Prob(JB) | 0.0 |
| Kurtosis | 21.706 | Cond. No. | 95.5 |
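
As a reading aid for the regression output, the short sketch below redoes the arithmetic behind the `nifty_ret` row using only the figures reported in the summary. The variable names are illustrative, and the unit of the spread change depends on how the spread was built in Notebook 03, which is not shown here.

```python
# Figures copied from the OLS summary above
coef, std_err, r_squared = -6.3518, 2.629, 0.002

t_stat = coef / std_err                  # reproduces the reported t-statistic
spread_change_per_1pct = coef * 0.01     # spread change associated with a +1% NIFTY return
share_explained_pct = r_squared * 100    # share of variance in spread changes explained

print(f"t = {t_stat:.3f}")                                           # t = -2.416
print(f"spread change per +1% NIFTY: {spread_change_per_1pct:.4f}")  # -0.0635
print(f"variance explained: {share_explained_pct:.1f}%")             # 0.2%
```

The slope is statistically distinguishable from zero at the 5% level, yet the market return accounts for only about 0.2% of the variation in daily spread changes.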


## Cell 5: Stationarity of market-adjusted residual

In [7]:
residuals = market_model.resid
adf_test(residuals, 'Market-adjusted spread residual')

Market-adjusted spread residual: ADF = -9.9631, p-value = 0.000000


## Cell 6: ARCH test on spread returns

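Engle's ARCH LM test checks the spread changes for conditional heteroskedasticity (volatility clustering). Rejecting the null hypothesis of no ARCH effects justifies GARCH modeling.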

In [8]:
# het_arch returns (LM statistic, LM p-value, F statistic, F p-value)
arch_stat, arch_pvalue, _, _ = het_arch(spread_ret)
print(f'ARCH test p-value: {arch_pvalue:.6f}')

ARCH test p-value: 0.000000
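
The idea behind the test can be sketched in a few lines: regress the squared (demeaned) series on its own lagged squares and compute LM = n · R². The example below is a simplified illustration on simulated data, assuming a single lag, whereas `het_arch` chooses its own lag count unless `nlags` is passed. The helper name `arch_lm_stat` and the simulation parameters are hypothetical. With one lag, the statistic is compared with 3.841, the 5% critical value of a chi-square distribution with one degree of freedom.

```python
import math
import random


def arch_lm_stat(e):
    """LM = n * R^2 from regressing e_t^2 on a constant and e_{t-1}^2 (one lag only)."""
    m = sum(e) / len(e)
    sq = [(v - m) ** 2 for v in e]        # squared demeaned observations
    x, y = sq[:-1], sq[1:]                # lagged and current squares
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)           # R^2 of a one-regressor OLS = squared correlation
    return n * r2


rng = random.Random(1)
iid = [rng.gauss(0, 1) for _ in range(3000)]       # constant variance

arch = [0.0]
for _ in range(3000):
    var = 0.5 + 0.5 * arch[-1] ** 2                # ARCH(1): variance depends on the last shock
    arch.append(math.sqrt(var) * rng.gauss(0, 1))

CHI2_1DF_5PCT = 3.841
lm_iid, lm_arch = arch_lm_stat(iid), arch_lm_stat(arch)
print(f"iid series:  LM = {lm_iid:.2f}")
print(f"ARCH series: LM = {lm_arch:.2f}, ARCH effects detected: {lm_arch > CHI2_1DF_5PCT}")
```

A large LM statistic (equivalently, a p-value near zero, as in the cell above) indicates that large moves tend to follow large moves.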


## Final Interpretation Guide

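- Prices non-stationary + spread stationary → consistent with cointegration (here: price p-values 0.87 and 0.90, spread p-value 0.000088)
- Market-adjusted residual stationary and R² ≈ 0.002 → spread changes are not purely market-driven, although the NIFTY coefficient is significant at the 5% level (p = 0.016)
- ARCH p-value < 0.05 → volatility clustering → GARCH justified (here: p-value prints as 0.000000, so GARCH is justified)
- ARCH p-value ≥ 0.05 → skip GARCH and proceed to ML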
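
The decision rules in this guide can be written down as a small helper. This is only a sketch: the function name `interpret` and the dictionary keys are hypothetical, the 0.05 threshold is the one used in the guide, and the inputs are the p-values printed in the cells above.

```python
ALPHA = 0.05  # significance threshold used in the guide


def interpret(price_pvalues, spread_pvalue, arch_pvalue, alpha=ALPHA):
    """Apply the guide's rules to ADF and ARCH p-values."""
    prices_nonstationary = all(p >= alpha for p in price_pvalues)  # fail to reject a unit root
    spread_stationary = spread_pvalue < alpha                      # reject a unit root
    return {
        "cointegration_consistent": prices_nonstationary and spread_stationary,
        "garch_justified": arch_pvalue < alpha,
    }


# p-values reported earlier in this notebook
result = interpret([0.870315, 0.895186], 0.000088, 0.000000)
print(result)  # {'cointegration_consistent': True, 'garch_justified': True}
```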