# 🧠 Project: Statistical Arbitrage Benchmarking Using Pair Trading (SBI & IDFC First Bank)

This project benchmarks a basic statistical arbitrage (pairs trading) strategy on SBI and IDFC First Bank. We regress SBI's price on IDFC First Bank's price, treat the regression residual as the spread, and generate trading signals when the spread's Z-score moves beyond ±1. The strategy is evaluated with the Sharpe ratio, CAGR, and maximum drawdown over five years of daily data (2020-01-01 to 2025-01-01).

In [None]:
import yfinance as yf
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')  # 'seaborn' was renamed in Matplotlib 3.6; use 'seaborn' on older versions

## Step 1: Load 5-Year Historical Data

In [None]:
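tickers = ['SBIN.NS', 'IDFCFIRSTB.NS']
# auto_adjust=False keeps the "Adj Close" column (recent yfinance versions default to auto_adjust=True)
raw = yf.download(tickers, start="2020-01-01", end="2025-01-01", auto_adjust=False)
# Ticker columns can come back in a different order than requested (typically alphabetical),
# so rename by ticker instead of assigning names by position
data = raw["Adj Close"].rename(columns={'SBIN.NS': 'SBI', 'IDFCFIRSTB.NS': 'IDFC'}).dropna()
data = data[['SBI', 'IDFC']]
data.head()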

## Step 2: Linear Regression to Model Price Relationship

In [None]:
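# Regress SBI on IDFC: the slope is the hedge ratio and the residual is the spread
X = sm.add_constant(data['IDFC'])
model = sm.OLS(data['SBI'], X).fit()
data['Predicted_SBI'] = model.predict(X)
data['Spread'] = data['SBI'] - data['Predicted_SBI']
# Note: the regression and the Z-score both use the full sample, so the signals contain look-ahead information
data['Z_Score'] = (data['Spread'] - data['Spread'].mean()) / data['Spread'].std()
data[['SBI', 'IDFC', 'Predicted_SBI', 'Spread', 'Z_Score']].head()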
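
To see what the regression step produces without downloading anything, here is a minimal sketch on synthetic prices. The two series, the seed, and the "true" intercept and slope are all made up for illustration and are not SBI or IDFC data. It uses `np.polyfit` as a stand-in for the `statsmodels` OLS fit; with one regressor and an intercept, both solve the same least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-ins for the two price series (assumption: not real market data)
n = 500
x = 50 + np.cumsum(rng.normal(0, 1, n))                 # random-walk "IDFC"-like series
true_alpha, true_beta = 100.0, 5.0                      # hypothetical intercept and hedge ratio
y = true_alpha + true_beta * x + rng.normal(0, 2, n)    # "SBI"-like series tied to x plus noise

# Least-squares fit of y on x with an intercept; for deg=1 polyfit returns [slope, intercept]
beta, alpha = np.polyfit(x, y, 1)

# Spread = actual minus fitted value; Z-score standardizes it with full-sample statistics
spread = y - (alpha + beta * x)
z_score = (spread - spread.mean()) / spread.std(ddof=1)  # ddof=1 matches pandas' default .std()

print(f"estimated hedge ratio: {beta:.3f}")
print(f"share of days with |Z| > 1: {np.mean(np.abs(z_score) > 1):.2%}")
```

The estimated slope should land close to the hypothetical `true_beta`, and the Z-score has mean 0 and standard deviation 1 by construction, which is why fixed thresholds such as ±1 are comparable across pairs.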

## Step 3: Identify Signal Dates Based on Z-Score Thresholds

In [None]:
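signal_dates = data[(data['Z_Score'] > 1) | (data['Z_Score'] < -1)].copy()
signal_dates['Next_Spread'] = data['Spread'].shift(-5)  # Spread 5 trading days later (exit of a 5-day hold)
# Bet on mean reversion: short the spread when Z > 1, go long the spread when Z < -1
signal_dates['Position'] = -np.sign(signal_dates['Z_Score'])
# PnL is in price units (INR per unit of spread), not a percentage return
signal_dates['PnL'] = signal_dates['Position'] * (signal_dates['Next_Spread'] - signal_dates['Spread'])
signal_dates.dropna(inplace=True)
signal_dates[['Z_Score', 'Position', 'PnL']].head()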
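
The sign convention is easy to get backwards, so a small worked example may help. This is a sketch, not part of the notebook's pipeline: `spread_trade_pnl` is a hypothetical helper, and the spread values are invented. It assumes the mean-reversion reading of the thresholds, where a high Z-score means the spread is expected to fall and a low one means it is expected to rise.

```python
def spread_trade_pnl(z, spread_entry, spread_exit, threshold=1.0):
    """PnL per unit of spread for a mean-reversion trade (hypothetical helper)."""
    if abs(z) <= threshold:
        return 0.0                       # inside the band: no trade
    position = -1 if z > 0 else 1        # short a rich spread, long a cheap one
    return position * (spread_exit - spread_entry)

# Spread is rich at entry and falls back toward zero: the short position profits
print(spread_trade_pnl(z=1.5, spread_entry=12.0, spread_exit=4.0))     # 8.0

# Spread is cheap at entry and recovers: the long position profits
print(spread_trade_pnl(z=-2.0, spread_entry=-15.0, spread_exit=-5.0))  # 10.0

# Spread is rich at entry and widens further: the short position loses
print(spread_trade_pnl(z=1.5, spread_entry=12.0, spread_exit=20.0))    # -8.0
```

A single formula such as `exit - entry` applied to both sides would record the first case as a loss, which is why the position sign has to depend on the sign of the Z-score.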

## Step 4: Evaluate Backtest Metrics

In [None]:
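# PnL is in INR per spread unit; divide by gross capital (1 SBI share + |beta| IDFC shares) to get a per-trade return
capital = signal_dates['SBI'] + abs(model.params['IDFC']) * signal_dates['IDFC']
returns = signal_dates['PnL'] / capital
# Trades last 5 days, so annualize with 252 / 5 periods; overlapping trades make these figures rough
sharpe_ratio = returns.mean() / returns.std() * np.sqrt(252 / 5)
cumulative_returns = (1 + returns).cumprod()
years = (data.index[-1] - data.index[0]).days / 365.25  # actual span of the sample
cagr = cumulative_returns.iloc[-1] ** (1 / years) - 1
max_drawdown = (cumulative_returns / cumulative_returns.cummax() - 1).min()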

print(f"Sharpe Ratio: {sharpe_ratio:.2f}")
print(f"CAGR: {cagr:.2%}")
print(f"Max Drawdown: {max_drawdown:.2%}")
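
As a sanity check on the three metrics, the sketch below applies the same formulas to a four-period toy return series whose answers can be verified by hand. `performance_metrics` is a hypothetical helper, the returns are invented, and it assumes the inputs are fractional per-period returns (for example 0.10 for 10%), not PnL in currency units.

```python
import numpy as np

def performance_metrics(returns, periods_per_year):
    """Sharpe ratio, CAGR and max drawdown from per-period returns (hypothetical helper)."""
    returns = np.asarray(returns, dtype=float)
    sharpe = returns.mean() / returns.std(ddof=1) * np.sqrt(periods_per_year)
    equity = np.cumprod(1 + returns)                  # growth of 1 unit of capital
    years = len(returns) / periods_per_year
    cagr = equity[-1] ** (1 / years) - 1
    drawdown = equity / np.maximum.accumulate(equity) - 1  # decline from the running peak
    return sharpe, cagr, drawdown.min()

# Four quarterly returns, i.e. exactly one year of data
toy_returns = [0.10, -0.50, 0.20, 0.05]
sharpe, cagr, max_dd = performance_metrics(toy_returns, periods_per_year=4)

print(f"Sharpe Ratio: {sharpe:.2f}")
print(f"CAGR: {cagr:.2%}")
print(f"Max Drawdown: {max_dd:.2%}")
```

By hand, the equity curve is 1.10, 0.55, 0.66, 0.693, so the worst decline from the peak is 0.55 / 1.10 - 1 = -50% and the one-year growth is 0.693 - 1 = -30.7%. Feeding raw currency PnL into `(1 + returns).cumprod()` instead would compound numbers that are not returns and produce meaningless results.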