# Pair Trading Strategy Research

Armaan Gandhara | agandhara243@gmail.com | armaangandhara.me

09/2025

## Config & Utils

*Purpose*: Centralized parameters, imports, plot styles, and small helpers reused across the notebook. This keeps later sections focused on research and backtesting logic rather than boilerplate.

What's inside:
- Project config (`Config` dataclass): dates, universe, paths, risk-free rate, frequency
- Reproducibility: seed setter
- Plot style: consistent figure defaults
- Helpers: annualization factor, returns, rolling z-score, drawdown and risk metrics, alignment utilities
- Lightweight disk cache utility for the later data-ingest step
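For reference, these are the quantities the helpers work with. The symbols are:

- $P_t$: the price
- $x_t$: the series being standardized
- $w$: the rolling window
- $A$: the annualization factor (252 for daily bars)
- $r_f$: the annual risk-free rate
- $W_t$: the equity curve

The z-score matches `zscore_rolling`, which uses the population standard deviation over a window that includes $t$. The Sharpe ratio and drawdown lines use the conventional definitions. They are an assumption here, because the risk-metric part of the code cell is still empty.

$$
\begin{aligned}
r_t &= \ln P_t - \ln P_{t-1}, \qquad R_t = \frac{P_t}{P_{t-1}} - 1, \\
\mu_t &= \frac{1}{w}\sum_{i=0}^{w-1} x_{t-i}, \qquad
\sigma_t = \sqrt{\frac{1}{w}\sum_{i=0}^{w-1}\left(x_{t-i}-\mu_t\right)^2}, \qquad
z_t = \frac{x_t-\mu_t}{\sigma_t}, \\
\mathrm{SR} &= \sqrt{A}\,\frac{\operatorname{mean}\left(r_t - r_f/A\right)}{\operatorname{sd}\left(r_t - r_f/A\right)}, \qquad
\mathrm{DD}_t = \frac{W_t}{\max_{s\le t} W_s} - 1.
\end{aligned}
$$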

### Usage Example

### Code

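```python
# =======================
# Config & Utils
# =======================

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
import os
import json
import hashlib
import warnings
from typing import Iterable, Tuple, Optional, Dict
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Suppress all warnings (incl. pandas FutureWarnings) to keep notebook output clean
warnings.filterwarnings("ignore")

# ---------- Config ----------

@dataclass
class Config:
    """Project-wide research parameters shared by every section."""
    start: str                  # start date of the study period
    end: str                    # end date of the study period
    tickers: Iterable[str]      # trading universe
    data_dir: str = "data"      # root folder for data files and the cache
    freq: str = "D"             # bar frequency code (see annualization_factor)
    trading_days: int = 252     # trading days per year
    rf_annual: float = 0.0      # annual risk-free rate, as a decimal

    def __post_init__(self) -> None:
        # Store the universe as a tuple so a generator is not exhausted after one pass
        self.tickers = tuple(self.tickers)

    def path(self) -> Path:
        """Return the data directory, creating it and its cache/ subfolder if missing."""
        p = Path(self.data_dir)
        (p / "cache").mkdir(parents=True, exist_ok=True)  # parents=True also creates p
        return p

# ---------- Repro/Style ----------

def set_seed(seed: int = 42) -> None:
    """Seed Python's `random` module and NumPy's legacy global RNG."""
    # Generators created with np.random.default_rng() are not affected and need their own seed
    np.random.seed(seed)
    random.seed(seed)

def set_plot_style() -> None:
    """Apply shared matplotlib defaults so every figure looks consistent."""
    plt.rcParams.update({
        "figure.figsize": (10, 5),
        "axes.grid": True,
        "grid.alpha": 0.3,
        "font.size": 11,
        "axes.spines.top": False,
        "axes.spines.right": False,
    })

# ---------- Frequencies/Annualization ----------

# Periods per year for each supported frequency code
_ANNUALIZE = {
    "D": 252,   # daily (trading days)
    "B": 252,   # business days
    "W": 52,    # weekly
    "M": 12,    # monthly
}

def annualization_factor(freq: str) -> int:
    """Periods per year for `freq`; unknown codes fall back to 252 (daily)."""
    return _ANNUALIZE.get(freq.upper(), 252)

# ---------- Returns & Z-Score ----------

def compute_returns(prices: pd.DataFrame, method: str = "log") -> pd.DataFrame:
    """
    Compute log or simple returns from price levels.

    The first row is NaN, and +/-inf (e.g. from zero prices) is replaced with NaN.
    """
    if method not in {"log", "simple"}:
        raise ValueError("method must be 'log' or 'simple'")
    px = prices.sort_index()
    if method == "log":
        rets = np.log(px).diff()
    else:
        # fill_method=None: leave gaps as NaN instead of forward-filling,
        # matching the log branch (and avoiding pandas' deprecated default)
        rets = px.pct_change(fill_method=None)
    return rets.replace([np.inf, -np.inf], np.nan)

def zscore_rolling(x: pd.Series, window: int) -> pd.Series:
    """
    Rolling z-score: (x - rolling mean) / rolling population std (ddof=0).

    Each window includes the current observation; the first window - 1 values are NaN.
    """
    mu = x.rolling(window).mean()
    sigma = x.rolling(window).std(ddof=0)
    # A flat window has sigma == 0; return NaN there instead of +/-inf
    return (x - mu) / sigma.replace(0.0, np.nan)

# ---------- Drawdowns & Risk Metrics ----------

# ---------- Alignment/Cleaning ----------

# ---------- Lightweight Disk Cache ----------
```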

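The Drawdowns & Risk Metrics section of the code cell is empty. The sketch below shows one way to fill it. It assumes a pandas Series of periodic returns and the conventional definitions:

- Drawdown is measured from the running peak of the compounded equity curve.
- The Sharpe ratio uses the sample standard deviation, with the annual risk-free rate spread evenly across periods.

The names `drawdown`, `max_drawdown`, `annualized_vol`, and `sharpe_ratio` are hypothetical and are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def drawdown(returns: pd.Series, log: bool = True) -> pd.Series:
    """Drawdown of the equity curve implied by periodic returns (0 at new highs)."""
    r = returns.fillna(0.0)  # treat missing periods as flat
    # Log returns compound by summing; simple returns by multiplying (1 + r)
    equity = np.exp(r.cumsum()) if log else (1.0 + r).cumprod()
    return equity / equity.cummax() - 1.0


def max_drawdown(returns: pd.Series, log: bool = True) -> float:
    """Most negative drawdown, e.g. -0.2 for a 20% peak-to-trough loss."""
    return float(drawdown(returns, log=log).min())


def annualized_vol(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Sample standard deviation of returns scaled by sqrt(periods per year)."""
    return float(returns.dropna().std(ddof=1) * np.sqrt(periods_per_year))


def sharpe_ratio(returns: pd.Series, rf_annual: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe; rf_annual is spread evenly across periods (an approximation)."""
    excess = returns.dropna() - rf_annual / periods_per_year
    sd = excess.std(ddof=1)
    if not np.isfinite(sd) or sd == 0:
        return float("nan")  # undefined for flat or too-short series
    return float(np.sqrt(periods_per_year) * excess.mean() / sd)
```

In the notebook, `periods_per_year` would normally come from `annualization_factor(cfg.freq)` and `rf_annual` from `cfg.rf_annual`. The choices for this sketch are:

- `log=True` is the default, matching the default of `compute_returns`.
- Missing periods are treated as zero return so the equity curve stays continuous.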

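The Alignment/Cleaning section is also empty. The sketch below shows one possible pair of helpers:

- `align_prices` cleans a wide price table (dates by tickers).
- `align_pair` restricts two legs of a candidate pair to their shared dates.

The function names and the `max_ffill` / `min_coverage` defaults are hypothetical choices, not the author's.

```python
from typing import Tuple

import pandas as pd


def align_prices(prices: pd.DataFrame, max_ffill: int = 0,
                 min_coverage: float = 0.95) -> pd.DataFrame:
    """De-duplicate, sort, optionally gap-fill, drop sparse tickers, keep common dates."""
    # Keep the last row for any duplicated timestamp, then sort chronologically
    px = prices[~prices.index.duplicated(keep="last")].sort_index()
    if max_ffill > 0:
        px = px.ffill(limit=max_ffill)           # fill at most max_ffill consecutive gaps
    coverage = px.notna().mean()                 # share of non-missing rows per ticker
    px = px.loc[:, coverage >= min_coverage]     # drop tickers with too much missing data
    return px.dropna(how="any")                  # keep dates where every ticker has a price


def align_pair(y: pd.Series, x: pd.Series) -> Tuple[pd.Series, pd.Series]:
    """Restrict two price series to their common, non-missing dates."""
    both = pd.concat([y, x], axis=1, join="inner").dropna()
    return both.iloc[:, 0], both.iloc[:, 1]
```

Forward-filling is off by default. That way, stale prices do not create artificial zero returns in a spread.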

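For the empty Lightweight Disk Cache section, here is a minimal sketch. It keys each cached DataFrame by a SHA-256 hash of its parameters, using the `json` and `hashlib` modules the cell already imports. The assumptions are:

- `cache_key` and `cached_frame` are hypothetical names.
- Pickle is an assumed storage format, chosen to avoid a parquet-engine dependency.

Only load pickle files you wrote yourself.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Union

import pandas as pd


def cache_key(**params) -> str:
    """Short, order-independent hash of keyword parameters."""
    payload = json.dumps(params, sort_keys=True, default=str)  # sort_keys: kwarg order irrelevant
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def cached_frame(cache_dir: Union[str, Path],
                 loader: Callable[[], pd.DataFrame], **params) -> pd.DataFrame:
    """Return the DataFrame cached under `params`, calling `loader()` only on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(**params)}.pkl"
    if path.exists():
        return pd.read_pickle(path)  # cache hit
    df = loader()                    # cache miss: fetch, then persist
    df.to_pickle(path)
    return df
```

In the notebook, `cache_dir` would naturally be `cfg.path() / "cache"`, the folder that `Config.path()` creates. List-valued parameters such as tickers are hashed in the order given. For example, `["A", "B"]` and `["B", "A"]` map to different cache files.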

## Data Ingest

## Pair Selection

## Hedge Ratio & Spread

## OU Check

## Signals & Sizing

## Cost and Execution

## Walk-Forward Backtest

## Results

## Factor Neutrality

## Sensitivity Sweeps

## Regime Splits

## OOS Holdout

## Beta Stability

## Structural Breaks

## Cost Stress Test

## ML Ranker for Pairs

## Intraday Extensions