# **Statistical Arbitrage - Pairs Trading**

## **Theory**

### **From Arbitrage to Statistical Arbitrage**

At its core, **arbitrage** means exploiting price discrepancies between identical or similar assets to earn a *risk-free* profit.
In reality, truly risk-free opportunities are rare and short-lived. So, traders began extending the idea probabilistically: instead of guaranteed profits, they seek **expected** profits with **controlled risk** - this is **statistical arbitrage (stat-arb)**.

**Statistical arbitrage** is a quantitative trading strategy that uses statistical models to identify mispricings between related securities. It’s based not on perfect replication (as in classical arbitrage) but on the statistical expectation that price relationships will revert to some equilibrium over time. 
The key ingredients are:

* **A model** for the “fair” relationship between asset prices.
* **A deviation measure** (how far the actual prices stray from that model).
* **A trading rule** to exploit mean reversion.

The simplest and most famous form of stat-arb is **pairs trading**.

---

### **Pairs Trading: The Intuition**

In **pairs trading**, we pick two assets (say, two stocks in the same sector) whose prices tend to move together - perhaps Shell and BP.

1. We monitor their *relative price* - for example, the simple price difference or, more generally, a linear combination $S_t = P_{1,t} - \beta P_{2,t}$, where $\beta$ is the hedge ratio between the two assets.
2. When this spread deviates “too far” from its historical average, we bet on *convergence*:

   * Go long the underperformer (buy it).
   * Go short the outperformer (sell it).
3. When the spread returns to normal, we close both positions, ideally making a profit.

This relies on **mean reversion** - the assumption that relative prices don’t drift apart indefinitely but fluctuate around some equilibrium relationship.

But how can we **verify** whether two price series are genuinely tied together in the long run?
That’s where **cointegration** comes in.

---

### **Cointegration: The Statistical Foundation**

Most financial time series, like stock prices, are **non-stationary** - they follow something like a random walk.
That means they can drift arbitrarily far from their initial value. Correlations computed on price levels are then unreliable (and can be spurious), because the levels have no stable mean or variance.

However, sometimes two non-stationary series move *together* in such a way that a particular linear combination is **stationary** (mean-reverting).
Formally, if:
$$
X_t \sim I(1), \quad Y_t \sim I(1),
$$
but there exists some $\beta$ such that:
$$
Z_t = Y_t - \beta X_t \sim I(0),
$$
then $X_t$ and $Y_t$ are **cointegrated** with cointegrating vector $(1, -\beta)$.

In words: although each price series wanders, their long-run equilibrium relation keeps them tethered.
The deviation $Z_t$ is the “spread” - the variable we expect to revert to its mean. This is the signal that drives a pairs trade.
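
To make the definition concrete, here is a minimal simulation sketch, assuming NumPy is available. The function name `simulate_cointegrated_pair` and the parameter values ($\beta = 1.5$, AR(1) coefficient `phi = 0.9`) are illustrative assumptions, not taken from any real pair. It builds $X_t$ as a random walk and $Y_t = \beta X_t + Z_t$ with a stationary $Z_t$, so each series wanders while the spread stays bounded.

```python
import numpy as np

def simulate_cointegrated_pair(n=2000, beta=1.5, phi=0.9, seed=0):
    """Return (x, y, z) with x ~ I(1), z a stationary AR(1), and y = beta * x + z."""
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.normal(size=n))      # random walk, so x is I(1)
    shocks = rng.normal(size=n)
    z = np.zeros(n)
    for t in range(1, n):
        z[t] = phi * z[t - 1] + shocks[t]  # |phi| < 1 keeps z stationary
    y = beta * x + z                       # y inherits the stochastic trend of x
    return x, y, z

x, y, z = simulate_cointegrated_pair()
spread = y - 1.5 * x                       # recovers z: the I(0) combination
print(f"std of X levels: {x.std():.2f}, std of spread: {spread.std():.2f}")
```

In this toy setup the true $\beta$ is known by construction; with real prices it has to be estimated.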

---

### **Testing and Modeling Cointegration**

Several tools exist to detect and exploit cointegration:

* **Engle–Granger two-step test**:

  1. Regress $Y_t$ on $X_t$ to estimate $\hat{\beta}$.
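  2. Test the residuals $\hat{Z}_t = Y_t - \hat{\beta} X_t$ for stationarity using, e.g., the Augmented Dickey-Fuller (ADF) test. Because $\hat{\beta}$ is itself estimated, the residual-based test uses its own (Engle–Granger/MacKinnon) critical values rather than the standard ADF ones.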

* **Johansen test**:
  For multiple assets, it estimates the cointegration rank (the number of independent cointegrating relationships) and the cointegrating vectors within a vector error-correction model (VECM).
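
The two Engle–Granger steps can be sketched with NumPy alone. This is a simplified illustration, not a full implementation: the second step uses a plain Dickey-Fuller regression without lagged differences, the function name `engle_granger` is hypothetical, and the data are synthetic. The resulting statistic should be compared against Engle–Granger critical values (roughly $-3.34$ at the 5% level for two series with a constant, per MacKinnon's tables). In practice one would more likely rely on `adfuller` or `coint` from `statsmodels.tsa.stattools`, which handle lag selection and critical values.

```python
import numpy as np

def engle_granger(y, x):
    """Step 1: OLS of y on [1, x]. Step 2: Dickey-Fuller t-statistic on the residuals."""
    design = np.column_stack([np.ones(len(x)), x])
    (intercept, beta_hat), *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - intercept - beta_hat * x

    # DF regression without lags or constant: diff(resid)_t = rho * resid_{t-1} + e_t
    lagged, diff = resid[:-1], np.diff(resid)
    rho = (lagged @ diff) / (lagged @ lagged)
    err = diff - rho * lagged
    se = np.sqrt((err @ err) / (len(diff) - 1) / (lagged @ lagged))
    return beta_hat, rho / se

# Synthetic cointegrated pair (illustrative parameters, not real data)
rng = np.random.default_rng(1)
n = 1000
x = np.cumsum(rng.normal(size=n))          # I(1) regressor
z = np.zeros(n)
for t in range(1, n):
    z[t] = 0.8 * z[t - 1] + rng.normal()   # stationary AR(1) spread
y = 2.0 + 1.5 * x + z

beta_hat, df_stat = engle_granger(y, x)
print(f"beta_hat = {beta_hat:.3f}, DF statistic = {df_stat:.2f}")
```

A strongly negative statistic is evidence against a unit root in the residuals, that is, evidence in favor of cointegration.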

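If cointegration holds, we can model the short-run dynamics with an error-correction model, here in its simplest form:
$$
\Delta Y_t = \alpha (Y_{t-1} - \beta X_{t-1}) + \varepsilon_t,
$$
where $\alpha$ measures the **speed of adjustment** back to equilibrium. For $Y_t$ to correct toward equilibrium, $\alpha$ must be negative: when the spread is above zero, $Y_t$ tends to fall, and vice versa.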
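
As a rough sketch, $\alpha$ can be estimated by regressing $\Delta Y_t$ on the lagged spread. The example below assumes $\beta$ is already known (for instance from the first Engle–Granger step), uses synthetic data, and the helper name `estimate_adjustment_speed` is hypothetical. In this particular simulation the spread is an AR(1) with coefficient `phi`, so the estimate should land near `phi - 1`.

```python
import numpy as np

def estimate_adjustment_speed(y, x, beta):
    """OLS slope (no intercept) of delta-Y_t on the lagged spread Y_{t-1} - beta * X_{t-1}."""
    lagged_spread = (y - beta * x)[:-1]
    dy = np.diff(y)
    return (lagged_spread @ dy) / (lagged_spread @ lagged_spread)

# Synthetic cointegrated pair (illustrative parameters, not real data)
rng = np.random.default_rng(2)
n, beta, phi = 5000, 1.5, 0.8
x = np.cumsum(rng.normal(size=n))
z = np.zeros(n)
for t in range(1, n):
    z[t] = phi * z[t - 1] + rng.normal()
y = beta * x + z

alpha_hat = estimate_adjustment_speed(y, x, beta)
print(f"alpha_hat = {alpha_hat:.3f}  (phi - 1 = {phi - 1:.1f} in this simulation)")
```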

---

### **From Cointegration to Trading**

Once we have a stationary spread $Z_t = Y_t - \beta X_t$, we treat it as a *mean-reverting process*, often modeled as an Ornstein-Uhlenbeck (OU) process:
$$
dZ_t = \kappa (\mu - Z_t)dt + \sigma dW_t,
$$
where:

* $\kappa > 0$: mean-reversion speed,
* $\mu$: long-run mean (zero if the spread has been demeaned, e.g. by including an intercept in the regression),
* $\sigma$: volatility of the spread.
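
One common way to estimate these parameters, sketched here under the assumption of evenly spaced observations with step `dt`, uses the exact discretization of the OU process, which is an AR(1): $Z_{t+\Delta} = \mu + (Z_t - \mu)e^{-\kappa\Delta} + \text{noise}$. Fitting $Z_{t+\Delta} = a + b Z_t$ by least squares then gives $\kappa = -\ln(b)/\Delta$ and $\mu = a/(1-b)$. The function names and parameter values below are illustrative assumptions.

```python
import numpy as np

def simulate_ou(kappa, mu, sigma, dt, n, seed=0):
    """Simulate an OU path with the exact AR(1) transition."""
    rng = np.random.default_rng(seed)
    b = np.exp(-kappa * dt)
    step_sd = sigma * np.sqrt((1 - b**2) / (2 * kappa))
    z = np.empty(n)
    z[0] = mu
    for t in range(1, n):
        z[t] = mu + (z[t - 1] - mu) * b + step_sd * rng.normal()
    return z

def fit_ou(z, dt):
    """Estimate (kappa, mu, sigma) from an AR(1) fit; requires a fitted slope in (0, 1)."""
    prev, nxt = z[:-1], z[1:]
    b, a = np.polyfit(prev, nxt, 1)            # slope, intercept
    resid = nxt - (a + b * prev)
    kappa = -np.log(b) / dt
    mu = a / (1 - b)
    sigma = resid.std(ddof=2) * np.sqrt(2 * kappa / (1 - b**2))
    return kappa, mu, sigma

path = simulate_ou(kappa=0.1, mu=0.5, sigma=0.2, dt=1.0, n=5000)
kappa_hat, mu_hat, sigma_hat = fit_ou(path, dt=1.0)
half_life = np.log(2) / kappa_hat              # time for a deviation to halve, in units of dt
print(f"kappa={kappa_hat:.3f}, mu={mu_hat:.3f}, sigma={sigma_hat:.3f}, half-life={half_life:.1f}")
```

The half-life $\ln 2 / \kappa$ is a convenient summary: it indicates how long a typical deviation takes to decay by half, which helps judge whether a pair reverts fast enough to trade.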

The trading rule is straightforward:

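* Go **long the spread** (buy $Y$, sell $\beta$ units of $X$) when $Z_t$ is sufficiently far below its mean (expect it to rise).
* Go **short the spread** (sell $Y$, buy $\beta$ units of $X$) when $Z_t$ is sufficiently far above its mean (expect it to fall).
* Close the position when $Z_t$ reverts toward its mean.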
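
The rule above can be sketched as a z-score threshold strategy. The thresholds `entry_z` and `exit_z` and the function name `zscore_positions` are hypothetical choices for illustration. The z-score here uses the full-sample mean and standard deviation, which introduces look-ahead bias; a rolling or expanding window would be needed in a realistic backtest.

```python
import numpy as np

def zscore_positions(spread, entry_z=2.0, exit_z=0.5):
    """Return spread positions: +1 long, -1 short, 0 flat."""
    z = (spread - spread.mean()) / spread.std()
    positions = np.zeros(len(z), dtype=int)
    current = 0
    for t, zt in enumerate(z):
        if current == 0:
            if zt > entry_z:
                current = -1        # spread too high: short it
            elif zt < -entry_z:
                current = 1         # spread too low: go long
        elif abs(zt) < exit_z:
            current = 0             # spread has reverted: close the position
        positions[t] = current
    return positions

demo_spread = np.array([0.0, 3.0, 1.0, 0.0, -3.0, -1.0, 0.0])
print(zscore_positions(demo_spread, entry_z=1.5, exit_z=0.5).tolist())
# → [0, -1, -1, 0, 1, 1, 0]
```

Using a wider entry band than exit band avoids opening and closing positions on small fluctuations around the mean.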

Profit comes not from predicting the direction of either asset, but from betting on *relative convergence*.

## **Data**