# GGR Distance Method - Pair Trading Backtester

This notebook implements the **Gatev, Goetzmann, and Rouwenhorst (GGR) Distance Method** for statistical pairs trading. The GGR method is one of the foundational approaches to pairs trading and was published in the authors' 2006 paper "Pairs Trading: Performance of a Relative-Value Arbitrage Rule."

## The GGR Distance Method

The strategy operates in two phases:

### 1. Formation Period (12 months)
- Normalize price series (divide by initial price)
- Calculate **Sum of Squared Differences (SSD)** between all pairs
- Select pairs with the **lowest SSD** (most similar historical behavior)
- Calculate **static σ** (standard deviation) of the spread for each pair

### 2. Trading Period (6 months in the original paper; about 12 months in this notebook)
- Calculate **spread** between paired stocks (normalized)
- Compute **distance** from parity by dividing the spread by the **static formation σ**
- **Entry**: Open a position when |distance| > 2, i.e. the spread is more than 2σ away from zero
- **Exit**: Close when the spread **crosses zero** (prices converge/cross)

**Key GGR Rules:**
- σ is calculated **once** during formation and remains **fixed** during trading
- Exit occurs when normalized prices **cross** (spread = 0), not at an arbitrary threshold
- This differs from Bollinger-style rolling Z-score approaches
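
The contrast between a fixed formation σ and a rolling Z-score can be sketched on toy numbers. This is a minimal illustration, not the code in `src.signals`: the spread values and the 3-day window are hypothetical, and `Series.std()` uses the pandas default (sample standard deviation).

```python
import numpy as np
import pandas as pd

# Hypothetical spread values: 8 formation days followed by 4 trading days.
formation_spread = pd.Series([0.00, 0.02, -0.01, 0.03, -0.02, 0.01, 0.00, -0.03])
trading_spread = pd.Series([0.01, 0.05, 0.08, 0.02])

# GGR style: sigma is estimated once on the formation window and then frozen.
sigma_formation = formation_spread.std()
static_distance = trading_spread / sigma_formation

# Bollinger style (for contrast only): mean and sigma are re-estimated
# on a rolling window, so the yardstick moves with recent volatility.
window = 3  # hypothetical window length
rolling_mean = trading_spread.rolling(window).mean()
rolling_std = trading_spread.rolling(window).std()
rolling_z = (trading_spread - rolling_mean) / rolling_std

print(static_distance.round(2).tolist())
print(rolling_z.round(2).tolist())
```

The static distance is defined from the first trading day, while the rolling Z-score is undefined until the window fills and is centered on the rolling mean rather than on zero.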

---

## 1. Setup & Configuration

In [1]:
# Standard imports
import sys
import warnings
from datetime import datetime
from pathlib import Path

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Import our modules
from src.data import fetch_or_load, get_close_prices, get_open_prices
from src.pairs import normalize_prices, calculate_ssd_matrix, select_top_pairs, rank_all_pairs
from src.signals import (
    calculate_spread,
    calculate_formation_stats,
    calculate_distance,
    generate_signals_ggr,
)
from src.backtest import run_backtest, BacktestConfig, combine_results
from src.analysis import (
    calculate_metrics, print_metrics, trades_to_dataframe,
    plot_equity_curve, plot_trade, plot_ssd_heatmap,
    plot_pair_prices,
)

warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)

print("Modules loaded successfully!")

Modules loaded successfully!


In [None]:
# Configuration
CONFIG = {
    # Universe
    "symbols": ['DHT', 'FRO', 'ASC', 'ECO', 'NAT', 'TNK', 'INSW', 'TRMD', 'TOPS', 'TORO', 'PSHG'],
    
    # Date range
    "start_date": "2024-01-01",
    "end_date": "2026-01-01",
    
    # Formation period (for pair selection AND static σ calculation)
    "formation_days": 252,  # ~1 year
    
    # GGR Trading parameters
    "entry_threshold": 2.0,   # Enter when |distance| > 2σ (from formation)
    "max_holding_days": 126,   # Max days per trade (fallback exit)
    # Note: Exit occurs when spread crosses zero (GGR rule)
    
    # Portfolio
    "top_n_pairs": 5,             # Number of pairs to trade
    "capital_per_trade": 10000,  # $ per pair trade
    "commission": 0.001,         # 0.1% per trade
}

print("Configuration:")
for k, v in CONFIG.items():
    print(f"  {k}: {v}")

Configuration:
  symbols: ['DHT', 'FRO', 'ASC', 'ECO', 'NAT', 'TNK', 'INSW', 'TRMD', 'TOPS', 'TORO', 'PSHG']
  start_date: 2024-01-01
  end_date: 2026-01-01
  formation_days: 252
  entry_threshold: 2.0
  max_holding_days: 126
  top_n_pairs: 5
  capital_per_trade: 10000
  commission: 0.001


## 2. Data Loading

We fetch daily OHLC data from Polygon.io and cache it locally for faster subsequent runs.

In [3]:
# Fetch or load price data
prices = fetch_or_load(
    symbols=CONFIG["symbols"],
    start_date=CONFIG["start_date"],
    end_date=CONFIG["end_date"],
    cache_dir="data"
)

# Extract close and open prices
close_prices = get_close_prices(prices)
open_prices = get_open_prices(prices)

print(f"\nData shape: {close_prices.shape}")
print(f"Date range: {close_prices.index[0].date()} to {close_prices.index[-1].date()}")
print(f"\nSample data (last 5 rows):")
close_prices.tail()

Loaded 11 symbols from cache

Data shape: (501, 11)
Date range: 2024-01-02 to 2025-12-31

Sample data (last 5 rows):


date,DHT,FRO,ASC,ECO,NAT,TNK,INSW,TRMD,TOPS,TORO,PSHG
2025-12-24 05:00:00,12.11,21.43,10.71,32.68,3.39,53.86,48.01,19.42,4.92,5.56,2.32
2025-12-26 05:00:00,12.22,21.81,10.8,33.48,3.43,54.34,48.54,19.53,5.06,5.63,2.31
2025-12-29 05:00:00,12.3,22.13,10.87,33.58,3.48,54.4,49.15,19.96,4.73,5.49,2.12
2025-12-30 05:00:00,12.14,21.72,10.51,33.36,3.42,53.17,48.36,19.54,4.53,5.3,2.16
2025-12-31 05:00:00,12.21,21.82,10.59,33.84,3.44,53.42,48.55,19.58,4.5105,5.28,2.13


In [4]:
# Data Quality Report - Check for gaps and issues
from src.data import print_data_quality_report, find_data_gaps

print_data_quality_report(close_prices, CONFIG["start_date"], CONFIG["end_date"])

# Show any significant gaps
gaps = find_data_gaps(close_prices, max_gap_days=5)
if gaps:
    print("\nWARNING: Significant data gaps detected!")
    print("Consider investigating these gaps before trusting backtest results.")

DATA QUALITY REPORT

Date Range: 2024-01-01 to 2026-01-01
Expected trading days: 524
Actual days in data: 501
Coverage: 95.6%

No significant data gaps found.

No data quality issues found.
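
The expected count of 524 equals the number of Monday-to-Friday dates in the configured range. That suggests the report counts weekdays without excluding exchange holidays. This is an assumption, because the implementation of `print_data_quality_report` is not shown here. Under that reading, the 23 missing days would be market holidays rather than data gaps. A quick check of the arithmetic:

```python
import numpy as np

# Date range from CONFIG. np.busday_count excludes the end date,
# so the day after CONFIG["end_date"] is passed to make the range inclusive.
expected_weekdays = int(np.busday_count("2024-01-01", "2026-01-02"))
actual_days = 501  # rows reported in the data shape above

coverage_pct = round(100 * actual_days / expected_weekdays, 1)
missing_days = expected_weekdays - actual_days

print(expected_weekdays, coverage_pct, missing_days)
```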


## 3. Pair Formation

We use the **formation period** to identify pairs with similar historical price behavior. The **Sum of Squared Differences (SSD)** measures how closely two normalized price series track each other.

$$SSD(A, B) = \sum_{t=1}^{T} (P_A^{norm}(t) - P_B^{norm}(t))^2$$

Lower SSD = More similar historical behavior = Better pair candidate
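
The formula can be sketched with pandas and NumPy as follows. The helper names `normalize` and `ssd_matrix` and the toy prices are hypothetical; the notebook's own `normalize_prices` and `calculate_ssd_matrix` may be implemented differently.

```python
import numpy as np
import pandas as pd

def normalize(prices: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column so that it starts at 1.0."""
    return prices / prices.iloc[0]

def ssd_matrix(normalized: pd.DataFrame) -> pd.DataFrame:
    """Pairwise sum of squared differences between normalized columns."""
    values = normalized.to_numpy()                     # shape (T, N)
    diffs = values[:, :, None] - values[:, None, :]    # shape (T, N, N)
    ssd = (diffs ** 2).sum(axis=0)                     # sum over time
    return pd.DataFrame(ssd, index=normalized.columns, columns=normalized.columns)

# Hypothetical toy prices, not data from this notebook.
toy = pd.DataFrame({
    "A": [10.0, 11.0, 12.0],
    "B": [20.0, 22.0, 23.0],
    "C": [5.0, 4.0, 6.0],
})
toy_ssd = ssd_matrix(normalize(toy))
print(toy_ssd.round(4))
```

In this toy case A and B have the smallest off-diagonal SSD, so they would be ranked as the best pair candidate.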

In [5]:
# Split data into formation and trading periods
formation_end_idx = CONFIG["formation_days"]

formation_prices = close_prices.iloc[:formation_end_idx]
trading_prices = close_prices.iloc[formation_end_idx:]
trading_open_prices = open_prices.iloc[formation_end_idx:]

print(f"Formation period: {formation_prices.index[0].date()} to {formation_prices.index[-1].date()}")
print(f"  Days: {len(formation_prices)}")
print(f"\nTrading period: {trading_prices.index[0].date()} to {trading_prices.index[-1].date()}")
print(f"  Days: {len(trading_prices)}")

Formation period: 2024-01-02 to 2025-01-02
  Days: 252

Trading period: 2025-01-03 to 2025-12-31
  Days: 249


In [6]:
# Normalize prices for the formation period
normalized_formation = normalize_prices(formation_prices)

# Calculate SSD matrix
ssd_matrix = calculate_ssd_matrix(normalized_formation)

print("SSD Matrix (lower = more similar):")
ssd_matrix.round(2)

SSD Matrix (lower = more similar):


,DHT,FRO,ASC,ECO,NAT,TNK,INSW,TRMD,TOPS,TORO,PSHG
DHT,0.0,2.32,10.1,1.75,14.82,3.0,1.96,5.03,45.09,28.49,11.66
FRO,2.32,0.0,4.93,1.26,18.14,1.04,0.65,3.11,49.73,32.3,16.93
ASC,10.1,4.93,0.0,8.1,35.79,4.43,5.67,9.82,75.94,54.88,32.7
ECO,1.75,1.26,8.1,0.0,13.2,1.78,1.3,2.46,42.15,27.24,11.91
NAT,14.82,18.14,35.79,13.2,0.0,16.57,15.39,10.22,9.04,5.2,2.65
TNK,3.0,1.04,4.43,1.78,16.57,0.0,0.38,2.81,46.65,30.39,15.51
INSW,1.96,0.65,5.67,1.3,15.39,0.38,0.0,2.65,44.9,28.49,14.28
TRMD,5.03,3.11,9.82,2.46,10.22,2.81,2.65,0.0,35.1,21.48,11.31
TOPS,45.09,49.73,75.94,42.15,9.04,46.65,44.9,35.1,0.0,6.14,15.21
TORO,28.49,32.3,54.88,27.24,5.2,30.39,28.49,21.48,6.14,0.0,12.44


In [7]:
# Visualize SSD matrix as heatmap
fig = plot_ssd_heatmap(ssd_matrix)
fig.show()

In [8]:
# Rank all pairs by SSD
pairs_ranking = rank_all_pairs(normalized_formation)
print(f"All {len(pairs_ranking)} pairs ranked by SSD:")
pairs_ranking.head(10)

All 55 pairs ranked by SSD:


,symbol_a,symbol_b,ssd,correlation,spread_mean,spread_std,rank
0,TNK,INSW,0.38206,0.977781,0.001977,0.038964,1
1,FRO,INSW,0.646603,0.956652,0.018228,0.047355,2
2,FRO,TNK,1.037658,0.932095,0.016251,0.062201,3
3,FRO,ECO,1.264212,0.938465,0.042168,0.057022,4
4,ECO,INSW,1.296899,0.896895,-0.02394,0.067761,5
5,DHT,ECO,1.746072,0.90149,0.022494,0.080302,6
6,ECO,TNK,1.77666,0.886539,-0.025917,0.080025,7
7,DHT,INSW,1.964614,0.914877,-0.001446,0.088459,8
8,DHT,FRO,2.321469,0.940793,-0.019674,0.094129,9
9,ECO,TRMD,2.460022,0.867883,0.03738,0.091641,10


In [9]:
# Select top N pairs for trading
top_pairs = select_top_pairs(ssd_matrix, n=CONFIG["top_n_pairs"])

print(f"\nTop {CONFIG['top_n_pairs']} pairs selected for trading:")
for i, pair in enumerate(top_pairs, 1):
    ssd = ssd_matrix.loc[pair[0], pair[1]]
    print(f"  {i}. {pair[0]}/{pair[1]} - SSD: {ssd:.4f}")


Top 5 pairs selected for trading:
  1. TNK/INSW - SSD: 0.3821
  2. FRO/INSW - SSD: 0.6466
  3. FRO/TNK - SSD: 1.0377
  4. FRO/ECO - SSD: 1.2642
  5. ECO/INSW - SSD: 1.2969


In [10]:
# Visualize top pair's price relationship during formation period
best_pair = top_pairs[0]
fig = plot_pair_prices(formation_prices, best_pair, title=f"{best_pair[0]} vs {best_pair[1]} - Formation Period")
fig.show()

## 4. Signal Generation (GGR Methodology)

For each pair, we use the **static σ** calculated during the formation period:

1. **Calculate Spread**: Difference between normalized prices
   $$Spread = P_A^{norm} - P_B^{norm}$$

2. **Calculate Distance**: Using the **fixed** formation period σ
   $$Distance = \frac{Spread}{\sigma_{formation}}$$

**Trading signals (GGR rules):**
- **Long spread** (buy A, sell B): when Distance < -2 (spread too low)
- **Short spread** (sell A, buy B): when Distance > 2 (spread too high)
- **Exit**: when spread **crosses zero** (prices converge/cross)

**Key differences from Bollinger-style approaches:**
- σ is calculated **once** during formation, not rolling
- Exit on **spread crossing zero**, not at an arbitrary threshold like |Z| < 0.5
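
The entry and exit rules above can be sketched as a small state machine. This is an illustration under stated assumptions, not the logic of `generate_signals_ggr`: the function name `ggr_positions` and the toy spread are hypothetical, the output is the position held after each observation rather than entry signals, and the `max_holding_days` fallback is omitted.

```python
def ggr_positions(spread, sigma, entry_threshold=2.0):
    """Position after each observation: +1 long spread, -1 short spread, 0 flat."""
    positions = []
    position = 0
    for value in spread:
        if position == 0:
            distance = value / sigma       # sigma is the fixed formation value
            if distance > entry_threshold:
                position = -1              # spread too high: sell A, buy B
            elif distance < -entry_threshold:
                position = 1               # spread too low: buy A, sell B
        elif (position == -1 and value <= 0) or (position == 1 and value >= 0):
            position = 0                   # spread reached or crossed zero: close
        positions.append(position)
    return positions

# Hypothetical spread values with a formation sigma of 0.04.
toy_spread = [0.00, 0.05, 0.09, 0.04, -0.01, -0.09, -0.03, 0.02]
toy_positions = ggr_positions(toy_spread, sigma=0.04)
print(toy_positions)
```

Note that the short position opened at 0.09 stays open at 0.04: falling back inside the ±2 band is not an exit, only reaching zero is.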

In [11]:
# Example: Signal generation for the top pair using GGR methodology
sym_a, sym_b = top_pairs[0]

# Calculate formation period spread and statistics (STATIC σ)
formation_spread = calculate_spread(formation_prices[sym_a], formation_prices[sym_b], normalize=True)
formation_stats = calculate_formation_stats(formation_spread)
formation_std = formation_stats['std']

# Calculate trading period spread and distance (using FIXED formation σ)
trading_spread = calculate_spread(trading_prices[sym_a], trading_prices[sym_b], normalize=True)
distance = calculate_distance(trading_spread, formation_std)

print(f"Pair: {sym_a}/{sym_b}")
print(f"\nFormation period statistics (FIXED for trading):")
print(f"  σ (formation): {formation_std:.6f}")
print(f"  Mean (formation): {formation_stats['mean']:.6f}")

print(f"\nTrading period spread:")
print(f"  Min:  {trading_spread.min():.4f}")
print(f"  Max:  {trading_spread.max():.4f}")

print(f"\nDistance statistics (spread / σ_formation):")
print(f"  Min:  {distance.min():.2f}σ")
print(f"  Max:  {distance.max():.2f}σ")
print(f"  Times |Distance| > 2σ: {(abs(distance) > 2).sum()}")

Pair: TNK/INSW

Formation period statistics (FIXED for trading):
  σ (formation): 0.038964
  Mean (formation): 0.001977

Trading period spread:
  Min:  -0.0989
  Max:  0.1303

Distance statistics (spread / σ_formation):
  Min:  -2.54σ
  Max:  3.34σ
  Times |Distance| > 2σ: 29


In [12]:
# Visualize distance with entry threshold (GGR methodology)
fig = go.Figure()

# Distance series
fig.add_trace(go.Scatter(
    x=distance.index,
    y=distance.values,
    mode='lines',
    name='Distance (σ)',
    line=dict(color='blue', width=1.5)
))

# Entry thresholds (±2σ)
fig.add_hline(y=CONFIG["entry_threshold"], line_dash="dash", line_color="red", 
              annotation_text=f"+{CONFIG['entry_threshold']}σ (Short Entry)")
fig.add_hline(y=-CONFIG["entry_threshold"], line_dash="dash", line_color="green",
              annotation_text=f"-{CONFIG['entry_threshold']}σ (Long Entry)")

# Zero line (exit trigger)
fig.add_hline(y=0, line_dash="solid", line_color="gray", line_width=2,
              annotation_text="0 (Exit on crossing)")

fig.update_layout(
    title=f"{sym_a}/{sym_b} Distance Series (GGR: Static σ from Formation)",
    xaxis_title="Date",
    yaxis_title="Distance (σ units)",
    height=400,
    showlegend=True,
)
fig.show()

In [13]:
# Generate trading signals (GGR methodology)
signals = generate_signals_ggr(
    trading_spread,
    formation_std,
    entry_threshold=CONFIG["entry_threshold"],
)

long_entries = (signals == 1).sum()
short_entries = (signals == -1).sum()

print(f"Signals for {sym_a}/{sym_b} (GGR methodology):")
print(f"  Long spread entries (Distance < -2σ): {long_entries}")
print(f"  Short spread entries (Distance > 2σ): {short_entries}")
print(f"  Total entry signals: {long_entries + short_entries}")
print(f"\nNote: Exits occur when spread crosses zero (not at a threshold)")

Signals for TNK/INSW (GGR methodology):
  Long spread entries (Distance < -2σ): 2
  Short spread entries (Distance > 2σ): 2
  Total entry signals: 4

Note: Exits occur when spread crosses zero (not at a threshold)


## 5. Backtesting

We now run the backtest for all selected pairs. Key implementation details:
- **Trade execution**: At the OPEN of the day AFTER the signal (prevents lookahead bias)
- **Position sizing**: Equal dollar allocation to each leg ($5,000 per side at the configured $10,000 per trade)
- **Commission**: 0.1% on entry and exit
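
Equal-dollar sizing and the gross P&L of a pair trade can be sketched as follows. The helpers `size_legs` and `gross_pnl` are hypothetical, fractional shares are assumed, and commission is left out. The prices are those of the first TNK/INSW trade reported in the verification section of this notebook.

```python
def size_legs(capital_per_trade, price_a, price_b):
    """Split capital equally in dollar terms across the two legs."""
    per_leg = capital_per_trade / 2
    return per_leg / price_a, per_leg / price_b

def gross_pnl(direction, shares_a, shares_b, entry_a, entry_b, exit_a, exit_b):
    """direction = +1 for long spread (long A, short B), -1 for short spread."""
    leg_a = (exit_a - entry_a) * shares_a
    leg_b = (exit_b - entry_b) * shares_b
    return direction * (leg_a - leg_b)

# Long spread: buy TNK at 42.25, sell INSW at 40.23, close at 38.00 and 33.83.
shares_a, shares_b = size_legs(10_000, price_a=42.25, price_b=40.23)
pnl = gross_pnl(+1, shares_a, shares_b, 42.25, 40.23, 38.00, 33.83)
print(round(shares_a, 4), round(shares_b, 4), round(pnl, 2))
```

The loss on the long TNK leg is more than offset by the gain on the short INSW leg, which is the typical shape of a profitable convergence trade.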

In [14]:
# Create backtest configuration (GGR methodology)
backtest_config = BacktestConfig(
    entry_threshold=CONFIG["entry_threshold"],
    max_holding_days=CONFIG["max_holding_days"],
    capital_per_trade=CONFIG["capital_per_trade"],
    commission=CONFIG["commission"],
)

print("Backtest Configuration (GGR Methodology):")
print(f"  Entry threshold: {backtest_config.entry_threshold}σ (from formation period)")
print(f"  Exit rule: Spread crosses zero (GGR paper)")
print(f"  Max holding: {backtest_config.max_holding_days} days (fallback)")
print(f"  Capital per trade: ${backtest_config.capital_per_trade:,}")
print(f"  Commission: {backtest_config.commission:.2%}")

Backtest Configuration (GGR Methodology):
  Entry threshold: 2.0σ (from formation period)
  Exit rule: Spread crosses zero (GGR paper)
  Max holding: 126 days (fallback)
  Capital per trade: $10,000
  Commission: 0.10%


In [15]:
# Run backtest for all selected pairs (GGR methodology)
# Note: formation_prices is used for calculating static σ
results = run_backtest(
    formation_close=formation_prices,
    trading_close=trading_prices,
    trading_open=trading_open_prices,
    pairs=top_pairs,
    config=backtest_config,
)

print(f"Backtest complete for {len(results)} pairs.")
for pair, result in results.items():
    n_trades = len(result.trades)
    total_pnl = sum(t.pnl for t in result.trades)
    print(f"  {pair[0]}/{pair[1]}: {n_trades} trades, P&L: ${total_pnl:,.2f}")

Backtest complete for 5 pairs.
  TNK/INSW: 4 trades, P&L: $1,624.64
  FRO/INSW: 2 trades, P&L: $-85.90
  FRO/TNK: 2 trades, P&L: $-105.38
  FRO/ECO: 4 trades, P&L: $935.07
  ECO/INSW: 2 trades, P&L: $511.89


In [16]:
# Combine results from all pairs
initial_capital = CONFIG["capital_per_trade"] * CONFIG["top_n_pairs"]
all_trades, combined_equity = combine_results(results, initial_capital)

print(f"Total trades across all pairs: {len(all_trades)}")
print(f"Initial capital: ${initial_capital:,}")
print(f"Final equity: ${combined_equity.iloc[-1]:,.2f}")

Total trades across all pairs: 14
Initial capital: $50,000
Final equity: $52,880.32


## 6. Results & Analysis

In [17]:
# Calculate and display performance metrics
metrics = calculate_metrics(all_trades, combined_equity)
print_metrics(metrics)

BACKTEST RESULTS
Total Trades:     14
Total Return:     $2,880.32 (5.76%)
Sharpe Ratio:     1.18
Max Drawdown:     $-684.65 (-1.32%)
--------------------------------------------------
Win Rate:         64.29%
Avg Win:          $468.32
Avg Loss:         $266.91
Profit Factor:    3.16
Avg Holding Days: 63.0
--------------------------------------------------
Long Trades:      3 (66.67% win rate)
Short Trades:     11 (63.64% win rate)


In [18]:
# Plot equity curve with drawdown
fig = plot_equity_curve(combined_equity, title="GGR Distance Strategy - Equity Curve")
fig.show()

In [19]:
# Trade-by-trade analysis
trades_df = trades_to_dataframe(all_trades)
print(f"\nTrade-by-Trade Results ({len(trades_df)} trades):")
trades_df


Trade-by-Trade Results (14 trades):


,pair,direction,entry_date,exit_date,entry_price_a,entry_price_b,exit_price_a,exit_price_b,pnl,pnl_pct,holding_days,entry_distance,exit_distance,exit_reason
0,TNK/INSW,Long,2025-02-13 05:00:00,2025-03-03 05:00:00,42.25,40.23,38.0,33.83,283.766104,0.028377,10,-2.215196,0.099202,crossing
1,FRO/ECO,Short,2025-01-15 05:00:00,2025-03-31 04:00:00,18.82,25.22,14.5,21.9,481.313348,0.048131,50,2.077465,-0.323943,crossing
2,ECO/INSW,Short,2025-04-25 04:00:00,2025-06-05 04:00:00,22.72,33.01,22.62,38.0,767.104784,0.07671,27,2.020543,-0.316723,crossing
3,TNK/INSW,Short,2025-04-25 04:00:00,2025-07-03 04:00:00,40.79,33.01,43.51,38.74,523.302426,0.05233,46,2.10039,-0.254885,crossing
4,FRO/INSW,Short,2025-01-14 05:00:00,2025-07-18 04:00:00,17.96,40.99,18.65,39.6,-371.669631,-0.037167,126,2.428486,3.856129,max_holding
5,FRO/TNK,Short,2025-01-16 05:00:00,2025-07-22 04:00:00,18.25,46.8,18.16,43.73,-312.98127,-0.031298,126,2.04435,2.918732,max_holding
6,TNK/INSW,Long,2025-08-07 04:00:00,2025-09-16 04:00:00,45.5,43.98,55.11,49.29,440.700736,0.04407,26,-2.468578,0.365281,crossing
7,FRO/ECO,Short,2025-04-14 04:00:00,2025-10-15 04:00:00,15.39,21.01,22.61,29.4,-363.353106,-0.036335,126,2.060483,3.160101,max_holding
8,FRO/ECO,Short,2025-10-16 04:00:00,2025-11-14 05:00:00,23.08,30.26,24.8,37.72,848.428112,0.084843,20,3.279434,-1.494261,crossing
9,TNK/INSW,Short,2025-10-21 04:00:00,2025-11-25 05:00:00,55.0,45.94,60.14,53.8,376.868185,0.037687,24,2.072967,-0.286863,crossing


In [20]:
# Per-pair breakdown
print("\nPer-Pair Performance:")
print("=" * 70)
for pair, result in results.items():
    pair_metrics = calculate_metrics(result.trades, result.equity_curve)
    pair_name = f"{pair[0]}/{pair[1]}"
    print(f"\n{pair_name}:")
    print(f"  Trades: {pair_metrics['total_trades']}")
    print(f"  Return: ${pair_metrics['total_return']:,.2f} ({pair_metrics['total_return_pct']:.2%})")
    print(f"  Win Rate: {pair_metrics['win_rate']:.2%}")
    print(f"  Sharpe: {pair_metrics['sharpe_ratio']:.2f}")


Per-Pair Performance:

TNK/INSW:
  Trades: 4
  Return: $1,624.64 (15.85%)
  Win Rate: 100.00%
  Sharpe: 1.68

FRO/INSW:
  Trades: 2
  Return: $-85.90 (-1.06%)
  Win Rate: 50.00%
  Sharpe: -0.62

FRO/TNK:
  Trades: 2
  Return: $-105.38 (-1.25%)
  Win Rate: 50.00%
  Sharpe: -0.84

FRO/ECO:
  Trades: 4
  Return: $935.07 (8.95%)
  Win Rate: 50.00%
  Sharpe: 0.70

ECO/INSW:
  Trades: 2
  Return: $511.89 (4.92%)
  Win Rate: 50.00%
  Sharpe: 0.39


## 7. Trade Visualization

Let's visualize individual trades to understand the entry/exit logic.

In [21]:
# Visualize a sample trade
if all_trades:
    # Pick a trade with significant P&L
    sample_trade = max(all_trades, key=lambda t: abs(t.pnl))
    
    # Get formation period stats for this pair
    pair_formation_spread = calculate_spread(
        formation_prices[sample_trade.pair[0]],
        formation_prices[sample_trade.pair[1]],
        normalize=True
    )
    pair_formation_stats = calculate_formation_stats(pair_formation_spread)
    
    # Get the distance series for this pair (using static formation σ)
    pair_trading_spread = calculate_spread(
        trading_prices[sample_trade.pair[0]],
        trading_prices[sample_trade.pair[1]],
        normalize=True
    )
    pair_distance = calculate_distance(pair_trading_spread, pair_formation_stats['std'])
    
    print(f"Visualizing trade: {sample_trade.pair[0]}/{sample_trade.pair[1]}")
    print(f"  Direction: {'Long' if sample_trade.direction == 1 else 'Short'} spread")
    print(f"  Entry: {sample_trade.entry_date.date()} at Distance={sample_trade.entry_distance:.2f}σ")
    print(f"  Exit: {sample_trade.exit_date.date()} at Distance={sample_trade.exit_distance:.2f}σ")
    print(f"  P&L: ${sample_trade.pnl:.2f} ({sample_trade.pnl_pct:.2%})")
    print(f"  Exit reason: {sample_trade.exit_reason}")
    
    fig = plot_trade(trading_prices, sample_trade, pair_distance)
    fig.show()
else:
    print("No trades to visualize.")

Visualizing trade: FRO/ECO
  Direction: Short spread
  Entry: 2025-10-16 at Distance=3.28σ
  Exit: 2025-11-14 at Distance=-1.49σ
  P&L: $848.43 (8.48%)
  Exit reason: crossing


## 8. Verification Section

This section checks that the backtest has no lookahead bias and that the P&L calculations are correct.

In [22]:
# Verification 1: One Trade Inspection
if all_trades:
    trade = all_trades[0]  # First trade
    
    print("=" * 60)
    print("VERIFICATION: One Trade Inspection")
    print("=" * 60)
    print(f"\nTrade #1: {trade.pair[0]}/{trade.pair[1]}")
    print(f"Direction: {'Long spread (buy A, sell B)' if trade.direction == 1 else 'Short spread (sell A, buy B)'}")
    print(f"\nEntry:")
    print(f"  Date: {trade.entry_date.date()}")
    print(f"  {trade.pair[0]} price: ${trade.entry_price_a:.2f}")
    print(f"  {trade.pair[1]} price: ${trade.entry_price_b:.2f}")
    print(f"  Shares {trade.pair[0]}: {trade.shares_a:.4f}")
    print(f"  Shares {trade.pair[1]}: {trade.shares_b:.4f}")
    print(f"  Entry Distance: {trade.entry_distance:.4f}σ")
    print(f"\nExit:")
    print(f"  Date: {trade.exit_date.date()}")
    print(f"  {trade.pair[0]} price: ${trade.exit_price_a:.2f}")
    print(f"  {trade.pair[1]} price: ${trade.exit_price_b:.2f}")
    print(f"  Exit Distance: {trade.exit_distance:.4f}σ")
    print(f"  Exit reason: {trade.exit_reason}")
    print(f"\nP&L Calculation:")
    
    # Manual P&L calculation
    if trade.direction == 1:  # Long spread
        pnl_a = (trade.exit_price_a - trade.entry_price_a) * trade.shares_a
        pnl_b = (trade.entry_price_b - trade.exit_price_b) * trade.shares_b
        print(f"  {trade.pair[0]} P&L: (${trade.exit_price_a:.2f} - ${trade.entry_price_a:.2f}) x {trade.shares_a:.4f} = ${pnl_a:.2f}")
        print(f"  {trade.pair[1]} P&L: (${trade.entry_price_b:.2f} - ${trade.exit_price_b:.2f}) x {trade.shares_b:.4f} = ${pnl_b:.2f}")
    else:  # Short spread
        pnl_a = (trade.entry_price_a - trade.exit_price_a) * trade.shares_a
        pnl_b = (trade.exit_price_b - trade.entry_price_b) * trade.shares_b
        print(f"  {trade.pair[0]} P&L: (${trade.entry_price_a:.2f} - ${trade.exit_price_a:.2f}) x {trade.shares_a:.4f} = ${pnl_a:.2f}")
        print(f"  {trade.pair[1]} P&L: (${trade.exit_price_b:.2f} - ${trade.entry_price_b:.2f}) x {trade.shares_b:.4f} = ${pnl_b:.2f}")
    
    gross_pnl = pnl_a + pnl_b
    print(f"  Gross P&L: ${gross_pnl:.2f}")
    print(f"  Reported P&L (net of commission): ${trade.pnl:.2f}")
    print(f"  Holding days: {trade.holding_days}")
else:
    print("No trades to verify.")

VERIFICATION: One Trade Inspection

Trade #1: TNK/INSW
Direction: Long spread (buy A, sell B)

Entry:
  Date: 2025-02-13
  TNK price: $42.25
  INSW price: $40.23
  Shares TNK: 118.3432
  Shares INSW: 124.2854
  Entry Distance: -2.2152σ

Exit:
  Date: 2025-03-03
  TNK price: $38.00
  INSW price: $33.83
  Exit Distance: 0.0992σ
  Exit reason: crossing

P&L Calculation:
  TNK P&L: ($38.00 - $42.25) x 118.3432 = $-502.96
  INSW P&L: ($40.23 - $33.83) x 124.2854 = $795.43
  Gross P&L: $292.47
  Reported P&L (net of commission): $283.77
  Holding days: 10
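
The gap between gross and reported P&L gives the transaction cost actually charged on this trade. The sketch below uses only the figures printed above. It compares that gap with two candidate conventions; how `run_backtest` books commission internally is not visible in this notebook, so both conventions are assumptions.

```python
# Figures printed in the verification output above.
shares_tnk, shares_insw = 118.3432, 124.2854
exit_tnk, exit_insw = 38.00, 33.83
gross_pnl_printed = 292.47
net_pnl_printed = 283.77

commission_rate = 0.001      # CONFIG["commission"]
entry_notional = 10_000.0    # CONFIG["capital_per_trade"]

implied_cost = gross_pnl_printed - net_pnl_printed
exit_notional = shares_tnk * exit_tnk + shares_insw * exit_insw

cost_exit_only = commission_rate * exit_notional
cost_entry_and_exit = commission_rate * (entry_notional + exit_notional)

print(f"Implied cost:             ${implied_cost:.2f}")
print(f"0.1% of exit notional:    ${cost_exit_only:.2f}")
print(f"0.1% of entry + exit:     ${cost_entry_and_exit:.2f}")
```

The implied cost is close to 0.1% of the exit notional alone and well below 0.1% of entry plus exit notional, so it is worth confirming in `src.backtest` that the entry-side commission is charged as described.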


In [23]:
# Verification 2: Lookahead Bias Check
if all_trades:
    trade = all_trades[0]
    
    print("=" * 60)
    print("VERIFICATION: Lookahead Bias Check")
    print("=" * 60)
    
    # Get the signal date (day before entry, since we enter on next day open)
    entry_loc = trading_prices.index.get_loc(trade.entry_date)
    if entry_loc > 0:
        signal_date = trading_prices.index[entry_loc - 1]
        
        # Get formation stats for this pair
        form_spread = calculate_spread(
            formation_prices[trade.pair[0]],
            formation_prices[trade.pair[1]],
            normalize=True
        )
        form_stats = calculate_formation_stats(form_spread)
        
        # Recalculate spread and distance up to signal date
        prices_to_signal = trading_prices.loc[:signal_date]
        spread_to_signal = calculate_spread(
            prices_to_signal[trade.pair[0]],
            prices_to_signal[trade.pair[1]],
            normalize=True
        )
        distance_to_signal = calculate_distance(spread_to_signal, form_stats['std'])
        
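        signal_distance = distance_to_signal.iloc[-1]
        print(f"\nSignal date: {signal_date.date()}")
        print(f"Entry date: {trade.entry_date.date()} (next trading day)")
        print(f"\nDistance at signal date: {signal_distance:.4f}σ")
        print(f"Entry Distance recorded: {trade.entry_distance:.4f}σ")
        print(f"\nThe signal was generated using data available at close on {signal_date.date()},")
        print(f"and the trade was executed at open on {trade.entry_date.date()}.")
        # Only report success when the recorded entry distance matches the signal-date distance
        if np.isclose(signal_distance, trade.entry_distance, atol=1e-4):
            print(f"\n[OK] No lookahead bias - entry is at OPEN of day AFTER signal.")
        else:
            print(f"\n[WARNING] Entry distance differs from signal-date distance - investigate!")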
else:
    print("No trades to verify.")

VERIFICATION: Lookahead Bias Check

Signal date: 2025-02-12
Entry date: 2025-02-13 (next trading day)

Distance at signal date: -2.2152σ
Entry Distance recorded: -2.2152σ

The signal was generated using data available at close on 2025-02-12,
and the trade was executed at open on 2025-02-13.

[OK] No lookahead bias - entry is at OPEN of day AFTER signal.


In [24]:
# Verification 3: Signal Timing Check
if all_trades:
    trade = all_trades[0]
    
    print("=" * 60)
    print("VERIFICATION: Signal Timing Check")
    print("=" * 60)
    
    entry_loc = trading_prices.index.get_loc(trade.entry_date)
    if entry_loc > 0:
        signal_date = trading_prices.index[entry_loc - 1]
        
        # Get prices at different points
        signal_close_a = trading_prices.loc[signal_date, trade.pair[0]]
        signal_close_b = trading_prices.loc[signal_date, trade.pair[1]]
        entry_open_a = trading_open_prices.loc[trade.entry_date, trade.pair[0]]
        entry_open_b = trading_open_prices.loc[trade.entry_date, trade.pair[1]]
        
        print(f"\nSignal generated at CLOSE of {signal_date.date()}:")
        print(f"  {trade.pair[0]} close: ${signal_close_a:.2f}")
        print(f"  {trade.pair[1]} close: ${signal_close_b:.2f}")
        
        print(f"\nTrade executed at OPEN of {trade.entry_date.date()}:")
        print(f"  {trade.pair[0]} open: ${entry_open_a:.2f}")
        print(f"  {trade.pair[1]} open: ${entry_open_b:.2f}")
        
        print(f"\nActual entry prices in trade:")
        print(f"  {trade.pair[0]}: ${trade.entry_price_a:.2f}")
        print(f"  {trade.pair[1]}: ${trade.entry_price_b:.2f}")
        
        # Verify entry prices match open prices
        if abs(trade.entry_price_a - entry_open_a) < 0.01 and abs(trade.entry_price_b - entry_open_b) < 0.01:
            print(f"\n[OK] Entry prices match next-day OPEN prices (realistic execution).")
        else:
            print(f"\n[WARNING] Entry prices don't match open prices - investigate!")
else:
    print("No trades to verify.")

VERIFICATION: Signal Timing Check

Signal generated at CLOSE of 2025-02-12:
  TNK close: $42.00
  INSW close: $40.30

Trade executed at OPEN of 2025-02-13:
  TNK open: $42.25
  INSW open: $40.23

Actual entry prices in trade:
  TNK: $42.25
  INSW: $40.23

[OK] Entry prices match next-day OPEN prices (realistic execution).


## 9. Summary

This notebook demonstrated the complete **GGR Distance Method** for pair trading:

1. **Data Loading**: Fetched OHLC data from Polygon.io
2. **Pair Formation**: Used SSD to identify most similar pairs + calculated static σ
3. **Signal Generation**: GGR distance-based entry/exit signals
   - Entry: |distance| > 2 (spread more than 2σ from zero, using fixed formation σ)
   - Exit: spread crosses zero (prices converge)
4. **Backtesting**: Realistic execution at next-day open prices
5. **Analysis**: Performance metrics and visualizations
6. **Verification**: Confirmed no lookahead bias

### Key GGR Methodology Points
- **Static σ**: Standard deviation is calculated ONCE during formation period and remains fixed
- **Crossing-zero exit**: Positions exit when normalized prices cross (spread = 0)
- **No rolling adaptation**: Unlike Bollinger-style approaches, σ doesn't adapt to new volatility
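- Results vary significantly by pair and market conditions: in this run TNK/INSW closed all four trades at a profit ($1,624.64 in total), while FRO/INSW and FRO/TNK each ended with a small net loss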

In [25]:
# Final summary
print("\n" + "=" * 60)
print("BACKTEST SUMMARY")
print("=" * 60)
print(f"\nUniverse: {', '.join(CONFIG['symbols'])}")
print(f"Period: {CONFIG['start_date']} to {CONFIG['end_date']}")
print(f"Formation: {CONFIG['formation_days']} days")
print(f"\nPairs Traded: {CONFIG['top_n_pairs']}")
print(f"Total Trades: {metrics['total_trades']}")
print(f"\nPerformance:")
print(f"  Total Return: {metrics['total_return_pct']:.2%}")
print(f"  Sharpe Ratio: {metrics['sharpe_ratio']:.2f}")
print(f"  Max Drawdown: {metrics['max_drawdown_pct']:.2%}")
print(f"  Win Rate: {metrics['win_rate']:.2%}")
print("\n" + "=" * 60)


BACKTEST SUMMARY

Universe: DHT, FRO, ASC, ECO, NAT, TNK, INSW, TRMD, TOPS, TORO, PSHG
Period: 2024-01-01 to 2026-01-01
Formation: 252 days

Pairs Traded: 5
Total Trades: 14

Performance:
  Total Return: 5.76%
  Sharpe Ratio: 1.18
  Max Drawdown: -1.32%
  Win Rate: 64.29%



In [26]:
# =============================================================================
# PAIR-BY-PAIR ANALYSIS (GGR Methodology)
# =============================================================================
# Generate detailed analysis for each pair using static σ from formation period

from src.analysis import generate_pair_report, print_pair_report, plot_pair_analysis

# Analyze each pair
pair_reports = {}

for pair, result in results.items():
    # Calculate formation period spread and stats (STATIC σ)
    form_spread = calculate_spread(
        formation_prices[pair[0]],
        formation_prices[pair[1]],
        normalize=True
    )
    form_stats = calculate_formation_stats(form_spread)
    
    # Calculate trading period spread and distance (using FIXED formation σ)
    trading_spread_pair = calculate_spread(
        trading_prices[pair[0]],
        trading_prices[pair[1]],
        normalize=True
    )
    pair_distance = calculate_distance(trading_spread_pair, form_stats['std'])

    # Generate report
    report = generate_pair_report(
        close_prices=trading_prices,
        pair=pair,
        trades=result.trades,
        distance=pair_distance,
        config=CONFIG,
    )
    pair_reports[pair] = report

    # Print summary and show chart
    print_pair_report(report)
    report['figure'].show()
    print("\n")

PAIR ANALYSIS: TNK/INSW

Total Trades:      4
Total P&L:         $1,624.64
Win Rate:          100.0%
Avg Win:           $406.16
Avg Loss:          $0.00
Avg Holding Days:  26.5
Long Trades:       2
Short Trades:      2

Trades:
----------------------------------------------------------------------
 #      Entry       Exit Direction  Days Entry σ Exit σ     P&L Return Exit Reason
 1 2025-02-13 2025-03-03      Long    10   -2.22   0.10 $283.77  2.84%    crossing
 2 2025-04-25 2025-07-03     Short    46    2.10  -0.25 $523.30  5.23%    crossing
 3 2025-08-07 2025-09-16      Long    26   -2.47   0.37 $440.70  4.41%    crossing
 4 2025-10-21 2025-11-25     Short    24    2.07  -0.29 $376.87  3.77%    crossing




PAIR ANALYSIS: FRO/INSW

Total Trades:      2
Total P&L:         $-85.90
Win Rate:          50.0%
Avg Win:           $285.77
Avg Loss:          $-371.67
Avg Holding Days:  120.0
Long Trades:       0
Short Trades:      2

Trades:
----------------------------------------------------------------------
 #      Entry       Exit Direction  Days Entry σ Exit σ      P&L Return Exit Reason
 1 2025-01-14 2025-07-18     Short   126    2.43   3.86 $-371.67 -3.72% max_holding
 2 2025-07-21 2025-12-31     Short   114    3.97   3.37  $285.77  2.86% end_of_data




PAIR ANALYSIS: FRO/TNK

Total Trades:      2
Total P&L:         $-105.38
Win Rate:          50.0%
Avg Win:           $207.60
Avg Loss:          $-312.98
Avg Holding Days:  119.0
Long Trades:       0
Short Trades:      2

Trades:
----------------------------------------------------------------------
 #      Entry       Exit Direction  Days Entry σ Exit σ      P&L Return Exit Reason
 1 2025-01-16 2025-07-22     Short   126    2.04   2.92 $-312.98 -3.13% max_holding
 2 2025-07-23 2025-12-31     Short   112    3.10   3.10  $207.60  2.08% end_of_data




PAIR ANALYSIS: FRO/ECO

Total Trades:      4
Total P&L:         $935.07
Win Rate:          50.0%
Avg Win:           $664.87
Avg Loss:          $-197.34
Avg Holding Days:  54.0
Long Trades:       1
Short Trades:      3

Trades:
----------------------------------------------------------------------
 #      Entry       Exit Direction  Days Entry σ Exit σ      P&L Return Exit Reason
 1 2025-01-15 2025-03-31     Short    50    2.08  -0.32  $481.31  4.81%    crossing
 2 2025-04-14 2025-10-15     Short   126    2.06   3.16 $-363.35 -3.63% max_holding
 3 2025-10-16 2025-11-14     Short    20    3.28  -1.49  $848.43  8.48%    crossing
 4 2025-12-02 2025-12-31      Long    20   -2.15  -1.19  $-31.32 -0.31% end_of_data




PAIR ANALYSIS: ECO/INSW

Total Trades:      2
Total P&L:         $511.89
Win Rate:          50.0%
Avg Win:           $767.10
Avg Loss:          $-255.21
Avg Holding Days:  41.0
Long Trades:       0
Short Trades:      2

Trades:
----------------------------------------------------------------------
 #      Entry       Exit Direction  Days Entry σ Exit σ      P&L Return Exit Reason
 1 2025-04-25 2025-06-05     Short    27    2.02  -0.32  $767.10  7.67%    crossing
 2 2025-10-13 2025-12-31     Short    55    2.01   3.36 $-255.21 -2.55% end_of_data




