GOLD TIER SELECTION

This section applies strict quality filters to keep only the most tradable pairs.

INPUT: 02_final_tradable_pairs.csv (pairs with their calculated tradability metrics)

Filter criteria:
1. Hurst Exponent < 0.50: confirms mean-reverting behavior rather than a random walk (H ≈ 0.5) or a trending series (H > 0.5)

2. Half-Life < 30 days: the spread reverts to its mean quickly enough for practical trading

3. Zero Crossings > 10: the spread crosses its mean frequently, providing enough trade opportunities

OUTPUT: 03_gold_tier_pairs.csv in /data/processed/, the final filtered list of high-quality pairs ready for model training
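The three metrics are read from the previous step's CSV, and this section does not show how they were computed. As a rough sketch, assuming a spread series s_t sampled once per day and common textbook estimators, they could be computed as follows: the half-life comes from the AR(1) regression Δs_t = α + β·s_{t-1} + ε_t using the usual approximation -ln(2)/β, the Hurst exponent is the log-log slope of the standard deviation of lagged differences against the lag, and zero crossings are sign changes around the spread's own mean. The helper names `half_life`, `hurst_exponent` and `zero_crossings`, the lag range 2 to 99, and the synthetic data are all hypothetical; the upstream notebook may use different estimators.

```python
import numpy as np

# Hypothetical helpers: common textbook estimators, not necessarily the ones
# used to build 02_final_tradable_pairs.csv.


def half_life(spread):
    """Half-life from the AR(1) regression  Δs_t = α + β·s_{t-1} + ε_t."""
    lagged = spread[:-1]
    delta = np.diff(spread)
    X = np.column_stack([np.ones_like(lagged), lagged])
    (_alpha, beta), *_ = np.linalg.lstsq(X, delta, rcond=None)
    # β must be negative for the spread to revert; otherwise there is no finite half-life
    return -np.log(2) / beta if beta < 0 else np.inf


def hurst_exponent(spread, max_lag=100):
    """Slope of log std(s_{t+k} - s_t) against log k; about 0.5 for a random walk."""
    lags = np.arange(2, max_lag)
    tau = np.array([np.std(spread[lag:] - spread[:-lag]) for lag in lags])
    slope, _intercept = np.polyfit(np.log(lags), np.log(tau), 1)
    return slope


def zero_crossings(spread):
    """Number of times the spread changes sign around its own mean."""
    signs = np.sign(spread - spread.mean())
    return int(np.sum(signs[1:] * signs[:-1] < 0))


# Synthetic data for illustration only: a mean-reverting AR(1) spread and a random walk
rng = np.random.default_rng(42)
n = 2000
shocks = rng.standard_normal(n)
ou = np.zeros(n)
for t in range(1, n):
    ou[t] = 0.95 * ou[t - 1] + shocks[t]
rw = np.cumsum(rng.standard_normal(n))

for name, s in [("mean-reverting", ou), ("random walk", rw)]:
    print(f"{name}: Hurst={hurst_exponent(s):.3f}, "
          f"half-life={half_life(s):.1f}, zero crossings={zero_crossings(s)}")
```

On the synthetic data the AR(1) spread should give a Hurst estimate well below 0.5 and a half-life near -ln(2)/(0.95 - 1) ≈ 13.9 steps, while the random walk should land near H = 0.5 with a very long or infinite half-life.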

In [52]:
import pandas as pd

# Load the ranked results for tradable pairs
df = pd.read_csv('../data/processed/02_final_tradable_pairs.csv')

# Keep only pairs that pass all three Gold Tier criteria,
# most strongly mean-reverting (lowest Hurst) first
gold_tier = df[
    (df['Hurst_Exponent'] < 0.50) &
    (df['Half_Life'] < 30) &
    (df['Zero_Crossings'] > 10)
].sort_values('Hurst_Exponent')

gold_tier.to_csv('../data/processed/03_gold_tier_pairs.csv', index=False)
print(f"Filtered pairs down to {len(gold_tier)} Gold Tier pairs.")

Filtered pairs down to 5 Gold Tier pairs.
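Only the final count is printed, so it is not visible which rule removes the most candidates. A minimal sketch of a per-criterion breakdown, assuming the same column names and thresholds as the filter cell; the helper `criterion_report`, the `GOLD_TIER_RULES` mapping and the toy tickers are hypothetical and for illustration only.

```python
import pandas as pd

# Hypothetical mapping: the Gold Tier rules from this notebook, one boolean mask each
GOLD_TIER_RULES = {
    "Hurst_Exponent < 0.50": lambda d: d["Hurst_Exponent"] < 0.50,
    "Half_Life < 30": lambda d: d["Half_Life"] < 30,
    "Zero_Crossings > 10": lambda d: d["Zero_Crossings"] > 10,
}


def criterion_report(df):
    """Count how many pairs pass each rule on its own and how many pass all of them."""
    masks = pd.DataFrame({name: rule(df) for name, rule in GOLD_TIER_RULES.items()})
    counts = masks.sum().astype(int)
    counts["all criteria"] = int(masks.all(axis=1).sum())
    return counts


# Toy input with hypothetical tickers, for illustration only
toy = pd.DataFrame({
    "Stock1": ["A", "C", "E", "G"],
    "Stock2": ["B", "D", "F", "H"],
    "Hurst_Exponent": [0.30, 0.55, 0.40, 0.35],
    "Half_Life": [12.0, 15.0, 45.0, 20.0],
    "Zero_Crossings": [80, 60, 50, 8],
})
print(criterion_report(toy))
```

Running `criterion_report(df)` right after loading 02_final_tradable_pairs.csv would show whether the Hurst, half-life or zero-crossing threshold is the binding constraint before settling on the final thresholds.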


QUALITY CHECK
Console output only (no files are saved): a diagnostic for manual review before committing to model training.

In [53]:
# Gold Tier pair quality check

import pandas as pd

# Reload the Gold Tier pairs saved by the previous cell
gold_pairs = pd.read_csv('../data/processed/03_gold_tier_pairs.csv')

print("GOLD TIER PAIRS QUALITY CHECK")
print("="*70)
print(gold_pairs[['Stock1', 'Stock2', 'P-Value', 'Hurst_Exponent', 'Half_Life', 'Zero_Crossings']])

print("\n Statistics:")
print(f"P-Values (should be < 0.10): {gold_pairs['P-Value'].values}")
print(f"Hurst (should be < 0.5): {gold_pairs['Hurst_Exponent'].values}")
print(f"Half-Life (should be < 30): {gold_pairs['Half_Life'].values}")
print(f"Zero Crossings (should be > 10): {gold_pairs['Zero_Crossings'].values}")

print("\n  Warning signs:")
for _, row in gold_pairs.iterrows():
    if row['P-Value'] > 0.08:
        print(f"    {row['Stock1']}-{row['Stock2']}: P-value = {row['P-Value']:.4f} (borderline)")
    if row['Hurst_Exponent'] > 0.45:
        print(f"    {row['Stock1']}-{row['Stock2']}: Hurst = {row['Hurst_Exponent']:.4f} (close to random walk)")
    if row['Half_Life'] > 20:
        print(f"    {row['Stock1']}-{row['Stock2']}: Half-Life = {row['Half_Life']:.2f} (slow reversion)")

GOLD TIER PAIRS QUALITY CHECK
======================================================================
  Stock1 Stock2  P-Value  Hurst_Exponent  Half_Life  Zero_Crossings
0    CMS    DUK  0.03037          0.2871      19.04              83
1    AME    ITW  0.00124          0.3139      25.82             108
2   FITB    PNC  0.04318          0.3182      27.60              70
3    AIG     CB  0.06507          0.3454      24.36              66
4     MS    STT  0.00099          0.3526      29.01              69

 Statistics:
P-Values (should be < 0.10): [0.03037 0.00124 0.04318 0.06507 0.00099]
Hurst (should be < 0.5): [0.2871 0.3139 0.3182 0.3454 0.3526]
Half-Life (should be < 30): [19.04 25.82 27.6  24.36 29.01]
Zero Crossings (should be > 10): [ 83 108  70  66  69]

  Warning signs:
    AME-ITW: Half-Life = 25.82 (slow reversion)
    FITB-PNC: Half-Life = 27.60 (slow reversion)
    AIG-CB: Half-Life = 24.36 (slow reversion)
    MS-STT: Half-Life = 29.01 (slow reversion)
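The row-by-row loop in the quality-check cell can also be expressed with boolean masks, which yields one table of flags per pair instead of printed lines. A minimal sketch, reusing the same "borderline" thresholds (p-value > 0.08, Hurst > 0.45, half-life > 20) and the five pairs shown in the output above; the helper `warning_flags` and its column names are hypothetical.

```python
import pandas as pd

# Values copied from the quality-check output above
pairs = pd.DataFrame({
    "Stock1": ["CMS", "AME", "FITB", "AIG", "MS"],
    "Stock2": ["DUK", "ITW", "PNC", "CB", "STT"],
    "P-Value": [0.03037, 0.00124, 0.04318, 0.06507, 0.00099],
    "Hurst_Exponent": [0.2871, 0.3139, 0.3182, 0.3454, 0.3526],
    "Half_Life": [19.04, 25.82, 27.60, 24.36, 29.01],
    "Zero_Crossings": [83, 108, 70, 66, 69],
})

FLAG_COLUMNS = ["borderline_p_value", "near_random_walk", "slow_reversion"]


def warning_flags(df):
    """One boolean column per warning sign, using the quality-check cell's thresholds."""
    flags = pd.DataFrame({
        "borderline_p_value": df["P-Value"] > 0.08,
        "near_random_walk": df["Hurst_Exponent"] > 0.45,
        "slow_reversion": df["Half_Life"] > 20,
    })
    flags.insert(0, "Pair", df["Stock1"] + "-" + df["Stock2"])
    return flags


flags = warning_flags(pairs)
# Show only the pairs that raised at least one warning
print(flags[flags[FLAG_COLUMNS].any(axis=1)])
```

For these five pairs only the slow-reversion flag fires, matching the four warnings printed above; none of the pairs is borderline on p-value or close to a random walk.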


In [54]:
# Also save a copy of the Gold Tier pairs table to the results folder
import os

import pandas as pd

# 1. Define the paths
source_path = '../data/processed/03_gold_tier_pairs.csv'
output_dir = '../results'
output_path = os.path.join(output_dir, 'gold_tier_pairs.csv')

# 2. Load the CSV
df_final = pd.read_csv(source_path)

# 3. Make sure the output folder exists
os.makedirs(output_dir, exist_ok=True)

# 4. Save the table to the output folder
df_final.to_csv(output_path, index=False)

print(f"Table successfully saved to: {output_path}")

Table successfully saved to: ../results\gold_tier_pairs.csv
