# Pairs Trading Strategy

Pairs trading is a classic example of a strategy based on mathematical analysis. The principle is as follows. Let's say you have a pair of securities X and Y that have some underlying economic link. An example might be two companies that manufacture the same product, or two companies in one supply chain. If we can model this economic link with a mathematical model, we can make trades on it. We'll start by constructing a toy example.

Useful link: 
https://github.com/quantopian/research_public/blob/master/notebooks/lectures/Introduction_to_Pairs_Trading/notebook.ipynb
https://github.com/quantopian/research_public/blob/master/notebooks/lectures/Integration_Cointegration_and_Stationarity/notebook.ipynb

## **** Utils Functions ****
#### *Run this cell before the others, do not modify unless for specific needs*

In [9]:
pip install qstrader

Note: you may need to restart the kernel to use updated packages.


In [15]:
import yfinance as yf
import pandas as pd
from statsmodels.tsa.stattools import coint
import statsmodels
import statsmodels.api as sm
from statsmodels.regression.linear_model import OLS
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import coint, adfuller
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from datetime import datetime, timedelta
import itertools



## **** Asset Pairs and Data ****
To conduct our analysis, we are using minute-by-minute price data from January 1, 2024, to May 31, 2024.

In [11]:
pairs = [
    ('USDJPY=X', '^N225'),  # USD/JPY and Nikkei 225
    ('USDCAD=X', 'CL=F'),   # USD/CAD and Crude Oil Futures
    ('CL=F', 'USO'),        # Crude Oil Futures and United States Oil Fund
    ('GC=F', 'SI=F'),       # Gold Futures and Silver Futures
    ('AUDJPY=X', '^GSPC'),  # AUD/JPY and S&P 500
    ('USDCHF=X', 'GC=F'),   # USD/CHF and Gold Futures
    ('C', 'GS'),            # Citigroup and Goldman Sachs
    ('AAPL', 'META')        # Apple and Meta Platforms (formerly Facebook)
]

# Define the start and end dates for downloading data
start_date = '2024-01-01'
end_date = '2024-06-01'

## **** Choosing Cointegrated Assets with ADF and CADF Tests ****
Before implementing the strategy, it is crucial to select assets that exhibit
cointegration with one another. To achieve this, we utilize two models: the
Augmented Dickey-Fuller (ADF) test and the Cointegrated Augmented Dickey-
Fuller (CADF) test, as detailed below:

### ADF test:

The Augmented Dickey-Fuller (ADF) test is a statistical tool used to assess the stationarity of a time series. Stationarity is a key concept in time series analysis, indicating whether the statistical properties of the series, such as mean and variance, remain constant over time. Non-stationary series can display trends or seasonal patterns, potentially influencing the results of various time series models and analyses. We are conducting the ADF test on each asset to evaluate its stationarity, which is crucial for further analysis, including cointegration testing and the development of robust trading strategies. In this section, we will verify if each asset is non-stationary individually, as the concept of cointegration applies only to non-stationary time series.

In [12]:
# Function to perform the ADF test and print the results
def adf_test(series, name):
    series = series.dropna()
    result = adfuller(series)
    test_statistic = f'{result[0]:.3f}'
    p_value = f'{result[1]:.6f}'  # Display p-value with higher precision
    critical_values_str = ', '.join([f'{key}: {value:.3f}' for key, value in result[4].items()])

    # Determine if the series is stationary
    status = 'stationary' if result[1] < 0.05 else 'non-stationary'

    # Print the result
    print(f'{name} Test Statistic: {test_statistic}, p-value: {p_value}. Critical Values: {critical_values_str}. The series is likely {status}.')

# Extract unique tickers from pairs
unique_tickers = set(ticker for pair in pairs for ticker in pair)

# Loop through each ticker and perform the ADF test
for ticker in unique_tickers:
    data = yf.download(ticker, start=start_date, end=end_date)['Adj Close']
    adf_test(data, ticker)

[*********************100%***********************]  1 of 1 completed
^GSPC Test Statistic: -1.986, p-value: 0.292718. Critical Values: 1%: -3.496, 5%: -2.890, 10%: -2.582. The series is likely non-stationary.
[*********************100%***********************]  1 of 1 completed
USDJPY=X Test Statistic: -2.043, p-value: 0.268162. Critical Values: 1%: -3.492, 5%: -2.888, 10%: -2.581. The series is likely non-stationary.
[*********************100%***********************]  1 of 1 completed
AUDJPY=X Test Statistic: 1.303, p-value: 0.996628. Critical Values: 1%: -3.497, 5%: -2.891, 10%: -2.582. The series is likely non-stationary.
[*********************100%***********************]  1 of 1 completed
SI=F Test Statistic: -0.525, p-value: 0.887131. Critical Values: 1%: -3.495, 5%: -2.890, 10%: -2.582. The series is likely non-stationary.
[*********************100%***********************]  1 of 1 completed
USO Test Statistic: -2.128, p-value: 0.233589. Critical Values: 1%: -3.495, 5%: -2.890, 10%

### CADF test:

The Cointegrated Augmented Dickey-Fuller (CADF) test is a statistical method used to determine whether two or more time series are cointegrated, implying a long-term equilibrium relationship despite being non-stationary on their own. In time series analysis, identifying cointegration is crucial because it allows for the modeling of long-term relationships between non-stationary series that share a common stochastic drift. We are conducting the CADF test on each pair of assets to assess their cointegration, which is fundamental for further analysis, such as developing pairs trading strategies. In this section, we will verify if each pair of assets is cointegrated, as cointegration analysis applies specifically to non-stationary time series that move together over time.

In [13]:
# Function to perform the CADF test and print the results
def cadf_test(series1, series2, name1, name2):
    series1 = series1.dropna()
    series2 = series2.dropna()

    # Ensure both series have the same length
    if len(series1) != len(series2):
        min_len = min(len(series1), len(series2))
        series1 = series1[-min_len:]
        series2 = series2[-min_len:]

    score, p_value, critical_values = coint(series1, series2)
    test_statistic = f'{score:.3f}'
    p_value_formatted = f'{p_value:.6f}'

    critical_values_str = ', '.join([f'{key}: {value:.3f}' for key, value in zip(['1%', '5%', '10%'], critical_values)])

    # Print critical values only once at the start
    if name1 == pairs[0][0] and name2 == pairs[0][1]:  # Assuming the first pair to print critical values
        print('======================================================')
        print(f'Critical Values: {critical_values_str}')
        print('======================================================')

    # Determine if the series are cointegrated
    status = 'cointegrated' if p_value < 0.05 else 'not cointegrated'

    # Print the result in one line
    print(f'{name1} & {name2} Test Statistic: {test_statistic}, p-value: {p_value_formatted}. The series are likely {status}.')

# Loop through each pair and perform the CADF test
for name1, name2 in pairs:
    # Download the data for each pair
    data1 = yf.download(name1, start=start_date, end=end_date)['Adj Close']
    data2 = yf.download(name2, start=start_date, end=end_date)['Adj Close']
    cadf_test(data1, data2, name1, name2)


[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
Critical Values: 1%: -4.010, 5%: -3.399, 10%: -3.088
USDJPY=X & ^N225 Test Statistic: -1.019, p-value: 0.899169. The series are likely not cointegrated.
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
USDCAD=X & CL=F Test Statistic: -0.650, p-value: 0.951786. The series are likely not cointegrated.
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
CL=F & USO Test Statistic: -0.820, p-value: 0.931990. The series are likely not cointegrated.
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
GC=F & SI=F Test Statistic: -1.244, p-value: 0.845952. The series are likely not cointegrated.
[******

Most of the pairs tested do not show significant evidence of cointegration, with the exception of the USD/CHF and Gold Futures pair, which demonstrates a strong long-term equilibrium relationship. This result is important for the development of trading strategies based on cointegration, as only cointegrated pairs offer opportunities to exploit such stable relationships. Therefore, for most of the pairs analyzed, other trading methodologies may be more appropriate.

## **** Optimized Mean Reversion Pairs Trading - BACKTESTING ****
In this section, we will explore a bactesting using optimal timing strategies for trading a mean-reverting price spread. We'll optimize the positions of the previously identified pairs to ensure the intraday portfolio value aligns closely with an Ornstein-Uhlenbeck (OU) process, using maximum likelihood estimation for the best fit.
For our intraday trading strategy, which operates on medium-frequency data to accurately estimate the model's parameters, we did a backtest with the previous 10 days data.


In [7]:
tickers = ["USDCHF", "GC"] # Define the stock symbols for ConocoPhillips and ExxonMobil

In [None]:
import nest_asyncio
import asyncio
from ib_insync import *
import pandas as pd
import numpy as np
from scipy.optimize import minimize
import time

nest_asyncio.apply()

# Connect to IBKR TWS or Gateway
ib = IB()
ib.connect('127.0.0.1', 7497, clientId=2)

def Now():
    """Return the current time as a string."""
    return time.strftime('%Y-%m-%d %H:%M:%S')

def fetch_historical_data(contract, duration='10 D', bar_size='1 min'):
    """Fetch historical data for a given contract."""
    bars = ib.reqHistoricalData(
        contract,
        endDateTime='',
        durationStr=duration,
        barSizeSetting=bar_size,
        whatToShow='MIDPOINT',
        useRTH=True
    )
    if bars:
        df = util.df(bars)
        print("%s Historical data fetched: %s" % (Now(), df.head()))
        return df
    else:
        print("%s No historical data fetched" % Now())
        return None

def estimate_ou_parameters(df):
    """Estimate parameters of the Ornstein-Uhlenbeck process."""
    dt = 1 / (24 * 60)  # 1-minute data
    x = df['close'].values
    n = len(x)

    # Estimate parameters
    def neg_log_likelihood(params):
        mu, theta, sigma = params
        if sigma <= 0 or mu <= 0:
            return np.inf
        likelihoods = -np.log(sigma) - 0.5 * (((x[1:] - x[:-1] * np.exp(-mu*dt) - theta * (1 - np.exp(-mu*dt))) / sigma) ** 2)
        return -np.sum(likelihoods)

    initial_params = [0.1, np.mean(x), np.std(x)]
    bounds = ((1e-5, None), (None, None), (1e-5, None))  # Avoid zero or negative values
    result = minimize(neg_log_likelihood, initial_params, bounds=bounds)
    
    if not result.success:
        raise ValueError("Parameter estimation failed")

    mu, theta, sigma = result.x
    print("%s Estimated parameters - mu: %s, theta: %s, sigma: %s" % (Now(), mu, theta, sigma))
    return mu, theta, sigma

def calculate_entry_exit_levels(mu, theta, sigma, r, c, stop_loss):
    """Calculate optimal entry and exit levels based on OU parameters and constraints."""
    def value_function_exit(x):
        return (x - c) * np.exp(-r * (1 / mu) * np.log(x / theta))

    def value_function_entry(x):
        return value_function_exit(x) - x - c

    b_star = minimize(lambda x: -value_function_exit(x), x0=theta, bounds=((stop_loss, None),)).x[0]
    a_star = minimize(lambda x: -value_function_entry(x), x0=theta, bounds=((stop_loss, b_star),)).x[0]
    print("%s Calculated levels - a_star: %s, b_star: %s" % (Now(), a_star, b_star))
    return a_star, b_star

def place_market_order(contract, action, quantity):
    """Place a market order for a specified quantity."""
    order = MarketOrder(action, quantity)
    trade = ib.placeOrder(contract, order)
    print(f"{Now()} Placing {action} market order for {quantity} units")

    while not trade.isDone():
        ib.waitOnUpdate()

    price = trade.fills[0].execution.avgPrice if trade.fills else None
    if price:
        print(f"{Now()} Placed {action} market order for {quantity} units at price {price}")
    else:
        print(f"{Now()} {action} market order for {quantity} units was not filled")
    return price

def get_account_balance():
    """Retrieve account balance."""
    account_values = ib.accountValues()
    net_liquidation = next((x for x in account_values if x.tag == 'NetLiquidation'), None)
    if net_liquidation:
        return float(net_liquidation.value)
    return 0.0

async def main():
    # Parameters
    check_window = "5 D"
    bar_size = "1 min"
    contract = Stock('USDCHF', 'GC')
    r = 0.01  # Example discount rate
    c = 0.01  # Example transaction cost
    stop_loss_percent = 0.01  # Example stop-loss level as a percentage
    trade_amount = 200  # Number of USDCHF shares to trade
    hold_seconds = 300  # Hold for 5 minutes (300 seconds)
    wait = 60  # Check every 60 seconds

    # Fetch historical data and estimate parameters
    df = fetch_historical_data(contract, duration=check_window, bar_size=bar_size)
    if df is not None:
        mu, theta, sigma = estimate_ou_parameters(df)
        a_star, b_star = calculate_entry_exit_levels(mu, theta, sigma, r, c, stop_loss_percent)
        initial_balance = get_account_balance()  # Get the initial account balance
        print("%s Initial account balance: $%s" % (Now(), initial_balance))
    else:
        print("%s Historical data could not be fetched. Exiting." % Now())
        return

    while True:
        # Data
        bars = await ib.reqHistoricalDataAsync(contract, endDateTime='', durationStr=check_window, barSizeSetting=bar_size, whatToShow='MIDPOINT', useRTH=True)
        df = util.df(bars)
        print("%s Data downloaded" % Now())

        # Check condition for mean-reversion strategy
        p_movement = round(df["close"].iloc[-1] / df["open"].iloc[0] - 1, 6)
        condition = p_movement > 0.0001 / 100  # 0.01%

        if condition:
            print(f"{Now()} Condition met ({p_movement}), open trade with market order")
            ib.qualifyContracts(contract)
            price_sell = place_market_order(contract, "SELL", trade_amount)
            if price_sell:
                print("%s Trade opened: Sold at %s" % (Now(), price_sell))

                # Hold the trade
                print("%s Holding the trade" % Now())
                await asyncio.sleep(hold_seconds)

                # Close the trade
                print(f"{Now()} Closing the trade with market order")
                price_buy_back = place_market_order(contract, "BUY", trade_amount)
                if price_buy_back:
                    print("%s Trade closed: Bought back at %s" % (Now(), price_buy_back))

                    profit = "%s %%" % round(100 * (price_sell / price_buy_back - 1), 6)
                    print("%s Gross profit from trade: %s" % (Now(), profit))
                else:
                    print("%s Failed to buy back the shares" % Now())
            else:
                print("%s Failed to sell the shares" % Now())
        else:
            print("%s Condition not met (%s), do not open any trade" % (Now(), p_movement))

        # Wait for the next iteration
        print("%s ........ wait for %s secs ........." % (Now(), wait))
        await asyncio.sleep(wait)

# Run the main function using asyncio
asyncio.run(main())


#### *References *
Hull, J.C. (2017) Option, Futures, and Other Derivatives (Pearson), 9th edition

Leung T., Xin L. (2015) Optimal Mean Reversion Trading with Transaction Costs and Stop-Loss Exit

Lee D., Leung T. (2020) On the E cacy of Optimized Exit Rule for Mean Reversion Trading