<a href="https://colab.research.google.com/github/SidS12345/Quant-projects/blob/main/Cointegrated_Pairs_Trading_Strategy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This project implements a statistical arbitrage strategy using cointegrated pairs of equities. By identifying pairs with long-run equilibrium relationships, we construct mean-reverting spreads and generate trading signals based on z-score thresholds. The strategy will be backtested on historical data, evaluating profitability, risk-adjusted returns, and robustness against market fluctuations and transaction costs.

In [None]:
import numpy as np
import pandas as pd
import yfinance as yf
import statsmodels.api as sm
from datetime import datetime, timedelta
from statsmodels.tsa.stattools import coint

In [None]:
# Recursively format every float in a (possibly nested) dict or list
# as a string with a fixed number of decimal places (2 by default)

def format_floats(obj, decimals=2):
    if isinstance(obj, dict):
        return {k: format_floats(v, decimals) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [format_floats(item, decimals) for item in obj]
    elif isinstance(obj, (float, np.floating)):
        return f"{obj:.{decimals}f}"
    else:
        return obj

We restrict the analysis to stocks from a single sector (technology). Companies in the same sector are exposed to many of the same economic drivers, so their prices tend to share a common long-run trend, which makes it more likely that we find cointegrated pairs.

In [None]:
tech_tickers = [
    "AAPL",  # Apple
    "MSFT",  # Microsoft
    "GOOGL", # Alphabet
    "AMZN",  # Amazon
    "META",  # Meta Platforms
    "NVDA",  # Nvidia
    "AMD",   # Advanced Micro Devices
    "INTC",  # Intel
    "QCOM",  # Qualcomm
    "CSCO",  # Cisco Systems
    "ORCL",  # Oracle
    "CRM",   # Salesforce
    "ADBE",  # Adobe
    "IBM",   # IBM
    "AVGO",  # Broadcom
    "TXN",   # Texas Instruments
    "MU",    # Micron Technology
    "PYPL",  # PayPal
    "SHOP",  # Shopify
    "SNOW",  # Snowflake
    "PLTR",  # Palantir
    "ZM",    # Zoom Video
    "TWLO",  # Twilio
    "ROKU",  # Roku
    "UBER",  # Uber Technologies
    "SQ",    # Block (Square)
    "DDOG",  # Datadog
    "TEAM",  # Atlassian
    "NET",   # Cloudflare
    "NOW"    # ServiceNow
]


# Smaller ticker list for quickly checking that the pipeline runs end to end:
# tech_tickers = ["AAPL", "MSFT", "GOOGL"]

# Date range: end today and start 1500 calendar days earlier (about four years)
end_date = datetime.today()
start_date = end_date - timedelta(days=1500)
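Because `timedelta` counts calendar days, the window holds fewer than 1500 price observations. The sketch below shows the arithmetic for a fixed example start date (an assumption, chosen only so the result is reproducible). It counts weekdays and does not subtract exchange holidays, so the number of actual trading days will be somewhat lower.

```python
import numpy as np

# Fixed example start date (assumption, for reproducibility); it falls on a Monday
start = np.datetime64("2021-01-04")
end = start + np.timedelta64(1500, "D")

# Weekdays (Mon-Fri) in [start, end); exchange holidays are NOT excluded here
weekdays = int(np.busday_count(start, end))
print(f"{weekdays} weekdays in 1500 calendar days")
```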

In [None]:
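# Download all tickers in one call; data["Close"] has one column per ticker.
# auto_adjust=True is passed explicitly so prices are adjusted for splits and dividends.
data = yf.download(tech_tickers, start=start_date, end=end_date, auto_adjust=True)
closing_prices = data["Close"].reindex(columns=tech_tickers)

# Clean the data before testing for cointegration

# Drop tickers that returned no data at all (e.g. renamed or delisted symbols)
closing_prices = closing_prices.dropna(axis=1, how="all")
tech_tickers = list(closing_prices.columns)
# Keep only dates on which every remaining ticker has a price
closing_prices = closing_prices.dropna(how="any")
# Remove repeated dates, not rows that merely happen to share the same prices
closing_prices = closing_prices[~closing_prices.index.duplicated(keep="first")]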

In [None]:
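cointegration_matrix = []  # cointegration_matrix[i][j] = p-value for tickers i and j
cointegrated_pairs = []    # index pairs (i, j) with a p-value at or below the threshold
coint_coeff = 0.05         # p-value threshold; 0.05 is the conventional 5% significance level
for i in range(len(tech_tickers)):
    lst_cointegrations = []
    for j in range(len(tech_tickers)):
        if i == j:
            # A series is not tested against itself; 0 is a placeholder for the diagonal
            p_value = 0
        else:
            # Engle-Granger test; the null hypothesis is that the pair is not cointegrated.
            # The test is not symmetric, so (i, j) and (j, i) are both evaluated.
            score, p_value, crit_values = coint(closing_prices[tech_tickers[i]], closing_prices[tech_tickers[j]])
            if p_value <= coint_coeff:
                cointegrated_pairs.append((i, j))
        lst_cointegrations.append(p_value)

    cointegration_matrix.append(lst_cointegrations)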