# Pair Trading Analysis
  
Given a pair of stocks that are cointegrated, grab their PricePredict objects from the
ppo/ directory and plot a median/spread graph of the two stocks. This will allow us to see
how the two stocks are moving relative to each other and if there are any opportunities to
trade the spread between the two stocks.

Run a simple trading simulation that uses a strategy of trading the 2 stocks at the same time
as follows:
* Opening a Pairs Trade...
    * When the 2 stocks diverge, such that the overvalued stock is more than one historical standard deviation
      from its mean (based on the N most profitable moves) and the undervalued stock is more than one historical
      standard deviation from its mean (same basis):
        * Sell the overvalued stock and buy the undervalued stock.
          Only do these trades if the prediction for the next day indicates that the overvalued stock
          is going to go down and the undervalued stock is going to go up on the current timeframe,
          and on the next higher timeframe (weekly, given a daily trading timeframe).
          Then buy/sell the stocks at the opening price of the next day.
        * Use additional indicators that indicate a reversal in the spread (as needed). 
    * Do we trade from convergence?
        * It does not make sense to trade from convergence, as it is difficult to determine which
          stock to go long on and which stock to go short on.
        * We should only trade from divergence, as we can determine which stock is overvalued and which
          stock is undervalued. 
* Closing a Pairs Trade...
    * Given an open Pairs Trade that started from a Spread Divergence... 
        * We wait for the spread to converge to the median and then close the trade.
        * We then calculate the profit/loss of the trade and add it to the total profit/loss.
* How to choose going long vs going short (one stock will be long and the other short)...
    * The Spread calculation is essentially the difference between the two stocks. Stock_A - Stock_B.
        * As Stock_A goes up and Stock_B goes down, the spread will increase (go up in the spread graph).
        * As Stock_A goes down and Stock_B goes up, the spread will decrease (go down in the spread graph).    
    * If the spread is more than 2 standard deviations above the mean, we go short on Stock_A and long on Stock_B.
    * If the spread is more than 2 standard deviations below the mean, we go long on Stock_A and short on Stock_B.
* How much to trade...
    * We use the Hedge Ratio to determine how much of each stock to trade.
        * The Hedge Ratio is the Beta from the OLS regression of Stock_A on Stock_B.
    * The Hedge Ratio is the amount of Stock_B that is needed to hedge the risk of Stock_A.
        * If the Hedge Ratio is 1.5, then we would trade 1.5 shares of Stock_B for every 1 share of Stock_A.
        * If the Hedge Ratio is negative (-1.5), then for every 1 share of Stock_B, we would trade 1.5 shares of Stock_A.
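
The entry rules above can be sketched in a few lines. This is a minimal, standard-library-only illustration under stated assumptions, not the notebook's implementation: `hedge_ratio` and `pair_signal` are hypothetical names, the prices are made-up numbers, and the bands are computed over the whole sample rather than a rolling window.

```python
from statistics import mean, stdev

def hedge_ratio(stock_a, stock_b):
    """OLS slope (beta) of Stock_A regressed on Stock_B, with an intercept."""
    mean_a, mean_b = mean(stock_a), mean(stock_b)
    cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(stock_a, stock_b))
    var_b = sum((b - mean_b) ** 2 for b in stock_b)
    return cov_ab / var_b

def pair_signal(stock_a, stock_b, n_std=2.0):
    """Return (signal, beta) for the latest bar of the spread A - beta * B."""
    beta = hedge_ratio(stock_a, stock_b)
    spread = [a - beta * b for a, b in zip(stock_a, stock_b)]
    mid, band = mean(spread), n_std * stdev(spread)
    if spread[-1] > mid + band:
        signal = "short_A_long_B"   # spread too high: A is rich relative to B
    elif spread[-1] < mid - band:
        signal = "long_A_short_B"   # spread too low: A is cheap relative to B
    else:
        signal = "no_trade"
    return signal, beta

# Made-up closes: A tracks 2 * B until the last bar, where A jumps by 3.
closes_b = [10, 11, 10, 11, 10, 11, 10, 11, 10, 11, 10, 11]
closes_a = [20, 22, 20, 22, 20, 22, 20, 22, 20, 22, 20, 25]
signal, beta = pair_signal(closes_a, closes_b)
print(signal, beta)
```

With the hedge ratio in hand, position sizing follows the rule above: for every 1 share of Stock_A, trade `beta` shares of Stock_B on the opposite side.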

## Questions...

* Which is a stronger indicator of Cointegration...
    * Weekly or Daily?
    * Both are useful, in much the same way that a weekly prediction indicates the longer-term trend and a daily
      prediction indicates the shorter-term trend.

## Insights...

* Different start-end periods result in different Hedge Ratios and Spread Deviations.
    * Longer timeframes (5 years vs 30 days) result in lower Hedge Ratios and Spread Deviations. It takes longer for the spread to converge.
* It probably makes sense to hold on to the data for a given pair-trade entry so that one can
  continue to track the move to the originally calculated median (exit point), sticking to the original
  trading plan. We append new data to the close columns and calculate the current spread using
  the original Beta/Hedge Ratio, while the median (our exit point) stays the same.
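
A rough sketch of that last idea: once a trade is open, new closes are turned into spread values with the beta and mean frozen at entry, and the exit is the first bar that reaches the frozen mean. The function name `track_open_trade` and the sample numbers are hypothetical and are not part of the notebook.

```python
def track_open_trade(new_a, new_b, beta, entry_mean, short_a_long_b):
    """Recompute the spread on new closes with beta and mean frozen at entry.

    Returns the index of the first new bar where the spread has converged to
    the frozen mean, or None if it has not converged yet.
    """
    for i, (a, b) in enumerate(zip(new_a, new_b)):
        spread = a - beta * b
        # A short-A/long-B trade was opened above the mean, so it converges from above.
        if short_a_long_b and spread <= entry_mean:
            return i
        # A long-A/short-B trade was opened below the mean, so it converges from below.
        if not short_a_long_b and spread >= entry_mean:
            return i
    return None

# Frozen at entry: beta = 2.5, mean = -5.0; the spread drifts from -3.5 down to -5.5.
exit_bar = track_open_trade([24.0, 23.0, 22.0], [11.0, 11.0, 11.0],
                            beta=2.5, entry_mean=-5.0, short_a_long_b=True)
print(exit_bar)
```

Because neither `beta` nor `entry_mean` is re-estimated, the exit point does not move as new data arrives, which is what keeps the trade tied to the original plan.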


    


In [1]:
import sys
from types import ModuleType, FunctionType
from gc import get_referents

# Helper function to get the size of an object (Curiosity)
# Custom objects know their class.
# Function objects seem to know way too much, including modules.
# Exclude modules as well.
BLACKLIST = type, ModuleType, FunctionType


def getsize(obj):
    """sum size of object & members."""
    if isinstance(obj, BLACKLIST):
        raise TypeError('getsize() does not take argument of type: '+ str(type(obj)))
    seen_ids = set()
    size = 0
    objects = [obj]
    while objects:
        need_referents = []
        for obj in objects:
            if not isinstance(obj, BLACKLIST) and id(obj) not in seen_ids:
                seen_ids.add(id(obj))
                size += sys.getsizeof(obj)
                need_referents.append(obj)
        objects = get_referents(*need_referents)
    return size 


In [2]:
import numpy as np  # needed for the np.float32 fallback below
import pandas as pd
import statsmodels.api as sm
from decimal import Decimal
from pandas_decimal import DecimaldDtype

def get_trading_pair_spread(ppos: tuple, beta: Decimal = None, 
                            prev_days: int = None,
                            start_period: int = None, end_period: int = None,
                            start_date: str = None, end_date: str = None):
    
    # Create a DataFrame of the closing prices from the PPO[0 and 1].orig_data dataframes
    closes1 = ppos[0].orig_data['Close'].astype(DecimaldDtype(5))
    closes2 = ppos[1].orig_data['Close'].astype(DecimaldDtype(5))
    # Make closes1 and closes2 the same length
    min_len = min(len(closes1), len(closes2))
    if prev_days is None:
        prev_days = min_len
    elif prev_days > min_len:
        prev_days = min_len
    if start_period is not None and end_period is not None:
        # Gather closes based numeric index    
        closes1 = closes1[start_period:end_period]
        closes2 = closes2[start_period:end_period]
    elif start_date is not None and end_date is not None:
        # Gather closes based on the date index column
        closes1 = closes1.loc[start_date:end_date]
        closes2 = closes2.loc[start_date:end_date]
    else:
        # Default to the last prev_days    
        closes1 = closes1.tail(prev_days)
        closes2 = closes2.tail(prev_days)
    df_closes = pd.DataFrame({'Stock_A': closes1, 'Stock_B': closes2})
    # df_closes.replace([np.inf, -np.inf], None, inplace=True)
    df_closes = df_closes.bfill().ffill()
    try:
        if beta is None:
            # Perform OLS to find beta
            X = df_closes['Stock_B']
            X = sm.add_constant(X)  # Adds a constant term to the predictor
            model = sm.OLS(df_closes['Stock_A'], X).fit()
            beta = model.params['Stock_B']
    except Exception as e:
        print(f"Error: {e}")
        beta = np.float32(1.0)
        
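    # Detrend the closes
    # NOTE: the mean of (x - x.mean()) over any window is ~0, so these columns are effectively zero;
    # the spread below is computed from the raw closes in df_closes.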
    # closes1m = (closes1 - closes1.rolling(window=3)).mean()
    closes1m = closes1.rolling(window=3).apply(lambda x: (x - x.mean()).mean())
    # closes2m = (closes2 - closes2.rolling(window=3)).mean()
    closes2m = closes2.rolling(window=3).apply(lambda x: (x - x.mean()).mean())
    df_detrend = pd.DataFrame({'Stock_A': closes1m, 'Stock_B': closes2m})
    df_detrend = df_detrend.bfill().ffill()
    # Calculate the spread and its mean using the Hedge-Ratio beta 
    df_detrend['Spread'] = df_closes['Stock_A'] - beta * df_closes['Stock_B']
    spread_mean = df_detrend['Spread'].mean()
    spread_std = df_detrend['Spread'].std()
    # Bands at 1 and 2 standard deviations above (_a) and below (_b) the spread mean
    df_detrend['Mean_1std_a'] = spread_mean + spread_std
    df_detrend['Mean_2std_a'] = spread_mean + 2 * spread_std
    df_detrend['Mean_1std_b'] = spread_mean - spread_std
    df_detrend['Mean_2std_b'] = spread_mean - 2 * spread_std

    return ppos, df_closes, df_detrend, spread_mean, beta 


In [3]:
import matplotlib.pyplot as plt

def show_annotation(sel):
    x, y = sel.target
    ind = sel.index
    sel.annotation.set_text(f'{x:.0f}, {y:.0f}: {labels[ind]}')
    
def plot_spread(ppos: tuple, beta: Decimal = None, 
                prev_days: int = None,
                title: str = None,   
                spread_name: str = 'Spread',
                spread_color: str = 'black',
                start_period: int = None, end_period: int = None,
                start_date: str = None, end_date: str = None):
    
    ppos, df_closes, df_detrend, spread_mean, beta = get_trading_pair_spread(ppos, beta, 
                                                                             prev_days, 
                                                                             start_period, end_period,
                                                                             start_date, end_date)    
    # Save the plot data to the PPO objects
    pair = (ppos[0].ticker, ppos[1].ticker)
    sp = spread_mean
    cl = df_closes.copy(deep=True)
    cl.reset_index(inplace=True)
    cl = cl.to_json()
    dc = df_detrend.copy(deep=True)
    dc.reset_index(inplace=True)
    dc = dc.to_json()
    spread_analysis = {'pair': (ppos[0].ticker, ppos[1].ticker),
                       'spread_mean': sp, 
                       'beta': beta,
                       'closes': cl,
                       'detrended_closes': dc
                       }
    ppos[0].spread_analysis[pair] = spread_analysis
    ppos[1].spread_analysis[pair] = spread_analysis
    
    # Plot the spread with mean line
    plt.plot(df_detrend['Spread'], marker='o', label=spread_name, color=spread_color)
    plt.plot(df_detrend['Mean_2std_a'], label='2std_a', color='green')
    plt.plot(df_detrend['Mean_1std_a'], label='1std_a', color='blue')
    plt.plot(df_detrend['Mean_1std_b'], label='1std_b', color='blue')
    plt.plot(df_detrend['Mean_2std_b'], label='2std_b', color='green')
    plt.axhline(spread_mean, color='red', linestyle='--', label='Mean Spread')
    plt.legend()
    if title is None:
        title = 'Spread Between Stock A and Stock B'
    plt.title(title)
    plt.xlabel('Time')
    plt.ylabel(spread_name)
    # Enable x, y grid lines
    plt.grid(True)
    plt.show()

    return plt, beta


In [41]:
%matplotlib notebook

# Import Libraries
import os.path
import numpy as np
import pandas as pd
import logging
import sys
import json
import dill
import matplotlib.pyplot as plt
import copy
import berkeleydb as bdb

from pricepredict import PricePredict
from datetime import datetime, timedelta

    
# Use an Object Cache to reduce the prep time for creating and loading the PricePredict objects.
if 'ObjCache' not in globals():
    global ObjCache
    ObjCache = bdb.btopen('ppo_cache.db', 'c')

DirPPO = '../ppo/'
def get_ppo(symbol: str, period: str):
    
    global ObjCache

    # print(f'Type of ObjCache: {type(ObjCache)}')

    ppo_name = symbol + '_' + period

    if bytes(ppo_name, 'latin1') in ObjCache.keys():
        print(f"Using Cached PPO: {ppo_name}")    
        ppo = PricePredict.unserialize(ObjCache[bytes(ppo_name, 'latin1')])
        return 'None', ppo
    
    file_name_starts_with = symbol + '_' + period
    # Find all PPO files for the symbol in the PPO directory
    ppo_files = [f for f in os.listdir(DirPPO) if f.startswith(file_name_starts_with) and f.endswith('.dilz')]
    ppo = None
    if len(ppo_files) > 0:
        # Sort the files by date
        ppo_files.sort()
        # Get the latest PPO file
        ppo_file = ppo_files[-1]
        # Unpickle the PPO file using dilz
        print(f"Reading PPO File: {ppo_file}")
        with open(DirPPO + ppo_file, 'rb') as f:
            f_obj = f.read()
            ppo = PricePredict.unserialize(f_obj)
            
    if ppo is None:
        ppo_file = ppo_name
        print(f"Creating PPO: {ppo_file}")
        ppo = PricePredict(symbol,
                           model_dir='../models/',
                           chart_dir='../charts/',
                           preds_dir='../predictions/',
                           period=period)
        # Train the models on roughly 5 years of data (5 * 400 = 2000 calendar days)...
        end_dt = datetime.now()
        start_dt = end_dt - timedelta(days=5*400)
        end_date = end_dt.strftime('%Y-%m-%d')
        start_date = start_dt.strftime('%Y-%m-%d')
        ppo.fetch_train_and_predict(ppo.ticker, 
                                    start_date, end_date, 
                                    start_date, end_date,
                                    period=period,
                                    force_training=False,
                                    use_curr_model=True,
                                    save_model=False)
        
    # Cache the ppo
    ObjCache[bytes(ppo_name, 'latin1')] = ppo.serialize_me()

    return ppo_file, ppo

def get_tradingpair_ppos(trading_pair: tuple):
    tp1_weekly_ppo_file, tp1_weekly_ppo = get_ppo(trading_pair[0], PricePredict.PeriodWeekly)
    tp1_daily_ppo_file, tp1_daily_ppo = get_ppo(trading_pair[0], PricePredict.PeriodDaily)
    tp2_weekly_ppo_file, tp2_weekly_ppo = get_ppo(trading_pair[1], PricePredict.PeriodWeekly)
    tp2_daily_ppo_file, tp2_daily_ppo = get_ppo(trading_pair[1], PricePredict.PeriodDaily)
    # print(f'{trading_pair[0]} Weekly PPO: {tp1_weekly_ppo_file}[{tp1_weekly_ppo.period}] {tp1_weekly_ppo}:[{round(getsize(tp1_weekly_ppo)/1024/1024, 2)}]M')
    # print(f'{trading_pair[0]} Daily PPO: {tp1_daily_ppo_file}[{tp1_daily_ppo.period}] {tp1_daily_ppo}:[{round(getsize(tp1_daily_ppo)/1024/1024, 2)}]M')
    # print(f'{trading_pair[1]} Weekly PPO: {tp2_weekly_ppo_file}[{tp2_weekly_ppo.period}] {tp2_weekly_ppo}:[{round(getsize(tp2_weekly_ppo)/1024/1024, 2)}]M')
    # print(f'{trading_pair[1]} Daily PPO: {tp2_daily_ppo_file}[{tp2_daily_ppo.period}] {tp2_daily_ppo}:[{round(getsize(tp2_daily_ppo)/1024/1024, 2)}]M')
    return tp1_weekly_ppo, tp1_daily_ppo, tp2_weekly_ppo, tp2_daily_ppo    

def check_ppo_orig_data(ppo: PricePredict, msg: str = None):
    is_index_datetime = isinstance(ppo.orig_data.index, pd.DatetimeIndex)
    is_date_in_index = 'Date' in ppo.orig_data.index.names    
    if msg is not None and (is_date_in_index is True or is_index_datetime is True):
        print(msg)    
    if is_index_datetime is False:
        print(f"orig_data index is not a DatetimeIndex: {ppo.ticker} {ppo.period}")
    if is_date_in_index is False:
        print(f"orig_data index does not have a 'Date' column: {ppo.ticker} {ppo.period}")

def create_ppos(trading_pair: tuple):
    global ObjCache
    
    model_dir = '../models/'
    chart_dir = '../charts/'
    preds_dir = '../predictions/'

    tp1_weekly_ppo = PricePredict(ticker=trading_pair[0], period=PricePredict.PeriodWeekly,
                                  model_dir=model_dir, chart_dir=chart_dir, preds_dir=preds_dir)
    tp1_daily_ppo = PricePredict(ticker=trading_pair[0], period=PricePredict.PeriodDaily,
                                 model_dir=model_dir, chart_dir=chart_dir, preds_dir=preds_dir)
    tp2_weekly_ppo = PricePredict(ticker=trading_pair[1], period=PricePredict.PeriodWeekly,
                                  model_dir=model_dir, chart_dir=chart_dir, preds_dir=preds_dir)
    tp2_daily_ppo = PricePredict(ticker=trading_pair[1], period=PricePredict.PeriodDaily,
                                 model_dir=model_dir, chart_dir=chart_dir, preds_dir=preds_dir)
        
    # Train the models on roughly 5 years of data (5 * 400 = 2000 calendar days)...
    end_dt = datetime.now()
    start_dt = end_dt - timedelta(days=5*400)
    end_date = end_dt.strftime('%Y-%m-%d')
    start_date = start_dt.strftime('%Y-%m-%d')
    
    print(f"ObjCache: {ObjCache.keys()}")
    
    # Fetch and train each PPO for the trading pair, or load it from the cache
    ppo_name = trading_pair[0] + '_weekly_ppo'
    if ppo_name not in ObjCache.keys():
        tp1_weekly_ppo.fetch_train_and_predict(tp1_weekly_ppo.ticker, 
                                               start_date, end_date, 
                                               start_date, end_date,
                                               period=PricePredict.PeriodWeekly,
                                               force_training=False,
                                               use_curr_model=True,
                                               save_model=False)
        check_ppo_orig_data(tp1_weekly_ppo, f"After Yahoo Fetch {trading_pair[0]} Weekly PPO")
        ObjCache[ppo_name] = tp1_weekly_ppo.serialize_me()
    else:
        tp1_weekly_ppo = PricePredict.unserialize(ObjCache[ppo_name])
        check_ppo_orig_data(tp1_weekly_ppo, f"After loading from ObjCache {trading_pair[0]} Weekly PPO")

    ppo_name = trading_pair[0] + '_daily_ppo'
    if ppo_name not in ObjCache.keys():
        tp1_daily_ppo.fetch_train_and_predict(tp1_daily_ppo.ticker, 
                                               start_date, end_date, 
                                               start_date, end_date,
                                               period=PricePredict.PeriodDaily,
                                               force_training=False,
                                               use_curr_model=True,
                                               save_model=False)
        check_ppo_orig_data(tp1_daily_ppo, f"After Yahoo Fetch {trading_pair[0]} Daily PPO")
        ObjCache[ppo_name] = tp1_daily_ppo.serialize_me()
    else:
        tp1_daily_ppo = PricePredict.unserialize(ObjCache[ppo_name])
        check_ppo_orig_data(tp1_daily_ppo, f"After loading from ObjCache {trading_pair[0]} Daily PPO")

    ppo_name = trading_pair[1] + '_weekly_ppo'
    if ppo_name not in ObjCache.keys():
        tp2_weekly_ppo.fetch_train_and_predict(tp2_weekly_ppo.ticker,
                                               start_date, end_date, 
                                               start_date, end_date,
                                               period=PricePredict.PeriodWeekly,
                                               force_training=False,
                                               use_curr_model=True,
                                               save_model=False)
        check_ppo_orig_data(tp2_weekly_ppo, f"After Yahoo Fetch {trading_pair[1]} Weekly PPO")
        ObjCache[ppo_name] = tp2_weekly_ppo.serialize_me()
    else:
        tp2_weekly_ppo = PricePredict.unserialize(ObjCache[ppo_name])
        check_ppo_orig_data(tp2_weekly_ppo, f"After loading from ObjCache {trading_pair[1]} Weekly PPO")

    ppo_name = trading_pair[1] + '_daily_ppo'
    if ppo_name not in ObjCache.keys():
        tp2_daily_ppo.fetch_train_and_predict(tp2_daily_ppo.ticker,
                                               start_date, end_date, 
                                               start_date, end_date,
                                               period=PricePredict.PeriodDaily,
                                               force_training=False,
                                               use_curr_model=True,
                                               save_model=False)
        check_ppo_orig_data(tp2_daily_ppo, f"After Yahoo Fetch {trading_pair[1]} Daily PPO")
        ObjCache[ppo_name] = tp2_daily_ppo.serialize_me()
    else:
        tp2_daily_ppo = PricePredict.unserialize(ObjCache[ppo_name])
        check_ppo_orig_data(tp2_daily_ppo, f"After loading from ObjCache {trading_pair[1]} Daily PPO")

    return tp1_weekly_ppo, tp1_daily_ppo, tp2_weekly_ppo, tp2_daily_ppo

def determine_start_end_dates(start_date: str = None, period_len: int = None):
    end_date = None
    if start_date is None and period_len is None:
        # Neither given: start at the first date in the daily PPO's data and end today.
        # NOTE: tp1_daily_ppo is not a parameter; it must already exist as a global.
        start_date = tp1_daily_ppo.orig_data.index[0].strftime('%Y-%m-%d')
        end_date = datetime.now().strftime('%Y-%m-%d')
    elif start_date is not None and period_len is None:
        # Use the given start_date and today as the end_date.
        end_date = datetime.now().strftime('%Y-%m-%d')
    elif start_date is None and period_len is not None:
        # Make the start_date period_len days before today
        end_dt = datetime.now()
        end_date = end_dt.strftime('%Y-%m-%d')
        start_dt = end_dt - timedelta(days=period_len)
        start_date = start_dt.strftime('%Y-%m-%d')
    elif start_date is not None and period_len is not None:
        # Use the start_date and make end_date period_len days from the start_date    
        start_date = start_date
        end_dt = datetime.strptime(start_date, '%Y-%m-%d') + timedelta(days=period_len)
        end_date = end_dt.strftime('%Y-%m-%d')

    return start_date, end_date
    
def analyze_trading_pair(trading_pair: tuple, start_date: str = None, period_len: int = None, mpl_plt: plt = None):

    start_date, end_date = determine_start_end_dates(start_date=start_date, period_len=period_len)
    
    end_dt = datetime.strptime(end_date, '%Y-%m-%d')
    
    # Allows the reuse of the matplotlib.pyplot plt object.
    if mpl_plt is not None:
        # Close the current plot so that it can be reused.
        mpl_plt.close()

    # Gather the Weekly and Daily PPOs for the trading pair from the ./ppo/ dir.
    tp1_weekly_ppo, tp1_daily_ppo, tp2_weekly_ppo, tp2_daily_ppo = get_tradingpair_ppos(trading_pair)
    
    # Creates ppo objects and caches them to ObjCache.
    # tp1_weekly_ppo, tp1_daily_ppo, tp2_weekly_ppo, tp2_daily_ppo = create_ppos(trading_pair)
        
    # Plot the median & spread of the trading pair given the daily PPOs)
    # Plot the Weekly Spread using the Weekly calculated Beta
    # plt, beta = plot_spread((tp1_weekly_ppo, tp2_weekly_ppo), 
    #                         title=f"Weekly Spread [{trading_pair[0]} vs {trading_pair[1]}]",
    #                         spread_name='Weekly')
    # print(f"Weekly Hedge Ratio: {beta}")

    # # Plot the Daily Spread, Using the Weekly Beta
    # plt, beta = plot_spread((tp1_daily_ppo, tp2_daily_ppo), beta, 60, 
    #             title=f"Daily Spread [{trading_pair[0]} vs {trading_pair[1]}]", 
    #             spread_name='Daily (Wkly Beta)', spread_color='grey')
    # print(f"Daily using Weekly Hedge Ratio: {beta}")
    # # Plot the Daily Spread, Using the Daily calculated Beta
    # plt, beta = plot_spread((tp1_daily_ppo, tp2_daily_ppo), None, 60,
    #                         title=f"Daily Spread [{trading_pair[0]} vs {trading_pair[1]}]", 
    #                         spread_name='Daily', spread_color='orange')
    # print(f"Daily Hedge Ratio: {beta}")
    # plt, beta = plot_spread((tp1_daily_ppo, tp2_daily_ppo),
    #                         title=f"Daily[1:37] Spread [{trading_pair[0]} vs {trading_pair[1]}]", 
    #                         spread_name='Daily [1:37]', spread_color='orange',
    #                         start_period=1, end_period=37)
    # print(f"Daily[1:37] Hedge Ratio {beta}")
    # plt, beta = plot_spread((tp1_daily_ppo, tp2_daily_ppo),
    #                         title=f"Daily[4/1/21 to 8/1/21] Spread [{trading_pair[0]} vs {trading_pair[1]}]", 
    #                         spread_name='Daily [4/1/21 to 8/1/21]', spread_color='orange',
    #                         start_date='4/1/2021', end_date='7/30/2021')
    # print(f"Daily[4/1/21 to 8/1/21] Hedge Ratio {beta}")
    
    print(f"Analyze Trading Pair Start Date: {start_date},  End Date: {end_date}")
    
    plt, beta = plot_spread((tp1_daily_ppo, tp2_daily_ppo),
                            title=f"Daily[{start_date} to {end_date}] Spread [{trading_pair[0]} vs {trading_pair[1]}]", 
                            spread_name=f'Daily [{start_date} to {end_date}]', spread_color='orange',
                            start_date=start_date, end_date=end_date)
    print(f"Daily[{start_date} to {end_date}] Hedge Ratio {beta}")
    
    return plt


In [42]:
getsize(ObjCache)

336

# Pair Trading Simulation

* Given the current Trading Pair...
    * From the beginning of the data...
        * Perform the Spread Analysis on a 30-day window, moving weekly through the data. 
            * When the spread moves more than 2 standard deviations from the mean (in either direction), open a pairs trade.
              Be sure not to re-enter trades that have already occurred. 
                * Immediately move forward in time until the spread converges to the mean.
                  Use the beta and append to the dataset (if needed) to calculate the spread 
                  and to keep the mean stable.
                    * Calculate the profit/loss for each period. Are the draw-downs acceptable?
                    * Hold on to the final profit/loss of the trade upon exit.
    * Throw out open trades and calculate the total profit/loss.
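
The loop described above can be illustrated with a deliberately simplified sketch. It assumes a single fixed beta and uses the mean and standard deviation of the whole sample (so it has look-ahead, unlike the moving 30-day window), and `simulate_spread_trades` is a hypothetical name rather than the function defined in the next cell.

```python
from statistics import mean, stdev

def simulate_spread_trades(stock_a, stock_b, beta, n_std=2.0):
    """Walk a fixed-beta spread once, opening on divergence and closing at the mean.

    Profit is per 1 share of Stock_A hedged with `beta` shares of Stock_B.
    Trades still open at the end of the data are discarded.
    """
    spread = [a - beta * b for a, b in zip(stock_a, stock_b)]
    mid, band = mean(spread), n_std * stdev(spread)
    trades, entry = [], None  # entry = (index, direction); +1 = long A / short B
    for i, s in enumerate(spread):
        if entry is None:
            if s >= mid + band:
                entry = (i, -1)  # spread too high: short A, long B
            elif s <= mid - band:
                entry = (i, +1)  # spread too low: long A, short B
        else:
            start, direction = entry
            converged = s <= mid if direction == -1 else s >= mid
            if converged:
                # Long-spread P&L is the change in the spread; short-spread is its negative.
                trades.append((start, i, direction * (s - spread[start])))
                entry = None
    return trades

# Made-up closes: the spread A - B sits at 0, spikes to 3 on bar 9, then reverts.
closes_a = [10.0] * 9 + [13.0, 10.0, 10.0]
closes_b = [10.0] * 12
trades = simulate_spread_trades(closes_a, closes_b, beta=1.0)
print(trades)
```

Each tuple is `(entry_index, exit_index, profit)`. Summing the third element gives the total profit/loss for closed trades only, matching the rule of throwing out open trades.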

In [43]:

def simulate_pairs_trading(ppos, start_date: str = None, period_len: int = 30):
    """
    Simulate a Pairs Trading Strategy on the given Trading Pair PricePredict Objects.
    - We will begin with the start_date and move weekly through the data to the end_date
      which is defined by period_len days from the start_date.
    - First we look for a divergence in the spread of the two stocks that is 2 standard deviations
      from the mean in either direction.
    - Then we look for the spread to converge to the mean and close the trade.
    - When we look for the convergence, we look for it 1 day at a time.
    - We should fix the spread mean and beta for the trade.
    - But we should also examine the means and betas generated from the new data so we can study
      trades that do not converge, and understand what to look for with regard to strategies for
      stopping out or reorienting the trade based on the new beta to keep it in play.
    """

    start_date, end_date = determine_start_end_dates(start_date=start_date, period_len=period_len)
    
    # Get or create the required Trading Pair PPOs
    tp1_weekly_ppo, tp1_daily_ppo, tp2_weekly_ppo, tp2_daily_ppo = get_tradingpair_ppos(ppos)

    tp1_daily_ppo.fetch_data_yahoo(ticker=tp1_daily_ppo.ticker, date_start=start_date, date_end=end_date)
    tp2_daily_ppo.fetch_data_yahoo(ticker=tp2_daily_ppo.ticker, date_start=start_date, date_end=end_date)

    start_date1 = tp1_daily_ppo.orig_data.index[0]
    end_date1 = tp1_daily_ppo.orig_data.index[-1]
    start_date2 = tp2_daily_ppo.orig_data.index[0]
    end_date2 = tp2_daily_ppo.orig_data.index[-1]
        
    # Align the start and end dates
    start_date = min(start_date1, start_date2)
    end_date = max(end_date1, end_date2)
    
    # Check the begin and end dates of the data...
    print(f"Start Date: {start_date},  End Date: {end_date}")

    # Create an iterable date range from start to end date
    date_range = pd.date_range(start=start_date, end=end_date, freq='W')
    
    traded_dates = []
    trade_counter = 0
    last_trade_exit_dt = None
    last_trade_exit_date = None
    trade_date = None   # The date of a trade
    exit_date = None
    df_all_trades = pd.DataFrame()
    df_all_convergence = pd.DataFrame()
    in_trade = False
    short_a_long_b = None
    
    # Each trade exists within the windows data range.
    # This loop finds the start-date of a trade and the end-date of a trade.
    # Start and end dates of trades should not overlap.
    # Each cycle of the loop is a trade.
    for win_date in date_range:
        # print(f"Window Date: {win_date}")
        # Calculate the spread for the period_len-day window starting at win_date
        win_date_start = win_date
        if period_len is None:
            period_len = 30
        win_date_end = win_date_start + timedelta(days=period_len)
        print(f"Divergence Window Start Date: {win_date_start},  End Date: {win_date_end}")
        ppos, df_closes, df_detrend, spread_mean, beta = get_trading_pair_spread((tp1_daily_ppo, tp2_daily_ppo), start_date=win_date_start, end_date=win_date_end)    
        trade_dt = None
        
        if last_trade_exit_date is None:
            # Make the last trade exit date df_detrend.index[0] - 1 day
            last_trade_exit_dt = df_detrend.index[0] - timedelta(days=1)
            last_trade_exit_date = last_trade_exit_dt.strftime('%Y-%m-%d')
            
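        # Get the dates when the spread is above 2 standard deviations
        # (assign the filled series back; bfill()/ffill() do not modify in place)
        df_detrend['Spread'] = df_detrend['Spread'].bfill().ffill()
        df_detrend['Mean_2std_a'] = df_detrend['Mean_2std_a'].bfill().ffill()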
        dates_over_mean_2std = df_detrend[(df_detrend.index > last_trade_exit_date) & (df_detrend['Spread'] >= df_detrend['Mean_2std_a'])].copy()
        # if in_trade is False and len(dates_over_mean_2std) > 0:
        #     # Remove rows in dates_over_mean where Stock_A and Stock_B are 0
        #     dates_over_mean_2std = dates_over_mean_2std[dates_over_mean_2std['Stock_A'] != 0]
        #     dates_over_mean_2std = dates_over_mean_2std[dates_over_mean_2std['Stock_B'] != 0]
        if in_trade is False and len(dates_over_mean_2std) > 0:
            over_mean_trade_dt = dates_over_mean_2std.index[0]
        
        # Get the dates when the spread is below 2 standard deviations
        dates_under_mean_2std = df_detrend[(df_detrend.index > last_trade_exit_date) & (df_detrend['Spread'] <= df_detrend['Mean_2std_b'])].copy()
        # if in_trade is False and len(dates_under_mean_2std) > 0:
        #     # Remove rows in dates_over_mean where Stock_A and Stock_B a
        #     dates_under_mean_2std = dates_under_mean_2std[dates_under_mean_2std['Stock_A'] != 0]
        #     dates_under_mean_2std = dates_under_mean_2std[dates_under_mean_2std['Stock_B'] != 0]
        if in_trade is False and len(dates_under_mean_2std) > 0:
            under_mean_trade_dt = dates_under_mean_2std.index[0]

        if len(dates_over_mean_2std) > 0 and len(dates_under_mean_2std) > 0:
            # Process the smaller date period    
            if over_mean_trade_dt < under_mean_trade_dt:
                if in_trade is False and len(dates_over_mean_2std) > 0:
                    trade_dt = dates_over_mean_2std.index[0]
                    in_trade = True    
                    short_a_long_b = True
                    print(f"[{trade_dt}] Getting Into Trade[{trade_counter}]: Short A Long B: {short_a_long_b}")
            elif over_mean_trade_dt > under_mean_trade_dt:
                if in_trade is False and len(dates_under_mean_2std) > 0:
                    trade_dt = dates_under_mean_2std.index[0]
                    in_trade = True    
                    short_a_long_b = False
                    print(f"[{trade_dt}] Getting Into Trade[{trade_counter}]: Short A Long B: {short_a_long_b}")
        elif len(dates_over_mean_2std) > 0:
                if in_trade is False and len(dates_over_mean_2std) > 0:
                    trade_dt = dates_over_mean_2std.index[0]
                    in_trade = True    
                    short_a_long_b = True
                    print(f"[{trade_dt}] Getting Into Trade[{trade_counter}]: Short A Long B: {short_a_long_b}")
        elif len(dates_under_mean_2std) > 0:            
                if in_trade is False and len(dates_under_mean_2std) > 0:
                    trade_dt = dates_under_mean_2std.index[0]
                    in_trade = True    
                    short_a_long_b = False
                    print(f"[{trade_dt}] Getting Into Trade[{trade_counter}]: Short A Long B: {short_a_long_b}")
        
        if trade_dt is not None:
            trade_date = trade_dt.strftime('%Y-%m-%d')
            
        if (in_trade and
                (trade_dt is not None and last_trade_exit_date is not None 
                 and  trade_dt <= last_trade_exit_dt)):
            continue

        saved_spread_mean = spread_mean
        
        # Go forward in time until the spread converges to the mean, using the same beta.
        # Check if the current window has a future date where the spread converges to the mean.
        # Get the first row where the spread is above the mean from dates_over_mean
        
        if short_a_long_b and len(dates_over_mean_2std) > 0:
            # Get the first row where the spread is at or above the upper 2-std band
            df_trade = dates_over_mean_2std.iloc[0].copy()
            # Get the first date where the spread is at or above the upper 2-std band
            trade_date = dates_over_mean_2std.index[0].strftime('%Y-%m-%d')
            # Find when the spread converges to the mean
            spread_converges = df_detrend[(df_detrend.index > trade_date) & (df_detrend['Spread'] <= saved_spread_mean)].copy()
            if len(spread_converges) > 0:
                exit_date = spread_converges.index[0].strftime('%Y-%m-%d')
                print(f"Short A Long B (Spread Converges): {trade_date} to {exit_date}")
            else:
                trade_date = None    
                exit_date = None
        elif short_a_long_b is False and len(dates_under_mean_2std) > 0:
            # Get the first row where the spread is under the mean
            df_trade = dates_under_mean_2std.iloc[0].copy()
            # Get the first date where the spread is under the mean
            trade_date = dates_under_mean_2std.index[0].strftime('%Y-%m-%d')
            spread_converges = df_detrend[(df_detrend.index > trade_date) & (df_detrend['Spread'] >= saved_spread_mean)].copy()
            if len(spread_converges) > 0:
                exit_date = spread_converges.index[0].strftime('%Y-%m-%d')
                print(f"Long A Short B (Spread Converges): {trade_date} to {exit_date}")
            else:
                trade_date = None    
                exit_date = None
        
        if trade_date is None:
            print("No more Tradeable Spreads found... Exiting")
            break
        # Make sure that we are past the last trade exit date
        if last_trade_exit_date is not None and trade_date <= last_trade_exit_date:
            continue
        
        # Get the current actual price of the Stocks from the df_closes DataFrame
        if trade_date in df_closes.index:
            stock_a_entry = df_closes.loc[trade_date]['Stock_A']
            stock_b_entry = df_closes.loc[trade_date]['Stock_B']
        else:
            # Actual stock prices are needed to calculate the trade
            print(f"Error: Could not get actual stock prices from df_closes Date: {trade_date} not in df_closes")
            break
            
        # if beta is < 0, reverse the trade
        if beta < 0:
            short_a_long_b = not short_a_long_b
            
        # Calculate the exit price of the Stocks
        stock_a_exit = stock_b_entry * beta
        stock_b_exit = stock_a_entry / beta
        # Calculate the profit/loss of the trade
        # We Short Stock_A and Long Stock_B
        if short_a_long_b:
            expected_profit = (stock_a_entry - stock_a_exit) + (stock_b_exit - stock_b_entry)
        else:
            expected_profit = (stock_a_exit - stock_a_entry) + (stock_b_entry - stock_b_exit)
        # Calculate the quantity of Stock_A and Stock_B to trade
        if beta > 0:
            stock_a_quantity = 1
            stock_b_quantity = beta * stock_a_quantity
        else:
            stock_b_quantity = 1
            stock_a_quantity = beta * stock_b_quantity
            
        # Has this date been traded on before?
        if (in_trade and trade_date not in traded_dates and exit_date is not None
            and (last_trade_exit_date is None or trade_date > last_trade_exit_date)):

            # Add stock_a_exit and stock_b_exit and expected_profit to the trade_entry DataFrame
            # Calculate Stock Quantity to trade
            df_trade['Trade_Entry'] = trade_date
            df_trade['Spread_Mean'] = spread_mean
            df_trade['Beta_HedgeRatio'] = beta
            df_trade['ShortA_LongB'] = short_a_long_b
            df_trade['Stock_A_Entry'] = stock_a_entry
            df_trade['Stock_B_Entry'] = stock_b_entry
            df_trade['Stock_A_Quantity'] = stock_a_quantity
            df_trade['Stock_B_Quantity'] = stock_b_quantity
            df_trade['Stock_A_Exit'] = stock_a_exit
            df_trade['Stock_B_Exit'] = stock_b_exit
            df_trade['Expected_Profit'] = expected_profit
            df_trade['Trade_Counter'] = trade_counter
            
            # Perform the detrended spread analysis from the trade entry date to the end_dt
            # simulating the trade as it evolves.
            traded_dates.append(trade_date)
            trade_dt = datetime.strptime(trade_date, '%Y-%m-%d')
            end_dt = trade_dt + pd.Timedelta(days=1)
            exit_dt = datetime.strptime(exit_date, '%Y-%m-%d')
            short_a_conv = False
            long_a_conv = False
            while True:
                end_dt_str = end_dt.strftime('%Y-%m-%d')
                print(f"Checking convergence between Start Date: {win_date_start},  End Date: {win_date_end} || [{end_dt_str}] ||")
                ppos, df_anl_closes, df_anl_detrend, spread_mean, beta = get_trading_pair_spread((tp1_daily_ppo, tp2_daily_ppo), beta=beta, start_date=win_date_start, end_date=end_dt_str)
                ppos_n, df_anl_closes_n, df_anl_detrend_n, spread_mean_n, beta_n = get_trading_pair_spread((tp1_daily_ppo, tp2_daily_ppo), beta=None, start_date=win_date_start, end_date=end_dt_str)

                # Check if the current window has a future date where the spread converges to the mean.
                if short_a_long_b:
                    short_a_conv = df_anl_detrend.iloc[-1]['Spread'] <= spread_mean
                elif short_a_long_b is False:
                    long_a_conv = df_anl_detrend.iloc[-1]['Spread'] >= spread_mean
                    
                if short_a_conv or long_a_conv:
                    convergence = pd.DataFrame(df_anl_detrend.iloc[-1])
                    print(f"Found Convergence: {end_dt_str}")    
                    break

                end_dt = end_dt + timedelta(days=1)
            
            # Grab the first row in Convergence
            df_convergence = {}
            if len(convergence) > 0 and in_trade:
                # Get the date of the convergence
                exit_dt = convergence.iloc[-1].index[0]
                exit_date = exit_dt.strftime('%Y-%m-%d')
                # Get the actual closing prices of the Stocks on the exit date as scalars
                # (a boolean-mask lookup would return one-element Series instead)
                stock_a_exit = df_closes.loc[exit_dt, 'Stock_A']
                stock_b_exit = df_closes.loc[exit_dt, 'Stock_B']
                
                # Add the saved_mean to the df_convergence DataFrame
                df_convergence['Trade_Entry'] = trade_date
                df_convergence['Trade_Exit'] = exit_date
                df_convergence['Spread_Mean'] = saved_spread_mean
                df_convergence['New_Spread_Mean'] = spread_mean
                df_convergence['Beta_HedgeRatio'] = beta
                df_convergence['ShortA_LongB'] = short_a_long_b
                
                # Add the exit prices to the df_convergence DataFrame
                df_convergence['Stock_A_Exit'] = stock_a_exit
                df_convergence['Stock_B_Exit'] = stock_b_exit
                # Calculate the entry value of Stock_A
                entry_value_a = df_trade['Stock_A_Entry'] * df_trade['Stock_A_Quantity']
                # Calculate the entry value of Stock_B
                entry_value_b = df_trade['Stock_B_Entry'] * df_trade['Stock_B_Quantity']
                # Calculate the exit value of Stock_A
                exit_value_a = df_convergence['Stock_A_Exit'] * df_trade['Stock_A_Quantity']
                # Calculate the exit value of Stock_B
                exit_value_b = df_convergence['Stock_B_Exit'] * df_trade['Stock_B_Quantity']
                # Calculate the profit/loss of the trade
                if short_a_long_b:
                    profit_loss = (entry_value_a - exit_value_a) + (exit_value_b - entry_value_b)
                else:
                    profit_loss = (exit_value_a - entry_value_a) + (entry_value_b - exit_value_b)    
                # Add the entry_value, exit_value, and profit_loss to the df_convergence DataFrame
                df_convergence['Entry_Value'] = entry_value_a + entry_value_b
                df_convergence['Exit_Value'] = exit_value_a + exit_value_b
                df_convergence['Profit_Loss'] = profit_loss
                df_convergence['Trade_Counter'] = trade_counter
                
                # Add the df_trade row to the df_all_trades DataFrame
                if len(df_all_trades) == 0:
                    df_all_trades = df_trade
                else:
                    df_all_trades = pd.concat([df_all_trades, df_trade], axis=1)
                # Add the df_convergence row to the df_all_convergence DataFrame
                if len(df_all_convergence) == 0:
                    df_all_convergence = pd.Series(df_convergence)
                else:
                    df_all_convergence = pd.concat([df_all_convergence, pd.Series(df_convergence)], axis=1)
                
                print(f"[{trade_date}] Trade Exit[{trade_counter}]: Short A Long B: {short_a_long_b}")
                trade_counter += 1
                exit_date = None
                last_trade_exit_date = trade_date
                in_trade = False
                short_a_long_b = None
            pass
        else:
            # Trade never converges to the mean
            df_trade['Converges'] = False
            # TODO: Analyze for stop loss
            print(f"[{trade_date}] Trade Never Converges")
            trade_counter += 1
            exit_date = None
            in_trade = False
            short_a_long_b = None
            last_trade_exit_date = trade_date
            traded_dates.append(trade_date)
            continue
            
    final_trades = None
    if len(df_all_trades) == 0:
        print(f"No Trades were made during the simulation")
    else:
        print(f"Total Trades: {trade_counter}")
        # Clean up the all_trades dataframe
        df_all_trades = pd.DataFrame(df_all_trades.transpose())
        excluded_columns = ['Stock_A', 'Stock_B', 'Trade_Entry', 'Trade_Exit', df_all_trades.columns[0]]
        df_all_trades.loc[:, ~df_all_trades.columns.isin(excluded_columns)] = df_all_trades.loc[:, ~df_all_trades.columns.isin(excluded_columns)].astype(float)
        # Clean up the all_convergence dataframe
        df_all_convergence = pd.DataFrame(df_all_convergence.transpose())
        excluded_columns = ['Stock_A', 'Stock_B', 'Trade_Entry', 'Trade_Exit', df_all_convergence.columns[0]]
        df_all_convergence.loc[:, ~df_all_convergence.columns.isin(excluded_columns)] = df_all_convergence.loc[:, ~df_all_convergence.columns.isin(excluded_columns)].astype(float)
    
        # Merge the all_trades and all_convergence dataframes on the Trade_Counter column, resulting in just the unique columns between the two dataframes.
        final_trades = pd.merge(df_all_trades, df_all_convergence, on='Trade_Counter', how='inner')
        # Remove columns that end in _a
        final_trades = final_trades.loc[:, ~final_trades.columns.str.endswith('_a')]
        # Remove columns that end in _b
        final_trades = final_trades.loc[:, ~final_trades.columns.str.endswith('_b')]
        # Remove columns that end in _x
        final_trades = final_trades.loc[:, ~final_trades.columns.str.endswith('_x')]
        # Remove columns that end in _y
        final_trades = final_trades.loc[:, ~final_trades.columns.str.endswith('_y')]
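        # Order the rows by Trade_Counter and renumber the index
        # (the original reindex() call discarded its result, so it had no effect)
        final_trades = final_trades.sort_values('Trade_Counter').reset_index(drop=True)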

    return final_trades


trading_pair = ('ACN', 'ZM')
trading_days = None
# Plot the spread of the trading pair
plt = analyze_trading_pair(trading_pair, start_date='2020-01-01', period_len=trading_days, mpl_plt=plt)

# Simulate the pairs trading strategy
df_all_trades = simulate_pairs_trading(trading_pair, start_date='2020-01-01', period_len=trading_days)


Using Cached PPO: ACN_W
Using Cached PPO: ACN_D
Using Cached PPO: ZM_W
Using Cached PPO: ZM_D
Analyze Trading Pair Start Date: 2020-01-01,  End Date: 2025-01-21

Daily[2020-01-01 to 2025-01-21 00:00:00] Hedge Ratio 2.13959566779732
Using Cached PPO: ACN_W
Using Cached PPO: ACN_D
Using Cached PPO: ZM_W
Using Cached PPO: ZM_D

Start Date: 2020-01-02 00:00:00,  End Date: 2025-01-17 00:00:00
Divergence Window Start Date: 2020-01-05 00:00:00,  End Date: 2020-02-04 00:00:00
No more Tradeable Spreads found... Exiting
No Trades were made during the simulation
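
The run above ends without a trade because a trade needs both a divergence and a later convergence. A minimal sketch of that search is shown here. It is not the notebook's `simulate_pairs_trading` logic: the helper name `find_entry_and_exit` is hypothetical, and it assumes the mean and standard deviation are taken over the whole series rather than over a rolling divergence window.

```python
import pandas as pd


def find_entry_and_exit(spread: pd.Series, n_std: float = 2.0):
    """Return (entry_date, exit_date, short_a_long_b) for the first divergence.

    Hypothetical helper: the mean and std are computed over the whole series,
    which is a simplification of the windowed logic in the notebook.
    entry_date is None when the spread never diverges by more than n_std
    standard deviations; exit_date is None when it never converges afterwards.
    """
    mean = spread.mean()
    std = spread.std()

    # Dates where the spread sits more than n_std standard deviations from the mean
    diverged = spread[(spread - mean).abs() > n_std * std]
    if diverged.empty:
        return None, None, None

    entry_date = diverged.index[0]
    # Spread = Stock_A - Stock_B, so a spread above the mean means A is rich: short A, long B
    short_a_long_b = bool(spread.loc[entry_date] > mean)

    # Look for the first later date where the spread crosses back over the mean
    later = spread[spread.index > entry_date]
    converged = later[later <= mean] if short_a_long_b else later[later >= mean]
    exit_date = converged.index[0] if not converged.empty else None
    return entry_date, exit_date, short_a_long_b
```

When `entry_date` or `exit_date` comes back as `None`, there is nothing to trade, which corresponds to the "No more Tradeable Spreads found" message in the output above.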





In [None]:

# simulate_pairs_trading() returns None when no trades were made
if df_all_trades is not None:
    print(f"Total Profit/Loss: {df_all_trades['Profit_Loss'].sum()}")
df_all_trades


In [None]:
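# df_all_convergence is assigned inside simulate_pairs_trading(); the merged frame it returns carries Profit_Loss
df_all_trades['Profit_Loss'].sum() if df_all_trades is not None else 0.0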
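
The per-trade figure that `Profit_Loss` sums over can be sketched in isolation. This is an illustration under assumptions, not the notebook's code: the function name `pair_trade_pnl` is hypothetical, one share of Stock_A is traded against `abs(beta)` shares of Stock_B, and fees, borrow costs and slippage are ignored.

```python
def pair_trade_pnl(a_entry, b_entry, a_exit, b_exit, beta, short_a_long_b):
    """Profit/loss of one pairs trade sized by the hedge ratio.

    Hypothetical helper. Assumes 1 share of Stock_A against abs(beta)
    shares of Stock_B, with no transaction costs.
    """
    qty_a = 1.0
    qty_b = abs(beta) * qty_a

    # Price change of each leg, scaled by its quantity
    move_a = (a_exit - a_entry) * qty_a
    move_b = (b_exit - b_entry) * qty_b

    if short_a_long_b:
        # Short A gains when A falls; long B gains when B rises
        return -move_a + move_b
    # Long A gains when A rises; short B gains when B falls
    return move_a - move_b


# Example: A falls from 100 to 95 and B rises from 50 to 52, with beta = 2
pnl = pair_trade_pnl(100.0, 50.0, 95.0, 52.0, beta=2.0, short_a_long_b=True)
```

With these numbers the short leg gains 5 and the long leg gains 2 × 2 = 4, so the trade makes 9. Taking the opposite side of the same moves loses 9.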

In [1]:
# Analyze PP Objects
ppos = ('UPS', 'PTCT')
# Get or create the required Trading Pair PPOs
ppo1_w, ppo1_d, ppo2_w, ppo2_d = get_tradingpair_ppos(ppos)

ObjCache.keys()


NameError: name 'get_tradingpair_ppos' is not defined