# Pair Trading Analysis
  
Given a pair of stocks that are Conintegrated, grab their PricePredict Objects from the
ppo/ directory and plot a median/spread graph of the two stocks. This will allow us to see
how the two stocks are moving relative to each other and if there are any opportunities to
trade the spread between the two stocks.

Run a simple trading simulation that uses a strategy of trading the 2 stocks at the same time
as follows:
* Opening a Pairs Trade...
    * When the 2 stocks diverge and the stock that is overvalued is more than the historical standard deviation
      from the mean (based on N most profitable moves) and the undervalued stock is more than the historical 
      standard deviation from the mean (etc.). 
        * Sell the overvalued stock and buy the undervalued stock.
          Only do these trades if the prediction for the next day indicates that the overvalued stock
          is going to go down and the undervalued stock is going to go up on on the current timeframe,
          and on the next higher timeframe (weekly, given a daily trading timeframe).
          Then buy/sell the stocks at the opening price of the next day.
        * Use additional indicators that indicate a reversal in the spread (as needed). 
    * Do we trade from convergence?
        * It does not make sense to trade from convergence, as it is difficult to determine which
          stock to go long on and which stock to go short on.
        * We should only trade from divergence, as we can determine which stock is overvalued and which
          stock is undervalued. 
* Closing a Pairs Trade...
    * Given an open Pairs Trade that started from a Spread Divergence... 
        * We wait for the spread to converge to the median and then close the trade.
        * We then calculate the profit/loss of the trade and add it to the total profit/loss.
* How to choose going long vs going short (one stock will be long and the other short)...
    * The Spread calculation is essentially the difference between the two stocks. Stock_A - Stock_B.
        * As Stock_A goes up and Stock_B goes down, the spread will increase (go up in the spread graph).
        * As Stock_A goes down and Stock_B goes up, the spread will decrease (go down in the spread graph).    
    * If the spread is above 2 standard deviations from the mean, we go short on Stock_A and long on Stock_B.
    * If the spread is below 2 standard deviations from the mean, we go long on Stock_A and short on Stock_B.
* How much to trade...
    * We use the Hedge Ratio to determine how much of each stock to trade.
        * The Hedge Ratio is the Beta from the OLS regression of Stock_A on Stock_B.
    * The Hedge Ratio is the amount of Stock_B that is needed to hedge the risk of Stock_A.
        * If the Hedge Ratio is 1.5, then we would trade 1.5 shares of Stock_B for every 1 share of Stock_A.
        * If the Hedge Ratio is negative (-1.5), then for every 1 share of Stock_B, we would trade 1.5 shares of Stock_A.

## Questions...

* Which is a stronger indicator of Cointegration...
    * Weekly or Daily?
    * Both are useful much the way a weekly prediction indicates the longer term trend and a daily prediction
      indicates the shorter term trend.

## Insights...

* Different start-end periods result in different Hedge Ratios and Spread Deviations.
    * Longer timeframes (5yrs vs 30days) result in lower Hedge Ratios and Spread Deviations. It takes longer for the spread to converge.
* It probably makes sense to hold on to the data for a given trading pair trade entry so that one can
  continue to track the move to the originally calculate median (exit point), sticking to the original
  trading plan. We add more data to the close columns and we calculate the spread and the current spread
  the original Beta/Hedge Ratio, while the median (our exit point) stays the same.


    


In [1]:
import sys
from types import ModuleType, FunctionType
from gc import get_referents

# Helper function to get the size of an object (Curiosity)
# Custom objects know their class.
# Function objects seem to know way too much, including modules.
# Exclude modules as well.
BLACKLIST = type, ModuleType, FunctionType


def getsize(obj):
    """sum size of object & members."""
    if isinstance(obj, BLACKLIST):
        raise TypeError('getsize() does not take argument of type: '+ str(type(obj)))
    seen_ids = set()
    size = 0
    objects = [obj]
    while objects:
        need_referents = []
        for obj in objects:
            if not isinstance(obj, BLACKLIST) and id(obj) not in seen_ids:
                seen_ids.add(id(obj))
                size += sys.getsizeof(obj)
                need_referents.append(obj)
        objects = get_referents(*need_referents)
    return size 


In [74]:
import pandas as pd
import statsmodels.api as sm
from decimal import Decimal
from pandas_decimal import DecimaldDtype

def get_trading_pair_spread(ppos: tuple, beta: Decimal = None, 
                            prev_days: int = None,
                            start_period: int = None, end_period: int = None,
                            start_date: str = None, end_date: str = None):
    
    # Create a DataFrame of the closing prices from the PPO[0 and 1].orig_data dataframes
    closes1 = ppos[0].orig_data['Close'].astype(DecimaldDtype(5))
    closes2 = ppos[1].orig_data['Close'].astype(DecimaldDtype(5))
    # Make closes1 and closes2 the same length
    min_len = min(len(closes1), len(closes2))
    if prev_days is None:
        prev_days = min_len
    elif prev_days > min_len:
        prev_days = min_len
    if start_period is not None and end_period is not None:
        # Gather closes based numeric index    
        closes1 = closes1[start_period:end_period]
        closes2 = closes2[start_period:end_period]
    elif start_date is not None and end_date is not None:
        # Gather closes based on the date index column
        closes1 = closes1.loc[start_date:end_date]
        closes2 = closes2.loc[start_date:end_date]
    else:
        # Default to the last prev_days    
        closes1 = closes1.tail(prev_days)
        closes2 = closes2.tail(prev_days)
    df_closes = pd.DataFrame({'Stock_A': closes1, 'Stock_B': closes2})
    # df_closes.replace([np.inf, -np.inf], None, inplace=True)
    df_closes = df_closes.bfill().ffill()
    try:
        if beta is None:
            # Perform OLS to find beta
            X = df_closes['Stock_B']
            X = sm.add_constant(X)  # Adds a constant term to the predictor
            model = sm.OLS(df_closes['Stock_A'], X).fit()
            beta = model.params['Stock_B']
    except Exception as e:
        print(f"Error: {e}")
        beta = np.float32(1.0)
        
    # Detrend the closes
    # closes1m = (closes1 - closes1.rolling(window=3)).mean()
    closes1m = closes1.rolling(window=3).apply(lambda x: (x - x.mean()).mean())
    # closes2m = (closes2 - closes2.rolling(window=3)).mean()
    closes2m = closes2.rolling(window=3).apply(lambda x: (x - x.mean()).mean())
    df_detrend = pd.DataFrame({'Stock_A': closes1m, 'Stock_B': closes2m})
    df_detrend = df_detrend.bfill().ffill()
    # Calculate the spread and its mean using the Hedge-Ratio beta 
    df_detrend['Spread'] = df_closes['Stock_A'] - beta * df_closes['Stock_B']
    spread_mean = df_detrend['Spread'].mean()
    # Create a line that is 1 standard deviation above from the spread-mean
    df_detrend['Mean_1std_a'] = spread_mean + df_detrend['Spread'].std()
    # Create a line that is 2 standard deviation above from the spread-mean
    df_detrend['Mean_2std_a'] = spread_mean + 2 * df_detrend['Spread'].std()
    # Create a line that is 1 standard deviation below from the spread-mean
    df_detrend['Mean_1std_b'] = spread_mean - df_detrend['Spread'].std()
    # Create a line that is 2 standard deviation below from the spread-mean
    df_detrend['Mean_2std_b'] = spread_mean - 2 * df_detrend['Spread'].std()

    return ppos, df_closes, df_detrend, spread_mean, beta 


In [3]:
import matplotlib.pyplot as plt

def show_annotation(sel):
    x, y = sel.target
    ind = sel.index
    sel.annotation.set_text(f'{x:.0f}, {y:.0f}: {labels[ind]}')
    
def plot_spread(ppos: tuple, beta: Decimal = None, 
                prev_days: int = None,
                title: str = None,   
                spread_name: str = 'Spread',
                spread_color: str = 'black',
                start_period: int = None, end_period: int = None,
                start_date: str = None, end_date: str = None):
    
    ppos, df_closes, df_detrend, spread_mean, beta = get_trading_pair_spread(ppos, beta, 
                                                                             prev_days, 
                                                                             start_period, end_period,
                                                                             start_date, end_date)    
    # Save the plot data to the PPO objects
    pair = (ppos[0].ticker, ppos[1].ticker)
    sp = spread_mean.copy()
    cl = df_closes.copy(deep=True)
    cl.reset_index(inplace=True)
    cl = cl.to_json()
    dc = df_detrend.copy(deep=True)
    dc.reset_index(inplace=True)
    dc = dc.to_json()
    spread_analysis = {'pair': (ppos[0].ticker, ppos[1].ticker),
                       'spread_mean': sp, 
                       'beta': beta,
                       'closes': cl,
                       'detrended_closes': dc
                       }
    ppos[0].spread_analysis[pair] = spread_analysis
    ppos[1].spread_analysis[pair] = spread_analysis
    
    # Plot the spread with mean line
    plt.plot(df_detrend['Spread'], marker='o', label=spread_name, color=spread_color)
    plt.plot(df_detrend['Mean_2std_a'], label='2std_a', color='green')
    plt.plot(df_detrend['Mean_1std_a'], label='1std_a', color='blue')
    plt.plot(df_detrend['Mean_1std_b'], label='1std_b', color='blue')
    plt.plot(df_detrend['Mean_2std_b'], label='2std_b', color='green')
    plt.axhline(spread_mean, color='red', linestyle='--', label='Mean Spread')
    plt.legend()
    if title is None:
        title = 'Spread Between Stock A and Stock B'
    plt.title(title)
    plt.xlabel('Time')
    plt.ylabel(spread_name)
    # Enable x, y grid lines
    plt.grid(True)
    plt.show()

    return plt, beta


In [87]:
%matplotlib notebook

# Import Libraries
import os.path
import numpy as np
import pandas as pd
import logging
import sys
import json
import dill
import pandas as pd
import matplotlib.pyplot as plt
import copy
from pricepredict import PricePredict
from datetime import datetime, timedelta

# Use an Object Cache to reduce the prep time for creating and loading the PricePredict objects.
if 'ObjCache' not in globals():
    global ObjCache
    ObjCache = {}

DirPPO = '../ppo/'
def get_ppo(symbol: str, period: str):
    file_name_starts_with = symbol + '_' + period
    # Find all PPO files for the symbol in the PPO directory
    ppo_files = [f for f in os.listdir(DirPPO) if f.startswith(file_name_starts_with)]
    # Sort the files by date
    ppo_files.sort()
    # Get the latest PPO file
    ppo_file = ppo_files[-1]
    # Unpickle the PPO file using dill
    with open(DirPPO + ppo_file, 'rb') as f:
        ppo = dill.load(f)
    return ppo_file, ppo

def get_tradingpair_ppos(trading_pair: tuple):
    tp1_weekly_ppo_file, tp1_weekly_ppo = get_ppo(trading_pair[0], PricePredict.PeriodWeekly)
    tp1_daily_ppo_file, tp1_daily_ppo = get_ppo(trading_pair[0], PricePredict.PeriodDaily)
    tp2_weekly_ppo_file, tp2_weekly_ppo = get_ppo(trading_pair[1], PricePredict.PeriodWeekly)
    tp2_daily_ppo_file, tp2_daily_ppo = get_ppo(trading_pair[1], PricePredict.PeriodDaily)
    print(f'{trading_pair[0]} Weekly PPO: {tp1_weekly_ppo_file} {tp1_weekly_ppo}:[{round(getsize(tp1_weekly_ppo)/1024/1024, 2)}]M')
    print(f'{trading_pair[0]} Daily PPO: {tp1_daily_ppo_file} {tp1_daily_ppo}:[{round(getsize(tp1_daily_ppo)/1024/1024, 2)}]M')
    print(f'{trading_pair[1]} Weekly PPO: {tp2_weekly_ppo_file} {tp2_weekly_ppo}:[{round(getsize(tp2_weekly_ppo)/1024/1024, 2)}]M')
    print(f'{trading_pair[1]} Daily PPO: {tp2_daily_ppo_file} {tp2_daily_ppo}:[{round(getsize(tp2_daily_ppo)/1024/1024, 2)}]M')
    return tp1_weekly_ppo, tp1_daily_ppo, tp2_weekly_ppo, tp2_daily_ppo    

def get_prop_ppos(trading_pair: tuple):
    global ObjCache
    
    model_dir = '../models/'
    chart_dir = '../charts/'
    preds_dir = '../predictions/'

    tp1_weekly_ppo = PricePredict(ticker=trading_pair[0], period=PricePredict.PeriodWeekly,
                                  model_dir=model_dir, chart_dir=chart_dir, preds_dir=preds_dir)
    tp1_daily_ppo = PricePredict(ticker=trading_pair[0], period=PricePredict.PeriodDaily,
                                 model_dir=model_dir, chart_dir=chart_dir, preds_dir=preds_dir)
    tp2_weekly_ppo = PricePredict(ticker=trading_pair[1], period=PricePredict.PeriodWeekly,
                                  model_dir=model_dir, chart_dir=chart_dir, preds_dir=preds_dir)
    tp2_daily_ppo = PricePredict(ticker=trading_pair[1], period=PricePredict.PeriodDaily,
                                 model_dir=model_dir, chart_dir=chart_dir, preds_dir=preds_dir)
        
    # Train the models on 5 yeas of data...
    end_dt = datetime.now()
    start_dt = end_dt - timedelta(days=5*400)
    end_date = end_dt.strftime('%Y-%m-%d')
    start_date = start_dt.strftime('%Y-%m-%d')
    
    print(f"ObjCache: {ObjCache.keys()}")
    
    # Load 2 years of data for the trading pair
    ppo_name = trading_pair[0] + '_weekly_ppo'
    if ppo_name not in ObjCache.keys():
        tp1_weekly_ppo.fetch_train_and_predict(tp1_weekly_ppo.ticker, 
                                               start_date, end_date, 
                                               start_date, end_date,
                                               period=PricePredict.PeriodWeekly,
                                               force_training=False,
                                               use_curr_model=True,
                                               save_model=False)
        ObjCache[ppo_name] = tp1_weekly_ppo.serialize_me()
    else:
        tp1_weekly_ppo = PricePredict.unserialize(ObjCache[ppo_name])
    if ppo_name not in ObjCache.keys():
        tp1_daily_ppo.fetch_train_and_predict(tp1_daily_ppo.ticker, 
                                               start_date, end_date, 
                                               start_date, end_date,
                                               period=PricePredict.PeriodDaily,
                                               force_training=False,
                                               use_curr_model=True,
                                               save_model=False)
        ObjCache[ppo_name] = tp1_daily_ppo.serialize_me()
    else:
        tp1_daily_ppo = PricePredict.unserialize(ObjCache[ppo_name])   
    ppo_name = trading_pair[1] + '_weekly_ppo'
    if ppo_name not in ObjCache.keys():
        tp2_weekly_ppo.fetch_train_and_predict(tp2_weekly_ppo.ticker,
                                               start_date, end_date, 
                                               start_date, end_date,
                                               period=PricePredict.PeriodWeekly,
                                               force_training=False,
                                               use_curr_model=True,
                                               save_model=False)
        ObjCache[ppo_name] = tp2_weekly_ppo.serialize_me()
    else:
        tp2_weekly_ppo = PricePredict.unserialize(ObjCache[ppo_name])
    ppo_name = trading_pair[1] + '_daily_ppo'
    if ppo_name not in ObjCache.keys():
        tp2_daily_ppo.fetch_train_and_predict(tp2_daily_ppo.ticker,
                                               start_date, end_date, 
                                               start_date, end_date,
                                               force_training=False,
                                               use_curr_model=True,
                                               save_model=False)
        ObjCache[ppo_name] = tp2_daily_ppo.serialize_me()
    else:
        tp2_daily_ppo = PricePredict.unserialize(ObjCache[ppo_name])

    return tp1_weekly_ppo, tp1_daily_ppo, tp2_weekly_ppo, tp2_daily_ppo
    
def analyze_trading_pair(trading_pair: tuple):    
    # Gather the Weekly and Daily PPOs for the trading pair.
    # tp1_weekly_ppo, tp1_daily_ppo, tp2_weekly_ppo, tp2_daily_ppo = get_tradingpair_ppos(trading_pair)
    
    tp1_weekly_ppo, tp1_daily_ppo, tp2_weekly_ppo, tp2_daily_ppo = get_prop_ppos(trading_pair)
        
    # Plot the median & spread of the trading pair given the daily PPOs)
    # Plot the Weekly Spread using the Weekly calculated Beta
    plt, beta = plot_spread((tp1_weekly_ppo, tp2_weekly_ppo), 
                            title=f"Weekly Spread [{trading_pair[0]} vs {trading_pair[1]}]",
                            spread_name='Weekly')
    print(f"Weekly Hedge Ratio: {beta}")
    # # Plot the Daily Spread, Using the Weekly Beta
    # plt, beta = plot_spread((tp1_daily_ppo, tp2_daily_ppo), beta, 60, 
    #             title=f"Daily Spread [{trading_pair[0]} vs {trading_pair[1]}]", 
    #             spread_name='Daily (Wkly Beta)', spread_color='grey')
    # print(f"Daily using Weekly Hedge Ratio: {beta}")
    # # Plot the Daily Spread, Using the Daily calculated Beta
    # plt, beta = plot_spread((tp1_daily_ppo, tp2_daily_ppo), None, 60,
    #                         title=f"Daily Spread [{trading_pair[0]} vs {trading_pair[1]}]", 
    #                         spread_name='Daily', spread_color='orange')
    # print(f"Daily Hedge Ratio: {beta}")
    # plt, beta = plot_spread((tp1_daily_ppo, tp2_daily_ppo),
    #                         title=f"Daily[1:37] Spread [{trading_pair[0]} vs {trading_pair[1]}]", 
    #                         spread_name='Daily [1:37]', spread_color='orange',
    #                         start_period=1, end_period=37)
    # print(f"Daily[1:37] Hedge Ratio {beta}")
    # plt, beta = plot_spread((tp1_daily_ppo, tp2_daily_ppo),
    #                         title=f"Daily[4/1/21 to 8/1/21] Spread [{trading_pair[0]} vs {trading_pair[1]}]", 
    #                         spread_name='Daily [4/1/21 to 8/1/21]', spread_color='orange',
    #                         start_date='4/1/2021', end_date='7/30/2021')
    # print(f"Daily[4/1/21 to 8/1/21] Hedge Ratio {beta}")

    start_date = '2019-08-11 '
    end_date = '2019-10-10'
    plt, beta = plot_spread((tp1_daily_ppo, tp2_daily_ppo),
                            title=f"Daily[4/1/21 to 8/1/21] Spread [{trading_pair[0]} vs {trading_pair[1]}]", 
                            spread_name='Daily [4/1/21 to 8/1/21]', spread_color='orange',
                            start_date=start_date, end_date=end_date)
    print(f"Daily[4/1/21 to 8/1/21] Hedge Ratio {beta}")
    
    return plt

if 'plt' in locals():
    plt.close()

plt = analyze_trading_pair(('AAPL', 'AMX'))


ObjCache: dict_keys(['AAPL_weekly_ppo', 'AMX_weekly_ppo', 'AMX_daily_ppo'])


<IPython.core.display.Javascript object>

Weekly Hedge Ratio: 8.190224536561281
Daily[4/1/21 to 8/1/21] Hedge Ratio 2.584146410815829


In [5]:
getsize(ObjCache)

3394843

# Pair Trading Simulation

* Given the current Trading Pair...
    * From the beginning of the data...
        * Perform the Spread Analysis on an 30day window, moving weekly through the data. 
            * When the spread goes above 2 standard deviations, open a pairs trade.
              Be sure not to trade, trades that have already occurred. 
                * Immediatly move forward in time until the spread converges to the mean.
                  Use the beta and append to the dataset (if needed) to calculate the spread 
                  and to keep the mean stable.
                    * Calculate the profit/loss for each period. Are the draw-downs acceptable?
                    * Hold on to the final profit/loss of the trade upon exit.
    * Throw out open trades and calculate the total profit/loss.

In [128]:
#
def simulate_pairs_trading(ppos):
    # Get or create the required Trading Pair PPOs
    tp1_weekly_ppo, tp1_daily_ppo, tp2_weekly_ppo, tp2_daily_ppo = get_prop_ppos(ppos)
    
    # Check the begin and end dates of the data...
    start_date1 = tp1_daily_ppo.orig_data.index[0]
    end_date1 = tp1_daily_ppo.orig_data.index[-1]
    start_date2 = tp2_daily_ppo.orig_data.index[0]
    end_date2 = tp2_daily_ppo.orig_data.index[-1]
    
    # Align the start and end dates
    start_date = min(start_date1, start_date2)
    end_date = max(end_date1, end_date2)
    
    print(f"Start Date: {start_date},  End Date: {end_date}")

    # Create an iterable date range from start to end date
    date_range = pd.date_range(start=start_date, end=end_date, freq='D')
    
    traded_dates = []
    trade_counter = 0
    last_trade_exit_date = None
    df_all_trades = pd.DataFrame()
    df_all_convergence = pd.DataFrame()
    
    for win_date in date_range:
        # print(f"Window Date: {win_date}")
        # Calculate the spread for the 30 days prior to the win_date
        win_date_start = win_date 
        win_date_end = win_date + timedelta(days=60)
        print(f"Divergence Window Start Date: {win_date_start},  End Date: {win_date_end}")
        ppos, df_closes, df_detrend, spread_mean, beta = get_trading_pair_spread((tp1_daily_ppo, tp2_daily_ppo), start_date=win_date_start, end_date=win_date_end)    

        # Get the dates when the spread is below 2 standard deviations
        dates_over_mean = df_detrend[df_detrend['Spread'] > df_detrend['Mean_2std_a']].copy()
        # Remove rows in dates_over_mean where Stock_A and Stock_B are 0
        dates_over_mean = dates_over_mean[dates_over_mean['Stock_A'] != 0]
        dates_over_mean = dates_over_mean[dates_over_mean['Stock_B'] != 0]
        saved_spread_mean = spread_mean
        if len(dates_over_mean) > 0:
            # Go forward in time until the spread converges to the mean, using the same beta.
            # Check if the current window has a future date where the spread converges to the mean.
            # Get the first row where the spread is above the mean from dates_over_mean
            
            # Get the first row where the spread is above the mean
            df_trade = dates_over_mean.iloc[0].copy()
            # Get the first date where the spread is above the mean
            ind_date = dates_over_mean.index[0]
            end_dt = ind_date
            
            # Make sure that we are paste the last trade exit date
            if last_trade_exit_date is not None and ind_date < last_trade_exit_date:
                continue
            
            # Get the current actual price of the Stocks from the df_closes DataFrame
            stock_a_entry = df_closes.loc[ind_date]['Stock_A']
            stock_b_entry = df_closes.loc[ind_date]['Stock_B']
            # Calculate the exit price of the Stocks
            stock_a_exit = stock_b_entry * beta
            stock_b_exit = stock_a_entry / beta
            # Calculate the profit/loss of the trade
            # We Short Stock_A and Long Stock_B 
            expected_profit = (stock_a_entry - stock_a_exit) + (stock_b_exit - stock_b_exit)
            # Calculate the quantity of Stock_A and Stock_B to trade
            if beta > 0:
                stock_a_quantity = 1
                stock_b_quantity = beta * stock_a_quantity
            else:
                stock_b_quantity = 1
                stock_a_quantity = beta * stock_b_quantity
            # Add stock_a_exit and stock_b_exit and expected_profit to the trade_entry DataFrame
            # Calculate Stock Quantity to trade
            df_trade['Spread_Mean'] = spread_mean
            df_trade['Beta_HedgeRatio'] = beta
            df_trade['Stock_A_Entry'] = stock_a_entry
            df_trade['Stock_B_Entry'] = stock_b_entry
            df_trade['Stock_A_Quantity'] = stock_a_quantity
            df_trade['Stock_B_Quantity'] = stock_b_quantity
            df_trade['Stock_A_Exit'] = stock_a_exit
            df_trade['Stock_B_Exit'] = stock_b_exit
            df_trade['Expected_Profit'] = expected_profit
            trade_counter += 1
            df_trade['Trade_Counter'] = trade_counter
            
            # Convert all values in df_trade to Decimal values to 5 decimal places
            df_trade_print = df_trade.astype(DecimaldDtype(5))
            print(f"Trade Entry: {df_trade_print}")
            
            # Has this date been traded on before?
            if ind_date not in traded_dates:
                # Go short Stock_A and long Stock_B    
                traded_dates.append(ind_date)
                while True:
                    # Check if the current window has a future date where the spread converges to the mean.
                    convergence = df_detrend[df_detrend.index > ind_date]
                    convergence = convergence[convergence['Spread'] <= spread_mean]
                    if len(convergence) > 0 or df_detrend.index[-1] <= end_dt:
                        break
                    end_dt = (df_detrend.index[-1] + pd.Timedelta(days=1))
                    end_dt_str = end_dt.strftime('%Y-%m-%d')
                    print(f"End Date: {end_date} {end_dt}")
                    print(f"Convergence Window Start Date: {win_date_start},  End Date: {win_date_end}")
                    ppos, df_closes, df_detrend, spread_mean, beta = get_trading_pair_spread((tp1_daily_ppo, tp2_daily_ppo), beta=beta, start_date=win_date_start, end_date=end_dt_str)
                
                # Remove rows in Convergence where Stock_A and Stock_B 0
                convergence = convergence[convergence['Stock_A'] != 0]
                convergence = convergence[convergence['Stock_B'] != 0]
                # Grab the first row in Convergence
                if len(convergence) > 0:
                    df_convergence = convergence.iloc[0].copy()
                    # Get the date of the convergence
                    ind_date = convergence.index[0]
                    # Get the current actual price of the Stocks from the df_closes DataFrame
                    stock_a_exit = df_closes.loc[ind_date]['Stock_A']
                    stock_b_exit = df_closes.loc[ind_date]['Stock_B']
                    
                    # Add the saved_mean to the df_convergence DataFrame
                    df_convergence['Spread_Mean'] = saved_spread_mean
                    df_convergence['New_Spread_Mean'] = spread_mean
                    df_convergence['Beta_HedgeRatio'] = beta
                    # Add the exit prices to the df_convergence DataFrame
                    df_convergence['Stock_A_Exit'] = stock_a_exit
                    df_convergence['Stock_B_Exit'] = stock_b_exit
                    # Get entry value of Stock_A
                    entry_value = df_trade['Stock_A_Entry'] * df_trade['Stock_A_Quantity']
                    # Calculate exit value of Stock_B
                    entry_value += df_trade['Stock_B_Entry'] * df_trade['Stock_B_Quantity']
                    # Calculate the exit value of Stock_A
                    exit_value = df_convergence['Stock_A_Exit'] * df_trade['Stock_A_Quantity']
                    # Calculate the exit value of Stock_B
                    exit_value += df_convergence['Stock_B_Exit'] * df_trade['Stock_B_Quantity']
                    # Calculate the profit/loss of the trade
                    profit_loss = exit_value - entry_value
                    # Add the entry_value, exit_value, and profit_loss to the df_convergence DataFrame
                    df_convergence['Entry_Value'] = entry_value
                    df_convergence['Exit_Value'] = exit_value
                    df_convergence['Profit_Loss'] = profit_loss
                    df_convergence['Trade_Counter'] = trade_counter
                    
                    last_trade_exit_date = ind_date
                    
                    # Add the df_trade row to the df_all_trades DataFrame
                    if len(df_all_trades) == 0:
                        df_all_trades = df_trade
                    else:
                        df_all_trades = pd.concat([df_all_trades, df_trade], axis=1)
                    # Add the df_convergence row to the df_all_convergence DataFrame
                    if len(df_all_convergence) == 0:
                        df_all_convergence = df_convergence
                    else:
                        df_all_convergence = pd.concat([df_all_convergence, df_convergence], axis=1)
                    
                    print(f"Convergence Date: {ind_date}")
                pass
            else:
                continue
            
    return df_all_trades, df_all_convergence
        
df_all_trades, df_all_convergence = simulate_pairs_trading(('AAPL', 'AMX'))

excluded_columns = ['Stock_A', 'Stock_B']

df_all_trades = df_all_trades.transpose()
df_all_trades.loc[:, ~df_all_trades.columns.isin(excluded_columns)] = df_all_trades.loc[:, ~df_all_trades.columns.isin(excluded_columns)].astype(float)

df_all_convergence = df_all_convergence.transpose()
df_all_convergence.loc[:, ~df_all_convergence.columns.isin(excluded_columns)] = df_all_convergence.loc[:, ~df_all_convergence.columns.isin(excluded_columns)].astype(float)

df_all_trades


ObjCache: dict_keys(['AAPL_weekly_ppo', 'AMX_weekly_ppo', 'AMX_daily_ppo'])
Start Date: 2019-07-22 00:00:00,  End Date: 2025-01-12 00:00:00
Divergence Window Start Date: 2019-07-22 00:00:00,  End Date: 2019-09-20 00:00:00
Divergence Window Start Date: 2019-07-23 00:00:00,  End Date: 2019-09-21 00:00:00
Divergence Window Start Date: 2019-07-24 00:00:00,  End Date: 2019-09-22 00:00:00
Divergence Window Start Date: 2019-07-25 00:00:00,  End Date: 2019-09-23 00:00:00
Divergence Window Start Date: 2019-07-26 00:00:00,  End Date: 2019-09-24 00:00:00
Divergence Window Start Date: 2019-07-27 00:00:00,  End Date: 2019-09-25 00:00:00
Divergence Window Start Date: 2019-07-28 00:00:00,  End Date: 2019-09-26 00:00:00
Divergence Window Start Date: 2019-07-29 00:00:00,  End Date: 2019-09-27 00:00:00
Divergence Window Start Date: 2019-07-30 00:00:00,  End Date: 2019-09-28 00:00:00
Divergence Window Start Date: 2019-07-31 00:00:00,  End Date: 2019-09-29 00:00:00
Divergence Window Start Date: 2019-08-01

Unnamed: 0,Stock_A,Stock_B,Spread,Mean_1std_a,Mean_2std_a,Mean_1std_b,Mean_2std_b,Spread_Mean,Beta_HedgeRatio,Stock_A_Entry,Stock_B_Entry,Stock_A_Quantity,Stock_B_Quantity,Stock_A_Exit,Stock_B_Exit,Expected_Profit,Trade_Counter
2020-01-15,4.736952e-15,1.184238e-15,8.884568,5.495224,8.662135,-0.838598,-4.00551,2.328313,4.495107,79.6825,15.75,1.0,4.495107,70.797932,17.726498,8.884568,39.0
2020-05-19,9.473903e-15,1.184238e-15,33.114105,31.238903,33.099158,27.518393,25.658139,29.378648,3.767857,79.7225,12.37,1.0,3.767857,46.608395,21.158577,33.114105,40.0
2020-08-28,4.736952e-15,5.921189e-16,151.330895,145.997405,150.68116,136.629896,131.946142,141.313651,-2.083535,124.8075,12.73,-2.083535,1.0,-26.523395,-59.901815,151.330895,65.0
2020-11-02,-4.736952e-15,-5.921189e-16,74.555645,71.159804,74.399104,64.681202,61.441902,67.920503,3.702547,118.69,11.92,1.0,3.702547,44.134355,32.056315,74.555645,66.0
2021-01-24,1.421085e-14,5.921189e-16,181.769456,178.144374,181.60907,171.214984,167.750288,174.679679,-2.957025,139.07001,14.44,-2.957025,1.0,-42.699446,-47.030375,181.769456,70.0
2021-02-05,-9.473903e-15,5.921189e-16,85.265548,79.230487,84.955926,67.779608,62.054169,73.505048,3.808761,136.75999,13.52,1.0,3.808761,51.494442,35.906692,85.265548,71.0
2021-04-23,-9.473903e-15,-1.184238e-15,226.613291,224.187579,226.414138,219.734459,217.507899,221.961019,-6.317131,134.32001,14.61,-6.317131,1.0,-92.293281,-21.262819,226.613291,72.0
2021-08-12,-9.473903e-15,1.184238e-15,91.876469,88.611059,91.244384,83.344409,80.711084,85.977734,3.372041,149.10001,16.97,1.0,3.372041,57.223541,44.216543,91.876469,83.0
2021-09-01,9.473903e-15,-1.184238e-15,73.015557,70.586281,72.63173,66.495384,64.449936,68.540833,4.296218,154.3,18.92,1.0,4.296218,81.284443,35.91531,73.015557,84.0
2021-12-06,1.894781e-14,-1.184238e-15,119.616,111.982395,117.404663,101.137859,95.715591,106.560127,3.267832,179.45,18.31,1.0,3.267832,59.834,54.914088,119.616,91.0


In [129]:


df_all_convergence


Unnamed: 0,Stock_A,Stock_B,Spread,Mean_1std_a,Mean_2std_a,Mean_1std_b,Mean_2std_b,Spread_Mean,New_Spread_Mean,Beta_HedgeRatio,Stock_A_Exit,Stock_B_Exit,Entry_Value,Exit_Value,Profit_Loss,Trade_Counter
2020-01-29,4.736952e-15,-1.184238e-15,-0.208044,5.495224,8.662135,-0.838598,-4.00551,2.328313,2.328313,4.495107,77.3775,17.26,150.480432,154.963044,4.482611,39.0
2020-05-24,9.473903e-15,-1.184238e-15,29.270891,31.238903,33.099158,27.518393,25.658139,29.378648,29.378648,3.767857,79.7225,13.39,126.330895,130.174109,3.843214,40.0
2020-09-08,4.736952e-15,-5.921189e-16,136.981579,145.997405,150.68116,136.629896,131.946142,141.313651,141.313651,-2.083535,112.0,11.99,-247.310736,-221.365867,25.944868,65.0
2020-11-08,-4.736952e-15,1.776357e-15,67.742959,71.159804,74.399104,64.681202,61.441902,67.920503,67.920503,3.702547,118.69,13.76,162.824355,169.637041,6.812686,66.0
2021-01-25,-4.736952e-15,5.921189e-16,174.659456,178.144374,181.60907,171.214984,167.750288,174.679679,174.679679,-2.957025,131.96001,14.44,-396.793545,-375.769095,21.02445,70.0
2021-02-22,-4.736952e-15,1.184238e-15,71.441412,79.230487,84.955926,67.779608,62.054169,73.505048,73.505048,3.808761,121.26,13.08,188.254432,171.078588,-17.175845,71.0
2021-04-29,-9.473903e-15,1.776357e-15,221.79498,224.187579,226.414138,219.734459,217.507899,221.961019,221.961019,-6.317131,131.46001,14.3,-833.90707,-816.150076,17.756994,72.0
2021-08-26,-9.473903e-15,1.184238e-15,85.643998,88.611059,91.244384,83.344409,80.711084,85.977734,85.977734,3.372041,148.60001,18.67,206.323551,211.556022,5.23247,83.0
2021-09-07,9.473903e-15,-1.184238e-15,67.127048,70.586281,72.63173,66.495384,64.449936,68.540833,68.540833,4.296218,148.97,19.05,235.584443,230.812952,-4.771492,84.0
2021-12-17,1.894781e-14,-1.184238e-15,105.129798,111.982395,117.404663,101.137859,95.715591,106.560127,106.560127,3.267832,171.14,20.2,239.284,237.150202,-2.133798,91.0


In [130]:
df_all_convergence['Profit_Loss'].sum()

156.4848834814208