# Break High of Day (BHOD) Strategy Vectorized Backtesting
In this notebook, instead of performing event-driven backtests, we vectorize the process, which substantially reduces the time the backtests take to run.

Once again, we focus on the Break High of Day (BHOD) strategy, as it's relatively simple to understand and implement, and we believe it has a high success rate. We also believe this backtesting method can help fine-tune the strategy further, since we can alter its parameters and see how it would have performed on historical data. Once we find the optimal parameters, we hope to test the strategy in the live stock market to see how well it does.
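
As a rough illustration of what vectorizing means here, the sketch below computes a running high two ways: bar by bar in a Python loop (the event-driven style) and in a single pandas call. The price values are made up for the example and are not taken from our data.

```python
import pandas as pd

# Hypothetical 1-minute highs for a single session
highs = pd.Series([10.0, 10.2, 10.1, 10.5, 10.4])

# Event-driven style: walk through the bars one at a time
running_high_loop = []
current_high = float("-inf")
for h in highs:
    current_high = max(current_high, h)
    running_high_loop.append(current_high)

# Vectorized style: one call over the whole column
running_high_vec = highs.cummax()

print(running_high_vec.tolist())
```

Both approaches give the same running high; the vectorized call simply pushes the loop down into pandas, which is where the speedup comes from.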

## Importing Packages

In [2]:
from backtesting import Backtest, Strategy
from backtesting.backtesting import Order, Position, Trade, _Broker
from backtesting.test import GOOG
from backtesting.lib import crossover

import tulipy as ti

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
import time
import dask
import dask.dataframe as dd
from dask import delayed

import os
import sys

## Data Transformation / Cleaning Functions
These functions will be used to clean our data for us to use later on.

In [3]:
# Function for converting timestamp and setting it as index
def get_true_timestamp_and_set_as_index(pd_csv_ticker_1min):
    # Converting timestamp to actual timestamp
    pd_csv_ticker_1min['timestamp'] = pd.to_datetime(pd_csv_ticker_1min['timestamp'])
    
    # Setting the index to the timestamp
    pd_csv_ticker_1min.set_index('timestamp', inplace=True)

    return pd_csv_ticker_1min

In [4]:
# Function for filling in missing minutes
def fill_missing_minutes(index_timestamp_1_min, keep_only_PM_to_TD = False, PM_cutoff_time ='04:00'):
    index_timestamp_1_min = index_timestamp_1_min.resample('min').asfreq()

    # Filling in NaN values (volume with 0 and close to the last known close)
    index_timestamp_1_min['volume'] = index_timestamp_1_min['volume'].fillna(0)
    index_timestamp_1_min['close'] = index_timestamp_1_min['close'].ffill()
    
    # Filling in the rest, with the NaN being equal to the previous close
    index_timestamp_1_min['open'] = index_timestamp_1_min['open'].fillna(index_timestamp_1_min['close'])
    index_timestamp_1_min['high'] = index_timestamp_1_min['high'].fillna(index_timestamp_1_min['close'])
    index_timestamp_1_min['low'] = index_timestamp_1_min['low'].fillna(index_timestamp_1_min['close'])
    
    # Making sure there are only weekdays
    index_timestamp_1_min = index_timestamp_1_min[index_timestamp_1_min.index.weekday < 5]

    # Keeping only data from PM_cutoff_time to 3:59pm (if keep_only_PM_to_TD is True)
    if keep_only_PM_to_TD:
        index_timestamp_1_min = index_timestamp_1_min.between_time(PM_cutoff_time, '15:59')

    return index_timestamp_1_min

In [5]:
# Renaming columns with uppercase
def upper_case_OHLCV(no_missing_1_min):
    no_missing_1_min.rename(columns = {'open':'Open', 'high':'High', 'low':'Low', 'close':'Close', 'volume':'Volume'}, inplace = True)

    return no_missing_1_min
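
To make the cleaning steps concrete, here is a minimal sketch that applies the same sequence (timestamp index, gap filling, column renaming) to a tiny, made-up frame with a two-minute gap. The data and the `raw` and `bars` names are assumptions for illustration; the logic is inlined rather than calling the functions above so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical toy data: two 1-minute bars with a 2-minute gap between them
raw = pd.DataFrame({
    "timestamp": ["2024-01-02 09:30:00", "2024-01-02 09:33:00"],
    "open": [10.0, 10.4],
    "high": [10.2, 10.6],
    "low": [9.9, 10.3],
    "close": [10.1, 10.5],
    "volume": [500, 800],
})

# Convert the timestamp column and use it as the index
raw["timestamp"] = pd.to_datetime(raw["timestamp"])
bars = raw.set_index("timestamp").resample("min").asfreq()

# Missing minutes: zero volume, flat candle at the last known close
bars["volume"] = bars["volume"].fillna(0)
bars["close"] = bars["close"].ffill()
for col in ("open", "high", "low"):
    bars[col] = bars[col].fillna(bars["close"])

# Capitalize the OHLCV column names
bars = bars.rename(columns=str.capitalize)
print(bars)
```

The two missing minutes (09:31 and 09:32) come back as flat candles at the 09:30 close with zero volume.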

# Vectorized Backtesting and Metrics Analysis
For the vectorized backtests, we'll use Dask dataframes as we did with the event-driven backtests. We'll combine them with features that are only possible in a vectorized approach, and we'll use slightly different partitions to speed up the tests.

## Creating a Dictionary of Variables to Input
We create a dictionary of variables (variables_dict) that we can plug into our vectorized backtest to change our outcomes.

In [31]:
# Getting the dictionary of variables
variables_dict = {
    'Trading Start Time (HH:MM)': '09:35',
    'Trading End Time (HH:MM)': '15:55',
    'Use Volume Condition': True,
    'Volume Percent List (%)': [10],
    'Set Volume Average Period (#)': 14,
    'Risk per Trade R ($)': 20,
    'Bid-Ask Spread ($)': 0.06,
    'Commission per Share ($)': 0.0035,
    'Min Comm per Order ($)': 0.35,
    'Exchange Fees ($)': 0.003,
    'Other Fees ($)': 0.0005,
    'Max Trade Amount ($)': 50_000,
    'Use 1st R tp': True,
    'Set 1st R partial (#)': 1,
    'Set 1st R percent (%)': 25,
    'Use 2nd R tp': True,
    'Set 2nd R partial (#)': 2,
    'Set 2nd R percent (%)': 25,
    'Use 3rd R tp': True,
    'Set 3rd R partial (#)': 3,
    'Set 3rd R percent (%)': 50,
    'Use 4th R tp': False,
    'Set 4th R partial (#)': 4,
    'Set 4th R percent (%)': 0,
    'Move SL After 1st tp': True,
    'Move SL After 2nd tp': False,
    'Move SL After 3rd tp': False
}

In [32]:
# Keeping an untouched copy of the original variables (same values as above) so we can easily reset later
variables_dict_orig = {
    'Trading Start Time (HH:MM)': '09:35',
    'Trading End Time (HH:MM)': '15:55',
    'Use Volume Condition': True,
    'Volume Percent List (%)': [10],
    'Set Volume Average Period (#)': 14,
    'Risk per Trade R ($)': 20,
    'Bid-Ask Spread ($)': 0.06,
    'Commission per Share ($)': 0.0035,
    'Min Comm per Order ($)': 0.35,
    'Exchange Fees ($)': 0.003,
    'Other Fees ($)': 0.0005,
    'Max Trade Amount ($)': 50_000,
    'Use 1st R tp': True,
    'Set 1st R partial (#)': 1,
    'Set 1st R percent (%)': 25,
    'Use 2nd R tp': True,
    'Set 2nd R partial (#)': 2,
    'Set 2nd R percent (%)': 25,
    'Use 3rd R tp': True,
    'Set 3rd R partial (#)': 3,
    'Set 3rd R percent (%)': 50,
    'Use 4th R tp': False,
    'Set 4th R partial (#)': 4,
    'Set 4th R percent (%)': 0,
    'Move SL After 1st tp': True,
    'Move SL After 2nd tp': False,
    'Move SL After 3rd tp': False
}
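
One possible way to try different parameter values without disturbing the baseline is to build each variant from a deep copy. The sketch below uses a shortened, hypothetical stand-in for the dictionary (`base_params`) and a hypothetical helper (`make_variant`); neither is part of the notebook's code.

```python
import copy

# Hypothetical, shortened stand-in for the full variables dictionary
base_params = {
    'Risk per Trade R ($)': 20,
    'Volume Percent List (%)': [10],
    'Use Volume Condition': True,
}

def make_variant(base, overrides):
    """Return an independent copy of `base` with `overrides` applied."""
    variant = copy.deepcopy(base)  # deep copy so nested lists are not shared
    variant.update(overrides)
    return variant

variant = make_variant(base_params, {'Risk per Trade R ($)': 40})
variant['Volume Percent List (%)'].append(20)

print(base_params['Volume Percent List (%)'])  # baseline list is unchanged
print(variant['Volume Percent List (%)'])
```

The deep copy matters because some values are lists; a shallow copy would share them between the baseline and the variant.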

## Delayed Function for Vectorized Backtesting
We create this delayed function so we can compute (i.e., execute) the backtest when we're ready.

In [33]:
# Defining a delayed function that basically doesn't compute until dask.compute() is run (which allows the parallel running to execute)
@delayed
def run_vec_backtest_on_part(data_partition, backtest_strategy, var_dict):
    
    if data_partition.empty:  # Skip if the partition is on the weekend or holiday (and thus will have empty data)
        return None

    trade_log = backtest_strategy(data_partition, var_dict)

    return trade_log
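
The sketch below shows, on made-up data, how a delayed function defers work until `compute` is called. The per-day split with `groupby` and the `daily_high` function are assumptions standing in for the real partitions and backtest, which are not shown in this section.

```python
import pandas as pd
from dask import delayed, compute

# Hypothetical data: four 1-minute bars on each of two days
index = pd.date_range("2024-01-02 09:30", periods=4, freq="min").append(
    pd.date_range("2024-01-03 09:30", periods=4, freq="min"))
bars = pd.DataFrame(
    {"High": [10.0, 10.2, 10.1, 10.5, 11.0, 11.3, 11.1, 11.2]}, index=index)

@delayed
def daily_high(day_bars):
    # Hypothetical stand-in for a per-partition backtest
    if day_bars.empty:
        return None
    return float(day_bars["High"].max())

# Building the task list does not run anything yet
tasks = [daily_high(day_bars) for _, day_bars in bars.groupby(bars.index.date)]

# The work happens here, and the tasks can run in parallel
results = compute(*tasks)
print(results)
```

`compute` returns one result per task, in the same order as the task list.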

## Vectorized Backtesting Function
Instead of a single class, we use a plain function to run our backtests. In this vectorized format, rather than stepping through the data sequentially, we use pandas operations over whole columns, which requires a different set of logic. This allows for much faster computation and data processing, and it substantially reduces the time needed to run these backtests.
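
As an example of that different logic, finding the first candlestick that reaches a price level can be done without a loop by taking a cumulative sum of a boolean column. This is a minimal sketch on made-up values; `tp_price` is a hypothetical level.

```python
import pandas as pd

# Hypothetical highs after an entry, and a hypothetical take-profit level
high = pd.Series([10.1, 10.6, 10.4, 10.8])
tp_price = 10.5

hit = high >= tp_price                         # True wherever the level is reached
hit_cum = hit.cumsum()                         # running count of hits
prev_hit_cum = hit_cum.shift(1, fill_value=0)  # count as of the previous bar

# First hit: the count is 1 now and was 0 on the previous bar
first_hit = (hit_cum == 1) & (prev_hit_cum == 0)
print(first_hit.tolist())
```

Later hits of the same level are ignored because the running count is already above 1 by then.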

If you're unfamiliar with the Break High of Day strategy itself, there are YouTube videos and other websites (e.g., Bear Bull Traders) that explain the strategy extensively. In short, the strategy checks when the price of a stock breaks its previous high of the day and, after confirming a series of conditions, enters a long trade. To exit a trade, one of three conditions must be met: 1) the high of a 1-minute candlestick (CS) reaches or exceeds a desired take-profit (TP) level, 2) the low of a 1-minute CS falls below the stop loss (SL) or breakeven point (BE), or 3) we reach 3:55pm with neither previous condition being met. In all cases, we only trade during regular trading hours, and we never hold positions overnight.
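
The entry side of this summary can be sketched in a few vectorized lines. The example below uses made-up highs for one session and leaves out the trading-time and volume checks; it only shows the break of the previous high of day and the requirement that neither of the two preceding candlesticks also broke it.

```python
import pandas as pd

# Hypothetical 1-minute highs for a single session
index = pd.date_range("2024-01-02 09:30", periods=6, freq="min")
bars = pd.DataFrame({"High": [10.0, 10.2, 10.1, 10.1, 10.3, 10.4]}, index=index)

# Running high of day, and its value as of the previous bar
hod = bars.groupby(bars.index.date)["High"].cummax()
prev_hod = hod.shift(1)

# A bar breaks the high of day when its high exceeds the previous HOD
broke_prev_hod = bars["High"] > prev_hod

# Non-continuous break: neither of the two preceding bars also broke the HOD
non_continuous = (~broke_prev_hod.shift(1, fill_value=False)
                  & ~broke_prev_hod.shift(2, fill_value=False))

candidate_entry = broke_prev_hod & non_continuous
print(candidate_entry.tolist())
```

Here the bars at 09:31 and 09:34 qualify, while the break at 09:35 is skipped because the bar before it already broke the high.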

In addition to the concepts mentioned above, this vectorized backtest allows us to test different volume conditions with minimal time increase, and it fully allows for variable take-profit levels and variable SL and BE positions. We can see some of the results from running this backtest below.
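
To see how the risk-based sizing and take-profit levels fit together, here is a small worked example. The entry and stop-loss prices are made up; the R multiples and percentage splits match the defaults in the variables dictionary.

```python
import math

risk_per_trade = 20.0  # dollars risked per trade (R)
entry_price = 10.25    # hypothetical entry price, slippage included
stop_loss = 10.00      # hypothetical stop loss

# Position size: risk budget divided by the per-share risk
risk_per_share = entry_price - stop_loss
total_shares = math.floor(risk_per_trade / risk_per_share)

# Take-profit prices at 1R, 2R and 3R above the entry
r_multiples = [1, 2, 3]
tp_prices = [entry_price + m * risk_per_share for m in r_multiples]

# Split the position 25% / 25% / 50%, each level sized from what remains
percents = [25, 25, 50]
remaining_shares, remaining_pct = total_shares, 100
tp_shares = []
for pct in percents:
    shares = math.floor(remaining_shares * pct / remaining_pct)
    tp_shares.append(shares)
    remaining_shares -= shares
    remaining_pct -= pct

print(total_shares, tp_prices, tp_shares)
```

Sizing each level from the shares that remain, rather than from the original total, keeps rounding from leaving stray shares when the percentages add up to 100.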

In [46]:
# BHOD Full Vectorized Strategy
def BHOD_Full_Vec_v1(partitioned_data, var_dict):
    
    # Making a copy of the partitioned data
    df_1min = partitioned_data.copy()

    # Part 1: Initializing variables and indicators to get the final BHOD condition:

    # Getting date and time separate for easier use
    df_1min['date'] = df_1min.index.date
    df_1min['time'] = df_1min.index.time

    # Making sure we're only trading within proper trading hours
    start_time = var_dict['Trading Start Time (HH:MM)']
    end_time = var_dict['Trading End Time (HH:MM)']

    df_1min['valid_trading_time'] = (
        df_1min.time >= pd.to_datetime(start_time).time()) & (df_1min.time <= pd.to_datetime(end_time).time())

    # Defining a function to add the moving averages
    def add_ma(df, period, ma_col, new_col_name):
        df[new_col_name] = df[ma_col].rolling(period).mean()

    # Adding volume moving average and another column with volume thresholds of XX%
    add_ma(df_1min, var_dict['Set Volume Average Period (#)'], 'Volume', 'volume_MA')
    # Sorting lowest to highest (sorted() returns a new list, so the caller's var_dict is not modified)
    volume_percent_list = sorted(var_dict['Volume Percent List (%)'])

    # Creating counter for 1st volume condition, indic cols for final trades table, dict of volume index lists, and recent exits dict
    volume_cond_count = 0  # Counter for first volume condition
    vol_conds_list = []  # Indicator columns for final trades table
    vol_cond_dict = {}  # Initializing a dictionary of lists for true volume condition indexes at each threshold
    recent_exits_dict = {}  # Recent exits dictionary
    
    for vol_perc in volume_percent_list:
        vol_thresh_name = 'w_vol_thresh_' + str(vol_perc) + '_perc'  # Setting volume threshold name
        vol_thresh_name_series = df_1min['volume_MA'] * (1 + vol_perc / 100)  # Setting volume threshold

        # Adding in volume indicator (always True when "Use Volume Condition" is False)
        vol_cond_name = 'volume_condition_' + str(vol_perc)
        if var_dict['Use Volume Condition']:
            vol_cond_name_series = df_1min['Volume'] >= vol_thresh_name_series
        else:
            # Keeping this a boolean Series (not a bare True) so it can still be used as a mask below
            vol_cond_name_series = pd.Series(True, index=df_1min.index)

        # Setting the first (lowest) volume percent as the first condition
        if volume_cond_count == 0:
            df_1min['volume_condition_first'] = vol_cond_name_series
        volume_cond_count += 1

        # Appending volume condition name to vol_conds_list
        vol_conds_list.append(vol_cond_name)

        # Inputting the lists of indexes that meet the volume condition threshold into vol_cond_dict
        vol_cond_dict[vol_cond_name] = vol_cond_name_series[vol_cond_name_series].index.tolist()
        
        # Initializing the most recent exits for each condition (starting with some early date)
        recent_exits_dict[vol_cond_name] = [pd.to_datetime('1950-01-01 09:00')]
    
    # Getting the HOD and previous HOD
    _filtered1 = df_1min[df_1min['time'] >= pd.to_datetime('09:30:00').time()].copy()  # Filtering rows starting at 9:30am each day
    _filtered1['HOD'] = _filtered1.groupby('date')['High'].cummax()  # Computing HOD starting at 9:30am each day
    df_1min = df_1min.merge(
        _filtered1[['HOD']], how = 'left', left_index = True, right_index = True)  # Merging HOD column back to original df
    df_1min['prev_HOD'] = df_1min.HOD.shift(1)  # Getting the prev_HOD

    # Setting condition that the current high must be greater than previous HOD
    df_1min['broke_prev_HOD_cond'] = df_1min.High > df_1min.prev_HOD

    # Setting non-continuous BHOD condition (i.e., previous 2 CS did not also break HOD)
    df_1min['non_continuous_BHOD'] = (~df_1min.broke_prev_HOD_cond.shift(1).fillna(False).astype(bool)) & (
        ~df_1min.broke_prev_HOD_cond.shift(2).fillna(False).astype(bool))

    # Getting final BHOD_condition
    df_1min['combined_BHOD_condition'] = (
        df_1min.valid_trading_time & df_1min.volume_condition_first & df_1min.broke_prev_HOD_cond & df_1min.non_continuous_BHOD)

    # Dropping unnecessary columns to save space
    df_1min = df_1min.drop(['date', 'time', 'valid_trading_time', 'volume_MA', 'volume_condition_first', 'HOD', 'broke_prev_HOD_cond',
        'non_continuous_BHOD'], axis=1)


    # Part 2: Initializing entry and exit variables to ultimately perform a trade:

    # Setting risk per trade in dollars (R)
    risk_per_trade_R = var_dict['Risk per Trade R ($)']
    
    # Modeling slippage as half of the bid-ask spread
    BA_spread = var_dict['Bid-Ask Spread ($)']
    slippage = BA_spread / 2
    
    # Max trade amount:
    max_trade_amount = var_dict['Max Trade Amount ($)']
    
    # Some useful metrics that will be used to check whether trade is above max trade amount
    _sep_w_slippage = 0.0
    _BHOD_SL = 0.0
    _dynamic_shares_total = 0.0
    _total_entry_amt = 0.0

    # Setting partials and percent take profit (using R:R-based TP)
    use_1st_RR_tp = var_dict['Use 1st R tp']
    set_1st_R_partial = var_dict['Set 1st R partial (#)']
    set_1st_R_percent = var_dict['Set 1st R percent (%)']
    use_2nd_RR_tp = var_dict['Use 2nd R tp']
    set_2nd_R_partial = var_dict['Set 2nd R partial (#)']
    set_2nd_R_percent = var_dict['Set 2nd R percent (%)']
    use_3rd_RR_tp = var_dict['Use 3rd R tp']
    set_3rd_R_partial = var_dict['Set 3rd R partial (#)']
    set_3rd_R_percent = var_dict['Set 3rd R percent (%)']
    use_4th_RR_tp = var_dict['Use 4th R tp']
    set_4th_R_partial = var_dict['Set 4th R partial (#)']
    set_4th_R_percent = var_dict['Set 4th R percent (%)']
    
    # Initializing dynamic shares calculation
    dynamic_shares_total = 0
    dynamic_1st_R_shares = 0
    dynamic_2nd_R_shares = 0
    dynamic_3rd_R_shares = 0
    dynamic_4th_R_shares = 0

    # Initializing when to move stop loss
    move_SL_after_1st_tp = var_dict['Move SL After 1st tp']
    move_SL_after_2nd_tp = var_dict['Move SL After 2nd tp']
    move_SL_after_3rd_tp = var_dict['Move SL After 3rd tp']


    # Part 3: Initializing columns to run the backtest:

    # Setting full entry condition that includes test for max entry amount
    df_1min['full_entry_condition'] = False
    
    # Initializing stored entry price (sep) with slippage with NaN
    df_1min['sep_w_slippage'] = np.nan
    
    # Defining the previous midpoint and individual stop losses
    # (For BHOD, sl is at the lower of the current CS low or at the midpoint of the open and close of the previous CS)
    prev_midpoint_series = (df_1min.Open.shift(1) + df_1min.Close.shift(1)) / 2
    df_1min['indiv_sl'] = np.minimum(prev_midpoint_series, df_1min.Low)
    
    # Initializing true BHOD stop losses and breakeven price
    df_1min['BHOD_SL'] = np.nan
    df_1min['BHOD_BE'] = np.nan
    
    # Initializing the take profit prices
    df_1min['TP_Long_1st_R_price'] = np.nan
    df_1min['TP_Long_2nd_R_price'] = np.nan
    df_1min['TP_Long_3rd_R_price'] = np.nan
    df_1min['TP_Long_4th_R_price'] = np.nan
    
    # Setting selling conditions
    df_1min['hit_SL'] = False
    df_1min['hit_BE'] = False
    df_1min['hit_1st_TP'] = False
    df_1min['hit_2nd_TP'] = False
    df_1min['hit_3rd_TP'] = False
    df_1min['hit_4th_TP'] = False
    
    # Initializing take profit indicators  
    df_1min['hit_SL_cum'] = 0.0
    df_1min['hit_BE_cum'] = 0.0
    df_1min['hit_1st_TP_cum'] = 0.0
    df_1min['hit_2nd_TP_cum'] = 0.0
    df_1min['hit_3rd_TP_cum'] = 0.0
    df_1min['hit_4th_TP_cum'] = 0.0
    
    # Initializing the shares with NaN and exit amounts with 0
    df_1min['shares'] = np.nan
    df_1min['exit_amt_sl'] = 0.0
    df_1min['exit_amt_be'] = 0.0
    df_1min['exit_amt_1st_tp'] = 0.0
    df_1min['exit_amt_2nd_tp'] = 0.0
    df_1min['exit_amt_3rd_tp'] = 0.0
    df_1min['exit_amt_4th_tp'] = 0.0
    df_1min['exit_amt_3_55_pm'] = 0.0

    # Initializing the trades table for analysis at the end
    trades_table_cols = ['Size', 'EntryPrice', 'ExitPrice', 'PnL', 'PnL_AF', 'EntryTime', 'ExitTime', 'Duration'] + vol_conds_list
    trades_table = pd.DataFrame(columns = trades_table_cols)


    # Part 4: Getting potential entry points and executing the trades    
    
    # Getting the list of all potential entry points:
    potential_entry_timestamps = df_1min[df_1min.combined_BHOD_condition].index.tolist()

    # Executing the strategy for each timestamp
    for timestamp in potential_entry_timestamps:
    
        # Step 1: Initializing some variables to make things easier
        
        # Getting a copy of just the timestamp row (observation) to make operations faster
        timestamp_row = df_1min.loc[timestamp].copy(deep = True)

        
        # Step 2: Setting the entry price, stop loss, and take profit prices for this entry row (Series)
    
        # Setting variables for max entry amount condition (to be used later if this condition is met)
        _sep_w_slippage = np.minimum((timestamp_row['prev_HOD'] + slippage), timestamp_row['High'])
        # ^Setting stored entry price with slippage
        _BHOD_SL = timestamp_row.indiv_sl  # Setting the true BHOD sl for this timestamp
        _dynamic_shares_total = int(np.floor(risk_per_trade_R / (_sep_w_slippage - _BHOD_SL)))  # Setting the total dynamic shares
        _total_entry_amt = _dynamic_shares_total * _sep_w_slippage  # Setting total purchase amount
    
        # Before doing anything else, checking to see if entry will be larger than the specified max entry amount
        if _total_entry_amt > max_trade_amount:
            continue
        timestamp_row['full_entry_condition'] = True  # Setting full entry condition to True if max entry amount condition is met
        
        # Filling in the stored entry price with slippage, BHOD SL, and BHOD BE for this timestamp
        timestamp_row['sep_w_slippage'] = _sep_w_slippage
        timestamp_row['BHOD_SL'] = _BHOD_SL
        timestamp_row['BHOD_BE'] = _sep_w_slippage
    
        # Setting the take profit prices for this timestamp
        entry_minus_SL = _sep_w_slippage - _BHOD_SL
        timestamp_row['TP_Long_1st_R_price'] = entry_minus_SL * set_1st_R_partial + _sep_w_slippage
        timestamp_row['TP_Long_2nd_R_price'] = entry_minus_SL * set_2nd_R_partial + _sep_w_slippage
        timestamp_row['TP_Long_3rd_R_price'] = entry_minus_SL * set_3rd_R_partial + _sep_w_slippage
        timestamp_row['TP_Long_4th_R_price'] = entry_minus_SL * set_4th_R_partial + _sep_w_slippage

        
        # Step 3: Setting share sizes for everything

        # Setting true dynamic shares total variable as the test variable for dynamic shares total
        dynamic_shares_total = _dynamic_shares_total
        
        # Setting take profit share sizes (Note: depending on the % splits, the shares may sometimes be 0; but that's fine; let it be)
        if use_1st_RR_tp:
            dynamic_1st_R_shares = int(np.floor(dynamic_shares_total * set_1st_R_percent / 100))
        # 2R is basically taking the remaining shares after 1st R and calculating 2nd R share size from that
        # (instead of doing it all at once in the beginning)
        if use_2nd_RR_tp:
            dynamic_2nd_R_shares = int(np.floor(
                (dynamic_shares_total - dynamic_1st_R_shares) * set_2nd_R_percent / (100 - set_1st_R_percent))  )
        # 3R follows suit
        if use_3rd_RR_tp:
            dynamic_3rd_R_shares = int(np.floor(
                (dynamic_shares_total - dynamic_1st_R_shares - dynamic_2nd_R_shares) * 
                set_3rd_R_percent / (100 - set_1st_R_percent - set_2nd_R_percent)) )
        # 4R follows suit as well (if applied)
        if use_4th_RR_tp:
            dynamic_4th_R_shares = int(np.floor(
                (dynamic_shares_total - dynamic_1st_R_shares - dynamic_2nd_R_shares - dynamic_3rd_R_shares) * 
                set_4th_R_percent / (100 - set_1st_R_percent - set_2nd_R_percent - set_3rd_R_percent))  )

        # Setting shares to the default
        timestamp_row['shares'] = dynamic_shares_total

        # Putting the timestamp_row back into the original dataframe (doing each cell individually because it's much faster)
        for col in df_1min.columns:
            df_1min.at[timestamp, col] = timestamp_row[col]

        
        # Step 4: Forward filling the sl, be, and tp prices for the day (to save time and space)
    
        # Creating entry to eod dataframe (df_eeod) so we don't need to constantly use the cond_ent_EOD condition from before
        df_eeod = df_1min.loc[(df_1min.index >= timestamp) & (df_1min.index.date == timestamp.date())].copy(deep = True)

        # Forward filling entries, exits, and shares
        df_eeod['sep_w_slippage'] = df_eeod['sep_w_slippage'].ffill()
        df_eeod['BHOD_SL'] = df_eeod['BHOD_SL'].ffill()
        df_eeod['BHOD_BE'] = df_eeod['BHOD_BE'].ffill()
        df_eeod['TP_Long_1st_R_price'] = df_eeod['TP_Long_1st_R_price'].ffill()
        df_eeod['TP_Long_2nd_R_price'] = df_eeod['TP_Long_2nd_R_price'].ffill()
        df_eeod['TP_Long_3rd_R_price'] = df_eeod['TP_Long_3rd_R_price'].ffill()
        df_eeod['TP_Long_4th_R_price'] = df_eeod['TP_Long_4th_R_price'].ffill()
        df_eeod['shares'] = df_eeod['shares'].ffill()


        # Step 5: Checking selling conditions and setting indicators
    
        # Setting conditions when price hits tp level
        hit_TP_Long_1st_R_price_f = df_eeod['High'] >= df_eeod['TP_Long_1st_R_price']
        hit_TP_Long_2nd_R_price_f = df_eeod['High'] >= df_eeod['TP_Long_2nd_R_price']
        hit_TP_Long_3rd_R_price_f = df_eeod['High'] >= df_eeod['TP_Long_3rd_R_price']
        hit_TP_Long_4th_R_price_f = df_eeod['High'] >= df_eeod['TP_Long_4th_R_price']

        # If price hits any of the tp conditions, set those to True
        df_eeod.loc[hit_TP_Long_1st_R_price_f, 'hit_1st_TP'] = True
        df_eeod.loc[hit_TP_Long_2nd_R_price_f, 'hit_2nd_TP'] = True
        df_eeod.loc[hit_TP_Long_3rd_R_price_f, 'hit_3rd_TP'] = True
        df_eeod.loc[hit_TP_Long_4th_R_price_f, 'hit_4th_TP'] = True
        
        # Using the cumsum() function to get cumulative indicators
        df_eeod['hit_1st_TP_cum'] = df_eeod['hit_1st_TP'].cumsum()
        df_eeod['hit_2nd_TP_cum'] = df_eeod['hit_2nd_TP'].cumsum()
        df_eeod['hit_3rd_TP_cum'] = df_eeod['hit_3rd_TP'].cumsum()
        df_eeod['hit_4th_TP_cum'] = df_eeod['hit_4th_TP'].cumsum()

        # If price falls below SL, set hit_SL to True
        df_eeod.loc[df_eeod['Low'] < df_eeod['BHOD_SL'], 'hit_SL'] = True

        # If price falls below BE, it's not the same CS as the entry, and XXX tp has been hit, set hit_BE to True
        cond_low_below_be = df_eeod['Low'] < df_eeod['BHOD_BE']
        cond_not_entry_CS = df_eeod.index != timestamp
        cond_1st_tp_hit_true = move_SL_after_1st_tp & (df_eeod['hit_1st_TP_cum'] > 0)  # Do NOT get rid of these parentheses
        cond_2nd_tp_hit_true = move_SL_after_2nd_tp & (df_eeod['hit_2nd_TP_cum'] > 0)
        cond_3rd_tp_hit_true = move_SL_after_3rd_tp & (df_eeod['hit_3rd_TP_cum'] > 0)
        cond_XXX_tp_hit = (cond_1st_tp_hit_true | cond_2nd_tp_hit_true | cond_3rd_tp_hit_true)
        
        df_eeod.loc[cond_low_below_be & cond_not_entry_CS & cond_XXX_tp_hit, 'hit_BE'] = True

        # Getting cumulative indicators for stop loss and breakeven
        df_eeod['hit_SL_cum'] = df_eeod['hit_SL'].cumsum()
        df_eeod['hit_BE_cum'] = df_eeod['hit_BE'].cumsum()
        

        # Step 6: Removing the shares based on conditions and adding in total prices
        
        # Getting stop loss conditions (first time hitting the stop loss)
        prev_hit_SL_cum = df_eeod['hit_SL_cum'].shift(1, fill_value = 0)  # Just checking for previous CS SL
        prev_hit_SL_cum_is_0 = prev_hit_SL_cum == 0  # Prev CS did NOT hit SL
        hit_SL_cum_is_1 = df_eeod['hit_SL_cum'] == 1  # Current cumulative SL is 1

        cond_hit_SL = hit_SL_cum_is_1 & prev_hit_SL_cum_is_0  # First time hitting SL (SL indicator previously was 0 and current is 1)
        cond_not_hit_SL = ~cond_hit_SL  # Not the first time hitting SL
        
        #####
        
        # Getting the 1st tp level exit amount
        prev_hit_1st_TP_cum = df_eeod['hit_1st_TP_cum'].shift(1, fill_value=0)
        hit_1st_TP_cum_is_1 = df_eeod['hit_1st_TP_cum'] == 1
        prev_hit_1st_TP_cum_is_0 = prev_hit_1st_TP_cum == 0

        cond_1st_tp_1 = hit_1st_TP_cum_is_1 & prev_hit_1st_TP_cum_is_0  # 1st time hitting 1st tp level
        cond_1st_tp_f = cond_1st_tp_1 & cond_not_hit_SL  # "Full" condition 1

        final_exit_amt_1st_tp = timestamp_row['TP_Long_1st_R_price'] * dynamic_1st_R_shares
        df_eeod.loc[cond_1st_tp_f, 'exit_amt_1st_tp'] = final_exit_amt_1st_tp

        #####
        
        # Getting the 2nd tp level exit amount
        prev_hit_2nd_TP_cum = df_eeod['hit_2nd_TP_cum'].shift(1, fill_value = 0)
        hit_2nd_TP_cum_is_1 = df_eeod['hit_2nd_TP_cum'] == 1
        prev_hit_2nd_TP_cum_is_0 = prev_hit_2nd_TP_cum == 0

        cond_2nd_tp_1 = hit_2nd_TP_cum_is_1 & prev_hit_2nd_TP_cum_is_0  # 1st time hitting 2nd tp level
        cond_2nd_tp_2 = cond_not_hit_SL | (~cond_1st_tp_1 & move_SL_after_1st_tp)
        # Originally, this is basically: "Yes if NOT hitting sl AND FIRST 1st tp on this CS as well"
        #   ^This condition is to make sure you didn't hit the SL along with the 1st tp and 2nd tp levels together
        #   ^If you've already hit the 1st tp previously, cond_1st_tp_1 will be False, making that part of the cond True
        # The second "&" condition is to check if you have move_SL_after_1st_tp on; if you do, then cond_1st_tp_1 determines all
        #   ^If move_SL_after_1st_tp is False, that means you're either moving the SL later or not moving the SL ever
        #   ^In which case, the 2nd grouped cond is False, so just check if SL has been hit (making it the same as the 1st tp cond)
        cond_2nd_tp = cond_2nd_tp_1 & cond_2nd_tp_2
        
        final_exit_amt_2nd_tp = timestamp_row['TP_Long_2nd_R_price'] * dynamic_2nd_R_shares
        df_eeod.loc[cond_2nd_tp, 'exit_amt_2nd_tp'] = final_exit_amt_2nd_tp

        #####
        
        # Getting the 3rd tp level exit amount
        prev_hit_3rd_TP_cum = df_eeod['hit_3rd_TP_cum'].shift(1, fill_value = 0)
        hit_3rd_TP_cum_is_1 = df_eeod['hit_3rd_TP_cum'] == 1
        prev_hit_3rd_TP_cum_is_0 = prev_hit_3rd_TP_cum == 0

        cond_3rd_tp_1 = hit_3rd_TP_cum_is_1 & prev_hit_3rd_TP_cum_is_0
        cond_3rd_tp_2 = cond_2nd_tp_2 | (~cond_2nd_tp_1 & move_SL_after_2nd_tp)
        # If you don't hit SL, then you're obviously fine. But if you do, depending on when you move SL, one of the cond will trigger:
        # If move_SL_after_1st_tp is True, then this new grouped "or" condition is False, so just check if it's the 1st time you hit
        #   ^the 1st tp level. If it is, even if cond_3rd_tp_1 is hit, it doesn't matter b/c SL takes priority.
        # If move_SL_after_2nd_tp is True, then the previous grouped "or" condition is False, so now check if it's the 1st time you
        #   ^hit the 2nd tp level. If it is, again, it doesn't matter b/c SL takes priority. Note in this scenario that it doesn't
        #   ^matter whether you've hit the 1st tp. If you have previously, then whatever, SL is still in place. But even if you also
        #   ^just hit the 1st tp along with the 2nd and 3rd, we already cover that with cond_2nd_tp_1, since in this case where you
        #   ^haven't hit the 1st tp level previously, you necessarily will hit the 1st tp before you hit the second.
        cond_3rd_tp = cond_3rd_tp_1 & cond_3rd_tp_2
        
        final_exit_amt_3rd_tp = timestamp_row['TP_Long_3rd_R_price'] * dynamic_3rd_R_shares
        df_eeod.loc[cond_3rd_tp, 'exit_amt_3rd_tp'] = final_exit_amt_3rd_tp

        #####

        # Getting the 4th tp level exit amount
        prev_hit_4th_TP_cum = df_eeod['hit_4th_TP_cum'].shift(1, fill_value = 0)
        hit_4th_TP_cum_is_1 = df_eeod['hit_4th_TP_cum'] == 1
        prev_hit_4th_TP_cum_is_0 = prev_hit_4th_TP_cum == 0

        cond_4th_tp_1 = hit_4th_TP_cum_is_1 & prev_hit_4th_TP_cum_is_0
        cond_4th_tp_2 = cond_3rd_tp_2 | (~cond_3rd_tp_1 & move_SL_after_3rd_tp)  # Same case as above
        cond_4th_tp = cond_4th_tp_1 & cond_4th_tp_2
        
        final_exit_amt_4th_tp = timestamp_row['TP_Long_4th_R_price'] * dynamic_4th_R_shares
        df_eeod.loc[cond_4th_tp, 'exit_amt_4th_tp'] = final_exit_amt_4th_tp

        #####

        # Condition to remove shares (rolling) if condition is met
        cond_1st_TP_shares = df_eeod['hit_1st_TP_cum'] > 0
        cond_2nd_TP_shares = df_eeod['hit_2nd_TP_cum'] > 0
        cond_3rd_TP_shares = df_eeod['hit_3rd_TP_cum'] > 0
        cond_4th_TP_shares = df_eeod['hit_4th_TP_cum'] > 0

        # Removing shares if ### tp is hit (rolling)
        df_eeod.loc[cond_1st_TP_shares, 'shares'] = (dynamic_shares_total - dynamic_1st_R_shares)
        df_eeod.loc[cond_2nd_TP_shares, 'shares'] = (dynamic_shares_total - dynamic_1st_R_shares - dynamic_2nd_R_shares)
        df_eeod.loc[cond_3rd_TP_shares, 'shares'] = (
            dynamic_shares_total - dynamic_1st_R_shares - dynamic_2nd_R_shares - dynamic_3rd_R_shares)
        df_eeod.loc[cond_4th_TP_shares, 'shares'] = (
            dynamic_shares_total - dynamic_1st_R_shares - dynamic_2nd_R_shares - dynamic_3rd_R_shares - dynamic_4th_R_shares)

        #####

        # Putting SL condition here and using previous CS shares to get original share size before previous block of code changed it
        can_move_SL_bc_1st_tp_not_hit = move_SL_after_1st_tp & prev_hit_1st_TP_cum_is_0  # Can move SL b/c 1st tp has not been hit
        can_move_SL_bc_2nd_tp_not_hit = move_SL_after_2nd_tp & prev_hit_2nd_TP_cum_is_0
        can_move_SL_bc_3rd_tp_not_hit = move_SL_after_3rd_tp & prev_hit_3rd_TP_cum_is_0
        cond_sl = cond_hit_SL & (can_move_SL_bc_1st_tp_not_hit | can_move_SL_bc_2nd_tp_not_hit | can_move_SL_bc_3rd_tp_not_hit)
        # Need this b/c SL ONLY triggers before/when XXX tp level condition is met, so you need to figure out when you move SL
        #    ^The later you move SL, the more likely that cond_sl will be true (since there's more opportunities for SL to be hit)
        # Btw, SL never triggers on the 1st CS

        try:
            # Getting final sl exit amount by multiplying the [higher of (SL - slippage) and the Low] and the remaining shares
            sl_exit_price = np.maximum(timestamp_row['BHOD_SL'] - slippage, df_eeod.loc[cond_sl, 'Low'].iloc[0])
            sl_prev_timestamp = df_eeod[cond_sl].index[0] - pd.Timedelta(minutes=1)  # Basically the time when SL hit minus one min
            sl_exit_shares = df_eeod.loc[sl_prev_timestamp, 'shares']
            df_eeod.loc[cond_sl, 'exit_amt_sl'] = sl_exit_price * sl_exit_shares
        except (IndexError, KeyError):
            pass  # No SL exit for this trade (no CS matched cond_sl, or the CS before it is outside df_eeod)

        #####

        # Breakeven condition check (since it relies on the shares already calculated)
        cond_hit_BE = df_eeod['hit_BE_cum'] == 1
        prev_hit_BE_cum = df_eeod['hit_BE_cum'].shift(1, fill_value = 0)
        prev_hit_BE_cum_is_0 = prev_hit_BE_cum == 0

        cond_be_1 = cond_hit_BE & prev_hit_BE_cum_is_0
        cond_be = cond_be_1 & cond_4th_tp_2
        # Basically, SL trumps BE ONLY when they happen at the same time ALONG with whichever tp level you move SL
        #   ^Btw, you don't need the [ (move_SL_after_XXX_tp & cond_XXX_tp_2 ) or ... ] because cond_4th_tp_2 already has conditions
        #   ^baked into it for the 3 move_SL_after_XXX_tp levels. Essentially, this second part of the condition is used to make sure
        #   ^that you don't hit SL and BE on the same CS, and cond_4th_tp_2 covers all of them.
        # Another way to think about it is that it's essentially the same as the code below, but just faster:
        # cond_be = cond_be_1 & ( (move_SL_after_1st_tp & cond_2nd_tp_2 ) | (move_SL_after_2nd_tp & cond_3rd_tp_2 )
        #                       | (move_SL_after_3rd_tp & cond_4th_tp_2 ) )

        try:
            # Getting final be exit amount by multiplying the [higher of (BE - slippage) and the Low] and the remaining shares
            be_exit_price = np.maximum(timestamp_row['BHOD_BE'] - slippage, df_eeod.loc[cond_be, 'Low'].iloc[0])
            be_exit_shares = df_eeod.loc[cond_be, 'shares'].iloc[0]
            df_eeod.loc[cond_be, 'exit_amt_be'] = be_exit_price * be_exit_shares
        except (IndexError, KeyError):  # No BE hit for this entry, so there is no BE exit amount to record
            pass

        #####

        # Setting shares to 0 if either sl or be is hit (need to come after so we can do sl/be exit amount calculations first)
        df_eeod.loc[(df_eeod['hit_SL_cum'] > 0) | (df_eeod['hit_BE_cum'] > 0), 'shares'] = 0.0

        #####

        # Getting out completely at 3:55pm at the Close price
        exit_cond_3_55_pm = df_eeod.index.time == pd.to_datetime('15:55:00').time()

        # Setting total exit amount at 3:55pm
        eod_exit_price = df_eeod.loc[exit_cond_3_55_pm, 'Close'].iloc[0]
        eod_exit_shares = df_eeod.loc[exit_cond_3_55_pm, 'shares'].iloc[0]
        df_eeod.loc[(exit_cond_3_55_pm & (df_eeod['shares'] != 0)), 'exit_amt_3_55_pm'] = eod_exit_price * eod_exit_shares
        df_eeod.loc[exit_cond_3_55_pm, 'shares'] = 0.0  # Setting shares to 0 at 3:55pm

        
        # Step 7: Filling in the trades table

        # Getting the indices to get the correct exit amounts
        cond_first_0_shares_ind = df_eeod['shares'] == 0
        first_zero_shares_index = (cond_first_0_shares_ind).idxmax() if (cond_first_0_shares_ind).any() else None

        # Getting the exit sums
        ext_amts = df_eeod.loc[df_eeod.index <= first_zero_shares_index, ['exit_amt_1st_tp', 'exit_amt_2nd_tp', 'exit_amt_3rd_tp',
            'exit_amt_4th_tp', 'exit_amt_sl', 'exit_amt_be', 'exit_amt_3_55_pm']].sum(axis = 0)

        # Getting only the ones that had a transaction
        ext_amts_true = ext_amts[ext_amts > 0]

        # Matching the exit amount with their respective shares (assuming this tp level is reached)
        tp_dict = {'exit_amt_1st_tp': dynamic_1st_R_shares, 'exit_amt_2nd_tp': dynamic_2nd_R_shares,
            'exit_amt_3rd_tp': dynamic_3rd_R_shares, 'exit_amt_4th_tp': dynamic_4th_R_shares}
        
        # Getting the values for the trades table
        for exit_type in ext_amts_true.index:

            # If the exit hits a tp level
            if exit_type in ['exit_amt_1st_tp', 'exit_amt_2nd_tp', 'exit_amt_3rd_tp', 'exit_amt_4th_tp']:
                _Size = tp_dict[exit_type]
                _ExitPrice = ext_amts_true[exit_type] / _Size

            # If the exit is at sl
            elif exit_type == 'exit_amt_sl':
                _Size = sl_exit_shares
                _ExitPrice = sl_exit_price

            # If the exit is at be
            elif exit_type == 'exit_amt_be':
                _Size = be_exit_shares
                _ExitPrice = be_exit_price

            # If the exit is at 3:55pm
            else:
                _Size = eod_exit_shares
                _ExitPrice = eod_exit_price

            # Setting fee values
            broker_fees = max(var_dict['Commission per Share ($)'] * _Size, var_dict['Min Comm per Order ($)'])
            total_fees = broker_fees + _Size * (var_dict['Exchange Fees ($)'] + var_dict['Other Fees ($)'])
            
            # Setting remaining values
            _EntryPrice = _sep_w_slippage
            _PnL = (_ExitPrice - _EntryPrice) * _Size
            _PnL_AF = _PnL - total_fees  # PnL After Fees                              
            _EntryTime = timestamp
            _ExitTime = df_eeod[df_eeod[exit_type] > 0].index[0]
            _Duration = _ExitTime - _EntryTime
            
            # Filling in the trades table      
            trades_table_dict = {
                'Size': _Size,
                'EntryPrice': _EntryPrice,
                'ExitPrice': _ExitPrice,
                'PnL': _PnL,
                'PnL_AF': _PnL_AF,
                'EntryTime': _EntryTime,
                'ExitTime': _ExitTime,
                'Duration': _Duration
            }

            # Setting indicators on whether or not to include the trade for that volume condition
            for volume_condition_name in vol_conds_list:

                # _EntryTime is greater than the most recent (last in list) and it's actually supposed to be an entry
                if _EntryTime > recent_exits_dict[volume_condition_name][-1] and _EntryTime in vol_cond_dict[volume_condition_name]:
                    trades_table_dict[volume_condition_name] = True  # Count this trade

                    if exit_type == (ext_amts_true.index)[-1]:  # Appending timestamp onto recent exits if at end of exit_type list
                        recent_exits_dict[volume_condition_name].append(first_zero_shares_index)

                # Otherwise, do not count this trade
                else:
                    trades_table_dict[volume_condition_name] = False  # Don't count this trade

            # Inputting into the trades table
            trades_table.loc[len(trades_table)] = trades_table_dict

        
    # Part 5: Returning the final trades table
    
    return trades_table
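
The stop-loss and breakeven exits above follow one pattern: find the first bar where a cumulative hit flag turns on, fill at the higher of (level - slippage) and that bar's Low, then charge fees on the order. Below is a minimal sketch of that pattern on toy data. The helper names, the `hit_cum` column, and all prices and fee values are illustrative assumptions, not the notebook's actual inputs.

```python
import numpy as np
import pandas as pd

def first_hit_mask(hit_cum: pd.Series) -> pd.Series:
    """True only on the first bar where the cumulative hit flag equals 1."""
    prev_hit_cum = hit_cum.shift(1, fill_value=0)
    return (hit_cum == 1) & (prev_hit_cum == 0)

def stop_fill_price(level: float, bar_low: float, slippage: float) -> float:
    """Fill at the higher of (level - slippage) and the bar's Low."""
    return float(np.maximum(level - slippage, bar_low))

def order_fees(size, commission_per_share, min_commission, per_share_fees):
    """Broker commission (with a per-order minimum) plus per-share fees."""
    broker_fees = max(commission_per_share * size, min_commission)
    return broker_fees + size * per_share_fees

# Toy 1-minute bars after a hypothetical entry
bars = pd.DataFrame(
    {"Low": [100.20, 99.95, 99.50], "hit_cum": [0, 1, 1]},
    index=pd.date_range("2024-01-02 09:31", periods=3, freq="min"),
)

mask = first_hit_mask(bars["hit_cum"])
hit_time = bars.index[mask][0]
fill = stop_fill_price(level=100.0, bar_low=bars.loc[hit_time, "Low"], slippage=0.02)
fees = order_fees(size=50, commission_per_share=0.005, min_commission=1.0, per_share_fees=0.003)
```

Here the stop level minus slippage (99.98) is above the bar's Low (99.95), so the fill is 99.98. If the bar had opened far below the level, a model like this would still fill no lower than the Low.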


## Data Analysis in Function Format
We now define a few functions to analyze our data.

In [35]:
# Defining function to produce data analysis results (trade_results needs to come in the correct format (index goes from 0 to n))
def metrics_table(trade_results, strategy_name, starting_cash = 100_000, show_drawdown_plot = False):  

    # Getting the trade PnL for each full trade (not just partials)
    trade_pnl = trade_results.groupby('EntryTime')['PnL'].sum()  # Group by EntryTime (our "Trade ID") and sum PnL for each trade
    
    # Net profit (with no fees and fees)
    net_profit = trade_results['PnL'].sum()
    net_profit_fees = trade_results['PnL_AF'].sum()

    # Average share size
    avg_share_size = trade_results.groupby('EntryTime')['Size'].sum().mean()

    # Profit factor
    pos_trades_dollar_total = trade_pnl[trade_pnl > 0].sum()
    neg_trades_dollar_total = trade_pnl[trade_pnl <= 0].sum()
    profit_factor = pos_trades_dollar_total / neg_trades_dollar_total * -1

    # Win ratio
    num_pos_trades = (trade_pnl > 0).sum()  # Count of full trades (not just partials) with positive PnL
    num_neg_trades = (trade_pnl <= 0).sum()
    num_trades_total = len(trade_pnl)  # Total number of trades
    win_ratio = num_pos_trades / num_trades_total
    
    # Profit/loss per winning/losing trade
    avg_winner_amount = trade_pnl[trade_pnl > 0].mean()
    avg_loser_amount = trade_pnl[trade_pnl <= 0].mean()

    # Risk:reward ratio
    rr_ratio = avg_loser_amount / avg_winner_amount * -1

    # Expected profitability
    loss_ratio = 1 - win_ratio
    expected_profitability = (win_ratio * avg_winner_amount) + (loss_ratio * avg_loser_amount)
    
    # Expected value (same result as expected profitability, but calculated differently)
    expected_value = net_profit / num_trades_total

    # Biggest winner / loser
    biggest_winner = round(max(trade_pnl), 3)
    biggest_loser = round(min(trade_pnl), 3)

    # Equity peak and trough
    cum_equity = trade_results['PnL'].cumsum()
    equity_peak = max(cum_equity)
    equity_trough = min(cum_equity)

    # Maximum winning and losing streaks:
    signs = trade_pnl.apply(lambda x: 1 if x > 0 else -1)  # Converting to binary indicators (1 for (+), -1 for (-) or (0))
    streaks = signs.groupby((signs != signs.shift()).cumsum())  # Identifying consecutive streaks using groupby and cumsum
    # Breaking the above code down: signs != signs.shift() gives True when the sign shifts, and cumsum() takes the sum of the Trues (1) and
    # Falses (0), so when the sign shifts (i.e., another True (i.e., 1)), it gives a different number, and it groups by those same numbers
    max_winning_streak = streaks.apply(lambda x: (x == 1).sum()).max()  # Applying a sum to the 1's in each group (streaks)
    max_losing_streak = streaks.apply(lambda x: (x == -1).sum()).max()  # Applying a sum to the -1's in each group (streaks)

    # Max drawdown
    cum_pnl_series = trade_pnl.cumsum() + starting_cash  # Getting the cumulative pnl as a series
    grouped_pnl_df = cum_pnl_series.to_frame(name = 'cum_pnl')  # Converting the cum_pnl_series to a df
    grouped_pnl_df['running_max'] = grouped_pnl_df['cum_pnl'].cummax()  # Getting the running maximum (the running peak of equity)
    grouped_pnl_df['drawdown'] = grouped_pnl_df['cum_pnl'] - grouped_pnl_df['running_max']  # Getting drawdown for each trade
    max_drawdown = grouped_pnl_df['drawdown'].min()  # Getting the max drawdown (most negative amount)
    max_drawdown_perc = (grouped_pnl_df['drawdown'] / grouped_pnl_df['running_max']).min() * 100  # Getting max drawdown as a percent

    # Max drawdown plot (if show_drawdown_plot set to True) 
    if show_drawdown_plot:
        # Plot cumulative PnL and running maximum
        plt.figure(figsize=(12, 6))
        plt.plot(grouped_pnl_df.index, grouped_pnl_df['cum_pnl'], label = 'Cumulative PnL', color = 'blue')
        plt.plot(grouped_pnl_df.index, grouped_pnl_df['running_max'], label = 'Running Max', color = 'green')
        
        # Plot the drawdown as an area (below the cumulative PnL)
        plt.fill_between(grouped_pnl_df.index, grouped_pnl_df['cum_pnl'], grouped_pnl_df['running_max'], 
                         where = (grouped_pnl_df['cum_pnl'] < grouped_pnl_df['running_max']),
                         color = 'red', alpha = 0.3, label = 'Drawdown')
        
        # Add titles and labels
        plt.title('Maximum Drawdown Visualization')
        plt.xlabel('Date')
        plt.ylabel('PnL')
        
        # Show the legend
        plt.legend()
        
        # Display the plot
        plt.show()

    # Making trade_pnl into a dataframe for deeper analyses
    full_trade_df = trade_pnl.to_frame(name = 'trade_pnl')

    # Average $ used for trade
    full_trade_df['total_trade_size'] = trade_results.groupby('EntryTime')['Size'].sum()
    full_trade_df['entry_amount'] = trade_results.groupby('EntryTime')['EntryPrice'].mean()
    full_trade_df['amount_used_for_trade'] = full_trade_df['total_trade_size'] * full_trade_df['entry_amount']
    avg_amount_used_for_trade = full_trade_df['amount_used_for_trade'].mean()
    
    # Maximum $ used for a trade
    max_amount_used_for_trade = full_trade_df['amount_used_for_trade'].max()

    # Average holding time (from entry to last exit (partial))
    last_holding_times = trade_results.groupby('EntryTime')['Duration'].nth(-1)  # Get the last Duration for each EntryTime
    average_holding_time = last_holding_times.mean().total_seconds() / 60  # Average holding time in minutes

    
    # Average holding time for each partial

    # Creating dataframe for partials analysis
    partials_df = trade_results[['EntryTime', 'Duration', 'PnL']].copy()

    # Assign a position (partial) number within each EntryTime
    partials_df['Trade_Position'] = trade_results.groupby('EntryTime').cumcount() + 1

    # Calculate mean holding time for each position (partial)
    avg_hold_time_by_position = partials_df.groupby('Trade_Position')['Duration'].mean()

    # Convert mean holding times to minutes
    avg_hold_time_by_pos_min = avg_hold_time_by_position.apply(lambda x: x.total_seconds() / 60)
    # This metric is not very useful to us, since it just takes every partial at a particular position and averages it out
    
    # Average holding time for each partial number (making it so that we get the avg holding time depending on how many partials it took)
    # Get the last Duration for each EntryTime along with its position number:
    last_holding_times_w_pos = partials_df.groupby('EntryTime').apply(lambda x: x.iloc[-1])[['Duration', 'Trade_Position']]
    avg_hold_time_by_trade_num = last_holding_times_w_pos.groupby('Trade_Position')['Duration'].mean()
    avg_hold_time_by_trade_num_min = avg_hold_time_by_trade_num.apply(lambda x: x.total_seconds() / 60)
    # This might be interesting; in this current test, the time it takes to get out after you do your first partial (2) is actually more
    # than the time it takes to get out after your second partial (3) (10.2 min vs 6.4 min). This could indicate that after the first
    # partial, if it takes too long, it could be good to just get out before the price drops to breakeven (could be worth doing more
    # analysis to see when the drop happens).

    # Putting everything into a metrics table
    metrics = {
        'Strategy ID': strategy_name,
        'Net Profit ($)': net_profit,
        'Net Profit w Fees ($)': net_profit_fees,
        'Avg Share Size': avg_share_size,
        'Pos Trades Total ($)': pos_trades_dollar_total,
        'Neg Trades Total ($)': neg_trades_dollar_total,
        'Profit Factor': profit_factor,
        '# Winning Trades': num_pos_trades,
        '# Total Trades': num_trades_total,
        'Win Ratio': win_ratio,
        'R:R Ratio': rr_ratio,
        'Expected Profitability (also EV) ($)': expected_profitability,
        'Avg Winning Trade ($)': avg_winner_amount,
        'Avg Losing Trade ($)': avg_loser_amount,
        'Biggest Winner ($)': biggest_winner,
        'Biggest Loser ($)': biggest_loser,
        'Peak Equity ($)': equity_peak,
        'Trough Equity ($)': equity_trough,
        'Max Win Streak': max_winning_streak,
        'Max Lose Streak': max_losing_streak,
        'Max Drawdown (%)': max_drawdown_perc,
        'Avg Amount Used for Trade ($)': avg_amount_used_for_trade,
        'Max Amount Used for Trade ($)': max_amount_used_for_trade,
        'Avg Holding Time (min)': average_holding_time
    }

    # Converting metrics to metrics_df to make things easier
    metrics_df = pd.DataFrame([metrics])
    metrics_df = metrics_df.round(3)  # Rounding the metrics table to 3 decimal places

    return metrics_df

    # The metrics below I've included for further analysis, but they're more difficult to implement and are not used right now
    
    # Sharpe ratio
    # Supposed to measure volatility of your position, but it's quite difficult to calculate as we don't have the risk-free return nor the
    # true return %s. We're just working with some arbitrarily large amount of cash, so it's hard to specify what our true return % is.
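
To make the streak, drawdown, and expected-value steps in `metrics_table` easier to follow, here is a small sketch on six made-up trade PnLs. The numbers and the starting cash are assumptions chosen so the results can be checked by hand; the streak lengths are read with a `first`/`count` aggregation, which is a slightly different route than the two `apply` calls above.

```python
import pandas as pd

# Toy per-trade PnL (one value per full trade) and a toy starting balance
trade_pnl = pd.Series([50.0, -20.0, -30.0, 40.0, 10.0, -5.0])
starting_cash = 1_000.0

# Streaks: a new streak ID starts whenever the sign differs from the previous trade
signs = trade_pnl.apply(lambda x: 1 if x > 0 else -1)
streak_id = (signs != signs.shift()).cumsum()
streaks = signs.groupby(streak_id).agg(["first", "count"])
max_win_streak = int(streaks.loc[streaks["first"] == 1, "count"].max())
max_lose_streak = int(streaks.loc[streaks["first"] == -1, "count"].max())

# Drawdown: distance of equity below its running peak
equity = trade_pnl.cumsum() + starting_cash
running_max = equity.cummax()
drawdown = equity - running_max
max_drawdown = drawdown.min()
max_drawdown_pct = (drawdown / running_max).min() * 100

# Expected profitability vs. expected value: two routes to the same number
wins = trade_pnl[trade_pnl > 0]
losses = trade_pnl[trade_pnl <= 0]
win_ratio = len(wins) / len(trade_pnl)
expected_profitability = win_ratio * wins.mean() + (1 - win_ratio) * losses.mean()
expected_value = trade_pnl.sum() / len(trade_pnl)
```

In this toy series the equity peaks at 1,050 after the first trade and bottoms at 1,000, so the maximum drawdown is -50 (about -4.76%), and both expectation formulas give 7.5 per trade.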


In [36]:
def partials_table_v2(trade_results):

    # Creating dataframe for partials analysis
    partials_df = trade_results[['EntryTime', 'PnL']].copy()

    # Assign a position (partial) number within each EntryTime
    partials_df['Trade_Position'] = trade_results.groupby('EntryTime').cumcount() + 1

    # Creating a column with the total count of trades taken for each unique EntryTime (will be useful later)
    total_trades_taken = partials_df['EntryTime'].value_counts()  # Calculate the frequency of each EntryTime
    partials_df['Total_Trades_Taken'] = partials_df['EntryTime'].map(total_trades_taken)  # Map the frequency count to a new column in df
    
    
    # Table 1: Finding the additional amount gained by reaching each tp level

    # Adding a new column to the previous df to include what the TRUE trade position is (e.g., met tp lvl 1, did not meet tp lvl 1, etc.)
    partials_df['Trade_Position_True'] = None

    # Getting a list of the unique trades taken (so 1 to the [# of partials])
    types_of_partials = np.sort(total_trades_taken.unique())
    
    for i in types_of_partials:  # This condition just loops through each unique partial number (usually just [1, 2, 3])

        # Assigning a positive number to represent trades that reached a tp and took a partial profit at that level
        partials_df['Trade_Position_True'] = np.where(
            (partials_df['Trade_Position'] == int(i)) & (partials_df['PnL'] > 0), float(i), partials_df['Trade_Position_True'])
        
        # Assigning a negative number to represent trades that didn't reach that tp level (also includes trades that ended (-) @ EOD close)
        partials_df['Trade_Position_True'] = np.where(
            (partials_df['Trade_Position'] == int(i)) & (partials_df['PnL'] <= 0), -1.0 * float(i), partials_df['Trade_Position_True'])
    
    for i in types_of_partials[:-1]:  # This condition loops through all partials except the last partial

        # Assigning (i - 0.5) to represent the trades that were closed out because of EOD, but ended positive before [i] tp level
        partials_df['Trade_Position_True'] = np.where((partials_df['Trade_Position_True'] == i) & (
            partials_df['Total_Trades_Taken'] == i), float(i) - 0.5, partials_df['Trade_Position_True'])
        
        # For some clarity, this checks whether the trade was (-) or (+) from Trade_Position_True; if it were positive, and it only took
        # [i] number of trades, that means it MUST have gotten stopped out at the EOD, and it stopped positively (since normally a (+)
        # Trade_Position_True would indicate that the trade would move onto the next partial, but since there were only [i] number of
        # trades, that can't have happened)
        
    # Making sure the order of the dataframe is as it's supposed to be (this is the sole purpose of Trade_Position_Adjusted)
    partials_df['Trade_Position_Adjusted'] = partials_df['Trade_Position_True'].apply(  # First creating a column with correct ordering...
        lambda x: abs(x) - 0.9 if x < 0 else x)  # ...by making it so that negative numbers are turned into a value of (tp level - 0.9)
    partials_df = partials_df.sort_values('Trade_Position_Adjusted').reset_index()  # Sort by more appropriate order

    # Getting the frequency of each tp partial status (using Trade_Position_Adjusted)
    frequency_of_trade = partials_df['Trade_Position_Adjusted'].value_counts().sort_index()

    # Creating our final gain_across_tp_levels_df dataframe
    gain_across_tp_levels_df = frequency_of_trade.to_frame(name = 'Frequency').reset_index()

    # Defining a function to apply the conditions below and convert Trade_Position_Adjusted floats to more clear string values
    def convert_float_to_string(value):
        integer_part = int(value)  # Get the integer part
        decimal_part = round(value - integer_part, 1)  # Get the decimal part (rounded to avoid float precision issues)
        
        # Apply conditions
        if decimal_part == 0.1:
            return f"Didn't reach tp level {integer_part + 1}"
        elif decimal_part == 0.5:
            return f"EOD close before tp level {integer_part + 1}"
        elif decimal_part == 0.0:
            return f"Hit tp level {integer_part}"
        else:
            return value  # Leave as is if it doesn't match any conditions

    # Applying the function back to a new Trade Status column
    gain_across_tp_levels_df['Trade Status'] = gain_across_tp_levels_df['Trade_Position_Adjusted'].apply(convert_float_to_string)
    gain_across_tp_levels_df = gain_across_tp_levels_df[['Trade_Position_Adjusted', 'Trade Status', 'Frequency']]  # Reordering columns
    gain_across_tp_levels_df = gain_across_tp_levels_df.set_index('Trade_Position_Adjusted')  # To make things easier in the next steps

    # Getting remaining metrics associated with each trade partial
    avg_gain_at_tp_level = partials_df.groupby(
        'Trade_Position_Adjusted', observed = True)['PnL'].mean().sort_index()  # Average gain at each tp level
    total_gain_at_tp_level = partials_df.groupby(
        'Trade_Position_Adjusted', observed = True)['PnL'].sum().sort_index()  # Total gain at tp level
    # Re-sorting partials_df for easier viewing (and potential analyses down the line)
    partials_df = partials_df.sort_values('EntryTime').reset_index(drop = True).drop(columns=['index'])

    # Combining both into the gain_across_tp_levels_df dataframe
    gain_across_tp_levels_df['Avg Gain at TP Level'] = avg_gain_at_tp_level
    gain_across_tp_levels_df['Total Gain at TP Level'] = total_gain_at_tp_level
    

    # Table 2: Number of trades reached (i.e., the number of trades that took place when the SL/BE was hit)

    # Getting the trade PnL for each full trade (not just partials)
    trade_pnl = trade_results.groupby('EntryTime')['PnL'].sum()  # Group by EntryTime (our "Trade ID") and sum PnL for each trade
    full_trade_df = trade_pnl.to_frame(name = 'trade_pnl')  # Making trade_pnl into a dataframe for deeper analyses

    # Count occurrences of each unique trade ID (EntryTime) and check if the last PnL is positive
    def label_occurrence(trade_group):
        last_partial_num = int(types_of_partials[-1])
        count = len(trade_group)  # Number of occurrences for the EntryTime
        last_pnl_positive = trade_group['PnL'].iloc[-1] > 0  # Check if last PnL is positive
        # Label as "[last_partial_num]tp" if last partial and the last PnL is (+) (so the last tp level reached); otherwise just the count
        return f"{count}tp" if count == last_partial_num and last_pnl_positive else str(count)
    
    # Apply the labeling function to each EntryTime group
    full_trade_df['num_trades_at_sl_be'] = trade_results.groupby('EntryTime').apply(label_occurrence)
    
    # Getting metrics associated with the pnl per trade
    frequency_of_counts = full_trade_df['num_trades_at_sl_be'].value_counts().sort_index()  # Count frequency of each labeled occurrence
    avg_pnl_per_trade = full_trade_df.groupby('num_trades_at_sl_be')['trade_pnl'].mean().sort_index()  # Find avg PnL at each partial
    total_pnl_at_stage = full_trade_df.groupby('num_trades_at_sl_be')['trade_pnl'].sum().sort_index()  # Find tot. PnL @ each partial stage

    # Combining the series into one dataframe for easier viewing
    num_trades_at_sl_be_df = pd.concat([frequency_of_counts, avg_pnl_per_trade, total_pnl_at_stage], axis = 1)
    num_trades_at_sl_be_df = num_trades_at_sl_be_df.reset_index()
    num_trades_at_sl_be_df.columns = ['# of Trades at SL/BE', 'Frequency', 'Avg PnL per Trade', 'Total PnL at Stage']

    # Returns a tuple of the two tables
    return gain_across_tp_levels_df, num_trades_at_sl_be_df
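
The labeling logic in `partials_table_v2` can be hard to trace through the float encoding, so here is a simplified sketch of the same idea on toy data: number the partials within each `EntryTime`, then describe each row directly. The helper `describe_partial` and the sample trades are hypothetical, and the sketch assumes partial counts run contiguously from 1 up to the maximum observed.

```python
import pandas as pd

# Toy trades table: one row per partial exit, grouped by EntryTime
trades = pd.DataFrame({
    "EntryTime": pd.to_datetime(
        ["2024-01-02 09:35"] * 3 + ["2024-01-02 10:10"] * 2
        + ["2024-01-03 09:40", "2024-01-03 11:00"]
    ),
    "PnL": [10.0, 12.0, 15.0, 8.0, -4.0, -6.0, 3.0],
})

# Partial number within each trade, and how many partials the trade had in total
trades["Trade_Position"] = trades.groupby("EntryTime").cumcount() + 1
trades["Total_Trades_Taken"] = trades["EntryTime"].map(trades["EntryTime"].value_counts())
max_partials = int(trades["Total_Trades_Taken"].max())

def describe_partial(position: int, pnl: float, total: int, max_partials: int) -> str:
    """Hypothetical helper mirroring the three status labels used above."""
    if pnl <= 0:
        return f"Didn't reach tp level {position}"
    # A positive final partial that is not the last possible level means an EOD close
    if position == total and position < max_partials:
        return f"EOD close before tp level {position}"
    return f"Hit tp level {position}"

trades["Trade Status"] = [
    describe_partial(pos, pnl, total, max_partials)
    for pos, pnl, total in zip(
        trades["Trade_Position"], trades["PnL"], trades["Total_Trades_Taken"]
    )
]
status_counts = trades["Trade Status"].value_counts()
```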


In [39]:
def trade_timing_PnL_graph(trade_results, timeframe = 'hour'):
    
    # Getting the trade PnL for each full trade (not just partials)
    trade_pnl = trade_results.groupby('EntryTime')['PnL'].sum()  # Group by EntryTime (our "Trade ID") and sum PnL for each trade

    # Setting figure size
    plt.figure(figsize=(12, 6))
    
    # Getting graph for different timeframes
    if timeframe == 'hour':

        # Grouping trade pnl by hour
        trade_pnl_hour = trade_pnl.groupby(trade_pnl.index.hour).mean()
        trade_pnl_by_time = trade_pnl_hour
    
        # Plot average trade PnL at different entry hours
        plt.plot(trade_pnl_hour.index, trade_pnl_hour, label = 'trade_pnl_hour', color = 'blue')
        
        # Add xlabel and title
        plt.xlabel('Entry Hour')
        plt.title('Trade PnL by Hour')

    elif timeframe == 'minute':

        # Grouping trade pnl by minute
        trade_pnl_minute = trade_pnl.groupby(trade_pnl.index.time).mean()

        # Getting the full time index for graph
        full_time_index = pd.date_range(start="09:30", end="16:01", freq="min").time  # "min" replaces the deprecated "T" alias
        
        # Reindex the Series to include all times
        trade_pnl_minute = trade_pnl_minute.reindex(full_time_index).fillna(0)
        trade_pnl_by_time = trade_pnl_minute

        # Convert time to datetime using a reference date (e.g., 2000-01-01)
        reference_date = datetime(2000, 1, 1)
        date_time_index = [datetime.combine(reference_date, t) for t in trade_pnl_minute.index]

        # Plot average trade PnL at different entry times
        plt.scatter(date_time_index, trade_pnl_minute, label = 'trade_pnl_minute', color = 'blue', s = 10)

        # Format the x-axis to show only time, with ticks every 30 minutes
        ax = plt.gca()
        ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=30))  # Major ticks every 30 minutes
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))   # Format as HH:MM
        
        # Add xlabel and title
        plt.xlabel('Entry Time')
        plt.title('Trade PnL by Minute (Time)')
        
    # Adding in additional labels + legend and displaying the plot
    plt.ylabel('Average PnL ($)')
    plt.legend()
    plt.show()

    # Return the trade pnl by time (hour or minute)
    return trade_pnl_by_time

# trade_timing_PnL_graph(combined_results_MSFT_3y_v_f, timeframe = 'minute')
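
For reference, the grouping behind `trade_timing_PnL_graph` can be tried without any plotting. The sketch below uses four made-up trades to show the hour and minute-of-day averages. Unlike the function above, it leaves minutes with no trades as `NaN` rather than filling them with 0, which is one possible choice for keeping "no trades" distinct from "zero average PnL".

```python
import pandas as pd

# Toy per-trade PnL indexed by entry time (values are illustrative only)
trade_pnl = pd.Series(
    [12.0, -4.0, 6.0, 3.0],
    index=pd.to_datetime([
        "2024-01-02 09:35", "2024-01-02 10:05",
        "2024-01-03 09:35", "2024-01-03 10:40",
    ]),
)

# Average PnL by entry hour
by_hour = trade_pnl.groupby(trade_pnl.index.hour).mean()

# Average PnL by entry minute-of-day, aligned to every minute of the session
by_minute = trade_pnl.groupby(trade_pnl.index.time).mean()
full_times = pd.date_range("09:30", "16:00", freq="min").time
by_minute_full = by_minute.reindex(full_times)  # Minutes without trades stay NaN
```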

## Metrics Table Scripting
The functions above let us analyze our results. To analyze different volume thresholds without re-running the backtest for each one, we create another function that runs the backtest once and then, for each volume threshold, uses that threshold's indicator column to decide which trades to include in the metrics.

In [40]:
# Creating a function to get the overall metrics table (for varying volume) from dask df
def dask_to_var_vol_metrics(dask_df_1min, strategy, var_dict, vol_start, vol_end, vol_step):

    # Getting a list of volume percents to cycle through
    volume_perc_list = list(range(vol_start, vol_end, vol_step))
    var_dict['Volume Percent List (%)'] = volume_perc_list

    # Getting the initial final dataframe
    metrics_table_full = pd.DataFrame()

    # Splitting up dataset into partitions
    partitions_v = dask_df_1min.to_delayed()

    # Apply the function to each partition and get delayed objects for each backtest
    results_v = [run_vec_backtest_on_part(partition, strategy, var_dict) for partition in partitions_v]

    # Compute all results and combine the "results" dataframes into one dataframe using pd.concat
    all_results_v = dask.compute(*results_v)
    all_results_v_filtered = [df for df in all_results_v if not df.empty]  # Dropping empty results so pd.concat doesn't raise a warning
    combined_results_v = pd.concat(all_results_v_filtered, axis=0).reset_index(drop = True)

    # Writing up script to test different metrics (e.g., volumes) with the partitions in the full dataset
    for i, volume_perc in enumerate(volume_perc_list):
        
        strat_name = 'BHOD Volume Threshold: ' + str(volume_perc) + '%'  # Naming the strategy
        vol_perc_col_indic = combined_results_v.iloc[:, 8+i]  # Getting the indicator column to include that entry or not
        combined_results_one_vol = combined_results_v[vol_perc_col_indic].reset_index(drop = True)  # Only keeping entries for that vol
    
        # Calculating metrics table for one specific entry condition, and then combining that metrics table to the overall metrics table
        metrics_table_curr = metrics_table(combined_results_one_vol, strat_name, show_drawdown_plot = False)
        metrics_table_full = pd.concat((metrics_table_full, metrics_table_curr), axis=0).reset_index(drop = True)
        
    return metrics_table_full
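
The key idea in `dask_to_var_vol_metrics` is that the backtest runs once and each threshold is evaluated by filtering on a boolean indicator column. A minimal sketch of that filtering step is below. The column names `vol_10` and `vol_50`, the toy trades, and the reduced set of metrics are assumptions for illustration, and the sketch selects indicator columns by name rather than by position.

```python
import pandas as pd

# Toy trades table with one boolean indicator column per volume threshold
trades = pd.DataFrame({
    "EntryTime": pd.to_datetime([
        "2024-01-02 09:35", "2024-01-02 09:35",
        "2024-01-02 11:00", "2024-01-03 09:40",
    ]),
    "PnL": [10.0, 5.0, -8.0, 4.0],
    "vol_10": [True, True, True, True],
    "vol_50": [True, True, False, True],
})

rows = []
for indicator_col in ["vol_10", "vol_50"]:
    # Keep only the partials whose entry qualified under this threshold
    subset = trades[trades[indicator_col]].reset_index(drop=True)
    per_trade_pnl = subset.groupby("EntryTime")["PnL"].sum()
    rows.append({
        "Strategy ID": indicator_col,
        "Net Profit ($)": per_trade_pnl.sum(),
        "# Total Trades": len(per_trade_pnl),
        "Win Ratio": (per_trade_pnl > 0).mean(),
    })

summary = pd.DataFrame(rows)
```

Because the filtering only touches the finished trades table, adding another threshold costs one more pass over that table instead of another full backtest.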


## Running Vectorized Backtest Across 19 Different Volume Thresholds
Now that we have everything we need, we can run our vectorized BHOD backtest across multiple different volume thresholds (which is simply the volume required for an entry). We start by loading in the data, and then we run our backtest and calculate the metrics associated with the results.

In [38]:
# Testing with dask MSFT 3y data
MSFT_1min_poly_3y_v_f = dd.read_csv('../../../MSFT_1min_poly_3y_9am_4pm.csv', parse_dates=['timestamp'])
MSFT_1min_poly_3y_v_f = MSFT_1min_poly_3y_v_f.set_index('timestamp')
# MSFT_1min_poly_3y_v_f = MSFT_1min_poly_3y_v_f.loc['2024-07-15 00:00':'2024-07-30 00:00']
MSFT_1min_poly_3y_v_f = MSFT_1min_poly_3y_v_f.reset_index()
MSFT_1min_poly_3y_v_f['timestamp'] = MSFT_1min_poly_3y_v_f['timestamp'].dt.floor('s')
MSFT_1min_poly_3y_v_f = MSFT_1min_poly_3y_v_f.set_index('timestamp')

# Repartitioning so that each partition holds ten days' worth of data (it does include weekends, but not a big deal)
MSFT_1min_poly_3y_v_f = MSFT_1min_poly_3y_v_f.repartition(freq='10D')
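
To illustrate why time-based partitions suit a strategy that depends on each day's high, here is a pandas-only analogue of the 10-day split. It is a sketch, not the dask mechanism itself: it assumes dask's `repartition(freq='10D')` cuts on comparable time boundaries, and `run_on_chunk` is a hypothetical stand-in for the per-partition backtest.

```python
import pandas as pd

# Toy data: 25 calendar days with three 1-minute bars each
days = pd.date_range("2024-01-01", periods=25, freq="D")
index = pd.DatetimeIndex(
    [day + pd.Timedelta(hours=9, minutes=30 + m) for day in days for m in range(3)]
)
bars = pd.DataFrame({"Close": range(len(index))}, index=index)

def run_on_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-partition job: one row per day with that day's high."""
    return chunk.groupby(chunk.index.date)["Close"].max().rename("day_high").to_frame()

# Split into 10-day chunks; boundaries fall at midnight, so no day is cut in half
chunks = [chunk for _, chunk in bars.groupby(pd.Grouper(freq="10D")) if not chunk.empty]
results = pd.concat([run_on_chunk(chunk) for chunk in chunks])
```

Each day appears in exactly one chunk, so a per-day calculation gives the same answer whether it runs on the full dataset or chunk by chunk.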

In [41]:
start_time = time.time()
metrics_table_test = dask_to_var_vol_metrics(MSFT_1min_poly_3y_v_f, BHOD_Full_Vec_v1, variables_dict, 10, 101, 5)
print(time.time() - start_time)

30.470995903015137


The event-driven backtest took over 60 seconds to run a single volume threshold. This vectorized backtest covered 19 volume thresholds in about half that time. To make sure MSFT wasn't just a fluke, we can test this with TSLA as well and show what our final metrics table looks like.

In [43]:
# Testing with dask TSLA 3y data
TSLA_1min_poly_3y_v_f = dd.read_csv('../../../TSLA_1min_poly_3y_9am_4pm.csv', parse_dates=['timestamp'])
TSLA_1min_poly_3y_v_f = TSLA_1min_poly_3y_v_f.set_index('timestamp')
TSLA_1min_poly_3y_v_f = TSLA_1min_poly_3y_v_f.reset_index()
TSLA_1min_poly_3y_v_f['timestamp'] = TSLA_1min_poly_3y_v_f['timestamp'].dt.floor('s')
TSLA_1min_poly_3y_v_f = TSLA_1min_poly_3y_v_f.set_index('timestamp')

# Repartitioning so that each partition holds ten days' worth of data (it does include weekends, but not a big deal)
TSLA_1min_poly_3y_v_f = TSLA_1min_poly_3y_v_f.repartition(freq='10D')

In [44]:
start_time = time.time()
metrics_table_test_TSLA = dask_to_var_vol_metrics(TSLA_1min_poly_3y_v_f, BHOD_Full_Vec_v1, variables_dict, 10, 101, 5)
print(time.time() - start_time)

40.05021595954895


In [45]:
metrics_table_test_TSLA

| | Strategy ID | Net Profit ($) | Net Profit w Fees ($) | Avg Share Size | Pos Trades Total ($) | Neg Trades Total ($) | Profit Factor | # Winning Trades | # Total Trades | Win Ratio | ... | Biggest Winner ($) | Biggest Loser ($) | Peak Equity ($) | Trough Equity ($) | Max Win Streak | Max Lose Streak | Max Drawdown (%) | Avg Amount Used for Trade ($) | Max Amount Used for Trade ($) | Avg Holding Time (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BHOD Volume Threshold: 10% | 3166.62 | 1646.749 | 42.447 | 20201.29 | -17034.67 | 1.186 | 1092 | 1920 | 0.569 | ... | 48.9 | -27.5 | 3689.935 | -44.225 | 20 | 8 | -0.954 | 9653.463 | 46670.85 | 8.58 |
| 1 | BHOD Volume Threshold: 15% | 3607.51 | 2123.29 | 42.33 | 19947.835 | -16340.325 | 1.221 | 1073 | 1868 | 0.574 | ... | 48.9 | -27.5 | 3783.39 | -23.825 | 19 | 8 | -0.768 | 9605.352 | 46670.85 | 8.602 |
| 2 | BHOD Volume Threshold: 20% | 4073.3 | 2641.482 | 42.511 | 19538.535 | -15465.235 | 1.263 | 1040 | 1793 | 0.58 | ... | 48.9 | -27.5 | 4257.785 | -23.825 | 19 | 7 | -0.653 | 9632.354 | 46670.85 | 8.677 |
| 3 | BHOD Volume Threshold: 25% | 4426.755 | 3050.996 | 42.302 | 19100.14 | -14673.385 | 1.302 | 1003 | 1718 | 0.584 | ... | 48.9 | -27.5 | 4594.33 | -3.8 | 21 | 7 | -0.703 | 9603.056 | 46670.85 | 8.884 |
| 4 | BHOD Volume Threshold: 30% | 4976.085 | 3660.929 | 42.37 | 18472.94 | -13496.855 | 1.369 | 971 | 1629 | 0.596 | ... | 48.9 | -27.5 | 5081.06 | 3.26 | 19 | 6 | -0.593 | 9646.804 | 46670.85 | 9.037 |
| 5 | BHOD Volume Threshold: 35% | 5166.5 | 3901.894 | 42.301 | 17909.065 | -12742.565 | 1.405 | 938 | 1560 | 0.601 | ... | 46.8 | -27.5 | 5275.245 | 3.26 | 18 | 9 | -0.426 | 9658.343 | 46670.85 | 9.113 |
| 6 | BHOD Volume Threshold: 40% | 5549.32 | 4343.57 | 42.376 | 17271.58 | -11722.26 | 1.473 | 904 | 1477 | 0.612 | ... | 46.8 | -27.5 | 5701.85 | 3.26 | 17 | 8 | -0.397 | 9694.924 | 46670.85 | 9.227 |
| 7 | BHOD Volume Threshold: 45% | 5918.89 | 4782.248 | 42.481 | 16697.2 | -10778.31 | 1.549 | 855 | 1383 | 0.618 | ... | 46.8 | -27.5 | 5997.96 | 3.26 | 17 | 7 | -0.368 | 9711.04 | 45239.16 | 9.386 |
| 8 | BHOD Volume Threshold: 50% | 5909.005 | 4827.628 | 42.721 | 16089.085 | -10180.08 | 1.58 | 811 | 1310 | 0.619 | ... | 46.8 | -27.5 | 5994.645 | 3.26 | 18 | 7 | -0.33 | 9772.938 | 45239.16 | 9.555 |
| 9 | BHOD Volume Threshold: 55% | 5702.845 | 4700.207 | 42.409 | 15049.415 | -9346.57 | 1.61 | 754 | 1212 | 0.622 | ... | 46.8 | -27.5 | 5801.385 | 3.26 | 19 | 7 | -0.356 | 9737.143 | 45239.16 | 9.806 |


The TSLA backtest took a bit longer, most likely because TSLA produced more trades. Nevertheless, for the baseline volume threshold of 10%, our metrics row is exactly the same as the one we got from the event-driven backtest, confirming that the vectorized process reproduces it. Checking the trades against charting software such as Trader Workstation or TradingView gives similar results as well.
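
A check like this can also be automated rather than done by eye. The sketch below compares two single-row metrics tables within a small floating-point tolerance. The helper `rows_match` and all numbers are placeholders; in practice the two frames would be the 10% row from each backtest.

```python
import pandas as pd

def rows_match(left: pd.DataFrame, right: pd.DataFrame, rtol: float = 1e-6) -> bool:
    """Hypothetical helper: True if two metrics tables agree within a relative tolerance."""
    try:
        pd.testing.assert_frame_equal(left, right, check_exact=False, rtol=rtol)
        return True
    except AssertionError:
        return False

# Placeholder rows standing in for the event-driven and vectorized results
event_driven_row = pd.DataFrame([{"Net Profit ($)": 100.0, "# Total Trades": 12, "Win Ratio": 0.5}])
vectorized_row = pd.DataFrame([{"Net Profit ($)": 100.0 + 1e-9, "# Total Trades": 12, "Win Ratio": 0.5}])
mismatched_row = pd.DataFrame([{"Net Profit ($)": 101.0, "# Total Trades": 12, "Win Ratio": 0.5}])

same = rows_match(event_driven_row, vectorized_row)       # Tiny float noise is tolerated
different = rows_match(event_driven_row, mismatched_row)  # A real difference is flagged
```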

# Conclusions for Vectorized Backtesting
While not shown here, this vectorized backtesting went through numerous iterations, with each one speeding up the backtest by a little bit. Those can be found in the archive folders in this repository. However, this current iteration is the fastest that we were able to make it, balancing efficiency with versatility. Not only can we alter most parameters in the broader BHOD strategy, but for certain features (like volume threshold), our program barely needs any extra time to run.

Overall, we plan to use this backtest once we can obtain enough 1-minute data, and we want to see how it performs against certain Stocks in Play (SiP) that we choose every day. Moreover, using this backtesting method, we can truly analyze how well the Break High of Day Strategy does in general, and we can see how different data affects it. For example, it might work better for certain sectors of stocks than others, or it might work better for higher float stocks compared to lower float ones. Either way, we can run this backtest to see how well different stocks might perform, and we can get an unbiased idea of how to optimize this strategy afterward.