# Introduction <br>
In this notebook I will work with the same lead trader's data, but this time with the trade history. The procedure is similar to the one used to analyze the position history, with some adaptations because the two datasets are structured differently:<br>
1. Clean the data<br>
2. Perform EDA on the trade history to identify the kind of strategy used<br>
The data scraper I used left a few duplicates, so I had to drop them.

### Clean the data

In [1]:
# Import required libraries
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np

In [2]:
df_trade_history = pd.read_csv("../data/lead_trader_2m_th.csv")
df_trade_history.head()

| | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 |
|---|---|---|---|---|---|---|---|
| 0 | 2025-03-21 20:43:25 | SUIUSDTPerp | 2025-03-21 20:43:25SUIUSDTPerpClose long2.2735... | Close long | 2.2735 | 311.2 SUI | -16.74105000 USDT |
| 1 | 2025-03-21 19:47:37 | 1000PEPEUSDTPerp | 2025-03-21 19:47:371000PEPEUSDTPerpClose long0... | Close long | 0.007265 | 97,649 1000PEPE | -6.10198149 USDT |
| 2 | 2025-03-21 18:46:25 | 1000PEPEUSDTPerp | 2025-03-21 18:46:251000PEPEUSDTPerpOpen long0.... | Open long | 0.007153 | 29,543 1000PEPE | 0.00000000 USDT |
| 3 | 2025-03-21 12:47:54 | 1000PEPEUSDTPerp | 2025-03-21 12:47:541000PEPEUSDTPerpOpen long0.... | Open long | 0.007323 | 25,821 1000PEPE | 0.00000000 USDT |
| 4 | 2025-03-21 10:36:31 | SUIUSDTPerp | 2025-03-21 10:36:31SUIUSDTPerpOpen long2.28220... | Open long | 2.2822 | 93.7 SUI | 0.00000000 USDT |


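The data was messier than the position history, so I had to get my hands dirty. Column3 is just a concatenation of all the other columns, so I will drop it. The cleaning also strips the `Perp` suffix from the symbols, the asset unit from the quantities, and the `USDT` suffix from the realized profit (which is rounded to two decimals).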

In [3]:
# The cleaning function is defined in scripts/clean_trade_history.py
from scripts.clean_trade_history import clean_trade_history
df_trade_history = clean_trade_history("../data/lead_trader_2m_th.csv")
df_trade_history.head()

| | Time | Symbol | Side | Price | Quantity | Realized Profit(USDT) |
|---|---|---|---|---|---|---|
| 0 | 2025-03-21 20:43:25 | SUIUSDT | Close long | 2.2735 | 311.2 | -16.74 |
| 1 | 2025-03-21 19:47:37 | 1000PEPEUSDT | Close long | 0.007265 | 97649.0 | -6.1 |
| 2 | 2025-03-21 18:46:25 | 1000PEPEUSDT | Open long | 0.007153 | 29543.0 | 0.0 |
| 3 | 2025-03-21 12:47:54 | 1000PEPEUSDT | Open long | 0.007323 | 25821.0 | 0.0 |
| 4 | 2025-03-21 10:36:31 | SUIUSDT | Open long | 2.2822 | 93.7 | 0.0 |
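
The cleaning script itself is not shown in this notebook. Comparing the raw and cleaned tables, a minimal sketch of equivalent transformations might look like the following. The function name `clean_trade_history_sketch`, the conversion of `Time` to datetimes, and the final duplicate drop are assumptions made for illustration, not the contents of `clean_trade_history.py`.

```python
import pandas as pd

def clean_trade_history_sketch(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-creation of the cleaning step, inferred from the before/after tables."""
    # Column3 is a concatenation of all the other fields, so it carries no new information
    df = raw.drop(columns=["Column3"])
    df = df.rename(columns={
        "Column1": "Time",
        "Column2": "Symbol",
        "Column4": "Side",
        "Column5": "Price",
        "Column6": "Quantity",
        "Column7": "Realized Profit(USDT)",
    })
    df["Time"] = pd.to_datetime(df["Time"])  # assumption: parse timestamps
    # "SUIUSDTPerp" -> "SUIUSDT"
    df["Symbol"] = df["Symbol"].str.replace("Perp", "", regex=False)
    # "97,649 1000PEPE" -> 97649.0: keep the number, drop the unit and thousands separators
    df["Quantity"] = (
        df["Quantity"].str.split(" ").str[0].str.replace(",", "", regex=False).astype(float)
    )
    # "-6.10198149 USDT" -> -6.1
    df["Realized Profit(USDT)"] = (
        df["Realized Profit(USDT)"].str.replace(" USDT", "", regex=False).astype(float).round(2)
    )
    # assumption: remove the duplicates left by the scraper
    return df.drop_duplicates().reset_index(drop=True)

# Usage on two rows taken from the raw table (Column3 shortened, since it is dropped anyway)
raw = pd.DataFrame({
    "Column1": ["2025-03-21 20:43:25", "2025-03-21 19:47:37"],
    "Column2": ["SUIUSDTPerp", "1000PEPEUSDTPerp"],
    "Column3": ["2025-03-21 20:43:25SUIUSDTPerpClose long", "2025-03-21 19:47:371000PEPEUSDTPerpClose long"],
    "Column4": ["Close long", "Close long"],
    "Column5": [2.2735, 0.007265],
    "Column6": ["311.2 SUI", "97,649 1000PEPE"],
    "Column7": ["-16.74105000 USDT", "-6.10198149 USDT"],
})
cleaned = clean_trade_history_sketch(raw)
print(cleaned)
```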


In [4]:
df_trade_history.shape

(1465, 6)

In [None]:
df_trade_history.info()

In [None]:
df_trade_history.duplicated().sum()

Our data is now clean and ready for the next phase. Since we intend to take one symbol and overlay its entry and exit points on an OHLC candlestick chart, we will pick the symbol that was traded most.


In [None]:
most_traded = df_trade_history["Symbol"].value_counts()
most_traded

Now we will subset the data to keep only the 1000PEPEUSDT trades, since it was the most traded symbol during the period.
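
If you prefer not to hard-code the symbol, it can be read straight off the counts with `Series.idxmax`, which returns the label with the highest value. The sketch below uses a small made-up DataFrame with the same `Symbol` column as the cleaned data; the rows are illustrative only.

```python
import pandas as pd

# Toy trade history (made-up rows) with the same "Symbol" column as the cleaned data
trades = pd.DataFrame(
    {"Symbol": ["1000PEPEUSDT", "SUIUSDT", "1000PEPEUSDT", "1000PEPEUSDT", "SUIUSDT"]}
)

symbol_counts = trades["Symbol"].value_counts()
most_traded_symbol = symbol_counts.idxmax()  # label with the highest count
# .copy() avoids SettingWithCopyWarning if the subset is modified later
subset = trades[trades["Symbol"] == most_traded_symbol].copy()
print(most_traded_symbol, len(subset))
```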

In [None]:
def trades_for_symbol(df, symbol):
    """Return a copy of the trades for a single symbol."""
    return df[df["Symbol"] == symbol].copy()

In [None]:
df_1000PEPEUSDT = trades_for_symbol(df_trade_history, "1000PEPEUSDT")
df_1000PEPEUSDT.head(5)

In [None]:
df_1000PEPEUSDT.shape

**Merge trades that were split due to slippage** <br>
Fills that share a timestamp, or occur less than 30 s apart, are actually a single trade that was split up: the order was filled in chunks at slightly different prices or times, likely because there was not enough liquidity to execute it all at once. This is common in fast-moving or illiquid markets.

In [None]:
# merge_same_timestamp_trades is defined in scripts/merge_trades.py, so we just import and apply it
from scripts.merge_trades import merge_same_timestamp_trades
df_1000PEPEUSDT_merged = merge_same_timestamp_trades(df_1000PEPEUSDT)
df_1000PEPEUSDT_merged.head()
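
The body of `merge_same_timestamp_trades` is not shown here. As a hypothetical sketch of the idea described above, the function below (`merge_split_fills_sketch`, a name made up for illustration) sorts fills chronologically, starts a new group whenever the symbol or side changes or the gap to the previous fill is 30 s or more, and then collapses each group into one trade. It assumes the cleaned column names, sums quantity and realized profit, and uses the quantity-weighted average as the merged price. Gaps are measured from the previous fill, so a chain of fills each 20 s apart is merged even if the whole chain spans more than 30 s.

```python
import pandas as pd

def merge_split_fills_sketch(df: pd.DataFrame, window: str = "30s") -> pd.DataFrame:
    """Hypothetical sketch: collapse consecutive fills of the same symbol and side
    that are less than `window` apart into a single trade."""
    d = df.copy()
    d["Time"] = pd.to_datetime(d["Time"])
    d = d.sort_values("Time").reset_index(drop=True)  # raw history is newest first
    # A new group starts when symbol or side changes, or the gap reaches the window
    new_group = (
        (d["Symbol"] != d["Symbol"].shift())
        | (d["Side"] != d["Side"].shift())
        | (d["Time"].diff() >= pd.Timedelta(window))
    )
    d["group"] = new_group.cumsum()
    d["notional"] = d["Price"] * d["Quantity"]
    merged = d.groupby("group").agg(
        Time=("Time", "first"),
        Symbol=("Symbol", "first"),
        Side=("Side", "first"),
        notional=("notional", "sum"),
        Quantity=("Quantity", "sum"),
        **{"Realized Profit(USDT)": ("Realized Profit(USDT)", "sum")},
    )
    merged["Price"] = merged["notional"] / merged["Quantity"]  # quantity-weighted price
    cols = ["Time", "Symbol", "Side", "Price", "Quantity", "Realized Profit(USDT)"]
    return merged[cols].reset_index(drop=True)

# Made-up fills, newest first like the scraped data: one open split into two chunks 10 s apart
fills = pd.DataFrame({
    "Time": ["2025-03-21 13:00:00", "2025-03-21 12:00:10", "2025-03-21 12:00:00"],
    "Symbol": ["1000PEPEUSDT"] * 3,
    "Side": ["Close long", "Open long", "Open long"],
    "Price": [0.0075, 0.0072, 0.0070],
    "Quantity": [40000.0, 30000.0, 10000.0],
    "Realized Profit(USDT)": [15.0, 0.0, 0.0],
})
merged_fills = merge_split_fills_sketch(fills)
print(merged_fills)
```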


In [None]:
df_1000PEPEUSDT_merged.shape

**Visualizing the number of opens before each close** <br>
The lead trader appears to use an averaging-down strategy: after the market drops, they open additional long positions at progressively lower prices. This lowers the average entry price, so the trader aims to profit from even a small bounce back up. The largest number of average-downs in a single cycle was 9 (after the initial open), and it occurred only once. Most cycles are closed somewhere between the initial open and the second average-down.
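
The arithmetic behind averaging down can be made concrete with a small worked example. The prices and sizes below are hypothetical, not taken from the trader's history, and fees are ignored. The average entry is the quantity-weighted mean of the fill prices, and the bounce needed to break even is the ratio of that average to the current price, minus one.

```python
def average_entry_price(prices, quantities):
    """Quantity-weighted average entry price of a long position built from several buys."""
    total_qty = sum(quantities)
    return sum(p * q for p, q in zip(prices, quantities)) / total_qty

def bounce_to_break_even(avg_entry, last_price):
    """Fractional rise from last_price needed to get back to avg_entry (fees ignored)."""
    return avg_entry / last_price - 1

# Hypothetical cycle: an initial open plus two average-downs of equal size
prices = [1.00, 0.95, 0.90]
quantities = [100, 100, 100]

avg = average_entry_price(prices, quantities)
print(f"average entry: {avg:.4f}")
print(f"bounce needed from 0.90 with averaging: {bounce_to_break_even(avg, 0.90):.2%}")
print(f"bounce needed from 0.90 without averaging: {bounce_to_break_even(prices[0], 0.90):.2%}")
```

With these numbers the average entry is 0.95, so a rebound of roughly 5.6% from 0.90 breaks even, compared with roughly 11.1% if only the first position had been opened. The trade-off is a position three times larger, so a continued drop hurts more.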

In [None]:
# Count how many "Open long" entries precede each "Close long".
# Use the merged trades so fills split by slippage count as one open,
# and sort chronologically, because the scraped history is listed newest first.
trades = df_1000PEPEUSDT_merged.copy()
trades["Time"] = pd.to_datetime(trades["Time"])
trades = trades.sort_values("Time")

trade_counts_before_close = []
open_count = 0
for side in trades["Side"]:
    if side == "Open long":
        open_count += 1  # initial open or another average-down in the current cycle
    elif side == "Close long" and open_count > 0:
        trade_counts_before_close.append(open_count)  # opens accumulated before this close
        open_count = 0  # start counting the next cycle

# Frequency of each "number of opens before close", ordered by that number
trade_counts_df = (
    pd.Series(trade_counts_before_close)
    .value_counts()
    .sort_index()
    .rename_axis("Number of Opens before Close")
    .to_frame("Frequency")
)
trade_counts_df


In [None]:
# Summary statistics: the index holds the number of opens before a close,
# while the "Frequency" column only says how often each number occurred
max_opens = trade_counts_df.index.max()
min_opens = trade_counts_df.index.min()
print(f"Maximum number of opens before a close: {max_opens}")
print(f"Minimum number of opens before a close: {min_opens}")

In [None]:
# Horizontal bar plot of how often each number of opens preceded a close
fig, ax = plt.subplots(figsize=(10, 6))
trade_counts_df["Frequency"].sort_index().plot(kind="barh", color="skyblue", edgecolor="black", ax=ax)
ax.set_xlabel("Frequency")
ax.set_ylabel("Number of opens before close")
ax.set_title("Distribution of the number of opens before a close")
plt.show()
