# Analysis of Trader Performance vs. Market Sentiment

## Project Overview

This project analyzes the relationship between trader performance, based on historical trade data from Hyperliquid, and market sentiment, as measured by the Bitcoin Fear & Greed Index. The primary objective is to uncover actionable patterns in profitability, trading volume, risk, and trade direction (long vs. short) under different market conditions. The insights are intended to inform the development of data-driven trading strategies.

---

## Directory Structure

The project is organized into the following structure to ensure clarity and reproducibility:

-   `notebook_1.ipynb`: The main Google Colab notebook containing all Python code for data loading, cleaning, analysis, and visualization.
-   `csv_files/`: Stores all tabular data outputs.
    -   `performance_summary.csv`: A summary table of key performance metrics by market sentiment.
-   `outputs/`: Stores all visual outputs from the analysis.
    -   `pnl_by_sentiment.png`: Total profit and loss by sentiment.
    -   `volume_by_sentiment.png`: Total trading volume by sentiment.
    -   `success_rate_by_sentiment.png`: Percentage of profitable trades by sentiment.
    -   `volume_vs_sentiment_timeseries.png`: Daily trading volume vs. the Fear & Greed Index.
    -   `pnl_distribution_by_sentiment.png`: Box plot showing the distribution and risk of PnL.
    -   `trade_size_distribution_by_sentiment.png`: Box plot showing the distribution of trade sizes.
    -   `pnl_by_trade_side.png`: Grouped bar chart comparing PnL for BUY vs. SELL trades.
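    -   `pnl_by_top_coins.png`: Bar chart showing PnL for the top 5 most-traded coins.
    -   `trades_by_sentiment_donut_chart.png`: Donut chart of the share of trades in each sentiment category.
    -   `size_vs_pnl_scatter.png`: Scatter plot of trade size vs. PnL on a random sample of trades.
    -   `cumulative_pnl_curve.png`: Cumulative PnL over time (equity curve).
    -   `pnl_calendar_heatmap_<year>.png`: Calendar heatmap of daily PnL, one file per year.
    -   `volume_timeseries_with_rolling_avg.png`: Daily volume with rolling averages vs. the Fear & Greed Index.
    -   `pnl_vs_size_hexbin.png`: Hexbin density plot of trade size vs. PnL.
    -   `pnl_distribution_kde.png`: KDE plot of PnL distributions by sentiment.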
-   `README.md`: This file, providing a complete overview and setup instructions.

---

## How to Run

1.  **Environment**: This project is designed to be run in a Google Colab environment.
2.  **Data**: Upload the `fear_greed_index.csv` and `historical_data.csv` files to the root directory of your Colab session (`/content/`, the path the notebook reads from).
3.  **Execution**: Open `notebook_1.ipynb` and run all the cells sequentially. The notebook creates the `csv_files/` and `outputs/` directories automatically and saves all analysis files there.
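
Before running the notebook, it can help to confirm that both input files are where the loading cell expects them. The following is a minimal sketch and not part of the notebook: `missing_inputs` is a hypothetical helper, and the default `/content` directory is taken from the paths used in the data-loading cell.

```python
from pathlib import Path

# File names come from step 2 above; the helper itself is hypothetical.
REQUIRED_FILES = ("fear_greed_index.csv", "historical_data.csv")


def missing_inputs(data_dir="/content", required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in required if not (base / name).is_file()]


missing = missing_inputs()
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```

Outside Colab, pass the directory that holds the CSV files, for example `missing_inputs("data")`, and adjust the paths in the loading cell to match.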

---

## Key Findings & Insights

The analysis revealed several significant patterns connecting trader behavior to market sentiment:

### High-Level Trends
* **"Fear" Drives Profit and Volume**: The **highest total profit** and **greatest trading volume** occur during periods of "Fear." One possible explanation is that volatile conditions or contrarian strategies (buying when others are fearful) work well for this cohort of traders, although totals also depend on how many trades fall into each sentiment category.
* **"Extreme Greed" Has the Highest Win Rate**: While "Fear" generates the most absolute profit, trades made during "Extreme Greed" have the **highest success rate** (46.5% of trades are profitable). Individual trades are therefore more consistently profitable when the market is euphoric, even though the total profit is smaller.
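
Total profit and win rate are different measures, so they can rank sentiment categories differently. The sketch below illustrates this on a handful of made-up trades (the values are invented for illustration and are not project data); the column names mirror those used in the notebook.

```python
import pandas as pd

# Toy trades with made-up values, for illustration only (not project data).
trades = pd.DataFrame({
    "classification": ["Fear", "Fear", "Fear", "Extreme Greed", "Extreme Greed"],
    "Closed PnL": [120.0, -40.0, 0.0, 15.0, 5.0],
    "Size USD": [1000.0, 800.0, 500.0, 200.0, 300.0],
})

# A trade counts as a win only when its closed PnL is strictly positive.
trades["is_profitable"] = trades["Closed PnL"] > 0

summary = trades.groupby("classification").agg(
    total_pnl=("Closed PnL", "sum"),
    total_volume=("Size USD", "sum"),
    win_rate_pct=("is_profitable", lambda s: s.mean() * 100),
)
print(summary.round(1))
```

In this toy data "Fear" has the larger total PnL (80 vs. 20) but the lower win rate (one win in three trades vs. two in two). Note that under this definition any row with a closed PnL of exactly zero counts as a non-winning trade.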

### Deeper Insights from Advanced Visualizations
* **PnL Distribution Reveals Risk**: The box plot of PnL shows that while "Fear" has the highest total profit, it also comes with a very wide distribution of outcomes, indicating **higher risk**. In contrast, "Extreme Greed" shows a tighter, more consistent range of profits.
* **"Fear" Encourages Larger, Bolder Trades**: The trade size distribution shows that the **median trade size is largest during periods of "Fear"**, suggesting traders are willing to commit more capital during these times, possibly to capitalize on perceived market bottoms.
* **Trade Side Strategy is Key**: The analysis of BUY vs. SELL trades reveals a crucial pattern: **BUY (long) positions are substantially more profitable across all sentiment types**, especially during "Fear" and "Greed." SELL (short) positions, on the other hand, consistently contribute much less to overall profitability.
* **Coin-Specific Performance**: The profitability analysis of the top 5 coins shows that performance is not uniform. Certain coins, like **BTC and ETH**, are major profit drivers, especially during "Fear" and "Greed," while others may perform differently, highlighting the need for asset-specific strategies.
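
The quantities behind these observations can be computed directly rather than read off the plots. The following is a minimal sketch on made-up values (not project data), using the notebook's column names: the interquartile range of PnL corresponds to the box height in the PnL box plot, the median trade size to the centre line of the trade-size box plot, and the pivot table to the grouped BUY vs. SELL bar chart.

```python
import pandas as pd

# Made-up values for illustration only; column names mirror the notebook.
trades = pd.DataFrame({
    "classification": ["Fear"] * 4 + ["Extreme Greed"] * 4,
    "Side": ["BUY", "SELL", "BUY", "SELL"] * 2,
    "Closed PnL": [300.0, -200.0, 50.0, -10.0, 12.0, 8.0, 10.0, 6.0],
    "Size USD": [5000.0, 4000.0, 1000.0, 2000.0, 400.0, 600.0, 500.0, 300.0],
})

grouped = trades.groupby("classification")

# Interquartile range (IQR) of PnL: a simple measure of how spread out outcomes are.
pnl_iqr = grouped["Closed PnL"].quantile(0.75) - grouped["Closed PnL"].quantile(0.25)

# Median trade size per sentiment category.
median_size = grouped["Size USD"].median()

# Total PnL split by trade side, one column per side.
pnl_by_side = trades.pivot_table(
    values="Closed PnL", index="classification", columns="Side", aggfunc="sum"
)

print(pnl_iqr, median_size, pnl_by_side, sep="\n\n")
```

A wider IQR means a wider range of typical outcomes, which is the sense in which the box plot is read as showing higher risk.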

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

# --- 1. Setup: Create directories as per the required structure ---
csv_output_dir = 'csv_files'
visual_output_dir = 'outputs'

# exist_ok=True makes this safe to re-run when the directories already exist
os.makedirs(csv_output_dir, exist_ok=True)
os.makedirs(visual_output_dir, exist_ok=True)


In [None]:
# --- 2. Data Loading ---
# Load the datasets
try:
    df_sentiment = pd.read_csv('/content/fear_greed_index.csv')
    df_trades = pd.read_csv('/content/historical_data.csv')
except FileNotFoundError as e:
    print(f"Error loading data: {e}. Please make sure the files are in the correct directory.")
    # Re-raise so execution stops here; the remaining cells cannot run without the data
    raise



In [None]:
# --- 3. Data Preprocessing and Cleaning ---
# Convert date/timestamp columns to datetime objects
df_sentiment['date'] = pd.to_datetime(df_sentiment['date'])
df_trades['timestamp_dt'] = pd.to_datetime(df_trades['Timestamp IST'], format='%d-%m-%Y %H:%M')
# Truncate each timestamp to midnight so it can be joined to the daily sentiment data
df_trades['date'] = df_trades['timestamp_dt'].dt.normalize()

# Clean and convert numeric columns
numeric_cols = ['Size USD', 'Closed PnL']
for col in numeric_cols:
    df_trades[col] = pd.to_numeric(df_trades[col], errors='coerce')
df_trades.dropna(subset=numeric_cols, inplace=True)

# --- 4. Merging Datasets ---
df_merged = pd.merge(df_trades, df_sentiment, on='date', how='inner')

sns.set_style("whitegrid")
df_merged['is_profitable'] = df_merged['Closed PnL'] > 0
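
The inner merge above keeps only trades whose date also appears in the sentiment data; trades on other dates are dropped without any message. One way to see how many rows are affected is a left merge with `indicator=True`. This is a minimal sketch on made-up rows, not the notebook's data; only the `date` key and the `classification` column are taken from the notebook.

```python
import pandas as pd

# Made-up rows for illustration only.
trades = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "Closed PnL": [10.0, -5.0, 7.0],
})
sentiment = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "classification": ["Fear", "Greed"],
})

# A left merge with indicator=True keeps every trade and records, in the
# '_merge' column, whether a sentiment row was found for its date.
checked = pd.merge(trades, sentiment, on="date", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(f"{len(unmatched)} of {len(checked)} trades have no sentiment reading")
```

Applied to `df_trades` and `df_sentiment`, the same check would show whether the inner merge discards a meaningful share of the trades.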


In [None]:
# Plot 1: Total PnL by Market Sentiment
plt.figure(figsize=(10, 6))
pnl_by_sentiment = df_merged.groupby('classification')['Closed PnL'].sum().reset_index().sort_values(by='Closed PnL', ascending=False)
sns.barplot(data=pnl_by_sentiment, x='classification', y='Closed PnL', hue='classification', palette='viridis', legend=False)
plt.title('Total Profit and Loss (PnL) by Market Sentiment', fontsize=16)
plt.xlabel('Market Sentiment', fontsize=12)
plt.ylabel('Total Closed PnL (in USD)', fontsize=12)
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/pnl_by_sentiment.png")
plt.close()



In [None]:
# Plot 2: Total Trading Volume by Market Sentiment
plt.figure(figsize=(10, 6))
volume_by_sentiment = df_merged.groupby('classification')['Size USD'].sum().reset_index().sort_values(by='Size USD', ascending=False)
sns.barplot(data=volume_by_sentiment, x='classification', y='Size USD', hue='classification', palette='plasma', legend=False)
plt.title('Total Trading Volume by Market Sentiment', fontsize=16)
plt.xlabel('Market Sentiment', fontsize=12)
plt.ylabel('Total Trading Volume (in USD)', fontsize=12)
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/volume_by_sentiment.png")
plt.close()



In [None]:
#--- Visualization 1: Donut Chart of Trade Count by Sentiment ---
# A donut chart is a variation of a pie chart, showing parts of a whole.
plt.figure(figsize=(10, 8))
sentiment_counts = df_merged['classification'].value_counts()
plt.pie(sentiment_counts, labels=sentiment_counts.index, autopct='%1.1f%%', startangle=90, pctdistance=0.85, colors=sns.color_palette('YlGnBu'))

# Draw a circle at the center to make it a donut
centre_circle = plt.Circle((0,0),0.70,fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)

plt.title('Proportion of Trades by Market Sentiment', fontsize=16)
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/trades_by_sentiment_donut_chart.png")
plt.close()
print(f"Saved: trades_by_sentiment_donut_chart.png")


# --- Visualization 2: Scatter Plot of Trade Size vs. PnL ---
# This helps identify if larger trades lead to larger profits or losses.
# We'll use a sample to avoid overplotting and apply a log scale for better visibility.
plt.figure(figsize=(12, 8))
# Sample at most 5,000 rows; sample() raises an error if n exceeds the number of rows
df_sample = df_merged.sample(n=min(5000, len(df_merged)), random_state=42)

sns.scatterplot(
    data=df_sample,
    x='Size USD',
    y='Closed PnL',
    hue='classification',
    palette='viridis',
    alpha=0.6,
    size='Size USD',  # Make larger trades appear as larger points
    sizes=(20, 400)
)

plt.title('Trade Size vs. PnL by Market Sentiment (Sampled)', fontsize=16)
plt.xlabel('Trade Size (USD) - Log Scale', fontsize=12)
plt.ylabel('Closed PnL (USD)', fontsize=12)
plt.xscale('log') # Use a log scale for the x-axis due to wide range of trade sizes
plt.axhline(0, color='red', linestyle='--', lw=1) # Add a line at PnL=0
plt.legend(title='Sentiment')
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/size_vs_pnl_scatter.png")
plt.close()
print(f"Saved: size_vs_pnl_scatter.png")







Saved: trades_by_sentiment_donut_chart.png
Saved: size_vs_pnl_scatter.png


In [None]:
#--- Visualization 1: Cumulative PnL Curve (Equity Curve) ---
plt.figure(figsize=(14, 7))
# Sort by time to ensure the cumulative sum is correct
df_sorted = df_merged.sort_values('timestamp_dt')
df_sorted['cumulative_pnl'] = df_sorted['Closed PnL'].cumsum()
plt.plot(df_sorted['timestamp_dt'], df_sorted['cumulative_pnl'])
plt.title('Cumulative PnL Over Time (Equity Curve)', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Cumulative Profit & Loss (USD)', fontsize=12)
plt.fill_between(df_sorted['timestamp_dt'], df_sorted['cumulative_pnl'], 0, alpha=0.1)
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/cumulative_pnl_curve.png")
plt.close()
print(f"Saved: cumulative_pnl_curve.png")

# --- Visualization 2: Calendar Heatmap of Daily PnL ---
# This requires more advanced data manipulation with pandas.
daily_pnl = df_merged.groupby('date')['Closed PnL'].sum().reset_index()
daily_pnl['year'] = daily_pnl['date'].dt.year
daily_pnl['day_of_week'] = daily_pnl['date'].dt.dayofweek
daily_pnl['week_of_year'] = daily_pnl['date'].dt.isocalendar().week

years = daily_pnl['year'].unique()
for year in years:
    plt.figure(figsize=(16, 4))
    year_data = daily_pnl[daily_pnl['year'] == year]
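    pivot_data = year_data.pivot_table(values='Closed PnL', index='day_of_week', columns='week_of_year')
    # Make sure all seven weekdays are present so the rows line up with the labels below
    pivot_data = pivot_data.reindex(range(7))
    sns.heatmap(pivot_data, cmap='RdYlGn', center=0, linewidths=.5, cbar_kws={'label': 'PnL (USD)'})
    plt.title(f'Daily PnL Calendar Heatmap for {year}', fontsize=16)
    plt.xlabel('Week of Year', fontsize=12)
    plt.ylabel('Day of Week', fontsize=12)
    # Heatmap cells are centred at 0.5, 1.5, ..., so place the tick labels there
    plt.yticks(ticks=[i + 0.5 for i in range(7)], labels=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'], rotation=0)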
    plt.tight_layout()
    plt.savefig(f"{visual_output_dir}/pnl_calendar_heatmap_{year}.png")
    plt.close()
    print(f"Saved: pnl_calendar_heatmap_{year}.png")


# --- Visualization 3: Time Series with Rolling Averages ---
daily_agg = df_merged.groupby('date').agg({'Size USD': 'sum', 'value': 'first'}).reset_index()
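# Calculate rolling averages over the last 7 and 30 rows
# (these equal 7 and 30 calendar days only if no dates are missing from daily_agg)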
daily_agg['vol_7_day_avg'] = daily_agg['Size USD'].rolling(window=7).mean()
daily_agg['vol_30_day_avg'] = daily_agg['Size USD'].rolling(window=30).mean()

fig, ax1 = plt.subplots(figsize=(14, 7))
# Primary Axis (Volume)
ax1.bar(daily_agg['date'], daily_agg['Size USD'], color='lightblue', label='Daily Volume (USD)')
ax1.plot(daily_agg['date'], daily_agg['vol_7_day_avg'], color='blue', lw=2, label='7-Day Avg. Volume')
ax1.plot(daily_agg['date'], daily_agg['vol_30_day_avg'], color='navy', lw=2, label='30-Day Avg. Volume')
ax1.set_xlabel('Date', fontsize=12)
ax1.set_ylabel('Trading Volume (USD)', fontsize=12)

# Secondary Axis (Sentiment)
ax2 = ax1.twinx()
ax2.plot(daily_agg['date'], daily_agg['value'], color='orange', lw=2, label='Fear & Greed Index')
ax2.set_ylabel('Fear & Greed Index Score', fontsize=12)

plt.title('Daily Volume (with Rolling Averages) and Sentiment Index', fontsize=16)
fig.legend(loc="upper left", bbox_to_anchor=(0.1, 0.9))
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/volume_timeseries_with_rolling_avg.png")
plt.close()
print(f"Saved: volume_timeseries_with_rolling_avg.png")

# --- Visualization 4: Hexbin Plot for PnL vs. Trade Size Density ---
plt.figure(figsize=(12, 8))
# Trim the top 5% of PnL and trade size for readability (large losses are not trimmed)
pnl_limit = df_merged['Closed PnL'].quantile(0.95)
size_limit = df_merged['Size USD'].quantile(0.95)
# Filter out non-positive trade sizes for log scale
df_filtered = df_merged[(df_merged['Closed PnL'] <= pnl_limit) & (df_merged['Size USD'] <= size_limit) & (df_merged['Size USD'] > 0)]

hb = plt.hexbin(
    df_filtered['Size USD'],
    df_filtered['Closed PnL'],
    gridsize=50,
    cmap='inferno',
    xscale='log' # Use log scale for trade size
)

cb = plt.colorbar(hb, label='Trade Count')
plt.title('Density of Trades by Size and PnL (Hexbin)', fontsize=16)
plt.xlabel('Trade Size (USD) - Log Scale', fontsize=12)
plt.ylabel('Closed PnL (USD)', fontsize=12)
plt.axhline(0, color='white', linestyle='--', lw=1)
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/pnl_vs_size_hexbin.png")
plt.close()
print(f"Saved: pnl_vs_size_hexbin.png")

Saved: cumulative_pnl_curve.png
Saved: pnl_calendar_heatmap_2023.png
Saved: pnl_calendar_heatmap_2024.png
Saved: pnl_calendar_heatmap_2025.png
Saved: volume_timeseries_with_rolling_avg.png
Saved: pnl_vs_size_hexbin.png
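
The rolling averages in the cell above use `rolling(window=7)` and `rolling(window=30)`, which count rows rather than calendar days. If some dates have no trades, a time-based window gives a different result. The sketch below shows the difference on made-up daily volumes with a gap; the data and the 3-row/3-day windows are chosen only to keep the example short.

```python
import pandas as pd

# Made-up daily volumes with a gap (no rows for 4-6 January).
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-07"]),
    "Size USD": [100.0, 200.0, 300.0, 400.0],
}).set_index("date")

# Row-based window: averages the last 3 rows, whatever dates they cover.
by_rows = daily["Size USD"].rolling(window=3).mean()

# Time-based window: averages only the rows within the trailing 3 calendar days.
by_days = daily["Size USD"].rolling("3D").mean()

print(pd.DataFrame({"by_rows": by_rows, "by_days": by_days}))
```

For 7 January the row-based average still includes 2 and 3 January, while the time-based average uses only the 7 January value. Time-based windows require a datetime index (or the `on=` argument) and sorted dates.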


In [None]:
# --- Visualization 3: KDE Plot of PnL Distributions ---
# A Kernel Density Estimate (KDE) plot is a smoothed version of a histogram.
# It's excellent for comparing the shape of distributions.
plt.figure(figsize=(12, 7))
sentiments = df_merged['classification'].unique()

for sentiment in sentiments:
    subset = df_merged[df_merged['classification'] == sentiment]
    # Filter out extreme outliers to make the main distribution visible
    pnl_filtered = subset[(subset['Closed PnL'] > -1000) & (subset['Closed PnL'] < 1000)]
    sns.kdeplot(pnl_filtered['Closed PnL'], label=sentiment, fill=True, alpha=0.2, lw=2)

plt.title('Distribution of Profit & Loss (PnL) by Sentiment', fontsize=16)
plt.xlabel('Closed PnL (USD)', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.axvline(0, color='black', linestyle='--', lw=1)
plt.legend(title='Sentiment')
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/pnl_distribution_kde.png")
plt.close()
print(f"Saved: pnl_distribution_kde.png")

In [None]:
# Plot 3: Trade Success Rate by Market Sentiment
success_rate = df_merged.groupby('classification')['is_profitable'].mean().reset_index()
success_rate['success_rate_pct'] = success_rate['is_profitable'] * 100
success_rate = success_rate.sort_values(by='success_rate_pct', ascending=False)
plt.figure(figsize=(10, 6))
sns.barplot(data=success_rate, x='classification', y='success_rate_pct', hue='classification', palette='cividis', legend=False)
plt.title('Trade Success Rate by Market Sentiment', fontsize=16)
plt.xlabel('Market Sentiment', fontsize=12)
plt.ylabel('Success Rate (%)', fontsize=12)
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/success_rate_by_sentiment.png")
plt.close()

In [None]:
# Plot 4: Daily Trading Volume vs. Sentiment Index Time Series
daily_agg = df_merged.groupby('date').agg({'Size USD': 'sum', 'value': 'first'}).reset_index()
fig, ax1 = plt.subplots(figsize=(14, 7))
ax1.bar(daily_agg['date'], daily_agg['Size USD'], color='lightblue', label='Trading Volume (USD)')
ax1.set_xlabel('Date')
ax1.set_ylabel('Trading Volume (USD)', color='skyblue')
ax1.tick_params(axis='y', labelcolor='skyblue')
ax2 = ax1.twinx()
ax2.plot(daily_agg['date'], daily_agg['value'], color='orange', label='Fear & Greed Index')
ax2.set_ylabel('Fear & Greed Index', color='orange')
ax2.tick_params(axis='y', labelcolor='orange')
ax2.set_ylim(0, 100)
plt.title('Daily Trading Volume and Fear & Greed Index', fontsize=16)
fig.tight_layout()
plt.savefig(f"{visual_output_dir}/volume_vs_sentiment_timeseries.png")
plt.close()


In [None]:

# Drop trades at or above the 99th percentile of PnL so the box plot stays readable.
# Only the upper tail is trimmed; large losses remain in the plot.
plt.figure(figsize=(12, 7))
pnl_quantile = df_merged['Closed PnL'].quantile(0.99)
df_filtered_pnl = df_merged[df_merged['Closed PnL'] < pnl_quantile]
sns.boxplot(data=df_filtered_pnl, x='classification', y='Closed PnL', hue='classification', palette='coolwarm', legend=False)
plt.title('Distribution of Profit & Loss (PnL) by Sentiment (Outliers Hidden)', fontsize=16)
plt.xlabel('Market Sentiment', fontsize=12)
plt.ylabel('Closed PnL (USD)', fontsize=12)
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/pnl_distribution_by_sentiment.png")
plt.close()
print(f"Saved: pnl_distribution_by_sentiment.png")


# This plot shows how trade sizes vary across different sentiment periods.
plt.figure(figsize=(12, 7))
size_quantile = df_merged['Size USD'].quantile(0.99)
df_filtered_size = df_merged[df_merged['Size USD'] < size_quantile]
sns.boxplot(data=df_filtered_size, x='classification', y='Size USD', hue='classification', palette='spring', legend=False)
plt.title('Distribution of Trade Size by Sentiment (Outliers Hidden)', fontsize=16)
plt.xlabel('Market Sentiment', fontsize=12)
plt.ylabel('Trade Size (USD)', fontsize=12)
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/trade_size_distribution_by_sentiment.png")
plt.close()
print(f"Saved: trade_size_distribution_by_sentiment.png")

In [None]:

# This grouped bar chart compares profitability of long (BUY) vs. short (SELL) positions.
plt.figure(figsize=(12, 7))
pnl_by_side = df_merged.groupby(['classification', 'Side'])['Closed PnL'].sum().reset_index()
sns.barplot(data=pnl_by_side, x='classification', y='Closed PnL', hue='Side', palette='muted')
plt.title('Total PnL by Trade Side and Market Sentiment', fontsize=16)
plt.xlabel('Market Sentiment', fontsize=12)
plt.ylabel('Total Closed PnL (USD)', fontsize=12)
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/pnl_by_trade_side.png")
plt.close()
print(f"Saved: pnl_by_trade_side.png")


# This plot analyzes performance for the 5 most traded coins.
top_coins = df_merged.groupby('Coin')['Size USD'].sum().nlargest(5).index
df_top_coins = df_merged[df_merged['Coin'].isin(top_coins)]
plt.figure(figsize=(14, 8))
pnl_top_coins = df_top_coins.groupby(['Coin', 'classification'])['Closed PnL'].sum().reset_index()
sns.barplot(data=pnl_top_coins, x='Coin', y='Closed PnL', hue='classification', palette='Spectral')
plt.title('Total PnL for Top 5 Coins by Market Sentiment', fontsize=16)
plt.xlabel('Coin', fontsize=12)
plt.ylabel('Total Closed PnL (USD)', fontsize=12)
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig(f"{visual_output_dir}/pnl_by_top_coins.png")
plt.close()
print(f"Saved: pnl_by_top_coins.png")

print(f"All visualizations saved to the '{visual_output_dir}/' directory.")



In [None]:
# --- 6. Performance Summary and Export ---
performance_summary = df_merged.groupby('classification').agg(
    total_pnl=('Closed PnL', 'sum'),
    average_pnl=('Closed PnL', 'mean'),
    total_volume=('Size USD', 'sum'),
    average_trade_size=('Size USD', 'mean'),
    number_of_trades=('Account', 'count'),
    success_rate=('is_profitable', lambda x: x.mean() * 100)
).reset_index().sort_values(by='total_pnl', ascending=False)
performance_summary = performance_summary.round(2)

summary_path = f"{csv_output_dir}/performance_summary.csv"
performance_summary.to_csv(summary_path, index=False)

print(f"\nPerformance summary saved to: {summary_path}")
print("\n--- Performance Summary by Market Sentiment ---")
print(performance_summary)
print("\n=== ANALYSIS COMPLETED SUCCESSFULLY ===")