# Trader Behavior Insights — Fear/Greed vs Performance Analysis
### Assignment for Junior Data Scientist — Hyperliquid Trader Data

## Overview
This notebook explores the relationship between trader performance (from Hyperliquid historical trade data) and Bitcoin market sentiment (Fear/Greed index).
Steps:
1. Load and clean datasets.
2. Aggregate trader metrics by date.
3. Merge with sentiment data.
4. Identify patterns between sentiment and performance.
5. Investigate major PnL spike days and top contributing accounts.
6. Output key insights and visualizations.

In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import seaborn as sns

# Load data
historical = pd.read_csv('historical_data.csv')
fg = pd.read_csv('fear_greed_index.csv')

# Preview each frame separately; a bare tuple would render as one hard-to-read repr
print(historical.head())
print(fg.head())

## Data Preparation

In [ ]:
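# Normalize column names: strip stray whitespace (and lowercase the sentiment columns)
historical.columns = [c.strip() for c in historical.columns]
fg.columns = [c.strip().lower() for c in fg.columns]

# Convert timestamps to timezone-aware datetimes; errors='coerce' turns unparseable values into NaT.
# Note: numeric input is read as nanoseconds by default, so if a column stores
# Unix epoch seconds or milliseconds, pass unit='s' or unit='ms' here.
historical['Timestamp_dt'] = pd.to_datetime(historical['time'], errors='coerce', utc=True)
fg['timestamp_dt'] = pd.to_datetime(fg['timestamp'], errors='coerce', utc=True)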

# Extract date only
historical['date'] = historical['Timestamp_dt'].dt.date
fg['date'] = fg['timestamp_dt'].dt.date

# Aggregate PnL per day
agg = (historical.groupby('date')
       .agg(pnl_sum=('Closed PnL', 'sum'),
            pnl_mean=('Closed PnL', 'mean'),
            trades=('Closed PnL', 'count'))
       .reset_index())

# Merge with sentiment
merged = agg.merge(fg[['date','classification']], on='date', how='left')
merged.head()
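
A left merge keeps every trading day, so days without a sentiment record silently get a missing `classification`. The sketch below shows one way to check merge coverage. It uses small made-up stand-ins for `agg` and `fg` rather than the real files, and the names `checked`, `unmatched`, and `coverage` are illustrative, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the notebook's `agg` and `fg` frames (made-up values).
agg = pd.DataFrame({
    "date": pd.to_datetime(["2025-02-17", "2025-02-18", "2025-02-19"]).date,
    "pnl_sum": [120.0, -45.5, 980.0],
})
fg = pd.DataFrame({
    "date": pd.to_datetime(["2025-02-17", "2025-02-19"]).date,
    "classification": ["Fear", "Greed"],
})

# validate="one_to_one" raises pandas.errors.MergeError if a date is duplicated
# on either side; indicator=True adds a `_merge` column recording where each
# row found a match.
checked = agg.merge(
    fg[["date", "classification"]],
    on="date",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Rows tagged "left_only" are trading days with no sentiment record.
unmatched = checked.loc[checked["_merge"] == "left_only", "date"].tolist()
coverage = 1 - len(unmatched) / len(checked)
print(f"Sentiment coverage: {coverage:.1%}; unmatched dates: {unmatched}")
```

If coverage is low on the real data, a likely cause is a date or timezone mismatch between the two sources, which is worth ruling out before interpreting any sentiment comparison.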

## Exploratory Visualization

In [ ]:
plt.figure(figsize=(10,5))
plt.plot(merged['date'], merged['pnl_sum'])
plt.title('Daily Total Closed PnL Over Time')
plt.xlabel('Date')
plt.ylabel('Sum of Closed PnL')
plt.grid(True)
plt.show()

## Sentiment vs Trader Performance

In [ ]:
sns.boxplot(data=merged, x='classification', y='pnl_mean')
plt.title('Daily Mean Closed PnL per Trade by Sentiment Classification')
plt.show()

merged.groupby('classification')['pnl_mean'].describe()
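
The boxplot and `describe()` output show the distributions, but they do not say whether the differences between sentiment classes exceed what chance would produce. One possible check, sketched below, is a Kruskal-Wallis H-test from `scipy.stats`, which does not assume normally distributed PnL. The helper `kruskal_by_sentiment` and the `demo` frame are hypothetical, and the data is random, so the printed numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for `merged` (random values, not real results).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "classification": np.repeat(["Fear", "Neutral", "Greed"], 30),
    "pnl_mean": np.concatenate([
        rng.normal(0.0, 1.0, 30),
        rng.normal(0.2, 1.0, 30),
        rng.normal(0.4, 1.0, 30),
    ]),
})


def kruskal_by_sentiment(df, group_col="classification", value_col="pnl_mean"):
    """Run a Kruskal-Wallis H-test across sentiment classes.

    Rows with a missing class or value are dropped before testing.
    """
    clean = df.dropna(subset=[group_col, value_col])
    samples = [g[value_col].to_numpy() for _, g in clean.groupby(group_col)]
    return stats.kruskal(*samples)


h_stat, p_value = kruskal_by_sentiment(demo)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

On the notebook's data the call would be `kruskal_by_sentiment(merged)`. A small p-value would suggest that at least one sentiment class has a different PnL distribution, though it would not identify which one.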

## Spike Day Investigation

In [ ]:
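# Identify the 10 days with the highest total closed PnL
top_days = merged.sort_values('pnl_sum', ascending=False).head(10)
# A bare expression is only rendered when it is the last line of a cell, so print explicitly
print(top_days)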

# Investigate top spike day
spike_day = top_days.iloc[0]['date']
day_trades = historical[historical['date'] == spike_day]

top_accounts = (day_trades.groupby('Account')['Closed PnL']
                .sum().sort_values(ascending=False).head(10))
top_accounts.plot(kind='bar', title=f'Top Accounts on {spike_day}', figsize=(10,5))
plt.ylabel('Total Closed PnL')
plt.show()
top_accounts
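
The bar chart shows which accounts earned the most on the spike day. To express how concentrated the gains were as a single number, a sketch like the one below computes the share of the day's gross positive PnL captured by the top accounts. The helper `top_n_share` is hypothetical and the trades are made-up values; only the `Account` and `Closed PnL` column names come from the notebook.

```python
import pandas as pd

# Toy trades for a single day (made-up accounts and values).
day_trades = pd.DataFrame({
    "Account": ["0xA", "0xA", "0xB", "0xC", "0xD"],
    "Closed PnL": [600.0, 200.0, 150.0, 50.0, -100.0],
})


def top_n_share(trades, n=3, account_col="Account", pnl_col="Closed PnL"):
    """Fraction of the day's gross positive PnL earned by the n most profitable accounts."""
    per_account = trades.groupby(account_col)[pnl_col].sum()
    # Keep only accounts that finished the day in profit.
    gains = per_account[per_account > 0].sort_values(ascending=False)
    if gains.empty:
        return float("nan")
    return float(gains.head(n).sum() / gains.sum())


share = top_n_share(day_trades, n=1)
print(f"Top account share of positive PnL: {share:.1%}")
```

The denominator is gross positive PnL rather than net PnL because losing accounts can push the net total toward zero, which would make the ratio unstable or larger than 1.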

## Summary of Findings
- The largest daily PnL spike occurs on 19 Feb 2025.
- The spike is driven primarily by a few accounts with large positive closed PnL.
- The alignment between Fear/Greed sentiment and PnL is not yet quantified; lagged correlations are a natural next step.
- Potential strategy, still to be validated: monitor sentiment shifts that precede high-volume, high-PnL trading days.
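
The lagged-correlation idea from the summary could be sketched as follows. The example assumes a numeric daily sentiment score is available, whereas the notebook only uses the categorical `classification` column, so the `sentiment` series here is an assumption. The helper `lagged_corr` is hypothetical, and the data is synthetic, built so that PnL follows sentiment with a two-day delay.

```python
import numpy as np
import pandas as pd


def lagged_corr(pnl, sentiment, max_lag=5):
    """Pearson correlation between PnL on day t and sentiment on day t - lag."""
    return pd.Series(
        {lag: pnl.corr(sentiment.shift(lag)) for lag in range(max_lag + 1)},
        name="corr",
    )


# Synthetic daily series (made up): PnL follows sentiment with a 2-day delay.
days = pd.date_range("2025-01-01", periods=60, freq="D")
rng = np.random.default_rng(0)
sentiment = pd.Series(rng.uniform(0, 100, len(days)), index=days)
pnl = 10 * sentiment.shift(2) + rng.normal(0, 50, len(days))

result = lagged_corr(pnl, sentiment, max_lag=5)
print(result.round(2))
print("Strongest lag:", result.abs().idxmax())
```

Both series need a daily index with no gaps for `shift` to represent calendar days. With missing days, reindexing to a full date range first would keep the lags meaningful.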