# KPIs & Hypothesis Testing

## Purpose
In this notebook, we calculate **performance KPIs** and run **statistical hypothesis tests** to evaluate the effectiveness and consistency of the trading signals from Telegram.

---

## KPIs to compute:
- Overall success rate (TP hits)
- Success rate per TP level (TP40, TP60, TP80, TP100)
- Success rate by symbol (top performers)
- Hierarchical TP achievement (sequential TP hit pattern)
- Monthly performance summary
- Sharpe ratio (risk-adjusted return per month)
- Volatility (standard deviation of returns per month)

---

## Hypothesis Testing Questions:
- Is there a significant difference in performance between **Long** and **Short** signals?
- Are signals sent at certain **times of day** more likely to succeed?
- Does the **entry price level** affect success probability?
- Has signal performance **changed over time** (early vs recent months)?

> **Note:** Stop Loss (SL) is not considered in this analysis. It assumes every position is held until a TP level is hit or the signal fails, which would make the strategy highly risky in live trading.
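
As a minimal sketch of how the first question (Long vs Short) could be tested, the block below runs a two-sided two-proportion z-test on hit counts. The function name `two_proportion_ztest` and the counts are hypothetical and are not taken from the dataset. In practice the counts would come from grouping the signals by a direction column, which this notebook has not shown.

```python
import math

def two_proportion_ztest(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, for illustration only (not from the signals file)
z_stat, p_val = two_proportion_ztest(hits_a=450, n_a=500, hits_b=400, n_b=500)
print(f"z = {z_stat:.2f}, p = {p_val:.4f}")
```

The normal approximation behind this test is only reasonable when each group has a fair number of hits and misses; with very small groups an exact test would be a safer choice.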


In [2]:
import pandas as pd
df = pd.read_csv("../data/clean/signals_tp_clean_with_returns.csv", parse_dates=["timestamp"])


In [3]:
# Overall Take Profit Hit Rate
tp_hit_cols = ["tp_40_hit", "tp_60_hit", "tp_80_hit", "tp_100_hit"]

# Ensure columns are boolean
for col in tp_hit_cols:
    df[col] = df[col].astype(bool)

# Calculate hit rates
tp_hit_rate = (df[tp_hit_cols].mean() * 100).round(1)  # round after scaling to avoid float artifacts

# Format as DataFrame
tp_hit_rate_df = tp_hit_rate.reset_index()
tp_hit_rate_df.columns = ["TP Level", "Hit Rate (%)"]

tp_hit_rate_df


|   | TP Level | Hit Rate (%) |
|---|---|---|
| 0 | tp_40_hit | 86.9 |
| 1 | tp_60_hit | 80.4 |
| 2 | tp_80_hit | 70.3 |
| 3 | tp_100_hit | 52.6 |


In [4]:
# Sequential TP Hit Rate (hierarchical success)
sequential_hits = {
    "tp_40": df["tp_40_hit"].mean(),
    "tp_60": df[df["tp_40_hit"]]["tp_60_hit"].mean(),
    "tp_80": df[df["tp_40_hit"] & df["tp_60_hit"]]["tp_80_hit"].mean(),
    "tp_100": df[df["tp_40_hit"] & df["tp_60_hit"] & df["tp_80_hit"]]["tp_100_hit"].mean()
}

# Convert to DataFrame
sequential_hits_df = pd.DataFrame(
    list(sequential_hits.items()),
    columns=["TP Level", "Sequential Hit Rate (%)"]
)
sequential_hits_df["Sequential Hit Rate (%)"] = (sequential_hits_df["Sequential Hit Rate (%)"] * 100).round(2)

sequential_hits_df


|   | TP Level | Sequential Hit Rate (%) |
|---|---|---|
| 0 | tp_40 | 86.9 |
| 1 | tp_60 | 92.57 |
| 2 | tp_80 | 87.34 |
| 3 | tp_100 | 74.89 |


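The non-sequential hit rate shows how often each TP level is reached, regardless of whether the lower levels were hit.
The sequential hit rate is conditional: it shows how often a TP level is reached among the signals that already hit every lower level, which reflects step-by-step achievement of the targets.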
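
To make the distinction concrete, here is a small sketch on a made-up five-row frame (the data is illustrative, not from the signals file). It computes the unconditional rate for `tp_60_hit`, the conditional rate given that `tp_40_hit` is true, and the joint rate of hitting both levels.

```python
import pandas as pd

# Toy data, for illustration only
toy = pd.DataFrame({
    "tp_40_hit": [True, True, True, True, False],
    "tp_60_hit": [True, True, True, False, False],
})

# Share of all signals that reached TP60
unconditional = toy["tp_60_hit"].mean()
# Share of signals that reached TP60 among those that reached TP40
conditional = toy.loc[toy["tp_40_hit"], "tp_60_hit"].mean()
# Share of all signals that reached both TP40 and TP60
joint = (toy["tp_40_hit"] & toy["tp_60_hit"]).mean()

# The joint rate equals P(TP40) * P(TP60 | TP40)
product = toy["tp_40_hit"].mean() * conditional
print(unconditional, conditional, joint, product)
```

The cell above reports the conditional form. A joint rate can never exceed the corresponding conditional rate, so the two should not be compared directly.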


In [5]:
# Load dataset
df = pd.read_csv("../data/clean/signals_tp_clean_with_returns.csv", parse_dates=["timestamp"])

# Convert hit columns to boolean if needed
tp_hit_cols = ["tp_40_hit", "tp_60_hit", "tp_80_hit", "tp_100_hit"]
for col in tp_hit_cols:
    df[col] = df[col].astype(bool)

# Add hierarchical hit tracking
df["tp_60_seq"] = df["tp_40_hit"] & df["tp_60_hit"]
df["tp_80_seq"] = df["tp_40_hit"] & df["tp_60_hit"] & df["tp_80_hit"]
df["tp_100_seq"] = df["tp_40_hit"] & df["tp_60_hit"] & df["tp_80_hit"] & df["tp_100_hit"]

# Group by symbol and calculate sequential success rate
symbol_success = df.groupby("symbol").agg(
    total_signals=("tp_40_hit", "count"),
    tp_40_hit_rate=("tp_40_hit", "mean"),
    tp_60_hit_rate=("tp_60_seq", "mean"),
    tp_80_hit_rate=("tp_80_seq", "mean"),
    tp_100_hit_rate=("tp_100_seq", "mean")
).reset_index()

# Convert rates to percentage
for col in ["tp_40_hit_rate", "tp_60_hit_rate", "tp_80_hit_rate", "tp_100_hit_rate"]:
    symbol_success[col] = (symbol_success[col] * 100).round(2)

# Filter to only symbols with 10 or more signals
symbol_success = symbol_success[symbol_success["total_signals"] >= 10]

# Sort by highest TP_40 success rate (you can change this)
top_symbols = symbol_success.sort_values(by="tp_40_hit_rate", ascending=False).head(10)

top_symbols

|   | symbol | total_signals | tp_40_hit_rate | tp_60_hit_rate | tp_80_hit_rate | tp_100_hit_rate |
|---|---|---|---|---|---|---|
| 6 | 1000BONKUSDT | 11 | 100.0 | 100.0 | 81.82 | 81.82 |
| 11 | 1000PEPEUSDT | 12 | 100.0 | 100.0 | 91.67 | 83.33 |
| 62 | BANANAUSDT | 11 | 100.0 | 100.0 | 90.91 | 36.36 |
| 346 | TRUUSDT | 10 | 100.0 | 100.0 | 80.0 | 70.0 |
| 52 | AVAAIUSDT | 14 | 100.0 | 92.86 | 71.43 | 57.14 |
| 67 | BELUSDT | 11 | 100.0 | 81.82 | 72.73 | 36.36 |
| 124 | EDUUSDT | 10 | 100.0 | 100.0 | 100.0 | 70.0 |
| 200 | LDOUSDT | 10 | 100.0 | 100.0 | 100.0 | 40.0 |
| 290 | RUNEUSDT | 10 | 100.0 | 100.0 | 70.0 | 70.0 |
| 238 | NFPUSDT | 12 | 100.0 | 83.33 | 83.33 | 50.0 |
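
With only 10 to 14 signals per symbol, a 100% hit rate carries wide uncertainty. As a rough sketch, a Wilson score interval can put bounds on such a rate. The helper name `wilson_interval` is hypothetical, and the 95% level (z = 1.96) is an assumption rather than something this notebook specifies.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p_hat = hits / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Example: 10 hits out of 10 signals, as for several symbols in the table above
low, high = wilson_interval(hits=10, n=10)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

For 10 hits out of 10 the lower bound is roughly 72%, so a perfect record on a sample this small is still compatible with a noticeably lower underlying hit rate.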


In [6]:
# Load the data
df = pd.read_csv("../data/clean/signals_tp_clean_with_returns.csv", parse_dates=["timestamp"])

# Extract month
df["month"] = df["timestamp"].dt.to_period("M").astype(str)

# Group by month and compute the mean and standard deviation (volatility) of returns
monthly_stats = df.groupby("month").agg(
    avg_return=("estimated_return", "mean"),
    std_dev=("estimated_return", "std")
).reset_index()

# Compute Sharpe Ratio (assuming risk-free rate = 0)
monthly_stats["sharpe_ratio"] = (monthly_stats["avg_return"] / monthly_stats["std_dev"]).round(2)

# Round values for readability
monthly_stats["avg_return"] = monthly_stats["avg_return"].round(2)
monthly_stats["std_dev"] = monthly_stats["std_dev"].round(2)

# Show result
monthly_stats



|   | month | avg_return | std_dev | sharpe_ratio |
|---|---|---|---|---|
| 0 | 2024-01 | 64.04 | 52.54 | 1.22 |
| 1 | 2024-02 | 70.5 | 46.77 | 1.51 |
| 2 | 2024-03 | 73.2 | 46.91 | 1.56 |
| 3 | 2024-04 | 78.69 | 39.94 | 1.97 |
| 4 | 2024-05 | 75.7 | 43.32 | 1.75 |
| 5 | 2024-06 | 76.92 | 39.45 | 1.95 |
| 6 | 2024-07 | 77.02 | 40.8 | 1.89 |
| 7 | 2024-08 | 77.08 | 42.83 | 1.8 |
| 8 | 2024-09 | 73.23 | 44.52 | 1.65 |
| 9 | 2024-10 | 71.36 | 45.88 | 1.56 |
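
The monthly averages differ from month to month (for example 64.04 in 2024-01 versus 78.69 in 2024-04), which relates to the question of whether performance changed over time. One possible way to check whether such a gap is more than noise is a permutation test on per-signal returns, sketched below. The function name `permutation_test_mean_diff`, the split into "early" and "recent" groups, and the sample values are all hypothetical and are not drawn from the dataset.

```python
import random
from statistics import mean

def permutation_test_mean_diff(early, recent, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in mean return."""
    rng = random.Random(seed)
    observed = mean(recent) - mean(early)
    pooled = list(early) + list(recent)
    n_early = len(early)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference
        rng.shuffle(pooled)
        diff = mean(pooled[n_early:]) - mean(pooled[:n_early])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical per-signal returns, for illustration only
early_returns = [40, 60, 0, 100, 80, 40, 0, 60]
recent_returns = [80, 100, 60, 100, 80, 100, 60, 80]
diff, p_value = permutation_test_mean_diff(early_returns, recent_returns)
print(f"mean difference = {diff:.2f}, p = {p_value:.4f}")
```

A permutation test is used here because it does not assume normally distributed returns; a Welch t-test would be a reasonable alternative if that assumption is acceptable.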
