# 04_SIGNAL_STATS_ANALYSIS

## Business Objectives

This project seeks to evaluate the real effectiveness of trading signals shared through a Telegram group. The specific business objectives include:

- Assessing the actual performance of these signals when compared to real historical market prices (from Bybit).
- Building a reliable and structured database of all historical signals for further analysis.
- Determining whether signals consistently reach their Take Profit (TP) targets or get stopped out at a loss (SL).
- Developing dashboards and visual tools to understand patterns, performance by cryptocurrency, and signal direction (Long vs Short).
- Laying the groundwork for advanced analysis, including machine learning models or comparisons with other sources (e.g., WhatsApp groups).

## Key Performance Indicators (KPIs)

The following metrics have been defined to support these objectives:

- **Total number of signals sent**  
  A raw measure of the channel's activity and coverage.

- **Percentage of signals that hit TP1, TP3, or TP4**  
  Primary metric for evaluating signal success (partial or full).

- **Success rate by direction (Long vs Short)**  
  Helps determine whether the channel performs better in bullish or bearish conditions.

- **Average time to reach TP or SL**  
  A timing metric useful for evaluating volatility and risk.

- **Failure or no-result rate**  
  Measures how often signals fail or cannot be evaluated, providing insight into potential risk or noise.

- **Most successful cryptocurrencies**  
  Identifies the symbols with the best historical performance, guiding potential trading focus.

- **Symbol usage frequency**  
  Detects repeated recommendations, possible bias, or overexposure.

- **Duplicate or inconsistent signal alerts**  
  Validates whether signals are being reused or inconsistently updated, which may suggest operational issues or manipulation.

---
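
The last KPI, duplicate or inconsistent signal alerts, can be approached in several ways. The sketch below is one possibility, assuming a "repeat" means the same `symbol` and `direction` posted again within a short window. The 60-minute window, the `flag_repeats` helper, and the toy rows are illustrative assumptions, not values or code from the project.

```python
import pandas as pd

# Toy rows for illustration only (not taken from the real dataset)
signals = pd.DataFrame({
    "symbol": ["WIFUSDT", "WIFUSDT", "MYROUSDT", "WIFUSDT"],
    "direction": ["Long", "Long", "Short", "Long"],
    "timestamp": pd.to_datetime(
        ["2024-01-01 10:00", "2024-01-01 10:30", "2024-01-01 11:00", "2024-01-02 10:00"],
        utc=True,
    ),
})

def flag_repeats(df, window="60min"):
    """Mark signals that repeat the same symbol and direction within `window` (hypothetical helper)."""
    df = df.sort_values("timestamp").copy()
    # Time elapsed since the previous signal with the same symbol and direction
    gap = df.groupby(["symbol", "direction"])["timestamp"].diff()
    # The first signal of each group has no predecessor (NaT) and is never a repeat
    df["is_repeat"] = gap.notna() & (gap <= pd.Timedelta(window))
    return df

flagged = flag_repeats(signals)
print(flagged[["symbol", "direction", "timestamp", "is_repeat"]])
```

In this toy example only the second WIFUSDT Long signal is flagged, because it follows the first one by 30 minutes. The right window length depends on how the channel actually re-posts or updates signals.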


In [None]:
# Install SciPy if it is not already available in this environment
!pip install scipy

# Core libraries
import os
from datetime import datetime

import numpy as np
import pandas as pd

# Plotly for interactive charts
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from plotly.offline import plot

# Streamlit for dashboard output
import streamlit as st

# Statistics
from scipy.stats import iqr

# Configure Plotly to render inside Jupyter
pio.renderers.default = "notebook_connected"


In [None]:
# Load processed signal data (relative path, consistent with the other cells in this notebook)
file_path = "../data/clean/signals_all_tp_results.csv"
df = pd.read_csv(file_path)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# TP columns to evaluate
tp_levels = ["tp_40", "tp_60", "tp_80", "tp_100"]

# Count signals with a non-null result per TP level, split by direction
# (a non-null *_result value is treated as a hit)
tp_hits_long = {}
tp_hits_short = {}

for tp in tp_levels:
    col = tp + "_result"
    tp_hits_long[tp] = df[(df["direction"] == "Long") & (df[col].notna())].shape[0]
    tp_hits_short[tp] = df[(df["direction"] == "Short") & (df[col].notna())].shape[0]

# Print result counts
print("✅ TP hit counts by direction:")
print("Long:", tp_hits_long)
print("Short:", tp_hits_short)



In [None]:
import os

# TP hits raw count chart

# Hard-coded counts (these override the values computed in the previous cell)
tp_hits_long = {'tp_40': 729, 'tp_60': 698, 'tp_80': 643, 'tp_100': 619}
tp_hits_short = {'tp_40': 644, 'tp_60': 610, 'tp_80': 578, 'tp_100': 538}
tp_levels = list(tp_hits_long.keys())


fig = go.Figure()

#  LONG
fig.add_bar(
    x=tp_levels,
    y=[tp_hits_long[tp] for tp in tp_levels],
    name='Long',
    marker_color='green',
    text=[tp_hits_long[tp] for tp in tp_levels],
    textposition='outside'
)

#  SHORT
fig.add_bar(
    x=tp_levels,
    y=[tp_hits_short[tp] for tp in tp_levels],
    name='Short',
    marker_color='red',
    text=[tp_hits_short[tp] for tp in tp_levels],
    textposition='outside'
)

# Styling
fig.update_layout(
    barmode='group',
    title='TP Hits by Signal Direction (Long vs Short) – Raw Count',
    title_x=0.5,
    title_font_size=18,
    xaxis_title='Take Profit Level',
    yaxis_title='Number of Signals',
    plot_bgcolor='white',
    bargap=0.15,
    legend=dict(x=0.85, y=1.05)
)

# Create the output folder if it doesn't exist (must match the save path below)
os.makedirs("../outputs/plots", exist_ok=True)

# save html
plot(fig, filename="../outputs/plots/tp_hits_raw_count.html", auto_open=False)

# show on streamlit
st.subheader("TP Hits by Signal Direction – Raw Count")
st.caption("This chart shows the raw number of trading signals that reached each TP level, broken down by direction.")
st.plotly_chart(fig, use_container_width=True)



## Key Insights: TP Hits by Signal Direction (Raw Count)

#### 1. Higher volume of Long signals
Across all TP levels, more Long signals than Short signals hit their take profit. Because these are raw counts, the gap likely reflects a larger number of Long setups shared by the source and says nothing yet about success rates.

#### 2. TP hit drop-off
Both Long and Short signals show a gradual decline in hits as the TP level increases from TP 40 to TP 100. This suggests that more ambitious targets are harder to reach.

#### 3. Stability of Long signal performance
The decrease in TP hits for Long signals is relatively smooth and consistent, indicating a stable pattern of target achievement across tiers.

#### 4. Short signals trail in count
Short signals consistently produce fewer hits than Long ones. This may simply reflect lower Short volume; a per-direction hit rate is needed to tell whether Shorts actually perform worse in the period analyzed.
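
To put a number on the drop-off described above, the sketch below computes the relative decline in hits between TP 40 and TP 100 from the counts hard-coded in the chart cell. The `relative_drop` helper is introduced here only for illustration.

```python
# Counts copied from the chart cell above
tp_hits_long = {"tp_40": 729, "tp_60": 698, "tp_80": 643, "tp_100": 619}
tp_hits_short = {"tp_40": 644, "tp_60": 610, "tp_80": 578, "tp_100": 538}

def relative_drop(hits, first="tp_40", last="tp_100"):
    """Percentage decline in hit count between two TP levels (hypothetical helper)."""
    return round((hits[first] - hits[last]) / hits[first] * 100, 1)

long_drop = relative_drop(tp_hits_long)
short_drop = relative_drop(tp_hits_short)

print(f"Long:  {long_drop}% fewer hits at tp_100 than at tp_40")
print(f"Short: {short_drop}% fewer hits at tp_100 than at tp_40")
```

With these counts the decline is 15.1% for Long and 16.5% for Short, so both directions lose a similar share of hits between the lowest and highest target.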



In [None]:
#  TP columns to evaluate 
tp_levels = ["tp_40", "tp_60", "tp_80", "tp_100"]

#  Load processed signal data 
file_path = "../data/clean/signals_all_tp_results.csv"
df = pd.read_csv(file_path)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

#  Count total signals by direction 
total_long = df[df["direction"] == "Long"].shape[0]
total_short = df[df["direction"] == "Short"].shape[0]

#  Count TP hits per direction 
tp_hits_long = {}
tp_hits_short = {}

for tp in tp_levels:
    col = f"{tp}_result"
    tp_hits_long[tp] = df[(df["direction"] == "Long") & (df[col].notna())].shape[0]
    tp_hits_short[tp] = df[(df["direction"] == "Short") & (df[col].notna())].shape[0]

# Compute percentages 
tp_pct_long = {tp: round(tp_hits_long[tp] / total_long * 100, 1) for tp in tp_levels}
tp_pct_short = {tp: round(tp_hits_short[tp] / total_short * 100, 1) for tp in tp_levels}

#  Prepare plot 
long_values = [tp_pct_long[tp] for tp in tp_levels]
short_values = [tp_pct_short[tp] for tp in tp_levels]

fig = go.Figure()

fig.add_bar(
    x=tp_levels,
    y=long_values,
    name='Long',
    marker_color='green',
    text=[f"{val}%" for val in long_values],
    textposition='outside'
)

fig.add_bar(
    x=tp_levels,
    y=short_values,
    name='Short',
    marker_color='red',
    text=[f"{val}%" for val in short_values],
    textposition='outside'
)

fig.update_layout(
    barmode='group',
    title='TP Hits by Signal Direction (Long vs Short) – % of Hits per Direction',
    title_x=0.5,
    title_font_size=18,
    xaxis_title='Take Profit Level',
    yaxis_title='Percentage of Signals (%)',
    yaxis=dict(tick0=0, dtick=5),
    plot_bgcolor='white',
    bargap=0.15,
    legend=dict(x=0.85, y=1.05)
)

# === Export to HTML ===
os.makedirs("../outputs/plots", exist_ok=True)
plot(fig, filename="../outputs/plots/tp_hits_by_direction.html", auto_open=False)

#  Display in Streamlit 
st.subheader("TP Hits by Signal Direction (% per Direction)")
st.caption("This chart shows what percentage of Long or Short trades hit each TP level relative to the total signals of that direction.")
st.plotly_chart(fig, use_container_width=True)



## TP Hits by Signal Direction (% per Direction)

### Key Takeaways

#### 1. Consistent performance across directions  
Both Long and Short signals maintain similar percentages of TP hits across all levels. The difference between directions is minimal, usually within 1–2 percentage points.

#### 2. Short signals slightly outperform Longs  
Short signals consistently show slightly higher success rates in percentage terms, indicating that although they occur less frequently, they tend to reach their targets more efficiently.

#### 3. TP 40 and TP 60 show the highest success rates  
Take Profit levels 40 and 60 show the highest hit percentages for both directions, making them the most achievable targets overall.

#### 4. Gradual decline at higher TP levels  
The hit rate decreases as the TP level increases. TP 100 has the lowest success rate, suggesting that more ambitious targets are harder to reach.

#### 5. Practical implication  
Traders may benefit from setting more conservative profit targets (e.g., TP 40 or TP 60), particularly when following Short signals, which show slightly better performance in relative terms.
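
The per-direction percentages can also be computed without an explicit loop. The following is a minimal sketch of that idea on a made-up four-row frame; the `"hit"` values and the `toy` DataFrame are placeholders, and only the column naming (`direction`, `tp_XX_result`) follows the dataset used in this notebook.

```python
import pandas as pd

tp_levels = ["tp_40", "tp_60"]
result_cols = [f"{tp}_result" for tp in tp_levels]

# Made-up rows for illustration; None means no result was recorded for that level
toy = pd.DataFrame({
    "direction": ["Long", "Long", "Short", "Short"],
    "tp_40_result": ["hit", None, "hit", "hit"],
    "tp_60_result": ["hit", None, None, "hit"],
})

# notna() gives True/False per signal; the group mean of booleans is the hit rate
hit_rate = (toy[result_cols].notna().groupby(toy["direction"]).mean() * 100).round(1)
print(hit_rate)
```

Dividing by each direction's own total is what makes Long and Short comparable here, which the raw counts alone cannot do.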


In [None]:
#  TP columns to evaluate 
tp_levels = ["tp_40", "tp_60", "tp_80", "tp_100"]

#  Load processed signal data 
file_path = "../data/clean/signals_all_tp_results.csv"
df = pd.read_csv(file_path)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

#  Compute global % of hits per TP level 
total_signals = len(df)
tp_hits_global = {}
tp_pct_global = {}

for tp in tp_levels:
    col = f"{tp}_result"
    tp_hits_global[tp] = df[df[col].notna()].shape[0]
    tp_pct_global[tp] = round(tp_hits_global[tp] / total_signals * 100, 1)

#  Prepare plot 
values = [tp_pct_global[tp] for tp in tp_levels]

fig = go.Figure()
fig.add_bar(
    x=tp_levels,
    y=values,
    marker_color='steelblue',
    text=[f"{val}%" for val in values],
    textposition='outside'
)

fig.update_layout(
    title='Global TP Hit Rate by Take Profit Level',
    title_x=0.5,
    title_font_size=18,
    xaxis_title='Take Profit Level',
    yaxis_title='Percentage of Signals (%)',
    yaxis=dict(tick0=0, dtick=5),
    plot_bgcolor='white',
    bargap=0.25,
    showlegend=False
)

#  Export to HTML 
os.makedirs("../outputs/plots", exist_ok=True)
plot(fig, filename="../outputs/plots/tp_hit_rate_global.html", auto_open=False)

# === Display in Streamlit ===
st.subheader("Global TP Hit Rate by Take Profit Level")
st.caption("This chart shows the overall percentage of trading signals that successfully reached each Take Profit level, regardless of signal direction.")
st.plotly_chart(fig, use_container_width=True)


## Global TP Hit Rate by Take Profit Level

### Context
This chart shows the overall percentage of trading signals that successfully reached each Take Profit (TP) level, regardless of whether the signal was Long or Short. It provides a high-level view of the effectiveness of TP targets across all signals.

### Observations
- **TP_40** was hit by **62.4%** of all signals.
- **TP_60** was hit by **59.5%**.
- **TP_80** dropped to **55.5%**.
- **TP_100** was hit by only **52.6%**.

### Conclusions
- There is a clear inverse relationship between TP level and hit rate: the higher the TP target, the lower the success rate.
- TP_40 has the highest global hit rate, making it potentially more suitable for conservative or short-term strategies.
- TP_100 carries higher risk: its hit rate (52.6%) is roughly 10 percentage points below that of TP_40 (62.4%).

### Key Insight
Lower and mid-range TP levels are hit more often than TP_100. This pattern supports the idea of setting more realistic profit targets, especially in volatile market conditions.


In [None]:
#  Load data 
file_path = "../data/clean/signals_all_tp_results.csv"
df = pd.read_csv(file_path)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

#  Extract hour from timestamp 
df["hour"] = df["timestamp"].dt.hour

#  Count signals per hour 
hour_counts = df["hour"].value_counts().sort_index()

#  Create plot 
fig = go.Figure()
fig.add_bar(
    x=hour_counts.index,
    y=hour_counts.values,
    marker_color='mediumslateblue',
    text=hour_counts.values,
    textposition='outside'
)

fig.update_layout(
    title="Distribution of Trading Signals by Hour of the Day (UTC)",
    title_x=0.5,
    xaxis_title="Hour of the Day (24h format, UTC)",
    yaxis_title="Number of Signals",
    xaxis=dict(tickmode='linear'),
    plot_bgcolor='white',
    bargap=0.15,
    showlegend=False
)

#  Save HTML 
os.makedirs("../outputs/plots", exist_ok=True)
plot(fig, filename="../outputs/plots/signal_distribution_by_hour.html", auto_open=False)

#  Streamlit display
st.subheader("Signal Distribution by Hour")
st.caption("This chart shows the number of trading signals issued during each hour of the day in UTC.")
st.plotly_chart(fig, use_container_width=True)



# Signal Distribution by Hour (UTC)

## Description
This bar chart shows the number of trading signals issued at each hour of the day, using Coordinated Universal Time (UTC) as the reference. It helps identify when the signal provider is most active throughout the 24-hour cycle.

## Key Insights
- The highest signal activity occurs between **10:00 and 20:00 UTC**, peaking at **18:00 UTC** with **198 signals**.
- A **clear upward trend** begins at **8:00 UTC**, followed by sustained high volume through the afternoon and early evening.
- Signal activity drops significantly outside the 8:00–20:00 UTC window, especially during the early morning hours (1:00–6:00 UTC), which show very low counts.
- This pattern suggests that signal generation aligns with **European and early US trading hours**, potentially overlapping with periods of high market liquidity and volatility.

## Implications
Understanding the time distribution of signals is useful for:
- Setting up alert systems during peak hours.
- Evaluating whether the timing of signals aligns with your preferred trading schedule.
- Planning automation or manual execution windows to match signal issuance patterns.
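
To check whether signal timing matches a personal trading schedule, the UTC hours can be converted to a local time zone before counting. The sketch below uses three made-up timestamps and `Europe/Madrid` purely as an example zone; both are assumptions. It also reindexes the counts to all 24 hours so that hours with no signals appear as zero instead of being dropped.

```python
import pandas as pd

# Made-up timestamps for illustration, parsed as UTC like the notebook's `timestamp` column
ts = pd.Series(pd.to_datetime(
    ["2024-01-15 18:05", "2024-01-15 18:40", "2024-01-16 09:10"], utc=True
))

# Signals per UTC hour, keeping hours with zero signals
utc_counts = ts.dt.hour.value_counts().reindex(range(24), fill_value=0)

# The same signals expressed in a local time zone (example zone, adjust as needed)
local_hours = ts.dt.tz_convert("Europe/Madrid").dt.hour
local_counts = local_hours.value_counts().reindex(range(24), fill_value=0)

print(utc_counts[utc_counts > 0])
print(local_counts[local_counts > 0])
```

In January, Madrid is one hour ahead of UTC, so the two 18:00 UTC signals land in the 19:00 local bucket. The offset changes with daylight saving time, which `tz_convert` handles per timestamp.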


In [None]:
# === Clean symbol column and count frequencies ===
df["symbol"] = df["symbol"].str.upper().str.strip()
symbol_counts = df["symbol"].value_counts().head(20)

# === Plot ===
fig = go.Figure()
fig.add_bar(
    x=symbol_counts.index,
    y=symbol_counts.values,
    marker_color='darkorange',
    text=symbol_counts.values,
    textposition='outside'
)

fig.update_layout(
    title='Top 20 Most Frequent Trading Symbols',
    title_x=0.5,
    title_font_size=18,
    xaxis_title='Symbol',
    yaxis_title='Number of Signals',
    plot_bgcolor='white',
    bargap=0.25
)

# === Ensure correct output directory exists ===
import os
os.makedirs("../outputs/plots", exist_ok=True)

# === Save to correct path ===
plot(fig, filename="../outputs/plots/tp_signal_count_by_symbol.html", auto_open=False)

# === Display in Streamlit ===
st.subheader("Top 20 Most Frequent Trading Symbols")
st.caption("This chart displays the most commonly signaled trading pairs across the dataset.")
st.plotly_chart(fig, use_container_width=True)



# Top 20 Most Frequent Trading Symbols

## Description
This chart displays the 20 trading pairs with the highest number of signals in the dataset. It helps identify which assets are most frequently targeted by the Telegram signal provider.

## Key Observations
- 1000RATSUSDT, MYROUSDT, and WIFUSDT are the top 3 symbols with 25, 24, and 23 signals respectively.
- Most symbols in the top 20 received between 14 and 18 signals.
- The list is dominated by newly launched or meme-like tokens, suggesting a focus on high-volatility opportunities.

## Interpretation
- The signal provider appears to target assets with high momentum or recent market interest.
- High frequency does not necessarily mean higher success; further analysis is needed to assess win rates and profitability per symbol.
- Traders following these signals should be aware of the associated risks of low-liquidity and highly volatile assets.
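
One way to quantify how concentrated the channel is on a few symbols is the share of all signals that the top N symbols account for. The sketch below assumes a `symbol` column cleaned the same way as in the cell above; the tickers are placeholders and `top_n_share` is a hypothetical helper.

```python
import pandas as pd

# Placeholder tickers for illustration only
symbols = pd.Series(["AAAUSDT"] * 5 + ["BBBUSDT"] * 3 + ["CCCUSDT"] * 2)

def top_n_share(symbol_col, n):
    """Percentage of all signals that belong to the n most frequent symbols."""
    shares = symbol_col.str.upper().str.strip().value_counts(normalize=True)
    return round(shares.head(n).sum() * 100, 1)

share_top1 = top_n_share(symbols, 1)
share_top2 = top_n_share(symbols, 2)
print(f"Top 1: {share_top1}%  |  Top 2: {share_top2}%")
```

A low top-20 share would indicate that signals are spread across many symbols, while a high share would point to possible overexposure.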


In [None]:
# === Load data ===
file_path = "../data/clean/signals_all_tp_results.csv"
df = pd.read_csv(file_path)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# === Define TP Levels ===
tp_levels = ["tp_40", "tp_60", "tp_80", "tp_100"]

# === Calculate hit rates per symbol and TP level ===
symbol_group = df.groupby("symbol")
tp_stats = []

for symbol, group in symbol_group:
    total = len(group)
    if total < 10:
        continue  # skip symbols with fewer than 10 signals
    row = {"symbol": symbol, "total_signals": total}
    for tp in tp_levels:
        hits = group[f"{tp}_result"].notna().sum()
        row[f"{tp}_hits"] = hits
        row[f"{tp}_hit_rate"] = round(hits / total * 100, 1)
    tp_stats.append(row)

df_tp_stats = pd.DataFrame(tp_stats)

# === Identify best-performing symbol per TP Level ===
summary = {}
for tp in tp_levels:
    best = df_tp_stats.loc[df_tp_stats[f"{tp}_hit_rate"].idxmax()]
    summary[tp] = {
        "symbol": best["symbol"],
        "hit_rate": best[f"{tp}_hit_rate"],
        "total_signals": best["total_signals"],
        "hits": best[f"{tp}_hits"]
    }

df_best_per_tp = pd.DataFrame.from_dict(summary, orient="index")
df_best_per_tp.index.name = "TP Level"
df_best_per_tp.reset_index(inplace=True)

# === Show result ===
df_best_per_tp



In [None]:
# === Save Plot: Best Performing Symbols per TP Level (with Symbol Labels) ===

# Build the combined label (TP level + symbol) for the X axis
x_labels = [f"{row['TP Level']}<br>{row['symbol']}" for _, row in df_best_per_tp.iterrows()]

fig = go.Figure()
fig.add_bar(
    x=x_labels,
    y=df_best_per_tp["hit_rate"],
    text=[f"{val}%" for val in df_best_per_tp["hit_rate"]],
    textposition='outside',
    marker_color='mediumseagreen'
)

fig.update_layout(
    title='Best Performing Symbols by Take Profit Level (Symbols with ≥ 10 Signals)',
    title_x=0.5,
    title_font_size=18,
    xaxis_title='TP Level and Symbol',
    yaxis_title='Hit Rate (%)',
    yaxis=dict(tick0=0, dtick=10),
    plot_bgcolor='white',
    bargap=0.25,
    showlegend=False
)

# === Save in correct folder ===
import os
os.makedirs("../outputs/plots", exist_ok=True)
plot(fig, filename="../outputs/plots/best_symbols_per_tp.html", auto_open=False)

# === Show in Streamlit ===
st.subheader("Best Performing Symbols by TP Level")
st.caption("This chart highlights the top-performing trading symbol for each TP level, based on hit rate among symbols with at least 10 signals.")
st.plotly_chart(fig, use_container_width=True)


## Best Performing Symbols by Take Profit Level (Symbols with ≥ 10 Signals)

### Key Insights

1. **High consistency across TP levels**  
   The top-performing symbol at each TP level has a hit rate of **83.3% or higher**, even at the higher take profit thresholds. These rates rest on small samples (the filter only requires 10 signals per symbol), so they should be read with caution.

2. **1000PEPEUSDT stands out**  
   - Performs best at both **TP 40** (91.7%) and **TP 100** (83.3%).  
   - Suggests strong **follow-through potential** from entry to aggressive exits.  

3. **Other standout performers**  
   - **BIGTIMEUSDT** dominates **TP 60** with a **90.0%** hit rate.  
   - **BANANAUSDT** excels at **TP 80** with a **90.9%** hit rate.  

4. **Strategic implications**  
   - For short-term strategies, **1000PEPEUSDT** and **BIGTIMEUSDT** offer high success rates.  
   - For more ambitious take profits, **BANANAUSDT** and **1000PEPEUSDT** remain consistent leaders.

> These findings help identify not only strong assets but also the best-suited TP levels per symbol, aiding risk-reward calibration in trading strategies.
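
Because the filter above only requires 10 signals per symbol, the hit rates rest on small samples. A minimal sketch of how to gauge that uncertainty is a Wilson score interval. The counts used below (11 hits out of 12 signals) are hypothetical and chosen only because they round to 91.7%; the actual totals are in `df_best_per_tp`. The `wilson_interval` helper is not part of the project.

```python
from math import sqrt

def wilson_interval(hits, total, z=1.96):
    """Approximate 95% Wilson score interval for a hit rate, returned as fractions."""
    p = hits / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

# Hypothetical counts: 11 hits out of 12 signals
low, high = wilson_interval(11, 12)
print(f"Observed: {11 / 12:.1%}, interval: {low:.1%} to {high:.1%}")
```

With a dozen signals the interval is more than 30 percentage points wide, so differences of a few points between symbols should not be over-interpreted.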


In [None]:
# === Load data ===
file_path = "../data/clean/signals_all_tp_results.csv"
df = pd.read_csv(file_path)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# === Define TP levels ===
tp_levels = ["tp_40", "tp_60", "tp_80", "tp_100"]

# === Calculate valid TP hits (hierarchical) ===
total_signals = len(df)
tp_hits_hierarchical = {}

for i, tp in enumerate(tp_levels):
    prev_cols = [f"{tp_levels[j]}_result" for j in range(i + 1)]
    condition = df[prev_cols].notna().all(axis=1)
    tp_hits_hierarchical[tp] = round(condition.sum() / total_signals * 100, 1)

# === Prepare data for plot ===
levels = list(tp_hits_hierarchical.keys())
values = list(tp_hits_hierarchical.values())

fig = go.Figure()
fig.add_bar(
    x=levels,
    y=values,
    text=[f"{val}%" for val in values],
    textposition='outside',
    marker_color='darkblue'
)

fig.update_layout(
    title="Hierarchical TP Hit Rate (Sequential Success Required)",
    title_x=0.5,
    title_font_size=18,
    xaxis_title="Take Profit Level",
    yaxis_title="Percentage of Signals (%)",
    yaxis=dict(tick0=0, dtick=5),
    plot_bgcolor="white",
    bargap=0.25,
    showlegend=False
)

# === Save in correct folder ===
os.makedirs("../outputs/plots", exist_ok=True)
plot(fig, filename="../outputs/plots/tp_hierarchical_hit_rate.html", auto_open=False)

# === Display in Streamlit ===
st.subheader("Hierarchical TP Hit Rate")
st.caption("This chart shows the percentage of signals that reached each TP level, considering sequential success (i.e., TP_100 only counts if TP_40 to TP_80 were also hit).")
st.plotly_chart(fig, use_container_width=True)



## Hierarchical TP Hit Rate (Sequential Success Required)

This bar chart visualizes the **percentage of trading signals that reached each Take Profit (TP) level in sequence**. A signal counts toward TP_60 only if it also has a TP_40 hit, and toward TP_100 only if TP_40, TP_60, and TP_80 were all hit as well. The check is on whether each level has a recorded result, not on the timestamps at which the levels were reached.

| TP Level | Hit Rate (%) | Interpretation |
|----------|--------------|----------------|
| **TP 40** | 62.4% | Reached by 62.4% of all signals. |
| **TP 60** | 42.6% | 42.6% of signals hit both TP 40 and TP 60. |
| **TP 80** | 29.6% | 29.6% of signals hit TP 40, TP 60, and TP 80. |
| **TP 100** | 21.7% | Only 21.7% of signals completed the full sequence from TP 40 to TP 100. |

This provides a more **realistic measure of consistency and sustained trend strength**. Unlike raw hit counts, this approach penalizes signals that skipped earlier targets or reversed before progressing further. It is particularly relevant when modeling staged exits or risk-managed trade plans.
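
The difference between raw and sequential hit rates is easiest to see on a tiny example. The sketch below uses four made-up signals (the `toy` frame and its `"hit"` values are placeholders) and a cumulative product across the TP columns, which is an alternative formulation of the same "all previous levels hit" condition used in the cell above.

```python
import pandas as pd

tp_levels = ["tp_40", "tp_60", "tp_80", "tp_100"]
result_cols = [f"{tp}_result" for tp in tp_levels]

# Four made-up signals; None means the level has no recorded result
toy = pd.DataFrame(
    [
        ["hit", "hit", "hit", "hit"],   # full sequence
        ["hit", None, "hit", None],     # skipped tp_60
        [None, None, None, None],       # no TP reached
        ["hit", "hit", None, None],     # stopped after tp_60
    ],
    columns=result_cols,
)

hit = toy.notna()

# Raw rate: each level is evaluated on its own
raw_rate = hit.mean() * 100

# Sequential rate: a level counts only while every earlier level was also hit
sequential = hit.astype(int).cumprod(axis=1)
seq_rate = sequential.mean() * 100

print(pd.DataFrame({"raw (%)": raw_rate, "sequential (%)": seq_rate}))
```

The second signal contributes to the raw TP 80 rate but not to the sequential one, because its TP 60 result is missing. That kind of row is what separates the two measures.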


In [None]:
# === Load data ===
file_path = "../data/clean/signals_all_tp_results.csv"
df = pd.read_csv(file_path)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# === Define TP levels ===
tp_levels = ["tp_40", "tp_60", "tp_80", "tp_100"]
top_symbols = df["symbol"].value_counts().head(10).index.tolist()

# === Calculate hierarchical hit rate per symbol ===
summary = []

for symbol in top_symbols:
    subset = df[df["symbol"] == symbol].copy()
    total = len(subset)
    result = {"Symbol": symbol}
    passed = pd.Series([True] * total, index=subset.index)

    for tp in tp_levels:
        passed = passed & subset[f"{tp}_result"].notna()
        result[tp] = round(passed.sum() / total * 100, 1)

    summary.append(result)

df_seq_hits = pd.DataFrame(summary)
df_seq_hits = df_seq_hits.set_index("Symbol")
df_seq_hits = df_seq_hits.loc[df_seq_hits.mean(axis=1).sort_values(ascending=False).index]  # sort rows by mean hit rate (descending)

# === Plot heatmap ===
fig = px.imshow(
    df_seq_hits,
    text_auto=True,
    color_continuous_scale="YlGnBu",
    labels=dict(x="TP Level", y="Symbol", color="Sequential Hit Rate (%)"),
    aspect="auto"
)

fig.update_layout(
    title="Hierarchical TP Hit Rate (%) – Top 10 Symbols (Sequential Success)",
    title_x=0.5,
    xaxis_title="Take Profit Level",
    yaxis_title="Trading Symbol",
    margin=dict(l=260, r=20, t=60, b=40),
    coloraxis_colorbar=dict(title="Hit Rate (%)"),
    plot_bgcolor="white"    
)

# === Save properly ===
os.makedirs("../outputs/plots", exist_ok=True)
plot(fig, filename="../outputs/plots/hierarchical_tp_hit_rate_top10_heatmap.html", auto_open=False)

# === Optional: Streamlit ===
import streamlit as st
st.subheader("Hierarchical TP Hit Rate – Top 10 Symbols")
st.caption("This heatmap shows, for each symbol, the percentage of signals that reached each TP level **only if** all previous levels were also reached.")
st.plotly_chart(fig, use_container_width=True)


## Hierarchical Take Profit Hit Rate – Top 10 Most Frequent Symbols

This heatmap shows the **sequential hit rate** for each of the top 10 most frequent trading symbols, across the four Take Profit levels (TP 40, TP 60, TP 80, TP 100). The sequential condition implies that a higher TP level is only considered successful if all preceding levels were also reached.

### Key Insights:
- **SPXUSDT** stands out as the most consistent symbol, maintaining a 77.8% hit rate at TP 40 and still achieving 38.9% at TP 100.
- **MYROUSDT** and **1000RATSUSDT** also show strong early-stage performance but fade more significantly beyond TP 60.
- Symbols like **PIPPINUSDT**, **GRIFFAINUSDT**, and **1000000MOGUSDT** have a steep performance drop after TP 60, indicating less reliability for longer targets.
- **Sequential hit rates** tend to decrease sharply after TP 60 for most symbols, highlighting the difficulty in maintaining success across multiple TP thresholds.

This visualization is particularly helpful for strategy design, as it allows for **symbol-specific optimization** of TP targets based on historical reliability.
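
To compare how steeply each symbol fades, one option is a retention ratio: the TP 100 sequential rate as a share of the TP 40 rate. The sketch below uses only the SPXUSDT values quoted above; in practice the same line could be applied to the full `df_seq_hits` table. The `retention_40_to_100` column name is an assumption introduced here.

```python
import pandas as pd

# Only the SPXUSDT figures are quoted in the text; other rows would come from df_seq_hits
seq = pd.DataFrame({"tp_40": [77.8], "tp_100": [38.9]}, index=["SPXUSDT"])

# Share of TP 40 successes that went on to complete the full sequence
seq["retention_40_to_100"] = (seq["tp_100"] / seq["tp_40"] * 100).round(1)
print(seq)
```

For SPXUSDT, half of the signals that reached TP 40 also completed the sequence up to TP 100.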



In [None]:
# === Load data ===
file_path = "../data/clean/signals_all_tp_results.csv"
df = pd.read_csv(file_path)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# === Top 10 symbols by frequency ===
top_symbols = df["symbol"].value_counts().head(10).index.tolist()
df_top = df[df["symbol"].isin(top_symbols)].copy()

# === Count error signals (no TP reached) ===
error_data = []

for symbol in top_symbols:
    subset = df_top[df_top["symbol"] == symbol]
    total = len(subset)
    no_tp_hits = subset[subset[["tp_40_result", "tp_60_result", "tp_80_result", "tp_100_result"]].isna().all(axis=1)]
    num_errors = len(no_tp_hits)
    num_hits = total - num_errors
    error_data.append({
        "symbol": symbol,
        "total_signals": total,
        "hits (≥1 TP)": num_hits,
        "errors (0 TP hit)": num_errors,
        "error_rate (%)": round(num_errors / total * 100, 1)
    })

df_errors = pd.DataFrame(error_data)

# === Display table ===
import plotly.graph_objects as go
import plotly.io as pio
from plotly.offline import plot
import os

# Optional: Visualize in tabular form using Plotly
fig = go.Figure(data=[go.Table(
    header=dict(values=list(df_errors.columns), fill_color='paleturquoise', align='left'),
    cells=dict(values=[df_errors[col] for col in df_errors.columns], fill_color='lavender', align='left'))
])

fig.update_layout(
    title="Error Analysis: Signals with No TP Hit (Top 10 Symbols)",
    title_x=0.5,
    margin=dict(l=10, r=10, t=50, b=10)
)

# Save figure
os.makedirs("../outputs/plots", exist_ok=True)
plot(fig, filename="../outputs/plots/top10_symbol_errors_table.html", auto_open=False)

# Show in Streamlit
import streamlit as st
st.subheader("Error Rate per Symbol – Top 10 Most Frequent Symbols")
st.caption("This table shows how many signals failed to hit any TP level.")
st.plotly_chart(fig, use_container_width=True)


In [None]:
df_errors  # Make sure df_errors was created by the previous cell

# === Bar chart: error rate per symbol ===
fig = px.bar(
    df_errors,
    x="symbol",
    y="error_rate (%)",
    text="error_rate (%)",
    title="Error Rate by Symbol (Top 10 Symbols)",
    labels={"symbol": "Symbol", "error_rate (%)": "Error Rate (%)"},
    color="error_rate (%)",
    color_continuous_scale="Reds"
)

fig.update_traces(texttemplate='%{text}%', textposition='outside')
fig.update_layout(
    yaxis=dict(tick0=0, dtick=5),
    xaxis_title="Symbol",
    yaxis_title="Error Rate (%)",
    title_x=0.5,
    plot_bgcolor='white',
    coloraxis_colorbar=dict(title="Error Rate (%)")
)

# Save to HTML
from plotly.offline import plot
import os
os.makedirs("../outputs/plots", exist_ok=True)
plot(fig, filename="../outputs/plots/top10_errors_barchart.html", auto_open=False)

# Show in Streamlit
import streamlit as st
st.subheader("Error Rate by Symbol (Top 10 Symbols)")
st.caption("This chart shows the proportion of signals that failed to reach any Take Profit level.")
st.plotly_chart(fig, use_container_width=True)

### Error Analysis: Top 10 Most Frequent Symbols

This analysis identifies how often signals for the most frequent trading symbols failed to reach any Take Profit (TP) level, essentially highlighting false or failed signals.

#### Objective
To quantify the error rate for each of the top 10 most used symbols, defined as the percentage of signals that did not hit any TP level.

#### Chart: Error Rate by Symbol (Top 10 Symbols)
The chart shows the proportion of signals that did not reach any TP level for each of the top 10 most frequent trading symbols.

- Highest Error Rate: `WIFUSDT` with 30.4%
- Lowest Error Rate: `MYROUSDT` with 4.2%
- Other symbols like `AUCTIONUSDT` and `MAVIAUSDT` show moderate failure rates.

This visualization helps identify which tokens might be more unreliable or volatile, helping users filter out poor-performing symbols for future strategies.

Saved to: `../outputs/plots/top10_errors_barchart.html`
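
The error rate defined above can also be computed for every symbol at once with a grouped aggregation. The following is a minimal sketch on made-up rows; the placeholder tickers and `"hit"` values are assumptions, while the result column names match those used in this notebook.

```python
import pandas as pd

result_cols = ["tp_40_result", "tp_60_result", "tp_80_result", "tp_100_result"]

# Made-up rows for illustration; None means no result was recorded for that level
toy = pd.DataFrame({
    "symbol": ["AAAUSDT", "AAAUSDT", "AAAUSDT", "AAAUSDT", "BBBUSDT", "BBBUSDT"],
    "tp_40_result": ["hit", None, "hit", "hit", None, "hit"],
    "tp_60_result": ["hit", None, None, "hit", None, None],
    "tp_80_result": [None, None, None, "hit", None, None],
    "tp_100_result": [None, None, None, None, None, None],
})

# A signal is an "error" when every TP result column is empty
no_tp = toy[result_cols].isna().all(axis=1)

# Count signals and errors per symbol, then derive the error rate
summary = no_tp.groupby(toy["symbol"]).agg(total_signals="size", errors="sum")
summary["error_rate (%)"] = (summary["errors"] / summary["total_signals"] * 100).round(1)
print(summary)
```

Keeping `total_signals` next to the rate makes it easier to spot symbols whose error rate is based on very few signals.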
