# 06 - Edge Analysis

**Goal**: If the backtest passed (GO), analyze the quality of the detected edges.

## Questions:
1. Are the edges real, or artifacts of the bookmaker margin?
2. Which minimum edge threshold gives the best ROI?
3. Which market (home/draw/away/over25) is the most profitable?
4. Does the Kelly criterion give better results than a flat stake?

In [None]:
import sys
from pathlib import Path

PROJECT_ROOT = Path.cwd().parent
sys.path.insert(0, str(PROJECT_ROOT))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json

sns.set_theme(style='whitegrid')
print('OK')

In [None]:
# Load the backtest results
RESULTS_DIR = PROJECT_ROOT / 'data' / 'results'

bets_path = RESULTS_DIR / 'backtest_bets.csv'
if not bets_path.exists():
    # Stop here: every later cell needs df, so failing early is clearer
    raise FileNotFoundError(
        f'{bets_path} not found: run notebook 05_backtest first. '
        'The backtest results are required for this analysis.'
    )

df = pd.read_csv(bets_path)
df['match_date'] = pd.to_datetime(df['match_date'])
print(f'Bets loaded: {len(df)}')
print(f'Win rate: {df["won"].mean():.1%}')
# Assumes pnl is expressed per unit stake, so its mean is the ROI
print(f'ROI: {df["pnl"].mean():.1%}')
df.head()

## 1. ROI by minimum edge threshold

In [None]:
# Which threshold gives the best ROI?
thresholds = np.arange(2, 25, 1)
roi_by_threshold = []

for t in thresholds:
    subset = df[df['edge_pct'] >= t]
    if len(subset) >= 10:
        roi_by_threshold.append({
            'threshold': t,
            'n_bets': len(subset),
            'roi': subset['pnl'].mean(),
            'win_rate': subset['won'].mean(),
        })

df_t = pd.DataFrame(roi_by_threshold)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8), sharex=True)

ax1.bar(df_t['threshold'], df_t['roi'] * 100, 
        color=['green' if r > 0 else 'red' for r in df_t['roi']], alpha=0.7)
ax1.axhline(y=0, color='black', linewidth=0.5)
ax1.set_ylabel('ROI (%)')
ax1.set_title('ROI by minimum edge threshold')

ax2.bar(df_t['threshold'], df_t['n_bets'], color='steelblue', alpha=0.7)
ax2.set_xlabel('Minimum edge threshold (%)')
ax2.set_ylabel('Number of bets')

plt.tight_layout()
plt.savefig(RESULTS_DIR / 'roi_by_threshold.png', dpi=150)
plt.show()
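
High thresholds leave few bets, so the ROI of each bar above can be noisy. One way to gauge that noise is a percentile bootstrap confidence interval on the mean PnL per bet. The sketch below is illustrative only: `bootstrap_roi_ci` is a hypothetical helper, the PnL values are made up, and it assumes `pnl` is expressed per unit stake.

```python
import random

def bootstrap_roi_ci(pnl, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean PnL per bet."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(pnl)
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(pnl, k=n)  # resample bets with replacement
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Made-up per-unit-stake PnL values, for illustration only
example_pnl = [1.1, -1.0, -1.0, 0.9, -1.0, 2.4, -1.0, -1.0, 1.5, -1.0, 0.8, -1.0]
lo, hi = bootstrap_roi_ci(example_pnl)
print(f'ROI 95% CI: [{lo:+.1%}, {hi:+.1%}]')
```

In the notebook this could be called as `bootstrap_roi_ci(subset['pnl'].tolist())` inside the threshold loop. An interval that straddles zero suggests the threshold's ROI is not distinguishable from break-even at that sample size.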

## 2. ROI by market

In [None]:
# Performance by market type
market_stats = df.groupby('market').agg(
    n_bets=('pnl', 'count'),
    win_rate=('won', 'mean'),
    roi=('pnl', 'mean'),
    avg_edge=('edge_pct', 'mean'),
    avg_odds=('best_odds', 'mean'),
).round(4)

print('Performance by market:\n')
print(market_stats.to_string())

## 3. Bookmaker margin vs real edge

Check that `remove_margin()` works correctly by comparing the model probabilities with the bookmaker probabilities after margin removal.
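
The implementation of `remove_margin()` is not shown in this notebook. As a point of reference, one common approach is proportional normalization: divide each implied probability by the overround so the outcomes sum to 1. The sketch below assumes that approach; `remove_margin_proportional` is a hypothetical name and the odds are made up.

```python
def remove_margin_proportional(odds):
    """Turn decimal odds for all outcomes of one market into fair probabilities.

    Assumes proportional normalization, which may differ from the
    project's own remove_margin().
    """
    implied = [1.0 / o for o in odds]
    overround = sum(implied)  # greater than 1 when the bookmaker takes a margin
    return [p / overround for p in implied]

odds_1x2 = [2.10, 3.40, 3.60]  # made-up home/draw/away decimal odds
fair = remove_margin_proportional(odds_1x2)
margin = sum(1.0 / o for o in odds_1x2) - 1.0
print([round(p, 4) for p in fair], f'margin={margin:.2%}')
```

If the margin is removed correctly, the fair probabilities of a market sum to 1 and each one is lower than the raw implied probability `1 / odds`.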

In [None]:
# Scatter: model_prob vs fair_bookmaker_prob
fig, ax = plt.subplots(figsize=(8, 8))

ax.scatter(df['fair_bookmaker_prob'], df['model_prob'], 
           c=df['won'].map({True: 'green', False: 'red'}),
           alpha=0.5, s=30)
ax.plot([0, 1], [0, 1], 'k--', alpha=0.5, label='No edge')
ax.set_xlabel('Bookmaker probability (margin removed)')
ax.set_ylabel('Model probability')
ax.set_title('Model vs Bookmaker (green=won, red=lost)')
ax.legend()
plt.tight_layout()
plt.savefig(RESULTS_DIR / 'model_vs_bookmaker.png', dpi=150)
plt.show()

## 4. Kelly Criterion simulation
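
The simulation in this section sizes stakes with the Kelly criterion. For net odds `b = odds - 1`, win probability `p` and `q = 1 - p`, the full Kelly fraction is `f* = (b*p - q) / b`. The notebook then scales it by 0.25 (quarter Kelly) and caps the stake at 5% of the bankroll. A minimal sketch of that sizing rule as standalone functions follows. The function names and the guard for odds at or below 1.0 are illustrative additions, not part of the notebook.

```python
def kelly_fraction(odds, p):
    """Full Kelly fraction for decimal odds and win probability p, floored at 0."""
    b = odds - 1.0  # net profit per unit staked
    if b <= 0:
        return 0.0  # illustrative guard: no bet when odds offer no profit
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)

def quarter_kelly_stake(bankroll, odds, p, multiplier=0.25, cap=0.05):
    """Fractional Kelly stake, capped at a share of the current bankroll."""
    stake = bankroll * kelly_fraction(odds, p) * multiplier
    return min(stake, bankroll * cap)

# Made-up examples on a bankroll of 1000
stake_small_edge = quarter_kelly_stake(1000.0, odds=2.0, p=0.55)
stake_capped = quarter_kelly_stake(1000.0, odds=3.0, p=0.60)
stake_no_edge = quarter_kelly_stake(1000.0, odds=2.0, p=0.40)
print(stake_small_edge, stake_capped, stake_no_edge)
```

With odds of 2.0 and `p = 0.55`, full Kelly is 10% of the bankroll, so quarter Kelly stakes about 25. The second example hits the 5% cap, and the third has a negative expectation, so the stake is zero.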

In [None]:
# Simulate Kelly vs flat stake
bankroll_flat = 1000.0
bankroll_kelly = 1000.0
flat_stake = 10.0  # 1% of the initial bankroll

history_flat = [bankroll_flat]
history_kelly = [bankroll_kelly]

for _, bet in df.sort_values('match_date').iterrows():
    # Flat stake
    if bet['won']:
        bankroll_flat += flat_stake * (bet['best_odds'] - 1)
    else:
        bankroll_flat -= flat_stake
    history_flat.append(bankroll_flat)
    
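    # Quarter Kelly: f* = (b*p - q) / b, floored at 0 (no bet on negative edge)
    b = bet['best_odds'] - 1
    p = bet['model_prob']
    q = 1 - p
    kelly = max(0, (b * p - q) / b) if b > 0 else 0  # avoid dividing by zero at odds of 1.0
    stake = bankroll_kelly * kelly * 0.25  # Quarter Kelly
    stake = min(stake, bankroll_kelly * 0.05)  # Cap at 5% of the bankroll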
    
    if bet['won']:
        bankroll_kelly += stake * (bet['best_odds'] - 1)
    else:
        bankroll_kelly -= stake
    history_kelly.append(bankroll_kelly)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(history_flat, label=f'Flat Stake ({flat_stake} per bet)', linewidth=1.5)
ax.plot(history_kelly, label='Quarter Kelly', linewidth=1.5)
ax.axhline(y=1000, color='gray', linestyle=':', alpha=0.5)
ax.set_xlabel('Number of bets')
ax.set_ylabel('Bankroll')
ax.set_title('Simulation: Flat Stake vs Kelly Criterion')
ax.legend()
plt.tight_layout()
plt.savefig(RESULTS_DIR / 'kelly_vs_flat.png', dpi=150)
plt.show()

print(f'\nFinal bankroll (flat):  {bankroll_flat:.0f} ({(bankroll_flat/1000-1)*100:+.1f}%)')
print(f'Final bankroll (kelly): {bankroll_kelly:.0f} ({(bankroll_kelly/1000-1)*100:+.1f}%)')

## Final decision

Depending on the results:
- **GO** (positive results) → Phase 1: build the automated data pipeline
- **NO-GO** → improve the model or pivot

### Potential improvements if NO-GO:
1. Add XGBoost with advanced features (xG, form)
2. Calibrate the ensemble with isotonic regression
3. Add more data (multiple seasons, multiple leagues)
4. Use a more sophisticated draw model
5. Add Closing Line Value as a feature
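
For item 5, one common definition of Closing Line Value (CLV) is the ratio of the odds taken to the closing odds, minus 1. A positive value means the bet was placed at a better price than the market's final one. The sketch below assumes that definition. The function names and the odds are hypothetical, and the backtest data would need a closing-odds column that this notebook does not show.

```python
def closing_line_value(odds_taken, closing_odds):
    """CLV as the relative price advantage over the closing odds."""
    return odds_taken / closing_odds - 1.0

def mean_clv(bets):
    """Average CLV over (odds_taken, closing_odds) pairs."""
    values = [closing_line_value(taken, closing) for taken, closing in bets]
    return sum(values) / len(values)

# Made-up (odds_taken, closing_odds) pairs, for illustration only
example_bets = [(2.10, 2.00), (1.90, 2.00), (3.30, 3.00)]
clv_values = [closing_line_value(t, c) for t, c in example_bets]
print([f'{v:+.1%}' for v in clv_values], f'mean={mean_clv(example_bets):+.1%}')
```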