# T20 Upset Radar and Strategy Simulator: Final Story

This notebook is the portfolio-ready narrative for the MVP.

It walks through:
1. Data scope and quality snapshot
2. Modeling and calibration baseline
3. Upset-focused evaluation
4. Explainability and local upset narratives
5. Strategy simulator examples
6. Missed upsets audit
7. Curated upset narratives
8. Cross-case pattern summary
9. Conclusions and MVP status


In [1]:
from __future__ import annotations

import json

import pandas as pd
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

from src.config import PROCESSED_DIR
from src.data_prep import (
    assign_favorite_underdog_from_elo,
    build_team1_win_target,
    load_matches,
    time_based_split,
)
from src.explain import build_counterfactual_explanation, rank_notable_upsets
from src.features import build_pre_match_feature_frame
from src.models import calibrate_classifier, evaluate_binary_model, train_logistic_baseline
from src.simulation import ScenarioInput, build_scenario_features, score_scenario


## 1) Data Scope and Quality Snapshot

In [2]:
df = load_matches()
df = build_team1_win_target(df)
df = assign_favorite_underdog_from_elo(df)

quality_path = PROCESSED_DIR / "data_quality_report.json"
quality = {}
if quality_path.exists():
    quality = json.loads(quality_path.read_text())

print("Rows:", len(df))
print("Date range:", df["date"].min(), "->", df["date"].max())
# Note: this counts the team1 column only; a side that appears solely as team2 is not included.
print("Unique teams:", df["team1"].nunique())
print("Overall upset rate:", round(float(df["is_upset"].mean()), 4))
if quality:
    print("Duplicate match_ids:", quality.get("duplicate_match_id_count", "n/a"))
    print("Columns with missing values:", quality.get("columns_with_missing_values", "n/a"))


Rows: 3761
Date range: 2014-03-16 00:00:00 -> 2026-02-09 00:00:00
Unique teams: 106
Overall upset rate: 0.3589
Duplicate match_ids: n/a
Columns with missing values: n/a
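
The `Unique teams` figure above is taken from the `team1` column only, so a side that appears exclusively as `team2` would be missed. A minimal sketch of counting across both columns, using invented toy rows (the team labels are illustrative, not drawn from the dataset):

```python
import pandas as pd

# Toy rows for illustration only; not taken from the match dataset.
toy = pd.DataFrame(
    {
        "team1": ["A", "A", "B"],
        "team2": ["B", "C", "D"],
    }
)

team1_only = toy["team1"].nunique()
# Stack both columns into one Series so every side is counted once.
all_teams = pd.concat([toy["team1"], toy["team2"]]).nunique()

print("team1 only:", team1_only)   # 2
print("both columns:", all_teams)  # 4
```

Whether the two counts differ on the real data depends on how matches are oriented upstream, which this notebook does not show.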


## 2) Baseline Modeling and Calibration

In [3]:
train_df, valid_df, test_df = time_based_split(df)

X_train, y_train = build_pre_match_feature_frame(train_df)
X_valid, y_valid = build_pre_match_feature_frame(valid_df)
X_test, y_test = build_pre_match_feature_frame(test_df)

base_model = train_logistic_baseline(X_train, y_train)
calibrated_model = calibrate_classifier(base_model, X_valid, y_valid)
test_metrics = evaluate_binary_model(calibrated_model, X_test, y_test)
pd.Series(test_metrics).round(4)


roc_auc               0.6399
log_loss              0.6681
brier                 0.2377
positive_rate_pred    0.4021
dtype: float64
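
For context on the log loss and Brier values above, it helps to know what an uninformative forecast scores. The sketch below computes both metrics for a constant 0.5 prediction; the labels are invented, and for this constant forecast the result does not depend on them.

```python
import math

# Toy labels for illustration; any 0/1 labels give the same result for a 0.5 forecast.
y_true = [1, 0, 0, 1, 0]
p = 0.5  # constant, uninformative probability

# Brier score: mean squared difference between forecast and outcome.
brier_ref = sum((p - y) ** 2 for y in y_true) / len(y_true)
# Log loss: mean negative log-likelihood of the observed outcomes.
log_loss_ref = -sum(
    y * math.log(p) + (1 - y) * math.log(1 - p) for y in y_true
) / len(y_true)

print(round(brier_ref, 4))     # 0.25
print(round(log_loss_ref, 4))  # 0.6931
```

Against these reference points, the reported Brier score of 0.2377 and log loss of 0.6681 are a modest improvement over a coin-flip forecast.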

## 3) Upset-Focused Evaluation

In [4]:
test_eval = test_df.copy()
test_eval = assign_favorite_underdog_from_elo(test_eval)
X_eval, _ = build_pre_match_feature_frame(test_eval)
test_eval["team1_win_prob"] = calibrated_model.predict_proba(X_eval)[:, 1]
test_eval["pred_team1_win"] = (test_eval["team1_win_prob"] >= 0.5).astype(int)

# Map the predicted class back to a team name: team1 when pred_team1_win == 1, otherwise team2.
test_eval["pred_winner"] = test_eval.apply(
    lambda r: r["team1"] if r["pred_team1_win"] == 1 else r["team2"], axis=1
)
test_eval["pred_is_upset"] = (test_eval["pred_winner"] == test_eval["underdog_team"]).astype(int)

upset_cm = confusion_matrix(test_eval["is_upset"], test_eval["pred_is_upset"])
upset_prf = precision_recall_fscore_support(
    test_eval["is_upset"], test_eval["pred_is_upset"], average="binary", zero_division=0
)

print("Upset confusion matrix (rows=true, cols=pred):")
print(upset_cm)
print("Upset precision/recall/f1:", tuple(round(float(x), 4) for x in upset_prf[:3]))


Upset confusion matrix (rows=true, cols=pred):
[[830 425]
 [343 287]]
Upset precision/recall/f1: (0.4031, 0.4556, 0.4277)
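
As a sanity check, the precision, recall, and F1 values can be recomputed by hand from the confusion matrix printed above. The sketch uses only those four counts.

```python
# Counts copied from the confusion matrix printed above (rows=true, cols=pred).
tn, fp = 830, 425
fn, tp = 343, 287

precision = tp / (tp + fp)   # share of predicted upsets that were real upsets
recall = tp / (tp + fn)      # share of real upsets that were predicted
f1 = 2 * precision * recall / (precision + recall)
upset_rate = (tp + fn) / (tn + fp + fn + tp)  # test-set base rate of upsets

print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.4031 0.4556 0.4277
print(round(upset_rate, 4))  # 0.3342
```

A precision of about 0.40 against a test-set upset rate of about 0.33 means a predicted upset is somewhat more likely than the base rate to be a real one, while more than half of actual upsets are still missed at the 0.5 threshold.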


## 4) Explainability: Notable Upsets + Counterfactual View

In [5]:
notable = rank_notable_upsets(df, top_n=8)
notable[["date", "team1", "team2", "winner", "elo_diff"]].head(8)


|  | date | team1 | team2 | winner | elo_diff |
|---|---|---|---|---|---|
| 2932 | 2025-04-26 | Singapore | Thailand | Singapore | -346.484061 |
| 3414 | 2025-09-14 | Austria | Luxembourg | Luxembourg | 345.280408 |
| 3317 | 2025-08-12 | South Africa | Australia | South Africa | -328.19154 |
| 3632 | 2025-12-11 | Singapore | Thailand | Singapore | -314.877439 |
| 1356 | 2023-03-09 | England | Bangladesh | Bangladesh | 310.410095 |
| 2531 | 2024-10-04 | New Zealand | India | New Zealand | -289.76724 |
| 3400 | 2025-09-12 | Samoa | Papua New Guinea | Samoa | -283.29952 |
| 1024 | 2022-08-21 | United Arab Emirates | Kuwait | Kuwait | 266.012655 |


In [6]:
if len(notable) > 0:
    top_case_idx = notable.index[0]
    cols = [
        "team1", "team2", "match_stage", "venue", "toss_winner", "toss_decision",
        "elo_team1", "elo_team2", "elo_diff", "team1_form_5", "team2_form_5",
        "team1_form_10", "team2_form_10", "h2h_win_pct"
    ]
    base_row = df.loc[[top_case_idx], [c for c in cols if c in df.columns]].copy()
    if not base_row.empty:
        local_exp = build_counterfactual_explanation(calibrated_model, base_row)
        print("Base team1 win probability:", round(local_exp["base_team1_win_prob"], 4))
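        # A bare expression inside an if-block is not rendered by the notebook, so display it explicitly.
        display(pd.DataFrame(local_exp["counterfactuals"]))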
    else:
        print("Top upset index not found in base dataframe.")
else:
    print("No notable upsets found in current filtered data.")


Base team1 win probability: 0.5664


## 5) Strategy Simulator Example Scenarios

In [7]:
teams = sorted(df["team1"].dropna().unique())
venues = sorted(df["venue"].dropna().unique())

team1 = teams[0]
team2 = teams[1]
venue = venues[0]

scenario_a = ScenarioInput(
    team1=team1,
    team2=team2,
    match_stage="Group Stage",
    venue=venue,
    toss_winner=team1,
    toss_decision="bat",
    elo_team1=1200,
    elo_team2=1175,
    team1_form_5=0.6,
    team2_form_5=0.5,
    team1_form_10=0.58,
    team2_form_10=0.52,
    h2h_win_pct=0.55,
)
scenario_b = ScenarioInput(**{**scenario_a.__dict__, "toss_decision": "field"})

res_a = score_scenario(calibrated_model, build_scenario_features(scenario_a))
res_b = score_scenario(calibrated_model, build_scenario_features(scenario_b))

pd.DataFrame([
    {"scenario": "bat first", **res_a},
    {"scenario": "field first", **res_b},
])


|  | scenario | team1_win_prob | team2_win_prob | underdog_win_prob | upset_risk | upset_severity_index |
|---|---|---|---|---|---|---|
| 0 | bat first | 0.504084 | 0.495916 | 0.495916 | 0.495916 | 0.049592 |
| 1 | field first | 0.538342 | 0.461658 | 0.461658 | 0.461658 | 0.046166 |
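
In the scenario output above, team1 has the higher input ELO (1200 vs 1175), so the underdog probability is the complement of `team1_win_prob`. The sketch below reproduces that relationship and the size of the toss effect from the printed numbers. `underdog_prob` is a hypothetical helper, not a function from `src.simulation`, and the formula behind `upset_severity_index` is not reproduced because the notebook does not show it.

```python
def underdog_prob(team1_win_prob: float, elo_team1: float, elo_team2: float) -> float:
    """Hypothetical helper: win probability of the lower-ELO side."""
    # Assumption: an ELO tie is treated as team1 being the favorite.
    team1_is_favorite = elo_team1 >= elo_team2
    return 1.0 - team1_win_prob if team1_is_favorite else team1_win_prob


bat_first = 0.504084    # team1_win_prob from the "bat first" row
field_first = 0.538342  # team1_win_prob from the "field first" row

print(round(underdog_prob(bat_first, 1200, 1175), 6))    # 0.495916
print(round(underdog_prob(field_first, 1200, 1175), 6))  # 0.461658
print(round(field_first - bat_first, 6))                 # 0.034258
```

Switching the toss decision from bat to field moves team1's win probability by roughly 3.4 percentage points in this scenario.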


## Transition to Audit and Pattern Synthesis

The remaining sections move from simulator examples into error analysis and narrative synthesis:

- missed upset cases where confidence was high but outcomes flipped,
- curated upset stories for portfolio communication,
- and cross-case pattern summaries before final conclusions.


## 6) Missed Upsets Audit (High-Confidence Misses)

This table highlights matches where the model gave the favorite a high win probability, but the underdog actually won.

Interpretation focus:
- larger `favorite_confidence` means the model was more surprised by the upset
- these are useful cases for feature-gap analysis and future model improvements

In [8]:
audit_df = test_eval.copy()
audit_df["actual_winner"] = audit_df["winner"]
audit_df["favorite_confidence"] = audit_df.apply(
    lambda r: r["team1_win_prob"] if r["favorite_team"] == r["team1"] else (1 - r["team1_win_prob"]),
    axis=1,
)
audit_df["is_missed_upset"] = (audit_df["is_upset"] == 1) & (audit_df["pred_is_upset"] == 0)

missed_upsets = audit_df[audit_df["is_missed_upset"]].copy()
missed_upsets = missed_upsets.sort_values("favorite_confidence", ascending=False)

cols = [
    "date",
    "team1",
    "team2",
    "actual_winner",
    "favorite_team",
    "underdog_team",
    "favorite_confidence",
    "team1_win_prob",
    "is_upset",
    "pred_is_upset",
    "match_stage",
    "venue",
]

missed_upsets[cols].head(15)

|  | date | team1 | team2 | actual_winner | favorite_team | underdog_team | favorite_confidence | team1_win_prob | is_upset | pred_is_upset | match_stage | venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2531 | 2024-10-04 | New Zealand | India | New Zealand | India | New Zealand | 0.83416 | 0.16584 | 1 | 0 | NaN | Dubai International Cricket Stadium |
| 3563 | 2025-11-21 | Papua New Guinea | Thailand | Papua New Guinea | Thailand | Papua New Guinea | 0.779568 | 0.220432 | 1 | 0 | NaN | Terdthai Cricket Ground, Bangkok |
| 2572 | 2024-10-15 | Sri Lanka | West Indies | Sri Lanka | West Indies | Sri Lanka | 0.773779 | 0.226221 | 1 | 0 | NaN | Rangiri Dambulla International Stadium |
| 2027 | 2024-03-24 | New Zealand | England | New Zealand | England | New Zealand | 0.770556 | 0.229444 | 1 | 0 | NaN | Saxton Oval, Nelson |
| 2142 | 2024-05-09 | France | Malta | France | Malta | France | 0.766662 | 0.233338 | 1 | 0 | NaN | Dreux Sport Cricket Club |
| 2170 | 2024-05-25 | Romania | Gibraltar | Gibraltar | Romania | Gibraltar | 0.76532 | 0.76532 | 1 | 0 | NaN | Moara Vlasiei Cricket Ground, Ilfov County |
| 3023 | 2025-05-29 | Sweden | Jersey | Sweden | Jersey | Sweden | 0.763607 | 0.236393 | 1 | 0 | NaN | Simar Cricket Ground, Rome |
| 3006 | 2025-05-25 | Sweden | Spain | Sweden | Spain | Sweden | 0.757891 | 0.242109 | 1 | 0 | NaN | Simar Cricket Ground, Rome |
| 2509 | 2024-09-29 | Ireland | South Africa | Ireland | South Africa | Ireland | 0.75601 | 0.24399 | 1 | 0 | NaN | Zayed Cricket Stadium, Abu Dhabi |
| 3115 | 2025-06-15 | Scotland | Netherlands | Scotland | Netherlands | Scotland | 0.73062 | 0.26938 | 1 | 0 | NaN | Titwood, Glasgow |
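
`favorite_confidence` in the audit cell is the model's probability for whichever side ELO marks as the favorite. A vectorized sketch of the same rule on invented toy rows, using `numpy.where` instead of a row-wise `apply`:

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; not taken from the audit table.
toy = pd.DataFrame(
    {
        "team1": ["A", "C"],
        "team2": ["B", "D"],
        "favorite_team": ["A", "D"],
        "team1_win_prob": [0.70, 0.20],
    }
)

# Use team1's probability when team1 is the favorite, otherwise its complement.
toy["favorite_confidence"] = np.where(
    toy["favorite_team"] == toy["team1"],
    toy["team1_win_prob"],
    1 - toy["team1_win_prob"],
)

print(toy["favorite_confidence"].round(2).tolist())  # [0.7, 0.8]
```

The vectorized form avoids calling a Python lambda for every row, which tends to matter more as the frame grows.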


## 7) Curated Upset Narratives (Top Cases)

The table below converts top upset cases into short portfolio-style narratives.

Selection logic:
- rank by absolute ELO gap among upset matches
- keep concise fields for storytelling (`stage`, `venue`, `favorite`, `winner`, `elo_gap`)
- auto-generate a short explanation line for each case
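
The selection logic above can be sketched as follows. This is an illustrative stand-in, not the implementation of `rank_notable_upsets` in `src.explain`; the function name `rank_by_elo_gap` is hypothetical, and it assumes `is_upset` and `elo_diff` columns like those used elsewhere in this notebook.

```python
import pandas as pd


def rank_by_elo_gap(matches: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Hypothetical sketch: keep upsets and sort by absolute ELO gap, largest first."""
    upsets = matches[matches["is_upset"] == 1].copy()
    upsets["elo_gap"] = upsets["elo_diff"].abs()
    return upsets.sort_values("elo_gap", ascending=False).head(top_n)


# Toy rows for illustration only.
toy = pd.DataFrame(
    {
        "winner": ["A", "B", "C", "D"],
        "elo_diff": [-300.0, 120.0, 250.0, -50.0],
        "is_upset": [1, 0, 1, 1],
    }
)

ranked = rank_by_elo_gap(toy, top_n=2)
print(ranked["winner"].tolist())  # ['A', 'C']
```

Taking the absolute value matters because `elo_diff` is signed by match orientation: a large negative value is as notable as a large positive one.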

In [9]:
narrative_cases = rank_notable_upsets(df, top_n=10).copy()

if narrative_cases.empty:
    print("No upset cases found for narrative generation.")
else:
    narrative_cases["elo_gap"] = narrative_cases["elo_diff"].abs().round(1)
    narrative_cases["favorite_team"] = narrative_cases.apply(
        lambda r: r["team1"] if float(r["elo_diff"]) >= 0 else r["team2"],
        axis=1,
    )
    narrative_cases["underdog_team"] = narrative_cases.apply(
        lambda r: r["team2"] if float(r["elo_diff"]) >= 0 else r["team1"],
        axis=1,
    )

    def build_case_narrative(row: pd.Series) -> str:
        # row.get only falls back when the key is absent, so missing (NaN) values are handled explicitly.
        def clean(key: str, fallback: str) -> str:
            value = row.get(key, fallback)
            return fallback if pd.isna(value) else str(value)

        favorite = clean("favorite_team", "favorite")
        winner = clean("winner", "winner")
        stage = clean("match_stage", "Unknown Stage")
        toss_decision = clean("toss_decision", "unknown")
        gap = clean("elo_gap", "n/a")
        return (
            f"At {stage}, {winner} beat favorite {favorite} despite an ELO gap of {gap}. "
            f"Toss decision was '{toss_decision}', suggesting context pressure beyond baseline strength."
        )

    narrative_cases["narrative"] = narrative_cases.apply(build_case_narrative, axis=1)

    narrative_cols = [
        "date",
        "team1",
        "team2",
        "winner",
        "favorite_team",
        "underdog_team",
        "elo_gap",
        "match_stage",
        "venue",
        "narrative",
    ]
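    # A bare expression inside an else-block is not rendered by the notebook, so display it explicitly.
    display(narrative_cases[narrative_cols].head(10))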

## 8) Cross-Case Patterns from Curated Upsets

This quick summary aggregates the curated upset set to surface recurring contexts (stage, toss decision, venues).

In [10]:
if "narrative_cases" in globals() and len(narrative_cases) > 0:
    stage_summary = narrative_cases["match_stage"].value_counts(dropna=False).rename("count").to_frame()

    print("Curated upset count:", len(narrative_cases))
    print("\nTop stages in curated upsets:")
    display(stage_summary.head(10))

    if "toss_decision" in narrative_cases.columns:
        toss_summary = narrative_cases["toss_decision"].value_counts(dropna=False).rename("count").to_frame()
        print("\nToss decisions in curated upsets:")
        display(toss_summary)
    else:
        print("\nToss decisions are unavailable in this curated view.")
else:
    print("Narrative cases not available.")

Curated upset count: 10

Top stages in curated upsets:


| match_stage | count |
|---|---|
| NaN | 9 |
| 3rd Place Play-Off | 1 |



Toss decisions are unavailable in this curated view.


## 9) Conclusions and MVP Status

### What changed in this sprint

- Improved scenario realism in the app with matchup-aware defaults, stable control state, and constrained venue selection (plus override for full exploration).
- Expanded trust messaging with explicit diagnostics scope controls, clearer ROC/correlation interpretation, and sample-size guardrails.
- Upgraded reviewer UX with cleaner UI copy, report snapshot views, and consistent percentage formatting for explainability outputs.
- Finalized portfolio polish across README guidance and notebook story flow.

### Final conclusions

- The project now delivers a complete upset analytics workflow: data quality checks, leakage-safe modeling, calibrated probabilities, explainability, and interactive simulation.
- Decision support is strongest when scenario outputs are read together with risk context, calibration behavior, and venue/sample-size diagnostics.
- The current build is showcase-ready for portfolio demos while preserving a clear path for next-phase model depth.

### Current limitations

- Outcome quality still depends on pre-match proxy richness (no player-level availability or tactical lineup features yet).
- Rare upset pockets remain noisy for low-sample teams and venues despite caution labels.
- Advanced-model comparison artifacts are still the highest-value technical extension for the next phase.
