# Error Analysis in the Mid-Probability Zone (0.4–0.6)

## Objective

The goal of this notebook is to analyze matches that fall into the **mid-probability range (0.4–0.6)** under the **final stacking model**.

This probability interval represents cases where the model expresses **high uncertainty**, meaning:
- No strong preference for either outcome
- Predictions that are close to random but still informed by data

Rather than treating this zone as noise, we explicitly study it to understand:
- What types of matches concentrate in this range
- Whether the uncertainty is structural, contextual, or due to missing features
- If targeted feature engineering could reduce ambiguity in this specific region

---

## Why Focus on the 0.4–0.6 Zone?

In probabilistic football modeling, probabilities close to 0.5 are critical because:

- They concentrate **decision risk**
- Small probability shifts can strongly affect downstream decisions
- They often hide subtle interactions that global metrics do not reveal

This zone is especially relevant after stacking, since:
- Base models already capture the most obvious signals
- Remaining uncertainty reflects **residual structure** rather than naïve underfitting

---

## Model Used

All analyses in this notebook are based on:

- **Final stacking model**
- **Out-of-fold (OOF) predicted probabilities**
- Temporal cross-validation setup consistent with previous evaluation notebooks

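No in-sample base-model predictions are used. The meta-model and the calibrator in the cells below, however, are fit and scored on the same OOF table, so their outputs are in-sample at the meta level.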

This ensures that:
- Observed patterns reflect true generalization behavior
- Insights are aligned with the model intended for deployment or reporting

---

## Scope of the Analysis

This notebook focuses on:

- Identifying recurring match characteristics within the 0.4–0.6 range
- Comparing feature distributions against higher- and lower-confidence predictions
- Studying disagreement between base models (e.g. logistic regression vs tree-based models)
- Evaluating whether uncertainty is:
  - Reducible (via feature engineering)
  - Or inherent to match dynamics

No model retraining is performed in this notebook.
Its output is **diagnostic insight**, not immediate performance optimization.

---

## Expected Outcomes

By the end of this analysis, we aim to:

- Clearly characterize matches that fall into the mid-probability zone
- Decide whether a dedicated feature engineering path is justified
- Document uncertainty as a first-class modeling result, not a failure

This notebook serves as a bridge between **evaluation** and **feature refinement** stages of the project.


In [1]:
import pandas as pd

import sys
from pathlib import Path

PROJECT_ROOT = Path().resolve().parents[0]
sys.path.append(str(PROJECT_ROOT))
DATA_DIR = PROJECT_ROOT / "data" / "processed"
df = pd.read_csv(DATA_DIR / "prematch_seasons18-24_home_advantage.csv")

df = df.sort_values(["season", "date"]).reset_index(drop=True)

FEATURES = [
    "home_form_weighted",
    "away_form_weighted",
    "home_momentum",
    "away_momentum",
    'home_advantage_diff',
]

TARGET = "home_win"
C_VALUE = 10
MIN_TEST_SIZE = 50
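# Note: labels are sorted alphabetically, so "2019 A" comes before "2019 C".
# In the rows shown in this notebook, "2019 C" matches are dated January to
# April 2019 and "2019 A" matches July to September 2019, so confirm that this
# order is the intended chronological one before relying on the splits.
seasons = (
    df["season"]
    .sort_values()
    .unique()
)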

seasons

array(['2018 A', '2019 A', '2019 C', '2020 A', '2021 A', '2021 C',
       '2022 A', '2022 C', '2023 A', '2023 C', '2024 A', '2024 C'],
      dtype=object)

In [2]:
def temporal_splits(df, seasons):
    for i in range(1, len(seasons)):
        train_seasons = seasons[:i]
        test_season = seasons[i]
        
        train_df = df[df["season"].isin(train_seasons)]
        test_df = df[df["season"] == test_season]
        
        yield train_df, test_df, test_season
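The expanding-window split above depends entirely on the order of `seasons`. In the sample rows printed in this notebook, "2019 C" matches are dated earlier in 2019 than "2019 A" matches, so an alphabetical sort of the labels may not be chronological. The following is a minimal sketch, on toy data, of ordering seasons by their first match date before splitting. The helper `chronological_seasons` and the toy frame are hypothetical and not part of the original pipeline.

```python
import pandas as pd

def chronological_seasons(df, season_col="season", date_col="date"):
    """Hypothetical helper: order season labels by their earliest match date."""
    first_match = pd.to_datetime(df[date_col]).groupby(df[season_col]).min()
    return first_match.sort_values().index.tolist()

def temporal_splits(df, seasons):
    """Expanding window: train on seasons[:i], test on seasons[i]."""
    for i in range(1, len(seasons)):
        train_df = df[df["season"].isin(seasons[:i])]
        test_df = df[df["season"] == seasons[i]]
        yield train_df, test_df, seasons[i]

# Toy data (assumed for illustration): three seasons, two matches each.
toy = pd.DataFrame({
    "season": ["2019 A", "2019 A", "2019 C", "2019 C", "2018 A", "2018 A"],
    "date": ["2019-07-19", "2019-09-14", "2019-01-06",
             "2019-03-09", "2018-07-20", "2018-08-04"],
    "home_win": [1, 1, 0, 0, 1, 0],
})

ordered = chronological_seasons(toy)
split_summary = [
    (test_season, len(train_df), len(test_df))
    for train_df, test_df, test_season in temporal_splits(toy, ordered)
]
print(ordered)
print(split_summary)
```

With this ordering, each test season is only ever preceded by seasons that started earlier in calendar time.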

In [3]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def train_base_models(X_train, y_train):
    logit = LogisticRegression(max_iter=1000)
    rf = RandomForestClassifier(
        n_estimators=300,
        max_depth=6,
        random_state=42
    )
    
    logit.fit(X_train, y_train)
    rf.fit(X_train, y_train)
    
    return logit, rf


In [4]:
def predict_base_models(models, X):
    logit, rf = models
    
    return {
        "p_logit": logit.predict_proba(X)[:, 1],
        "p_rf": rf.predict_proba(X)[:, 1]
    }

In [5]:
def drop_na_rows(X, y=None):
    if y is not None:
        mask = ~X.isna().any(axis=1)
        return X[mask], y[mask]
    else:
        return X.dropna()


In [6]:
meta_rows = []

for train_df, test_df, season in temporal_splits(df, seasons):
    
    X_train = train_df[FEATURES]
    y_train = train_df[TARGET]
    
    X_test = test_df[FEATURES]
    y_test = test_df[TARGET]

    X_train, y_train = drop_na_rows(X_train, y_train)
    X_test, y_test = drop_na_rows(X_test, y_test)
    
    if len(X_test) == 0:
        continue 
    
    base_models = train_base_models(X_train, y_train)
    base_preds = predict_base_models(base_models, X_test)
    
    tmp = test_df.loc[X_test.index].copy()
    tmp["p_logit"] = base_preds["p_logit"]
    tmp["p_rf"] = base_preds["p_rf"]
    tmp["y_true"] = y_test
    tmp["season"] = season
    
    meta_rows.append(tmp)

meta_df = pd.concat(meta_rows).reset_index(drop=True)
meta_df.head()


Unnamed: 0,season,match_id,date,home_advantage,away_advantage,home_advantage_diff,home_win,home_team,away_team,home_form,...,form_diff,home_form_weighted,away_form_weighted,form_diff_weighted,home_momentum,away_momentum,momentum_diff,p_logit,p_rf,y_true
0,2019 A,306,2019-07-19,0.07264,0.1285,-0.05586,1,Atlas Guadalajara,FC Juárez,0.4,...,0.4,0.6,0.0,0.6,0.9,0.0,0.9,0.961919,0.989439,1
1,2019 A,307,2019-07-19,0.03483,0.302018,-0.267188,0,Puebla FC,Club Tijuana,0.266667,...,-0.333333,0.2,0.8,-0.6,-0.3,0.9,-1.2,0.014722,0.01,0
2,2019 A,308,2019-07-20,0.129138,0.241862,-0.112724,0,Atlético San Luis,Pumas UNAM,0.0,...,-0.466667,0.0,0.466667,-0.466667,0.0,-1.489094e-16,1.489094e-16,0.120113,0.033295,0
3,2019 A,310,2019-07-20,0.169935,0.163304,0.006631,1,CF América,CF Monterrey,0.666667,...,0.2,0.822222,0.266667,0.555556,0.7,-0.9,1.6,0.983182,0.999247,1
4,2019 A,309,2019-07-20,0.121962,0.098039,0.023923,0,CF Pachuca,Club León,0.466667,...,-0.333333,0.311111,0.8,-0.488889,-0.7,-3.301496e-16,-0.7,0.023798,0.0,0


In [7]:
meta_features = ["p_logit", "p_rf"]

meta_model = LogisticRegression()
meta_model.fit(meta_df[meta_features], meta_df["y_true"])

meta_df["p_meta"] = meta_model.predict_proba(meta_df[meta_features])[:, 1]
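The cell above fits the meta-model on the full OOF table and then predicts on those same rows, so `p_meta` is in-sample with respect to the meta-model. One possible way to keep the meta level out-of-fold as well is to repeat the expanding-window idea over seasons. The sketch below shows this on synthetic data. The function `oof_meta_predictions`, the season labels `s1` to `s3`, and the simulated probabilities are all assumptions for illustration, not the notebook's actual procedure.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def oof_meta_predictions(meta_df, features, target="y_true",
                         season_col="season", ordered_seasons=None):
    """Hypothetical helper: expanding-window meta-model predictions.

    Each season is predicted by a meta-model fit on all earlier seasons.
    Rows of the first season have no earlier data and stay NaN.
    """
    if ordered_seasons is None:
        # Assumes rows are already in chronological order.
        ordered_seasons = list(pd.unique(meta_df[season_col]))
    p_oof = pd.Series(np.nan, index=meta_df.index, dtype=float)
    for i in range(1, len(ordered_seasons)):
        train = meta_df[meta_df[season_col].isin(ordered_seasons[:i])]
        test = meta_df[meta_df[season_col] == ordered_seasons[i]]
        if test.empty or train[target].nunique() < 2:
            continue  # nothing to predict, or only one class to learn from
        model = LogisticRegression(max_iter=1000)
        model.fit(train[features], train[target])
        p_oof.loc[test.index] = model.predict_proba(test[features])[:, 1]
    return p_oof

# Synthetic stand-in for meta_df (assumed data, three equal seasons).
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    "season": np.repeat(["s1", "s2", "s3"], n // 3),
    "p_logit": rng.uniform(0, 1, n),
})
demo["p_rf"] = np.clip(demo["p_logit"] + rng.normal(0, 0.1, n), 0, 1)
demo["y_true"] = (rng.uniform(0, 1, n) < demo["p_logit"]).astype(int)

demo["p_meta_oof"] = oof_meta_predictions(demo, ["p_logit", "p_rf"])
print(demo.groupby("season")["p_meta_oof"].count())
```

The trade-off is that the first season contributes no meta-level predictions, which shrinks the sample available for the mid-zone analysis.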


In [8]:
mid_zone = meta_df[
    (meta_df["p_meta"] >= 0.4) &
    (meta_df["p_meta"] <= 0.6)
].copy()

print(len(mid_zone), len(meta_df))
mid_zone.head(10)


51 1697


Unnamed: 0,season,match_id,date,home_advantage,away_advantage,home_advantage_diff,home_win,home_team,away_team,home_form,...,home_form_weighted,away_form_weighted,form_diff_weighted,home_momentum,away_momentum,momentum_diff,p_logit,p_rf,y_true,p_meta
77,2019 A,382,2019-09-14,0.098039,0.1285,-0.030461,1,Club León,FC Juárez,0.733333,...,0.688889,0.266667,0.422222,-0.2,0.3,-0.5,0.356496,0.551878,1,0.451918
78,2019 A,383,2019-09-14,0.063551,0.07264,-0.009089,1,Deportivo Guadalajara,Atlas Guadalajara,0.466667,...,0.488889,0.333333,0.155556,0.1,0.3,-0.2,0.391693,0.525126,1,0.425982
86,2019 A,392,2019-09-21,0.121962,0.302018,-0.180056,1,CF Pachuca,Club Tijuana,0.533333,...,0.533333,0.222222,0.311111,-2.019495e-16,-0.2,0.2,0.575129,0.503034,1,0.47417
89,2019 A,394,2019-09-22,0.241862,0.158864,0.082997,0,Pumas UNAM,Cruz Azul,0.333333,...,0.4,0.288889,0.111111,0.3,0.1,0.2,0.563015,0.468527,0,0.414766
96,2019 A,403,2019-09-25,0.1285,0.169935,-0.041435,0,FC Juárez,CF América,0.466667,...,0.511111,0.288889,0.222222,0.2,0.1,0.1,0.56817,0.530069,0,0.513816
178,2019 C,160,2019-01-06,0.241862,0.076923,0.164939,0,Pumas UNAM,CD Veracruz,0.533333,...,0.466667,0.177778,0.288889,-0.3,0.2,-0.5,0.223031,0.558486,0,0.402585
236,2019 C,219,2019-02-23,0.163304,0.03483,0.128474,0,CF Monterrey,Puebla FC,0.733333,...,0.688889,0.466667,0.222222,-0.2,-0.3,0.1,0.522232,0.490359,0,0.43037
257,2019 C,239,2019-03-09,0.100228,0.291167,-0.190939,0,Club Necaxa,Deportivo Toluca,0.266667,...,0.377778,0.4,-0.022222,0.5,0.3,0.2,0.490305,0.528785,0,0.476261
274,2019 C,258,2019-03-30,0.121962,0.291167,-0.169205,1,CF Pachuca,Deportivo Toluca,0.6,...,0.6,0.466667,0.133333,-1.976579e-16,-8.462227e-17,-1.130357e-16,0.42633,0.501469,1,0.404922
308,2019 C,290,2019-04-27,0.169935,0.333714,-0.163779,1,CF América,Santos Laguna,0.466667,...,0.466667,0.311111,0.155556,-1.489094e-16,0.2,-0.2,0.315567,0.583515,1,0.483235


In [9]:
meta_df["model_disagreement"] = (
    meta_df["p_rf"] - meta_df["p_logit"]
).abs()
meta_df["signal_strength"] = (
    meta_df["home_form_weighted"].abs() +
    meta_df["away_form_weighted"].abs()
)
meta_df["balanced_match"] = (
    meta_df["form_diff_weighted"].abs() < 0.15
).astype(int)


In [10]:
from sklearn.linear_model import LogisticRegression

meta_features_ext = [
    "p_logit",
    "p_rf",
    "model_disagreement",
    "signal_strength",
    "balanced_match"
]

meta_model_ext = LogisticRegression(max_iter=1000)
meta_model_ext.fit(meta_df[meta_features_ext], meta_df["y_true"])

meta_df["p_meta_ext"] = meta_model_ext.predict_proba(
    meta_df[meta_features_ext]
)[:, 1]


In [11]:
mid_zone_base = meta_df[
    (meta_df["p_meta"] >= 0.4) &
    (meta_df["p_meta"] <= 0.6)
]

mid_zone_ext = meta_df[
    (meta_df["p_meta_ext"] >= 0.4) &
    (meta_df["p_meta_ext"] <= 0.6)
]

len(mid_zone_base), len(mid_zone_ext)


(51, 55)

In [12]:
from sklearn.metrics import brier_score_loss

brier_base = brier_score_loss(
    mid_zone_base["y_true"],
    mid_zone_base["p_meta"]
)

brier_ext = brier_score_loss(
    mid_zone_ext["y_true"],
    mid_zone_ext["p_meta_ext"]
)

brier_base, brier_ext


(0.25032371187897906, 0.25071987025396697)

In [13]:
coef_df = pd.Series(
    meta_model_ext.coef_[0],
    index=meta_features_ext
).sort_values()

coef_df


model_disagreement   -0.088753
balanced_match        0.221225
signal_strength       0.525552
p_logit               1.836180
p_rf                  6.417132
dtype: float64

We experimented with conflict-aware meta-features (model disagreement, signal strength, match balance). While theoretically appealing, they increased probability diffusion (the mid zone grew from 51 to 55 matches) and slightly degraded the local Brier score in the ambiguous 0.4–0.6 region (0.2503 to 0.2507). We therefore retained the simpler meta-model.
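Two details may help when reading local Brier scores like the ones above. First, a constant forecast of 0.5 always scores exactly 0.25, so values near 0.25 inside the zone are close to the coin-flip baseline. Second, each model defines its own mid zone, so the two scores are computed on different rows. The sketch below scores several probability columns on one shared set of rows and adds the constant baseline. The helper `zone_brier_report` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import brier_score_loss

def zone_brier_report(df, prob_cols, target="y_true", lo=0.4, hi=0.6):
    """Hypothetical helper: Brier scores on a shared mid-zone row set.

    The shared set contains every row where at least one of the
    probability columns falls inside [lo, hi].
    """
    in_zone = pd.concat(
        [df[c].between(lo, hi) for c in prob_cols], axis=1
    ).any(axis=1)
    zone = df[in_zone]
    report = {c: brier_score_loss(zone[target], zone[c]) for c in prob_cols}
    # Reference point: always predicting 0.5 gives (0.5 - y)^2 = 0.25 per row.
    report["constant_0.5"] = brier_score_loss(
        zone[target], np.full(len(zone), 0.5)
    )
    report["n_rows"] = len(zone)
    return report

# Toy values (assumed); the last row is outside the zone for both columns.
demo = pd.DataFrame({
    "y_true":     [1, 0, 1, 0, 1, 0, 1],
    "p_meta":     [0.45, 0.55, 0.58, 0.90, 0.10, 0.42, 0.95],
    "p_meta_ext": [0.48, 0.61, 0.52, 0.59, 0.41, 0.39, 0.97],
})

report = zone_brier_report(demo, ["p_meta", "p_meta_ext"])
print(report)
```

Using the union of zones is one design choice among several; the intersection would be stricter but leaves fewer rows.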

## Evaluate an isotonic model in the problematic zone
This experiment checks whether isotonic calibration of the meta-model output improves predictions in the mid-probability zone.

In [14]:
from sklearn.isotonic import IsotonicRegression

iso = IsotonicRegression(out_of_bounds="clip")

iso.fit(meta_df["p_meta"], meta_df["y_true"])

meta_df["p_meta_iso"] = iso.predict(meta_df["p_meta"])
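Because the calibrator above is fit and applied on the same rows, its Brier score is likely to be optimistic. A common alternative is to fit the isotonic map on one part of the data and score it on the rest. The sketch below illustrates this on synthetic, deliberately overconfident probabilities. The helper `holdout_isotonic`, the 50/50 split, and the simulated data are assumptions, not the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

def holdout_isotonic(df, prob_col, calib_mask, target="y_true"):
    """Hypothetical helper: fit isotonic calibration on calib_mask rows
    and return calibrated probabilities for the remaining rows only."""
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(df.loc[calib_mask, prob_col], df.loc[calib_mask, target])
    held_out = df.loc[~calib_mask]
    return pd.Series(iso.predict(held_out[prob_col]), index=held_out.index)

# Synthetic data (assumed): raw scores are more extreme than the true rates.
rng = np.random.default_rng(1)
n = 400
demo = pd.DataFrame({"p_meta": rng.uniform(0, 1, n)})
true_p = 0.5 + 0.6 * (demo["p_meta"] - 0.5)
demo["y_true"] = (rng.uniform(0, 1, n) < true_p).astype(int)

# First half calibrates, second half is scored (stand-in for earlier/later seasons).
calib_mask = pd.Series(np.arange(n) < n // 2, index=demo.index)
p_iso = holdout_isotonic(demo, "p_meta", calib_mask)

held_out = demo.loc[~calib_mask]
print("raw Brier:       ", brier_score_loss(held_out["y_true"], held_out["p_meta"]))
print("calibrated Brier:", brier_score_loss(held_out["y_true"], p_iso))
```

In a temporal setting the calibration rows would normally be the earlier seasons, so the calibrator never sees outcomes from the period it is scored on.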


In [15]:
mid_zone_iso = meta_df[
    (meta_df["p_meta_iso"] >= 0.4) &
    (meta_df["p_meta_iso"] <= 0.6)
]

len(mid_zone_base), len(mid_zone_iso)


(51, 43)

In [16]:
brier_iso = brier_score_loss(
    mid_zone_iso["y_true"],
    mid_zone_iso["p_meta_iso"]
)

brier_iso


0.24584717607973425

In [17]:
brier_global_iso = brier_score_loss(
    meta_df["y_true"],
    meta_df["p_meta_iso"]
)

brier_global_base = brier_score_loss(
    meta_df["y_true"],
    meta_df["p_meta"]
)

brier_global_base, brier_global_iso


(0.05705845390220038, 0.0538414446191328)

In [18]:
meta_model_reg = LogisticRegression(
    penalty="l2",
    C=0.05,      # key setting: strong regularization
    max_iter=1000
)

meta_model_reg.fit(
    meta_df[["p_logit", "p_rf"]],
    meta_df["y_true"]
)

meta_df["p_meta_reg"] = meta_model_reg.predict_proba(
    meta_df[["p_logit", "p_rf"]]
)[:, 1]




In [19]:
mid_zone_reg = meta_df[
    (meta_df["p_meta_reg"] >= 0.4) &
    (meta_df["p_meta_reg"] <= 0.6)
]

len(mid_zone_reg)


91

In [20]:
brier_reg = brier_score_loss(
    mid_zone_reg["y_true"],
    mid_zone_reg["p_meta_reg"]
)

brier_reg


0.24449071553745072

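While conflict-aware meta-features were explored, they increased probability diffusion. Applying isotonic calibration to the meta-model shrank the 0.4–0.6 zone from 51 to 43 matches and improved both the local Brier score (0.2503 to 0.2458) and the global Brier score (0.0571 to 0.0538), without introducing additional signals. A strongly regularized meta-model (C=0.05) reached a similar local Brier score (0.2445) but widened the zone to 91 matches.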
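The mid-zone samples here are small (43 to 91 matches), so differences in the third decimal of the local Brier score may fall within sampling noise. One rough way to gauge this is a percentile bootstrap over the per-match squared errors. The sketch below uses simulated data of a similar size. The function `bootstrap_brier_ci` and its defaults are hypothetical choices, not part of the notebook.

```python
import numpy as np

def bootstrap_brier_ci(y_true, p_pred, n_boot=2000, alpha=0.05, seed=0):
    """Hypothetical helper: Brier score with a percentile bootstrap interval."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    sq_err = (p - y) ** 2
    rng = np.random.default_rng(seed)
    # Resample matches with replacement, n_boot times.
    idx = rng.integers(0, len(sq_err), size=(n_boot, len(sq_err)))
    boot_means = sq_err[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(sq_err.mean()), float(lo), float(hi)

# Simulated mid-zone sample (assumed): 51 matches with probabilities in [0.4, 0.6].
rng = np.random.default_rng(42)
p_demo = rng.uniform(0.4, 0.6, 51)
y_demo = (rng.uniform(0, 1, 51) < p_demo).astype(int)

point, lo, hi = bootstrap_brier_ci(y_demo, p_demo)
print(f"Brier = {point:.4f}, 95% interval = [{lo:.4f}, {hi:.4f}]")
```

If the intervals of two models overlap heavily, the ranking between them in the mid zone should be treated as tentative.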