## Hybrid DWTS Elimination System (Simulation Script)

This script simulates a **proposed elimination system for *Dancing with the Stars* (DWTS)** that blends judges’ scores, fan support, and fan momentum, and applies shield rules that protect judge favorites and fan favorites from elimination.

### Inputs

- `panel_with_eta_decomp.csv`, containing at least:
  - `season`, `week`, `contestant_id`
  - `judge_total` – raw weekly judge total
  - `eta_decomp` – decomposed latent utility for fans
  - `F_decomp` – fan share implied by `eta_decomp`
  - `eliminated` – 1 if historically eliminated this week, else 0

### Design Hyperparameters

- **Fan temperatures**
  - `GAMMA_SCORE` – controls how sharply fan utilities translate into fan shares for scoring.
  - `GAMMA_ELIM` – same for bottom-two fan decision.
- **Momentum**
  - `K_MOMENTUM` – lookback window (in weeks) for fan momentum.
- **Aggregation weights**
  - `ALPHA_J_START`, `ALPHA_J_END` – judge weight at the start and end of the season; it is interpolated linearly in between.
  - `ALPHA_M` – fixed weight on momentum.
  - Implied fan weight:  
    \[
    \alpha_F = 1 - \alpha_J - \alpha_M
    \]
- **Fairness / shield rules**
  - `THETA_F` – fan-protection threshold (strong favorites).
  - `TOP_JUDGE_K`, `BOT_FAN_K`, `BOT_JUDGE_K` – how many top/bottom positions define shield sets.
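
To make the role of the fan temperatures concrete, here is a minimal plain-Python sketch of a temperature-scaled softmax. The `gamma` argument and the utility values in `eta` are illustrative assumptions; the script itself multiplies `eta_decomp` by `GAMMA_SCORE` or `GAMMA_ELIM` before calling its own `softmax`.

```python
import math

def softmax(x, gamma=1.0):
    """Temperature-scaled softmax: shares proportional to exp(gamma * x)."""
    scaled = [gamma * v for v in x]
    m = max(scaled)                        # subtract the max for numerical stability
    ex = [math.exp(v - m) for v in scaled]
    total = sum(ex)
    return [e / total for e in ex]

eta = [1.2, 0.4, 0.0, -0.8]                # hypothetical latent fan utilities
shares_score = softmax(eta, gamma=0.7)     # same value as the GAMMA_SCORE default
shares_elim = softmax(eta, gamma=1.0)      # same value as the GAMMA_ELIM default

for name, shares in [("gamma=0.7", shares_score), ("gamma=1.0", shares_elim)]:
    print(name, [round(s, 3) for s in shares])
```

A smaller `gamma` flattens the shares toward equality, while a larger one concentrates them on the front-runner, so with the defaults the scoring shares are somewhat more even than the shares used for the bottom-two decision.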

### Main Components

1. **Helper functions**
   - `softmax(x)` – numerically stable softmax over a 1D array.
   - `add_normalized_judges(...)` – adds `J_norm`, each contestant's share of the week's total judge points (`judge_total` divided by its sum within `(season, week)`).
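   - `add_momentum_from_F(...)` – computes fan momentum as the positive part of the gain over the trailing mean of up to `K` previous weeks (0 in a contestant's first week):
     \[
     M_{it} = \max\Bigl(0,\; F_{it} - \frac{1}{k}\sum_{j=1}^{k} F_{i,t-j}\Bigr), \qquad k = \min(K,\ \text{number of earlier weeks})
     \]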
   - `compute_week_progress(...)` – computes season progress
     \[
     \tau_{st} = \frac{\text{week} - 1}{T_s - 1}
     \]
     used to interpolate judge weight from early to late season.
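   - `dense_rank_descending(values)` – descending rank (1 = largest), used for judge and fan ranks. Despite the name, ties share the smallest rank and the next rank is skipped (1, 1, 3), which is competition ranking rather than dense ranking.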
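
The momentum helper in the list above can be illustrated with a short plain-Python sketch. The function name `momentum` and the fan-share series are hypothetical; the script's own version works on the `F_decomp` column per contestant and season.

```python
def momentum(F_vals, K=3):
    """M_t = max(0, F_t - mean of up to K previous weeks); the first week is 0."""
    out = [0.0] * len(F_vals)
    for t in range(1, len(F_vals)):
        k = min(t, K)                          # fewer than K weeks are available early on
        prev_mean = sum(F_vals[t - k:t]) / k   # trailing mean, excluding the current week
        out[t] = max(0.0, F_vals[t] - prev_mean)
    return out

F = [0.10, 0.20, 0.15, 0.30, 0.20]             # hypothetical weekly fan shares
M = momentum(F, K=3)
print([round(m, 3) for m in M])
```

In this series momentum is positive only in the second and fourth weeks, where the fan share rises above its recent average; a drop in fan share is never penalized because of the clamp at zero.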

2. **Preprocessing**
   - Load `panel_with_eta_decomp.csv`.
   - Check required columns.
   - Add:
     - `J_norm` – within-week normalized judge scores.
     - `momentum` – fan momentum from `F_decomp`.
     - `week_progress` – season progress scalar.
   - Sort by `season`, `week`, `contestant_id` for stable processing.

3. **Weekly simulation loop**

   For each `(season, week)` where **exactly one historical elimination** occurs:

   1. **Fan shares**
      - Compute `F_score = softmax(GAMMA_SCORE * eta_decomp)` for scoring.
      - Compute `F_elim  = softmax(GAMMA_ELIM * eta_decomp)` for final bottom-two decision.

   2. **Dynamic weights**
      - Interpolate judge weight:
        \[
        \alpha_J(t) = \alpha_{J,\text{start}} + (\alpha_{J,\text{end}} - \alpha_{J,\text{start}})\,\tau_{st}
        \]
      - Use fixed `ALPHA_M`, then set:
        \[
        \alpha_F(t) = 1 - \alpha_J(t) - \alpha_M.
        \]

   3. **Composite score**
      - Combined score; a lower value means the contestant is more at risk:
        \[
        S_i = \alpha_J J_i + \alpha_F F^{\text{score}}_i + \alpha_M M_i.
        \]

   4. **Shields / protection**
      - Compute descending ranks (ties share the smallest rank):
        - `rJ` – judge rank (1 = best judge).
        - `rF` – fan rank (1 = highest fan share).
      - **Judge shield**:
        - Protect contestants who are:
          - In the top `TOP_JUDGE_K` by judges, and
          - Not in the bottom `BOT_FAN_K` by fans.
      - **Fan shield**:
        - Protect contestants with `F_elim ≥ THETA_F` (strong favorites), **unless** they are in the bottom `BOT_JUDGE_K` by judges.
      - Union of both shields = full protected set.
      - Eligible set = contestants not protected.  
        If everyone is protected, fall back to all contestants being eligible.

   5. **Bottom-two and final elimination**
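      - Among eligible contestants, pick the **bottom two** by `S` (lowest scores). If only one contestant is eligible, the second slot goes to the lowest-`S` contestant among all others.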
      - Within this bottom-two, eliminate the contestant with **lower** fan share (`F_elim`).

   6. **Evaluation vs history**
      - `hit` – whether predicted elimination matches the historical elimination.
      - `judge_conflict` – whether the predicted elimination **differs** from the contestant with the lowest `J_norm` (a sign of judge–fan tension).
      - Track fan variance `var(F_score)` as an inequality measure.

   7. **Logging for plots**
      - `weight_log` – store `alpha_J`, `alpha_F`, `alpha_M` by `(season, week)` for weight-evolution plots.
      - `example_week_df` – capture the first week where the system conflicts with judges, storing per-contestant decomposition:
        - `J_part`, `F_part`, `M_part`, `S_total`
        - predicted vs true elimination flags.

4. **Summary metrics**

After all weeks:

- `hit_rate` – fraction of weeks where the system exactly matches the historical elimination.
- `conflict_rate` – fraction of weeks where the system’s elimination differs from the contestant with the lowest judge score.
- `mean_fan_var` – average within-week variance of `F_score`, reflecting fan inequality.
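
As a small illustration of how these three metrics aggregate, the sketch below computes them from hand-made weekly records. The record layout and all values are hypothetical; the variance is the population variance, which matches the default of `np.var` used in the script.

```python
from statistics import mean, pvariance

# Hypothetical per-week records: predicted, true, and lowest-judge contestant IDs,
# plus that week's scoring fan shares
weeks = [
    {"pred": "A", "true": "A", "worst_judge": "A", "F_score": [0.5, 0.3, 0.2]},
    {"pred": "B", "true": "C", "worst_judge": "B", "F_score": [0.4, 0.4, 0.2]},
    {"pred": "D", "true": "D", "worst_judge": "E", "F_score": [0.6, 0.2, 0.2]},
    {"pred": "F", "true": "G", "worst_judge": "H", "F_score": [0.34, 0.33, 0.33]},
]

total = len(weeks)
hit_rate = sum(w["pred"] == w["true"] for w in weeks) / total if total else 0.0
conflict_rate = sum(w["pred"] != w["worst_judge"] for w in weeks) / total if total else 0.0
mean_fan_var = mean(pvariance(w["F_score"]) for w in weeks) if total else 0.0

print(f"hit_rate={hit_rate:.2f} conflict_rate={conflict_rate:.2f} "
      f"mean_fan_var={mean_fan_var:.4f}")
```

Note that a week can be a hit and a conflict at the same time (the third record), since the two metrics compare the prediction against different references.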

The script prints:

- Global performance summary.
- A sample of `results_df` (per-week outcomes).
- A preview of `weights_df` (for plotting weight trajectories).
- A preview of `example_week_df` (for visualization of one “interesting” conflict week, if found).

### Outputs (in-memory)

- `results_df` – per-week results (hit, conflict, weights, predicted/true eliminated IDs).
- `weights_df` – per-week judge/fan/momentum weights.
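- `example_week_df` – one week with a judge–fan conflict, ready for a stacked-bar decomposition figure (`None` if no such week exists).
- `cher_metrics` – dict with `hit_rate`, `conflict_rate`, and `mean_fan_var` for the proposed configuration.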


In [None]:
"""
Proposed DWTS elimination system:

- Score-based combination of judges, fans, and momentum
- Week-dependent judge/fan weights
- Judge- and fan-shield fairness rules
- Bottom-two selection by score
- Final elimination decided by fan share within bottom-two
"""

import numpy as np
import pandas as pd

# ============================================================
# 0. Hyperparameters (design choices)
# ============================================================

PANEL_PATH = "panel_with_eta_decomp.csv"

# Fan temperature for scoring and final elimination
GAMMA_SCORE = 0.7
GAMMA_ELIM = 1.0

# Momentum parameters
K_MOMENTUM = 3  # lookback window in weeks

# Phase-dependent judge weights
ALPHA_J_START = 0.3
ALPHA_J_END = 0.6

# Fixed momentum weight
ALPHA_M = 0.2

# Fan protection threshold (strong favorites)
THETA_F = 0.30

# Top/bottom cutoffs for shields
TOP_JUDGE_K = 2
BOT_FAN_K = 2
BOT_JUDGE_K = 2


# ============================================================
# 1. Helpers
# ============================================================

def softmax(x):
    """Numerically stable softmax over 1D numpy array."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    ex = np.exp(x - m)
    return ex / ex.sum()


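def add_normalized_judges(df,
                          group_cols=("season", "week"),
                          judge_col="judge_total",
                          out_col="J_norm"):
    """Add each contestant's share of the week's total judge points."""
    df = df.copy()
    # Divide by the group total; an all-zero week maps to zeros, not NaN
    df[out_col] = df.groupby(list(group_cols))[judge_col].transform(
        lambda s: s / s.sum() if s.sum() != 0 else s * 0.0
    )
    return df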


def add_momentum_from_F(df,
                        season_col="season",
                        contestant_col="contestant_id",
                        week_col="week",
                        F_col="F_decomp",
                        out_col="momentum",
                        K=3):
    """
    M_it = max(0, F_it - avg of up to K previous weeks for that contestant in
    that season), using the historical fan share estimate F_decomp.
    The first observed week of each contestant gets momentum 0.
    """
    df = df.copy()
    df[out_col] = 0.0
    df = df.sort_values(by=[season_col, contestant_col, week_col])

    groups = df.groupby([season_col, contestant_col], sort=False)
    momentum_values = []

    for (s, cid), sub in groups:
        sub = sub.sort_values(by=week_col)
        F_vals = sub[F_col].values
        m_vals = np.zeros_like(F_vals, dtype=float)

        for idx in range(1, len(F_vals)):
            k = min(idx, K)
            prev_mean = F_vals[idx - k:idx].mean()
            m_vals[idx] = max(0.0, F_vals[idx] - prev_mean)

        momentum_values.append(pd.Series(m_vals, index=sub.index))

    momentum_series = pd.concat(momentum_values).sort_index()
    df.loc[momentum_series.index, out_col] = momentum_series.values
    return df


def compute_week_progress(df, season_col="season", week_col="week"):
    """
    tau_{st} = (week - 1) / (T_s - 1)
    """
    df = df.copy()
    max_week = df.groupby(season_col)[week_col].max().to_dict()

    def tau(row):
        T_s = max_week[row[season_col]]
        if T_s <= 1:
            return 0.0
        return (row[week_col] - 1) / (T_s - 1)

    df["week_progress"] = df.apply(tau, axis=1)
    return df


def dense_rank_descending(values):
    """
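    Competition rank, 1 = largest: ties share the smallest rank and the next
    rank is skipped (e.g. 1, 1, 3). Despite the function name this is not a
    dense rank; the bottom-K thresholds in the loop rely on ranks running up to n.
    values: 1D numpy array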
    """
    order = np.argsort(-values)  # descending
    ranks = np.empty_like(order, dtype=int)
    # assign ranks
    current_rank = 1
    for i, idx in enumerate(order):
        if i > 0 and values[idx] != values[order[i - 1]]:
            current_rank = i + 1
        ranks[idx] = current_rank
    return ranks


# ============================================================
# 2. Load panel and preprocess
# ============================================================

df = pd.read_csv(PANEL_PATH)

print(f"Loaded {PANEL_PATH}")
print(df.head())
print("Columns:", df.columns.tolist())
print("Rows:", len(df))

required_cols = ["season", "week", "contestant_id",
                 "judge_total", "eta_decomp", "F_decomp", "eliminated"]
for c in required_cols:
    if c not in df.columns:
        raise ValueError(f"Column '{c}' missing from {PANEL_PATH}")

# Normalize judges
df = add_normalized_judges(df,
                           group_cols=("season", "week"),
                           judge_col="judge_total",
                           out_col="J_norm")

# Momentum from F_decomp
df = add_momentum_from_F(df,
                         season_col="season",
                         contestant_col="contestant_id",
                         week_col="week",
                         F_col="F_decomp",
                         out_col="momentum",
                         K=K_MOMENTUM)

# Week progress tau_{st}
df = compute_week_progress(df,
                           season_col="season",
                           week_col="week")

print("\nBasic checks:")
print("Unique seasons:", df["season"].unique())
print("Weeks per season (head):")
print(df.groupby("season")["week"].max().head())
print("Total eliminated=1 rows:", df["eliminated"].sum())

# Sort for consistent processing
df = df.sort_values(by=["season", "week", "contestant_id"]).reset_index(drop=True)


# ============================================================
# 3. Simulate proposed system week by week
# ============================================================

results = []  # per-week outcomes

# Logs used by the plotting cell
weight_log = []        # for panel (b): alpha_J/F/M over weeks
fan_var_list = []      # within-week variance of F_score
example_week_df = None # for panel (d): one representative week

unique_weeks = df[["season", "week"]].drop_duplicates().sort_values(["season", "week"])

hits = 0
total_weeks = 0
conflicts_with_judges = 0

for _, wk in unique_weeks.iterrows():
    s = wk["season"]
    t = wk["week"]

    sub = df[(df["season"] == s) & (df["week"] == t)].copy()
    sub = sub.sort_values("contestant_id")

    # Only simulate weeks with exactly one historical elimination
    if sub["eliminated"].sum() != 1:
        continue

    total_weeks += 1

    # Extract arrays
    J = sub["J_norm"].values.astype(float)
    eta = sub["eta_decomp"].values.astype(float)
    M = sub["momentum"].values.astype(float)
    elim_true = sub["eliminated"].values.astype(int)
    tau = sub["week_progress"].iloc[0]  # all same within week

    n = len(sub)

    # Fan shares for scoring and elimination
    F_score = softmax(GAMMA_SCORE * eta)
    F_elim = softmax(GAMMA_ELIM * eta)

    # Week-dependent weights
    alpha_J_t = ALPHA_J_START + (ALPHA_J_END - ALPHA_J_START) * tau
    alpha_M_t = ALPHA_M
    alpha_F_t = 1.0 - alpha_J_t - alpha_M_t

    # Log weights for this week
    weight_log.append({
        "season": s,
        "week": t,
        "alpha_J": alpha_J_t,
        "alpha_F": alpha_F_t,
        "alpha_M": alpha_M_t
    })

    # Composite score (lower = more at risk)
    S = alpha_J_t * J + alpha_F_t * F_score + alpha_M_t * M

    # Ranks and protection sets
    rJ = dense_rank_descending(J)           # 1=best judge
    rF = dense_rank_descending(F_elim)      # 1=most fans

    # Indices (0..n-1)
    idx = np.arange(n)

    # Judge-shield: top-J, not bottom-fan
    # top-J = rank <= TOP_JUDGE_K
    # bottom-fan = among worst BOT_FAN_K by fans
    bottom_fan_threshold = n - BOT_FAN_K + 1  # e.g., if n=6, BOT_FAN_K=2 -> ranks >=5
    topJ_mask = rJ <= TOP_JUDGE_K
    botF_mask = rF >= bottom_fan_threshold
    P_judge = idx[topJ_mask & (~botF_mask)]

    # Fan-shield: strong favorites, not bottom-J
    bottom_judge_threshold = n - BOT_JUDGE_K + 1
    favF_mask = F_elim >= THETA_F
    botJ_mask = rJ >= bottom_judge_threshold
    P_fan = idx[favF_mask & (~botJ_mask)]

    protected = set(P_judge.tolist()) | set(P_fan.tolist())

    # Eligible for elimination
    eligible_mask = np.array([i not in protected for i in idx])
    eligible_idx = idx[eligible_mask]

    # Fallback: if somehow everyone is protected, relax shields
    if len(eligible_idx) == 0:
        eligible_idx = idx.copy()

    # Bottom two by score S among eligible
    # (if only one is eligible, pair it with the lowest-S contestant among the rest)
    S_eligible = S[eligible_idx]
    order_elig = np.argsort(S_eligible)  # ascending = most at-risk
    if len(order_elig) >= 2:
        btm_indices = eligible_idx[order_elig[:2]]
    else:
        # only one eligible; pair with next-worst overall by S
        worst_idx = eligible_idx[order_elig[0]]
        other_candidates = idx[idx != worst_idx]
        second_idx = other_candidates[np.argsort(S[other_candidates])[0]]
        btm_indices = np.array([worst_idx, second_idx])

    # Final elimination decided by fans within bottom-two
    F_btm = F_elim[btm_indices]
    # Eliminate the one with LOWER fan share
    elim_btm_order = np.argsort(F_btm)     # ascending fan share => more likely eliminated
    elim_pred_idx = btm_indices[elim_btm_order[0]]

    # Compare with historical elimination
    true_elim_idx = idx[elim_true == 1][0]
    hit = int(elim_pred_idx == true_elim_idx)
    hits += hit

    # Conflict with judges: compare to worst judge (lowest J);
    # ties for lowest J resolve to the first contestant in contestant_id order
    worst_judge_idx = idx[np.argmin(J)]
    conflict = int(elim_pred_idx != worst_judge_idx)
    conflicts_with_judges += conflict

    # Fan variance (based on scoring fan shares)
    fan_var_list.append(np.var(F_score))

    # Save weekly result (for table / debugging)
    results.append({
        "season": s,
        "week": t,
        "alpha_J_t": alpha_J_t,
        "alpha_F_t": alpha_F_t,
        "alpha_M_t": alpha_M_t,
        "pred_elim_contestant_id": sub["contestant_id"].iloc[elim_pred_idx],
        "true_elim_contestant_id": sub["contestant_id"].iloc[true_elim_idx],
        "hit": hit,
        "judge_conflict": conflict
    })

    # Capture one representative week for the decomposition plot:
    # the first week with a judge–fan conflict (to make the plot interesting)
    if example_week_df is None and conflict == 1:
        example_week_df = pd.DataFrame({
            "season": s,
            "week": t,
            "contestant_id": sub["contestant_id"].values,
            "J_part": alpha_J_t * J,
            "F_part": alpha_F_t * F_score,
            "M_part": alpha_M_t * M,
            "S_total": S,
            "pred_eliminated": (idx == elim_pred_idx).astype(int),
            "true_eliminated": elim_true
        })

# ============================================================
# 4. Summary metrics
# ============================================================

hit_rate = hits / total_weeks if total_weeks > 0 else 0.0
conflict_rate = conflicts_with_judges / total_weeks if total_weeks > 0 else 0.0
mean_fan_var = float(np.mean(fan_var_list)) if fan_var_list else 0.0

print("\n=== Proposed system performance ===")
print(f"Weeks evaluated: {total_weeks}")
print(f"Hit rate (match historical elimination)      : {hit_rate:.3f}")
print(f"Conflict rate (≠ worst judge)                : {conflict_rate:.3f}")
print(f"Mean fan variance (inequality measure)       : {mean_fan_var:.4f}")

results_df = pd.DataFrame(results)
print("\nSample of weekly results:")
print(results_df.head())

# Build helper objects for plotting
weights_df = pd.DataFrame(weight_log)
cher_metrics = {
    "hit_rate": hit_rate,
    "conflict_rate": conflict_rate,
    "mean_fan_var": mean_fan_var
}

print("\nWeights_df head (for panel (b)):")
print(weights_df.head())

if example_week_df is not None:
    print("\nExample week for decomposition plot (panel (d)):")
    print(example_week_df.head())
else:
    print("\nWarning: no week with judge–fan conflict was found for example_week_df.")


## Figure: Performance of the Proposed CHER Elimination System

This figure summarizes the behavior and performance of the **CHER (Composite Hybrid Elimination Rule)** system by combining global performance metrics, dynamic weighting behavior, system-level comparisons, and a concrete weekly example.

The figure consists of four panels:

---

### (a) Accuracy–Conflict Trade-off

Each point represents one parameter configuration evaluated in the grid search.

- **x-axis:** Conflict rate — how often the system’s elimination disagrees with the lowest judge score.
- **y-axis:** Hit rate — how often the predicted elimination matches the historical outcome.
- **Color:** Mean fan variance — a measure of inequality in fan support within a week.

The red star highlights the **proposed CHER configuration**, illustrating where it lies in the trade-off space.  
This panel shows that CHER achieves a competitive hit rate while keeping judge–fan conflicts and fan inequality at moderate levels.

---

### (b) Dynamic Aggregation Weights

This panel shows the **average weekly aggregation weights** used by CHER:

- \(\alpha_J\): judge weight  
- \(\alpha_F\): fan weight  
- \(\alpha_M\): momentum weight  

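Because \(\tau_{st}\) is scaled by each season's length, the same week number can carry different weights in different seasons; weights are therefore averaged by week number across seasons to give one trajectory per weight instead of many season-specific lines.  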
The plot illustrates the intended design:
- Judge influence increases over time,
- Fan influence decreases accordingly,
- Momentum remains fixed.
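
The trajectories in this panel follow from the linear schedule, which the following sketch evaluates for a single season. The helper name `weights_at` and the 11-week season length are illustrative assumptions; the constants are the script's defaults.

```python
ALPHA_J_START, ALPHA_J_END, ALPHA_M = 0.3, 0.6, 0.2

def weights_at(week, n_weeks):
    """Return (alpha_J, alpha_F, alpha_M) for a given week of an n_weeks season."""
    tau = (week - 1) / (n_weeks - 1) if n_weeks > 1 else 0.0   # season progress in [0, 1]
    alpha_J = ALPHA_J_START + (ALPHA_J_END - ALPHA_J_START) * tau
    alpha_F = 1.0 - alpha_J - ALPHA_M                          # fans get the remainder
    return alpha_J, alpha_F, ALPHA_M

for week in (1, 6, 11):
    aJ, aF, aM = weights_at(week, n_weeks=11)
    print(f"week {week:2d}: alpha_J={aJ:.2f} alpha_F={aF:.2f} alpha_M={aM:.2f}")
```

With these defaults the fan weight falls from 0.50 to 0.20 while the judge weight rises from 0.30 to 0.60. The fan weight stays non-negative only as long as `ALPHA_J_END + ALPHA_M` does not exceed 1.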

---

### (c) System-Level Comparison

This bar chart compares three systems:

- **Fan-only**
- **Judge-only**
- **CHER**

For each system, three metrics are shown:
- Hit rate
- Conflict rate
- Fan variance

CHER balances all three objectives, avoiding the extreme conflict of fan-only rules and the low responsiveness of judge-only rules, while maintaining lower fan inequality.

---

### (d) One-Week Score Decomposition

This panel provides a transparent, **contestant-level breakdown** for a representative week:

- Stacked bars show contributions from:
  - Judges
  - Fans
  - Momentum
- The dashed vertical line marks the **historically eliminated contestant**.

This visualization demonstrates how the final elimination emerges from the combined scoring components and highlights cases where fan and judge signals compete.

---

### Overall Interpretation

Together, the four panels show that CHER:
- Achieves strong agreement with historical eliminations,
- Smoothly transitions influence from fans to judges over the season,
- Maintains fairness through bounded fan inequality,
- Remains interpretable at both the system and individual-week level.

This makes CHER a transparent and robust alternative to fixed-rule elimination systems.


In [None]:
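import matplotlib.pyplot as plt
import numpy as np

# Assumes `grid_results` (grid-search results table) and `baseline_metrics`
# (fan-only / judge-only metrics) were built in other cells of this notebook;
# `weights_df`, `cher_metrics`, and `example_week_df` come from the simulation cell.

plt.style.use("seaborn-v0_8-whitegrid")  # this style name requires matplotlib >= 3.6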

fig, axs = plt.subplots(2, 2, figsize=(12, 9))
plt.subplots_adjust(hspace=0.35, wspace=0.3)

# =========================================================
# (a) Accuracy–Conflict Trade-off
# =========================================================
ax = axs[0, 0]

sc = ax.scatter(
    grid_results["conflict_rate"],
    grid_results["hit_rate"],
    c=grid_results["mean_fan_var"],
    cmap="viridis",
    s=18,
    alpha=0.85,
    linewidths=0
)

ax.scatter(
    cher_metrics["conflict_rate"],
    cher_metrics["hit_rate"],
    color="red",
    s=120,
    marker="*",
    label="Proposed CHER",
    zorder=5,
)

ax.set_xlabel("Conflict rate", fontsize=11)
ax.set_ylabel("Hit rate", fontsize=11)
ax.set_title("(a) Accuracy–Conflict Trade-off", fontsize=13)
ax.legend(frameon=True, fontsize=10)
cbar = plt.colorbar(sc, ax=ax)
cbar.set_label("Mean fan variance", fontsize=11)

# optional: tighten limits a bit
ax.set_xlim(grid_results["conflict_rate"].min() - 0.02,
            grid_results["conflict_rate"].max() + 0.02)
ax.set_ylim(grid_results["hit_rate"].min() - 0.01,
            grid_results["hit_rate"].max() + 0.01)

# =========================================================
# (b) Dynamic aggregation weights (clean version)
#    -> use mean across seasons instead of many zig-zag lines
# =========================================================
ax = axs[0, 1]

weights_mean = (
    weights_df
    .groupby("week", as_index=False)[["alpha_J", "alpha_F", "alpha_M"]]
    .mean()
    .sort_values("week")
)

ax.plot(
    weights_mean["week"], weights_mean["alpha_J"],
    marker="o", label=r"$\alpha_J$"
)
ax.plot(
    weights_mean["week"], weights_mean["alpha_F"],
    marker="s", label=r"$\alpha_F$"
)
ax.plot(
    weights_mean["week"], weights_mean["alpha_M"],
    marker="^", label=r"$\alpha_M$"
)

ax.set_xlabel("Week", fontsize=11)
ax.set_ylabel("Weight", fontsize=11)
ax.set_title("(b) Dynamic aggregation weights", fontsize=13)
ax.set_xticks(sorted(weights_mean["week"].unique()))
ax.set_ylim(0, 0.7)
ax.legend(frameon=True, fontsize=10)

# =========================================================
# (c) System-level comparison
# =========================================================
ax = axs[1, 0]

labels = ["Fan-only", "Judge-only", "CHER"]

hit = [
    baseline_metrics["fan"]["hit"],
    baseline_metrics["judge"]["hit"],
    cher_metrics["hit_rate"],
]

conflict = [
    baseline_metrics["fan"]["conflict"],
    baseline_metrics["judge"]["conflict"],
    cher_metrics["conflict_rate"],
]

fanvar = [
    baseline_metrics["fan"]["fan_var"],
    baseline_metrics["judge"]["fan_var"],
    cher_metrics["mean_fan_var"],
]

x = np.arange(len(labels))
w = 0.25

ax.bar(x - w, hit, width=w, label="Hit rate")
ax.bar(x, conflict, width=w, label="Conflict rate")
ax.bar(x + w, fanvar, width=w, label="Fan variance")

ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel("Value", fontsize=11)
ax.set_title("(c) System-level comparison", fontsize=13)
ax.legend(frameon=True, fontsize=10)
ax.set_ylim(0, max(hit + conflict + fanvar) * 1.05)  # cover all three bar groups

# =========================================================
# (d) One-week score decomposition
# =========================================================
ax = axs[1, 1]

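if example_week_df is None:
    # The simulation cell found no conflict week; annotate instead of failing
    ax.text(0.5, 0.5, "No judge–fan conflict week found",
            ha="center", va="center", transform=ax.transAxes)
else:
    contestants = example_week_df["contestant_id"].tolist()
    J = example_week_df["J_part"].values
    F = example_week_df["F_part"].values
    M = example_week_df["M_part"].values

    x = np.arange(len(contestants))

    # Stack the three weighted components so each bar's height equals S_total
    ax.bar(x, J, label="Judges")
    ax.bar(x, F, bottom=J, label="Fans")
    ax.bar(x, M, bottom=J + F, label="Momentum")

    # Exactly one contestant is flagged, since only single-elimination weeks are simulated
    true_elim_idx = np.where(example_week_df["true_eliminated"].values == 1)[0][0]

    ax.axvline(
        true_elim_idx,
        color="red",
        linestyle="--",
        linewidth=2,
        label="Historically eliminated"
    )

    ax.set_xticks(x)
    ax.set_xticklabels(contestants)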
ax.set_xlabel("Contestant", fontsize=11)
ax.set_ylabel("Aggregate score", fontsize=11)
ax.set_title("(d) Weekly score decomposition", fontsize=13)
ax.legend(frameon=True, fontsize=10)

# =========================================================
# Global tweaks
# =========================================================
for ax in axs.flat:
    ax.tick_params(axis="both", labelsize=10)

fig.suptitle("Performance of the Proposed CHER Elimination System",
             fontsize=15, y=0.98)

plt.show()
