# Prologue

The goal is to explain why candidate X is ranked higher than candidate Y. The ranking is based on a weighted sum of scores. Instead of showing the raw formula, we want to provide a (1-1) trade-off explanation.

Definition:
- Contribution: for each subject $k$, $\Delta_k = w_k (x_k - y_k)$, the weighted score difference between $x$ and $y$.
- Pros ($P$): Subjects where $x$ scores better than $y$ ($\Delta_k > 0$).
- Cons ($C$): Subjects where $x$ scores worse than $y$ ($\Delta_k < 0$).
- Trade-off (1-1): A pair $(p, c)$ with $p \in P$ and $c \in C$ such that the advantage in $p$ is strictly greater than the disadvantage in $c$ (i.e., $\Delta_p + \Delta_c > 0$).
- Explanation: A set of pairwise disjoint trade-offs (no pro and no con appears twice) that covers every element of $C$.

In [1]:
import sys, os, platform
venv_python = os.path.abspath("../venv/bin/python")
print("venv python:", venv_python)

venv python: /Users/felipebrito/Workspace/venv/bin/python


In [2]:
#!{venv_python} -m pip install -U pip ipykernel
#!{venv_python} -m ipykernel install --user --name venv_outro --display-name "Python (venv_outro)"

In [3]:
import sys
print(sys.executable)
#!{sys.executable} -m pip install -U pip
#!{sys.executable} -m pip install -U gurobipy

/Users/felipebrito/.pyenv/versions/3.12.3/bin/python


In [4]:
import numpy as np
import pandas as pd
import networkx as nx
import gurobipy as gp
#from scipy.optimize import linprog

pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 140)

## Prologue data: Xavier (x) vs Yvonne (y)

Courses: A..G  
Weights and scores are exactly those shown in the prologue table.


In [5]:
courses = ["A","B","C","D","E","F","G"]

course_names = {
    "A": "Anatomie",
    "B": "Biologie",
    "C": "Chirurgie",
    "D": "Diagnostic",
    "E": "Epidemiologie",
    "F": "Forensic Pathology",
    "G": "Génétique",
}

w = {"A": 8, "B": 7, "C": 7, "D": 6, "E": 6, "F": 5, "G": 6}

x = {"A": 85, "B": 81, "C": 71, "D": 69, "E": 75, "F": 81, "G": 88}  # Xavier
y = {"A": 81, "B": 81, "C": 75, "D": 63, "E": 67, "F": 88, "G": 95}  # Yvonne

df_scores = pd.DataFrame(
    {
        "course": courses,
        "name": [course_names[c] for c in courses],
        "weight": [w[c] for c in courses],
        "x": [x[c] for c in courses],
        "y": [y[c] for c in courses],
    }
)
df_scores


Unnamed: 0,course,name,weight,x,y
0,A,Anatomie,8,85,81
1,B,Biologie,7,81,81
2,C,Chirurgie,7,71,75
3,D,Diagnostic,6,69,63
4,E,Epidemiologie,6,75,67
5,F,Forensic Pathology,5,81,88
6,G,Génétique,6,88,95


## Step 1 — Compute contributions


In [6]:
df = df_scores.copy()
df["delta"] = df["weight"] * (df["x"] - df["y"])
df["sign"] = np.where(df["delta"] > 0, "pro", np.where(df["delta"] < 0, "con", "neutral"))
df[["course","name","weight","x","y","delta","sign"]]


Unnamed: 0,course,name,weight,x,y,delta,sign
0,A,Anatomie,8,85,81,32,pro
1,B,Biologie,7,81,81,0,neutral
2,C,Chirurgie,7,71,75,-28,con
3,D,Diagnostic,6,69,63,36,pro
4,E,Epidemiologie,6,75,67,48,pro
5,F,Forensic Pathology,5,81,88,-35,con
6,G,Génétique,6,88,95,-42,con


## Step 2 — Build feasible (1–1) trade-offs

A feasible edge $(p,c)$ exists iff:
- $p \in pros(x,y)$, $c \in cons(x,y)$
- $\Delta_p + \Delta_c > 0$

We will build the edge list $A$.


In [7]:
pros = df.loc[df["delta"] > 0, "course"].tolist()
cons = df.loc[df["delta"] < 0, "course"].tolist()
neutral = df.loc[df["delta"] == 0, "course"].tolist()

delta = dict(zip(df["course"], df["delta"]))

A = []
for p in pros:
    for c in cons:
        margin = delta[p] + delta[c]
        if margin > 0:
            A.append((p, c, margin))

print("pros:", pros)
print("cons:", cons)
print("neutral:", neutral)
print("\nNumber of feasible (1–1) trade-offs:", len(A))

df_A = (
    pd.DataFrame(A, columns=["P (pro)", "C (con)", "margin"])
      .sort_values("margin", ascending=False)
)
df_A


pros: ['A', 'D', 'E']
cons: ['C', 'F', 'G']
neutral: ['B']

Number of feasible (1–1) trade-offs: 6


Unnamed: 0,P (pro),C (con),margin
3,E,C,20
4,E,F,13
1,D,C,8
5,E,G,6
0,A,C,4
2,D,F,1


## Step 3 — Linear optimization model (LP, integral by structure)

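Decision variables: $z_{p,c} \in [0,1]$ for each feasible pair $(p,c) \in A$. The constraints are those of a bipartite matching, whose constraint matrix is totally unimodular, so the LP has an integral optimal vertex whenever it is feasible. The prologue implementation declares the variables binary, which gives the same optimum.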

Constraints:
1. Each $c \in cons$ is covered exactly once:
$$
\sum_{p} z_{p,c} = 1
$$
2. Each $p \in pros$ is used at most once:
$$
\sum_{c} z_{p,c} \le 1
$$

Objective: maximize total margin $\sum (\Delta_p+\Delta_c) z_{p,c}$.
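As a cross-check on the formulation, the prologue instance is small enough to enumerate by hand. The sketch below is not part of the solver pipeline: `brute_force_1_1` is a hypothetical helper introduced here, and the contributions are copied from the Step 1 table. It tries every injective assignment of pros to cons and keeps the one with the largest total margin.

```python
from itertools import permutations

# Contributions for x vs y, copied from the Step 1 table.
delta = {"A": 32, "B": 0, "C": -28, "D": 36, "E": 48, "F": -35, "G": -42}
pros = [k for k, d in delta.items() if d > 0]
cons = [k for k, d in delta.items() if d < 0]

def brute_force_1_1(delta, pros, cons):
    """Return (best_total_margin, best_pairs), or (None, None) if no explanation exists."""
    best_total, best_pairs = None, None
    # Each permutation gives a distinct pro to every con, in the order of `cons`.
    for chosen_pros in permutations(pros, len(cons)):
        pairs = list(zip(chosen_pros, cons))
        # Keep the assignment only if every pair is a valid (1-1) trade-off.
        if all(delta[p] + delta[c] > 0 for p, c in pairs):
            total = sum(delta[p] + delta[c] for p, c in pairs)
            if best_total is None or total > best_total:
                best_total, best_pairs = total, pairs
    return best_total, best_pairs

best_total, best_pairs = brute_force_1_1(delta, pros, cons)
print(best_total, sorted(best_pairs))
```

On this instance the enumeration finds a single valid assignment, so the optimization has nothing to choose between. Enumeration grows factorially with the number of pros, which is why the notebook relies on the solver for the general case.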


In [8]:
from gurobipy import GRB

def solve_explanation_1_1_gurobi(delta, pros, cons, feasible_edges, verbose=False):
    # feasible_edges: list of (p, c, margin)
    edges = [(p, c) for (p, c, m) in feasible_edges]
    margins = np.array([m for (_, _, m) in feasible_edges], dtype=float)

    n = len(edges)
    if n == 0:
        return {"status": "infeasible", "message": "No feasible trade-off edges.", "edges": edges}

    # Build model
    m = gp.Model("explanation_1_1")
    m.Params.OutputFlag = 1 if verbose else 0

    # Decision variables: choose trade-off edge j or not
    z = m.addVars(n, vtype=GRB.BINARY, name="z")

    # Objective: maximize total margin
    m.setObjective(gp.quicksum(margins[j] * z[j] for j in range(n)), GRB.MAXIMIZE)

    # Each con must be covered exactly once: sum_{j with c==con} z_j == 1
    con_to_js = {con: [] for con in cons}
    for j, (p, c) in enumerate(edges):
        if c in con_to_js:
            con_to_js[c].append(j)

    for con in cons:
        js = con_to_js.get(con, [])
        # If a con has no feasible incident edges, infeasible immediately
        if len(js) == 0:
            return {
                "status": "infeasible",
                "message": f"No feasible trade-off covers con={con}.",
                "edges": edges,
                "margins": margins,
            }
        m.addConstr(gp.quicksum(z[j] for j in js) == 1, name=f"cover_con[{con}]")

    # Each pro can be used at most once: sum_{j with p==pro} z_j <= 1
    pro_to_js = {pro: [] for pro in pros}
    for j, (p, c) in enumerate(edges):
        if p in pro_to_js:
            pro_to_js[p].append(j)

    for pro in pros:
        js = pro_to_js.get(pro, [])
        if len(js) > 0:
            m.addConstr(gp.quicksum(z[j] for j in js) <= 1, name=f"use_pro[{pro}]")

    # Optimize
    m.optimize()

    out = {"edges": edges, "margins": margins, "model_status": int(m.Status)}

    if m.Status == GRB.OPTIMAL:
        z_val = np.array([z[j].X for j in range(n)], dtype=float)
        chosen = [(edges[j][0], edges[j][1], float(margins[j]), float(z_val[j]))
                  for j in range(n) if z_val[j] > 0.5]
        out.update({"status": "feasible", "chosen": chosen, "z": z_val, "obj": float(m.ObjVal)})
        return out

    if m.Status == GRB.INFEASIBLE:
        # "certificate" style: compute IIS (subset of conflicting constraints)
        try:
            m.computeIIS()
            iis_cons = [c.ConstrName for c in m.getConstrs() if c.IISConstr]
            out.update({
                "status": "infeasible",
                "message": "Model infeasible (IIS computed).",
                "iis_constraints": iis_cons
            })
        except gp.GurobiError as e:
            out.update({"status": "infeasible", "message": f"Model infeasible. IIS failed: {e}"})
        return out

    out.update({"status": "infeasible", "message": "Optimization did not reach OPTIMAL (see model_status)."})
    return out
    
sol = solve_explanation_1_1_gurobi(delta, pros, cons, A, verbose=False)
sol["status"], sol.get("message",""), sol.get("obj", None)

Restricted license - for non-production use only - expires 2027-11-29


('feasible', '', 11.0)

## Step 4 — Extract the (1–1) explanation and verify the definition


In [9]:
def verify_explanation(chosen_pairs, pros, cons):
    P_used = [p for (p, c, m, z) in chosen_pairs]
    C_used = [c for (p, c, m, z) in chosen_pairs]

    ok_cover = (sorted(C_used) == sorted(cons))
    ok_disjoint_pro = (len(P_used) == len(set(P_used)))
    ok_disjoint_con = (len(C_used) == len(set(C_used)))

    return {
        "covers_all_cons": ok_cover,
        "disjoint_pros": ok_disjoint_pro,
        "disjoint_cons": ok_disjoint_con,
    }

if sol["status"] == "feasible":
    chosen = sol["chosen"]
    df_chosen = pd.DataFrame(
        [{"P": p, "C": c, "margin": m, "ΔP": delta[p], "ΔC": delta[c]} for (p,c,m,_) in chosen]
    ).sort_values("margin", ascending=False)
    display(df_chosen)

    checks = verify_explanation(chosen, pros, cons)
    print("Checks:", checks)

    explanation = [(p,c) for (p,c,_,_) in chosen]
    print("\nExplanation E =", explanation)


Unnamed: 0,P,C,margin,ΔP,ΔC
2,E,G,6.0,48,-42
0,A,C,4.0,32,-28
1,D,F,1.0,36,-35


Checks: {'covers_all_cons': True, 'disjoint_pros': True, 'disjoint_cons': True}

Explanation E = [('A', 'C'), ('D', 'F'), ('E', 'G')]


## Step 5 — Certificate of non-existence (Hall-violation subset)

The feasible edges $A$ define a bipartite graph between:
- Left = cons(x,y)
- Right = pros(x,y)

A (1–1) explanation exists **iff** there is a matching that covers **all cons**.

If not, we return a **Hall certificate**: a subset $S \subseteq cons$ with
$$
|N(S)| < |S|
$$
where $N(S)$ is the neighborhood of $S$ in the feasible-edge graph.


In [10]:
def hall_certificate_from_feasible_edges(cons, pros, feasible_edges):
    G2 = nx.Graph()
    G2.add_nodes_from(cons, bipartite=0)
    G2.add_nodes_from(pros, bipartite=1)
    G2.add_edges_from([(c, p) for (p,c,_) in feasible_edges])

    matching = nx.algorithms.bipartite.matching.maximum_matching(G2, top_nodes=cons)
    matched_cons = [c for c in cons if c in matching]

    if len(matched_cons) == len(cons):
        return {"status": "feasible", "matching": matching, "certificate": None}

    matched_edges = {(c, matching[c]) for c in cons if c in matching}

    from collections import deque
    unmatched = [c for c in cons if c not in matching]
    Z_left = set(unmatched)
    q = deque(unmatched)
    Z_right = set()

    while q:
        c = q.popleft()
        for p in G2.neighbors(c):
            if (c, p) in matched_edges:
                continue
            if p in Z_right:
                continue
            Z_right.add(p)
            if p in matching:
                c2 = matching[p]
                if c2 not in Z_left:
                    Z_left.add(c2)
                    q.append(c2)

    S = Z_left
    N = set()
    for c in S:
        N |= set(G2.neighbors(c))

    return {
        "status": "infeasible",
        "matching_size": len(matched_cons),
        "cons_size": len(cons),
        "hall_S": sorted(S),
        "hall_neighbors": sorted(N),
        "hall_sizes": {"|S|": len(S), "|N(S)|": len(N)},
        "matching": matching,
    }

if sol["status"] == "infeasible":
    cert = hall_certificate_from_feasible_edges(cons, pros, A)
    cert
else:
    print("Feasible on the prologue instance. Hall certificate not needed here.")


Feasible on the prologue instance. Hall certificate not needed here.


# Question 1 — (1–1) explanations via linear optimization

These steps were already done in the prologue; they are repeated here as reusable functions and applied to the extended table:

1. **Formulate a linear optimization program** that computes a **(1–1) explanation** of the comparison $x \succ y$ **when it exists**.
2. **Return a certificate of non-existence** when no (1–1) explanation exists.
3. **Implement the formulation using the Gurobi Optimizer solver**, and extract the explanation.

In addition, we will use the larger candidate table and give the required **simple argument** that there is **no (1–1) explanation** for $w \succ w'$.

## Step 0 — Data (courses, weights, and candidate scores)

We encode the table given in the statement (candidates $x, y, z, t, u, v, w, w'$) and the weights.

In [11]:
courses = ["A","B","C","D","E","F","G"]

weights = {"A": 8, "B": 7, "C": 7, "D": 6, "E": 6, "F": 5, "G": 6}

scores = {
    "x":  {"A":85,"B":81,"C":71,"D":69,"E":75,"F":81,"G":88},
    "y":  {"A":81,"B":81,"C":75,"D":63,"E":67,"F":88,"G":95},
    "z":  {"A":74,"B":89,"C":74,"D":81,"E":68,"F":84,"G":79},
    "t":  {"A":74,"B":71,"C":84,"D":91,"E":77,"F":76,"G":73},
    "u":  {"A":72,"B":75,"C":66,"D":85,"E":88,"F":66,"G":93},
    "v":  {"A":71,"B":73,"C":63,"D":92,"E":76,"F":79,"G":93},
    "w":  {"A":79,"B":69,"C":78,"D":76,"E":67,"F":84,"G":79},
    "w'": {"A":57,"B":76,"C":81,"D":76,"E":82,"F":86,"G":77},
}

df_scores = pd.DataFrame(
    [{"candidate": cand, **scores[cand]} for cand in scores.keys()]
).set_index("candidate")[courses]

df_scores.loc[list(scores.keys())]

candidate,A,B,C,D,E,F,G
x,85,81,71,69,75,81,88
y,81,81,75,63,67,88,95
z,74,89,74,81,68,84,79
t,74,71,84,91,77,76,73
u,72,75,66,85,88,66,93
v,71,73,63,92,76,79,93
w,79,69,78,76,67,84,79
w',57,76,81,76,82,86,77


## Step 1 — Compute contributions $\Delta_k$ and the sets pros/cons/neutral

For a comparison $x \succ y$, define:
$$
\Delta_k = w_k (x_k - y_k).
$$

- $pros = \{k : \Delta_k > 0\}$
- $cons = \{k : \Delta_k < 0\}$
- $neutral = \{k : \Delta_k = 0\}$

In [12]:
def compute_deltas(candidate_x: str, candidate_y: str, df_scores: pd.DataFrame, weights: dict):
    x = df_scores.loc[candidate_x].to_dict()
    y = df_scores.loc[candidate_y].to_dict()
    
    delta = {k: weights[k] * (x[k] - y[k]) for k in courses}

    pros = [k for k in courses if delta[k] > 0]
    cons = [k for k in courses if delta[k] < 0]
    neutral = [k for k in courses if delta[k] == 0]

    df = pd.DataFrame(
        {
            "course": courses,
            candidate_x: [x[k] for k in courses],
            candidate_y: [y[k] for k in courses],
            # Named "weight" rather than "w" so it cannot collide with a candidate called "w".
            "weight": [weights[k] for k in courses],
            "Δ = w(x−y)": [delta[k] for k in courses],
        }
    )
    return delta, pros, cons, neutral, df

## Step 2 — Build feasible (1–1) trade-offs

A (1–1) trade-off is a pair $(p,c)$ with:
- $p \in pros$,
- $c \in cons$,
- and **positive combined contribution**:
$$
\Delta_p + \Delta_c > 0.
$$

We compute the feasible edge list $A$.

In [13]:
def feasible_tradeoffs_1_1(delta: dict, pros: list, cons: list):
    A = []
    for p in pros:
        for c in cons:
            m = delta[p] + delta[c]
            if m > 0:
                A.append((p, c, m))
    return A

delta_xy, pros_xy, cons_xy, neutral_xy, df_xy = compute_deltas("x", "y", df_scores, weights)
df_xy
A_xy = feasible_tradeoffs_1_1(delta_xy, pros_xy, cons_xy)
pd.DataFrame(A_xy, columns=["P (pro)","C (con)","margin"]).sort_values("margin", ascending=False)

Unnamed: 0,P (pro),C (con),margin
3,E,C,20
4,E,F,13
1,D,C,8
5,E,G,6
0,A,C,4
2,D,F,1


## Step 3 — Linear optimization program for a (1–1) explanation

In [None]:
def solve_explanation_1_1_lp(delta: dict, pros: list, cons: list, feasible_edges: list):
    edges = [(p, c) for (p, c, _) in feasible_edges]
    margins = np.array([m for (_, _, m) in feasible_edges], dtype=float)
    n = len(edges)

    if n == 0:
        return {"status": "infeasible", "message": "No feasible trade-off edges.", "edges": edges}

    model = gp.Model("explanation_1_1_lp")
    model.Params.OutputFlag = 0

    z = model.addVars(n, lb=0.0, ub=1.0, vtype=GRB.CONTINUOUS, name="z")

    model.setObjective(gp.quicksum(margins[j] * z[j] for j in range(n)), GRB.MAXIMIZE)

    # Equality constraints
    for con in cons:
        model.addConstr(
            gp.quicksum(z[j] for j, (p, c) in enumerate(edges) if c == con) == 1.0,
            name=f"cover_con[{con}]",
        )

    # Inequality constraints:
    for pro in pros:
        model.addConstr(
            gp.quicksum(z[j] for j, (p, c) in enumerate(edges) if p == pro) <= 1.0,
            name=f"use_pro[{pro}]",
        )

    # Optimize
    model.optimize()

    out = {"edges": edges, "margins": margins, "model_status": int(model.Status)}
    if model.Status == GRB.OPTIMAL:
        z_val = np.array([z[j].X for j in range(n)], dtype=float)
        chosen = [(edges[j][0], edges[j][1], margins[j], z_val[j]) for j in range(n) if z_val[j] > 0.5]
        out.update({"status": "feasible", "chosen": chosen, "z": z_val, "obj": float(model.ObjVal)})
    elif model.Status == GRB.INFEASIBLE:
        out.update({"status": "infeasible", "message": "Model infeasible."})
    else:
        out.update({"status": "infeasible", "message": f"Optimization ended with status {model.Status}."})

    return out


sol_xy = solve_explanation_1_1_lp(delta_xy, pros_xy, cons_xy, A_xy)
sol_xy["status"], sol_xy.get("message","")

('feasible', '')

## Step 4 — Extract and verify the (1–1) explanation (when feasible)

In [15]:
def verify_explanation(chosen_pairs, pros, cons):
    P_used = [p for (p, c, m, z) in chosen_pairs]
    C_used = [c for (p, c, m, z) in chosen_pairs]
    return {
        "covers_all_cons_exactly": sorted(C_used) == sorted(cons),
        "pros_disjoint": len(P_used) == len(set(P_used)),
        "cons_disjoint": len(C_used) == len(set(C_used)),
    }

if sol_xy["status"] == "feasible":
    chosen = sol_xy["chosen"]
    df_chosen = pd.DataFrame(
        [{"P": p, "C": c, "ΔP": delta_xy[p], "ΔC": delta_xy[c], "margin": m} for (p,c,m,_) in chosen]
    ).sort_values("margin", ascending=False)
    display(df_chosen)
    print("Checks:", verify_explanation(chosen, pros_xy, cons_xy))

Unnamed: 0,P,C,ΔP,ΔC,margin
2,E,G,48,-42,6.0
0,A,C,32,-28,4.0
1,D,F,36,-35,1.0


Checks: {'covers_all_cons_exactly': True, 'pros_disjoint': True, 'cons_disjoint': True}


## Step 5 — Certificate of non-existence (when infeasible)

A (1–1) explanation exists **iff** the bipartite graph $(cons, pros, A)$ has a matching covering all nodes in **cons**.

When infeasible, we output a **Hall certificate**: a subset $S \subseteq cons$ such that
$$
|N(S)| < |S|,
$$
where $N(S)$ is the set of pros connected to $S$ via feasible trade-offs.
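For instances this small, a Hall violation can also be found by direct enumeration, which gives an independent check on the matching-based routine. The sketch below is an illustration under stated assumptions: `smallest_hall_violation` is a hypothetical helper, and the contributions for $w$ versus $w'$ are computed by hand from the Step 0 table as weight times score difference.

```python
from itertools import combinations

# Contributions for w vs w', computed from the Step 0 table as weight * (w - w').
delta = {"A": 176, "B": -49, "C": -21, "D": 0, "E": -90, "F": -10, "G": 12}
pros = [k for k, d in delta.items() if d > 0]
cons = [k for k, d in delta.items() if d < 0]

def smallest_hall_violation(delta, pros, cons):
    """Return (S, N(S)) for a smallest S with |N(S)| < |S|, or None if Hall's condition holds."""
    # neighbors[c] = pros that can compensate con c on their own.
    neighbors = {c: {p for p in pros if delta[p] + delta[c] > 0} for c in cons}
    # Scan subsets by increasing size so the first violation found is a smallest one.
    for size in range(1, len(cons) + 1):
        for subset in combinations(cons, size):
            n_s = set().union(*(neighbors[c] for c in subset))
            if len(n_s) < size:
                return list(subset), sorted(n_s)
    return None

print(smallest_hall_violation(delta, pros, cons))
```

The enumeration returns a smallest violating subset, which may be smaller than the set produced by the alternating-path search; any subset with $|N(S)| < |S|$ is a valid certificate. The cost is exponential in the number of cons, so this is only a sanity check.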

In [16]:
def hall_certificate(cons, pros, feasible_edges):
    G = nx.Graph()
    G.add_nodes_from(cons, bipartite=0)
    G.add_nodes_from(pros, bipartite=1)
    G.add_edges_from([(c, p) for (p, c, _) in feasible_edges])

    matching = nx.algorithms.bipartite.matching.maximum_matching(G, top_nodes=cons)
    matched_cons = [c for c in cons if c in matching]
    if len(matched_cons) == len(cons):
        return {"status": "feasible", "matching": matching, "certificate": None}

    matched_edges = {(c, matching[c]) for c in cons if c in matching}

    from collections import deque
    unmatched = [c for c in cons if c not in matching]

    Z_left = set(unmatched)
    Z_right = set()
    q = deque(unmatched)

    while q:
        c = q.popleft()
        for p in G.neighbors(c):
            if (c, p) in matched_edges:
                continue
            if p in Z_right:
                continue
            Z_right.add(p)
            if p in matching:
                c2 = matching[p]
                if c2 not in Z_left:
                    Z_left.add(c2)
                    q.append(c2)

    S = Z_left
    N = set()
    for c in S:
        N |= set(G.neighbors(c))

    return {
        "status": "infeasible",
        "hall_subset_S": sorted(S),
        "neighbors_N(S)": sorted(N),
        "sizes": {"|S|": len(S), "|N(S)|": len(N)},
        "matching_size": len(matched_cons),
    }

# Required part on the extended table: show no (1–1) explanation for $w \succ w'$

We now consider the comparison $w \succ w'$ using the table.

### Simple argument (the one-line proof)
In a (1–1) explanation, **each** course in $cons(w,w')$ must be paired with a **distinct** course in $pros(w,w')$ (disjointness).

Therefore, a necessary condition is:
$$
|pros(w,w')| \ge |cons(w,w')|.
$$

We will compute these sets and show that this condition fails, hence **no (1–1) explanation exists**.

In [17]:
delta_ww, pros_ww, cons_ww, neutral_ww, df_ww = compute_deltas("w", "w'", df_scores, weights)
df_ww

Unnamed: 0,course,w,w',weight,Δ = w(x−y)
0,A,79,57,8,176
1,B,69,76,7,-49
2,C,78,81,7,-21
3,D,76,76,6,0
4,E,67,82,6,-90
5,F,84,86,5,-10
6,G,79,77,6,12


In [18]:
print("pros(w,w'):", pros_ww)
print("cons(w,w'):", cons_ww)
print("neutral(w,w'):", neutral_ww)
print("Counts: |pros| =", len(pros_ww), ", |cons| =", len(cons_ww))

pros(w,w'): ['A', 'G']
cons(w,w'): ['B', 'C', 'E', 'F']
neutral(w,w'): ['D']
Counts: |pros| = 2 , |cons| = 4


### Conclusion (simple argument)
Here, $|pros(w,w')| < |cons(w,w')|$.  
So it is **impossible** to cover every con course with a distinct pro course using only (1–1) disjoint pairs.  
Hence **no explanation of type (1–1)** exists for $w \succ w'$.

### Solver confirmation + certificate of non-existence (Hall certificate)

We also run the Q1 LP model and output a Hall-type certificate.

In [19]:
A_ww = feasible_tradeoffs_1_1(delta_ww, pros_ww, cons_ww)
pd.DataFrame(A_ww, columns=["P (pro)","C (con)","margin"]).sort_values("margin", ascending=False)

Unnamed: 0,P (pro),C (con),margin
3,A,F,166
1,A,C,155
0,A,B,127
2,A,E,86
4,G,F,2


In [20]:
sol_ww = solve_explanation_1_1_lp(delta_ww, pros_ww, cons_ww, A_ww)
sol_ww["status"], sol_ww.get("message","")

('infeasible', 'Model infeasible.')

In [21]:
cert_ww = hall_certificate(cons_ww, pros_ww, A_ww)
cert_ww

{'status': 'infeasible',
 'hall_subset_S': ['B', 'C', 'E'],
 'neighbors_N(S)': ['A'],
 'sizes': {'|S|': 3, '|N(S)|': 1},
 'matching_size': 2}

# Question 2 — (1–m) explanations via linear optimization

We follow the same three steps as in Question 1, now for (1–m) explanations and using the same extended table:

1. **Formulate a linear optimization program** that computes a **(1–m) explanation** of the comparison $x \succ y$ **when it exists**.
2. **Return a certificate of non-existence** when no (1–m) explanation exists.
3. **Implement the formulation using the Gurobi Optimizer solver**, and extract the explanation.

We extend the definition of a trade-off. A **(1–m) trade-off** is a pair $(p, C_{set})$ where:
- $p \in pros$
- $C_{set} \subseteq cons$ is a subset of cons.
- The pro covers the sum of the cons: $\Delta_p + \sum_{c \in C_{set}} \Delta_c > 0$.

An **explanation** is a collection of disjoint trade-offs that cover **all** cons.

### LP Formulation
Since the number of subsets of $cons$ is exponential, we do not enumerate all candidate trade-offs. Instead, we treat this as a **Generalized Assignment Problem** (a bin-packing variant): each pro is a bin of capacity $\Delta_p$, each con is an item of size $|\Delta_c|$, and every con must be assigned to exactly one pro without exceeding its capacity. Because the assignment variables are binary, the model is an integer linear program rather than a pure LP.

**Variables:**
$x_{p,c} \in \{0, 1\}$: equal to 1 if con $c$ is assigned to pro $p$.

**Constraints:**
1. **Cover:** Each con $c$ must be assigned to exactly one pro.
   $$\sum_{p \in pros} x_{p,c} = 1, \quad \forall c \in cons$$
2. **Capacity (Strict Margin):** For each pro $p$, the sum of the negative deltas assigned to it must be strictly less than $\Delta_p$.
   $$\sum_{c \in cons} |\Delta_c| \cdot x_{p,c} \le \Delta_p - 1, \quad \forall p \in pros$$
   *(Note: Since inputs are integers, $>0$ is equivalent to $\ge 1$)*.

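**Objective:**
Any feasible assignment is a valid explanation, so the objective is optional. The implementation maximizes the total margin $\sum_{p} \Delta_p - \sum_{p,c} |\Delta_c| \, x_{p,c}$, which is constant once every con is assigned exactly once.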
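The formulation above can be checked on the notebook's instances by exhaustive enumeration. This is a sketch, not the solver model: `enumerate_1_m` is a hypothetical helper, and the contributions are computed by hand from the Step 0 table. It lists every assignment of cons to pros that respects the strict capacity constraint.

```python
from itertools import product

def enumerate_1_m(delta, pros, cons):
    """List every assignment con -> pro that satisfies the strict capacity constraint."""
    feasible = []
    for choice in product(pros, repeat=len(cons)):
        load = {p: 0 for p in pros}
        for c, p in zip(cons, choice):
            load[p] += abs(delta[c])
        # A pro with no con assigned has load 0 and passes trivially (delta[p] > 0).
        if all(load[p] < delta[p] for p in pros):
            feasible.append(dict(zip(cons, choice)))
    return feasible

# Contributions computed from the Step 0 table as weight * (first - second).
delta_ww = {"A": 176, "B": -49, "C": -21, "D": 0, "E": -90, "F": -10, "G": 12}  # w vs w'
delta_uv = {"A": 8, "B": 14, "C": 21, "D": -42, "E": 72, "F": -65, "G": 0}      # u vs v

for name, delta in [("w > w'", delta_ww), ("u > v", delta_uv)]:
    pros = [k for k, d in delta.items() if d > 0]
    cons = [k for k, d in delta.items() if d < 0]
    print(name, enumerate_1_m(delta, pros, cons))
```

The enumeration visits $|pros|^{|cons|}$ assignments, so it only serves as a cross-check on tiny instances. It also makes visible when several assignments are feasible at once, in which case a constant objective leaves the solver free to return any of them.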

In [None]:
def solve_explanation_1_m_lp(delta, pros, cons):
    m = gp.Model("explanation_1_m")
    m.Params.OutputFlag = 0
    
    # 1. Decision Variables: x[p,c] = 1 if pro p covers con c
    x = m.addVars(pros, cons, vtype=GRB.BINARY, name="x")
    
    # 2. Constraints
    
    # (a) Every con must be covered exactly once
    for c in cons:
        m.addConstr(gp.quicksum(x[p, c] for p in pros) == 1, name=f"cover_{c}")
        
    # (b) Capacity: Sum of |Delta_c| assigned to p must be < Delta_p
    #     Since values are integers, sum(|Delta_c|) <= Delta_p - 1
    for p in pros:
        capacity = delta[p]
        m.addConstr(
            gp.quicksum(x[p, c] * abs(delta[c]) for c in cons) <= capacity - 1, 
            name=f"capacity_{p}"
        )

    # 3. Objective: Maximize total margin (optional, feasibility is enough)
    obj_expr = gp.quicksum(
        (delta[p] / len(cons)) - x[p,c] * abs(delta[c]) 
        for p in pros for c in cons
    )
    m.setObjective(obj_expr, GRB.MAXIMIZE)
    
    m.optimize()
    
    # 4. Extract solution
    if m.Status == GRB.OPTIMAL:
        tradeoffs = []
        for p in pros:
            assigned_cons = [c for c in cons if x[p,c].X > 0.5]
            if assigned_cons:
                sum_cons = sum(delta[c] for c in assigned_cons)
                margin = delta[p] + sum_cons
                tradeoffs.append({
                    "pro": p,
                    "cons_set": assigned_cons,
                    "pro_weight": delta[p],
                    "cons_weight": sum_cons,
                    "net_margin": margin
                })
        
        return {"status": "feasible", "explanation": tradeoffs}
    
    elif m.Status == GRB.INFEASIBLE:
        # Check for global capacity violation (Simple certificate)
        total_pros = sum(delta[p] for p in pros)
        total_cons_abs = sum(abs(delta[c]) for c in cons)
        
        msg = "Model Infeasible."
        if total_pros <= total_cons_abs:
            msg += f" Global Capacity Violation: Sum(Pros)={total_pros} <= Sum(|Cons|)={total_cons_abs}."
            
        return {"status": "infeasible", "message": msg}
        
    return {"status": "error", "message": "Optimization failed"}

# Test on the w > w'
delta_ww, pros_ww, cons_ww, _, _ = compute_deltas("w", "w'", df_scores, weights)
sol_ww_1m = solve_explanation_1_m_lp(delta_ww, pros_ww, cons_ww)

if sol_ww_1m["status"] == "feasible":
    print("Feasible explanation found for w > w' (1-m):")
    df_res = pd.DataFrame(sol_ww_1m["explanation"])
    display(df_res)
else:
    print("No explanation for w > w':", sol_ww_1m["message"])

Feasible explanation found for w > w' (1-m):


Unnamed: 0,pro,cons_set,pro_weight,cons_weight,net_margin
0,A,"[B, C, E, F]",176,-170,6


### Why does the Solver's solution differ from the example?

The solver returned the explanation **`{(A, [B, C, E, F])}`**, concentrating the entire load on criterion **A**, whereas the problem statement suggests splitting the load with criterion **G** (`{(A, ...), (G, F)}`).

**Both solutions are mathematically optimal and equivalent.**

Since the total sum of *Pros* and the total sum of *Cons* are constant, the global "net total" remains the same ($18$), regardless of how we distribute the debts:

1.  **Solver's Solution (Concentrated):**
    * **A** covers everything ($176 - 170 = \mathbf{6}$ margin).
    * **G** is unused (preserves its $12$).
    * *Global Net:* $6 + 12 = \mathbf{18}$.

2.  **Statement's Solution (Distributed):**
    * **A** covers the majority ($176 - 160 = 16$ margin).
    * **G** covers F ($12 - 10 = 2$ margin).
    * *Global Net:* $16 + 2 = \mathbf{18}$.

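**Conclusion:** The objective cannot distinguish the two solutions, so the solver is free to return either one. It happened to return the concentrated assignment. Nothing in the model rewards sparsity, and a different solver version or parameter setting could return the distributed assignment instead.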

## Application to $u \succ v$

We now analyze the comparison between candidates $u$ and $v$. The problem statement asks us to show that **no (1-m) explanation exists**.

**Certificate of Non-Existence (Global Capacity):**
For a (1-m) explanation to exist, the sum of the advantages (Pros) must strictly exceed the sum of the absolute disadvantages (Cons).
$$\sum_{p \in pros} \Delta_p > \sum_{c \in cons} |\Delta_c|$$
This condition is necessary but not sufficient: it is obtained by relaxing the requirement that each con be carried by a single pro. If it fails, no explanation can exist. If it holds, the instance may still be infeasible because the capacity is fragmented across pros.

In [23]:
# 1. Compute Deltas for u vs v
delta_uv, pros_uv, cons_uv, _, df_uv = compute_deltas("u", "v", df_scores, weights)

print(f"Comparison u > v")
print(f"Pros: {pros_uv}")
print(f"Cons: {cons_uv}")
display(df_uv[["course", "u", "v", "Δ = w(x−y)"]])

# 2. Run the Solver
sol_uv = solve_explanation_1_m_lp(delta_uv, pros_uv, cons_uv)

# 3. Analyze results
if sol_uv["status"] == "infeasible":
    print("\nResult: NO (1-m) explanation exists.")
    print(f"Solver Message: {sol_uv['message']}")
    
    # Explicit Calculation of the Certificate
    sum_pros = sum(delta_uv[p] for p in pros_uv)
    sum_cons_abs = sum(abs(delta_uv[c]) for c in cons_uv)
    
    print("\n--- Certificate of Non-Existence ---")
    print(f"Total 'Pro' Capacity: {sum_pros}")
    print(f"Total 'Con' Load:     {sum_cons_abs}")
    print(f"Net Total:            {sum_pros - sum_cons_abs}")
    
    if sum_pros <= sum_cons_abs:
        print("Conclusion: Since Total Pros <= Total |Cons|, it is mathematically impossible to cover all cons with positive margins.")
else:
    print("Explanation found (Unexpected):")
    display(pd.DataFrame(sol_uv["explanation"]))

Comparison u > v
Pros: ['A', 'B', 'C', 'E']
Cons: ['D', 'F']


Unnamed: 0,course,u,v,Δ = w(x−y)
0,A,72,71,8
1,B,75,73,14
2,C,66,63,21
3,D,85,92,-42
4,E,88,76,72
5,F,66,79,-65
6,G,93,93,0



Result: NO (1-m) explanation exists.
Solver Message: Model Infeasible.

--- Certificate of Non-Existence ---
Total 'Pro' Capacity: 115
Total 'Con' Load:     107
Net Total:            8


## Analysis of Infeasibility: $u \succ v$

### 1. Why is the model Infeasible despite Positive Net Total?

The result `Model Infeasible` for $u \succ v$ is correct and expected given the strict **(1-m)** definition.

#### The Reason: Fluid vs. Discrete Capacity (Bin Packing)
The condition $\sum \Delta_{pros} > \sum |\Delta_{cons}|$ is a **necessary but not sufficient** condition. It assumes capacity is a "fluid" resource (like water) that can be poured arbitrarily to fill the "holes" (cons). However, the (1-m) explanation model treats capacity as **discrete blocks** (like rigid boxes) due to the constraint that **one Pro must cover the entire set of assigned Cons**.

In the specific case of $u \succ v$:
* **Total Capacity:** We have a Net Total surplus ($115 > 107$), but the 'Pro' capacity is **fragmented** across small values ($21, 14, 8$) with only one large value ($72$).
* **Total Load:** The 'Con' loads are concentrated in two **large, indivisible blocks** ($65$ and $42$).
* **The Conflict:**
    1.  To cover the largest Con $F$ ($65$), we *must* use the largest Pro $E$ ($72$).
    2.  After using $E$, the remaining capacity is $7$ (from $E$) plus the smaller Pros ($21, 14, 8$).
    3.  The remaining Con $D$ requires $42$. None of the remaining Pros ($21, 14, 8$) are large enough to cover $42$ individually, and the model forbids summing multiple separate Pros to cover a single Con.

Therefore, the model is infeasible due to a **granularity mismatch**: we cannot aggregate multiple small Pros to pay off a single large Con.
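The conflict described above can be turned into a small mechanical check. The sketch below is an illustration only: the names `eligible` and `forced` are introduced here, and the contributions are copied from the $u$ versus $v$ table. It lists, for each con, the pros that could carry it alone, then flags any pro that is the only option for more cons than it can hold.

```python
# Contributions for u vs v, copied from the table above.
delta = {"A": 8, "B": 14, "C": 21, "D": -42, "E": 72, "F": -65, "G": 0}
pros = [k for k, d in delta.items() if d > 0]
cons = [k for k, d in delta.items() if d < 0]

# In a (1-m) explanation each con is carried by a single pro,
# so that pro must at least exceed the con on its own.
eligible = {c: [p for p in pros if delta[p] > abs(delta[c])] for c in cons}
print(eligible)

# Group the cons that have exactly one eligible pro under that pro.
forced = {}
for c, ps in eligible.items():
    if len(ps) == 1:
        forced.setdefault(ps[0], []).append(c)

# A pro is overloaded when the cons it is forced to carry do not leave a positive margin.
for p, cs in forced.items():
    load = sum(abs(delta[c]) for c in cs)
    verdict = "overloaded" if load >= delta[p] else "ok"
    print(f"{p}: capacity {delta[p]}, forced load {load} from {cs} -> {verdict}")
```

An overloaded pro is a sufficient certificate of infeasibility, but the absence of one does not prove that an explanation exists.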

---

### 2. A Stricter Existence Condition: $k$-Prefix Sum

To detect this type of failure without running the solver, we can use a stricter necessary condition based on **majorization** logic.

Let $P$ be the list of Pro values sorted in descending order ($p_1 \ge p_2 \ge \dots$) and $C$ be the list of absolute Con values sorted in descending order ($c_1 \ge c_2 \ge \dots$).

For a (1-m) explanation to exist, it is **necessary** that for every $k$ (from 1 to $|Cons|$), the sum of the $k$ largest Pros is strictly greater than the sum of the $k$ largest Cons. Because one Pro may carry several Cons, there can be fewer Pros than Cons; when $k > |Pros|$ the left-hand side is simply the sum of all Pros.

$$
\sum_{i=1}^{\min(k, |Pros|)} p_i > \sum_{i=1}^{k} c_i \quad \forall k \in \{1, \dots, |Cons|\}
$$
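
A short justification of why this condition is necessary, given as a sketch. The assignment map $\pi$ and the sets $T_k$, $Q_k$ are notation introduced here for the argument and do not come from the statement; the left-hand sum is understood to stop at $|Pros|$ when $k$ is larger. Suppose a (1-m) explanation exists and let $\pi(c)$ be the Pro that carries Con $c$. Let $T_k$ be the $k$ largest Cons and $Q_k = \pi(T_k)$ the Pros that carry them, so $1 \le |Q_k| \le k$. Each $q \in Q_k$ strictly exceeds the total of the Cons it carries, hence also the part of that total coming from $T_k$:

$$
\sum_{i=1}^{k} c_i
= \sum_{q \in Q_k} \; \sum_{c \in T_k,\ \pi(c) = q} |\Delta_c|
< \sum_{q \in Q_k} \Delta_q
\le \sum_{i=1}^{\min(k, |Pros|)} p_i .
$$

The last step holds because $Q_k$ contains at most $\min(k, |Pros|)$ Pros, and no set of that size can outweigh the largest ones.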

#### Proof of Infeasibility for $u \succ v$ using this condition:

* **Sorted Pros ($P$):** $[72, 21, 14, 8]$
* **Sorted Cons ($C$):** $[65, 42]$

**Check for $k=1$:**
$$72 > 65 \quad (\text{True})$$

**Check for $k=2$:**
$$
(72 + 21) > (65 + 42) \\
93 > 107 \quad (\textbf{False})
$$

Since the condition fails for $k=2$, we can mathematically prove no (1-m) explanation exists without needing the solver.

In [24]:
def check_k_prefix_condition(delta, pros, cons):
    """
    Checks the necessary k-Prefix Sum condition for a (1-m) explanation.
    Returns (True, "Condition Passed") if passed, or (False, message) if failed.
    """
    # 1. Get values and sort descending
    P_vals = sorted([delta[p] for p in pros], reverse=True)
    C_vals = sorted([abs(delta[c]) for c in cons], reverse=True)

    # 2. Check cumulative sums for k = 1 to |Cons|
    current_p_sum = 0
    current_c_sum = 0

    for k in range(len(C_vals)):
        # One pro may carry several cons, so having fewer pros than cons is allowed
        # (w > w' is feasible with 2 pros and 4 cons). Once the pros run out,
        # the prefix sum simply stays at the total of all pros.
        if k < len(P_vals):
            current_p_sum += P_vals[k]
        current_c_sum += C_vals[k]

        if current_p_sum <= current_c_sum:
            return False, f"Failed at k={k+1}: Sum(Top-{k+1} Pros)={current_p_sum} <= Sum(Top-{k+1} Cons)={current_c_sum}"

    return True, "Condition Passed"

# Test on u > v
passed, msg = check_k_prefix_condition(delta_uv, pros_uv, cons_uv)
print(f"Strong Condition Check for u > v: {msg}")

Strong Condition Check for u > v: Failed at k=2: Sum(Top-2 Pros)=93 <= Sum(Top-2 Cons)=107
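
The prefix condition is only necessary, not sufficient. As a hedged illustration with made-up values (not taken from the course data), the sketch below shows an instance that passes every prefix check yet admits no (1-m) explanation, because each Pro can absorb only one Con and there are more Cons than Pros. Both helpers, `prefix_condition_holds` and `has_1_m_explanation`, are illustrative stand-ins rather than the notebook's functions.

```python
from itertools import accumulate, product

def prefix_condition_holds(pro_values, con_values):
    """k-prefix test: the top-k Pros must strictly exceed the top-k Cons for all k."""
    pros = sorted(pro_values, reverse=True)
    cons = sorted(con_values, reverse=True)
    # Pad with zeros so that k may exceed the number of Pros.
    pros += [0] * max(0, len(cons) - len(pros))
    return all(p > c for p, c in zip(accumulate(pros), accumulate(cons)))

def has_1_m_explanation(pro_values, con_values):
    """Exhaustive check: assign each Con to one Pro, every Pro must exceed its load."""
    for assignment in product(range(len(pro_values)), repeat=len(con_values)):
        load = [0] * len(pro_values)
        for con_value, pro_idx in zip(con_values, assignment):
            load[pro_idx] += con_value
        if all(l == 0 or p > l for p, l in zip(pro_values, load)):
            return True
    return False

toy_pros = [10, 10]     # hypothetical values
toy_cons = [6, 6, 6]

print(prefix_condition_holds(toy_pros, toy_cons))   # True  (10>6, 20>12, 20>18)
print(has_1_m_explanation(toy_pros, toy_cons))      # False (no Pro can take two Cons)
```

A passed prefix check therefore means "not ruled out yet", and the solver is still needed to confirm existence.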


# Question 3: (m-1) Explanations

We now look for explanations where **multiple Pros** are combined to compensate for a **single Con**.
An explanation $E$ is a collection of disjoint sets of Pros $\{\mathcal{P}_1, \dots, \mathcal{P}_k\}$, one per Con: each set $\mathcal{P}_i$ covers exactly one Con $C_i$, and together the sets cover all Cons (so $k = |Cons|$).

### Optimization Model
This is a **Many-to-One** assignment problem.
* **Decision Variables:** $x_{c,p} = 1$ if Pro $p$ is used to cover Con $c$.
* **Constraints:**
    1.  Each Pro $p$ is used at most once.
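    2.  Each Con $c$ must be compensated by the Pros assigned to it. The strict definition asks for a positive net margin ($\sum \Delta_{pros} > |\Delta_{c}|$); the implementation below uses the weak form ($\ge$), which also accepts exact ties.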
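
Written out in symbols, with $\Delta_p > 0$ the contribution of Pro $p$ and $|\Delta_c|$ the size of the deficit on Con $c$ (this notation is assumed to match the $\Delta$ values used elsewhere in the notebook), the two constraints in their strict form read:

$$
\sum_{c \in Cons} x_{c,p} \le 1 \quad \forall p \in Pros,
\qquad
\sum_{p \in Pros} \Delta_p \, x_{c,p} > |\Delta_c| \quad \forall c \in Cons
$$

A MILP solver only accepts non-strict constraints, so the second family has to be encoded either as $\ge |\Delta_c| + \varepsilon$ for a chosen $\varepsilon > 0$, or relaxed to $\ge |\Delta_c|$.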

In [None]:
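def solve_explanation_m_1_lp(delta, pros, cons):
    """
    Solves for an (m-1) explanation: multiple Pros cover a single Con.
    The model uses binary variables, so it is a MILP despite the `_lp` suffix.
    """
    from gurobipy import GRB

    m = gp.Model("explanation_m_1")
    m.Params.OutputFlag = 0

    # x[c, p] = 1 if Pro p is assigned to Con c
    x = m.addVars(cons, pros, vtype=GRB.BINARY, name="x")

    # Each Pro is used at most once
    for p in pros:
        m.addConstr(gp.quicksum(x[c, p] for c in cons) <= 1, name=f"use_pro_{p}_once")

    # Each Con is covered by its assigned Pros.
    # Note: this is the weak form (>=), so exact ties (margin 0) are accepted.
    for c in cons:
        con_load = abs(delta[c])
        m.addConstr(
            gp.quicksum(x[c, p] * delta[p] for p in pros) >= con_load,
            name=f"cover_con_{c}"
        )

    # Maximize the total net margin. The |delta[c]| / len(pros) terms add up to
    # the constant total Con load, so they do not affect the optimal assignment.
    obj_expr = gp.quicksum(
        (x[c,p] * delta[p]) - (abs(delta[c]) / len(pros)) 
        for c in cons for p in pros
    )
    m.setObjective(obj_expr, GRB.MAXIMIZE)

    m.optimize()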

    if m.Status == GRB.OPTIMAL:
        tradeoffs = []
        for c in cons:
            assigned_pros = [p for p in pros if x[c,p].X > 0.5]
            
            sum_pros = sum(delta[p] for p in assigned_pros)
            margin = sum_pros - abs(delta[c])
            
            tradeoffs.append({
                "con": c,
                "pros_set": assigned_pros,
                "con_weight": delta[c], # Negative value
                "pros_weight": sum_pros,
                "net_margin": margin
            })
        
        return {"status": "feasible", "explanation": tradeoffs}

    elif m.Status == GRB.INFEASIBLE:
        sum_pros = sum(delta[p] for p in pros)
        sum_cons = sum(abs(delta[c]) for c in cons)
        
        msg = "Model Infeasible."
        if sum_pros <= sum_cons:
            msg += f" Global Capacity Violation: Total Pros ({sum_pros}) <= Total |Cons| ({sum_cons})."
        else:
            msg += " Feasible by global sum, but infeasible due to fragmentation (Bin Packing constraints)."
            
        return {"status": "infeasible", "message": msg}

    return {"status": "error", "message": "Optimization failed"}

## Application: Comparing $y \succ z$

### Theoretical Logic for $y \succ z$

For $y \succ z$, a direct (1-1) explanation is impossible due to the uneven distribution of scores. The **(m-1)** structure solves this by allowing **coalitions** of positive criteria.

1.  **Overcoming the Critical Weakness:** Candidate $y$ has a massive deficit in **Diagnostic (D, -108)**. No single Pro is strong enough to cover it. The model creates a **strong coalition** of **Génétique (G, +96)** and **Forensic Pathology (F, +20)** to overcome this ($116 > 108$).

2.  **Exact Compensation:** The deficit in **Biologie (B, -56)** is exactly matched by the advantage in **Anatomie (A, +56)**. This "tie" ($56 = 56$) is only accepted if the strict inequality is relaxed to allow **neutralization**.

3.  **Simple Coverage:** The small remaining deficit in **Epidemiologie (E, -6)** is easily covered by **Chirurgie (C, +7)**.


We apply the (m-1) model to the comparison between candidate $y$ and $z$.

In [None]:
# 1. Compute Deltas for y vs z
delta_yz, pros_yz, cons_yz, _, df_yz = compute_deltas("y", "z", df_scores, weights)

print(f"Comparison y > z")
print(f"Pros: {pros_yz}")
print(f"Cons: {cons_yz}")
display(df_yz[["course", "y", "z", "Δ = w(x−y)"]])

# 2. Run the (m-1) Solver
sol_yz = solve_explanation_m_1_lp(delta_yz, pros_yz, cons_yz)

# 3. Output
if sol_yz["status"] == "feasible":
    print("\nFeasible (m-1) explanation found:")
    df_res = pd.DataFrame(sol_yz["explanation"])
    cols = ["con", "pros_set", "con_weight", "pros_weight", "net_margin"]
    display(df_res[cols])
else:
    print("\nResult: NO (m-1) explanation exists.")
    print(f"Solver Message: {sol_yz['message']}")
    
    # Detailed check for the user
    sum_pros = sum(delta_yz[p] for p in pros_yz)
    sum_cons = sum(abs(delta_yz[c]) for c in cons_yz)
    print("\n--- Certificate Stats ---")
    print(f"Total Pro Capacity: {sum_pros}")
    print(f"Total Con Load:     {sum_cons}")

Comparison y > z
Pros: ['A', 'C', 'F', 'G']
Cons: ['B', 'D', 'E']


Unnamed: 0,course,y,z,Δ = w(x−y)
0,A,81,74,56
1,B,81,89,-56
2,C,75,74,7
3,D,63,81,-108
4,E,67,68,-6
5,F,88,84,20
6,G,95,79,96



Feasible (m-1) explanation found:


Unnamed: 0,con,pros_set,con_weight,pros_weight,net_margin
0,B,[A],-56,56,0
1,D,"[F, G]",-108,116,8
2,E,[C],-6,7,1


> **Note:**
> For the comparison $y \succ z$, the standard strict model returns **Infeasible**. This happens because the pro-course **A** has a contribution of **+56**, which exactly matches the deficit of the con-course **B** (**-56**).
>
> Since the strict definition requires a strictly positive margin ($\text{margin} > 0$), the pair $(A, B)$ is rejected ($56 \ngtr 56$), and no other grouping of the Pros covers B, D and E at the same time. To generate the explanation shown above, we must **relax the condition** to allow for **weak preference** ($\text{margin} \ge 0$), accepting trade-offs where the advantages exactly offset the disadvantages without a surplus.
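
The difference between the strict and the relaxed condition can be reproduced without a solver. The following is a minimal brute-force sketch, assuming the $\Delta$ values from the table above; `find_m_1` and the `_demo` variables are hypothetical names and are not part of the notebook's Gurobi model.

```python
from itertools import product

def find_m_1(delta, pros, cons, strict=True):
    """Brute force: give each Pro to one Con (or leave it unused) and return
    the first assignment in which every Con is compensated, else None."""
    options = [None] + list(cons)
    for choice in product(options, repeat=len(pros)):
        covered = {c: 0 for c in cons}
        for p, c in zip(pros, choice):
            if c is not None:
                covered[c] += delta[p]
        margins = [covered[c] - abs(delta[c]) for c in cons]
        # strict: margin > 0 for every Con; relaxed: margin >= 0
        if all((m > 0) if strict else (m >= 0) for m in margins):
            return {c: [p for p, ch in zip(pros, choice) if ch == c] for c in cons}
    return None

# Contributions for y vs z, copied from the table above
delta_demo = {"A": 56, "B": -56, "C": 7, "D": -108, "E": -6, "F": 20, "G": 96}
pros_demo = ["A", "C", "F", "G"]
cons_demo = ["B", "D", "E"]

print(find_m_1(delta_demo, pros_demo, cons_demo, strict=True))    # None
print(find_m_1(delta_demo, pros_demo, cons_demo, strict=False))
# {'B': ['A'], 'D': ['F', 'G'], 'E': ['C']}
```

With only four Pros and three Cons the search space has $4^4 = 256$ assignments, so enumeration is cheap here; it would not scale to large instances.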

### Analysis of Comparison $z \succ t$

**1. Failure of pure explanation types:**
* **(1-m) Failure:** The dominant Pro is **B (+126)**. The Cons are **C (-70), D (-60), E (-54)**. Pro B can cover $\{C, E\}$ ($126 > 124$) or $\{D, E\}$ ($126 > 114$), but not $\{C, D\}$ ($126 < 130$), and therefore not all three. Whichever pair B takes, one Con of at least $60$ remains, and the remaining Pros ($40$, $36$) are individually too weak to cover it.
* **(m-1) Failure:** Each Pro may serve only one Con, and there are three Cons for three Pros, so every Con would have to be covered by exactly one Pro. F ($40$) and G ($36$) cannot cover even the smallest Con ($54$) alone. Pooling them as $\{F, G\}$ ($40+36 = 76$) does cover **C (-70)**, but this leaves Pro B alone against Cons $\{D, E\}$: B can cover D or E, but it cannot be split to cover both.

**2. Success of Combined Explanation:**
A valid explanation can be built by mixing both strategies, for example:

* **Trade-off 1 (Type 1-2):** Pro **B (+126)** covers Cons **{C, E}** ($-70 -54 = -124$).
    * Margin: $126 - 124 = +2$.
* **Trade-off 2 (Type 2-1):** Pros **{F, G}** ($40 + 36 = +76$) cover Con **D (-60)**.
    * Margin: $76 - 60 = +16$.

This demonstrates that while neither pure structure works, the comparison can still be explained through a **hybrid explanation**. The grouping is not unique: swapping the roles of C and D (B covers $\{D, E\}$ with margin $+12$, and $\{F, G\}$ cover C with margin $+6$) is equally valid.
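
The arithmetic of a hand-built hybrid explanation is easy to get wrong, so a small validator can help. The sketch below is an illustration only: `explanation_margins` is a hypothetical helper, and the $\Delta$ values are those of the $z \succ t$ comparison. It checks that the groups are disjoint, that every Con is covered, and returns each group's net margin.

```python
def explanation_margins(delta, groups):
    """Return the net margin of each (pros, cons) group, after checking that
    no course is reused and that every Con is covered."""
    used = [course for pros, cons in groups for course in pros + cons]
    if len(used) != len(set(used)):
        raise ValueError("trade-offs are not disjoint")
    all_cons = {course for course, d in delta.items() if d < 0}
    covered = {course for _, cons in groups for course in cons}
    if covered != all_cons:
        raise ValueError("some Con is not covered")
    # Con contributions are negative, so a plain sum gives the net margin.
    return [sum(delta[course] for course in pros + cons) for pros, cons in groups]

# Contributions for z vs t
delta_zt_demo = {"A": 0, "B": 126, "C": -70, "D": -60, "E": -54, "F": 40, "G": 36}

hybrid = [(["B"], ["C", "E"]), (["F", "G"], ["D"])]
invalid = [(["B"], ["C", "D"]), (["F", "G"], ["E"])]

print(explanation_margins(delta_zt_demo, hybrid))    # [2, 16]
print(explanation_margins(delta_zt_demo, invalid))   # [-4, 22]
```

An explanation is acceptable only if every returned margin is strictly positive, which is why the second grouping fails even though its total is the same.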

# Question 4: Hybrid Explanation with solvers

### Example with z and t

In [27]:
# 1. Compute Deltas for z vs t
delta_zt, pros_zt, cons_zt, _, df_zt = compute_deltas("z", "t", df_scores, weights)

print(f"Comparison z > t")
print(f"Pros: {pros_zt}")
print(f"Cons: {cons_zt}")
display(df_zt[["course", "z", "t", "Δ = w(x−y)"]])

Comparison z > t
Pros: ['B', 'F', 'G']
Cons: ['C', 'D', 'E']


Unnamed: 0,course,z,t,Δ = w(x−y)
0,A,74,74,0
1,B,89,71,126
2,C,74,84,-70
3,D,81,91,-60
4,E,68,77,-54
5,F,84,76,40
6,G,79,73,36


### Disproving pure (1-m) and (m-1) types for $z \succ t$

We run the (1-m) and (m-1) solvers defined in the previous questions. We expect both to fail.

In [28]:
print("--- Check 1: Try (1-m) Explanation ---")
sol_1m = solve_explanation_1_m_lp(delta_zt, pros_zt, cons_zt)
print(f"Status: {sol_1m['status']}")
if sol_1m['status'] == 'infeasible':
    print(f"Reason: {sol_1m['message']}")

print("\n--- Check 2: Try (m-1) Explanation ---")
# solve_explanation_m_1_lp takes no strictness flag; its cover constraint is the weak form (>=)
sol_m1 = solve_explanation_m_1_lp(delta_zt, pros_zt, cons_zt)
print(f"Status: {sol_m1['status']}")
if sol_m1['status'] == 'infeasible':
    print(f"Reason: {sol_m1['message']}")

--- Check 1: Try (1-m) Explanation ---
Status: infeasible
Reason: Model Infeasible.

--- Check 2: Try (m-1) Explanation ---
Status: infeasible
Reason: Model Infeasible. Feasible by global sum, but infeasible due to fragmentation (Bin Packing constraints).


### General Formulation: (m-n) Trade-offs

When neither pure **(1-m)** nor pure **(m-1)** explanations exist, we look for a **General Explanation** consisting of a set of disjoint trade-offs, where each trade-off can involve **multiple Pros** covering **multiple Cons**.

We treat this as a partitioning problem where we group Pros and Cons into $K$ possible "baskets" (trade-offs).

#### Optimization Model

Let $K$ be the set of possible trade-off groups (with $|K| = |Cons|$).

**Decision Variables:**
* $x_{p,k} \in \{0,1\}$: Pro $p$ is assigned to group $k$.
* $y_{c,k} \in \{0,1\}$: Con $c$ is assigned to group $k$.
* $u_k \in \{0,1\}$: Group $k$ is active (contains at least one Con).

**Constraints:**
1.  **Partitioning (Cover All):** Every Con $c$ must belong to exactly one group.
    $$\sum_{k} y_{c,k} = 1, \quad \forall c \in Cons$$
2.  **Disjoint Resources:** Every Pro $p$ can belong to at most one group.
    $$\sum_{k} x_{p,k} \le 1, \quad \forall p \in Pros$$
3.  **Activation:** If a Con is assigned to group $k$, the group is marked active.
    $$y_{c,k} \le u_k, \quad \forall c, k$$
4.  **Capacity (Strict Margin):** For any active group $k$, the sum of its Pros must strictly exceed the sum of its Cons.
    $$\sum_{p} x_{p,k} \cdot \Delta_p \ge \sum_{c} y_{c,k} \cdot |\Delta_c| + u_k, \quad \forall k$$
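    *(Note: The term $+u_k$ forces a margin of at least 1 in every active group. With integer contributions this is equivalent to a strictly positive margin; with fractional contributions it is stricter, and would reject a valid group whose margin lies strictly between 0 and 1.)*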
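
When contributions are fractional (a score such as 73.5 is enough to produce one), one possible workaround is to rescale all contributions to integers before building the model, so that a margin of one scaled unit is exactly the smallest positive margin the data can produce. This is an assumption-laden sketch and not part of the original formulation; `scale_to_integers` is a hypothetical helper, and it reads each value through its decimal string so that inputs like `0.1` are treated as exact decimals.

```python
from fractions import Fraction
from math import lcm

def scale_to_integers(delta):
    """Multiply all contributions by their least common denominator so that
    every value becomes an integer. Returns (scaled_delta, factor)."""
    fracs = {k: Fraction(str(v)) for k, v in delta.items()}
    factor = lcm(*(f.denominator for f in fracs.values()))
    return {k: int(f * factor) for k, f in fracs.items()}, factor

# Hypothetical input with one half-point contribution
delta_frac_demo = {"A": 144, "B": -70, "C": -70, "D": -66, "E": 36, "F": 27.5, "G": 0}

scaled_demo, factor_demo = scale_to_integers(delta_frac_demo)
print(factor_demo)                  # 2
print(scaled_demo["F"])             # 55
print(sum(scaled_demo.values()))    # 3, i.e. a global margin of 1.5 in original units
```

After scaling, the constraint with $+u_k$ accepts exactly the groups whose original margin is strictly positive.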

**Objective:**
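Several objectives are possible. Minimizing the number of active groups ($\sum u_k$) gives the most concise explanation, but it collapses to a single large trade-off whenever any explanation exists, since merging valid groups keeps the margin positive. The implementation below instead **maximizes** $\sum u_k$, which yields the most granular explanation (many small trade-offs). A constant objective is enough if only feasibility matters.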

In [29]:
def solve_explanation_general_lp(delta, pros, cons):
    """
    Solves for a General (Many-to-Many) explanation.
    Allows mixing (1-m), (m-1), and (m-m) trade-offs.
    """
    import gurobipy as gp
    from gurobipy import GRB

    # Max possible trade-offs is the number of cons (case 1-1 for everyone)
    K = range(len(cons)) 
    
    m = gp.Model("explanation_general")
    m.Params.OutputFlag = 0

    # Variables
    # x[p, k]: Pro p assigned to group k
    x = m.addVars(pros, K, vtype=GRB.BINARY, name="x")
    # y[c, k]: Con c assigned to group k
    y = m.addVars(cons, K, vtype=GRB.BINARY, name="y")
    # u[k]: Is group k active?
    u = m.addVars(K, vtype=GRB.BINARY, name="u")

    # 1. Every Con must be assigned to exactly one group
    for c in cons:
        m.addConstr(gp.quicksum(y[c, k] for k in K) == 1, name=f"cover_{c}")

    # 2. Every Pro can be assigned to at most one group
    for p in pros:
        m.addConstr(gp.quicksum(x[p, k] for k in K) <= 1, name=f"use_pro_{p}")

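    # 3. Link y and u: a group is active if and only if it holds at least one Con
    for k in K:
        for c in cons:
            m.addConstr(y[c, k] <= u[k], name=f"activate_{k}_{c}")
        # Without this bound, maximizing sum(u) could open groups that hold only Pros
        m.addConstr(u[k] <= gp.quicksum(y[c, k] for c in cons), name=f"nonempty_{k}")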

    # 4. Capacity Constraint per group
    # If active (u=1), Sum(Pros) >= Sum(|Cons|) + 1
    # If inactive (u=0), Sum(Pros) >= Sum(|Cons|) -> 0 >= 0 (Trivial)
    for k in K:
        sum_pros = gp.quicksum(x[p, k] * delta[p] for p in pros)
        sum_cons = gp.quicksum(y[c, k] * abs(delta[c]) for c in cons)
        m.addConstr(sum_pros >= sum_cons + u[k], name=f"capacity_{k}")

    # Objective: maximize the number of active groups (finest granularity).
    # Alternatives: minimize sum(u) for the most concise explanation, or use a
    # constant objective to test feasibility only.
    m.setObjective(gp.quicksum(u[k] for k in K), GRB.MAXIMIZE)
    
    m.optimize()

    if m.Status == GRB.OPTIMAL:
        explanation = []
        for k in K:
            if u[k].X > 0.5:
                p_set = [p for p in pros if x[p,k].X > 0.5]
                c_set = [c for c in cons if y[c,k].X > 0.5]
                
                w_p = sum(delta[p] for p in p_set)
                w_c = sum(delta[c] for c in c_set) # This is negative sum, careful with display
                
                explanation.append({
                    "type": f"({len(p_set)}-{len(c_set)})",
                    "pros": p_set,
                    "cons": c_set,
                    "pros_weight": w_p,
                    "cons_weight": w_c, # keeping sign
                    "margin": w_p + w_c # w_c is negative
                })
        return {"status": "feasible", "explanation": explanation}

    return {"status": "infeasible", "message": "No combined explanation exists."}

# Run the General Solver
sol_general = solve_explanation_general_lp(delta_zt, pros_zt, cons_zt)

if sol_general["status"] == "feasible":
    print("\nCombined Explanation found for z > t:")
    df_gen = pd.DataFrame(sol_general["explanation"])
    display(df_gen)
else:
    print("General model Infeasible.")


Combined Explanation found for z > t:


Unnamed: 0,type,pros,cons,pros_weight,cons_weight,margin
0,(1-2),[B],"[D, E]",126,-114,12
1,(2-1),"[F, G]",[C],76,-70,6


## Example with $a_1$ and $a_2$

### Analysis of $a_1 \succ a_2$: Why Simple Explanations Fail

The comparison between **a1** and **a2** presents a specific challenge: the **global net margin is extremely tight** ($+1.5$).
* **Total Pros:** $144 + 36 + 27.5 = 207.5$
* **Total Cons:** $70 + 70 + 66 = 206$

Because the surplus is so small, there is almost no "slack" to handle inefficiency in grouping. This leads to the failure of both restricted models:

**1. Why (1-m) Fails (One Pro, Many Cons):**
The only dominant Pro is **A (+144)**.
* **A** cannot cover all three Cons ($206 > 144$), so it covers at most two. The most efficient choice is **B (-70)** and **C (-70)**, using $140$ of its capacity and leaving a surplus of $4$.
* Whichever pair A takes, one Con of at least $66$ remains exposed (here **Con D (-66)**).
* The remaining Pros, **E (+36)** and **F (+27.5)**, are individually too weak to cover it.
* **Result:** Infeasible due to the "Bin Packing" constraint (items E and F are too small for the remaining bin D).

**2. Why (m-1) Fails (Many Pros, One Con):**
* We must cover **Con B (-70)**. The only viable option is to use **Pro A**.
* Once **A** is used, the remaining Pros are **E (+36)** and **F (+27.5)**. Their combined sum is **63.5**.
* We still have to cover **Con C (-70)** and **Con D (-66)**.
* **Result:** Infeasible. The remaining capacity ($63.5$) is insufficient to cover even the smallest remaining Con ($66$).

**Conclusion:**
This case requires a **General (m-n) Explanation** (specifically a single 3-3 trade-off). We must pool **all** Pros together against **all** Cons to utilize every fraction of the $+1.5$ margin. Any fragmentation results in lost capacity that makes the explanation impossible.
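
The claim that only a single 3-3 trade-off works can be checked by enumerating every grouping. The sketch below is a brute-force stand-in for the general model, not the Gurobi formulation: `best_general_explanation` is a hypothetical helper, and it requires a strictly positive margin per group (rather than the model's margin of at least 1). It returns the feasible grouping with the largest number of groups.

```python
from itertools import product

def best_general_explanation(delta, pros, cons):
    """Brute force over all groupings; returns the feasible grouping with the
    most groups (strictly positive margin in every group), or None."""
    n = len(cons)
    best = None
    for con_groups in product(range(n), repeat=n):
        active = sorted(set(con_groups))
        # Each Pro joins one active group or stays unused (None).
        for pro_groups in product([None] + active, repeat=len(pros)):
            groups = []
            for k in active:
                p_set = [p for p, g in zip(pros, pro_groups) if g == k]
                c_set = [c for c, g in zip(cons, con_groups) if g == k]
                margin = sum(delta[i] for i in p_set + c_set)
                if margin <= 0:
                    break
                groups.append((p_set, c_set, margin))
            else:
                if best is None or len(groups) > len(best):
                    best = groups
    return best

# Contributions for a1 vs a2, as computed in the analysis above
delta_a_demo = {"A": 144, "B": -70, "C": -70, "D": -66, "E": 36, "F": 27.5, "G": 0}

print(best_general_explanation(delta_a_demo, ["A", "E", "F"], ["B", "C", "D"]))
# [(['A', 'E', 'F'], ['B', 'C', 'D'], 1.5)]
```

Because the search keeps the grouping with the most groups, a single-group result means that no split into two or three trade-offs is feasible for these values.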

In [None]:
scores_a1 = {"A": 89, "B": 74, "C": 81, "D": 68, "E": 84, "F": 79,   "G": 77}
scores_a2 = {"A": 71, "B": 84, "C": 91, "D": 79, "E": 78, "F": 73.5, "G": 77}

delta_a1a2 = {c: weights[c] * (scores_a1[c] - scores_a2[c]) for c in courses}
pros_a1a2 = [c for c in courses if delta_a1a2[c] > 0]
cons_a1a2 = [c for c in courses if delta_a1a2[c] < 0]

print(f"Comparison a1 > a2")
print(f"Pros: {pros_a1a2}")
print(f"Cons: {cons_a1a2}")
display(pd.DataFrame([
    {"course": c, "a1": scores_a1[c], "a2": scores_a2[c], "Δ": delta_a1a2[c]} 
    for c in courses
]).set_index("course").T)

print("\n" + "="*40)
print("SOLVER 1: Type (1-m) [One Pro covers Many Cons]")
print("="*40)
sol_1m = solve_explanation_1_m_lp(delta_a1a2, pros_a1a2, cons_a1a2)
if sol_1m["status"] == "feasible":
    display(pd.DataFrame(sol_1m["explanation"]))
else:
    print(f"Result: {sol_1m['message']}")

print("\n" + "="*40)
print("SOLVER 2: Type (m-1) [Many Pros cover One Con]")
print("="*40)

sol_m1 = solve_explanation_m_1_lp(delta_a1a2, pros_a1a2, cons_a1a2)
if sol_m1["status"] == "feasible":
    display(pd.DataFrame(sol_m1["explanation"]))
else:
    print(f"Result (Strict): {sol_m1['message']}")

print("\n" + "="*40)
print("SOLVER 3: Type (m-n) [General / Max Granularity]")
print("="*40)
sol_general = solve_explanation_general_lp(delta_a1a2, pros_a1a2, cons_a1a2)
if sol_general["status"] == "feasible":
    display(pd.DataFrame(sol_general["explanation"]))
else:
    print(f"Result: {sol_general['message']}")

Comparison a1 > a2
Pros: ['A', 'E', 'F']
Cons: ['B', 'C', 'D']


course,A,B,C,D,E,F,G
a1,89.0,74.0,81.0,68.0,84.0,79.0,77.0
a2,71.0,84.0,91.0,79.0,78.0,73.5,77.0
Δ,144.0,-70.0,-70.0,-66.0,36.0,27.5,0.0



SOLVER 1: Type (1-m) [One Pro covers Many Cons]
Result: Model Infeasible.

SOLVER 2: Type (m-1) [Many Pros cover One Con]
Result (Strict): Model Infeasible. Feasible by global sum, but infeasible due to fragmentation (Bin Packing constraints).

SOLVER 3: Type (m-n) [General / Max Granularity]


Unnamed: 0,type,pros,cons,pros_weight,cons_weight,margin
0,(3-3),"[A, E, F]","[B, C, D]",207.5,-206,1.5
