# Prologue

The goal is to explain why candidate $x$ is ranked higher than candidate $y$. The ranking is based on a weighted sum of scores. Instead of showing the raw formula, we want to provide a (1–1) trade-off explanation.

Definitions (the contribution of a subject is its weight times the score difference between $x$ and $y$):
- Pros ($P$): Subjects where $x$ scores better than $y$ (contribution $> 0$).
- Cons ($C$): Subjects where $x$ scores worse than $y$ (contribution $< 0$).
- Trade-off (1–1): A pair $(p, c)$ where $p \in P$ and $c \in C$ such that the advantage in $p$ is strictly greater than the disadvantage in $c$ (i.e., $contribution(p) + contribution(c) > 0$).
- Explanation: A set of pairwise disjoint trade-offs that covers every element of $C$.
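
The definitions above can be turned into a small checker. This is a minimal sketch, not the notebook's own code: the function name `is_one_to_one_explanation` and the toy contributions `toy` are hypothetical and chosen only for illustration.

```python
def is_one_to_one_explanation(pairs, contribution):
    """Return True if `pairs` is a (1-1) explanation for the given contributions."""
    pros = {k for k, v in contribution.items() if v > 0}
    cons = {k for k, v in contribution.items() if v < 0}
    used_pros = [p for p, _ in pairs]
    used_cons = [c for _, c in pairs]

    # Each pair must join a pro to a con with a strictly positive combined contribution.
    valid_pairs = all(
        p in pros and c in cons and contribution[p] + contribution[c] > 0
        for p, c in pairs
    )
    # Disjointness: no pro and no con appears in two trade-offs.
    disjoint = (len(set(used_pros)) == len(used_pros)
                and len(set(used_cons)) == len(used_cons))
    # Coverage: every con is explained by some trade-off.
    covers = set(used_cons) == cons
    return valid_pairs and disjoint and covers


# Hypothetical toy contributions (not taken from the prologue table).
toy = {"s1": 5, "s2": 3, "s3": -4, "s4": -2}
print(is_one_to_one_explanation([("s1", "s3"), ("s2", "s4")], toy))  # True
print(is_one_to_one_explanation([("s2", "s3"), ("s1", "s4")], toy))  # False: 3 + (-4) < 0
```

The second call fails because the pair `("s2", "s3")` has a negative combined contribution, even though both pairings are disjoint and cover all cons.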

In [2]:
import numpy as np
import pandas as pd
import networkx as nx
from scipy.optimize import linprog

pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 140)

## Prologue data: Xavier (x) vs Yvonne (y)

Courses: A..G  
Weights and scores are exactly those shown in the prologue table.


In [3]:
courses = ["A","B","C","D","E","F","G"]

course_names = {
    "A": "Anatomie",
    "B": "Biologie",
    "C": "Chirurgie",
    "D": "Diagnostic",
    "E": "Epidemiologie",
    "F": "Forensic Pathology",
    "G": "Génétique",
}

w = {"A": 8, "B": 7, "C": 7, "D": 6, "E": 6, "F": 5, "G": 6}

x = {"A": 85, "B": 81, "C": 71, "D": 69, "E": 75, "F": 81, "G": 88}  # Xavier
y = {"A": 81, "B": 81, "C": 75, "D": 63, "E": 67, "F": 88, "G": 95}  # Yvonne

df_scores = pd.DataFrame(
    {
        "course": courses,
        "name": [course_names[c] for c in courses],
        "weight": [w[c] for c in courses],
        "x": [x[c] for c in courses],
        "y": [y[c] for c in courses],
    }
)
df_scores


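|   | course | name | weight | x | y |
|---|--------|------|--------|---|---|
| 0 | A | Anatomie | 8 | 85 | 81 |
| 1 | B | Biologie | 7 | 81 | 81 |
| 2 | C | Chirurgie | 7 | 71 | 75 |
| 3 | D | Diagnostic | 6 | 69 | 63 |
| 4 | E | Epidemiologie | 6 | 75 | 67 |
| 5 | F | Forensic Pathology | 5 | 81 | 88 |
| 6 | G | Génétique | 6 | 88 | 95 |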


## Step 1 — Compute contributions


In [4]:
df = df_scores.copy()
df["delta"] = df["weight"] * (df["x"] - df["y"])
df["sign"] = np.where(df["delta"] > 0, "pro", np.where(df["delta"] < 0, "con", "neutral"))
df[["course","name","weight","x","y","delta","sign"]]


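|   | course | name | weight | x | y | delta | sign |
|---|--------|------|--------|---|---|-------|------|
| 0 | A | Anatomie | 8 | 85 | 81 | 32 | pro |
| 1 | B | Biologie | 7 | 81 | 81 | 0 | neutral |
| 2 | C | Chirurgie | 7 | 71 | 75 | -28 | con |
| 3 | D | Diagnostic | 6 | 69 | 63 | 36 | pro |
| 4 | E | Epidemiologie | 6 | 75 | 67 | 48 | pro |
| 5 | F | Forensic Pathology | 5 | 81 | 88 | -35 | con |
| 6 | G | Génétique | 6 | 88 | 95 | -42 | con |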
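
As a sanity check on the contributions, their sum should equal the gap between the two weighted scores. The sketch below assumes the ranking uses the plain (unnormalized) weighted sum and re-enters the prologue data so that it runs on its own.

```python
# Prologue data, re-entered so the check is self-contained.
w = {"A": 8, "B": 7, "C": 7, "D": 6, "E": 6, "F": 5, "G": 6}
x = {"A": 85, "B": 81, "C": 71, "D": 69, "E": 75, "F": 81, "G": 88}  # Xavier
y = {"A": 81, "B": 81, "C": 75, "D": 63, "E": 67, "F": 88, "G": 95}  # Yvonne

# Per-course contribution: weight times score difference.
delta = {k: w[k] * (x[k] - y[k]) for k in w}

score_x = sum(w[k] * x[k] for k in w)
score_y = sum(w[k] * y[k] for k in w)

print(score_x, score_y)      # 3541 3530
print(score_x - score_y)     # 11
print(sum(delta.values()))   # 11
```

The positive total confirms that $x$ is ranked above $y$, which is the comparison the explanation has to justify.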


## Step 2 — Build feasible (1–1) trade-offs

A feasible edge $(p,c)$ exists iff:
- $p \in pros(x,y)$, $c \in cons(x,y)$
- $\Delta_p + \Delta_c > 0$

We will build the edge list $A$.


In [5]:
pros = df.loc[df["delta"] > 0, "course"].tolist()
cons = df.loc[df["delta"] < 0, "course"].tolist()
neutral = df.loc[df["delta"] == 0, "course"].tolist()

delta = dict(zip(df["course"], df["delta"]))

A = []
for p in pros:
    for c in cons:
        margin = delta[p] + delta[c]
        if margin > 0:
            A.append((p, c, margin))

print("pros:", pros)
print("cons:", cons)
print("neutral:", neutral)
print("\nNumber of feasible (1–1) trade-offs:", len(A))

df_A = (
    pd.DataFrame(A, columns=["P (pro)", "C (con)", "margin"])
      .sort_values("margin", ascending=False)
)
df_A


pros: ['A', 'D', 'E']
cons: ['C', 'F', 'G']
neutral: ['B']

Number of feasible (1–1) trade-offs: 6


|   | P (pro) | C (con) | margin |
|---|---------|---------|--------|
| 3 | E | C | 20 |
| 4 | E | F | 13 |
| 1 | D | C | 8 |
| 5 | E | G | 6 |
| 0 | A | C | 4 |
| 2 | D | F | 1 |


## Step 3 — Linear optimization model (LP, integral by structure)

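Decision variables: $z_{p,c} \in [0,1]$ for each feasible pair $(p,c) \in A$. No integrality constraint is needed: the constraint matrix is the incidence matrix of a bipartite graph, hence totally unimodular, so every vertex of the feasible polytope is 0/1 and a basic optimal solution is integral.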

Constraints:
1. Each $c \in cons$ is covered exactly once:
$$
\sum_{p} z_{p,c} = 1
$$
2. Each $p \in pros$ is used at most once:
$$
\sum_{c} z_{p,c} \le 1
$$

Objective: maximize total margin $\sum (\Delta_p+\Delta_c) z_{p,c}$.


In [6]:
def solve_explanation_1_1_lp(delta, pros, cons, feasible_edges):
    edges = [(p, c) for (p, c, m) in feasible_edges]
    margins = np.array([m for (_, _, m) in feasible_edges], dtype=float)

    n = len(edges)
    if n == 0:
        return {"status": "infeasible", "message": "No feasible trade-off edges.", "edges": edges}

    c_obj = -margins

    A_eq = []
    b_eq = []
    for con in cons:
        row = np.zeros(n)
        for j, (p, c) in enumerate(edges):
            if c == con:
                row[j] = 1.0
        A_eq.append(row)
        b_eq.append(1.0)
    A_eq = np.vstack(A_eq) if A_eq else None
    b_eq = np.array(b_eq) if b_eq else None

    A_ub = []
    b_ub = []
    for pro in pros:
        row = np.zeros(n)
        for j, (p, c) in enumerate(edges):
            if p == pro:
                row[j] = 1.0
        A_ub.append(row)
        b_ub.append(1.0)
    A_ub = np.vstack(A_ub) if A_ub else None
    b_ub = np.array(b_ub) if b_ub else None

    bounds = [(0.0, 1.0)] * n

    res = linprog(
        c=c_obj,
        A_ub=A_ub, b_ub=b_ub,
        A_eq=A_eq, b_eq=b_eq,
        bounds=bounds,
        method="highs",
    )

    out = {"raw_result": res, "edges": edges, "margins": margins}
    if res.success:
        z = res.x
        chosen = [(edges[j][0], edges[j][1], margins[j], z[j]) for j in range(n) if z[j] > 0.5]
        out.update({"status": "feasible", "chosen": chosen, "z": z})
    else:
        out.update({"status": "infeasible", "message": res.message})
    return out

sol = solve_explanation_1_1_lp(delta, pros, cons, A)
sol["status"], sol.get("message", "")


('feasible', '')
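
On an instance this small, the LP result can be cross-checked by brute force. The sketch below is an independent check rather than part of the LP model: it enumerates every injective assignment of pros to cons with `itertools.permutations` and keeps the feasible one with the largest total margin. The contributions are copied from the Step 1 table.

```python
from itertools import permutations

# Non-zero contributions from Step 1 (course B is neutral and omitted).
delta = {"A": 32, "C": -28, "D": 36, "E": 48, "F": -35, "G": -42}
pros = ["A", "D", "E"]
cons = ["C", "F", "G"]

best = None
# Each permutation assigns a distinct pro to every con, in the order of `cons`.
for assigned in permutations(pros, len(cons)):
    pairs = list(zip(assigned, cons))
    # Keep only assignments in which every pair is a valid (1-1) trade-off.
    if all(delta[p] + delta[c] > 0 for p, c in pairs):
        total = sum(delta[p] + delta[c] for p, c in pairs)
        if best is None or total > best[0]:
            best = (total, pairs)

print(best)  # (11, [('A', 'C'), ('D', 'F'), ('E', 'G')])
```

Course G can only be paired with E, which then forces F onto D and C onto A, so this instance has exactly one explanation. When there are more cons than pros, `permutations` yields nothing and `best` stays `None`.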

## Step 4 — Extract the (1–1) explanation and verify the definition


In [7]:
def verify_explanation(chosen_pairs, pros, cons):
    P_used = [p for (p, c, m, z) in chosen_pairs]
    C_used = [c for (p, c, m, z) in chosen_pairs]

    ok_cover = (sorted(C_used) == sorted(cons))
    ok_disjoint_pro = (len(P_used) == len(set(P_used)))
    ok_disjoint_con = (len(C_used) == len(set(C_used)))

    return {
        "covers_all_cons": ok_cover,
        "disjoint_pros": ok_disjoint_pro,
        "disjoint_cons": ok_disjoint_con,
    }

if sol["status"] == "feasible":
    chosen = sol["chosen"]
    df_chosen = pd.DataFrame(
        [{"P": p, "C": c, "margin": m, "ΔP": delta[p], "ΔC": delta[c]} for (p,c,m,_) in chosen]
    ).sort_values("margin", ascending=False)
    display(df_chosen)

    checks = verify_explanation(chosen, pros, cons)
    print("Checks:", checks)

    explanation = [(p,c) for (p,c,_,_) in chosen]
    print("\nExplanation E =", explanation)


|   | P | C | margin | ΔP | ΔC |
|---|---|---|--------|----|----|
| 2 | E | G | 6.0 | 48 | -42 |
| 0 | A | C | 4.0 | 32 | -28 |
| 1 | D | F | 1.0 | 36 | -35 |


Checks: {'covers_all_cons': True, 'disjoint_pros': True, 'disjoint_cons': True}

Explanation E = [('A', 'C'), ('D', 'F'), ('E', 'G')]


## Step 5 — Certificate of non-existence (Hall-violation subset)

The feasible edges $A$ define a bipartite graph, which also lets us certify the case where no (1–1) explanation exists. Its two sides are:
- Left = cons(x,y)
- Right = pros(x,y)

A (1–1) explanation exists **iff** there is a matching that covers **all cons**.

If not, we return a **Hall certificate**: a subset $S \subseteq cons$ with
$$
|N(S)| < |S|
$$
where $N(S)$ is the neighborhood of $S$ in the feasible-edge graph.
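
The reason such a subset rules out an explanation is the easy direction of Hall's theorem. A short sketch, where $M$ (an assumed name, not used elsewhere in this notebook) denotes a matching covering all cons and $M(c)$ the pro paired with $c$:

$$
\{\, M(c) : c \in S \,\} \subseteq N(S)
\quad\text{and}\quad
|\{\, M(c) : c \in S \,\}| = |S|
\quad\Longrightarrow\quad
|N(S)| \ge |S| .
$$

The partners $M(c)$ are distinct because the trade-offs are disjoint, so any $S$ with $|N(S)| < |S|$ contradicts the existence of $M$.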


In [8]:
def hall_certificate_from_feasible_edges(cons, pros, feasible_edges):
    G2 = nx.Graph()
    G2.add_nodes_from(cons, bipartite=0)
    G2.add_nodes_from(pros, bipartite=1)
    G2.add_edges_from([(c, p) for (p,c,_) in feasible_edges])

    matching = nx.algorithms.bipartite.matching.maximum_matching(G2, top_nodes=cons)
    matched_cons = [c for c in cons if c in matching]

    if len(matched_cons) == len(cons):
        return {"status": "feasible", "matching": matching, "certificate": None}

    matched_edges = {(c, matching[c]) for c in cons if c in matching}

    from collections import deque
    unmatched = [c for c in cons if c not in matching]
    Z_left = set(unmatched)
    q = deque(unmatched)
    Z_right = set()

    while q:
        c = q.popleft()
        for p in G2.neighbors(c):
            if (c, p) in matched_edges:
                continue
            if p in Z_right:
                continue
            Z_right.add(p)
            if p in matching:
                c2 = matching[p]
                if c2 not in Z_left:
                    Z_left.add(c2)
                    q.append(c2)

    S = Z_left
    N = set()
    for c in S:
        N |= set(G2.neighbors(c))

    return {
        "status": "infeasible",
        "matching_size": len(matched_cons),
        "cons_size": len(cons),
        "hall_S": sorted(S),
        "hall_neighbors": sorted(N),
        "hall_sizes": {"|S|": len(S), "|N(S)|": len(N)},
        "matching": matching,
    }

if sol["status"] == "infeasible":
    cert = hall_certificate_from_feasible_edges(cons, pros, A)
    print(cert)  # a bare expression inside an if-block is not displayed by the notebook
else:
    print("Feasible on the prologue instance. Hall certificate not needed here.")


Feasible on the prologue instance. Hall certificate not needed here.


# Question 1 — (1–1) explanations via linear optimization

These steps were already carried out in the prologue, but are repeated here on the larger table:

1. **Formulate a linear optimization program** that computes a **(1–1) explanation** of the comparison $x \succ y$ **when it exists**.
2. **Return a certificate of non-existence** when no (1–1) explanation exists.
3. **Implement the formulation using an optimization solver** (SciPy HiGHS LP solver), and extract the explanation.

In addition, we will use the larger candidate table and give the required **simple argument** that there is **no (1–1) explanation** for $w \succ w'$.

## Step 0 — Data (courses, weights, and candidate scores)

We encode the table given in the statement (candidates $x, y, z, t, u, v, w, w'$) and the weights.

In [9]:
courses = ["A","B","C","D","E","F","G"]

weights = {"A": 8, "B": 7, "C": 7, "D": 6, "E": 6, "F": 5, "G": 6}

scores = {
    "x":  {"A":85,"B":81,"C":71,"D":69,"E":75,"F":81,"G":88},
    "y":  {"A":81,"B":81,"C":75,"D":63,"E":67,"F":88,"G":95},
    "z":  {"A":74,"B":89,"C":74,"D":81,"E":68,"F":84,"G":79},
    "t":  {"A":74,"B":71,"C":84,"D":91,"E":77,"F":76,"G":73},
    "u":  {"A":72,"B":75,"C":66,"D":85,"E":88,"F":66,"G":93},
    "v":  {"A":71,"B":73,"C":63,"D":92,"E":76,"F":79,"G":93},
    "w":  {"A":79,"B":69,"C":78,"D":76,"E":67,"F":84,"G":79},
    "w'": {"A":57,"B":76,"C":81,"D":76,"E":82,"F":86,"G":77},
}

df_scores = pd.DataFrame(
    [{"candidate": cand, **scores[cand]} for cand in scores.keys()]
).set_index("candidate")[courses]

df_scores.loc[list(scores.keys())]

| candidate | A | B | C | D | E | F | G |
|-----------|---|---|---|---|---|---|---|
| x | 85 | 81 | 71 | 69 | 75 | 81 | 88 |
| y | 81 | 81 | 75 | 63 | 67 | 88 | 95 |
| z | 74 | 89 | 74 | 81 | 68 | 84 | 79 |
| t | 74 | 71 | 84 | 91 | 77 | 76 | 73 |
| u | 72 | 75 | 66 | 85 | 88 | 66 | 93 |
| v | 71 | 73 | 63 | 92 | 76 | 79 | 93 |
| w | 79 | 69 | 78 | 76 | 67 | 84 | 79 |
| w' | 57 | 76 | 81 | 76 | 82 | 86 | 77 |


## Step 1 — Compute contributions $\Delta_k$ and the sets pros/cons/neutral

For a comparison $x \succ y$, define:
$$
\Delta_k = w_k (x_k - y_k).
$$

- $pros = \{k : \Delta_k > 0\}$
- $cons = \{k : \Delta_k < 0\}$
- $neutral = \{k : \Delta_k = 0\}$

In [10]:
def compute_deltas(candidate_x: str, candidate_y: str, df_scores: pd.DataFrame, weights: dict):
    x = df_scores.loc[candidate_x].to_dict()
    y = df_scores.loc[candidate_y].to_dict()
    delta = {k: weights[k] * (x[k] - y[k]) for k in courses}

    pros = [k for k in courses if delta[k] > 0]
    cons = [k for k in courses if delta[k] < 0]
    neutral = [k for k in courses if delta[k] == 0]

    df = pd.DataFrame(
        {
            "course": courses,
            "x": [x[k] for k in courses],
            "y": [y[k] for k in courses],
            "w": [weights[k] for k in courses],
            "Δ = w(x−y)": [delta[k] for k in courses],
        }
    )
    return delta, pros, cons, neutral, df

delta_xy, pros_xy, cons_xy, neutral_xy, df_xy = compute_deltas("x", "y", df_scores, weights)
df_xy

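|   | course | x | y | w | Δ = w(x−y) |
|---|--------|---|---|---|------------|
| 0 | A | 85 | 81 | 8 | 32 |
| 1 | B | 81 | 81 | 7 | 0 |
| 2 | C | 71 | 75 | 7 | -28 |
| 3 | D | 69 | 63 | 6 | 36 |
| 4 | E | 75 | 67 | 6 | 48 |
| 5 | F | 81 | 88 | 5 | -35 |
| 6 | G | 88 | 95 | 6 | -42 |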


## Step 2 — Build feasible (1–1) trade-offs

A (1–1) trade-off is a pair $(p,c)$ with:
- $p \in pros$,
- $c \in cons$,
- and **positive combined contribution**:
$$
\Delta_p + \Delta_c > 0.
$$

We compute the feasible edge list $A$.

In [12]:
def feasible_tradeoffs_1_1(delta: dict, pros: list, cons: list):
    A = []
    for p in pros:
        for c in cons:
            m = delta[p] + delta[c]
            if m > 0:
                A.append((p, c, m))
    return A

A_xy = feasible_tradeoffs_1_1(delta_xy, pros_xy, cons_xy)
pd.DataFrame(A_xy, columns=["P (pro)","C (con)","margin"]).sort_values("margin", ascending=False)

|   | P (pro) | C (con) | margin |
|---|---------|---------|--------|
| 3 | E | C | 20 |
| 4 | E | F | 13 |
| 1 | D | C | 8 |
| 5 | E | G | 6 |
| 0 | A | C | 4 |
| 2 | D | F | 1 |


## Step 3 — Linear optimization program for a (1–1) explanation

In [13]:
def solve_explanation_1_1_lp(delta: dict, pros: list, cons: list, feasible_edges: list):
    edges = [(p, c) for (p, c, _) in feasible_edges]
    margins = np.array([m for (_, _, m) in feasible_edges], dtype=float)
    n = len(edges)

    if n == 0:
        return {"status": "infeasible", "message": "No feasible trade-off edges.", "edges": edges}

    c_obj = -margins

    A_eq = []
    b_eq = []
    for con in cons:
        row = np.zeros(n)
        for j, (p, c) in enumerate(edges):
            if c == con:
                row[j] = 1.0
        A_eq.append(row)
        b_eq.append(1.0)

    A_ub = []
    b_ub = []
    for pro in pros:
        row = np.zeros(n)
        for j, (p, c) in enumerate(edges):
            if p == pro:
                row[j] = 1.0
        A_ub.append(row)
        b_ub.append(1.0)

    res = linprog(
        c=c_obj,
        A_ub=np.vstack(A_ub) if A_ub else None,
        b_ub=np.array(b_ub) if b_ub else None,
        A_eq=np.vstack(A_eq) if A_eq else None,
        b_eq=np.array(b_eq) if b_eq else None,
        bounds=[(0.0, 1.0)] * n,
        method="highs",
    )

    out = {"raw_result": res, "edges": edges, "margins": margins}
    if res.success:
        z = res.x
        chosen = [(edges[j][0], edges[j][1], margins[j], z[j]) for j in range(n) if z[j] > 0.5]
        out.update({"status": "feasible", "chosen": chosen, "z": z})
    else:
        out.update({"status": "infeasible", "message": res.message})
    return out

sol_xy = solve_explanation_1_1_lp(delta_xy, pros_xy, cons_xy, A_xy)
sol_xy["status"], sol_xy.get("message","")

('feasible', '')

## Step 4 — Extract and verify the (1–1) explanation (when feasible)

In [14]:
def verify_explanation(chosen_pairs, pros, cons):
    P_used = [p for (p, c, m, z) in chosen_pairs]
    C_used = [c for (p, c, m, z) in chosen_pairs]
    return {
        "covers_all_cons_exactly": sorted(C_used) == sorted(cons),
        "pros_disjoint": len(P_used) == len(set(P_used)),
        "cons_disjoint": len(C_used) == len(set(C_used)),
    }

if sol_xy["status"] == "feasible":
    chosen = sol_xy["chosen"]
    df_chosen = pd.DataFrame(
        [{"P": p, "C": c, "ΔP": delta_xy[p], "ΔC": delta_xy[c], "margin": m} for (p,c,m,_) in chosen]
    ).sort_values("margin", ascending=False)
    display(df_chosen)
    print("Checks:", verify_explanation(chosen, pros_xy, cons_xy))

|   | P | C | ΔP | ΔC | margin |
|---|---|---|----|----|--------|
| 2 | E | G | 48 | -42 | 6.0 |
| 0 | A | C | 32 | -28 | 4.0 |
| 1 | D | F | 36 | -35 | 1.0 |


Checks: {'covers_all_cons_exactly': True, 'pros_disjoint': True, 'cons_disjoint': True}


## Step 5 — Certificate of non-existence (when infeasible)

A (1–1) explanation exists **iff** the bipartite graph $(cons, pros, A)$ has a matching covering all nodes in **cons**.

When infeasible, we output a **Hall certificate**: a subset $S \subseteq cons$ such that
$$
|N(S)| < |S|,
$$
where $N(S)$ is the set of pros connected to $S$ via feasible trade-offs.

In [15]:
def hall_certificate(cons, pros, feasible_edges):
    G = nx.Graph()
    G.add_nodes_from(cons, bipartite=0)
    G.add_nodes_from(pros, bipartite=1)
    G.add_edges_from([(c, p) for (p, c, _) in feasible_edges])

    matching = nx.algorithms.bipartite.matching.maximum_matching(G, top_nodes=cons)
    matched_cons = [c for c in cons if c in matching]
    if len(matched_cons) == len(cons):
        return {"status": "feasible", "matching": matching, "certificate": None}

    matched_edges = {(c, matching[c]) for c in cons if c in matching}

    from collections import deque
    unmatched = [c for c in cons if c not in matching]

    Z_left = set(unmatched)
    Z_right = set()
    q = deque(unmatched)

    while q:
        c = q.popleft()
        for p in G.neighbors(c):
            if (c, p) in matched_edges:
                continue
            if p in Z_right:
                continue
            Z_right.add(p)
            if p in matching:
                c2 = matching[p]
                if c2 not in Z_left:
                    Z_left.add(c2)
                    q.append(c2)

    S = Z_left
    N = set()
    for c in S:
        N |= set(G.neighbors(c))

    return {
        "status": "infeasible",
        "hall_subset_S": sorted(S),
        "neighbors_N(S)": sorted(N),
        "sizes": {"|S|": len(S), "|N(S)|": len(N)},
        "matching_size": len(matched_cons),
    }

# Required part on the extended table: show that no (1–1) explanation exists for $w \succ w'$

We now consider the comparison $w \succ w'$ using the extended table.

### Simple argument (the one-line proof)
In a (1–1) explanation, **each** course in $cons(w,w')$ must be paired with a **distinct** course in $pros(w,w')$ (disjointness).

Therefore, a necessary condition is:
$$
|pros(w,w')| \ge |cons(w,w')|.
$$

We will compute these sets and show that this condition fails, hence **no (1–1) explanation exists**.

In [16]:
delta_ww, pros_ww, cons_ww, neutral_ww, df_ww = compute_deltas("w", "w'", df_scores, weights)
df_ww

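|   | course | x | y | w | Δ = w(x−y) |
|---|--------|---|---|---|------------|
| 0 | A | 79 | 57 | 8 | 176 |
| 1 | B | 69 | 76 | 7 | -49 |
| 2 | C | 78 | 81 | 7 | -21 |
| 3 | D | 76 | 76 | 6 | 0 |
| 4 | E | 67 | 82 | 6 | -90 |
| 5 | F | 84 | 86 | 5 | -10 |
| 6 | G | 79 | 77 | 6 | 12 |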


In [17]:
print("pros(w,w'):", pros_ww)
print("cons(w,w'):", cons_ww)
print("neutral(w,w'):", neutral_ww)
print("Counts: |pros| =", len(pros_ww), ", |cons| =", len(cons_ww))

pros(w,w'): ['A', 'G']
cons(w,w'): ['B', 'C', 'E', 'F']
neutral(w,w'): ['D']
Counts: |pros| = 2 , |cons| = 4


### Conclusion (simple argument)
Here, $|pros(w,w')| < |cons(w,w')|$.  
So it is **impossible** to cover every con course with a distinct pro course using only (1–1) disjoint pairs.  
Hence **no explanation of type (1–1)** exists for $w \succ w'$.

### Solver confirmation + certificate of non-existence (Hall certificate)

We also run the Q1 LP model and output a Hall-type certificate.

In [19]:
A_ww = feasible_tradeoffs_1_1(delta_ww, pros_ww, cons_ww)
pd.DataFrame(A_ww, columns=["P (pro)","C (con)","margin"]).sort_values("margin", ascending=False)

|   | P (pro) | C (con) | margin |
|---|---------|---------|--------|
| 3 | A | F | 166 |
| 1 | A | C | 155 |
| 0 | A | B | 127 |
| 2 | A | E | 86 |
| 4 | G | F | 2 |


In [20]:
sol_ww = solve_explanation_1_1_lp(delta_ww, pros_ww, cons_ww, A_ww)
sol_ww["status"], sol_ww.get("message","")

('infeasible',
 'The problem is infeasible. (HiGHS Status 8: model_status is Infeasible; primal_status is None)')

In [21]:
cert_ww = hall_certificate(cons_ww, pros_ww, A_ww)
cert_ww

{'status': 'infeasible',
 'hall_subset_S': ['B', 'C', 'E'],
 'neighbors_N(S)': ['A'],
 'sizes': {'|S|': 3, '|N(S)|': 1},
 'matching_size': 2}
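
The reported certificate can be cross-checked by brute force, since there are only four cons. The sketch below is an independent check rather than the BFS used in `hall_certificate`: it enumerates every non-empty subset of cons and lists those that violate Hall's condition. The edge list is copied from the feasible trade-offs of $w \succ w'$ shown above.

```python
from itertools import combinations

cons = ["B", "C", "E", "F"]
# Feasible (pro, con) trade-offs for w vs w', copied from the table above.
feasible = [("A", "B"), ("A", "C"), ("A", "E"), ("A", "F"), ("G", "F")]

# Neighborhood of each con: the pros it can be paired with.
neighbors = {c: {p for p, c2 in feasible if c2 == c} for c in cons}

violations = []
for size in range(1, len(cons) + 1):
    for S in combinations(cons, size):
        N = set().union(*(neighbors[c] for c in S))
        if len(N) < len(S):  # Hall's condition fails for this subset
            violations.append((S, sorted(N)))

for S, N in violations:
    print(S, N)
```

Among the eight violating subsets it prints, `('B', 'C', 'E')` with neighborhood `['A']` matches the certificate returned by the BFS; the smaller subset `('B', 'C')` is already a violation on its own.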