# Bias and Constrained Learning Homework

In this homework we'll extend the constrained learning framework we used for mitigating bias in class to handle more complex situations. Specifically, we'll look at the case where the output prediction is not binary. As usual with these homeworks, there are three different levels which build on each other, each one corresponding to an increasing grade:

- The basic version of this homework involves implementing code to measure fairness over multiclass classification, then measuring the results when training a regular, unfair classifier. This version is good for a C.
- The B version of the homework involves training a classifier with some fairness constraints.
- For an A, we'll look at a slightly more complicated approach to fair training.

First, we'll generate a dataset for which the sensitive attribute is binary and the output is multiclass.

In [1625]:
import numpy as np
import torch
from torch import nn, optim

# torch.set_printoptions(profile="full")
# np.set_printoptions(threshold=np.inf)

In [1626]:
output_classes = 5

def generate_data():

    dataset_size = 10000
    dimensions = 40

    rng = np.random.default_rng()
    A = np.concatenate((np.zeros(dataset_size // 2), np.ones(dataset_size // 2)))
    rng.shuffle(A)
    X = rng.normal(loc=A[:,np.newaxis], scale=1, size=(dataset_size, dimensions))
    random_linear = np.array([
        -2.28156561, 0.24582547, -2.48926942, -0.02934924, 5.21382855, -1.08613209,
        2.51051602, 1.00773587, -2.10409448, 1.94385103, 0.76013416, -2.94430782,
        0.3289264, -4.35145624, 1.61342623, -1.28433588, -2.07859612, -1.53812125,
        0.51412713, -1.34310334, 4.67174476, 1.67269946, -2.07805413, 3.46667731,
        2.61486654, 1.75418209, -0.06773796, 0.7213423, 2.43896438, 1.79306807,
        -0.74610264, 2.84046827,  1.28779878, 1.84490263, 1.6949681, 0.05814582,
        1.30510732, -0.92332861,  3.00192177, -1.76077192
    ])
    good_score = (X @ random_linear) ** 2 / 2
    qs = np.quantile(good_score, (np.array(range(1, output_classes))) / output_classes)
    Y = np.digitize(good_score, qs)

    return X, A, Y

X, A, Y = generate_data()

In [1627]:
print("Total:", [(Y == k).sum() for k in range(output_classes)])
print("A=0:", [((Y == k) & (A == 0)).sum() for k in range(output_classes)])
print("A=1:", [((Y == k) & (A == 1)).sum() for k in range(output_classes)])

Total: [np.int64(2000), np.int64(2000), np.int64(2000), np.int64(2000), np.int64(2000)]
A=0: [np.int64(1385), np.int64(1308), np.int64(1106), np.int64(818), np.int64(383)]
A=1: [np.int64(615), np.int64(692), np.int64(894), np.int64(1182), np.int64(1617)]


This last cell shows the total number of data points in each output category (it should be 2000 each) as well as a breakdown of each output category for the $A=0$ group and the $A=1$ group. Note that the $A=1$ group is much more likely to be assigned to the categories with higher index.
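The even split across categories comes from cutting the scores at their empirical quantiles. Here is a minimal sketch of that binning step, assuming a hypothetical `scores` array in place of `good_score`, a smaller sample, and a fixed seed (all three are illustrative choices, not part of the assignment):

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility
scores = rng.normal(size=1000) ** 2     # hypothetical stand-in for good_score
K = 5                                   # number of output classes

# K - 1 interior cut points at the 20%, 40%, 60%, and 80% quantiles.
edges = np.quantile(scores, np.arange(1, K) / K)

# np.digitize returns, for each score, how many cut points lie at or below it,
# which is a class index in 0..K-1.
labels = np.digitize(scores, edges)

counts = np.bincount(labels, minlength=K)
print(counts)
```

Because the cut points are quantiles of the same array being binned, every class receives the same number of points regardless of how the scores are distributed.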

## Fairness Definition (C)

Let's write some code to measure a few different forms of bias in our classifier. We'll use demographic parity, which requires $P(R = r \mid A = 0) = P(R = r \mid A = 1)$ for all possible output classes $0 \le r < K$, and predictive parity, which requires $P(Y=r \mid A = 0, R = r) = P(Y=r \mid A = 1, R = r)$. In the functions below,

- `R` is a matrix where each row represents a probability distribution over the classes `0` to `K - 1`. That is, `R` is the output of our neural network _after_ a softmax layer.
- `A` is a vector of sensitive attributes. Each element is either `0` or `1`.
- `Y` is a vector of measured output classes; each element is between `0` and `K - 1`.

These functions should return an array of length `K` where each element of the array represents a measure of bias for _one_ of the output classes. For example, for demographic parity, the value in the output array at index `i` should be $P(R = i \mid A = 1) - P(R = i \mid A = 0)$.
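To make the expected output concrete, here is a small NumPy sketch of the demographic parity gap on a hypothetical four-row `R` with three classes. The toy values are invented for illustration, and the homework functions themselves operate on torch tensors rather than NumPy arrays.

```python
import numpy as np

# Hypothetical toy inputs: 4 examples, K = 3 classes. Each row of R sums to 1.
R = np.array([
    [0.7, 0.2, 0.1],   # A = 0
    [0.5, 0.3, 0.2],   # A = 0
    [0.2, 0.3, 0.5],   # A = 1
    [0.1, 0.1, 0.8],   # A = 1
])
A = np.array([0, 0, 1, 1])

# Averaging the rows of R within a group estimates P(R = i | A = a) for every i.
p_given_a0 = R[A == 0].mean(axis=0)
p_given_a1 = R[A == 1].mean(axis=0)

# One entry per class: P(R = i | A = 1) - P(R = i | A = 0).
gap = p_given_a1 - p_given_a0
print(gap)
```

Since both group averages are probability distributions, the entries of `gap` always sum to zero: probability mass that one group gains in a class must be lost in another.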

Note that predictive parity is a bit different from the equalized odds measure I included in the solution to the bias lab. In particular, in the lab we used filtering to represent conditional probabilities, so $P(R=1 \mid A=0)$ was measured by `probs[A==0].mean()`, for example. Now we can't do that directly, since the predictive parity expression is conditioned on $R$, which is continuous. Instead, you'll need to use Bayes' rule and/or the definition of conditional probability to rearrange the predictive parity equation until it's something we can measure. It's quite tricky to do this for all classes in one call, so it's okay to loop over the classes and compute the predictive parity for each one separately.
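One possible rearrangement, written for a generic group $a \in \{0, 1\}$ and treating each row of `R` as a soft indicator of the predicted class (an assumption about how the estimate is formed, not the only valid choice):

$$
P(Y = r \mid A = a, R = r) = \frac{P(Y = r, A = a, R = r)}{P(A = a, R = r)} = \frac{P(Y = r) \, P(A = a, R = r \mid Y = r)}{P(A = a, R = r)}
$$

Each factor on the right is an average we can compute. Writing $n$ for the number of data points, $n_r$ for the number with $Y_j = r$, and $R_{j,r}$ for the probability that row $j$ assigns to class $r$:

$$
P(Y = r) \approx \frac{n_r}{n}, \qquad
P(A = a, R = r) \approx \frac{1}{n} \sum_{j} \mathbb{1}[A_j = a] \, R_{j,r}, \qquad
P(A = a, R = r \mid Y = r) \approx \frac{1}{n_r} \sum_{j : Y_j = r} \mathbb{1}[A_j = a] \, R_{j,r}
$$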

In [None]:
# CONTRIBUTORS: I helped and received help from Patrick Norton.

def demographic_parity(R, A):
    
    A0 = torch.from_numpy(A == 0)
    A1 = torch.from_numpy(A == 1)

    return R[A1].mean(dim=0) - R[A0].mean(dim=0)

# Bayes theorem is P(A | B) = (P(A) * P(B | A))/P(B)
# This looks like P(Y = r | A = 0, R = r) = (P(Y = r) * P(A = 0 \cap R = r | Y = r))/(P(A = 0 \cap R = r))
# P(Y = r) is estimated by the fraction of data points with label r, since every
# sample is weighted equally and the samples are i.i.d.

# NOTE: Do not use this to calculate loss. Demographic parity is fine because it stays
# in tensor operations, but this function rebuilds its results with torch.tensor(...),
# which detaches them from the computation graph.
# I tried implementing this without explicit loops and I couldn't get it working.
def predictive_parity(R, A, Y):
    
    # This isn't strictly necessary because P(Y = r) = 0.2 for all r
    # However, this is a generally-extensible approach.
    # Normalize the counts directly: a softmax over raw counts only matches the
    # class frequencies when every class has the same count.
    _labels, count = np.unique(Y, return_counts=True)
    prob_y = torch.tensor(count).float() / len(Y)

    # Getting two of our conditions as 0/1 torch tensors
    # so they can multiply a column of R elementwise
    A0 = torch.from_numpy(A == 0).long()
    A1 = torch.from_numpy(A == 1).long()

    # Getting the denominator
    prob_ra0 = torch.tensor([(R[:,i] * A0).mean() for i in range(0, output_classes)])
    prob_ra1 = torch.tensor([(R[:,i] * A1).mean() for i in range(0, output_classes)])

    # Reversing the condition for Bayes theorem
    prob_ray0 = torch.tensor([(R[:,i] * A0)[Y == i].mean() for i in range(0, output_classes)])
    prob_ray1 = torch.tensor([(R[:,i] * A1)[Y == i].mean() for i in range(0, output_classes)])
    
    # Applying Bayes theorem
    prob_bay_a0 = (prob_y*prob_ray0)/prob_ra0
    prob_bay_a1 = (prob_y*prob_ray1)/prob_ra1

    return prob_bay_a1 - prob_bay_a0

Now we'll train a classifier on this dataset without any fairness constraints for comparison. This code is already complete.

In [1629]:
class MLP(nn.Module):

    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(40, 256),
            nn.ReLU(),
            nn.Linear(256, 5)
        )

    def forward(self, x):
        return self.model(x)

In [1630]:
def train_unfair(lr=1e-1, epochs=200):
    
    network = MLP()
    loss = nn.CrossEntropyLoss()
    opt = optim.SGD(network.parameters(), lr=lr)
    data_in = torch.tensor(X).float()
    data_out = torch.tensor(Y)
    
    for i in range(epochs):
        preds = network(data_in)
        loss_val = loss(preds, data_out)
        opt.zero_grad()
        loss_val.backward()
        opt.step()

        if (i+1) % 100 == 0:
            acc = (preds.argmax(dim=1) == data_out).float().mean()
            probs = nn.functional.softmax(preds, dim=1)
            print("Epoch:", i, "Accuracy:", acc.item(), "Bias:", demographic_parity(probs, A))

    return network

In [1631]:
model = train_unfair(lr=5e-2, epochs=300)

Epoch: 99 Accuracy: 0.3684000074863434 Bias: tensor([-0.1424, -0.1156, -0.0387,  0.0700,  0.2267], grad_fn=<SubBackward0>)
Epoch: 199 Accuracy: 0.4424999952316284 Bias: tensor([-0.1479, -0.1202, -0.0407,  0.0713,  0.2375], grad_fn=<SubBackward0>)
Epoch: 299 Accuracy: 0.49559998512268066 Bias: tensor([-0.1496, -0.1217, -0.0408,  0.0723,  0.2397], grad_fn=<SubBackward0>)


In [1632]:
p = model(torch.tensor(X).float()).argmax(dim=1)
print("Total:", [(p == k).sum().item() for k in range(output_classes)])
print("A=0:", [((p == k) & (A == 0)).sum().item() for k in range(output_classes)])
print("A=1:", [((p == k) & (A == 1)).sum().item() for k in range(output_classes)])

Total: [2914, 1709, 1170, 1781, 2426]
A=0: [2347, 1363, 436, 456, 398]
A=1: [567, 346, 734, 1325, 2028]




This classifier is probably not going to be _extremely_ accurate, but you should be able to see the bias from the dataset reflected here. Let's also measure the bias using your two functions from above.

In [1633]:
p = torch.nn.functional.softmax(model(torch.tensor(X).float()), dim=1)
print("Demographic parity: ", demographic_parity(p, A))
print("Predictive parity: ", predictive_parity(p, A, Y))

Demographic parity:  tensor([-0.1496, -0.1217, -0.0408,  0.0723,  0.2398], grad_fn=<SubBackward0>)
Predictive parity:  tensor([-0.0053, -0.0230,  0.0295,  0.1400,  0.4467])


## Fair Training (B)

Now we'll extend our fair training approach from the lab to the multiclass setting. Since we now have a bias measure for _each_ possible output class, we essentially have `output_classes` constraints that we need to satisfy. We can handle this within our Lagrange multiplier framework by simply adding an extra multiplier for each constraint. That is, our new learning problem is

$$
\arg\min_\beta \max_\lambda \left ( L(\beta) + \sum_i \lambda_i g_i(\beta) \right )
$$

$$
= \arg\min_\beta \max_\lambda \left ( L(\beta) + \sum_i \lambda_i \left ( P_\beta [ R = i \mid A = 1 ] - P_\beta [ R = i \mid A = 0 ] \right ) \right )
$$

Our `demographic_parity` function gives us a vector representing $g_i(\beta)$, so now all we need to do is replace our single parameter $\lambda$ from the lab with a vector then compute the dot product of $\lambda$ with our demographic parity measure.
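The mechanics of one multiplier per constraint can be seen on a toy problem that needs no neural network. The sketch below is an illustrative assumption, not the lab's setup: it uses a quadratic loss $L(\beta) = \tfrac{1}{2}\|\beta - c\|^2$ with constraints $g_i(\beta) = \beta_i - t_i$, hand-written gradients, and hypothetical values for `c`, `t`, and the learning rates.

```python
import numpy as np

c = np.array([2.0, -1.0, 0.5])   # unconstrained minimizer of the toy loss
t = np.array([1.0, 0.0, 0.0])    # constraint targets: g_i(beta) = beta_i - t_i

beta = np.zeros(3)               # model parameters
lam = np.zeros(3)                # one Lagrange multiplier per constraint
lr, lam_lr = 0.1, 0.1

for _ in range(1000):
    g = beta - t                     # vector of constraint values g_i(beta)
    # Lagrangian: 0.5 * ||beta - c||^2 + lam . g
    grad_beta = (beta - c) + lam     # gradient with respect to beta
    grad_lam = g                     # gradient with respect to lam
    beta = beta - lr * grad_beta     # descent step on beta
    lam = lam + lam_lr * grad_lam    # ascent step on lam

print("beta:", beta)
print("lam: ", lam)
```

Both gradients are computed from the same Lagrangian value before either variable is updated, which mirrors calling `backward()` once and then stepping two optimizers. In this toy problem `beta` settles at the constraint targets `t`, and each multiplier settles at the value needed to hold its own constraint.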

In [1634]:
def train_fair(lr=1e-1, lam_lr=1, epochs=200):
    
    network = MLP()
    lam = nn.Parameter(torch.zeros(output_classes))
    loss = nn.CrossEntropyLoss()
    opt = optim.SGD(network.parameters(), lr=lr)
    lam_opt = optim.SGD([lam], lr=lam_lr, maximize=True)
    data_in = torch.tensor(X).float()
    data_out = torch.tensor(Y)
    
    for i in range(epochs):

        # Compute the loss value as defined in the Lagrangian above
        preds = network(data_in)
        loss_val = loss(preds, data_out)
        probs = nn.functional.softmax(preds, dim=1)
        bias = demographic_parity(probs, A)
        loss_val += lam.dot(bias)
        
        opt.zero_grad()
        lam_opt.zero_grad()
        loss_val.backward()
        opt.step()
        lam_opt.step()

        if (i+1) % 100 == 0:
            acc = (preds.argmax(dim=1) == data_out).float().mean()
            probs = nn.functional.softmax(preds, dim=1)
            print("Epoch:", i, "Accuracy:", acc.item(), "Bias:", demographic_parity(probs, A), "Lambda:", lam.max().item())

    return network

In [1635]:
model = train_fair(lr=5e-1, lam_lr=3e-1, epochs=300)

Epoch: 99 Accuracy: 0.41119998693466187 Bias: tensor([ 0.0425,  0.0285,  0.0051, -0.0022, -0.0739], grad_fn=<SubBackward0>) Lambda: 0.5786921381950378
Epoch: 199 Accuracy: 0.5008000135421753 Bias: tensor([ 0.0141,  0.0190,  0.0086,  0.0223, -0.0641], grad_fn=<SubBackward0>) Lambda: 0.9252098202705383
Epoch: 299 Accuracy: 0.5562999844551086 Bias: tensor([-0.0205,  0.0298,  0.0340,  0.0018, -0.0451], grad_fn=<SubBackward0>) Lambda: 1.0653448104858398


In [1636]:
p = model(torch.tensor(X).float()).argmax(dim=1)
print("Total:", [(p == k).sum().item() for k in range(output_classes)])
print("A=0:", [((p == k) & (A == 0)).sum().item() for k in range(output_classes)])
print("A=1:", [((p == k) & (A == 1)).sum().item() for k in range(output_classes)])

Total: [2052, 1918, 2147, 154, 3729]
A=0: [1187, 1015, 947, 97, 1754]
A=1: [865, 903, 1200, 57, 1975]




## Fair Training via KL-Divergence (A)

Let's look back at our definition of demographic parity for the multiclass setting: $P(R = r \mid A = 0) = P(R = r \mid A = 1)$ for all possible output classes $r$. We could also express this by asserting that $P(\cdot \mid A = 0)$ and $P(\cdot \mid A = 1)$ should be identical probability distributions. A natural measure of bias, then, is the KL-divergence between these two distributions, since KL-divergence is a measure of how "different" two distributions are. That is, we'll now solve the problem

$$
\arg\min_\beta \max_\lambda \left ( L(\beta) + \lambda D_{\textrm{KL}} \left( P(\cdot \mid A = 0) \ \| \ P(\cdot \mid A = 1) \right) \right )
$$
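To get a feel for the divergence term in this objective, here is a short NumPy sketch that evaluates it on two categorical distributions. As a stand-in for model outputs it uses the per-group label counts printed for the dataset earlier in this notebook; the helper name `kl_divergence` is hypothetical.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for two categorical distributions with strictly positive entries."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Label distributions within each group, taken from the dataset breakdown above.
p0 = np.array([1385, 1308, 1106, 818, 383]) / 5000    # P(Y = k | A = 0)
p1 = np.array([615, 692, 894, 1182, 1617]) / 5000     # P(Y = k | A = 1)

print("D_KL(p0 || p1):", kl_divergence(p0, p1))
print("D_KL(p1 || p0):", kl_divergence(p1, p0))
print("D_KL(p0 || p0):", kl_divergence(p0, p0))
```

The divergence is positive whenever the two distributions differ, exactly zero when they coincide, and not symmetric in its arguments, so the order $P(\cdot \mid A = 0) \ \| \ P(\cdot \mid A = 1)$ in the objective is a choice.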

However, this introduces a new complication. The KL-divergence is never negative and can only be zero if the two distributions are identical (we proved this in our first homework of the semester). That means there's no way for $\lambda$ to ever decrease, and it will just go up forever. We can solve this by allowing a small deviation in our constrained optimization problem:

$$
\begin{align}
\arg\min_\beta &\ L(\beta) \\
\text{s.t.} &\ D_{\textrm{KL}} \left( P(\cdot \mid A = 0) \ \| \ P(\cdot \mid A = 1) \right) \le \epsilon
\end{align}
$$

We can still represent this using a Lagrange multiplier:

$$
\arg\min_\beta \max_{\lambda \ge 0} \left ( L(\beta) + \lambda \left ( D_{\textrm{KL}} \left( P(\cdot \mid A = 0) \ \| \ P(\cdot \mid A = 1) \right) - \epsilon \right ) \right )
$$

Your task now is to represent this optimization problem in the code below. I've taken care of clamping $\lambda$ at zero (so that it never goes negative) for you, since it's not something we've looked at in class.
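The effect of the $\epsilon$ slack and the clamp on $\lambda$ can be traced by hand. The plain-Python sketch below is only an illustration: `update_lambda` is a hypothetical helper, and the sequence of divergence values is invented rather than taken from a training run.

```python
def update_lambda(lam, kl_div, epsilon, lam_lr):
    """One projected gradient-ascent step on lam for the term lam * (kl_div - epsilon)."""
    lam = lam + lam_lr * (kl_div - epsilon)   # gradient of the term with respect to lam
    return max(lam, 0.0)                      # project back onto lam >= 0

lam = 0.0
history = []
# Hypothetical divergence values: above epsilon at first, then below it.
for kl_div in [0.10, 0.03, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]:
    lam = update_lambda(lam, kl_div, epsilon=0.02, lam_lr=1.0)
    history.append(lam)

print([round(h, 4) for h in history])
```

While the divergence exceeds $\epsilon$, the multiplier grows and pushes harder on the constraint. Once the divergence drops below $\epsilon$, the multiplier shrinks, and the clamp pins it at zero instead of letting it turn negative, which would reward the network for increasing the divergence.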

In [None]:
def train_kl(lr=1e-1, lam_lr=1, epochs=300, epsilon=0.1):
    
    network = MLP()
    lam = nn.Parameter(torch.tensor(0.0))
    loss = nn.CrossEntropyLoss()
    opt = optim.SGD(network.parameters(), lr=lr)
    lam_opt = optim.SGD([lam], lr=lam_lr, maximize=True)
    data_in = torch.tensor(X).float()
    data_out = torch.tensor(Y)

    A0 = torch.from_numpy(A == 0)
    A1 = torch.from_numpy(A == 1)
    
    for i in range(epochs):

        # KL divergence is formally:
        # DKL(P || Q) = \sum_i P(R = i | A = 0)\log(P(R = i | A = 0)/P(R = i | A = 1))
        # where we sum over our output classes

        # Implement the loss function above here.
        preds = network(data_in)
        loss_val = loss(preds, data_out)
        probs = nn.functional.softmax(preds, dim=1)

        # This formulation allows us to maintain the computation graph
        probs_ra0 = probs[A0].mean(dim=0)
        probs_ra1 = probs[A1].mean(dim=0)
        log_probs = torch.log(probs_ra0/probs_ra1)
        kl_div = (probs_ra0 * log_probs).sum()
        loss_val += lam * (kl_div-epsilon)

        opt.zero_grad()
        lam_opt.zero_grad()
        loss_val.backward()
        opt.step()
        lam_opt.step()

        with torch.no_grad():
            lam.clamp_(min=0)

        if (i+1) % 100 == 0:
            acc = (preds.argmax(dim=1) == data_out).float().mean()
            print("Epoch:", i, "Accuracy:", acc.item(), "Bias: ", demographic_parity(probs, A), "Divergence:", kl_div.item(), "Lambda:", lam.item())

    return network

In [1638]:
model = train_kl(lr=3e-1, lam_lr=1, epsilon=0.02)

Epoch: 99 Accuracy: 0.48750001192092896 Bias:  tensor([-0.0760, -0.0635, -0.0291,  0.0298,  0.1388], grad_fn=<SubBackward0>) Divergence: 0.07084843516349792 Lambda: 2.327665328979492
Epoch: 199 Accuracy: 0.6399000287055969 Bias:  tensor([-0.0586, -0.0504, -0.0279, -0.0155,  0.1525], grad_fn=<SubBackward0>) Divergence: 0.061314746737480164 Lambda: 5.007400035858154
Epoch: 299 Accuracy: 0.666100025177002 Bias:  tensor([-0.0629, -0.0614, -0.0025,  0.0010,  0.1258], grad_fn=<SubBackward0>) Divergence: 0.05172822251915932 Lambda: 7.014566421508789


In [1640]:
print("Unfair: ")
model1 = train_unfair(epochs=300, lr=3e-1)
print("Fair: ")
model2 = train_fair(epochs=300, lr=3e-1, lam_lr=3e-1)
print("KL: ")
model3 = train_kl(epochs=300, lr=3e-1, lam_lr=3e-1)

Unfair: 
Epoch: 99 Accuracy: 0.4462999999523163 Bias: tensor([-0.1833, -0.1541, -0.0707,  0.0443,  0.3638], grad_fn=<SubBackward0>)
Epoch: 199 Accuracy: 0.6732000112533569 Bias: tensor([-0.1793, -0.1489, -0.0597,  0.0285,  0.3594], grad_fn=<SubBackward0>)
Epoch: 299 Accuracy: 0.7843999862670898 Bias: tensor([-0.1751, -0.1395, -0.0534,  0.0234,  0.3446], grad_fn=<SubBackward0>)
Fair: 
Epoch: 99 Accuracy: 0.34880000352859497 Bias: tensor([ 0.0305,  0.0232,  0.0130,  0.0072, -0.0739], grad_fn=<SubBackward0>) Lambda: 0.46702802181243896
Epoch: 199 Accuracy: 0.47200000286102295 Bias: tensor([ 0.0305,  0.0162, -0.0002,  0.0232, -0.0697], grad_fn=<SubBackward0>) Lambda: 0.7088242769241333
Epoch: 299 Accuracy: 0.5074999928474426 Bias: tensor([ 0.0231,  0.0093, -0.0030,  0.0320, -0.0614], grad_fn=<SubBackward0>) Lambda: 0.9131637215614319
KL: 
Epoch: 99 Accuracy: 0.446399986743927 Bias:  tensor([-0.1226, -0.1040, -0.0517,  0.0325,  0.2457], grad_fn=<SubBackward0>) Divergence: 0.2025537341833114