## Fairness metrics

Equal Opportunity Difference (EOD): A fairness metric for binary classification. It is the difference between the true positive rate of the privileged group and the true positive rate of the unprivileged group. A value of 0 means both groups have the same true positive rate.

Demographic Parity Difference (DPD): Another fairness metric for binary classification. It is the difference between the proportion of positive predictions for the privileged group and the proportion of positive predictions for the unprivileged group. A value of 0 means both groups receive positive predictions at the same rate.
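
Written as formulas, the two definitions above might look as follows. The symbols are introduced here for illustration only: $\hat{Y}$ is the predicted label, $Y$ the true label, and $A$ the group indicator. The order of subtraction (privileged minus unprivileged) follows the wording above; it is a convention, and the opposite order is also in use.

$$
\mathrm{EOD} = P(\hat{Y}=1 \mid Y=1, A=\text{privileged}) - P(\hat{Y}=1 \mid Y=1, A=\text{unprivileged})
$$

$$
\mathrm{DPD} = P(\hat{Y}=1 \mid A=\text{privileged}) - P(\hat{Y}=1 \mid A=\text{unprivileged})
$$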

In [None]:
import numpy as np
from sklearn.linear_model import LogisticRegression

# Load your dataset and split into training and test sets
X_train, y_train = ...
X_test, y_test = ...

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

y_test = np.asarray(y_test)
y_pred = model.predict(X_test)

# The sensitive feature is assumed to be column 0 of the dataset,
# coded 1 for the privileged group and 0 for the unprivileged group.
privileged = X_test[:, 0] == 1
unprivileged = ~privileged


def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    return np.mean(y_pred[y_true == 1] == 1)


def selection_rate(y_pred):
    """Fraction of samples that were predicted positive."""
    return np.mean(y_pred == 1)


# EOD: true positive rate of the privileged group minus that of the unprivileged group
eod = true_positive_rate(y_test[privileged], y_pred[privileged]) - true_positive_rate(
    y_test[unprivileged], y_pred[unprivileged]
)
# DPD: positive-prediction rate of the privileged group minus that of the unprivileged group
dpd = selection_rate(y_pred[privileged]) - selection_rate(y_pred[unprivileged])

print(f"EOD: {eod:.3f}")
print(f"DPD: {dpd:.3f}")


## Robustness metrics

Adversarial Accuracy: The accuracy of a model on adversarial examples, which are inputs specifically crafted to fool the model. It is often used to evaluate how robust a model is to adversarial attacks.

Margin: Margin is the difference between the probability assigned by the model to the predicted class and the probability assigned to the next most likely class. A large margin indicates that the model is more confident in its predictions, which can be a sign of robustness.

Lipschitz Constant: The Lipschitz constant bounds how quickly a function (in this case, the machine learning model) can change with respect to its input: the change in the output is at most the constant times the change in the input. A lower Lipschitz constant indicates that the model is less sensitive to small changes in the input, which can be a sign of robustness.
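
The exact Lipschitz constant is rarely available in closed form, so one rough way to get a feel for it is to measure the largest output change per unit of input change over sampled pairs of points. The sketch below does this with NumPy; `empirical_lipschitz` is a hypothetical helper written for this illustration, and its result is only a lower bound on the true constant. The toy model and its weights are made up.

```python
import numpy as np


def empirical_lipschitz(f, X, n_pairs=1000, seed=0):
    """Largest observed |f(x) - f(x')| / ||x - x'|| over random pairs of rows of X.

    This is a lower bound on the true Lipschitz constant of f, not the constant itself.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), size=n_pairs)
    j = rng.integers(0, len(X), size=n_pairs)
    a, b = X[i], X[j]
    dist = np.linalg.norm(a - b, axis=1)
    change = np.abs(f(a) - f(b))
    nonzero = dist > 0  # skip pairs that picked the same point
    return float(np.max(change[nonzero] / dist[nonzero]))


# Toy model: f(x) = sigmoid(w . x). The sigmoid's slope is at most 1/4,
# so the Lipschitz constant of f is at most ||w|| / 4.
w = np.array([3.0, -4.0])  # hypothetical weights, ||w|| = 5


def f(X):
    return 1.0 / (1.0 + np.exp(-(X @ w)))


X = np.random.default_rng(1).normal(size=(200, 2))
estimate = empirical_lipschitz(f, X)
bound = np.linalg.norm(w) / 4

print(f"Empirical estimate: {estimate:.3f} (analytic upper bound: {bound:.3f})")
```

Because the estimate only sees the sampled pairs, it can understate the sensitivity of the model in regions the sample does not cover.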

Error Rate Under Distributional Shift: This metric measures the difference in error rate between data drawn from the training distribution and a test dataset drawn from a different distribution. A small difference indicates that the model is more robust to distributional shift.
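
A minimal sketch of this comparison, assuming you already have a labelled in-distribution set and a labelled shifted set. The helper names `error_rate` and `shift_error_gap`, the threshold classifier, and the toy data are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np


def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))


def shift_error_gap(predict, X_in, y_in, X_shift, y_shift):
    """Error rate on the shifted data minus error rate on the in-distribution data."""
    return error_rate(y_shift, predict(X_shift)) - error_rate(y_in, predict(X_in))


# Toy classifier: predict 1 when the single feature exceeds 0.5
def predict(X):
    return (X[:, 0] > 0.5).astype(int)


# In-distribution data: well separated, so the classifier makes no errors
X_in = np.array([[0.1], [0.2], [0.8], [0.9]])
y_in = np.array([0, 0, 1, 1])

# Shifted data: points cluster near the threshold and the labels are noisier
X_shift = np.array([[0.4], [0.45], [0.55], [0.6]])
y_shift = np.array([1, 0, 1, 0])

gap = shift_error_gap(predict, X_in, y_in, X_shift, y_shift)
print(f"Error rate gap under shift: {gap:.3f}")
```

Here the classifier is wrong on two of the four shifted points and on none of the in-distribution points, so the gap is 0.5.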

To compute these metrics in Python, you may need additional libraries such as CleverHans, the Adversarial Robustness Toolbox (ART), or Foolbox, which provide built-in methods to generate adversarial examples and evaluate their impact on the model.

In [None]:
import numpy as np
from sklearn.linear_model import LogisticRegression

# Load your dataset and split into training and test sets
X_train, y_train = ...
X_test, y_test = ...

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Craft adversarial examples with the Fast Gradient Sign Method (FGSM).
# A scikit-learn model is not a TensorFlow graph, so it cannot be passed to a
# gradient-based attack library directly. For binary logistic regression the
# gradient of the cross-entropy loss with respect to the input has the closed
# form (p - y) * w, so it is computed with NumPy here instead.
# Assumes labels in {0, 1} and features scaled to [0, 1].
X_test = np.asarray(X_test, dtype=float)
y_test = np.asarray(y_test)
eps = 0.1  # maximum perturbation per feature
w = model.coef_.ravel()
p = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
grad = (p - y_test)[:, None] * w[None, :]
X_adv = np.clip(X_test + eps * np.sign(grad), 0.0, 1.0)

# Evaluate the model on clean and adversarial test data
clean_acc = model.score(X_test, y_test)
adv_acc = model.score(X_adv, y_test)

# Compute the margin: probability of the predicted class minus the
# probability of the next most likely class
probs = np.sort(model.predict_proba(X_test), axis=1)
margin = probs[:, -1] - probs[:, -2]

print(f"Clean accuracy: {clean_acc:.3f}")
print(f"Adversarial accuracy: {adv_acc:.3f}")
print(f"Margin: {np.mean(margin):.3f}")
