Consider a minimax game between a data pre-processor $G$ and a discriminative decision maker $H$.

$H$ wants to discriminate against people based on a protected label, but as a decision maker it must

* Use only the data processed by $G$
* Keep its predictions reasonably accurate

On the other side, $G$ wants to pre-process the data so that $H$ cannot easily discriminate under the restrictions above. We add a further restriction: $G$ cannot use the protected labels of the data it processes, but it may access other, similar data that carries them. Therefore the protected labels are never used anywhere in the decision-making process. We can then write this game as:

$$ \min_G \max_H \text{Loss}\{cov[e(X), H(G(X))]\} + \text{Acc}\{H(G(X))\} $$

Here $e(X)$ is the propensity score estimated from other, similar data; we assume it does not change during the minimax game. As mentioned before, $cov[e(X), H(G(X))]$ measures the fairness of $H$ under statistical parity. `Loss` is an arbitrary loss and `Acc` is a measure of the accuracy of $H$. Here we use the 2-norm for the loss and the negative cross-entropy for the accuracy.
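
To make the fairness term concrete, the sketch below estimates $cov[e(X), H(G(X))]$ from samples with NumPy. The function name `parity_covariance` and the toy arrays are illustrative assumptions, not part of the experiment; the estimator is the plain sample form $E[e \cdot h] - E[e]E[h]$.

```python
import numpy as np

def parity_covariance(e, h):
    """Sample covariance between propensity scores e and decisions h."""
    e = np.asarray(e, dtype=float).ravel()
    h = np.asarray(h, dtype=float).ravel()
    return np.mean(e * h) - np.mean(e) * np.mean(h)

# Toy propensity scores (hypothetical values).
e = np.array([0.2, 0.4, 0.6, 0.8])

# A decision that ignores the propensity score has (numerically) zero covariance with it.
cov_constant = parity_covariance(e, np.full(4, 0.5))

# A decision that copies the propensity score has covariance Var(e).
cov_copy = parity_covariance(e, e)

print(cov_constant, cov_copy)
```

A covariance near zero means the decision carries no linear information about the protected label as summarized by $e(X)$, which is what $G$ aims for.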

We can train this adversarial learning problem by alternating optimization steps, as below:

For $G$, the step is:
$$ \min_G || cov[e(X), H(G(X))] ||^2 $$

Then for $H$, the step is:
$$ \max_H \lambda || cov[e(X), H(G(X))] ||^2 + E[Y\log(H(G(X))) + (1-Y)\log(1 - H(G(X)))]$$
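
Because gradient-based optimizers minimize, the $H$ step is typically implemented with its sign flipped. A sketch of that rewriting, using only the symbols above and writing $\text{CE}$ (a shorthand introduced here) for the binary cross-entropy $\text{CE}(Y, H(G(X))) = -E[Y\log(H(G(X))) + (1-Y)\log(1 - H(G(X)))]$:

$$ \max_H \lambda || cov[e(X), H(G(X))] ||^2 + E[Y\log(H(G(X))) + (1-Y)\log(1 - H(G(X)))] = - \min_H \left\{ \text{CE}(Y, H(G(X))) - \lambda || cov[e(X), H(G(X))] ||^2 \right\} $$

In this form $H$ minimizes the cross-entropy minus $\lambda$ times the fairness penalty, while $G$ minimizes the penalty alone.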

We test this idea in the experiments below (data from [UCI Machine Learning Repo: default of credit card clients Data Set](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients)):

In [5]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

In [2]:
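# load the UCI spreadsheet; skiprows=1 skips the first spreadsheet row
xls = pd.ExcelFile("../data/credit_default.xls")
data = xls.parse('Data', skiprows=1, index_col=None)
data = np.asarray(data)[:,1:]            # drop the first column (the row ID in the UCI file)
default = data[:,-1]                     # target: default indicator (last column)
sex = data[:,1]                          # protected label, coded 1 / 2
credit = np.delete(data,[1,23],axis=1)   # remaining 22 features, without sex and the target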
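
The index arithmetic above can be checked on a dummy array. This sketch assumes the spreadsheet has 25 columns (an ID, 23 features including sex, and the default label), which is consistent with the 22-feature shapes printed later; the array contents are placeholders.

```python
import numpy as np

# Placeholder table: 3 rows, 25 columns, where each entry equals its column index.
raw = np.tile(np.arange(25), (3, 1))

data = raw[:, 1:]                            # drop column 0 (ID): 24 columns remain
default = data[:, -1]                        # last column: the default label
sex = data[:, 1]                             # second remaining column: sex
credit = np.delete(data, [1, 23], axis=1)    # drop sex and the label

print(credit.shape)  # (3, 22)
```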

### First, we standardize the data and split it into training, validation, test and propensity (`dis`) sets:

In [105]:
import sklearn.model_selection as sk

credit_mean = np.mean(credit, axis=0)
credit_std = np.std(credit, axis=0)
credit = (credit - credit_mean) / credit_std

credit_train, credit_test, label_train, label_test = sk.train_test_split( credit,
                                                                          np.vstack([sex,default]).T,
                                                                          test_size=0.5,
                                                                          random_state=42 )

credit_train, credit_val, label_train, label_val = sk.train_test_split(credit_train, 
                                                                       label_train, 
                                                                       test_size=0.1, 
                                                                       random_state=42)

credit_test, credit_dis, label_test, label_dis = sk.train_test_split(credit_test, 
                                                                       label_test, 
                                                                       test_size=0.66, 
                                                                       random_state=42)

print(credit_dis.shape, credit_train.shape, credit_test.shape)

(9900, 22) (13500, 22) (5100, 22)
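
The printed shapes follow from the three nested splits. The arithmetic below assumes 30,000 rows in the full dataset (the size listed for this UCI dataset) and that `train_test_split` rounds a fractional `test_size` up to a whole number of rows; the helper `split_sizes` is hypothetical.

```python
import math

def split_sizes(n, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test part up."""
    n_test = math.ceil(n * test_size)
    return n - n_test, n_test

n_total = 30000                                # assumed dataset size
n_train, n_test = split_sizes(n_total, 0.5)    # first split: train vs. the rest
n_train, n_val = split_sizes(n_train, 0.1)     # carve the validation set out of train
n_test, n_dis = split_sizes(n_test, 0.66)      # carve the propensity set out of test

print(n_dis, n_train, n_test)  # 9900 13500 5100
```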


### Then we train the propensity score model on the held-out `dis` set

In [106]:
credit_dis.shape

(9900, 22)

In [38]:
print(sum(label_dis[:,0]-1==1), sum(label_dis[:,0]-1==0))

5981 3919


In [45]:
5981 / 9910  # note: credit_dis has 9900 rows, so the exact share is 5981 / 9900

0.6035317860746721

In [108]:
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense, Activation

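with tf.Session() as session:
    # small MLP that predicts the protected label (sex, recoded from 1/2 to 0/1)
    # from the 22 standardized features
    model = Sequential()
    model.add(Dense(32, input_dim=22, activation="relu"))
    model.add(Dense(8, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))

    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    # fit on the held-out `dis` split only
    model.fit(credit_dis, label_dis[:,0]-1, epochs=100, verbose=0)

    # propensity scores e(X) for the training split, kept fixed during the game
    p_scores = model.predict(credit_train)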

In [145]:
p_scores = np.array(p_scores)
p_scores.shape

(13500, 1)

In [84]:
def leaky_relu(x, alpha=0.01):
    """Compute the leaky ReLU activation function.
    
    Inputs:
    - x: TensorFlow Tensor with arbitrary shape
    - alpha: leak parameter for leaky ReLU
    
    Returns:
    TensorFlow Tensor with the same shape as x
    """
    # negative inputs are scaled by alpha, non-negative inputs pass through
    act = alpha * tf.minimum(x, 0) + tf.maximum(x, 0)
    return act
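
A quick NumPy analogue (an illustrative re-implementation, not the TensorFlow function above) shows what the formula does to negative, zero, and positive inputs:

```python
import numpy as np

def leaky_relu_np(x, alpha=0.01):
    """NumPy analogue of leaky ReLU: alpha * min(x, 0) + max(x, 0)."""
    x = np.asarray(x, dtype=float)
    return alpha * np.minimum(x, 0) + np.maximum(x, 0)

# Negative inputs are scaled by alpha; zero and positive inputs pass through.
out = leaky_relu_np([-2.0, 0.0, 3.0])
print(out)
```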

In [181]:
def discriminator(x):
    """Decision maker H: compute default logits from pre-processed data.
    
    Inputs:
    - x: TensorFlow Tensor of shape [batch_size, 16], the output of the generator
    
    Returns:
    TensorFlow Tensor with shape [batch_size, 1], containing the logit 
    of the predicted default probability for each sample.
    """
    with tf.variable_scope("discriminator"):
        # two hidden layers of 64 units with leaky ReLU, then a linear output
        h1 = tf.layers.dense(x, 64)
        r1 = leaky_relu(h1, 0.01)
        h2 = tf.layers.dense(r1, 64)
        r2 = leaky_relu(h2, 0.01)
        logits = tf.layers.dense(r2, 1)
        return logits
    
def generator(z):
    """Pre-processor G: map the raw features to a processed representation.
    
    Inputs:
    - z: TensorFlow Tensor of standardized features with shape [batch_size, 22]
    
    Returns:
    TensorFlow Tensor of processed data, with shape [batch_size, 16].
    """
    with tf.variable_scope("generator"):
        # two hidden layers of 64 units with ReLU, then a 16-dimensional tanh output
        h1 = tf.layers.dense(z, 64)
        r1 = tf.nn.relu(h1)
        h2 = tf.layers.dense(r1, 64)
        r2 = tf.nn.relu(h2)
        proc = tf.layers.dense(r2, 16, activation=tf.tanh)
        return proc
    
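def gan_loss(p_scores, labels, logits, lam):
    """Compute the losses of the minimax game.

    Inputs:
    - p_scores: propensity scores e(X), shape [batch_size, 1]
    - labels: default labels Y, shape [batch_size, 1]
    - logits: logits of H(G(X)), shape [batch_size, 1]
    - lam: weight of the fairness term

    Returns:
    - H_loss: decision maker loss scalar
    - G_loss: pre-processor loss scalar
    """
    # covariance between the propensity score and the predicted probability
    G_loss = tf.reduce_mean(p_scores * tf.nn.sigmoid(logits)) - \
             tf.reduce_mean(p_scores) * tf.reduce_mean(tf.nn.sigmoid(logits))
    G_loss = lam * tf.abs(G_loss)
    # H minimizes the cross-entropy minus the fairness penalty
    H_loss = - G_loss + \
            tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
    return H_loss, G_loss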
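
For checking the loss on small arrays without building a graph, the same quantities can be mirrored in NumPy. This is a sketch under the assumption that `p_scores`, `labels`, and `logits` are arrays of matching shape; `gan_loss_np` and `sigmoid` are names introduced here, and the cross-entropy uses the usual numerically stable form `max(z, 0) - z * y + log(1 + exp(-|z|))`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gan_loss_np(p_scores, labels, logits, lam):
    """NumPy mirror of gan_loss: returns (H_loss, G_loss)."""
    p = np.asarray(p_scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    z = np.asarray(logits, dtype=float)
    h = sigmoid(z)
    # Covariance between propensity scores and predicted probabilities.
    cov = np.mean(p * h) - np.mean(p) * np.mean(h)
    G_loss = lam * np.abs(cov)
    # Stable binary cross-entropy computed from logits.
    ce = np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))
    H_loss = -G_loss + ce
    return H_loss, G_loss

# Hypothetical toy batch of four samples.
p = np.array([[0.3], [0.5], [0.6], [0.8]])
y = np.array([[0.0], [1.0], [0.0], [1.0]])
z = np.zeros((4, 1))   # all logits zero: every predicted probability is 0.5
H_loss, G_loss = gan_loss_np(p, y, z, lam=10)
print(H_loss, G_loss)
```

With constant predictions the covariance term vanishes and `H_loss` reduces to the cross-entropy of a coin flip, $\log 2$.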

In [182]:
def get_solvers(learning_rate=1e-2, beta1=0.8):
    """Create solvers for the adversarial training.
    
    Inputs:
    - learning_rate: learning rate to use for both solvers
    - beta1: beta1 parameter for both solvers (first moment decay)
    
    Returns:
    - H_solver: instance of tf.train.AdamOptimizer with the given learning_rate and beta1
    - G_solver: instance of tf.train.AdamOptimizer with the given learning_rate and beta1
    """
    H_solver = tf.train.AdamOptimizer(learning_rate, beta1)
    G_solver = tf.train.AdamOptimizer(learning_rate, beta1)
    return H_solver, G_solver

tf.reset_default_graph()

# number of samples in each batch
batch_size = 8192
# weight of the fairness (covariance) term
lam = 10

# placeholders for the standardized features, the default labels and the propensity scores
x = tf.placeholder(tf.float32, [None, 22])
label = tf.placeholder(tf.float32, (None,1))
ps = tf.placeholder(tf.float32, (None,1))
# pre-processed data G(X) and the logits of H(G(X))
G_sample = generator(x)
logits = discriminator(G_sample)

# Get the list of variables for the decision maker H and the pre-processor G
H_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'discriminator')
G_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'generator') 

# get our solver
H_solver, G_solver = get_solvers()

# get our loss
H_loss, G_loss = gan_loss(ps, label, logits, lam)

# setup training steps
H_train_step = H_solver.minimize(H_loss, var_list=H_vars)
G_train_step = G_solver.minimize(G_loss, var_list=G_vars)
H_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'discriminator')
G_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'generator')

In [185]:
def run_a_gan(sess, G_train_step, G_loss, H_train_step, H_loss, G_extra_step, H_extra_step,\
              show_every=250, print_every=40, batch_size=8192, num_epoch=400):

    # compute the number of iterations we need
    max_iter = int(credit_train.shape[0]*num_epoch/batch_size)
    for it in range(max_iter):
        # every so often, evaluate accuracy and bias on the test split
        if it % show_every == 0:
            pred = sess.run(tf.nn.sigmoid(logits), feed_dict={x:credit_test})
            # flatten (N, 1) -> (N,) so the comparison with def_test is element-wise
            pred = np.asarray(pred).ravel()
            sex_test = label_test[:,0]
            def_test = label_test[:,1]
            # bias: gap in mean predicted default probability between the two sex groups
            bias = np.mean(pred[sex_test==1]) - np.mean(pred[sex_test==2])
            acc = np.mean((pred>0.5) == def_test)
            print('Iter: {}, Acc: {:.4}, Bias:{:.4}'.format(it,acc,bias))
        # sample a batch (with replacement) together with its labels and propensity scores
        indexes = np.random.choice(credit_train.shape[0], batch_size)
        minibatch, minibatch_label = credit_train[indexes,:], label_train[indexes,1].reshape(-1,1)
        props = p_scores[indexes,:]
        # one update of the decision maker H, then one update of the pre-processor G
        _, H_loss_curr = sess.run([H_train_step, H_loss], 
                                  feed_dict={x: minibatch, 
                                             label: minibatch_label,
                                             ps: props})
        _, G_loss_curr = sess.run([G_train_step, G_loss], 
                                  feed_dict={x: minibatch, 
                                             label: minibatch_label,
                                             ps: props})

        # print both losses every so often
        if it % print_every == 0:
            print('Iter: {}, H: {:.4}, G:{:.4}'.format(it,H_loss_curr,G_loss_curr))

In [186]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    run_a_gan(sess,G_train_step,G_loss,H_train_step,H_loss,G_extra_step,H_extra_step)

Iter: 0, Acc: 0.4801, Bias:5.633e-05
Iter: 0, H: 0.6967, G:0.009761
Iter: 40, H: 0.518, G:0.001156
Iter: 80, H: 0.5094, G:0.00192
Iter: 120, H: 0.5193, G:0.001156
Iter: 160, H: 0.5294, G:0.002105
Iter: 200, H: 0.5119, G:0.0002198
Iter: 240, H: 0.515, G:0.001064
Iter: 250, Acc: 0.7918, Bias:0.0003522
Iter: 280, H: 0.5283, G:0.001137
Iter: 320, H: 0.521, G:0.0008732
Iter: 360, H: 0.533, G:0.0001033
Iter: 400, H: 0.5302, G:0.0003207
Iter: 440, H: 0.5236, G:0.000394
Iter: 480, H: 0.5274, G:0.0008403
Iter: 500, Acc: 0.7918, Bias:-0.0001589
Iter: 520, H: 0.5328, G:0.001093
Iter: 560, H: 0.5269, G:0.0006115
Iter: 600, H: 0.5235, G:0.0002435
Iter: 640, H: 0.5165, G:5.171e-05
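
In the log, `Bias` is the gap in mean predicted default probability between the two sex groups, and `Acc` is the share of thresholded predictions that match the default label. A standalone sketch of both metrics, with hypothetical function names and made-up toy inputs, might look like this:

```python
import numpy as np

def parity_gap(pred, group, a=1, b=2):
    """Mean prediction in group a minus mean prediction in group b."""
    pred = np.asarray(pred, dtype=float).ravel()
    group = np.asarray(group).ravel()
    return pred[group == a].mean() - pred[group == b].mean()

def accuracy(pred, target, threshold=0.5):
    """Share of thresholded predictions that equal the 0/1 target."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target).ravel()
    return np.mean((pred > threshold) == (target == 1))

# Toy values; predictions arrive as a column, like the output of a network.
pred = np.array([[0.9], [0.1], [0.7], [0.3]])
sex = np.array([1, 1, 2, 2])
default = np.array([1, 0, 0, 0])

gap = parity_gap(pred, sex)
acc = accuracy(pred, default)
print(gap, acc)
```

Flattening both arrays first avoids accidental broadcasting between an `(N, 1)` prediction column and an `(N,)` label vector, which would otherwise compare every prediction against every label.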
