# Fairness in GCNs
In this Python notebook, we explore the use of Graph Convolutional Networks (GCNs) on the [Alibaba](https://tianchi.aliyun.com/dataset/56) dataset.

The dataset was pre-processed using the code from Erasmo Purificato's [CatGCN notebook](https://colab.research.google.com/drive/1zsx4an6BKYhJ_UT-mSl1_qPB-zyjTmrA#scrollTo=xxzSlLj3LDIu).  
This pre-processing provided us with various .csv files, which are used to form graph data.  

The nodes represent user IDs, and the node features are user attributes such as buy, gender, and student. The edges connect users through shared relations, such as items bought or items clicked on.  

In this notebook, we have only focused on GCNs and used fairness methods from the [AIF360](https://github.com/Trusted-AI/AIF360) framework.

# Imports

### Install missing libraries

In [1]:
!pip install torch_geometric
!pip install torch torchvision
!pip install Blackboxauditing
!pip install aif360

Collecting torch_geometric
  Downloading torch_geometric-2.4.0-py3-none-any.whl (1.0 MB)
Installing collected packages: torch_geometric
Successfully installed torch_geometric-2.4.0
Collecting Blackboxauditing
  Downloading BlackBoxAuditing-0.1.54.tar.gz (2.6 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: Blackboxauditing
  Building wheel for Blackboxauditing (setup.py) ... done
  Created wheel for Blackboxauditing: filename=BlackBoxAuditing-0.1.54-py2.py3-none-any.whl size=1394755 sha256=4254a1bd39be76b26c1e7b3ee562920b1fa7b7b31793f46408e7e98910b2b04e
  Stored in directory: /root/.cache/pip/wheels/c0/4f/b1/80e1b0790df07536470758fe0a4f9ff8fa942fd9fe30bbb192
Successfully built Blackboxaudi

### Import necessary libraries

In [2]:
# Basic data processing libraries
import pandas as pd
import numpy as np
import os
import torch

# Graph data processing libraries
import networkx as nx
from torch_geometric.data import Data
from torch_geometric.utils import from_networkx

# Libraries for (G)NNs
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

# AIF360
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing
from aif360.algorithms.preprocessing import DisparateImpactRemover

pip install 'aif360[LawSchoolGPA]'
pip install 'aif360[Reductions]'


### Mount and connect to personal Drive

In [3]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


### Initialise paths

In [4]:
base_path = "/content/drive/MyDrive/ColabNotebooks/Winter Semester 2023 24/HCAT"
raw_data_path = os.path.join(base_path, "Dataset", "Alibaba")
catgcn_path = os.path.join(base_path, "models", "CatGCN")
input_ali_data_path = os.path.join(catgcn_path, "input_ali_data")

---
# Data Stuff

### Load the data from .csv files

In [5]:
# Load the data files
user_labels_path = os.path.join(input_ali_data_path, "user_labels.csv")
user_edges_path = os.path.join(input_ali_data_path, "user_edge.csv")

In [6]:
# Create dataframes to store the information from the .csv files
user_labels = pd.read_csv(user_labels_path)
user_edges = pd.read_csv(user_edges_path)

### Pre-processing the data

In [7]:
# Prepare the data for GNNs
# Note: columns 1: include `gender`, which is also used as the label (data.y) further below
node_features = torch.tensor(user_labels.iloc[:, 1:].values, dtype=torch.float)
edge_index = torch.tensor(user_edges.values, dtype=torch.long).t().contiguous()
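
The `.t().contiguous()` call above turns a table with one (source, target) pair per row into the 2 × E layout used for `edge_index`. The dependency-free sketch below shows the same reshaping on a hypothetical three-edge list. The toy `edges` values and the two-column layout are assumptions for illustration, not rows taken from `user_edge.csv`.

```python
# Hypothetical edge table: one (source_uid, target_uid) pair per row,
# mirroring the assumed two-column layout of user_edge.csv.
edges = [(0, 1), (1, 2), (2, 0)]

# Transpose the rows into two parallel lists: [all sources], [all targets].
sources, targets = (list(column) for column in zip(*edges))
edge_index = [sources, targets]

print(edge_index)  # [[0, 1, 2], [1, 2, 0]]
```

Each column of `edge_index` is one directed edge. Whether the CSV lists every relation in both directions is not stated in this notebook, so it may be worth checking before relying on symmetric message passing.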

In [8]:
def show_df_info(df):
    print(df.info())
    print('####### Repeat ####### \n', df.duplicated().any())
    print('####### Count ####### \n', df.nunique())
    print('####### Example ####### \n', df.head())

In [9]:
node_features

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 1., 1.],
        [0., 2., 1.,  ..., 0., 1., 1.],
        ...,
        [0., 0., 1.,  ..., 2., 0., 1.],
        [0., 4., 1.,  ..., 3., 0., 1.],
        [0., 0., 1.,  ..., 0., 0., 1.]])

In [10]:
show_df_info(user_labels)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 166958 entries, 0 to 166957
Data columns (total 8 columns):
 #   Column   Non-Null Count   Dtype
---  ------   --------------   -----
 0   uid      166958 non-null  int64
 1   gender   166958 non-null  int64
 2   age      166958 non-null  int64
 3   buy      166958 non-null  int64
 4   student  166958 non-null  int64
 5   city     166958 non-null  int64
 6   bin_age  166958 non-null  int64
 7   bin_buy  166958 non-null  int64
dtypes: int64(8)
memory usage: 10.2 MB
None
####### Repeat ####### 
 False
####### Count ####### 
 uid        166958
gender          2
age             7
buy             3
student         2
city            4
bin_age         2
bin_buy         2
dtype: int64
####### Example ####### 
    uid  gender  age  buy  student  city  bin_age  bin_buy
0    0       0    0    0        0     0        0        0
1    1       0    1    1        1     0        1        1
2    2       0    2    1        1     0        1        1
3    3

In [11]:
# Create torch-geometric data
data = Data(x=node_features, edge_index=edge_index)

In [12]:
num_nodes = node_features.size(0)
num_classes = 2 # Binarised gender values from the data
num_node_features = data.num_node_features

# Create masks for training and testing
train_mask = torch.zeros(num_nodes, dtype=torch.bool)
test_mask = torch.zeros(num_nodes, dtype=torch.bool)

# 80/20 train-test split: the first 80% of rows (in file order) are used for training
num_train = int(num_nodes * 0.8)
train_mask[:num_train] = True
test_mask[num_train:] = True

data.train_mask = train_mask
data.test_mask = test_mask
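
The split above assigns the first 80% of rows to training, so it depends on the row order of `user_labels.csv`. If that order is not random (the notebook does not say), a shuffled split is a common alternative. The sketch below uses only the standard library; `make_split_masks` and its `seed` parameter are hypothetical names introduced here, not part of the original notebook.

```python
import random

def make_split_masks(num_nodes, train_fraction=0.8, seed=0):
    """Return boolean train/test masks built from a shuffled node order."""
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    order = list(range(num_nodes))
    rng.shuffle(order)

    num_train = int(num_nodes * train_fraction)
    train_ids = set(order[:num_train])

    train_mask = [i in train_ids for i in range(num_nodes)]
    test_mask = [not flag for flag in train_mask]
    return train_mask, test_mask

# Toy usage with 10 nodes; the names avoid shadowing the notebook's masks.
train_mask_demo, test_mask_demo = make_split_masks(10)
print(sum(train_mask_demo), sum(test_mask_demo))  # 8 2
```

The resulting Python lists could be wrapped with `torch.tensor(..., dtype=torch.bool)` to obtain masks of the same form as the ones used above.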

In [13]:
num_nodes

166958

In [14]:
# Labels from the data (in this case: Gender Classification)
data.y = torch.tensor(user_labels['gender'].values, dtype=torch.long)

---
# Utils

### Function to clone the dataset

In [15]:
def clone(data):
    """
    Create a new cloned torch-geometric data object.

    Args:
    data: Actual data to be cloned.

    Returns:
    A torch-geometric data object.
    """
    clone_data = Data()

    # Copy the data's features and edges
    clone_data.x = data.x.clone()
    clone_data.edge_index = data.edge_index.clone()

    # Mask the data similar to the original train-test split
    clone_data.train_mask = data.train_mask.clone()
    clone_data.test_mask = data.test_mask.clone()

    # Copy the labels
    clone_data.y = data.y.clone()

    return clone_data

### Custom Loss Functions

In [16]:
def weighted_cross_entropy(output, data):
    """
    A custom loss function to calculate a weighted-cross entropy loss.

    Args:
    output: Outputs from the model.
    data: The torch-geometric data object used for the model.

    Returns:
    A weighted cross-entropy loss.
    """
    target = data.y[data.train_mask]
    weights = data.instance_weights[data.train_mask]

    loss = F.cross_entropy(output, target, reduction='none')
    weighted_loss = loss * weights

    return weighted_loss.mean()
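
To make the effect of the instance weights concrete, here is a small pure-Python sketch of the same computation: the per-sample cross-entropy is scaled by a weight and then averaged over the number of samples (not over the sum of weights), as in `weighted_cross_entropy` above. The function name `weighted_cross_entropy_demo` and the toy numbers are assumptions made for illustration.

```python
import math

def weighted_cross_entropy_demo(logits, targets, weights):
    """Mean of per-sample cross-entropy losses scaled by instance weights."""
    losses = []
    for row, target, weight in zip(logits, targets, weights):
        # Log-sum-exp gives the softmax normaliser for this sample.
        log_norm = math.log(sum(math.exp(value) for value in row))
        per_sample = log_norm - row[target]   # equals -log softmax(row)[target]
        losses.append(weight * per_sample)
    return sum(losses) / len(losses)

logits = [[2.0, 0.0], [0.0, 2.0]]
targets = [0, 0]   # the second sample is misclassified

uniform = weighted_cross_entropy_demo(logits, targets, [1.0, 1.0])
upweighted = weighted_cross_entropy_demo(logits, targets, [1.0, 2.0])
print(uniform < upweighted)  # True
```

Doubling the weight of the misclassified sample increases the loss, which is how re-weighed instances gain more influence on the gradient.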

In [17]:
def fairness_aware_loss(output, data, sensitive_attr, weighted=False, alpha=0.01, beta=0.01, gamma=0.01, delta=0.01):
    """
    Custom loss function to calculate a fairness-aware loss.
    This includes measures for statistical parity, treatment equality, equal opportunity difference, and overall accuracy equality difference.

    Args:
    output: Outputs from the model.
    data: The torch-geometric data object used for the model.
    sensitive_attr: The sensitive attribute in the data (e.g., bin_age).
    weighted: Whether to use the weighted cross-entropy loss (requires data.instance_weights).
    alpha: Parameter for statistical parity regularizer strength.
    beta: Parameter for treatment equality regularizer strength.
    gamma: Parameter for equal opportunity difference regularizer strength.
    delta: Parameter for overall accuracy equality difference regularizer strength.

    Returns:
    A fairness-aware combined loss.
    """
    if weighted:
        # Weighted cross-entropy loss
        standard_loss = weighted_cross_entropy(output, data)
    else:
        # Standard cross-entropy loss
        target = data.y[data.train_mask]
        standard_loss = F.cross_entropy(output, target)

    labels = data.y[data.train_mask]
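    # Note: the GCN returns log-probabilities (log_softmax), so this sigmoid is a
    # monotone surrogate for the class-1 probability, not the probability itself
    pos_prob = torch.sigmoid(output[:, 1])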
    neg_prob = 1 - pos_prob
    predictions = output.argmax(dim=1)

    # Statistical Parity Regularization
    sp_reg = torch.abs(pos_prob[sensitive_attr == 1].mean() - pos_prob[sensitive_attr == 0].mean())

    # Treatment Equality Regularization
    fp_diff = (neg_prob * (labels == 0) * (sensitive_attr == 1)).float().mean() - \
              (neg_prob * (labels == 0) * (sensitive_attr == 0)).float().mean()
    fn_diff = (pos_prob * (labels == 1) * (sensitive_attr == 1)).float().mean() - \
              (pos_prob * (labels == 1) * (sensitive_attr == 0)).float().mean()
    treatment_reg = torch.abs(fp_diff) + torch.abs(fn_diff)

    # Equal Opportunity Difference Regularization
    eod_reg = torch.abs((pos_prob * (labels == 1) * (sensitive_attr == 1)).float().mean() - \
                        (pos_prob * (labels == 1) * (sensitive_attr == 0)).float().mean())

    # Overall Accuracy Equality Difference Regularization
    oaed_reg = torch.abs((pos_prob * (sensitive_attr == 1)).float().mean() - \
                         (pos_prob * (sensitive_attr == 0)).float().mean())

    # Combine losses
    combined_loss = standard_loss + alpha * sp_reg + beta * treatment_reg + gamma * eod_reg + delta * oaed_reg

    return combined_loss
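
Written out, the loss computed by `fairness_aware_loss` above has the following form. The notation is introduced here for readability and is an assumption, not the authors': $o_{i,1}$ is the model output for class 1 on training node $i$, $y_i$ its label, $s_i$ the sensitive attribute, $N$ the number of training nodes, and $[\cdot]$ the indicator (Iverson) bracket.

```latex
\begin{aligned}
p_i &= \sigma(o_{i,1}), \qquad i = 1,\dots,N \\
R_{\mathrm{SP}} &= \Bigl|\, \operatorname{mean}_{i:\,s_i=1} p_i \;-\; \operatorname{mean}_{i:\,s_i=0} p_i \,\Bigr| \\
R_{\mathrm{TE}} &= \Bigl| \tfrac{1}{N}\sum_{i} (1-p_i)\,[y_i=0]\,\bigl([s_i=1]-[s_i=0]\bigr) \Bigr|
                 + \Bigl| \tfrac{1}{N}\sum_{i} p_i\,[y_i=1]\,\bigl([s_i=1]-[s_i=0]\bigr) \Bigr| \\
R_{\mathrm{EOD}} &= \Bigl| \tfrac{1}{N}\sum_{i} p_i\,[y_i=1]\,\bigl([s_i=1]-[s_i=0]\bigr) \Bigr| \\
R_{\mathrm{OAED}} &= \Bigl| \tfrac{1}{N}\sum_{i} p_i\,\bigl([s_i=1]-[s_i=0]\bigr) \Bigr| \\
\mathcal{L} &= \mathcal{L}_{\mathrm{CE}} + \alpha R_{\mathrm{SP}} + \beta R_{\mathrm{TE}} + \gamma R_{\mathrm{EOD}} + \delta R_{\mathrm{OAED}}
\end{aligned}
```

Two things follow directly from the code as written. The second term of $R_{\mathrm{TE}}$ is identical to $R_{\mathrm{EOD}}$. $R_{\mathrm{OAED}}$ differs from $R_{\mathrm{SP}}$ only in its normalisation: it divides by $N$ rather than by each group's size. The four strengths are therefore not fully independent knobs.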

### Fairness Metrics

In [18]:
def calculate_fairness(label, predictions, sens_attr='bin_age'):
    """
    Calculate various fairness metrics.

    Args:
    label: Actual labels (binary).
    predictions: Model predictions (binary).
    sens_attr: Binary sensitive attribute for fairness evaluation.

    Returns:
    A dictionary containing SPD, EOD, OAED, and TED values.
    """

    labels = torch.tensor(user_labels[label].values, dtype=torch.long)
    sensitive_attribute = torch.tensor(user_labels[sens_attr].values, dtype=torch.long)

    predictions = predictions.float()
    labels = labels.float()
    sensitive_attribute = sensitive_attribute.float()

    def statistical_parity_difference():
        prob_group_1 = predictions[sensitive_attribute == 1].mean()
        prob_group_0 = predictions[sensitive_attribute == 0].mean()
        return abs(prob_group_1 - prob_group_0)

    def equal_opportunity_difference():
        tpr_group_1 = predictions[(labels == 1) & (sensitive_attribute == 1)].mean()
        tpr_group_0 = predictions[(labels == 1) & (sensitive_attribute == 0)].mean()
        return abs(tpr_group_1 - tpr_group_0)

    def overall_accuracy_equality_difference():
        acc_group_1 = (predictions[sensitive_attribute == 1] == labels[sensitive_attribute == 1]).float().mean()
        acc_group_0 = (predictions[sensitive_attribute == 0] == labels[sensitive_attribute == 0]).float().mean()
        return abs(acc_group_1 - acc_group_0)

    def treatment_equality_difference():
        fn_group_1 = ((predictions == 0) & (labels == 1) & (sensitive_attribute == 1)).sum()
        fp_group_1 = ((predictions == 1) & (labels == 0) & (sensitive_attribute == 1)).sum()

        fn_group_0 = ((predictions == 0) & (labels == 1) & (sensitive_attribute == 0)).sum()
        fp_group_0 = ((predictions == 1) & (labels == 0) & (sensitive_attribute == 0)).sum()

        ratio_group_1 = fn_group_1 / fp_group_1 if fp_group_1 != 0 else float('inf')
        ratio_group_0 = fn_group_0 / fp_group_0 if fp_group_0 != 0 else float('inf')

        return abs(ratio_group_1 - ratio_group_0)

    # Calculating each fairness metric
    spd = statistical_parity_difference()
    eod = equal_opportunity_difference()
    oaed = overall_accuracy_equality_difference()
    ted = treatment_equality_difference()

    return {
        'Statistical Parity Difference': spd,
        'Equal Opportunity Difference': eod,
        'Overall Accuracy Equality Difference': oaed,
        'Treatment Equality Difference': ted
    }
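
The following toy example reproduces three of the metrics above by hand so the definitions are easier to check. It is a pure-Python sketch on hypothetical data (eight made-up users). `group_rate` is a helper introduced here, and treatment equality is left out for brevity.

```python
def group_rate(values, mask):
    """Mean of `values` over the positions where `mask` is True."""
    selected = [value for value, keep in zip(values, mask) if keep]
    return sum(selected) / len(selected)

# Hypothetical toy data: binary labels, predictions, and sensitive-group flag.
labels      = [1, 1, 0, 0, 1, 1, 0, 0]
predictions = [1, 0, 1, 0, 1, 1, 0, 0]
group       = [1, 1, 1, 1, 0, 0, 0, 0]

in_g1 = [g == 1 for g in group]
in_g0 = [g == 0 for g in group]

# Statistical parity difference: gap in positive-prediction rates.
spd = abs(group_rate(predictions, in_g1) - group_rate(predictions, in_g0))

# Equal opportunity difference: gap in true-positive rates.
pos_g1 = [member and y == 1 for member, y in zip(in_g1, labels)]
pos_g0 = [member and y == 1 for member, y in zip(in_g0, labels)]
eod = abs(group_rate(predictions, pos_g1) - group_rate(predictions, pos_g0))

# Overall accuracy equality difference: gap in per-group accuracy.
correct = [int(p == y) for p, y in zip(predictions, labels)]
oaed = abs(group_rate(correct, in_g1) - group_rate(correct, in_g0))

print(spd, eod, oaed)  # 0.0 0.5 0.5
```

Both groups receive positive predictions at the same rate, so the statistical parity difference is zero, yet the true-positive rates and accuracies differ. This is why the notebook reports several metrics rather than one.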

### Train and Test

In [19]:
# Train the model
def training(model, data, optimizer, epochs, weighted=False, fairness=False, alpha=0.01, beta=0.01, gamma=0.01, delta=0.01):
    """
    Helper function to train the GNN model.

    Args:
    model: Initialized GNN model.
    data: The torch_geometric data used to train the model.
    optimizer: Optimizer used to train the model.
    epochs: Number of training epochs.
    weighted: Whether to use the instance weights from re-weighing in the loss.
    fairness: Whether to use the fairness-aware loss.
    alpha: Parameter for statistical parity regularizer strength.
    beta: Parameter for treatment equality regularizer strength.
    gamma: Parameter for equal opportunity difference regularizer strength.
    delta: Parameter for overall accuracy equality difference regularizer strength.

    Returns:
    None. The model is updated in place and the loss is printed every 10 epochs.
    """
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        out = model(data)

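        if fairness:
            # Note: column -1 of data.x is bin_buy (see the column order of user_labels);
            # bin_age, the attribute evaluated in test(), is column -2.
            loss = fairness_aware_loss(out[data.train_mask], data, data.x[data.train_mask, -1],
                                       weighted=weighted, alpha=alpha, beta=beta, gamma=gamma, delta=delta)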
        elif weighted:
            loss = weighted_cross_entropy(out[data.train_mask], data)
        else:
            criterion = torch.nn.CrossEntropyLoss()
            loss = criterion(out[data.train_mask], data.y[data.train_mask])

        loss.backward()
        optimizer.step()

        if epoch % 10 == 0:
            print(f'Epoch {epoch} | Loss: {loss.item()}')

In [20]:
# Test the model
def test(model, data):
    """
    Helper function to test the trained GNN model.
    Prints the Accuracy, as well as various fairness metrics values.
    For fairness metrics used: Check the calculate_fairness method

    Args:
    model: Trained GNN model.
    data: The torch_geometric data used to evaluate the model.

    Returns:
    A tuple (accuracy on the test split, dictionary of fairness metrics).
    """
    model.eval()
    with torch.no_grad():
        out = model(data)

    # Reuse the no-grad forward pass instead of calling the model a second time
    pred = out.argmax(dim=1)
    correct = int(pred[data.test_mask].eq(data.y[data.test_mask]).sum().item())
    accuracy = correct / int(data.test_mask.sum())
    print(f'Accuracy: {accuracy}')

    # Binary predictions for all nodes, used for the fairness metrics
    predictions = pred

    # Fairness calculated for gender-classification task with bin_age as the sensitive attribute
    fairness_metrics = calculate_fairness(label='gender', predictions=predictions, sens_attr='bin_age')

    # Print the fairness metrics
    for metric, value in fairness_metrics.items():
        print(f"{metric}: {value}")

    return accuracy, fairness_metrics

---
# AIF360 Pre-processing

In [21]:
# Convert data to a format suitable for AIF360
dataset = BinaryLabelDataset(df=user_labels, label_names=['gender'], protected_attribute_names=['bin_age'])

### Re-weighing the data w.r.t. a sensitive attribute

In [22]:
# Apply the reweighing
RW = Reweighing(unprivileged_groups=[{'bin_age': 0}], privileged_groups=[{'bin_age': 1}])
dataset_transf = RW.fit_transform(dataset)

# Create a copy of the dataset
rw_data = clone(data)

# Add weights to the PyTorch Geometric Data object
rw_data.instance_weights = torch.tensor(dataset_transf.instance_weights, dtype=torch.float)
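
As we understand it, `Reweighing` follows the Kamiran and Calders scheme: each (group, label) combination receives the weight "expected count under independence divided by observed count". The sketch below recomputes such weights in plain Python on hypothetical toy data. `reweighing_weights` is a name introduced here and is not an AIF360 function.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each (group, label) cell by expected / observed frequency."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    cell_counts = Counter(zip(groups, labels))

    weights = {}
    for (group, label), observed in cell_counts.items():
        # Count we would expect if group and label were independent.
        expected = group_counts[group] * label_counts[label] / n
        weights[(group, label)] = expected / observed
    return weights

# Hypothetical toy data: group 0 rarely has label 1, group 1 usually does.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
weights = reweighing_weights(groups, labels)
print(weights)
```

In this toy data the under-represented cells, (group 0, label 1) and (group 1, label 0), get weight 2, while the over-represented cells get weight 2/3. The weighted data then looks as if group and label were independent.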

---
# GCN Models

In [23]:
# Two-layer GCN; the input dimension is read from the data object passed to the constructor
class GCN(torch.nn.Module):
    def __init__(self, data):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(data.num_node_features, 16)
        self.conv2 = GCNConv(16, 2) # 2 output classes for gender

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = F.relu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)
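
With its default settings, each `GCNConv` layer adds self-loops and applies a symmetric degree normalisation to the adjacency matrix before the learned linear map. This amounts to multiplying the node features by D^(-1/2) (A + I) D^(-1/2). The sketch below builds that normalised matrix for a three-node path graph in plain Python. `normalized_adjacency` is a hypothetical helper, and the graph is assumed to be undirected for this illustration.

```python
import math

def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    # Start from the identity so that every node has a self-loop.
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
           for i in range(num_nodes)]
    for src, dst in edges:
        adj[src][dst] = 1.0
        adj[dst][src] = 1.0   # assumption: undirected graph

    degree = [sum(row) for row in adj]   # degrees include the self-loop
    return [[adj[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(num_nodes)]
            for i in range(num_nodes)]

# Path graph 0 - 1 - 2
a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
for row in a_hat:
    print([round(value, 3) for value in row])
```

Stacking two such layers, as the `GCN` class does, means each node's prediction depends on its two-hop neighbourhood.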

### Base Model

In [24]:
# Instantiate the model, define loss function and optimizer
gcn_model = GCN(data)
gcn_optimizer = torch.optim.Adam(gcn_model.parameters(), lr=0.01)

In [25]:
# Train the first model: GCN, standard data, cross-entropy loss
training(model=gcn_model, data=data, optimizer=gcn_optimizer, epochs=50)

Epoch 0 | Loss: 1.192900538444519
Epoch 10 | Loss: 0.5714607834815979
Epoch 20 | Loss: 0.5072408318519592
Epoch 30 | Loss: 0.46354928612709045
Epoch 40 | Loss: 0.429584801197052


In [27]:
# Test the first model: GCN, standard data, cross-entropy loss
print("Here are the values for the GCN model with the standard dataset and cross-entropy loss: ")
print()
test(gcn_model, data)
print()

Here are the values for the GCN model with the standard dataset and cross-entropy loss: 

Accuracy: 0.8521801629132726
Statistical Parity Difference: 0.046637266874313354
Equal Opportunity Difference: 0.06927275657653809
Overall Accuracy Equality Difference: 0.08258169889450073
Treatment Equality Difference: 1.336693286895752



### Second model: GCN, standard data, fairness-aware loss (alpha=0.01)

In [27]:
# Instantiate the second model, define loss function and optimizer
gcn_model2 = GCN(data)
gcn_optimizer2 = torch.optim.Adam(gcn_model2.parameters(), lr=0.01)

In [25]:
# Train the second model: GCN, standard data, fairness-aware loss (alpha=0.01)
training(model=gcn_model2, data=data, optimizer=gcn_optimizer2, epochs=50, fairness=True)

Epoch 0 | Loss: 0.6220347881317139
Epoch 10 | Loss: 0.5531500577926636
Epoch 20 | Loss: 0.5012115240097046
Epoch 30 | Loss: 0.4466976225376129
Epoch 40 | Loss: 0.4097009301185608


In [30]:
# Test the second model: GCN, standard data, fairness-aware loss (alpha=0.01)
print("Here are the values for the GCN model with the standard dataset and fairness-entropy loss(alpha=0.01): ")
print()
test(gcn_model2, data)

Here are the values for the GCN model with the standard dataset and fairness-entropy loss(alpha=0.01): 

Accuracy: 0.8378653569717297
Statistical Parity Difference: 0.030072450637817383
Equal Opportunity Difference: 0.06199532747268677
Overall Accuracy Equality Difference: 0.08194565773010254
Treatment Equality Difference: 2.69785213470459


(0.8378653569717297,
 {'Statistical Parity Difference': tensor(0.0301),
  'Equal Opportunity Difference': tensor(0.0620),
  'Overall Accuracy Equality Difference': tensor(0.0819),
  'Treatment Equality Difference': tensor(2.6979)})

### Ignore

Test whether a stronger fairness constraint produces a better model:

In [None]:
gcn_model3 = GCN(data)
gcn_optimizer3 = torch.optim.Adam(gcn_model3.parameters(), lr=0.01)

training(model=gcn_model3, data=data, optimizer=gcn_optimizer3, epochs=30, fairness=True, alpha=0.05)

print("Here are the values for the GCN model with the standard dataset and fairness-entropy loss(alpha=0.05): ")
print()
test(gcn_model3, data)

In [None]:
gcn_model4 = GCN(data)
gcn_optimizer4 = torch.optim.Adam(gcn_model4.parameters(), lr=0.01)

# Note: this run changes beta (treatment equality strength); the messages below label it "alpha"
training(model=gcn_model4, data=data, optimizer=gcn_optimizer4, epochs=30, fairness=True, beta=0.005)
# print("\nHere are the values for the GCN model with the standard dataset and fairness-entropy loss(alpha=0.005): ")
print()
test(gcn_model4, data)
print()

In [None]:
print("\nHere are the values for the GCN model with the standard dataset and fairness-entropy loss(alpha=0.005): ")
print()
test(gcn_model4, data)


Here are the values for the GCN model with the standard dataset and fairness-entropy loss(alpha=0.005): 

Accuracy: 0.847747963584092
Statistical Parity Difference: 0.047549135982990265
Equal Opportunity Difference: 0.053668081760406494
Overall Accuracy Equality Difference: 0.08105164766311646
Treatment Equality Difference: 2.612381935119629


### Third model: GCN, re-weighed data, weighted-cross-entropy loss

In [24]:
# Instantiate the third model, define loss function and optimizer
rw_data_gcn_model = GCN(rw_data)
rw_data_gcn_model_optimizer = torch.optim.Adam(rw_data_gcn_model.parameters(), lr=0.01)

In [25]:
# Train the third model: GCN, re-weighed data, weighted-cross entropy loss
training(model=rw_data_gcn_model, data=rw_data, optimizer=rw_data_gcn_model_optimizer, epochs=50, weighted=True)

Epoch 0 | Loss: 0.6312071681022644
Epoch 10 | Loss: 0.5407678484916687
Epoch 20 | Loss: 0.4680361747741699
Epoch 30 | Loss: 0.4228784739971161
Epoch 40 | Loss: 0.3963108956813812


In [26]:
# Test the third model: GCN, re-weighed data, weighted-cross entropy loss
print("Here are the values for the GCN model with the re-weighed dataset and weighted-cross-entropy loss: ")
print()
test(rw_data_gcn_model, data)
print()

Here are the values for the GCN model with the re-weighed dataset and weighted-cross-entropy loss: 

Accuracy: 0.8576305701964543
Statistical Parity Difference: 0.034996867179870605
Equal Opportunity Difference: 0.1110086739063263
Overall Accuracy Equality Difference: 0.08673566579818726
Treatment Equality Difference: 0.6095795631408691



In [None]:
# Test the third model: GCN, re-weighed data, weighted-cross entropy loss
print("Here are the values for the GCN model with the re-weighed dataset and weighted-cross-entropy loss: ")
print()
test(rw_data_gcn_model, data)
print()

Here are the values for the GCN model with the re-weighed dataset and weighted-cross-entropy loss: 

Accuracy: 0.8468495448011499
Statistical Parity Difference: 0.030316658318042755
Equal Opportunity Difference: 0.10094630718231201
Overall Accuracy Equality Difference: 0.08911752700805664
Treatment Equality Difference: 0.06582164764404297


### Fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss

In [40]:
# Instantiate the fourth model, define loss function and optimizer
rw_data_gcn_model2 = GCN(rw_data)
rw_data_gcn_model_optimizer2 = torch.optim.Adam(rw_data_gcn_model2.parameters(), lr=0.01)

In [28]:
# Train the fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss
training(model=rw_data_gcn_model2, data=rw_data, optimizer=rw_data_gcn_model_optimizer2, epochs=50, weighted=True, fairness=True)

Epoch 0 | Loss: 0.5843248963356018
Epoch 10 | Loss: 0.5047746896743774
Epoch 20 | Loss: 0.44967859983444214
Epoch 30 | Loss: 0.4109334349632263
Epoch 40 | Loss: 0.3880102038383484


In [29]:
# Test the fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss
print("Here are the values for the GCN model with the re-weighed dataset and weighted- and fairness-aware cross-entropy loss: ")
print()
test(rw_data_gcn_model2, data)
print()

Here are the values for the GCN model with the re-weighed dataset and weighted- and fairness-aware cross-entropy loss: 

Accuracy: 0.8555342597029229
Statistical Parity Difference: 0.034720584750175476
Equal Opportunity Difference: 0.1112838089466095
Overall Accuracy Equality Difference: 0.08701056241989136
Treatment Equality Difference: 0.6815862655639648



In [31]:
# Train the fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss
training(model=rw_data_gcn_model2, data=rw_data, optimizer=rw_data_gcn_model_optimizer2, epochs=50, weighted=True,
         fairness=True, alpha=0.01, beta=0.005, gamma=0.015, delta=0.012)

Epoch 0 | Loss: 0.745595395565033
Epoch 10 | Loss: 0.5941913723945618
Epoch 20 | Loss: 0.5534152984619141
Epoch 30 | Loss: 0.5050016641616821
Epoch 40 | Loss: 0.4620010554790497


In [32]:
# Test the fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss
print("Here are the values for the GCN model with the re-weighed dataset and weighted- and fairness-aware cross-entropy loss: ")
print()
test(rw_data_gcn_model2, data)
print()

Here are the values for the GCN model with the re-weighed dataset and weighted- and fairness-aware cross-entropy loss: 

Accuracy: 0.8442441303306181
Statistical Parity Difference: 0.02938065677881241
Equal Opportunity Difference: 0.09115374088287354
Overall Accuracy Equality Difference: 0.08772581815719604
Treatment Equality Difference: 1.3014488220214844



In [34]:
# Train the fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss
training(model=rw_data_gcn_model2, data=rw_data, optimizer=rw_data_gcn_model_optimizer2, epochs=50, weighted=True,
         fairness=True, alpha=0.01, beta=0.012, gamma=0.015, delta=0.015)

Epoch 0 | Loss: 0.7213656306266785
Epoch 10 | Loss: 0.5258520841598511
Epoch 20 | Loss: 0.4605921804904938
Epoch 30 | Loss: 0.424991250038147
Epoch 40 | Loss: 0.40024101734161377


In [35]:
# Test the fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss
print("Here are the values for the GCN model with the re-weighed dataset and weighted- and fairness-aware cross-entropy loss: ")
print()
test(rw_data_gcn_model2, data)
print()

Here are the values for the GCN model with the re-weighed dataset and weighted- and fairness-aware cross-entropy loss: 

Accuracy: 0.8580797795879253
Statistical Parity Difference: 0.042138680815696716
Equal Opportunity Difference: 0.0997222363948822
Overall Accuracy Equality Difference: 0.0845213532447815
Treatment Equality Difference: 0.3458895683288574



In [None]:
# Train the fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss
training(model=rw_data_gcn_model2, data=rw_data, optimizer=rw_data_gcn_model_optimizer2, epochs=50, weighted=True,
         fairness=True, alpha=0.01, beta=0.012, gamma=0.02, delta=0.015)

In [39]:
# Test the fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss
print("Here are the values for the GCN model with the re-weighed dataset and weighted- and fairness-aware cross-entropy loss: ")
print()
test(rw_data_gcn_model2, data)
print()

Here are the values for the GCN model with the re-weighed dataset and weighted- and fairness-aware cross-entropy loss: 

Accuracy: 0.8528689506468615
Statistical Parity Difference: 0.04595881700515747
Equal Opportunity Difference: 0.0724097192287445
Overall Accuracy Equality Difference: 0.08293139934539795
Treatment Equality Difference: 1.1784725189208984



In [41]:
# Train the fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss
training(model=rw_data_gcn_model2, data=rw_data, optimizer=rw_data_gcn_model_optimizer2, epochs=50, weighted=True,
         fairness=True, alpha=0.01, beta=0.012, gamma=0.018, delta=0.018)

Epoch 0 | Loss: 0.6623578667640686
Epoch 10 | Loss: 0.5442702770233154
Epoch 20 | Loss: 0.47677215933799744
Epoch 30 | Loss: 0.42851555347442627
Epoch 40 | Loss: 0.39934295415878296


In [42]:
# Test the fourth model: GCN, re-weighed data, weighted- and fairness-aware cross entropy loss
print("Here are the values for the GCN model with the re-weighed dataset and weighted- and fairness-aware cross-entropy loss: ")
print()
test(rw_data_gcn_model2, data)
print()

Here are the values for the GCN model with the re-weighed dataset and weighted- and fairness-aware cross-entropy loss: 

Accuracy: 0.8569717297556301
Statistical Parity Difference: 0.03602193295955658
Equal Opportunity Difference: 0.11044105887413025
Overall Accuracy Equality Difference: 0.08629560470581055
Treatment Equality Difference: 0.6496486663818359



---
# Extras


### DI Remover

In [None]:
# Convert your data to a format suitable for AIF360
dataset = BinaryLabelDataset(df=user_labels, label_names=['gender'], protected_attribute_names=['bin_age'])

# Apply the Disparate Impact Remover
DIR = DisparateImpactRemover(repair_level=1.0)
dataset_transf = DIR.fit_transform(dataset)

# Extract the transformed features
transformed_features = dataset_transf.features

In [None]:
rw_data.x = torch.tensor(transformed_features, dtype=torch.float)

### DI GCN

In [None]:
class GCN2(torch.nn.Module):
    def __init__(self):
        super(GCN2, self).__init__()
        self.conv1 = GCNConv(rw_data.num_node_features, 16)
        self.conv2 = GCNConv(16, 2)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)

In [None]:
# Instantiate the model, define loss function and optimizer
gcn_model = GCN(rw_data)
gcn_optimizer = torch.optim.Adam(gcn_model.parameters(), lr=0.01)

In [None]:
training(model=gcn_model, data=rw_data, optimizer=gcn_optimizer, epochs=100)

Epoch 0 | Loss: 2247.96337890625
Epoch 10 | Loss: 1941.6514892578125
Epoch 20 | Loss: 1676.3948974609375


KeyboardInterrupt: 

In [None]:
test(gcn_model, rw_data)

### Loss

In [None]:
def fairness_aware_loss(output, data, sensitive_attr, weighted=False, alpha=0.01):
    """
    A custom loss function to calculate a fairness-aware loss.
    The fairness term measures the disparity in predicted positive probability between the two sensitive-attribute groups (value 1 vs. value 0).

    Args:
    output: Outputs from the model.
    data: The torch-geometric data object used for the model.
    sensitive_attr: The sensitive attribute in the data (in our case: bin_age)
    weighted: Whether to use the weighted cross-entropy loss (requires data.instance_weights).
    alpha: Parameter to control the strength of the fairness regularizer.

    Returns:
    A fairness-aware combined loss.
    """
    if weighted:
        # Call the weighted-cross entropy loss
        standard_loss = weighted_cross_entropy(output, data)
    else:
        # Call standard cross-entropy loss
        target = data.y[data.train_mask]
        standard_loss = F.cross_entropy(output, target)

    pos_prob = torch.sigmoid(output[:, 1])

    fairness_reg = torch.abs(pos_prob[sensitive_attr == 1].mean() - pos_prob[sensitive_attr == 0].mean())
    combined_loss = standard_loss + alpha * fairness_reg

    return combined_loss