# Reweighing Technique for Census Income Dataset
This notebook was adapted from AIF360's example https://github.com/Trusted-AI/AIF360/blob/main/examples/demo_reweighing_preproc.ipynb

### Part 1: Install Dependencies

In [1]:
!pip install gower 'aif360[all]'

Collecting gower
  Downloading gower-0.1.2-py3-none-any.whl.metadata (3.7 kB)
Collecting aif360[all]
  Downloading aif360-0.6.1-py3-none-any.whl.metadata (5.0 kB)
Collecting skorch (from aif360[all])
  Downloading skorch-1.1.0-py3-none-any.whl.metadata (11 kB)
Collecting jupyter (from aif360[all])
  Downloading jupyter-1.1.1-py2.py3-none-any.whl.metadata (2.0 kB)
Collecting sphinx-rtd-theme (from aif360[all])
  Downloading sphinx_rtd_theme-3.0.2-py2.py3-none-any.whl.metadata (4.4 kB)
Collecting igraph[plotting] (from aif360[all])
  Downloading igraph-0.11.8-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Collecting lime (from aif360[all])
  Downloading lime-0.2.0.1.tar.gz (275 kB)
  Preparing metadata (setup.py) ... done
Collecting fairlearn~=0.7 (from aif360[all])
  Downloading fairlearn-0.12.0-py3-none-any.whl.metadata 

You can skip this next cell. It is mainly used for development and deletes all the notebook's variables.

In [39]:
%reset -f

Import packages and define functions to evaluate performance and individual fairness. Before you run the next cell, place 'adult.data', 'adult.names' and 'adult.test' in '/usr/local/lib/python3.11/dist-packages/aif360/data/raw/adult'.
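
The install path above depends on the Python version and environment, so it can help to check where aif360 actually lives before copying the files. The following is a minimal sketch: `aif360_adult_dir` and `missing_adult_files` are hypothetical helper names, and the `data/raw/adult` layout is taken from the path quoted above.

```python
import importlib.util
from pathlib import Path

# File names taken from the instructions above.
ADULT_FILES = ("adult.data", "adult.names", "adult.test")

def aif360_adult_dir():
    """Return aif360's raw Adult data directory, or None if aif360 is not installed."""
    spec = importlib.util.find_spec("aif360")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at aif360/__init__.py; the data folder sits next to it.
    return Path(spec.origin).parent / "data" / "raw" / "adult"

def missing_adult_files(data_dir):
    """List the required Adult files that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in ADULT_FILES if not (data_dir / name).is_file()]

data_dir = aif360_adult_dir()
if data_dir is None:
    print("aif360 is not installed in this environment")
else:
    print(f"Expected location: {data_dir}")
    print(f"Missing files: {missing_adult_files(data_dir)}")
```

If the list of missing files is empty, the import cell below should be able to load the dataset.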

In [2]:
from IPython.display import Markdown, display
import numpy as np

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import precision_score, accuracy_score, classification_report, recall_score, f1_score

from aif360.algorithms.preprocessing import Reweighing
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions\
        import load_preproc_data_adult

def eval_performance(y_test, y_pred):
    # Evaluate performance
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)

    # Display metrics
    print(f'Accuracy: {accuracy:.4f}')
    print(f'Precision: {precision:.4f}')
    print(f'Recall: {recall:.4f}')
    print(f'F1 Score: {f1:.4f}')
    print("\nClassification Report:\n", classification_report(y_test, y_pred))

from sklearn.neighbors import NearestNeighbors
import gower

def eval_ind_fairness(x_train, y_train, x_test, y_pred, k=5):
    """Mean agreement between each test prediction and the labels of its nearest training rows."""
    # Gower distance from every test sample to every training sample.
    # Shape: (num_test_samples, num_train_samples)
    gower_distances = gower.gower_matrix(x_test, x_train)

    # Indices of training rows sorted by distance; columns 1..k are kept.
    # NOTE: test and training rows are disjoint, so column 0 is the closest
    # training row rather than the sample itself; use [:, :k] to include it.
    neighbors = np.argsort(gower_distances, axis=1)[:, 1:k+1]

    # Consistency per test sample: fraction of the selected neighbors whose
    # training label equals the model's prediction for that sample.
    consistencies = []
    for i, neigh_indices in enumerate(neighbors):
        neighbor_labels = y_train[neigh_indices]
        consistencies.append(np.mean(neighbor_labels == y_pred[i]))

    # Average over all test samples.
    return np.mean(consistencies)
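
To make the consistency score concrete, here is a toy sketch that uses a hand-written distance matrix in place of Gower distances. `consistency_from_distances` is a hypothetical helper written for illustration, not part of gower or AIF360. Its `skip` argument controls how many of the closest training rows are dropped; the slice `1:k+1` used in `eval_ind_fairness` corresponds to `skip=1`.

```python
import numpy as np

def consistency_from_distances(distances, y_train, y_pred, k=2, skip=0):
    """Fraction of selected training neighbors whose label matches each test prediction."""
    # Sort training rows by distance for every test row.
    order = np.argsort(distances, axis=1)
    # Keep k neighbors, optionally skipping the closest `skip` rows.
    neighbors = order[:, skip:skip + k]
    # Compare neighbor labels with the prediction of the corresponding test row.
    agree = y_train[neighbors] == y_pred[:, None]
    return float(agree.mean())

# Two test rows, four training rows (made-up distances).
distances = np.array([
    [0.1, 0.2, 0.9, 0.8],
    [0.7, 0.9, 0.1, 0.3],
])
y_train = np.array([1, 0, 0, 0])
y_pred = np.array([1, 0])

score_closest = consistency_from_distances(distances, y_train, y_pred, k=2)
score_skip_one = consistency_from_distances(distances, y_train, y_pred, k=2, skip=1)
print(score_closest, score_skip_one)
```

On this toy input the two settings give different scores, which shows that the choice of slice affects the metric and not just the runtime.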



### Part 2: Load the Data

In [3]:
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]
dataset_orig = load_preproc_data_adult(['sex'])

np.random.seed(1)



Split the data into training and test sets.

In [4]:
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.8], shuffle=True)

### Part 3: Train and Evaluate the Baseline

In [5]:
scale_orig = StandardScaler()
x_train = scale_orig.fit_transform(dataset_orig_train.features)
y_train = dataset_orig_train.labels.ravel()
w_train = dataset_orig_train.instance_weights.ravel()

lmod = LogisticRegression()
lmod.fit(x_train, y_train, sample_weight=w_train)

In [6]:
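# NOTE: fit_transform re-estimates the scaling statistics on the test set;
# scale_orig.transform(...) would reuse the statistics learned from the training data.
x_test = scale_orig.fit_transform(dataset_orig_test.features)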
y_test = dataset_orig_test.labels.ravel()

y_pred = lmod.predict(x_test)
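
The cell above standardizes the test features before prediction. As a rough illustration of why it matters which data the scaler is fitted on, the sketch below mimics standardization with plain NumPy (mean and population standard deviation, as `StandardScaler` uses). The helper names and the toy values are assumptions for illustration only.

```python
import numpy as np

def fit_stats(x):
    """Column-wise mean and population standard deviation (ddof=0)."""
    return x.mean(axis=0), x.std(axis=0)

def standardize(x, mean, std):
    """Apply previously computed statistics to x."""
    return (x - mean) / std

# Toy data: the test values are shifted relative to the training values.
x_train = np.array([[0.0], [2.0], [4.0]])
x_test = np.array([[4.0], [6.0]])

train_mean, train_std = fit_stats(x_train)

# Reusing the training statistics (analogous to scaler.transform).
reused = standardize(x_test, train_mean, train_std)
# Re-estimating statistics on the test data (analogous to scaler.fit_transform).
refit = standardize(x_test, *fit_stats(x_test))

print("reused training stats:", reused.ravel())
print("refit on test data:   ", refit.ravel())
```

With reused statistics the test rows keep their position relative to the training distribution; refitting recenters them around zero, so the model sees inputs on a different scale than the one it was trained on.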

In [7]:
# Evaluate performance of baseline model
eval_performance(y_test, y_pred)

Accuracy: 0.8071
Precision: 0.6592
Recall: 0.3934
F1 Score: 0.4927

Classification Report:
               precision    recall  f1-score   support

         0.0       0.83      0.94      0.88      7443
         1.0       0.66      0.39      0.49      2326

    accuracy                           0.81      9769
   macro avg       0.75      0.66      0.69      9769
weighted avg       0.79      0.81      0.79      9769



In [8]:
individual_fairness_score = eval_ind_fairness(x_train, y_train, x_test, y_pred)
print(f'Individual Fairness Consistency Score (with categorical features): {individual_fairness_score:.4f}')

Individual Fairness Consistency Score (with categorical features): 0.8029


### Part 4: Train and Evaluate the Model Trained on Reweighted Dataset

Run the reweighing algorithm.

In [9]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
RW.fit(dataset_orig_train)
dataset_rw_train = RW.transform(dataset_orig_train)
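
For intuition about what the reweighing step computes, the sketch below applies the standard reweighing formula, where each (group, label) cell gets the weight `P(group) * P(label) / P(group, label)`. It is a simplified stand-in that assumes unit initial instance weights, not AIF360's implementation, and `reweighing_weights` is a hypothetical helper name.

```python
import numpy as np

def reweighing_weights(group, label):
    """Per-sample weights that make group and label statistically independent."""
    group = np.asarray(group)
    label = np.asarray(label)
    n = len(label)
    weights = np.empty(n, dtype=float)
    for g in np.unique(group):
        for y in np.unique(label):
            cell = (group == g) & (label == y)
            n_cell = cell.sum()
            if n_cell == 0:
                continue  # no samples in this cell, nothing to weight
            # Expected cell size under independence divided by observed size.
            expected = (group == g).sum() * (label == y).sum() / n
            weights[cell] = expected / n_cell
    return weights

# Toy data: group 1 receives the favorable label (1) more often than group 0.
group = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
label = np.array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])
w = reweighing_weights(group, label)

def weighted_positive_rate(g):
    mask = group == g
    return float(np.sum(w[mask] * (label[mask] == 1)) / np.sum(w[mask]))

print("weights:", np.round(w, 4))
print("weighted positive rates:", weighted_positive_rate(1), weighted_positive_rate(0))
```

After reweighing, both groups have the same weighted rate of favorable labels, which is the property the training step then exploits through `sample_weight`.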

In [10]:
scale_orig = StandardScaler()
x_train_rw = scale_orig.fit_transform(dataset_rw_train.features)
y_train_rw = dataset_rw_train.labels.ravel()
w_train_rw = dataset_rw_train.instance_weights.ravel()

lmod_rw = LogisticRegression()
lmod_rw.fit(x_train_rw, y_train_rw, sample_weight=w_train_rw)

In [11]:
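# NOTE: as in the baseline, fit_transform re-estimates the scaling statistics on the test set;
# scale_orig.transform(...) would reuse the statistics learned from the training data.
x_test = scale_orig.fit_transform(dataset_orig_test.features)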
y_test = dataset_orig_test.labels.ravel()

y_pred_rw = lmod_rw.predict(x_test)

In [12]:
# Evaluate performance of model trained on reweighted dataset
eval_performance(y_test, y_pred_rw)

Accuracy: 0.7912
Precision: 0.5800
Recall: 0.4458
F1 Score: 0.5041

Classification Report:
               precision    recall  f1-score   support

         0.0       0.84      0.90      0.87      7443
         1.0       0.58      0.45      0.50      2326

    accuracy                           0.79      9769
   macro avg       0.71      0.67      0.69      9769
weighted avg       0.78      0.79      0.78      9769



In [13]:
individual_fairness_score_rw = eval_ind_fairness(x_train_rw, y_train_rw, x_test, y_pred_rw)
print(f'Individual Fairness Consistency Score (with categorical features): {individual_fairness_score_rw:.4f}')

Individual Fairness Consistency Score (with categorical features): 0.8016


### Part 5: Comments

Comments:
- The baseline evaluation scores presented here differ slightly from those in data_exploration.ipynb. This is probably because 'load_preproc_data_adult' changes the format of the tabular data. For more details, inspect the dataset in the debugger or run:
```
print(dataset_orig_train.feature_names)
```
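- **The model trained on reweighted data performs slightly worse on accuracy and precision:** Accuracy drops from 0.8071 to 0.7912 and precision from 0.6592 to 0.5800, while recall rises from 0.3934 to 0.4458 and F1 from 0.4927 to 0.5041. A drop in accuracy is expected from the utility-fairness tradeoff.
- **The change in the individual fairness score is negligible (0.8029 vs. 0.8016):** This is expected. Reweighing targets group fairness with respect to the protected attribute, not individual fairness.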
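
The comparison behind these comments can be tabulated directly from the scores printed in Parts 3 and 4. This is a small arithmetic sketch; the dictionary names are chosen here for illustration.

```python
# Scores copied from the recorded outputs in Parts 3 and 4.
baseline = {
    "accuracy": 0.8071,
    "precision": 0.6592,
    "recall": 0.3934,
    "f1": 0.4927,
    "consistency": 0.8029,
}
reweighted = {
    "accuracy": 0.7912,
    "precision": 0.5800,
    "recall": 0.4458,
    "f1": 0.5041,
    "consistency": 0.8016,
}

# Positive delta means the reweighted model scores higher than the baseline.
deltas = {name: round(reweighted[name] - baseline[name], 4) for name in baseline}
for name, delta in deltas.items():
    print(f"{name:<12}{delta:+.4f}")
```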


---


Future Work:
- Implement Group Fairness scoring.
- Implement other fairness scoring.
- Maybe we can look at other protected attributes. In this example I've just chosen 'sex'.
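
As a possible starting point for the group fairness scoring listed above, the sketch below computes two common group metrics from predictions and the protected attribute: statistical parity difference (unprivileged rate minus privileged rate) and disparate impact (unprivileged rate divided by privileged rate). The function names and the toy arrays are assumptions; the encoding `sex == 1` as privileged follows `privileged_groups` in Part 2.

```python
import numpy as np

def positive_rate(y_pred, mask):
    """Share of favorable predictions (label 1) within the masked group."""
    return float(np.mean(y_pred[mask] == 1))

def statistical_parity_difference(y_pred, sex):
    """Unprivileged positive rate minus privileged positive rate (0 means parity)."""
    y_pred, sex = np.asarray(y_pred), np.asarray(sex)
    return positive_rate(y_pred, sex == 0) - positive_rate(y_pred, sex == 1)

def disparate_impact(y_pred, sex):
    """Unprivileged positive rate divided by privileged positive rate (1 means parity)."""
    y_pred, sex = np.asarray(y_pred), np.asarray(sex)
    return positive_rate(y_pred, sex == 0) / positive_rate(y_pred, sex == 1)

# Toy predictions: 3 of 4 privileged rows and 1 of 4 unprivileged rows are favorable.
y_pred_toy = np.array([1, 1, 1, 0, 1, 0, 0, 0])
sex_toy = np.array([1, 1, 1, 1, 0, 0, 0, 0])

spd = statistical_parity_difference(y_pred_toy, sex_toy)
di = disparate_impact(y_pred_toy, sex_toy)
print(f"Statistical parity difference: {spd:+.4f}")
print(f"Disparate impact: {di:.4f}")
```

Applied to `y_pred` and `y_pred_rw` with the test set's 'sex' column, these two numbers would show whether reweighing narrows the gap between groups.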