## Detecting and mitigating Age and Sex bias in credit decisions

### Step 1 - Importing the Required Libraries

In [78]:
import numpy as np
from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

### Step 2 - Load dataset, specifying protected attribute, and split dataset into train and test


In [79]:
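dataset_orig = GermanDataset(
    protected_attribute_names=['age'],           # age is the only protected attribute here
    privileged_classes=[lambda x: x >= 25],      # age >= 25 is treated as privileged
    features_to_drop=['personal_status', 'sex']  # drop the sex-related features for this analysis
)

# 70% of the rows go to training and 30% to testing, after shuffling
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)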

In [80]:
privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]

### Step 3 - Compute fairness metric on original training dataset

In [81]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

Difference in mean outcomes between unprivileged and privileged groups = -0.154574
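
To make the printed number easier to read, here is a minimal sketch of what the mean difference measures: the rate of favorable outcomes in the unprivileged group minus the rate in the privileged group. The helper name `mean_difference` mirrors the AIF360 method, but this is a hand-written illustration on made-up toy data, not the library implementation.

```python
def mean_difference(labels, protected, weights=None, favorable=1):
    """P(favorable | unprivileged) - P(favorable | privileged), optionally weighted.

    Assumes protected == 1 marks the privileged group and 0 the unprivileged one.
    """
    if weights is None:
        weights = [1.0] * len(labels)

    def favorable_rate(group_value):
        total = sum(w for w, g in zip(weights, protected) if g == group_value)
        fav = sum(w for w, g, y in zip(weights, protected, labels)
                  if g == group_value and y == favorable)
        return fav / total

    return favorable_rate(0) - favorable_rate(1)


# Toy data, invented for illustration (not the German credit data).
toy_protected = [1, 1, 1, 1, 0, 0, 0, 0]
toy_labels = [1, 1, 1, 0, 1, 1, 0, 0]

toy_gap = mean_difference(toy_labels, toy_protected)
print(toy_gap)  # 0.5 - 0.75 = -0.25
```

A negative value means the unprivileged group receives the favorable outcome less often. On that reading, the -0.154574 above corresponds to a gap of roughly 15 percentage points.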


### Step 4 - Mitigate bias by transforming the original dataset

In [82]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_train = RW.fit_transform(dataset_orig_train) 

### Step 5 - Compute fairness metric on transformed dataset

In [83]:
metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 
                                               unprivileged_groups=unprivileged_groups,
                                               privileged_groups=privileged_groups)

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

Difference in mean outcomes between unprivileged and privileged groups = -0.000000


### Conclusion
Following the tutorial and the code above, I went through the same steps:

- I transformed the training dataset with reweighing and re-evaluated it using the same metric as for the original training dataset.
- I used the `BinaryLabelDatasetMetric` class and its `mean_difference()` method to check whether the bias had been reduced.
- The mitigation was effective on this metric: the difference in mean outcomes went from about -0.155 to 0.0.
- Previously, the privileged group (age 25 and over) had an advantage of about 15 percentage points in favorable outcomes; after reweighing, the two groups have equal weighted mean outcomes.

### Using the sex attribute to detect and mitigate bias

Following the tutorial and the code above, I repeated the same steps with sex as the protected attribute.

I loaded the dataset and specified `sex` as the protected attribute. Since age was not needed for this analysis, it was dropped from the dataset. Next, I split the original dataset into training and testing sets. I then defined the privileged group as male (`sex` = 1) and the unprivileged group as female (`sex` = 0).

These two group definitions were used to measure and then mitigate bias in the dataset.
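
The sketch below illustrates how a `privileged_classes` predicate can turn raw attribute values into the 1/0 codes used in `privileged_groups` and `unprivileged_groups`. It assumes that a callable passed to `privileged_classes` is applied to each value and that matches are encoded as 1; the helper `encode_protected` and the toy values are hypothetical and are not part of AIF360.

```python
def encode_protected(values, is_privileged):
    """Map each raw value to 1.0 (privileged) or 0.0 (unprivileged)."""
    return [1.0 if is_privileged(v) else 0.0 for v in values]


# Toy values, invented for illustration.
toy_ages = [19, 24, 25, 40]
toy_sexes = ['male', 'female', 'female', 'male']

age_codes = encode_protected(toy_ages, lambda x: x >= 25)
sex_codes = encode_protected(toy_sexes, lambda x: x == 'male')

print(age_codes)  # [0.0, 0.0, 1.0, 1.0]
print(sex_codes)  # [1.0, 0.0, 0.0, 1.0]
```

Under this assumption, `{'sex': 1}` selects the rows where the predicate was true (male), and `{'sex': 0}` selects the rest.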

In [92]:
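dataset_orig_s = GermanDataset(
    protected_attribute_names=['sex'],           # sex is the only protected attribute here
    privileged_classes=[lambda x: x == 'male'],  # male is treated as privileged
    features_to_drop=['personal_status', 'age']  # drop age for this analysis
)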

In [93]:
dataset_orig_train_s, dataset_orig_test_s = dataset_orig_s.split([0.7], shuffle=True)

In [94]:
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

In [95]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train_s, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

Difference in mean outcomes between unprivileged and privileged groups = -0.058012


Following the same steps as before:
- I found that the privileged group (males) had an advantage of about 6 percentage points in favorable outcomes in the training dataset.
- This bias is undesirable and needs to be addressed. To mitigate it in the training dataset, I used a pre-processing mitigation technique.
- With males treated as the privileged group, the goal was to reduce this gap of roughly 6 points before any subsequent analysis.

### Step 4i - Mitigate bias by transforming the original dataset

- To address the sex bias present in the original dataset, I applied a pre-processing technique known as reweighing.
- This assigns different weights to the instances in the dataset, depending on their group and label, to counteract the bias.
- By using this technique, I aimed to reduce the influence of sex on the outcomes and obtain a more equitable training dataset.

In [96]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_train = RW.fit_transform(dataset_orig_train_s) 

- The algorithm I used to mitigate bias was Reweighing, a pre-processing step that is applied to the training data before any model is built.
- It transforms the dataset by assigning different weights to instances so that favorable outcomes are balanced between the privileged and unprivileged groups of the protected attribute.
- By applying this technique, I aimed to create a more balanced training dataset to support fairer downstream analyses.
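
As a rough illustration of the idea behind reweighing, the sketch below weights each instance by P(group) × P(label) / P(group, label), which is the usual formulation of the technique. It runs on made-up toy data and is a hand-written approximation, not AIF360's `Reweighing` class; the helper names are hypothetical.

```python
from collections import Counter


def reweighing_weights(labels, protected):
    """Weight per instance: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(protected)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(protected, labels))
    return [
        (group_counts[g] / n) * (label_counts[y] / n) / (joint_counts[(g, y)] / n)
        for g, y in zip(protected, labels)
    ]


def weighted_favorable_rate(labels, protected, weights, group_value, favorable=1):
    """Weighted share of favorable labels within one group."""
    total = sum(w for w, g in zip(weights, protected) if g == group_value)
    fav = sum(w for w, g, y in zip(weights, protected, labels)
              if g == group_value and y == favorable)
    return fav / total


# Toy data, invented for illustration: 1 = privileged group / favorable label.
toy_protected = [1, 1, 1, 1, 0, 0, 0, 0]
toy_labels = [1, 1, 1, 0, 1, 1, 0, 0]

toy_weights = reweighing_weights(toy_labels, toy_protected)
rate_unpriv = weighted_favorable_rate(toy_labels, toy_protected, toy_weights, 0)
rate_priv = weighted_favorable_rate(toy_labels, toy_protected, toy_weights, 1)
weighted_gap = rate_unpriv - rate_priv

print(weighted_gap)  # close to 0, up to floating-point rounding
```

With these weights, both groups end up with the same weighted favorable rate by construction, so a weighted mean difference of zero on the training data is the expected result. A printed value such as `-0.000000` is consistent with a tiny negative rounding error.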

### Step 5i - Compute fairness metric on transformed dataset

In [97]:
metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 
                                               unprivileged_groups=unprivileged_groups,
                                               privileged_groups=privileged_groups)

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

Difference in mean outcomes between unprivileged and privileged groups = -0.000000


- After applying reweighing, I evaluated its effectiveness using the same metric as for the original training dataset.
- The mitigation step was successful on this metric: the difference in mean outcomes went from about -0.058 to 0.0.
- Previously, the privileged group had an advantage of about 6 percentage points; now the two groups have equal weighted mean outcomes.
- This suggests that reweighing was effective in removing the measured bias from the training dataset.

- This analysis illustrates that historical bias in a dataset can lead to unfair outcomes when models are built on that data.
- Specifically, in this dataset, males received favorable credit outcomes more often than females.
- Traditional machine learning approaches optimize for accuracy rather than fairness, so they tend to reproduce such bias.
- Nevertheless, basic bias mitigation strategies like reweighing can remove the measured disparity from the training data, with the aim of obtaining models with comparable accuracy and better fairness metrics.
- These bias mitigation techniques are important for any organization that automates decisions affecting populations with protected characteristics and wants the resulting models to be fair.

### Understanding of Mitigation and Bias

As a human, I try to make informed decisions by weighing the benefits and drawbacks of each option. However, instinct sometimes steers us toward decisions that are not the best ones. Although we like to think of decision-making as a rational process, research has shown that implicit biases can influence our conclusions without our awareness. This has significant implications for learning leaders within an organization, who may be unaware of the biases affecting their objectivity and fairness.

Similarly, datasets can carry bias, for example when one class dominates the target variable. To address this, we can use machine learning techniques such as reweighing to produce a more suitable training dataset and reduce bias in our models.

### Challenges Encountered
During the course of this analysis, I encountered several challenges. First, treating males as the privileged group on the basis of sex alone is itself an assumption that could be viewed as biased. The opposite choice, treating females as the privileged group, would have been another option.
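
To show what the choice of privileged group changes, the sketch below computes the mean difference on made-up toy data under both assumptions. The helpers are hypothetical and hand-written, not AIF360 calls.

```python
def favorable_rate(labels, protected, group_value, favorable=1):
    """Share of favorable labels within one group."""
    group_labels = [y for y, g in zip(labels, protected) if g == group_value]
    return sum(1 for y in group_labels if y == favorable) / len(group_labels)


def mean_difference(labels, protected, unprivileged_value, privileged_value):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    return (favorable_rate(labels, protected, unprivileged_value)
            - favorable_rate(labels, protected, privileged_value))


# Toy data, invented for illustration: sex 1 = male, 0 = female; label 1 = favorable.
toy_sex = [1, 1, 1, 1, 0, 0, 0, 0]
toy_labels = [1, 1, 1, 0, 1, 1, 0, 0]

gap_male_privileged = mean_difference(toy_labels, toy_sex, 0, 1)
gap_female_privileged = mean_difference(toy_labels, toy_sex, 1, 0)

print(gap_male_privileged)    # -0.25
print(gap_female_privileged)  # 0.25
```

Swapping the two groups only flips the sign of the metric; the size of the gap is the same. The assumption therefore affects how the result is read (which group is disadvantaged), not whether a disparity is detected.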

Additionally, I faced technical challenges with the libraries used for the AI fairness measures, which required TensorFlow to be installed before they worked properly. Understanding the dataset and deciding how to handle its attributes was also difficult, so I referred to external sources such as the UCI Machine Learning Repository for help.

Lastly, loading the dataset failed at first because the data files were expected in the dataset folder inside the conda environment, and they were not there initially. I had to download the dataset and place it in the specified path to proceed with the analysis.
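
A quick check like the one below can confirm whether the data files are where the loader looks for them. It assumes that `GermanDataset` reads `german.data` and `german.doc` from `data/raw/german` inside the installed `aif360` package; that path and the helper names are assumptions and may differ between versions.

```python
import importlib.util
import os

# File names assumed to be required by the German credit loader.
EXPECTED_FILES = ("german.data", "german.doc")


def missing_german_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(data_dir, name))]


def default_german_dir():
    """Return <aif360 package>/data/raw/german, or None if aif360 is not installed."""
    spec = importlib.util.find_spec("aif360")
    if spec is None or spec.origin is None:
        return None
    return os.path.join(os.path.dirname(spec.origin), "data", "raw", "german")


german_dir = default_german_dir()
if german_dir is None:
    print("aif360 is not installed in this environment")
else:
    print(german_dir, "missing:", missing_german_files(german_dir))
```

If the list of missing files is not empty, downloading those files from the UCI Machine Learning Repository into the printed folder should resolve the loading error described above.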