In [None]:
# Convert the target variable (Price) into categories (low price, high price)
# For example, consider prices below the median as 'Low' and above the median as 'High'

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the cleaned dataset into stock_df
stock_df = pd.read_csv('cleaned.csv')

# Create a binary target: 1 for 'High' price (above the median), 0 for 'Low' price.
# Prices exactly equal to the median are coded 0.
median_price = stock_df['Price'].median()
stock_df['Price_Bin'] = (stock_df['Price'] > median_price).astype(int)

# Prepare features (X) and target (y)
X = stock_df[['US_GDP', 'US_HousingPriceIndex', 'US_INDPRO', 'US_UNRATE', 'US_CPI']]
y = stock_df['Price_Bin']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


In [2]:
# Define the sensitive attribute
# Create a binary feature for US_UNRATE (0: low unemployment, 1: high unemployment).
# Rows at or below the median are coded 0.
median_unrate = stock_df['US_UNRATE'].median()
stock_df['Unemployment_Bin'] = (stock_df['US_UNRATE'] > median_unrate).astype(int)


In [3]:
# Install AIF360 if it is not already available (uncomment to run):
# %pip install aif360

In [4]:
# Import AIF360 libraries
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.preprocessing import Reweighing

# Convert the dataset into AIF360 BinaryLabelDataset
dataset = BinaryLabelDataset(
    df=stock_df[['US_GDP', 'US_HousingPriceIndex', 'US_INDPRO', 'US_UNRATE', 'US_CPI', 'Unemployment_Bin', 'Price_Bin']],
    label_names=['Price_Bin'],
    protected_attribute_names=['Unemployment_Bin'],  # Sensitive attribute
    favorable_label=1,  # 'High Price' is favorable
    unfavorable_label=0  # 'Low Price' is unfavorable
)

# Metrics for the dataset
metric = BinaryLabelDatasetMetric(dataset, privileged_groups=[{'Unemployment_Bin': 0}], unprivileged_groups=[{'Unemployment_Bin': 1}])

# Calculate fairness metrics
print(f"Disparate Impact: {metric.disparate_impact()}")
print(f"Statistical Parity Difference: {metric.statistical_parity_difference()}")
print(f"Consistency: {metric.consistency()}")


pip install 'aif360[AdversarialDebiasing]'
pip install 'aif360[Reductions]'
pip install 'aif360[inFairness]'


Disparate Impact: 0.3947402429568364
Statistical Parity Difference: -0.43058747816493514
Consistency: [0.93084746]




1. **Disparate Impact: 0.3947**
   - **Interpretation**: Disparate impact is the ratio of the favorable-outcome rate (here, "High Price") in the unprivileged group (high unemployment) to the same rate in the privileged group (low unemployment).
   - **Threshold**: A value of **1** indicates fairness, while a value below **0.8** (or 80%) suggests potential bias.
   - **In this case**: A disparate impact of **0.39** means that the unprivileged group (high unemployment) is **39%** as likely to receive favorable outcomes compared to the privileged group. This indicates **significant bias** against the unprivileged group.

2. **Statistical Parity Difference: -0.4306**
   - **Interpretation**: Statistical parity difference is the favorable-outcome rate of the unprivileged group minus the favorable-outcome rate of the privileged group.
   - **Threshold**: A value close to **0** suggests fairness, while a negative value indicates the unprivileged group is less likely to receive favorable outcomes.
   - **In this case**: A statistical parity difference of **-0.43** shows that the unprivileged group (high unemployment) is much less likely to receive favorable outcomes, further indicating bias.

3. **Consistency: 0.9308**
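   - **Interpretation**: Consistency measures whether similar instances carry similar labels by comparing each row's label with the labels of its nearest neighbours in feature space. A value close to **1** indicates that similar rows are labelled alike.
   - **In this case**: A consistency score of **0.93** means that rows with similar feature values mostly share the same `Price_Bin` label, which is a positive indication. No model has been trained at this point, so the score describes the dataset's labels rather than model predictions.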
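
To make the two group metrics above concrete, here is a minimal sketch that computes them by hand from per-group favorable-outcome rates. It uses plain Python lists rather than AIF360, and the function names (`favorable_rate`, `group_fairness`) and the toy data are illustrative assumptions, not part of the notebook.

```python
def favorable_rate(labels, groups, group_value, favorable=1):
    """Share of rows in one group that carry the favorable label."""
    in_group = [y for y, g in zip(labels, groups) if g == group_value]
    return sum(1 for y in in_group if y == favorable) / len(in_group)


def group_fairness(labels, groups, privileged=0, unprivileged=1):
    """Return (disparate impact, statistical parity difference)."""
    p_unpriv = favorable_rate(labels, groups, unprivileged)
    p_priv = favorable_rate(labels, groups, privileged)
    # Ratio for disparate impact, difference for statistical parity.
    return p_unpriv / p_priv, p_unpriv - p_priv


# Hypothetical toy data: label 1 = 'High' price, group 1 = high unemployment.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

di, spd = group_fairness(labels, groups)
print(f"Disparate Impact: {di:.4f}")
print(f"Statistical Parity Difference: {spd:.4f}")
```

In the toy data the privileged group has a favorable rate of 0.75 and the unprivileged group 0.25. Applying the same two definitions to the values reported above (0.3947 and -0.4306) implies favorable rates of roughly 0.28 for the high-unemployment group and 0.71 for the low-unemployment group.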

### Summary:
- **Bias Detected**: The **disparate impact** and **statistical parity difference** values indicate significant bias against observations in the unprivileged group (high unemployment). The unprivileged group has a much lower chance of receiving the favorable outcome.
- **Consistency**: The dataset shows good consistency, meaning that similar data points carry similar labels. However, this consistency doesn't necessarily mean fairness, as the labels may be consistently biased.
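
The consistency score can be illustrated with a small nearest-neighbour sketch. This is a simplified pure-Python version written under the assumption that the metric is one minus the mean absolute gap between each row's label and the average label of its `n_neighbors` nearest rows (the row itself included). AIF360's own implementation relies on a scikit-learn neighbour search, so treat the function below as an approximation for intuition only.

```python
import math


def consistency(features, labels, n_neighbors=5):
    """1 - mean |y_i - average label of the n_neighbors rows nearest to x_i|."""
    total = 0.0
    for x_i, y_i in zip(features, labels):
        # Rank every row by Euclidean distance to x_i. The row itself has
        # distance 0, so it counts as one of its own neighbours.
        order = sorted(range(len(features)),
                       key=lambda j: math.dist(x_i, features[j]))
        neighbour_labels = [labels[j] for j in order[:n_neighbors]]
        total += abs(y_i - sum(neighbour_labels) / len(neighbour_labels))
    return 1.0 - total / len(features)


# Hypothetical toy data: two tight clusters in one dimension.
features = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
clean_labels = [0, 0, 0, 1, 1, 1]   # each cluster shares one label
mixed_labels = [0, 1, 0, 1, 0, 1]   # labels disagree inside each cluster

print(consistency(features, clean_labels, n_neighbors=3))
print(consistency(features, mixed_labels, n_neighbors=3))
```

The first call scores higher than the second because every row agrees with its neighbours. Note that the score says nothing about group membership: a dataset can be perfectly consistent while one group still receives the favorable label far less often.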

### Next Steps:
1. **Bias Mitigation**: Mitigate the detected bias with a **pre-processing** technique such as **reweighing**, or by **post-processing** the model's predictions, to obtain fairer outcomes.
2. **Reevaluate**: After mitigation, reevaluate these fairness metrics to see if the disparate impact and statistical parity difference have improved.



In [14]:
# Apply Reweighing to mitigate bias
reweighing = Reweighing(unprivileged_groups=[{'Unemployment_Bin': 1}], privileged_groups=[{'Unemployment_Bin': 0}])
dataset_transf = reweighing.fit_transform(dataset)

# Check metrics after reweighing
metric_transf = BinaryLabelDatasetMetric(dataset_transf, privileged_groups=[{'Unemployment_Bin': 0}], unprivileged_groups=[{'Unemployment_Bin': 1}])
print(f"Disparate Impact after Reweighing: {metric_transf.disparate_impact()}")


Disparate Impact after Reweighing: 1.0000000000000004
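
A disparate impact of 1 after reweighing is expected by construction. The sketch below shows why, assuming the standard reweighing rule in which each row is weighted by P(group) * P(label) / P(group, label). It is a pure-Python illustration with hypothetical helper names and toy data, not AIF360's implementation; in the notebook the feature values and labels stay unchanged and only the instance weights differ.

```python
from collections import Counter


def reweighing_weights(labels, groups):
    """Weight for each row: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    n_group = Counter(groups)
    n_label = Counter(labels)
    n_joint = Counter(zip(groups, labels))
    return [n_group[g] * n_label[y] / (n * n_joint[(g, y)])
            for g, y in zip(groups, labels)]


def weighted_favorable_rate(labels, groups, weights, group_value, favorable=1):
    """Weighted share of favorable labels within one group."""
    in_group = [(y, w) for y, g, w in zip(labels, groups, weights)
                if g == group_value]
    favorable_weight = sum(w for y, w in in_group if y == favorable)
    return favorable_weight / sum(w for _, w in in_group)


# Hypothetical toy data: label 1 = 'High' price, group 1 = high unemployment.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

weights = reweighing_weights(labels, groups)
rate_unpriv = weighted_favorable_rate(labels, groups, weights, 1)
rate_priv = weighted_favorable_rate(labels, groups, weights, 0)
di_after = rate_unpriv / rate_priv
print(f"Disparate Impact after reweighing (toy data): {di_after}")
```

With these weights, the weighted favorable rate in each group equals the overall favorable rate, so the ratio is 1 up to floating-point rounding, which would also explain the trailing digits in the value printed above. The result only shows that the weighted training data is balanced; a model trained with these weights still needs its own fairness evaluation.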
