# Welcome to the Bias-Athon 2025!

This notebook introduces the learning objectives, prepares the datasets, and provides an overview of the workshop.


## Learning Objectives
 The workshop aims to:
 
 - Explore the impact of biased data on model performance.

 - Investigate concept shift and covariate shift to understand model drift and its implications.

 - Apply debiasing techniques to mitigate identified biases.

 - Use and evaluate a simple neural network on biased and unbiased datasets.

 - Learn basic deployment considerations for monitoring and handling model degradation.


## Schedule (2 Hours)

 ### Hour 1: Data Preparation and Visualization (Notebook 1)
 
 - Load and preprocess the WiDS dataset.
 
 - Introduce biased datasets:
   - Increase SpO2 values for Black patients by 10 percentage points (capped at 100%).
   - Drop serum lactate measurements for Black patients.
   - Apply a combination of the above biases.
 
 - Visualize the datasets:
   - Explore feature distributions before and after introducing biases.
   - Generate a "Table One" summarizing key variables across datasets.
   - Highlight potential bias and fairness issues.
 
 ### Hour 2: Model Building and Bias Evaluation (Notebook 2)
 
 - Load a simple neural network and a decision tree trained on the original dataset (walk through the accompanying Model Card; refer to the PDF as an example of how to create one).

 - Evaluate their performance across demographic groups (e.g., by race/ethnicity).
 
 - Load the models trained on one of the biased datasets and compare results:
   - Highlight performance differences and fairness metrics.
 
 - Discuss concept drift by redefining mortality (e.g., changing the target threshold or criteria).
 
 - Apply a debiasing technique to mitigate bias (e.g., reweighting by demographic group prevalence).
 
 - Discuss how to monitor model drift in deployment settings.

 ## Materials

 - **WiDS dataset** - Download the dataset ("training_v2.csv") [here](https://www.kaggle.com/competitions/widsdatathon2020/data).

 - **Data Dictionary** - Refer to the provided documentation for variable definitions.

 - **Bias-Athon GitHub Repository** - Clone the repository for all notebooks and datasets.


## Dataset Preparation

### Step 1: Import Libraries

In [None]:
import pandas as pd
import numpy as np
import os
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")

### Step 2: Load Data

In [None]:

data = pd.read_csv("data/training_v2.csv")
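
Before introducing any bias, it can help to confirm that the loaded frame actually contains the columns the next step relies on. The sketch below is one way to do that. `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names introduced here for illustration, and the toy frame merely stands in for `training_v2.csv` so that the cell runs without the file.

```python
import pandas as pd

# Columns used by the bias-injection step in this notebook.
REQUIRED_COLUMNS = ["ethnicity", "d1_spo2_min", "d1_lactate_max"]


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Toy frame standing in for the real data; the values are made up for illustration.
toy = pd.DataFrame({
    "ethnicity": ["African American", "Caucasian", None],
    "d1_spo2_min": [92.0, 95.0, 88.0],
})

# Lists any required column that is not present in the frame.
print(missing_columns(toy))

# Group sizes, including rows where ethnicity is missing.
print(toy["ethnicity"].value_counts(dropna=False))
```

On the real data, calling `missing_columns(data)` should return an empty list if the download matches the data dictionary.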

### Step 3: Create Biased Datasets


In [None]:
# Increase SpO2 values for Black patients by 10 percentage points, capped at 100
delta_to_add = 10
is_black = data['ethnicity'] == 'African American'
data['d1_spo2_min_new'] = np.where(
    is_black,
    # clip() preserves NaN, so a missing SpO2 stays missing rather than becoming 100
    (data['d1_spo2_min'] + delta_to_add).clip(upper=100),
    data['d1_spo2_min']
)

# Drop serum lactate measurements for Black patients (set them to NaN)
data['d1_lactate_max_new'] = data['d1_lactate_max'].mask(is_black)

# Combine both biases
data['d1_spo2_min_combined'] = data['d1_spo2_min_new']
data['d1_lactate_max_combined'] = data['d1_lactate_max_new']
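
A quick per-group summary is a reasonable way to check that the biased columns behave as intended: the mean SpO2 should rise only for the affected group, and the lactate missing fraction for that group should be 1.0. The following is a minimal sketch under that assumption. `summarize_by_group` is a hypothetical helper, and the toy frame uses made-up values in place of the real data.

```python
import numpy as np
import pandas as pd


def summarize_by_group(df, group_col, value_cols):
    """Per-group mean and missing fraction for each value column (hypothetical helper)."""
    means = df.groupby(group_col)[value_cols].mean().add_suffix("_mean")
    # isna() gives booleans; their per-group mean is the fraction of missing values.
    missing = df[value_cols].isna().groupby(df[group_col]).mean().add_suffix("_missing")
    return pd.concat([means, missing], axis=1)


# Toy frame with made-up values, mimicking the original and biased columns.
toy = pd.DataFrame({
    "ethnicity": ["African American", "African American", "Caucasian", "Caucasian"],
    "d1_spo2_min": [85.0, 95.0, 90.0, 96.0],
    "d1_spo2_min_new": [95.0, 100.0, 90.0, 96.0],
    "d1_lactate_max": [2.0, np.nan, 1.5, 3.0],
    "d1_lactate_max_new": [np.nan, np.nan, 1.5, 3.0],
})

summary = summarize_by_group(
    toy,
    "ethnicity",
    ["d1_spo2_min", "d1_spo2_min_new", "d1_lactate_max", "d1_lactate_max_new"],
)
print(summary.T)  # transpose so each group is a column
```

On the workshop data, the same call with `data` in place of `toy` gives a first, rough view of what the later "Table One" will show in more detail.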