## Professor Smith's Disease X theory

Professor Smith is a researcher at Imaginary Hospital. She has noticed that some of her patients are contracting a mysterious disease. She has called the disease 'X' for now.

She has collected data from some of her patients and used it as an input for an AI model that predicts whether the patient has the disease.

First, let's load the data that she has collected.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

%pip install openpyxl
%pip install imbalanced-learn
%pip install scikit-learn

In [None]:
patient_data = pd.read_excel('subject_data_FIFAI.xlsx')
print(patient_data)

# # If reading the Excel file fails, try:
# patient_data = pd.read_csv('subject_data_FIFAI.csv')
# print(patient_data)

#### The spreadsheet contains the following data about the patients:

- race (Black or White)
- sex (Male or Female)
- age (years)
- Body Mass Index (BMI)
- resting heart rate (HR)
- the ground truth (real) label of whether the patient has Disease X (GT)
- the prediction from an AI model (Predicted)

For disease labels, 0 means the patient doesn't have the disease and 1 means they do have the disease.

#### Let's calculate how accurate the predictions from the model are

$$
Accuracy = \frac{TP + TN}{P + N} = \frac{\text{Number of correct predictions}}{\text{Total}}
$$
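As a quick sanity check of the formula, here is a minimal sketch on eight made-up labels. The arrays `ground_truth` and `predicted` are invented for illustration and are not taken from the spreadsheet.

```python
import numpy as np

# Made-up labels for eight hypothetical patients (1 = has the disease, 0 = does not)
ground_truth = np.array([1, 0, 1, 1, 0, 0, 1, 0])
predicted = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# A prediction is correct when it matches the ground truth
number_correct = np.sum(ground_truth == predicted)

# Accuracy = number of correct predictions / total number of patients
toy_accuracy = number_correct / len(ground_truth)

print('Toy accuracy: {:.1f}%'.format(toy_accuracy * 100))
```

Six of the eight toy predictions match the ground truth, so the toy accuracy is 75%.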

In [None]:

# This is the number of correct predictions
number_correct_predictions = len(np.where(patient_data.GT.values == patient_data.Predicted.values)[0])

# Find the total number of patients
total = len(patient_data)

# Use the number of correct predictions and the total number of patients to find the accuracy
accuracy = number_correct_predictions/ total

print('The overall accuracy is {}%'.format(accuracy*100))


<div class="alert alert-block alert-info">
<b>What do you think of the overall prediction accuracy?</b> 
</div>

#### We can also calculate the overall sensitivity and specificity for the predictions using these equations:

$$
Sensitivity = \frac{TP}{TP + FN}
$$

$$
Specificity = \frac{TN}{TN + FP}
$$
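To see how the two equations behave, here is a small sketch on ten made-up labels. The arrays are invented for illustration only; the variable names are hypothetical and chosen so they do not clash with the notebook's own.

```python
import numpy as np

# Made-up labels: four hypothetical patients with the disease, six without
ground_truth = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
predicted = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

# Count each cell of the confusion matrix
tp = np.sum((ground_truth == 1) & (predicted == 1))  # disease correctly detected
fn = np.sum((ground_truth == 1) & (predicted == 0))  # disease missed
tn = np.sum((ground_truth == 0) & (predicted == 0))  # healthy correctly identified
fp = np.sum((ground_truth == 0) & (predicted == 1))  # healthy flagged as diseased

toy_sensitivity = tp / (tp + fn)
toy_specificity = tn / (tn + fp)

print('Toy sensitivity: {:.1f}%'.format(toy_sensitivity * 100))
print('Toy specificity: {:.1f}%'.format(toy_specificity * 100))
```

Sensitivity only looks at the patients who really have the disease, and specificity only at those who do not, so the two can differ a lot even when the overall accuracy looks reasonable.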

In [None]:
# Using TP as an example, find the TN, FP and FN

TP= len(np.where((patient_data.GT.values == 1) & (patient_data.Predicted.values==1))[0])
TN= len(np.where((patient_data.GT.values == 0) & (patient_data.Predicted.values==0))[0])
FP= len(np.where((patient_data.GT.values == 0) & (patient_data.Predicted.values==1))[0])
FN= len(np.where((patient_data.GT.values == 1) & (patient_data.Predicted.values==0))[0])


# Use the TP, FN, TN, and FP to find the sensitivity and specificity

sensitivity = TP/(TP+FN)
specificity = TN/(TN+FP)

print('The overall sensitivity is {}%'.format(sensitivity*100))
print('The overall specificity is {}%'.format(specificity*100))


<div class="alert alert-block alert-info">
<b>How do you interpret these results? </b>
</div>

### Professor Smith has noticed that some of her patients catch the disease more frequently than others.

Now let's work out what kinds of patients we have so we can investigate. Find the number of black, white, male and female patients in the data.

In [None]:
no_subjects_race = patient_data['Race'].value_counts()
print(no_subjects_race)

no_subjects_sex = patient_data['Sex'].value_counts()
print(no_subjects_sex)

#### Professor Smith thinks the difference in her patients getting the disease may be because some of her patients are healthier than others. 

The difference may be in the <span style='color:Blue'> male and female  </span>subjects. Let's calculate the accuracy for each of those groups.
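As an aside, pandas can compute the accuracy of every group in one step with `groupby`. The sketch below uses a tiny made-up table with the same column names as the spreadsheet (`Sex`, `GT`, `Predicted`); the values, and the helper column `Correct`, are invented for illustration.

```python
import pandas as pd

# Made-up data for eight hypothetical patients
toy = pd.DataFrame({
    'Sex': ['F', 'F', 'F', 'F', 'M', 'M', 'M', 'M'],
    'GT': [1, 0, 1, 0, 1, 0, 1, 0],
    'Predicted': [1, 0, 1, 1, 0, 0, 0, 1],
})

# True where the prediction matches the ground truth
toy['Correct'] = toy['GT'] == toy['Predicted']

# The mean of a True/False column is the fraction of True values, i.e. the accuracy
accuracy_by_sex = toy.groupby('Sex')['Correct'].mean()

print(accuracy_by_sex * 100)
```

This gives the same numbers as filtering each group by hand, and it extends to other columns such as `Race` by changing the name passed to `groupby`.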

In [None]:
# Using the female subjects as an example, find the number of male subjects

female_subjects = patient_data.where(patient_data['Sex']=='F').dropna()
male_subjects = patient_data.where(patient_data['Sex']=='M').dropna()


# This is very similar to the first question! Find the number of correct predictions
# (where the GT is equal to the predicted value)
correct_predictions_female = len(np.where(female_subjects.GT.values == female_subjects.Predicted.values)[0])

accuracy = correct_predictions_female/ len(female_subjects)

print('The accuracy for females is {:.1f}%'.format(accuracy*100))


# Do the same for the male subjects
correct_predictions_male = len(np.where(male_subjects.GT.values == male_subjects.Predicted.values)[0])

accuracy = correct_predictions_male/ len(male_subjects)

print('The accuracy for males is {:.1f}%'.format(accuracy*100))



#### The difference may instead be between <span style='color:Blue'> old and young </span> patients. Let's consider the patients over 50 the old patients and those below 50 the young patients.
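When splitting on a threshold, it is worth deciding which group a patient aged exactly 50 belongs to, otherwise they fall into neither group. The sketch below uses `pd.cut` on made-up ages and assumes that 50-year-olds count as young; that choice, the column name `AgeGroup`, and the ages themselves are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Made-up ages for five hypothetical patients
toy = pd.DataFrame({'Age': [23, 49, 50, 51, 78]})

# Bins are right-closed by default: (0, 50] is 'young' and (50, inf] is 'old'
toy['AgeGroup'] = pd.cut(toy['Age'], bins=[0, 50, np.inf], labels=['young', 'old'])

print(toy)
```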

In [None]:
# Find the old and young subjects
# Patients aged exactly 50 are counted as young here so that nobody is dropped
old_subjects = patient_data.where(patient_data['Age']>50).dropna()
young_subjects = patient_data.where(patient_data['Age']<=50).dropna()

# Find the number of correct predictions for the old subjects
correct_predictions_old = len(np.where(old_subjects.GT.values == old_subjects.Predicted.values)[0])

accuracy = correct_predictions_old/ len(old_subjects)

print('The accuracy for old subjects is {:.1f}%'.format(accuracy*100))


# Find the number of correct predictions for the young subjects
correct_predictions_young = len(np.where(young_subjects.GT.values == young_subjects.Predicted.values)[0])

accuracy = correct_predictions_young/ len(young_subjects)

print('The accuracy for young subjects is {:.1f}%'.format(accuracy*100))


#### The difference may instead be between the <span style='color:Blue'> Black and White  </span> subjects. Calculate the accuracy for those groups

In [None]:
# Find the black and white subjects
black_subjects = patient_data.where(patient_data['Race']=='B').dropna()
white_subjects = patient_data.where(patient_data['Race']=='W').dropna()

# Find the number of correct predictions for the black subjects
correct_predictions_black = len(np.where(black_subjects.GT.values == black_subjects.Predicted.values)[0])

accuracy = correct_predictions_black/ len(black_subjects)

print('The accuracy for black subjects is {:.1f}%'.format(accuracy*100))


# Find the number of correct predictions for the white subjects
correct_predictions_white = len(np.where(white_subjects.GT.values == white_subjects.Predicted.values)[0])

accuracy = correct_predictions_white/ len(white_subjects)

print('The accuracy for white subjects is {:.1f}%'.format(accuracy*100))

<div class="alert alert-block alert-info">
<b>How do you interpret these differences in accuracy for the groups? </b>
</div>

<div class="alert alert-block alert-warning">
<b>What advice would you give to Professor Smith to improve the predictions? </b>
</div>

We can also calculate different <span style='color:green'> **fairness metrics** </span> for the subgroups. Below are the equations for different fairness metrics covered in the slides:

1. Equal opportunity: equal chance of positive classes being assigned positive predictions for each group
$$
Sensitivity = TPR = \frac{TP}{TP + FN}
$$


2. Equalised odds: equal true positive and false positive rates for each group
$$
Sensitivity = TPR = \frac{TP}{TP + FN}
$$

$$
FPR = 1 - Specificity = 1- \frac{TN}{TN + FP}
$$


3. Demographic parity: equal chance of positive predictions for each group
$$
PPR = \frac{TP +FP}{TP + FP + TN + FN}
$$

Can you calculate and compare each of these metrics for the black and white subgroups?
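Since the same three quantities are needed for every subgroup, one option is to wrap them in a small function. The sketch below is one way to do that; the function name `fairness_metrics` and the two toy groups are hypothetical and not part of the exercise data.

```python
import numpy as np

def fairness_metrics(gt, pred):
    """Return the TPR, FPR and PPR for one group of subjects."""
    gt = np.asarray(gt)
    pred = np.asarray(pred)
    tp = np.sum((gt == 1) & (pred == 1))
    tn = np.sum((gt == 0) & (pred == 0))
    fp = np.sum((gt == 0) & (pred == 1))
    fn = np.sum((gt == 1) & (pred == 0))
    return {
        'TPR': tp / (tp + fn),                   # sensitivity (equal opportunity)
        'FPR': fp / (fp + tn),                   # 1 - specificity (equalised odds)
        'PPR': (tp + fp) / (tp + fp + tn + fn),  # positive prediction rate (demographic parity)
    }

# Two made-up groups with identical ground truth but different predictions
metrics_a = fairness_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0])
metrics_b = fairness_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0])

for name in ['TPR', 'FPR', 'PPR']:
    print('{}: group A {:.1f}%, group B {:.1f}%'.format(name, metrics_a[name] * 100, metrics_b[name] * 100))
```

In this toy case both groups have the same disease rate, yet group B's cases are detected far less often, so equal opportunity is clearly violated.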

In [None]:
# Find the TP, TN, FP and FN for each of the subgroups
TP_black = len(np.where((black_subjects.GT.values == 1) & (black_subjects.Predicted.values==1))[0])
TN_black = len(np.where((black_subjects.GT.values == 0) & (black_subjects.Predicted.values==0))[0])
FP_black = len(np.where((black_subjects.GT.values == 0) & (black_subjects.Predicted.values==1))[0])
FN_black = len(np.where((black_subjects.GT.values == 1) & (black_subjects.Predicted.values==0))[0])

# Use the TP, TN, FP and FN to calculate the fairness metrics for the black subjects
sensitivity_black = TP_black/(TP_black+FN_black)
specificity_black = TN_black/(TN_black+FP_black)
FPR_black = 1 - specificity_black
PPR_black = (TP_black + FP_black)/(TP_black+FN_black+TN_black+FP_black)

# Repeat for the white subjects
TP_white = len(np.where((white_subjects.GT.values == 1) & (white_subjects.Predicted.values==1))[0])
TN_white = len(np.where((white_subjects.GT.values == 0) & (white_subjects.Predicted.values==0))[0])
FP_white = len(np.where((white_subjects.GT.values == 0) & (white_subjects.Predicted.values==1))[0])
FN_white = len(np.where((white_subjects.GT.values == 1) & (white_subjects.Predicted.values==0))[0])

# Use the TP, TN, FP and FN to calculate the fairness metrics for the white subjects
sensitivity_white = TP_white /(TP_white +FN_white)
specificity_white  = TN_white /(TN_white +FP_white )
FPR_white  = 1 - specificity_white
PPR_white  = (TP_white  + FP_white )/(TP_white +FN_white +TN_white +FP_white)

print('Equal opportunity: Sensitivity for black subjects is {:.1f}% and {:.1f}% for white subjects'.format(sensitivity_black*100, sensitivity_white*100))
print('Equalised odds: FPR for black subjects is {:.1f}% and {:.1f}% for white subjects'.format(FPR_black*100, FPR_white*100))
print('Demographic parity: PPR for black subjects is {:.1f}% and {:.1f}% for white subjects'.format(PPR_black*100, PPR_white*100))




<details>

<summary>Bonus exercise: how healthy are the patients?</summary>

### How healthy are these patients?

It might be useful to find out whether the groups have different levels of health. Let's find the mean values of the BMI and HR for the black, white, male and female subjects.

The average BMI can be calculated using the NumPy mean function:
```python
average = np.mean(array)
```

Your code should look something like this:
```python
    
BMI_black = np.mean(...)
BMI_white = np.mean(...)
BMI_female = np.mean(...)
BMI_male = np.mean(...)

print('The average BMI for black subjects is {}'.format(...))
print('The average BMI for white subjects is {}'.format(...))
print('The average BMI for male subjects is {}'.format(...))
print('The average BMI for female subjects is {}'.format(...))
```
</details>

### Can we improve the model?

Professor Smith wants to develop a more accurate method of diagnosing the disease. She wants to test another AI model to see whether it can predict if the patients have disease X. 
    
Below is some code to run a model called a Ridge Classifier (you don't need to know the details but you can read about it [here](https://www.geeksforgeeks.org/ridge-classifier/) if you're interested). 
    
First, let's split the data into training and testing sets using [scikit-learn's train_test_split function](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html). You should split the data so that the proportion of Black and White subjects remains the same in both sets.
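To illustrate what the `stratify` argument does, here is a minimal sketch on a made-up table of 80 White and 20 Black subjects. The data, the variable names and the fixed `random_state` are assumptions made so the example is reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up dataset: 80 'W' and 20 'B' subjects with alternating disease labels
toy = pd.DataFrame({
    'Race': ['W'] * 80 + ['B'] * 20,
    'GT': [0, 1] * 50,
})

# stratify=toy['Race'] keeps the 80/20 race split in both subsets
toy_train, toy_test = train_test_split(
    toy, test_size=0.2, shuffle=True, stratify=toy['Race'], random_state=0
)

print(toy_train['Race'].value_counts())
print(toy_test['Race'].value_counts())
```

Without `stratify`, a small test set could by chance contain very few subjects from the smaller group, which would make the per-group accuracy unreliable.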

In [None]:
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split

# Split the data using the train_test_split function
X_train, X_test, y_train, y_test = train_test_split(patient_data, patient_data['GT'], test_size=0.2, shuffle=True, stratify = patient_data['Race'])

print('\nTraining set:\n {}'.format(X_train['Race'].value_counts()))
print('\nTest set\n {}'.format(X_test['Race'].value_counts()))

print('\nThere are {} disease positive subjects and {} disease negative in the training set'.format(len(np.where(y_train == 1)[0]), len(np.where(y_train == 0)[0])))
print('\nThere are {} disease positive subjects and {} disease negative in the test set'.format(len(np.where(y_test == 1)[0]), len(np.where(y_test == 0)[0])))


####  Now let's train the model! We can use any scikit-learn classifier, but let's start with the Ridge Classifier

Make sure you have included all the measures of health (Age, BMI and HR)

In [None]:
#Train the model
model = RidgeClassifier(alpha=0.1).fit(X_train[['Age', 'HR', 'BMI']], y_train)

#Make some predictions based on the test set
predictions = model.predict(X_test[['Age', 'HR', 'BMI']])

#Find the correct predictions
correct_predictions = len(np.where(y_test == predictions)[0])

#Calculate the accuracy
accuracy = correct_predictions/ len(X_test)

print('The overall accuracy is {}%'.format(accuracy*100))

#### Let's compare accuracy for Black and White subjects

In [None]:
# Select the Black and White subjects from the test set

black_subjects = X_test.where(X_test['Race']=='B').dropna()
white_subjects = X_test.where(X_test['Race']=='W').dropna()

#Make some predictions based on the Black test data
predictions_black = model.predict(black_subjects[['Age', 'HR', 'BMI']])

#Find the correct predictions
correct_predictions_black = len(np.where(black_subjects['GT'] == predictions_black)[0])
                                         
#Calculate the accuracy
accuracy = correct_predictions_black/ len(black_subjects)

print('The overall accuracy for Black subjects is {}%'.format(accuracy*100))

#Make some predictions based on the White test data
predictions_white = model.predict(white_subjects[['Age', 'HR', 'BMI']])

#Find the correct predictions
correct_predictions_white  = len(np.where(white_subjects['GT'] == predictions_white )[0])
                                         
#Calculate the accuracy
accuracy = correct_predictions_white / len(white_subjects)

print('The overall accuracy for White subjects is {}%'.format(accuracy*100))

####  We should also test the model on an external dataset. This will help us check whether the model is overfitting to the training data.

In [None]:
# Test on external test set
external_test = pd.read_excel('subject_data_external.xlsx')
# external_test = pd.read_csv('subject_data_external.csv')

#Make some predictions based on the patient data
external_predictions = model.predict(external_test[['Age', 'HR', 'BMI']])

#Find the correct predictions
correct_predictions = len(np.where(external_test['GT'] == external_predictions)[0])
                                         
#Calculate the accuracy
accuracy = correct_predictions/ len(external_test)

print('The overall accuracy is {}%'.format(accuracy*100))

#### Let's compare accuracy for Black and White subjects

In [None]:
black_subjects_external = external_test.where(external_test['Race']=='B').dropna()
white_subjects_external = external_test.where(external_test['Race']=='W').dropna()

#Make some predictions based on the patient data
external_predictions_black = model.predict(black_subjects_external[['Age', 'HR', 'BMI']])

#Find the correct predictions
correct_predictions_black = len(np.where(black_subjects_external['GT'] == external_predictions_black)[0])
                                         
#Calculate the accuracy
accuracy = correct_predictions_black/ len(black_subjects_external)

print('The overall accuracy for Black subjects is {}%'.format(accuracy*100))

#Make some predictions based on the patient data
external_predictions_white = model.predict(white_subjects_external[['Age', 'HR', 'BMI']])

#Find the correct predictions
correct_predictions_white  = len(np.where(white_subjects_external['GT'] == external_predictions_white)[0])
                                         
#Calculate the accuracy
accuracy = correct_predictions_white / len(white_subjects_external)

print('The overall accuracy for White subjects is {}%'.format(accuracy*100))

<div class="alert alert-block alert-success">
<b>What do you think about the accuracy on the test set and the external test set?</b>
</div>

####  Balancing the data

Let's try to balance the number of Black and White subjects in the dataset. We can do this with the SMOTE method, which is available in the `imblearn` package.

You should balance the dataset by race, not by disease label.
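For intuition about what SMOTE does, the sketch below shows its core idea in plain NumPy: each synthetic sample is placed at a random point on the line between a minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the `imblearn` implementation; the function name `smote_like_oversample`, its parameters, and the random toy data are all hypothetical.

```python
import numpy as np

def smote_like_oversample(X_minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)

    # Pairwise Euclidean distances between all minority samples
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # k nearest neighbours of each sample; column 0 is skipped because it is
    # the sample itself (assuming there are no duplicate rows)
    neighbours = np.argsort(dists, axis=1)[:, 1:k + 1]

    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_minority))      # pick a random minority sample
        neighbour = rng.choice(neighbours[base])  # pick one of its neighbours
        gap = rng.random()                        # position along the line, in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[neighbour] - X_minority[base])
    return synthetic

# Made-up minority group: 10 subjects with three features (think Age, HR, BMI)
toy_rng = np.random.default_rng(0)
minority = toy_rng.normal(loc=[40.0, 70.0, 25.0], scale=[10.0, 8.0, 3.0], size=(10, 3))

synthetic = smote_like_oversample(minority, n_new=30)
print(synthetic.shape)
```

Because every synthetic row is a blend of two real rows, the new samples stay within the range of the original minority data rather than being exact copies.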

In [None]:

from imblearn.over_sampling import SMOTE
from collections import Counter

# Create an instance of SMOTE
oversample = SMOTE()

# Use the method to oversample the black subjects
# HINT: you may need to use the .map() function from pandas to convert the race labels to integers
X, y = oversample.fit_resample(patient_data.drop(columns=['Race', 'Sex']), patient_data['Race'].map({'W': 1, 'B': 0}) )

counter = Counter(y)
# The race labels were mapped as W -> 1 and B -> 0
print(f'White subjects: {counter[1]} and Black subjects: {counter[0]}')

# Split the data into training and testing sets again
X_train, X_test, y_train, y_test = train_test_split(X[['Age', 'HR', 'BMI']], X['GT'], test_size=0.2, shuffle=True, stratify = y)

print('\nThere are {} disease positive subjects and {} disease negative in the training set'.format(len(np.where(y_train == 1)[0]), len(np.where(y_train == 0)[0])))
print('\nThere are {} disease positive subjects and {} disease negative in the test set'.format(len(np.where(y_test == 1)[0]), len(np.where(y_test == 0)[0])))



####  Let's retrain the model to see how balancing the proportions of races in the dataset has affected performance.

In [None]:
# Re-train the model
model = RidgeClassifier(alpha=0.1).fit(X_train, y_train)

#Make some predictions based on the new (resampled) test set
predictions = model.predict(X_test)

#Find the correct predictions
correct_predictions = len(np.where(y_test == predictions)[0])

#Calculate the accuracy
accuracy = correct_predictions/ len(X_test)

print('The overall accuracy is {}%'.format(accuracy*100))


Test on the external test set

In [None]:

#Make some predictions based on the external dataset
predictions = model.predict(external_test[['Age', 'HR', 'BMI']])

#Find the correct predictions
correct_predictions = len(np.where(external_test['GT'] == predictions)[0])
                                         
#Calculate the accuracy
accuracy = correct_predictions/ len(external_test)

print('The overall accuracy is {}%'.format(accuracy*100))

#### Let's compare accuracy for Black and White subjects for the external data again

In [None]:
black_subjects_external = external_test.where(external_test['Race']=='B').dropna()
white_subjects_external = external_test.where(external_test['Race']=='W').dropna()

#Make some predictions based on the patient data
predictions_black = model.predict(black_subjects_external[['Age', 'HR', 'BMI']])

#Find the correct predictions
correct_predictions_black = len(np.where(black_subjects_external['GT'] == predictions_black)[0])
                                         
#Calculate the accuracy
accuracy = correct_predictions_black/ len(black_subjects_external)

print('The overall accuracy for Black subjects is {}%'.format(accuracy*100))

#Make some predictions based on the patient data
predictions = model.predict(white_subjects_external[['Age', 'HR', 'BMI']])

#Find the correct predictions
correct_predictions = len(np.where(white_subjects_external['GT'] == predictions)[0])
                                         
#Calculate the accuracy
accuracy = correct_predictions/ len(white_subjects_external)

print('The overall accuracy for White subjects is {}%'.format(accuracy*100))