# ETHICS & EXPLAINABLE AI - DEEP DIVE INTO ML EXPLAINABILITY IN HEALTH

# CONTENTS
- Background
- The Problem
- Data Collection
- Model Training
- Conclusion

# BACKGROUND

*Good Medical Practice*, at its core, requires every doctor to demonstrate respect for human life.

The four principles of biomedical ethics (1) are:
1. **Beneficence** - to do good and to act in the best interests of the patient.
2. **Non-maleficence** - to do no harm.
3. **Autonomy** - provided they have capacity, patients have the right to make their own decisions (to accept, refuse or choose a treatment proposed by the clinician).
4. **Justice** - patients must be treated fairly with respect to the allocation of scarce resources.

As the foundation of ethical reasoning in healthcare, these core principles have had a profound impact on physician decision-making. They are also useful as initial guiding principles for the development and deployment of AI technologies in health.

As clinicians, we are individually responsible and accountable for our practice and, if required, must justify our decisions and actions.

We weigh competing values (for example, benefit versus harm to patients' lives, or people's time). Making the right decision in the patient's best interests is therefore not always clear-cut.

## ML in Healthcare
Machine learning (ML) is a set of techniques that allow computers to learn from different types of data, identify patterns, and make or improve predictions. Black box models (models whose behaviour cannot be understood by inspecting their parameters) are increasingly used in healthcare because they can capture the complexities of health-specific data more accurately and so make more accurate predictions.
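
To make the distinction concrete, here is a minimal sketch on synthetic data, assuming scikit-learn and using hypothetical feature names. A shallow decision tree can be printed as readable rules with `export_text`, whereas a small neural network's fitted parameters are just weight matrices.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data for illustration only (not the NKI dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
feature_names = ["feature_a", "feature_b", "feature_c"]  # hypothetical names

# Transparent model: the fitted tree can be printed as if/else rules.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=feature_names)
print(rules)

# Black box model: the fitted parameters are weight matrices with no
# direct human-readable meaning.
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000, random_state=0).fit(X, y)
n_weights = sum(w.size for w in mlp.coefs_)
print(f"MLP: {len(mlp.coefs_)} weight matrices, {n_weights} weights in total")
```

Even this tiny network has 320 weights (3x16 + 16x16 + 16x1), none of which maps to a rule a clinician could check.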

This project aims to provide tools that can be used to decipher black box models (including deep learning algorithms), so that clinicians are supported and augmented by these models without being unduly influenced by them.

In this context, explainability is the degree to which a human can consistently predict the model's result, which is only possible when the model's behaviour is easily understood. Explainability is essential for trust in this proposed human-machine partnership, and for empowering patients to take part in joint decisions about care that is in their best interests.

# THE PROBLEM

The project focused on exploring some of the methods currently used to decipher such black box models: Feature Importance Plots, Partial Dependence Plots and SHAP values.

The problem was framed as:

- **A supervised classification problem**, with the goal of predicting the target variable (patient outcome) present in the original data. Three algorithms (Decision Tree, Random Forest and a Deep Neural Network) were trained to perform this task.
- **An explainability problem**, with the goal of further deciphering the decisions made by the black box model. Three model-agnostic explainability techniques were used: Feature Importance Plots (FIP), Partial Dependence Plots (PDP) and SHAP values. FIPs and PDPs describe average (global) model behaviour, while SHAP values explain individual predictions from the black box model.

# DATA COLLECTION

The publicly accessible datasets used in this project were obtained from https://data.world/deviramanan2016/nki-breast-cancer-data and contain anonymised data from 272 patients with breast cancer, including genomic information, treatments received (chemotherapy, hormonal therapy and/or surgery) and patient outcome.

Some of the features recorded in the datasets are:

- **AGE** (of each patient)
- **TREATMENT TYPE** (chemotherapy, hormonal therapy and/or surgery)
- **DIAM** (tumour size)
- **POSNODES** (number of positive i.e. affected lymph nodes)
- **ANGIOINV** (number of blood vessels infiltrated by the tumour)
- **TIMERECURRENCE** (time taken for the tumour to recur post treatment)
- **GRADE** (tumour grade)
- **HISTTYPE** (histopathology tumour type)
- **EVENTDEATH** (patient outcome)
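
A minimal sketch of separating these features from the target with pandas. The lower-case column names are assumptions about how the CSV might label them, and the values are random stand-ins, not patient data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the NKI table: column names are assumed and
# every value is random, so nothing here reflects real patients.
rng = np.random.default_rng(42)
n = 272
df = pd.DataFrame({
    "age": rng.integers(25, 55, n),
    "diam": rng.integers(2, 50, n),
    "posnodes": rng.integers(0, 13, n),
    "angioinv": rng.integers(1, 4, n),
    "timerecurrence": rng.uniform(0, 18, n).round(2),
    "grade": rng.integers(1, 4, n),
    "histtype": rng.integers(1, 8, n),
    "eventdeath": rng.integers(0, 2, n),
})

# Separate the inputs from the target (patient outcome).
target = "eventdeath"
X = df.drop(columns=[target])
y = df[target]

# Check the class balance: it is the reference point for any accuracy figure.
print(y.value_counts(normalize=True))

# Histopathology type is a category, not a quantity, so one-hot encode it.
X = pd.get_dummies(X, columns=["histtype"], prefix="histtype")
print(X.shape)
```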

## LIMITATIONS OF THE DATASET
- Small sample size: data from only 272 patients, which limits the performance of more data-hungry algorithms such as deep neural networks.
- The data were not collected for the express purpose of this project, so some more recently identified features that are strongly predictive of poor patient outcomes (including specific biomarkers) are absent from the dataset.
- No information is provided on the data collection process, so it is not possible to assess whether results from this cohort are representative of, or generalisable to, the wider population (and the small sample size already raises the risk of bias).

# MODEL TRAINING

The aim here was to train three models to predict patient outcome from the features present in the data.

Supervised classification was used to build models that describe the relationship between the input features and the pre-defined target variable.

Decision Tree (accuracy = 34%), Random Forest (accuracy = 90%) and Deep Neural Network (accuracy = 65%) models were all trained on the dataset. The lower accuracy of the Deep Neural Network compared with the Random Forest is likely due to the small number of samples and to training the network for only a single epoch.

![image.png](attachment:a824b22d-627e-4434-a97f-ec57479d8d0c.png)
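
A sketch of the training step on synthetic data of the same size (272 rows), assuming scikit-learn. `MLPClassifier` with `max_iter=1` is a stand-in for the deep neural network: for the default `adam` solver, `max_iter` counts epochs, so it mimics the single training epoch but not the original network architecture, which is not described here. Accuracy is measured on a held-out split.

```python
import warnings

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 272 "patients", 8 numeric features, binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(272, 8))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=272) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # max_iter=1 means one pass (epoch) over the training data.
    "Neural Network (1 epoch)": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1, random_state=0),
    ),
}

accuracies = {}
with warnings.catch_warnings():
    # One epoch cannot converge, so silence the expected warning.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for name, model in models.items():
        model.fit(X_train, y_train)
        accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: accuracy = {acc:.0%}")
```

With the real data, the prepared feature matrix and outcome column would replace the synthetic `X` and `y`.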

Three model-agnostic explainability techniques were subsequently applied:

## 1. Feature Importance Plot  
This plot shows the relative importance of each input feature in predicting the target variable.

![image.png](attachment:8ca6ee9a-228a-43e0-be19-9c80454a4f6b.png)
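
The method used to compute importance is not stated here. One model-agnostic option is permutation importance, sketched below with scikit-learn's `permutation_importance` on synthetic data (feature names are hypothetical). Each feature is shuffled in turn on held-out data, and the resulting drop in accuracy is taken as its importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
feature_names = ["timerecurrence", "esr1", "diam", "noise"]  # hypothetical
X = rng.normal(size=(272, 4))
# Synthetic outcome driven mostly by the first two columns;
# the "noise" column is irrelevant by construction.
y = (-1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.3 * X[:, 2]
     + rng.normal(scale=0.5, size=272) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data; a large accuracy drop
# means the model relies heavily on that feature.
result = permutation_importance(
    model, X_test, y_test, n_repeats=20, random_state=0, scoring="accuracy"
)
order = np.argsort(result.importances_mean)[::-1]
for i in order:
    print(f"{feature_names[i]:>15}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

Unlike a Random Forest's built-in impurity-based importances, this works unchanged for any fitted classifier, including a neural network.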

## 2. Partial Dependence Plot
This plot shows the marginal effect of an input feature on the predicted target variable. "Time to recurrence" and "survival" were identified as the two most important predictors, but they were assumed to carry similar information: survival time would be expected to correlate with time to recurrence, since more aggressive tumours tend to recur within a shorter period and are more likely to be associated with shorter survival. The next most predictive input feature (esr1) was therefore selected.

![image.png](attachment:e0c835f7-bf16-40cd-9479-f8c309708ddb.png)
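
The averaging behind a partial dependence plot can be written out directly: set one feature to a grid value for every patient, predict, and average. This is a minimal sketch on synthetic data in which column 0 stands in for esr1 (an assumption); scikit-learn's `PartialDependenceDisplay.from_estimator` draws the plot itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def partial_dependence_1d(model, X, feature, grid):
    """Average predicted probability of class 1 with `feature` forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # same value for every patient
        averages.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(averages)


rng = np.random.default_rng(2)
X = rng.normal(size=(272, 3))  # hypothetical columns: 0 = esr1, 1 and 2 = other features
y = (X[:, 0] + rng.normal(scale=0.5, size=272) > 0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Evaluate between the 5th and 95th percentiles to avoid sparse extremes.
grid = np.linspace(np.percentile(X[:, 0], 5), np.percentile(X[:, 0], 95), 20)
pd_curve = partial_dependence_1d(model, X, feature=0, grid=grid)
for g, p in zip(grid, pd_curve):
    print(f"esr1 = {g:+.2f} -> mean P(event) = {p:.2f}")
```

Because every other feature keeps its observed value, the curve is an average over the cohort, which is why a PDP describes global rather than per-patient behaviour.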

## 3. SHAP (SHapley Additive exPlanations) 
SHAP is based on Shapley values from cooperative game theory and is used to explain individual predictions: it calculates the contribution of each feature to a single prediction made by the model. This offers an in-depth look at any model's decision. I used data from one specific patient (row 10 of the DataFrame).

![image.png](attachment:b8d17830-f434-4439-b9da-eb440e115547.png)
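
To show the game-theoretic idea, the sketch below computes exact Shapley values for one patient by enumerating every coalition of features. It is a brute-force illustration on synthetic data with hypothetical feature names, not how the `shap` library works in practice: its explainers (such as `TreeExplainer`) use much faster, model-specific algorithms and handle "absent" features differently from the single mean-patient baseline assumed here.

```python
import itertools
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating every feature coalition.

    Features outside a coalition take their value from `baseline`
    (a simplifying assumption: one reference point, not a background set).
    """
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        idx = list(coalition)
        z[idx] = x[idx]
        return predict(z.reshape(1, -1))[0]

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in itertools.combinations(others, size):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                phi[i] += weight * (value(coalition + (i,)) - value(coalition))
    return phi


rng = np.random.default_rng(3)
feature_names = ["timerecurrence", "esr1", "diam", "posnodes"]  # hypothetical
X = rng.normal(size=(272, 4))
y = (-X[:, 0] + X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=272) > 0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


def predict(A):
    """Predicted probability of the event (class 1)."""
    return model.predict_proba(A)[:, 1]


patient = X[10]            # row 10, mirroring the patient chosen in the analysis
baseline = X.mean(axis=0)  # "average patient" reference point
phi = exact_shapley(predict, patient, baseline)
for name, contribution in zip(feature_names, phi):
    print(f"{name:>15}: {contribution:+.3f}")
```

The contributions satisfy the efficiency property: `phi.sum()` equals this patient's predicted probability minus the baseline's predicted probability.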

## 3a. SHAP Decision Plot 
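The decision plot shows how each variable moved the model towards the predicted event outcome of "0" for the patient on row 10. The line starts at the base value (the average model output) and adds each feature's SHAP value in turn until it reaches the model's final prediction for that patient.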

![image.png](attachment:bfb9602c-9c6a-437a-aeaa-e1c3896c017f.png)
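
A decision plot is a cumulative sum. The sketch below uses a logistic regression on synthetic data (hypothetical feature names) because, for a linear model with independent features, the SHAP value of feature i in log-odds space has the closed form `coef_i * (x_i - mean_i)`. The path starts at the base value and ends exactly at the model's output for row 10.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
feature_names = ["timerecurrence", "esr1", "diam", "posnodes"]  # hypothetical
X = rng.normal(size=(272, 4))
y = (-X[:, 0] + X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=272) > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Closed-form SHAP values for a linear model, in log-odds (decision_function) space.
base_value = model.decision_function(X).mean()
patient = X[10]
phi = model.coef_[0] * (patient - X.mean(axis=0))

# Add contributions from least to most important, as the plot reads bottom to top.
order = np.argsort(np.abs(phi))
path = base_value + np.cumsum(phi[order])
print(f"start (base value): {base_value:+.3f}")
for i, cumulative in zip(order, path):
    print(f"+ {feature_names[i]:>15} ({phi[i]:+.3f}) -> {cumulative:+.3f}")
print(f"model output (log-odds): {model.decision_function(patient.reshape(1, -1))[0]:+.3f}")
```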

## 3b. SHAP Force Plot 
Force plots visualise the impact of each feature on the model's prediction (0 or 1) for a specific instance (a single patient in the cohort). The graph below stacks multiple force plots, each explaining the prediction for one instance.

![image.png](attachment:f4ff8889-8c63-4e7c-a325-08cb4c18f714.png)
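
Each force plot splits one prediction into features pushing the output above the base value and features pushing it below. This minimal sketch uses the closed-form SHAP values of a logistic regression (`coef_i * (x_i - mean_i)` in log-odds space) on synthetic data with hypothetical feature names, and computes that split for several patients at once; a stacked force plot displays many such rows side by side.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
feature_names = np.array(["timerecurrence", "esr1", "diam", "posnodes"])  # hypothetical
X = rng.normal(size=(272, 4))
y = (-X[:, 0] + X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=272) > 0).astype(int)
model = LogisticRegression().fit(X, y)

# One row of SHAP values per patient (log-odds space).
shap_matrix = model.coef_[0] * (X - X.mean(axis=0))
base_value = model.decision_function(X).mean()

for row in range(5):
    phi = shap_matrix[row]
    up = phi[phi > 0].sum()    # features pushing towards the event (1)
    down = phi[phi < 0].sum()  # features pushing towards no event (0)
    top = feature_names[np.argmax(np.abs(phi))]
    print(f"patient {row}: base {base_value:+.2f} + up {up:+.2f} + down {down:+.2f} "
          f"= {base_value + up + down:+.2f} (largest push: {top})")
```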

## 3c. SHAP Dependence Plot
This plots each patient's value of "time to recurrence" (x-axis) against the corresponding SHAP value for that feature (y-axis). Colouring the points by esr1 level shows how the effect of time to recurrence interacts with esr1.

![image.png](attachment:21b01eb2-11ea-4623-8bf8-7210d85ddb00.png)
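
Why colouring by a second feature reveals interactions can be seen with a hypothetical two-feature model whose log-odds include a time-to-recurrence x esr1 product term. With only two features, the exact Shapley value of each feature averages its marginal contribution over the two possible orderings. The coefficients and data below are invented for illustration; `shap.dependence_plot` (with its `interaction_index` argument) draws the corresponding scatter from real SHAP values.

```python
import numpy as np


def toy_log_odds(t, e):
    """Hypothetical model: log-odds from time to recurrence (t), esr1 (e) and their product."""
    return -0.8 * t + 0.5 * e - 0.4 * t * e


rng = np.random.default_rng(6)
t = rng.normal(size=272)
e = rng.normal(size=272)
bt, be = t.mean(), e.mean()  # baseline = feature means


def shap_of_t(t_vals, e_vals):
    """Exact two-feature Shapley value of t: average over 't first' and 'e first'."""
    t_first = toy_log_odds(t_vals, be) - toy_log_odds(bt, be)
    e_first = toy_log_odds(t_vals, e_vals) - toy_log_odds(bt, e_vals)
    return 0.5 * (t_first + e_first)


def shap_of_e(t_vals, e_vals):
    """Exact two-feature Shapley value of e, by the same averaging."""
    e_first = toy_log_odds(bt, e_vals) - toy_log_odds(bt, be)
    t_first = toy_log_odds(t_vals, e_vals) - toy_log_odds(t_vals, be)
    return 0.5 * (e_first + t_first)


# A dependence plot scatters (t, phi_t) and colours each point by e.
phi_t = shap_of_t(t, e)
phi_e = shap_of_e(t, e)

# Same time to recurrence, different esr1: the SHAP value of t changes,
# which appears as vertical colour banding in the plot.
demo = [shap_of_t(1.0, e_value) for e_value in (-1.0, 0.0, 1.0)]
for e_value, value in zip((-1.0, 0.0, 1.0), demo):
    print(f"t = 1.0, esr1 = {e_value:+.1f} -> SHAP value of t = {value:+.3f}")
```

Without the product term (coefficient 0), every patient with the same time to recurrence would get the same SHAP value for it, and the plot would show a single thin line with no colour pattern.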

# CONCLUSION 

Machine learning has great potential to improve patient outcomes (amongst other benefits) in healthcare. Clinical decision-makers' acceptance and trust will likely increase if algorithms are developed and deployed in accordance with human values.

Healthcare is complex, with its own specific nuances captured in health datasets, and complex, opaque algorithms are increasingly required to model these complexities. Explainability of such models is therefore required for clinician trust and adoption.

It is entirely reasonable to accept recommendations from less complex but more transparent models if their outcomes do not differ significantly from those of black box models. However, we should not reject more complex models outright when there is an option to better understand their recommendations.

If the objective is to augment human intelligence, we must ensure that the human stakeholders responsible for patient care (as well as patients themselves) are equipped to participate effectively in this human-machine partnership, both in providing care and in improving patient outcomes.