# Interpretability

One reason that linear regression and its variants are so popular is that they are *interpretable*. When you see the regression equation, you can immediately see how the variables interact and think through how the model works.

In many applications, this is a really important feature to have. If your model is being used as a decision support tool, its users need to trust it, or else it may not get used.

### What is interpretability?

Many machine learning models, as they get more complicated, become less interpretable. Modern neural networks can contain millions of parameters, and it is infeasible to trace the computations as they happen. 

People want interpretable models, but it is difficult to quantify what that actually means.

> The demand for interpretability arises when there is a mismatch between the formal objectives of supervised learning (test set predictive performance) and the real world costs in a deployment setting.

**Lipton 2016 *The Mythos of Model Interpretability***

![](./assets/blackbox.png)

## Trust

Thought experiment 1:
    
    A. Doctors can diagnose a particular condition with 90% accuracy and can tell you why
    B. A black box model can diagnose it with 92% accuracy

Thought experiment 2:
    
    A. Doctors can diagnose a particular condition with 70% accuracy and can tell you why
    B. A black box model can diagnose it with 72% accuracy

Thought experiment 3:
    
    A. Doctors can diagnose a particular condition with 70% accuracy and can tell you why
    B. A black box model can diagnose it with 80% accuracy

If the model gets examples wrong that a human would normally get right, perhaps human supervision is warranted. Is predictive accuracy the only thing that we care about?

## Transferability

[Caruana et al. (2015)](http://people.dbmi.columbia.edu/noemie/papers/15kdd.pdf) describe a model which learned that, among pneumonia patients, those who had asthma were at *lower* risk of death than those without asthma. This was because patients with asthma were admitted directly to intensive care units and so received more aggressive treatment. The context in which the data was gathered therefore matters: a model trained on this data may not transfer to a setting where that treatment policy does not apply.

## Potential properties of interpretable models

### Transparency

Informally, this is the opposite of blackbox-ness.

Three ways that a model can be transparent:

1. *Simulatability*: Can a human, in a reasonable time frame, step through the entire model?
2. *Decomposability*: Does each feature make sense? Is it intuitive? Does each coefficient have meaning?
3. *Algorithmic transparency*: Do we understand how the algorithm used to fit the model works? For linear regression, we know that the fit converges to a unique solution (given linearly independent features). For deep neural networks, we have no such guarantee.
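To make simulatability and decomposability concrete, here is a minimal sketch of stepping through a linear model by hand. The intercept, coefficients, and feature names are hypothetical values chosen for illustration; they do not come from any fitted model.

```python
# Hypothetical linear model, for illustration only (not fitted to real data).
INTERCEPT = 1.0
COEFFICIENTS = {"age_decades": 0.5, "smoker": 2.0}


def explain_prediction(features):
    """Return the prediction and each feature's additive contribution."""
    # Decomposability: every feature contributes coefficient * value.
    contributions = {
        name: COEFFICIENTS[name] * value for name, value in features.items()
    }
    # Simulatability: the whole model is one sum a person can do by hand.
    prediction = INTERCEPT + sum(contributions.values())
    return prediction, contributions


prediction, contributions = explain_prediction({"age_decades": 6, "smoker": 1})
print(prediction)
print(contributions)
```

Each entry in `contributions` says how much one feature moved the prediction, which is the kind of explanation a model with millions of parameters cannot offer directly.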

### Post-hoc interpretability

Can we explain the output of a model after it has already been trained, for example through text or pictures?

Lam et al 2018 [Automated Detection of Diabetic Retinopathy using Deep Learning](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961805/)

![](./assets/retinal.png)

![](./assets/explanation.png)

## Chest X-ray
https://medium.com/@jrzech/what-are-radiological-deep-learning-models-actually-learning-f97a546c5b98

![](./assets/cardiomegaly1.png)

![](./assets/placement.png)

![](./assets/portable.png)

Free book on interpretable machine learning: https://christophm.github.io/interpretable-ml-book/

# Clinical Risk Scores

Clinical risk scores are derived from patient characteristics, lab values, and similar measurements, and are used to predict an adverse outcome in a patient. If this sounds like machine learning, it often is! However, many of these scores, which are still in use today, were developed on relatively little data.

https://www.mdcalc.com/

#### Examples:

[Clinical Score for Predicting Recurrence After Hepatic Resection for Metastatic Colorectal Cancer](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1420876/)

Published: 1999

Sample: 1001 patients with metastatic colorectal cancer

1 point for each factor: 
 * node-positive primary
 * disease-free interval from primary to metastases < 12 months
 * number of hepatic tumors > 1
 * largest hepatic tumor > 5 cm
 * carcinoembryonic antigen level > 200 ng/ml

![](./assets/metastatic_crc.png)
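A point-based score like this is simple enough to write as a few lines of code. The sketch below assumes the commonly cited cutoffs for this score (largest tumor > 5 cm, carcinoembryonic antigen > 200 ng/ml); the function and parameter names are made up for illustration, and the thresholds should be checked against the paper before any real use.

```python
def recurrence_score(
    node_positive_primary,
    disease_free_interval_months,
    n_hepatic_tumors,
    largest_tumor_cm,
    cea_ng_per_ml,
):
    """Count one point per risk factor (0 to 5). Illustrative sketch only."""
    factors = [
        bool(node_positive_primary),
        disease_free_interval_months < 12,
        n_hepatic_tumors > 1,
        largest_tumor_cm > 5,      # assumed cutoff in cm
        cea_ng_per_ml > 200,       # assumed cutoff in ng/ml
    ]
    return sum(factors)


# Hypothetical patient, not real data.
print(recurrence_score(True, 6, 3, 7.0, 350))
```

Because the score is just a count of yes/no answers, a clinician can compute it at the bedside and see exactly which factors drove it.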

[$CHADS_2$](https://www.ncbi.nlm.nih.gov/pubmed/11401607) Stroke Risk in Atrial Fibrillation

Published: 2001

Sample: 1733 patients, aged 65 to 95 years

2 points for:
 * history of stroke or TIA

1 point for:
 * recent CHF
 * hypertension
 * age ≥ 75
 * diabetes

![](./assets/chads2.png)
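The weighting above (two points for prior stroke or TIA, one point for everything else) can be sketched as follows. The function and parameter names are hypothetical, and the age cutoff is taken here as 75 or older.

```python
def chads2_score(recent_chf, hypertension, age_years, diabetes, prior_stroke_or_tia):
    """Sum the CHADS2 points (0 to 6). Illustrative sketch only."""
    score = 0
    score += 1 if recent_chf else 0
    score += 1 if hypertension else 0
    score += 1 if age_years >= 75 else 0  # assumed cutoff: 75 or older
    score += 1 if diabetes else 0
    score += 2 if prior_stroke_or_tia else 0  # the only 2-point item
    return score


# Hypothetical patient, not real data.
print(chads2_score(False, True, 75, False, True))
```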

[MELD: Model for End-Stage Liver Disease](https://www.ncbi.nlm.nih.gov/pubmed/11172350)

Published: 2001

Sample: ~260 patients

`9.57 * log(creatinine) + 3.78 * log(total bilirubin) + 11.2 * log(INR) + 6.43`
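Unlike the point-based scores, this one is a regression equation on log-transformed lab values. The sketch below evaluates the formula exactly as written above, assuming `log` means the natural logarithm and that creatinine and bilirubin are in mg/dL. Clinical calculators typically also apply bounds and rounding to the inputs and output; those are not shown here, and the function name is hypothetical.

```python
import math


def meld_score(creatinine_mg_dl, bilirubin_mg_dl, inr):
    """Evaluate the MELD formula as written (no bounds or rounding applied)."""
    return (
        9.57 * math.log(creatinine_mg_dl)   # natural log assumed
        + 3.78 * math.log(bilirubin_mg_dl)
        + 11.2 * math.log(inr)
        + 6.43
    )


# With all three labs equal to 1.0, every log term is zero,
# so the score reduces to the intercept.
print(meld_score(1.0, 1.0, 1.0))
```

The coefficients show at a glance that a proportional rise in INR moves the score more than the same proportional rise in creatinine or bilirubin, which is the kind of reading an interpretable model allows.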
