# User Study on Interpretability - Tutorial

#### Thank you for participating in our study! 
 
The study is structured as follows:
1. **Tutorial**: Overview of background concepts that will be used in the rest of the study.
2. **Setup Description**: Presentation of the dataset used, descriptive statistics, etc. 
3. **Main Component**: Using the interpretability tool to answer questions about an ML model.
4. **Follow-Up**: Questionnaire and interview. 

Please keep this tutorial open and handy while you complete the study. **You're welcome (and encouraged!) to refer back to it at any point.**


## Concepts covered in this tutorial

1. Background
    * Odds
    * Odds-Ratios
2. Weight of Evidence (focus of the study)
    * Definition
    * Binary vs. Multiclass
    * Grouping Features
    * Sequential Explanations
    * Example
    
---

## Odds and Odds Ratios

The study will be focused on machine learning classifiers, so we will introduce odds and related concepts using an example from classification. 

Suppose we are trying to predict whether a person has the flu ($Y=1$ if they do, $Y=0$ if they don't) based on binary indicators $X$ of symptoms (e.g., $X_{\text{cough}}=1$ if the person has a cough, and $0$ otherwise). 

Throughout this section we will use "being sick" and "having the flu" interchangeably for $Y=1$, and "being healthy" and "not having the flu" for $Y=0$.

Let's suppose the overall probability of being sick is $P(Y=1)=0.2$. Visually:

<img align="center" width="900" height="200" src="./images_tutorial/Ball_Diagrams/Ball_Diagrams.001.tight.jpeg">

---

### Odds

The *odds* are an alternative way to express probabilities, very commonly used in the context of betting markets.

The odds of a person having the flu are defined as:
$$ O(Y=1) = \frac{P(Y=1)}{P(Y=0)} $$ 
For example, if $P(Y=1) = 0.2$ then $O(Y=1)=\frac{0.2}{0.8}=\frac{1}{4}$, so we say the odds of being sick are one-to-four, which we write as 1:4. 

The odds of being healthy are simply reversed (i.e., the reciprocal of $O(Y=1)$):
$$ O(Y=0) = \frac{P(Y=0)}{P(Y=1)} $$ 
If, as before, $P(Y=0)=0.8$ then $O(Y=0)=4$, so the odds of being healthy are 4:1.
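
The conversion between probabilities and odds can be sketched in a few lines of Python. This is only an illustration of the definitions above; the helper names `odds` and `probability` are ours and are not part of the study tool.

```python
from fractions import Fraction

def odds(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def probability(o):
    """Inverse map: probability of an event whose odds are o."""
    return o / (1 + o)

p_sick = Fraction(1, 5)           # P(Y=1) = 0.2, as in the running example
odds_sick = odds(p_sick)          # odds of being sick (1:4)
odds_healthy = odds(1 - p_sick)   # odds of being healthy (4:1)

print(odds_sick, odds_healthy)    # the two odds are reciprocals of each other
print(probability(odds_sick))     # converting back recovers P(Y=1)
```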


<img align="center" width="900" height="200" src="./images_tutorial/Ball_Diagrams/Ball_Diagrams.002.tight.jpeg">

---


### Odds Ratio
Now, in order to express the association between some symptom(s) (e.g., $X_{\text{cough}}$) and the outcome $Y$, we can use the *odds ratio*: 
$$ \text{OR}(Y=1 : X_{\text{cough}}=1) = \frac{O(Y=1 \mid X_{\text{cough}}=1)}{O(Y=1)}$$
which tells us how the odds of having the flu change when conditioning on a symptom ($X_{\text{cough}}=1$, here).

For example, suppose the odds of having the flu ($Y=1$) are 2:3 if a person has a cough (compared to overall 1:4 odds). Then the odds ratio is $\frac{8}{3}$:

<img align="center" width="450" height="100" src="./images_tutorial/Ball_Diagrams/Ball_Diagrams.004.half.png">

which can be interpreted as:

* **Interpretation 1:** "The odds of *having the flu* roughly triple when the person has a cough" 

Again, odds ratios of 'healthy' vs 'sick' are reciprocals of each other:
<img align="center" width="900" height="200" src="./images_tutorial/Ball_Diagrams/Ball_Diagrams.004.tight.png">

**Crucially**, the odds ratio can also be written in another (mathematically equivalent, by Bayes' rule) way:

$$ \text{OR}(Y=1 : X_{\text{cough}}=1) = \frac{P(X_{\text{cough}} =1 \mid Y=1)}{P(X_{\text{cough}}=1 \mid Y=0)} $$

Using this for the example above, $\text{OR}(Y=1 : X_{\text{cough}}=1) = \frac{8}{3}$ can also be interpreted as:

* **Interpretation 2:** "A person is almost three times as likely to *have a cough* if they have the flu compared to when they are healthy"
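
As a quick numerical check, the two forms of the odds ratio can be computed side by side. The sketch below uses the numbers from the example (1:4 prior odds, 2:3 odds given a cough); the value of $P(X_{\text{cough}}=1)$ is not given in this tutorial, so the `p_cough` used here is an assumption made only for illustration. Any value consistent with the other probabilities gives the same ratio.

```python
from fractions import Fraction

p_flu = Fraction(1, 5)              # P(Y=1), from the example
p_flu_given_cough = Fraction(2, 5)  # odds 2:3 given a cough, from the example
p_cough = Fraction(1, 4)            # ASSUMPTION: P(X_cough=1) is not stated

def odds(p):
    return p / (1 - p)

# Form 1: odds after conditioning on the symptom, divided by the overall odds.
or_from_odds = odds(p_flu_given_cough) / odds(p_flu)

# Form 2: ratio of likelihoods, each obtained with Bayes' rule.
p_cough_given_flu = p_flu_given_cough * p_cough / p_flu
p_cough_given_healthy = (1 - p_flu_given_cough) * p_cough / (1 - p_flu)
or_from_likelihoods = p_cough_given_flu / p_cough_given_healthy

print(or_from_odds, or_from_likelihoods)  # both forms agree
```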

When considering multiple conditioning variables (symptoms), it is useful to take the logarithm of odds and odds ratios, so that they can be **added** (instead of multiplied). For example, if $X_{\text{cough}}$ and $X_{\text{sneezing}}$ are conditionally independent given $Y$, then:

$$\log  \text{OR}(Y=1 : X_{\text{cough}} , X_{\text{sneezing}} )  = \log  \text{OR}(Y=1 : X_{\text{cough}} )  +  \log  \text{OR}(Y=1 : X_{\text{sneezing}} )$$

This makes it easy to understand how individual features affect the outcome $Y$.
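
The additivity of log odds ratios can be verified numerically. The following is a minimal sketch, assuming the two symptoms are conditionally independent given $Y$. All likelihood values are hypothetical and chosen only for illustration.

```python
import math

p_flu = 0.2  # P(Y=1), as in the running example

# HYPOTHETICAL likelihoods P(symptom=1 | Y=y), keyed by y.
lik = {
    "cough":    {1: 0.5, 0: 0.1875},
    "sneezing": {1: 0.6, 0: 0.3},
}

def log_or(symptom):
    """Log odds ratio of a single symptom, in its likelihood-ratio form."""
    return math.log(lik[symptom][1] / lik[symptom][0])

# Joint likelihood of both symptoms under conditional independence.
joint_flu = lik["cough"][1] * lik["sneezing"][1]
joint_healthy = lik["cough"][0] * lik["sneezing"][0]

# Posterior via Bayes' rule, then the joint odds ratio from its definition.
posterior_flu = p_flu * joint_flu / (p_flu * joint_flu + (1 - p_flu) * joint_healthy)
posterior_odds = posterior_flu / (1 - posterior_flu)
prior_odds = p_flu / (1 - p_flu)
log_or_joint = math.log(posterior_odds / prior_odds)

# The joint log odds ratio matches the sum of the individual ones.
print(log_or_joint, log_or("cough") + log_or("sneezing"))
```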

***

### Checkpoint Questions

Suppose that for a given patient:
* $\log  \text{OR}(Y=1 : X_{\text{cough}}=0 ) = -1$
* $\log  \text{OR}(Y=1 : X_{\text{headache}}=1 ) = 0$
* $\log \text{OR}(Y=1 : X_{\text{chills}} =1)=2$

**Q1:** How would you interpret these facts?  

**Q2:** All symptoms considered, what is the (log) odds ratio of this person being infected? How would you interpret this? (either of the two **interpretations** is ok)

*** 


## Weight Of Evidence

<img align="center" width="500" height="500" src="./images_tutorial/woe_balance.png">

### Definition

The log of the odds ratio is sometimes referred to as ***the Weight of Evidence*** (WoE for short). In a nutshell, the Weight of Evidence is used to quantify feature importance, and it attempts to answer the question:
> "does the *evidence* speak in favor or against a certain *hypothesis*?"

In this study we will use WoE to 'explain' predictions of machine learning classifiers, so the 'evidence' will be the input features (e.g., symptoms), the 'hypothesis' will usually be the model's prediction (e.g., Y='sick') and the question we seek to answer is:

> "**according to the model**, how much does the input speak in favor of a certain prediction"

Since the WoE is nothing but the log odds ratio, recall that it can be expressed (and interpreted) in two different ways:
$$ \text{woe}(Y=1 : X_{\text{cough}}=1) \quad \overset{\text{def.}}{=} \quad \log \text{OR}(Y=1 : X_{\text{cough}}=1) = \log \frac{O(Y=1 \mid X_{\text{cough}}=1)}{O(Y=1)} =  \log \frac{P(X_{\text{cough}}=1 \mid Y=1)}{P(X_{\text{cough}}=1 \mid Y=0)}$$

Colloquially, using the language of the Weight of Evidence literature, we would say:
* $\text{woe}(Y=1 : X_{\text{cough}}=1) > 0 \qquad \Longrightarrow \qquad$ the presence of cough ***speaks in favor*** of this patient having the disease ($Y=1$)


* $\text{woe}(Y=1 : X_{\text{cough}}=1) < 0  \qquad \Longrightarrow \qquad$ the presence of cough ***speaks against*** this patient having the disease ($Y=1$)

Formally, we can interpret these quantities using either of the **two interpretations** of the odds ratio provided before.

***



### Binary vs. Multiclass

Since $Y$ is binary in our example so far, there are only two possible hypotheses: 
* $Y=1$: the patient has the flu (let's say this is the 'primary hypothesis') or
* $Y=0$: the patient does not have the flu (the 'alternative hypothesis').  

Since there are only two hypotheses, evidence *against* one of these is evidence *in favor* of the other.

But what if we had a multi-class classification problem? E.g., suppose the model must instead predict one of $K$ possible conditions, and that for a given patient the model predicts $Y=$'flu', which we take as the primary hypothesis. The alternative hypothesis $h'$ could be:
* All the other possible diseases, e.g., $h': \;\; Y \in \{\text{'cold'}, \text{'strep'}, \text{'allergies'},\dots\}$
* Another specific disease, e.g., $h': \;\; Y=\text{'cold'}$
* Any other subset of diseases, e.g., 'viral' or 'bacterial'

Each of these might shed light on different aspects of the prediction.
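
To make the choice of alternative hypothesis concrete, here is a small sketch of how the WoE of 'flu' could be computed against different alternatives. It assumes that the likelihood of a composite hypothesis (a set of classes) is the prior-weighted average of the per-class likelihoods. All priors and likelihoods below are hypothetical, and the function names are ours.

```python
import math

# HYPOTHETICAL class priors and likelihoods P(fever=1 | class).
prior = {"flu": 0.3, "cold": 0.4, "strep": 0.1, "allergies": 0.2}
lik_fever = {"flu": 0.8, "cold": 0.2, "strep": 0.6, "allergies": 0.05}

def likelihood(classes):
    """P(evidence | Y in classes): prior-weighted average over the classes."""
    total_prior = sum(prior[c] for c in classes)
    return sum(prior[c] * lik_fever[c] for c in classes) / total_prior

def woe(h, h_alt):
    """Weight of evidence of hypothesis h against the alternative h_alt."""
    return math.log(likelihood(h) / likelihood(h_alt))

woe_vs_rest = woe(["flu"], ["cold", "strep", "allergies"])  # flu vs. all others
woe_vs_cold = woe(["flu"], ["cold"])                        # flu vs. one disease

print(woe_vs_rest, woe_vs_cold)
```

Swapping the two hypotheses flips the sign of the score, which matches the idea that evidence for $h$ is evidence against $h'$.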

[//]: <> "how much more likely is a given hypothesis h over an alternative h' given evidence e"
[//]: <> "we denote this by $\text{woe}(h/h' : e)$"
[//]: <> "If this quantity is positive, we say that the evidence $e$ speaks in favor of $h$ (and against $h'$). If it's negative, then the roles reverse: $e$ speaks against $h$ (and in favor of $h'$)."

---
 
### Interpreting and Visualizing WoE Scores

This table provides a simple rule of thumb to decide on how "significant" the WoE is:

|  Weight of Evidence Score  | Odds in favor of hypothesis           | Strength of Evidence|
| ------------- |:-------------:| -----:|
| $ < 1.15$      | less than 3:1 | Not worth mentioning |
| $1.15$ to $2.3$  | between 3:1 and 10:1      |  Substantial |
| $2.3$ to $4.61$ | between 10:1 and 100:1      |    Strong |
| $>4.61$  | more than 100:1     |  Decisive |

Remember: a negative WoE indicates that the evidence speaks in favor of the alternative hypothesis $h'$. In that case, apply the table to the absolute value of the score to quantify the strength of evidence *against* $h$ (in *favor* of $h'$).
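
The rule of thumb in the table can be written as a small lookup. This is a sketch only: the function name is ours, and the handling of scores that fall exactly on a threshold is an assumption, since the table does not say which row they belong to.

```python
def strength_of_evidence(woe):
    """Map a WoE score to (strength label, direction) using the table above."""
    magnitude = abs(woe)  # negative scores are read as evidence against h
    if magnitude < 1.15:
        label = "Not worth mentioning"
    elif magnitude < 2.3:      # ASSUMPTION: a score of exactly 1.15 counts as Substantial
        label = "Substantial"
    elif magnitude <= 4.61:    # ASSUMPTION: a score of exactly 4.61 counts as Strong
        label = "Strong"
    else:
        label = "Decisive"

    if woe > 0:
        direction = "in favor of h"
    elif woe < 0:
        direction = "against h"
    else:
        direction = "neutral"
    return label, direction

print(strength_of_evidence(0.5))
print(strength_of_evidence(-3.0))
```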

[//]: <> "according to the model, how much more likely is class A over B given than the input is X"
[//]: <> "As before, a postitive WoE score indicates that the features $X$ make the reference class (A) more likely than the alternative one (B)."

Here's what a full example might look like:

<img align="center" width="900" height="900" src="./images_tutorial/Simple_WoE_Diagram/Simple_WoE_Diagram.001.png">

---

### WoE of Individual Features and Feature Groups

<!--- (<img align="center" width="500" height="500" src="./images_tutorial/dendrogram.png">) --->

In the plot above, we showed the WoE score of each feature. But when the number of features is large, and there is a meaningful way to group them, it is often convenient to show WoE scores **aggregated by group** of features.

For our running example, a sensible grouping of the six symptoms would be:  
* 'respiratory' (cough, dyspnea)
* overall 'body' feeling (aches, weakness)
* 'temperature' (chills, fever)

In that case, we could instead display:

<img align="center" width="900" height="900" src="./images_tutorial/Simple_WoE_Diagram/Simple_WoE_Diagram.002.png">

which might let us quickly realize that the most decisive factors supporting this prediction are respiratory.
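
One simple way to obtain group scores is to add up the WoE scores of the features in each group. The sketch below assumes the per-feature scores are additive (i.e., features are treated as conditionally independent). The scores themselves are hypothetical and are not read off the figures above.

```python
# HYPOTHETICAL per-feature WoE scores for one patient.
feature_woe = {
    "cough": 1.6, "dyspnea": 0.9,
    "aches": 0.3, "weakness": -0.2,
    "chills": -0.4, "fever": 0.7,
}

# The grouping of the six symptoms described above.
groups = {
    "respiratory": ["cough", "dyspnea"],
    "body": ["aches", "weakness"],
    "temperature": ["chills", "fever"],
}

def group_woe(feature_woe, groups):
    """Aggregate WoE per group by summing the scores of its features."""
    return {name: sum(feature_woe[f] for f in members)
            for name, members in groups.items()}

scores = group_woe(feature_woe, groups)
top_group = max(scores, key=lambda g: abs(scores[g]))  # largest absolute score

print(scores)
print(top_group)
```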

---

### Sequential Explanations

So far we have shown 'one-shot' explanations: the WoE of the predicted class vs. all other classes. But when there are multiple classes, it is sometimes useful to **break down** the explanation into several 'steps'. 

For our diagnosis example, suppose the model predicts 'flu'. It might be illustrative to understand:
1. What evidence points to *viral diseases* ('flu', 'avian flu', etc.) instead of *bacterial* ones ('strep', etc).
2. What evidence singles out common 'flu' over other viral diseases.

For this purpose, we can use the Weight of Evidence iteratively with increasingly refined hypotheses: 

First, we produce an explanation for why the model would predict 'viral' instead of 'bacterial':

<img align="center" width="900" height="900" src="./images_tutorial/Simple_WoE_Diagram/Simple_WoE_Diagram.003.step1.png">

Note that the total WoE does favor 'viral'. Next, we produce an explanation for why the model predicted 'flu' and not any other label in the 'viral' class:

<img align="center" width="900" height="900" src="./images_tutorial/Simple_WoE_Diagram/Simple_WoE_Diagram.003.step2.png">

Of course, we could group the diseases in some other way (e.g., severe vs. mild, contagious vs. non-contagious, etc), leading to different WoE sequences.
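
The two-step procedure can be sketched as follows. As before, we assume the likelihood of a group of classes is the prior-weighted average of the per-class likelihoods. The class list, priors, and likelihoods are hypothetical, and the function names are ours.

```python
import math

# HYPOTHETICAL priors and likelihoods P(observed symptoms | class).
prior = {"flu": 0.30, "avian flu": 0.05, "cold": 0.35, "strep": 0.30}
lik = {"flu": 0.7, "avian flu": 0.6, "cold": 0.3, "strep": 0.1}

viral = ["flu", "avian flu", "cold"]
bacterial = ["strep"]

def likelihood(classes):
    """P(evidence | Y in classes): prior-weighted average over the classes."""
    total_prior = sum(prior[c] for c in classes)
    return sum(prior[c] * lik[c] for c in classes) / total_prior

def woe(h, h_alt):
    """Weight of evidence of hypothesis h against the alternative h_alt."""
    return math.log(likelihood(h) / likelihood(h_alt))

# Step 1: does the evidence favor 'viral' over 'bacterial'?
step1 = woe(viral, bacterial)

# Step 2: within 'viral', does the evidence favor 'flu' over the rest?
other_viral = [c for c in viral if c != "flu"]
step2 = woe(["flu"], other_viral)

print(step1, step2)
```

A different grouping of the classes would simply change the lists passed to `woe` at each step.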


---

### Prior and Posterior Probabilities

The prior class probabilities are key to understanding the predictions of a model, especially with unbalanced datasets.

Understanding how these interact with the input when the model makes a prediction is simple with the Weight-of-Evidence, thanks to the identity:

$$ \underbrace{\log \frac{P(Y=1 \mid X_1, X_2, \dots)}{P(Y=0 \mid X_1, X_2, \dots)}}_{\text{Posterior log-odds (i.e. odds of predictions)}} \quad = \quad \underbrace{\log\frac{P(Y=1)}{P(Y=0)}}_{\text{Prior log-odds}} \quad + \quad\underbrace{\log \frac{P(X_1 \mid Y=1)}{P(X_1 \mid Y=0)} \quad+ \quad\log \frac{P(X_2 \mid Y=1)}{P(X_2 \mid Y=0)} \quad + \cdots}_{\textit{Weight of evidence scores}} $$

Therefore, adding up all the WoE scores plus the prior log-odds, we (approximately) recover the posterior log-odds, and hence the posterior probabilities that the model outputs.
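
As a worked check of this identity, the sketch below adds a single WoE score to the prior log-odds and converts the result back to a probability. It uses the numbers from the cough example earlier in the tutorial (prior $0.2$, odds ratio $\frac{8}{3}$); the function name is ours.

```python
import math

def posterior_probability(prior, woe_scores):
    """Posterior P(Y=1 | X) from the prior P(Y=1) and per-feature WoE scores."""
    prior_log_odds = math.log(prior / (1 - prior))
    posterior_log_odds = prior_log_odds + sum(woe_scores)
    # Convert log-odds back to a probability (logistic function).
    return 1 / (1 + math.exp(-posterior_log_odds))

woe_cough = math.log(8 / 3)  # WoE of a cough in the running example
p = posterior_probability(0.2, [woe_cough])

print(p)  # corresponds to posterior odds of 2:3
```

With no evidence at all (an empty list of scores), the function returns the prior unchanged.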

So, to take advantage of this, our Tool will display the prior log-odds at every step. 

Note that the prior log-odds can be negative even for the most likely class when there are more than two classes and none of them has prior probability $>0.5$, as in this example:

<img align="center" width="900" height="900" src="./images_tutorial/priors.png?modified=123">

but the plot on the right, which instead uses 'chance' (1/6 for 6-way classification), makes it immediately clear which classes are a priori more likely.
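
The effect can be reproduced with a few lines of code. The priors below are hypothetical, and comparing each prior to chance through a log-ratio is only one plausible way to build such a plot; the exact scale used by the tool is not specified here.

```python
import math

# HYPOTHETICAL priors for a 6-class problem; none exceeds 0.5.
priors = {"A": 0.30, "B": 0.25, "C": 0.20, "D": 0.10, "E": 0.10, "F": 0.05}
chance = 1 / len(priors)  # 1/6 for 6-way classification

# Prior log-odds of each class against all the others: all negative here.
prior_log_odds = {c: math.log(p / (1 - p)) for c, p in priors.items()}

# ASSUMED alternative: log-ratio of each prior to chance.
# Positive means the class is a priori more likely than chance.
log_ratio_to_chance = {c: math.log(p / chance) for c, p in priors.items()}

for c in priors:
    print(c, round(prior_log_odds[c], 2), round(log_ratio_to_chance[c], 2))
```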

---

### A Real Example

Consider the following example. A patient has fever and cough, but no other symptoms, i.e.,
$$ X_{\text{fever}} =1, X_{\text{cough}}=1, \text{  and  } X_i=0 \text{ for every other symptom}$$

and the machine learning model predicts $Y = \text{'flu'}$.

The following is the plot produced by our WoE-Explainer tool for this example:

<img align="center" width="900" height="900" src="./images_tutorial/example_expl.png?modified=12345678">

Note that:
* The features are now displayed vertically (instead of horizontally), **but the meaning and interpretation remain the same as before**.
* As before, blue and red bars denote positive and negative weight-of-evidence, respectively.
* The shade of the bars encodes the degree of significance of the WoE according to the table above.

---

### Checkpoint Questions
Based on the WoE explanation above, please answer the following questions:

**Q3:** According to this classifier, does having a fever increase or decrease the odds of having the flu? What about a cough?

**Q4:** Do you think the prediction would change if now $X_{\text{fever}}=0$? Why/why not?


<!---
From this plot, we can draw the following conclusions:
* The prior log-odds are positive, i.e., absent other information the model is more likely to predict than not. 
* The values of the `econ` and `usage` attributes provide substantial evidence in favor of class A 
* The values of the `demo` and `safety` attributes provide substantial evidence against class A (equiv., in favor of class B)
* The rest of the attributes dont provide substantial evidence 
* Despite opposing effects, the total evidence **in favor** of A is stronger that that **against** it (think about stacking the positive and negative bars together), so the model predicts class A
--->


--- 
## Key Takeaways:

The Weight-of-Evidence (WoE) ... 
1. Helps us answer: "which features speak in favor/against the prediction of the model?"
2. Is the log of the odds ratio, and is **additive** over features
3. Interpretation: $ \qquad
  \text{woe}(Y=c : X) \quad \left. \begin{cases} > 0 \\ <0 \end{cases}\right\} 
  \quad \Longrightarrow \quad  X \quad \text{'speaks'} \quad
  \left.
  \begin{cases}
    \text{in favor of} \\
    \text{against}
  \end{cases}  
  \right\} 
  \text{ class } c
$
4. Can be computed for **individual** features or **groups** of features
5. Can be **one-shot** (predicted class vs 'rest') or **sequential** (compare to subsets of classes)

---