# User Study on Interpretability - Tutorial

#### Thank you for participating in our study! 
 
The study is structured as follows:
1. Tutorial: Overview of the main concepts that will be used in the rest of the study.
2. Setup Description: Presentation of the dataset used, descriptive statistics, etc. 
3. Main Component: Using the interpretability tool to answer questions about an ML model.
4. A follow-up questionnaire and interview. 

Please keep this tutorial open and handy while you complete the study. **You're welcome (and encouraged!) to refer back to it at any point.**


## Odds, Log Odds, and Odds Ratios

Odds are an intuitive way to express chance, very commonly used in the context of betting markets.

For the purposes of this study we define odds and related concepts in terms of classification. Suppose we have a continuous variable $X\in\mathbb{R}$ and a binary one $Y \in \{0,1\}$.

In this setting, the odds of $Y$ taking value $1$ are:
$$ O(Y=1) = \frac{P(Y=1)}{P(Y=0)} $$ 

For example, if $P(Y=1)=0.8$, then $O(Y=1)=4$, and we would say that the odds of $Y=1$ are four to one, which can be written as 4:1. 
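
As a quick numerical check of the definition above, here is a minimal Python sketch. The function names `odds` and `prob_from_odds` are illustrative and not part of the study tool.

```python
def odds(p: float) -> float:
    """Odds of an event that has probability p (requires 0 <= p < 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def prob_from_odds(o: float) -> float:
    """Inverse map: probability of an event whose odds are o (o >= 0)."""
    if o < 0.0:
        raise ValueError("odds must be non-negative")
    return o / (1.0 + o)


print(odds(0.8))            # approximately 4, i.e. odds of 4:1
print(prob_from_odds(4.0))  # 0.8
```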

We can also use odds to express the association between $X$ and $Y$, by means of their *odds ratio*: 
$$ \text{OR}(Y=1 : X) = \frac{O(Y=1 \mid X)}{O(Y=1)}$$

It turns out (using Bayes' rule) that this is mathematically equivalent to:
$$ \text{OR}(Y=1 : X) = \frac{P(X \mid Y=1)}{P(X \mid Y=0)} $$
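
For readers who want the intermediate step, the equivalence can be sketched as follows. Applying Bayes' rule to the numerator and the denominator of the conditional odds, the common factor $P(X)$ cancels:

$$ O(Y=1 \mid X) = \frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = \frac{P(X \mid Y=1)\,P(Y=1)}{P(X \mid Y=0)\,P(Y=0)} = \frac{P(X \mid Y=1)}{P(X \mid Y=0)} \cdot O(Y=1) $$

Dividing both sides by the prior odds $O(Y=1)$ leaves the likelihood ratio $P(X \mid Y=1) \,/\, P(X \mid Y=0)$.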

Therefore, we can interpret odds ratios in two complementary ways. For example, $\text{OR}(Y=1 : X=x)=2$ can be interpreted as:
* The odds of $Y=1$ double when conditioning on $X=x$
* The variable $X$ is twice as likely to take the value $x$ when $Y=1$ as when $Y=0$


It is common to work with the logarithm of the odds ratio instead, which allows us to use addition (instead of multiplication) when combining odds of various events. For example, if $X_1$ and $X_2$ are conditionally independent given $Y$:

$$\log  \text{OR}(Y=1 : X_1 , X_2 )  = \log  \text{OR}(Y=1 : X_1 )  +  \log  \text{OR}(Y=1 : X_2 )$$
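
The additivity can be checked numerically. The sketch below uses made-up likelihood values for two binary features and assumes that $X_1$ and $X_2$ are independent given each value of $Y$, so that the joint likelihood factorizes. All numbers and variable names are hypothetical.

```python
import math

# Hypothetical likelihoods P(X_i = 1 | Y = y), chosen only for illustration.
p_x1_given_y = {1: 0.6, 0: 0.2}
p_x2_given_y = {1: 0.3, 0: 0.5}

# Assumption: independence given Y, so the joint likelihood is a product.
p_joint_given_y = {y: p_x1_given_y[y] * p_x2_given_y[y] for y in (0, 1)}


def log_odds_ratio(likelihood: dict) -> float:
    """log OR(Y=1 : evidence), written as a log likelihood ratio."""
    return math.log(likelihood[1] / likelihood[0])


lhs = log_odds_ratio(p_joint_given_y)
rhs = log_odds_ratio(p_x1_given_y) + log_odds_ratio(p_x2_given_y)
print(lhs, rhs)  # the two values agree up to floating-point error
```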

[//]: #  "\log  \frac{O(Y=1 \mid X_1 , X_2 )}{O(Y=1)}  = \log  \frac{O(Y=1 \mid X_1 )}{O(Y=1)}  + \log  \frac{O(Y=1 \mid X_2 )}{O(Y=1)}"


## Weight Of Evidence

#### Introduction

The Weight of Evidence (WoE for short) is a simple but fundamental concept from information theory, used to quantify variable importance. In the context of Machine Learning, we will use it to produce a score for the importance of features towards a specific model prediction.

In a nutshell, the Weight of Evidence score attempts to answer the question:
> "does the *evidence* speak in favor of or against a certain *hypothesis*?"

In our running example, the 'evidence' could be the conditioning variable $X$ and the 'hypothesis' would be one of $Y=1$ or $Y=0$. 

In its most basic form, the WoE is nothing but the log odds ratio, which, as discussed before, can be expressed in two different ways:
$$ \text{woe}(Y=1 : X=x) = \log \frac{O(Y=1 \mid X=x)}{O(Y=1)} = \log \frac{P(X=x \mid Y=1)}{P(X=x \mid Y=0)}$$

In the language of the Weight of Evidence literature, we say:
* $\text{woe}(Y=1 : X=x) > 0 $ means $X=x$ *speaks in favor* of $Y$ taking value 1
* $\text{woe}(Y=1 : X=x) < 0 $ means $X=x$ *speaks against* $Y$ taking value 1

Since $Y$ is binary in our example, evidence *against* $Y=1$ is evidence *in favor* of $Y=0$ (the 'alternative hypothesis').

For multi-class classification (i.e., $Y \in K = \{1,\dots,k\}$) we must specify what the alternative hypothesis $h'$ is. It could be:
* All the other possible values of $Y$, i.e., $h': ( Y \in K\setminus y )$
* Another specific value, i.e., $h': (Y=y')$
* Any other subset of values, i.e.,    $h': (Y\in S \subseteq K)$
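
To make the three choices concrete, here is a small sketch that computes the WoE of a hypothesis against a set-valued alternative. It assumes we know the class priors $P(Y=y)$ and the likelihoods $P(x \mid Y=y)$ for one fixed observation $x$. The numbers, the class labels, and the helper names `likelihood_of_set` and `woe` are hypothetical.

```python
import math

# Hypothetical class priors P(Y=y) and likelihoods P(x | Y=y) for one fixed x.
prior = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}
likelihood = {1: 0.40, 2: 0.10, 3: 0.05, 4: 0.30}


def likelihood_of_set(classes) -> float:
    """P(x | Y in classes): prior-weighted average of per-class likelihoods."""
    total_prior = sum(prior[y] for y in classes)
    return sum(likelihood[y] * prior[y] for y in classes) / total_prior


def woe(hypothesis, alternative) -> float:
    """woe(h / h' : x) = log P(x | h) - log P(x | h'), both given as class sets."""
    return math.log(likelihood_of_set(hypothesis)) - math.log(likelihood_of_set(alternative))


all_classes = set(prior)
woe_vs_rest = woe({1}, all_classes - {1})  # class 1 against all other classes
woe_vs_one = woe({1}, {2})                 # class 1 against one specific class
woe_vs_subset = woe({1}, {2, 3})           # class 1 against a chosen subset
print(woe_vs_rest, woe_vs_one, woe_vs_subset)
```

The three scores generally differ, which is why the alternative hypothesis has to be stated alongside any WoE value.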


[//]: <> "how much more likely is a given hypothesis h over an alternative h' given evidence e"
[//]: <> "we denote this by $\text{woe}(h/h' : e)$"
[//]: <> "If this quantity is positive, we say that the evidence $e$ speaks in favor of $h$ (and against $h'$). If it's negative, then the roles reverse: $e$ speaks against $h$ (and in favor of $h'$)."


This table provides a simple rule of thumb for judging how "significant" a WoE score is (scores are natural logarithms of odds, so, for example, $2.3 \approx \log 10$):

|  Weight of Evidence Score  | Odds in favor of hypothesis           | Strength of Evidence|
| ------------- |:-------------:| -----:|
| $ < 1.15$      | less than 3:1 | Not worth mentioning |
| $1.15$ to $2.3$  | between 3:1 and 10:1      |  Substantial |
| $2.3$ to $4.61$ | between 10:1 and 100:1      |    Strong |
| $>4.61$  | more than 100:1     |  Decisive |

Remember: a negative WoE indicates that the evidence speaks in favor of the alternative hypothesis $h'$. In that case, apply the table to the absolute value of the score to quantify the strength of evidence *against* $h$ (in *favor* of $h'$).
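
The rule of thumb above translates directly into a small lookup. The sketch below assumes natural logarithms (so a score $w$ corresponds to odds of $e^{w}$ to 1) and applies the thresholds to the absolute value of the score. The function name `strength_of_evidence` and the handling of values that fall exactly on a boundary are illustrative choices, not something the table specifies.

```python
import math


def strength_of_evidence(woe: float) -> str:
    """Map a WoE score to the qualitative labels of the rule-of-thumb table."""
    magnitude = abs(woe)
    if magnitude < 1.15:
        label = "Not worth mentioning"
    elif magnitude < 2.3:
        label = "Substantial"
    elif magnitude < 4.61:
        label = "Strong"
    else:
        label = "Decisive"
    if woe > 0:
        direction = "in favor of h"
    elif woe < 0:
        direction = "against h (in favor of h')"
    else:
        direction = "neutral"
    return f"{label}, {direction}"


print(strength_of_evidence(1.7))   # Substantial, in favor of h
print(strength_of_evidence(-3.0))  # Strong, against h (in favor of h')
print(math.exp(2.3))               # about 9.97, i.e. roughly 10:1 odds
```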

#### WoE To Explain Classifiers

The WoE can be used to provide *explanations* for the predictions of classifiers. In this case, the hypotheses are predictions of the model (e.g., class 1) and the evidence is the input (e.g., the values of the features for that input), so the question we seek to answer is:

> "**according to the model**, how much does the input speak in favor of a certain prediction?"


[//]: <> "according to the model, how much more likely is class A over B given than the input is X"
[//]: <> "As before, a postitive WoE score indicates that the features $X$ make the reference class (A) more likely than the alternative one (B)."

To make this more specific, imagine the following scenario:
* A black-box machine learning model $f$
* The data has $d$ features
* The outcome variable is categorical, with $k$ classes $\mathsf{K}=\{1,\dots,k\}$

Suppose that when fed an input $\mathbf{x} = (x_1, \dots, x_d)$, the model predicts $f(\mathbf{x}) = y_{\text{pred}}$. We're interested in understanding the influence of the input towards this prediction. Our tool uses the weight of evidence for this purpose.

As mentioned above, we have some freedom in choosing the hypothesis and the alternative hypothesis, and we will always state explicitly what these are.

Usually, we will be interested in the evidence in favor of $y_{\text{pred}}$. For multi-class classification, an obvious choice of alternative hypothesis is to consider all other classes $\mathsf{K} \setminus y_{\text{pred}}$, so we would compute:
$$\text{woe}( y_{\text{pred}} \;/\; \mathsf{K} \setminus y_{\text{pred}} : \mathbf{x})$$
But we might instead (or in addition) be interested in contrasting $y_{\text{pred}}$ against a specific alternative class (e.g., the true class, if $y_{\text{pred}} \neq y_{\text{true}}$):

$$\text{woe}( y_{\text{pred}} \;/\; y_{\text{true}} : \mathbf{x})$$

In either case, our tool will first display the WoE for the entire input:
$$\text{woe}( y_{\text{pred}} \;/\; y' : \mathbf{x})$$
and then will break this down into WoE scores for each feature:
$$\text{woe}( y_{\text{pred}} \;/\; y' : x_i)$$

Since the WoE is additive, the sum of the individual WoE scores will coincide with the total WoE.
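
The additive breakdown can be illustrated with a toy model. The sketch below is not the study tool: it assumes a hand-specified Gaussian naive Bayes model with two classes, in which features are independent given the class, so that per-feature WoE scores add up exactly to the total. All parameters and names are hypothetical.

```python
import math

# Hypothetical two-class Gaussian naive Bayes model with d = 3 features.
prior = {"A": 0.6, "B": 0.4}
means = {"A": [0.0, 2.0, 1.0], "B": [1.0, 0.0, 1.0]}
stds = {"A": [1.0, 1.0, 2.0], "B": [1.0, 1.5, 2.0]}


def log_gauss(x: float, mu: float, sigma: float) -> float:
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


def feature_woe(x, h: str, h_alt: str) -> list:
    """Per-feature scores woe(h / h_alt : x_i) as log likelihood ratios."""
    return [
        log_gauss(xi, means[h][i], stds[h][i]) - log_gauss(xi, means[h_alt][i], stds[h_alt][i])
        for i, xi in enumerate(x)
    ]


x = [0.2, 1.5, 0.7]
per_feature = feature_woe(x, "A", "B")
total_woe = sum(per_feature)
prior_log_odds = math.log(prior["A"] / prior["B"])

# Independent check: posterior log-odds computed directly from Bayes' rule.
joint = {
    c: prior[c] * math.exp(sum(log_gauss(xi, means[c][i], stds[c][i]) for i, xi in enumerate(x)))
    for c in prior
}
posterior_log_odds = math.log(joint["A"] / joint["B"])

print(per_feature)                                    # third entry is 0: that feature is uninformative
print(prior_log_odds + total_woe, posterior_log_odds)  # agree up to floating-point error
```

In this toy setting the posterior log-odds equal the prior log-odds plus the sum of the per-feature WoE scores, which is the stacking of bars shown in a WoE plot.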

#### Aggregating Features into Groups

When there are many features and there is a meaningful way to group them, it is often convenient to display WoE scores aggregated by group instead. For example, suppose there are several features that are all related to demographics, which we collectively denote as $x_{\text{demo}}$. Then, we might be interested in analyzing their combined effect on the prediction via
$$\text{woe}( y_{\text{pred}} / y' : x_{\text{demo}})$$
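
Because the scores are additive, a group score can be sketched as the sum of its members' scores. The feature names, group assignments, and WoE values below are invented for illustration, and the sum is exact only to the extent that the per-feature scores are themselves additive.

```python
from collections import defaultdict

# Hypothetical per-feature WoE scores and a hypothetical feature-to-group map.
feature_woe = {"age": 0.4, "education": -0.9, "hours_per_week": 1.3, "capital_gain": 0.8}
feature_group = {"age": "demo", "education": "demo", "hours_per_week": "work", "capital_gain": "econ"}


def aggregate_by_group(scores: dict, groups: dict) -> dict:
    """Sum per-feature WoE scores within each feature group."""
    totals = defaultdict(float)
    for feature, score in scores.items():
        totals[groups[feature]] += score
    return dict(totals)


group_woe = aggregate_by_group(feature_woe, feature_group)
print(group_woe)  # one score per group; the grand total is unchanged by grouping
```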

#### Sequential Explanations

For multi-class classification, it is sometimes useful to break down the explanation into several 'steps'. For example, if the model assigns high and similar probability to two classes, and low probability to the others, it might be interesting to first understand which features provide the model with evidence in favor of the top two classes (step 1) and then which features discriminate between these two (step 2).
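
One way to picture the two steps is to compute two separate WoE scores: first for the top two classes together against all remaining classes, then for the top class against the runner-up. The sketch below does this for a single feature value, using hypothetical priors, likelihoods, and predicted probabilities. It illustrates the idea and is not necessarily the tool's actual procedure.

```python
import math

# Hypothetical model outputs for one input (predicted class probabilities).
predicted_prob = {"a": 0.46, "b": 0.44, "c": 0.06, "d": 0.04}
# Hypothetical class priors P(Y=y) and likelihoods P(x_i | Y=y) for one feature value.
prior = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
likelihood = {"a": 0.30, "b": 0.20, "c": 0.05, "d": 0.05}


def set_likelihood(classes) -> float:
    """P(x_i | Y in classes): prior-weighted average of per-class likelihoods."""
    mass = sum(prior[y] for y in classes)
    return sum(prior[y] * likelihood[y] for y in classes) / mass


def woe(h, h_alt) -> float:
    """woe(h / h_alt : x_i) for hypotheses given as sets of classes."""
    return math.log(set_likelihood(h) / set_likelihood(h_alt))


# Rank classes by predicted probability, highest first.
ranked = sorted(predicted_prob, key=predicted_prob.get, reverse=True)
top_two, rest = set(ranked[:2]), set(ranked[2:])

step1 = woe(top_two, rest)             # evidence for {top two} vs. all the others
step2 = woe({ranked[0]}, {ranked[1]})  # evidence for the top class vs. the runner-up
print(step1, step2)
```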


#### An Example

The following plots show WoE scores for an ML model that predicts income ($y\in \{\text{low}, \text{high}\}$) based on various features.

Here's what a typical WoE explanation looks like:

![alt text](example_expl.png "Title")

From this plot, we can draw the following conclusions:
* The prior log-odds are positive, i.e., absent other information the model is more likely to predict class A than not.
* The values of the `econ` and `usage` attributes provide substantial evidence in favor of class A 
* The values of the `demo` and `safety` attributes provide substantial evidence against class A (equiv., in favor of class B)
* The rest of the attributes don't provide substantial evidence
* Despite opposing effects, the total evidence **in favor** of A is stronger than that **against** it (think about stacking the positive and negative bars together), so the model predicts class A


## Key take-aways:

1. #### The Weight of Evidence (WoE): "how does the presence of some input features (the *evidence*) affect the prediction of a model (the *hypothesis*)?"
2. #### WoE is expressed in terms of log odds ratios, so it is additive over variables (features)
3. #### You can choose to aggregate the WoE scores by feature group or keep them individually for each feature
4. #### You can choose to show one-shot (predicted-vs-other WoE) or sequential (WoE scores for subsets of the classes at a time) explanations