# [Interpretable Machine Learning](http://people.csail.mit.edu/beenkim/icml_tutorial.html)

Sun, 8/5/2017

Been Kim (Google Brain)

Finale Doshi-Velez (Harvard)

*As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination.*

Relevant papers:

1. [*Interactive and Interpretable Machine Learning Models for Human Machine Collaboration*](http://people.csail.mit.edu/beenkim/papers/BKimPhDThesis.pdf)
2. [*Examples are Not Enough, Learn to Criticize! Criticism for Interpretability*](http://people.csail.mit.edu/beenkim/papers/KIM2016NIPS_MMD.pdf)
3. [*Towards A Rigorous Science of Interpretable Machine Learning*](https://arxiv.org/pdf/1702.08608.pdf)

### Machine learning is a powerful tool with serious consequences

* The Cost-Effective Health Care project built models to predict the probability of death for pneumonia patients (Cooper et al. '97)
  * Trained two models:
    * Neural network
    * Logistic regression
  * The neural net performed better; however, logistic regression extracted this pattern:
    * `HasAsthma(x) -> LowerRisk for pneumonia(x)`
  * So, knowing the neural net performed better, what *other* rules did it learn? __And how can we find out?__
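
To make the logistic regression point concrete, here is a minimal sketch of reading a pattern straight off a linear model's coefficients. Everything in it is an assumption for illustration: the patient counts are invented (they are not the Cooper et al. data), the feature names `has_asthma` and `age_over_65` are hypothetical, and the fit is plain gradient descent rather than whatever the original study used.

```python
import math

def fit_logistic(rows, labels, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Invented toy counts: ([has_asthma, age_over_65], number died, number survived).
groups = [
    ([0, 0], 10, 90),
    ([0, 1], 30, 70),
    ([1, 0], 2, 48),
    ([1, 1], 6, 44),
]
rows, labels = [], []
for features, died, survived in groups:
    rows += [features] * (died + survived)
    labels += [1] * died + [0] * survived

weights, bias = fit_logistic(rows, labels)
for name, w in zip(["has_asthma", "age_over_65"], weights):
    print(f"{name}: {w:+.2f}")
```

On this toy data the `has_asthma` coefficient comes out negative, which is the "asthma lowers risk" pattern in a form a human can read and challenge. A neural net trained on the same data would offer no such single number to inspect.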

### Why is this relevant now?

* Widespread data collection + vast computation resources
  * ML everywhere!
  
### Example

*Why not just use a decision tree?*
* Can you explain the logic behind how the tree was built (i.e., the *actual* rationale)?
* Can you guess which feature was "more important"?

*Well, what if we learn rule sets instead?*
* Can still be extremely complex:
```
IF (sunny and hot) OR (cold and snowing) AND NOT (...)
```
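
One way to see why even a short rule like this is hard to read: its meaning depends on operator precedence. The sketch below is only an illustration. The argument `excluded` is a hypothetical stand-in for the elided `(...)` term, and Python's precedence rules are used as an example of one possible reading, not as what any particular rule learner does.

```python
from itertools import product

def rule_precedence(sunny, hot, cold, snowing, excluded):
    # Python binds `not` tightest, then `and`, then `or`, so this parses as
    # (sunny and hot) or ((cold and snowing) and (not excluded)).
    return (sunny and hot) or (cold and snowing) and not excluded

def rule_left_to_right(sunny, hot, cold, snowing, excluded):
    # A reader scanning left to right might instead group it like this.
    return ((sunny and hot) or (cold and snowing)) and not excluded

case = dict(sunny=True, hot=True, cold=False, snowing=False, excluded=True)
print(rule_precedence(**case))     # True
print(rule_left_to_right(**case))  # False

# Enumerate all 32 inputs and collect the ones where the two readings disagree.
disagreements = [
    values for values in product([False, True], repeat=5)
    if rule_precedence(*values) != rule_left_to_right(*values)
]
print(len(disagreements))  # 4
```

The two readings disagree whenever it is sunny and hot and the excluded condition also holds, so a person and the model can "agree" on the rule text while disagreeing on its predictions.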

### So do decision trees/rule lists, etc. not work?

* __They may work for your use case!__
* *However*, there is no one right answer

## What is interpretability?

* Interpretation is the process of giving __explanations__ (TO HUMANS)

  ### 1. Why and when?
    - When there is fundamental __underspecification__ in the problem
      - More data or a cleverer algorithm will not help
      - E.g., safety in autonomous vehicles (what is safety?)
        - Not hitting a person?
        - Stopping within 2 feet of someone?
      - E.g., debugging
        - Why doesn't something work?
      - E.g., mismatched objectives and multi-objective trade-offs
        - What you optimize is not what you meant to optimize
      - E.g., science
        - You want to discover something new, but you don't know what it is.
      - E.g., __legal/ethics__
        - We're legally required, for regulatory purposes, to provide an explanation
        
    - When would we __not__ want interpretability (or, rather, when is it not as important)?
      - No significant consequences
      - Sufficiently well-studied problem
      - We want to prevent gaming of the system
      
  ### 2. How can we achieve interpretability?
    - Three stages:
    
      #### a. Prior to building models
        - Visualization (*note to reader: check out Google's new open source ["Facets"](https://pair-code.github.io/facets/) tool!*)
        - Exploratory data analysis
        
      #### b. When building a new model
        - Select an interpretable method:
        
          i. Rule-based, per-feature-based (parametric)
            - __However__, subject to data density and mis-represented subpopulations!
            - It might not be as interpretable as you think (imagine a decision tree of depth 10&mdash;how do you really rationalize what's going on?)
            - Each feature must be independently interpretable for the model to be interpretable
            
          ii. Case-based (similar to clustering)
            - "I recommend treatment X because it worked for patients like you"
            - __However__, there may not be good, representative examples
            - Humans might over-generalize
            
          iii. Sparsity-based
            - Model correlations across subtrees
            - __However__, just because it's sparse doesn't mean it's interpretable. Over-sparsity might spawn randomness.
            
          iv. Monotonicity
            - Learn piecewise monotonic functions
          
      #### c. After building a model (if you already have one):
      
        - Sensitivity analysis, gradient-based methods
          - "What would happen to output $y$ if we perturb the input $x \rightarrow x + \epsilon$?" (see Ribeiro et al. '16)
          - See Koh et al. '17
          
        - Saliency/attribution maps:
          - Give me the features in the input space that mattered for the classification ($\frac{\partial y}{\partial x_{ij}}$)
          
          - __However__, derivatives are local: for non-linear functions, the per-feature derivatives need not add up to the actual change in the output
          - __However__, the model may not allow sensitivity analysis.
          
        - Mimic models
          - Model compression or distillation (Bucila et al. '06, Ba et al. '14, Hinton et al. '15)
          - Visual explanations (Hendricks et al. '16)
          - __However__, you might not be able to distill (there may not be a simpler model at all), and there might be a gap between what the actual model is doing and what your mimic model is doing
        
        - Investigation on hidden layers
          - Deep dream
          - Deconvolution net
          - Network dissection
          - __However__, there may be a lack of actionable insights
          
  ### 3. How do we measure the explanation quality (i.e., *what is good?*)
    - "You know it when you see it"
    - Benchmark against human performance
    - We need something more general...
    
      - Function-based
        - Can we use some proxy such as sparsity, monotonicity, or non-negativity?
        - Easy to formalize, optimize and evaluate (*but might not solve a real world need!*)
        
      - Application-based
        - Does providing interpretability assist with downstream tasks, such as increasing fairness, safety, scientific discovery, or productivity?
        - Can be very costly and difficult to compare work A to B!
        
      - __Cognition-based__
        - What factor should change to change the outcome?
        - What are the discriminative features?
        - `What [INPUT|WEIGHT|COST] would change the [PREDICTIONS FOR|CLUSTER OF] x?`
        - E.g., forward simulation
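
The question "what input would change the prediction for x?" ties back to the sensitivity analysis idea from the post-hoc stage (perturb $x \rightarrow x + \epsilon$ and watch $y$). Below is a minimal sketch using central finite differences, so it only needs to call the model, not differentiate it. The `predict` function is a hypothetical hand-written stand-in for a black-box model, and the input values are made up.

```python
import math

def predict(x):
    """Hypothetical black-box model: a small non-linear score in (0, 1)."""
    z = 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[2]
    return 1.0 / (1.0 + math.exp(-z))

def sensitivity(model, x, eps=1e-4):
    """Central finite-difference estimate of d(model)/d(x_j) for each feature j."""
    grads = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += eps
        down[j] -= eps
        grads.append((model(up) - model(down)) / (2 * eps))
    return grads

x = [0.2, 0.4, 1.0]
grads = sensitivity(predict, x)

# Rank features by how strongly a small nudge moves the prediction at this x.
ranking = sorted(range(len(x)), key=lambda j: abs(grads[j]), reverse=True)
print("sensitivities:", [round(g, 4) for g in grads])
print("most to least sensitive feature index:", ranking)

# Local derivatives do not add up to the effect of a large change in a
# non-linear model: compare the linear estimate with the actual change.
delta = [1.0, 1.0, 1.0]
linear_estimate = sum(g * d for g, d in zip(grads, delta))
actual_change = predict([xi + d for xi, d in zip(x, delta)]) - predict(x)
print(round(linear_estimate, 3), round(actual_change, 3))
```

Note this is a *local* answer: the ranking holds at this particular `x` and can differ elsewhere, which is exactly the global vs. local distinction.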

### Cognition-based interpretability efficacy

1. Problem-related factors
  - Global vs. local
    - Local meaning related to the specific observation ("what happened to this guy? Why this prediction?")
  
  - Time budget
    - Different problems have different time criticality
  
  - Severity of underspecification
    - How much risk does ambiguity pose?
    
2. Method-related factors
  - Cognitive chunks
    - Representation of information that you get to choose
    
  - Audience training
    - The expert's background will affect what cognitive chunks are selected/identified

## Take-aways for State Farm

Obviously State Farm has a huge need for model interpretability in order to overcome model regulatory challenges and to get through ERM model validation. However, Been raises a good point: *how do we define model interpretability?* 

Traditionally, and mostly to the actuarial team, this means just using GLMs or other simple, linear models that have intrinsic feature selection terms/penalties and feature importances. But perhaps altering how we look at model interpretability could allow our data scientists to build more complex models, so long as we explore interpretability practices such as sensitivity analysis (or partial dependence plots, etc.). This also colors the way we design features for our models: if each feature is not independently "interpretable," or if the subspace density is too low, our interpretation of the model might be impacted (e.g., pneumonia risk in adults over 100: risk doesn't actually decrease, we just see fewer observations of adults within this demographic due to other factors).
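
As a rough sketch of the partial dependence idea mentioned above: force one feature to each value on a grid, average the model's predictions over the rest of the data, and look at the resulting curve. The `model` function, the `age`/`smoker` features, and the four rows are all hypothetical placeholders, not anything from a real State Farm model.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Average model output over the data with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite only the feature of interest
            total += model(modified)
        curve.append(total / len(rows))
    return curve

def model(row):
    """Hypothetical fitted model standing in for a complex model's predict function."""
    age, smoker = row
    return 0.01 * age + 0.2 * smoker + 0.002 * age * smoker

rows = [[30, 0], [45, 1], [60, 0], [75, 1]]  # made-up observations: [age, smoker]
grid = [20, 40, 60, 80]
curve = partial_dependence(model, rows, feature_index=0, grid=grid)
for age, avg in zip(grid, curve):
    print(age, round(avg, 2))
```

The function only ever calls `model`, so the same approach would work for a boosted tree or a neural net. The caveat from the paragraph above still applies: grid values in regions with few real observations produce averages that the data barely supports.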

__Final considerations__
1. Models should not be made interpretable just for the sake of it, since interpretability is expensive and convoluted to validate. Only use cases that truly require interpretability should take on these considerations
2. Features should be examined (within reason) for regions of subspace sparsity, and that should be considered when trying to explain a model
3. Specifically consider "audience training" when interpreting models for business partners, and look for their validation while being aware of their potential biases
4. Know what granularity is actually reasonable for model interpretability: do you really need to look at each individual observation?
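
Consideration 2 can be checked cheaply before any explanation is attempted. Here is a minimal sketch, assuming an integer-valued feature and made-up ages; the bin width and the `min_count` threshold are arbitrary choices for illustration.

```python
from collections import Counter

def sparse_bins(values, bin_width, min_count):
    """Return {bin_start: count} for bins holding fewer than min_count observations.

    Bins between the smallest and largest observed bin are included even when empty.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    return {
        start: counts[start]  # Counter returns 0 for bins with no observations
        for start in range(first, last + bin_width, bin_width)
        if counts[start] < min_count
    }

ages = [34, 41, 45, 47, 52, 55, 58, 61, 63, 66, 68, 72, 75, 79, 101]  # invented
flagged = sparse_bins(ages, bin_width=20, min_count=3)
print(flagged)  # {20: 1, 80: 0, 100: 1}
```

Any model behavior in the flagged ranges (here under 40 and 80 and above) rests on almost no data, so an apparent effect there, like the "risk drops after 100" example, should be treated as an artifact until shown otherwise.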