<a href="https://colab.research.google.com/github/chaitragopalappa/MIE590-690D/blob/main/4_XAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# XAI (Explainable AI)

Sources:   
* Books/Manuscripts
  * [[IML] Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, by Christoph Molnar](https://christophm.github.io/interpretable-ml-book/)  
  * Rami Ibrahim and M. Omair Shafiq, Explainable Convolutional Neural Networks: A Taxonomy, Review, and Future Directions, ACM Computing Surveys, Vol. 55, No. 10, Article 206, 2023 https://dl.acm.org/doi/pdf/10.1145/3563691

* Python Packages  
[SHAP](https://github.com/shap/shap#deep-learning-example-with-gradientexplainer-tensorflowkeraspytorch-models) [API](https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html)  
[CAPTUM](https://captum.ai/docs/attribution_algorithms)  
[TF-Explain](https://pypi.org/project/tf-explain/) [GitHub](https://github.com/sicara/tf-explain)  

---
---


# Interpretability and Explainability
What is the difference between **interpretability** and **explainability**?
The terms are typically used interchangeably, though some differences may apply in a given context.
Interpretability is mapping a black-box model (such as a deep learning model) into an interpretable form.
Explainability is a stronger term: it requires interpretability and, additionally, explaining the decisions made by the model. This is important for engineering applications where model outputs are used to inform decisions.

The methods used are the same in either case.
We will refer to the general area as XAI.

---
---

### **[XAI Taxonomy](https://christophm.github.io/interpretable-ml-book/images/taxonomy.jpg)**
Taxonomy is evolving as rapidly as the field is. Generally, techniques can be categorized as below:
* Intrinsic: interpretable by design (e.g., linear regression or decision trees).
* Post-hoc: attached after training to explain the model (**most deep learning methods need this type of technique**)
  * model-agnostic: applicable to a range of ML models
    * global: interpretation of model generally
    * local: interpretation of specific predictions
  * model-specific: applicable to specific type of ML models

Another categorization of mostly post-hoc methods specific to deep learning:
* feature attribution,
* layer attribution, or
* neuron attribution.
(There are also newly emerging techniques for concept attribution in CNNs; see [CAPTUM](https://captum.ai/docs/attribution_algorithms).)

---
---

## Intrinsic method: Linear regression interpretation
<p align="center">
  <img src="https://christophm.github.io/interpretable-ml-book/limo_files/figure-html/fig-linear-weights-plot-1.png" width="35%" />
  <img src="https://christophm.github.io/interpretable-ml-book/limo_files/figure-html/fig-linear-effects-1.png" width="35%" />
</p>

Source: [IML](https://christophm.github.io/interpretable-ml-book/)  Weight = coefficient;   Effect = weight × feature value

<p align="center"><em>Figure 6.1: Linear regression estimates for bike rental data. Weights are displayed as points and the 95% confidence intervals as lines.  Figure 6.2: Effect plot for linear regression results for the bike rental data. The boxplots show the distribution of effects (= feature value times feature weight) across the data per feature. </em></p>

* To make the estimated weights more comparable, scale the features (zero mean and standard deviation of one) before fitting the linear model.
* The weights depend on the scale of the features, i.e., they change if you change a feature's unit; e.g., a person’s height in meters vs. centimeters will have different weights. Though the weight changes, the actual effect does not.
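
The unit-dependence of weights can be checked numerically. The sketch below is only an illustration on synthetic data: the height values, the target `y`, and the helper `fit_slope` are all hypothetical and not part of the bike rental example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: heights in meters and a noisy linear target.
height_m = rng.normal(1.70, 0.10, size=200)
y = 60.0 * height_m - 35.0 + rng.normal(0.0, 2.0, size=200)

def fit_slope(x, y):
    """Ordinary least squares with an intercept; returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(x), x])
    intercept, slope = np.linalg.lstsq(design, y, rcond=None)[0]
    return intercept, slope

# Fit once with height in meters and once with the same heights in centimeters.
_, w_m = fit_slope(height_m, y)
height_cm = 100.0 * height_m
_, w_cm = fit_slope(height_cm, y)

# Effect = weight x feature value, computed per sample in each unit system.
effect_m = w_m * height_m
effect_cm = w_cm * height_cm

print("weight (per m): ", w_m)
print("weight (per cm):", w_cm)
print("effects identical:", np.allclose(effect_m, effect_cm))
```

The weight shrinks by the same factor of 100 by which the feature values grow, so the per-sample effects agree in both unit systems.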

---
---

# Post-hoc Local model-agnostic methods

Model agnostic
* Ceteris paribus (CP) plots
* Individual conditional expectation (ICE)
* LIME (Local Interpretable Model-Agnostic Explanations)(for classification problems)
* Anchor (for classification problems)
* Shapley values
* [SHAP (SHapley Additive exPlanation)](https://papers.nips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf)

Model-specific: models with differentiable functions (deep learning); most are gradient-based
* Integrated gradients
* DeepLift
* DeepLiftSHAP
* GradientSHAP
* Guided GradCAM
* Layer GradCAM

---


**Ceteris paribus plots**
Core concept: How does changing individual features change the predicted values?

<table align="center">
  <tr>
    <td style="text-align: center; padding: 8px;">
      <figure>
        <img src="https://christophm.github.io/interpretable-ml-book/ceteris-paribus_files/figure-html/fig-whatif-bike-1.png" width="400px" />
        <figcaption>Ceteris Paribus Profile for Bike Rentals: <br> How changing individual features changes the <br>predicted number of bike rentals.</figcaption>
      </figure>
    </td>
    <td style="text-align: center; padding: 8px;">
      <figure>
        <img src="https://christophm.github.io/interpretable-ml-book/ceteris-paribus_files/figure-html/fig-cp-bike-models-1.png" width="400px" />
        <figcaption>Comparing Multiple Models with CP Profiles: <br>Ceteris paribus curves for the bike prediction task for different models:<br> linear model, random forest, SVM, and decision tree</figcaption>
      </figure>
    </td>
  </tr>
</table>

Source: [IML](https://christophm.github.io/interpretable-ml-book/)  
Dataset: [Bike rental prediction](https://christophm.github.io/interpretable-ml-book/data.html#bike-data)

Strengths
* Simple to do and interpret
* Complement attribution-based methods (like SHAP or LIME), which do not show how sensitive the prediction function is to local changes, and so give a more complete picture when explaining individual predictions.
* Ceteris paribus plots are flexible building blocks.

Limitations
* Ceteris paribus plots only show us one feature change at a time.
* Interpretation suffers when features are correlated.
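
A ceteris paribus profile needs only a prediction function and one instance. The following is a minimal sketch, assuming a hypothetical two-feature model `predict` (not one of the bike rental models) and a hypothetical helper `ceteris_paribus`.

```python
import numpy as np

def predict(X):
    """Hypothetical black-box model; any function mapping rows to predictions works."""
    return X[:, 0] ** 2 + X[:, 1] ** 2 + X[:, 0] * X[:, 1]

def ceteris_paribus(predict_fn, x, feature, grid):
    """Vary one feature of instance x over grid while holding all others fixed."""
    X_rep = np.tile(x, (len(grid), 1))   # one copy of x per grid value
    X_rep[:, feature] = grid             # overwrite only the chosen feature
    return predict_fn(X_rep)

x = np.array([1.0, 2.0])                 # instance being explained
grid = np.linspace(-3.0, 3.0, 7)         # values to try for feature 0
profile = ceteris_paribus(predict, x, feature=0, grid=grid)

for value, pred in zip(grid, profile):
    print(f"x1 = {value:+.1f} -> prediction = {pred:.1f}")
```

Plotting `profile` against `grid` gives the CP curve for feature 0 of this one instance; repeating the call with `feature=1` gives the second panel.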



---

**Individual Conditional Expectation (ICE)**

**ICE** plots are CP profiles drawn for every instance (sample): the values for each line (one per instance) are computed by varying one feature while keeping all other features the same.

Suppose we have 2 features, x1, x2 and N samples. For a plot of x2 against y, we will have N lines, each with x1 fixed at the value of that sample, while x2 varies over its full range. Repeat for all features.

**Centered ICE**
It can be hard to tell whether the ICE curves differ between data points because they start at different predictions. A simple solution is to center the curves at a certain point of the feature and display only the difference in the prediction relative to this point.

**Derivative ICE**
Instead of the actual values, plot the partial derivatives of the prediction with respect to the feature. If there are no interaction terms, the derivative curves are identical across samples.
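
The three ICE variants can be sketched for the two functions used in the figures that follow. This is an illustrative sketch only: the sample points, the grid, the helper `ice_curves`, and the use of finite differences (`np.gradient`) for the derivatives are assumptions, not the code that produced the figures.

```python
import numpy as np

def f_additive(X):
    return X[:, 0] ** 2 + X[:, 1] ** 2

def f_interaction(X):
    return X[:, 0] ** 2 + X[:, 1] ** 2 + X[:, 0] * X[:, 1]

def ice_curves(predict_fn, X, feature, grid):
    """One curve per row of X; result has shape (n_samples, len(grid))."""
    curves = np.empty((X.shape[0], len(grid)))
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value        # all other features keep their sample values
        curves[:, i] = predict_fn(X_mod)
    return curves

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(5, 2))  # hypothetical samples
grid = np.linspace(-2.0, 2.0, 41)        # range over which x2 is varied

ice_add = ice_curves(f_additive, X, feature=1, grid=grid)
ice_int = ice_curves(f_interaction, X, feature=1, grid=grid)

# Centered ICE: subtract each curve's value at the first grid point.
centered_add = ice_add - ice_add[:, [0]]
centered_int = ice_int - ice_int[:, [0]]

# Derivative ICE: finite-difference derivative along the grid.
d_add = np.gradient(ice_add, grid, axis=1)
d_int = np.gradient(ice_int, grid, axis=1)

# Largest disagreement between samples' derivative curves at any grid point.
spread_add = (d_add.max(axis=0) - d_add.min(axis=0)).max()
spread_int = (d_int.max(axis=0) - d_int.min(axis=0)).max()
print("derivative spread, no interaction:  ", spread_add)
print("derivative spread, with interaction:", spread_int)
```

Without the interaction term the derivative curves coincide (spread near zero); with the `x1 x2` term each curve is shifted by that sample's `x1`, so the spread is nonzero.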

---

<table align="center">
  <tr>
    <td style="text-align: center; padding: 8px;">
      <figure>
        <img src="https://raw.githubusercontent.com/chaitragopalappa/MIE590-690D/main/images/ICE_quadratic%20function.png" width="400px" />
        <figcaption>ICE: y=x1^2 + x2^2.</figcaption>
      </figure>
    </td>
    <td style="text-align: center; padding: 8px;">
      <figure>
        <img src="https://raw.githubusercontent.com/chaitragopalappa/MIE590-690D/main/images/CenteredICE_quadratic.png" width="400px" />
        <figcaption>Centered ICE : y=x1^2 + x2^2</figcaption>
      </figure>
    </td>
    <td style="text-align: center; padding: 8px;">
      <figure>
        <img src="https://raw.githubusercontent.com/chaitragopalappa/MIE590-690D/main/images/derivativeICE_quadratic.png" width="400px" />
        <figcaption>Derivative ICE: y=x1^2 + x2^2</figcaption>
      </figure>
    </td>
  </tr>
</table>

-----

<table align="center">
  <tr>
    <td style="text-align: center; padding: 8px;">
      <figure>
        <img src="https://raw.githubusercontent.com/chaitragopalappa/MIE590-690D/main/images/ICE_interaction_term.png" width="400px" />
        <figcaption>ICE: y=x1^2 + x2^2 + x1 x2.</figcaption>
      </figure>
    </td>
    <td style="text-align: center; padding: 8px;">
      <figure>
        <img src="https://raw.githubusercontent.com/chaitragopalappa/MIE590-690D/main/images/CenteredICE_INteractionTerm.png" width="400px" />
        <figcaption>Centered ICE : y=x1^2 + x2^2 + x1 x2</figcaption>
      </figure>
    </td>
    <td style="text-align: center; padding: 8px;">
      <figure>
        <img src="https://raw.githubusercontent.com/chaitragopalappa/MIE590-690D/main/images/derivativeICE_interactionTerm.png" width="400px" />
        <figcaption>Derivative ICE: y=x1^2 + x2^2 + x1 x2</figcaption>
      </figure>
    </td>
  </tr>
</table>

---

**LIME (Local Interpretable Model-Agnostic Explanations) (works for tabular data, images, and text)**

Originally developed for interpreting black-box classifiers: Ribeiro et al., “Why Should I Trust You?” Explaining the Predictions of Any Classifier, ACM, 2016  https://arxiv.org/pdf/1602.04938 ; [Related Blog](https://www.oreilly.com/content/introduction-to-local-interpretable-model-agnostic-explanations-lime/) ; [Python files](https://github.com/marcotcr/lime)

Recent applications have expanded to regression problems as well: https://github.com/marcotcr/lime/tree/ce2db6f20f47c3330beb107bb17fd25840ca4606

LIME is an interpretable "local" surrogate of a black-box model; the local surrogate could be a linear model, Lasso, a decision tree, etc. It works by generating a new dataset consisting of perturbed samples and the corresponding predictions of the black-box model. On this new dataset, LIME then trains an interpretable model, weighting each sampled instance by its proximity to the instance of interest. The learned model is a "local" surrogate (it has local fidelity), not a global surrogate.
$$ \text{explanation}(x)=\arg\min_{g\in G} L(\hat{f},g,\pi_x)+\Omega(g)$$
* $g$ is the explanation model for instance $x$ (e.g., linear regression model)
  * $G$ is a class of linear models, such that, for some perturbed instance $z'$, we have  $g(z') = w_g \cdot z'$
* $\hat{f}$  is the original prediction model (e.g., neural net)
* $\Omega$ is model complexity (defined by the user, e.g., prefer fewer features),
* $L$ is loss (e.g., mean squared error) between $g$ and $\hat{f}$ and
* $\pi_x$ is the proximity measure defining how large the neighborhood around instance $x$ is that we consider for the explanation.
  * LIME currently uses an exponential smoothing kernel to define the neighborhood.
  * $\pi_x(z) = \exp(-D(x, z)^2 / \sigma^2)$
  * $D$ is the distance function (e.g., cosine distance for text, L2 distance for tabular data and images)
  * $\sigma$ is the kernel width; a small kernel width means that an instance must be very close to influence the local model; a larger kernel width means that instances that are farther away also influence the model.
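
The objective above can be sketched from scratch for tabular data. This is not the `lime` package's implementation: the model `black_box`, the Gaussian perturbation scheme, the parameter values, and the helper name `lime_tabular_sketch` are all assumptions, and the complexity term $\Omega$ is omitted (every feature is kept).

```python
import numpy as np

def black_box(X):
    """Hypothetical nonlinear model standing in for f-hat."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def lime_tabular_sketch(predict_fn, x, n_samples=5000, scale=0.5, sigma=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around instance x."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance (assumed scheme: isotropic Gaussian noise).
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Query the black-box model on the perturbed samples.
    y = predict_fn(Z)
    # 3. Proximity weights pi_x(z) = exp(-D(x, z)^2 / sigma^2) with L2 distance.
    dist = np.linalg.norm(Z - x, axis=1)
    pi = np.exp(-dist ** 2 / sigma ** 2)
    # 4. Weighted least squares: scale rows by sqrt(pi) and solve.
    design = np.column_stack([np.ones(n_samples), Z - x])
    sw = np.sqrt(pi)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]             # local intercept, local weights w_g

x = np.array([0.0, 1.0])
intercept, weights = lime_tabular_sketch(black_box, x)
print("local intercept:", intercept)
print("local weights:  ", weights)
```

For this toy model the local weights should land near the gradient at `x`, roughly `(1, 2)`. Shrinking `sigma` tightens the neighborhood and moves the surrogate closer to the tangent plane; enlarging it averages over more of the curvature.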

Strengths:
* Model-agnostic (the actual black-box model could be any model)
* Works on tabular data, images, and text data
* Easy to use

Limitations
* Unstable if linear approximations are not sufficient even in a local neighborhood

---


**LIME Examples**

<p align="center">
  <img src="https://raw.githubusercontent.com/marcotcr/lime/master/doc/images/lime.png" width="50%" />
  
</p>
 Figure 3: "*Toy example to present intuition for LIME. The black-box model’s complex decision function f (unknown to LIME) is represented by the blue/pink background, which cannot be approximated well by a linear model. The bold red cross is the instance being explained. LIME samples instances, gets predictions using f , and weighs them by the proximity to the instance being explained (represented here by size). The dashed line is the learned explanation that is locally (but not globally) faithful.*"

 Source: Ribeiro et al., “Why Should I Trust You?” Explaining the Predictions of Any Classifier, ACM, 2016  https://arxiv.org/pdf/1602.04938

 * [Code 1 - RandomForests Regression](https://marcotcr.github.io/lime/tutorials/Using%2Blime%2Bfor%2Bregression.html)
  
```
Intercept 23.9047475063 # intercept of local surrogate
Prediction_local [ 22.32579479] # prediction from local surrogate
Right: 23.1073 # Actual value -from original prediction model

# Range feature, Coefficients from local linear surrogate
[('6.99 < LSTAT <= 11.43', 1.7571320048618118),
 ('6.21 < RM <= 6.62', -1.5638211582388033),
 ('NOX > 0.62', -0.77384372989110417),
 ('19.10 < PTRATIO <= 20.20', -0.60756112694664299),
 ('2.08 < DIS <= 3.17', -0.39085870918058263)]
```
* [Code 2 - Classification using RandomForests and XGBoost](https://marcotcr.github.io/lime/tutorials/Tutorial%20-%20continuous%20and%20categorical%20features.html)
  



 ---
 ---

**Anchor (for classification problems)**

Original article: Anchors: Ribeiro, M. T., Singh, S., & Guestrin, C. (2018). Anchors: High-Precision Model-Agnostic Explanations. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11491 ; [Python](https://github.com/marcotcr/anchor?tab=readme-ov-file)

Instead of the surrogate linear models used by LIME, this method uses easy-to-understand IF-THEN rules, called anchors. LIME results do not indicate how faithful they are, as LIME solely learns a linear decision boundary that best approximates the model given a perturbation space. Given the same perturbation space, the anchors approach constructs explanations whose coverage is adapted to the model’s behavior, and the approach clearly expresses their boundaries. Thus, they are faithful by design and state exactly for which instances they are valid. This property makes anchors particularly intuitive and easy to comprehend.

See a visual example comparing Anchor with LIME: https://christophm.github.io/interpretable-ml-book/anchors.html

---
Example: Titanic dataset: predict whether or not a passenger survived the Titanic disaster.

Anchor is used to explain why the model predicts 'survived' for this specific person:  
**Feature |     Value**  
Age |	20  
Sex |	female  
Class |	first  
Ticket price |	300$  
More attributes 	…  
Survived |	true  

And the corresponding anchors explanation is:
IF SEX = female AND Class = first THEN PREDICT Survived = true WITH PRECISION 97% AND COVERAGE 15%

---

**Mathematically**  
An **anchor** $A$ is a rule that “anchors” a prediction $\hat{f}(\mathbf{x})$ if, for all samples $\mathbf{z}$ that satisfy the same rule $A$, the model predicts the same outcome with high probability:

$$
\mathbb{E}_{\mathcal{D}_\mathbf{x}(\mathbf{z}\mid A)}[1_{\hat f(\mathbf{x}) = \hat f(\mathbf{z})}] \ge \tau
\quad \text{and} \quad A(\mathbf{x}) = 1
$$

where:  
- $\mathbf{x}$: instance being explained  
- $A$: the anchor rule (set of feature predicates)  
- $\hat{f}$: the black-box prediction model  
- $\mathcal{D}_\mathbf{x}(\mathbf{z}\mid A)$: distribution of perturbed samples satisfying $A$  
- $\tau$: desired precision threshold (0–1)


Anchors guarantee that the rule $A$ holds with **high precision** and **confidence**:

$$
\mathbb{P}( \mathrm{prec}(A) \ge \tau ) \ge 1 - \delta
\quad \text{where} \quad
\mathrm{prec}(A) = \mathbb{E}_{\mathcal{D}_\mathbf{x}(\mathbf{z}\mid A)}[1_{\hat f(\mathbf{x}) = \hat f(\mathbf{z})}]
$$

Here $\delta$ is a small value (e.g., 0.05) representing the acceptable uncertainty.

The **coverage** of an anchor $A$ is the expected fraction of instances for which the rule applies:

$$
\mathrm{cov}(A) = \mathbb{E}_{\mathcal{D}(\mathbf{z})}[A(\mathbf{z})]
$$
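
Precision and coverage of a candidate rule can be estimated by sampling. The sketch below is an illustration under stated assumptions: the classifier `model`, the rule `anchor_rule`, and the uniform perturbation distribution are all hypothetical. It only evaluates a given rule and does not perform the anchor search itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(X):
    """Hypothetical black-box classifier: class 1 if x0 + x1 > 1."""
    return (X[:, 0] + X[:, 1] > 1.0).astype(int)

def anchor_rule(X):
    """Hypothetical anchor A: x0 > 0.6 AND x1 > 0.3."""
    return (X[:, 0] > 0.6) & (X[:, 1] > 0.3)

x = np.array([0.9, 0.7])                      # instance being explained
pred_x = model(x[None, :])[0]
rule_holds_for_x = bool(anchor_rule(x[None, :])[0])   # A(x) = 1 is required

# Assumed perturbation distribution D: uniform on the unit square.
Z = rng.uniform(0.0, 1.0, size=(20000, 2))
applies = anchor_rule(Z)

# cov(A): fraction of perturbed samples for which the rule applies.
coverage = applies.mean()
# prec(A): among samples satisfying A, fraction predicted like x.
precision = (model(Z[applies]) == pred_x).mean()

tau = 0.95
print("A(x) holds:", rule_holds_for_x)
print(f"coverage  = {coverage:.3f}")
print(f"precision = {precision:.3f} (threshold tau = {tau})")
```

Tightening the rule (for example raising the bound on `x0`) tends to raise precision and lower coverage, which is the trade-off the anchor search optimizes.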

---

**Finding Anchors**
Objective function: Find the anchor rule $A$ that maximizes coverage while satisfying the precision constraint:
$$
\underset{A : \mathbb{P}(\mathrm{prec}(A)\ge \tau)\ge1 - \delta}{\max} \; \mathrm{cov}(A)
$$
Anchors utilizes reinforcement learning techniques (specifically multi-armed bandits (MABs)) in combination with a graph search algorithm to reduce the number of model calls (and hence the required runtime) to a minimum while still being able to recover from local optima. To this end, neighbors, or perturbations, are created and evaluated for every instance that is being explained. Doing so allows the approach to disregard the black box’s structure and its internal parameters so that these can remain both unobserved and unaltered.

Strengths:
* Easy to interpret
* Anchors are Subsettable: trade-off between precision and coverage
* Works even when the model is non-linear in the neighborhood of the instance being interpreted
* Model-agnostic
* Anchors shed light on the robustness of the original prediction model to changes in features
* It is efficient and can be parallelized by using MABs that allow batch sampling (e.g., BatchSAR)

Limitations
* Highly configurable: needs hyperparameter tuning (beam width and precision threshold); perturbation functions need to be explicitly defined for each use case.
* Many scenarios require discretization, which in some cases may not work.
* Computational complexity, as with any perturbation-based method: multiple calls to the prediction model (though the use of MABs reduces this to some extent)

**Code**: [Python Package](https://github.com/marcotcr/anchor?tab=readme-ov-file)

---
---

**Shapley values** (for classification and regression)

Computes feature contributions for a specific prediction.

---
**Before we go to Shapley, let us understand feature contributions in linear models**

The linear model is given by

$$
\hat{f}(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p
$$

where **x** is the instance for which we want to compute the contributions.  
Each $x_j$ is a feature value, with $j = 1, \ldots, p$.  
The $\beta_j$ is the weight corresponding to feature $j$.


The contribution $\phi_j$ of the $j$-th feature to the prediction $\hat{f}(\mathbf{x})$ is the difference between the feature effect and the average effect:

$$
\phi_j(\hat{f}) = \beta_j x_j - \beta_j \mathbb{E}[X_j]
$$

where $\beta_j \mathbb{E}[X_j]$ is the mean effect estimate for feature $j$.  
  

Now we know how much each feature contributed to the prediction.  If we sum all the feature contributions for one instance, the result is the following:

$$
\sum_{j=1}^{p} \phi_j(\hat{f})
= \sum_{j=1}^{p} (\beta_j x_j - \beta_j \mathbb{E}[X_j])
$$

$$
= \left( \beta_0 + \sum_{j=1}^{p} \beta_j x_j \right)
- \left( \beta_0 + \sum_{j=1}^{p} \beta_j \mathbb{E}[X_j] \right)
$$

$$
= \hat{f}(\mathbf{x}) - \mathbb{E}[\hat{f}(\mathbf{X})]
$$

This is the predicted value for the data point **x** minus the average predicted value.  Feature contributions can be negative.
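
The identity above can be verified numerically. This is a small check on synthetic data; the coefficients `beta0`, `beta` and the data matrix `X` are hypothetical, and the expectation is replaced by the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # hypothetical dataset
beta0 = 2.0                                   # hypothetical intercept
beta = np.array([1.5, -0.7, 0.3])             # hypothetical weights

def f_hat(X):
    """Linear model f(x) = beta0 + sum_j beta_j x_j."""
    return beta0 + X @ beta

x = X[0]                                      # instance to explain

# phi_j = beta_j x_j - beta_j E[X_j], with E[X_j] estimated by the column mean.
phi = beta * x - beta * X.mean(axis=0)

sum_of_contributions = phi.sum()
prediction_minus_average = f_hat(x[None, :])[0] - f_hat(X).mean()

print("contributions:", phi)
print("sum of contributions:       ", sum_of_contributions)
print("prediction minus average:   ", prediction_minus_average)
```

The two printed totals agree up to floating-point error, and individual entries of `phi` can be negative.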

----


**What are Shapley values?**  
In black-box models we do not have such coefficients. Shapley values, a method from game theory, provide a suitable way to compute similar feature attributions for any black-box model (model-agnostic).

The Shapley value is the average marginal contribution of a feature value across all possible coalitions.
The Shapley value, coined by Shapley (1953), is a method for assigning payouts to players depending on their contribution to the total payout. Players cooperate in a coalition and receive a certain profit from this cooperation.

**Example**: Suppose we trained a machine learning model to predict apartment prices (y). (From the perspective of game theory, this is the **game**.)

* Suppose for a certain apartment with **X = [area of 50 m², 2nd floor, park nearby, cats banned]**, it predicts **y = €300,000**
  * Goal: **explain this prediction**.
* Suppose the average prediction for all apartments is €310,000.
* The difference between the actual and the average prediction is the **payout** (€300,000 - €310,000 = -€10,000)
  * Goal: explain how each feature (the **players**) contributed to this payout
  * The answer could be: park-nearby contributed €30,000; area-50 contributed €10,000; floor-2nd contributed €0; cat-banned contributed -€50,000. The contributions add up to -€10,000, the final prediction minus the average predicted apartment price. These would be the **Shapley values specific to this "instance"**.

*Note: A player can be an individual feature value, typical for tabular data. A player can also be a group of feature values, e.g., to explain an image, pixels can be grouped into superpixels, and the prediction distributed among them.*



---

**How to calculate Shapley value for one feature?**
* The Shapley value is the average marginal contribution of a feature value across all possible coalitions. A coalition is a subset of features.

All possible coalitions for the "cat-banned" feature:

    {} (empty coalition)
    {park-nearby}
    {area-50}
    {floor-2nd}
    {park-nearby,area-50}
    {park-nearby,floor-2nd}
    {area-50,floor-2nd}
    {park-nearby,area-50,floor-2nd}

* For each of these coalitions, we compute the predicted apartment price with and without the feature value cat-banned and take the difference to get the marginal contribution.
  * We replace the feature values of features that are not in a coalition with a reference value (e.g., the mean of the feature in the training data, or a randomly sampled value).
* The Shapley value is the (weighted) average of all marginal contributions.
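
The procedure can be written out as an exact enumeration for a small number of features. The sketch below assumes a hypothetical three-feature `model` and a single fixed `reference` vector for absent features (rather than averaging over sampled data instances); the weight $|S|!\,(p-|S|-1)!/p!$ is the standard Shapley weight for a coalition $S$ of the other features.

```python
from itertools import combinations
from math import factorial

def model(x):
    """Hypothetical model with an interaction between features 0 and 1."""
    return 3.0 * x[0] + 2.0 * x[1] + x[0] * x[1] - x[2]

def value(coalition, x, reference):
    """Prediction with features outside the coalition set to reference values."""
    z = [x[j] if j in coalition else reference[j] for j in range(len(x))]
    return model(z)

def shapley_values(x, reference):
    """Exact Shapley values by enumerating every coalition of the other features."""
    p = len(x)
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                with_j = value(set(S) | {j}, x, reference)
                without_j = value(set(S), x, reference)
                total += weight * (with_j - without_j)   # marginal contribution
        phi.append(total)
    return phi

x = [1.0, 2.0, 3.0]              # instance being explained
reference = [0.0, 0.0, 0.0]      # assumed reference values for absent features
phi = shapley_values(x, reference)

print("Shapley values:", phi)
print("sum:", sum(phi), "| f(x) - f(reference):", model(x) - model(reference))
```

In this toy model the interaction term `x0 * x1` is split equally between features 0 and 1, and the values sum to the prediction minus the reference prediction.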

[See figure for visual for coalition](https://christophm.github.io/interpretable-ml-book/images/shapley-instance-intervention.jpg) Figure 17.2: One sample repetition to estimate the contribution of cat-banned to the prediction when added to the coalition of park-nearby and area-50.

If we estimate the Shapley values for all feature values, we get the complete distribution of the payout (prediction minus the average) among the feature values.

**Note**:  The sum of Shapley values yields the difference between the actual and the average prediction.

The results are often plotted as a bar graph, for bike rental prediction: to predict the number of rented bikes for a day, given weather and calendar information [Figure 17.6](https://christophm.github.io/interpretable-ml-book/shapley.html#fig-shapley-bike-plot)

---
**Shapley value Properties**  
The Shapley value is the only attribution method that satisfies the properties Efficiency, Symmetry, Dummy, and Additivity, which together can be considered a definition of a fair payout.
* **Efficiency** The feature contributions must add up to the difference of prediction for $x$ and the average.
* **Symmetry** The contributions of two feature values should be the same if they contribute equally to all possible coalitions.
* **Dummy** A feature that does not change the predicted value – regardless of which coalition of feature values it is added to – should have a Shapley value of 0
* **Additivity** The sum of the Shapley values for two combined games is the same as the sum of the individual games' Shapley values. This is useful in random forests (where the prediction is an average of many decision trees). The Additivity property guarantees that for a feature value, you can calculate the Shapley value for each tree individually, average them, and get the Shapley value for the feature value for the random forest.

---
**Methods to estimate Shapley values**  
To calculate the exact Shapley value, all possible coalitions (sets) of feature values have to be evaluated with and without the j-th feature. This becomes computationally challenging as the number of features increases, so a Monte Carlo approximation was proposed that randomly samples a limited (user-specified) number of coalitions.
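
One common form of this approximation samples random feature orderings and random background instances. The sketch below is a minimal version under stated assumptions: the `model`, the one-row `background` dataset, and the iteration count are hypothetical choices for illustration.

```python
import random

def model(x):
    """Hypothetical model with an interaction between features 0 and 1."""
    return 3.0 * x[0] + 2.0 * x[1] + x[0] * x[1] - x[2]

def shapley_monte_carlo(x, background, j, n_iter=2000, seed=0):
    """Estimate the Shapley value of feature j by sampling random orderings."""
    rng = random.Random(seed)
    p = len(x)
    total = 0.0
    for _ in range(n_iter):
        z = rng.choice(background)            # random instance supplying absent values
        order = list(range(p))
        rng.shuffle(order)                    # random feature ordering
        before = set(order[:order.index(j)])  # features preceding j form the coalition
        # Same coalition with and without feature j taken from x.
        x_plus = [x[k] if (k in before or k == j) else z[k] for k in range(p)]
        x_minus = [x[k] if k in before else z[k] for k in range(p)]
        total += model(x_plus) - model(x_minus)
    return total / n_iter

x = [1.0, 2.0, 3.0]
background = [[0.0, 0.0, 0.0]]                # assumed background data (a single row)
estimate = shapley_monte_carlo(x, background, j=0)
print("estimated Shapley value of feature 0:", estimate)
```

With this toy setup the exact value for feature 0 is 4, so the estimate should land close to it; increasing `n_iter` reduces the sampling variance at the cost of more model calls.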

---
Strengths:
* The difference between the prediction and the average prediction is "fairly distributed" among the feature values of the instance; this arises from the Efficiency property of Shapley values that ensures the sum of the individual values for all players equals the total value of the cooperative game (i.e., here the Shapley values will sum up to the total outcome). It thus provides a full explanation of all features.
* Allows for contrastive explanations: instead of comparing a prediction to the average prediction of the entire dataset (as discussed above), we could compare it to the average of a subset of the data or to a specific data point; this type of contrastiveness is lacking in LIME and Anchor
* Grounded in solid theory; see the reference source for details: https://christophm.github.io/interpretable-ml-book/shapley.html (LIME assumes linearity in a local neighborhood but offers no theory on why that would work)

Limitations:
* Returns a simple value per feature, but no prediction model like LIME or Anchor
* Computationally expensive: on the order of $2^k$ coalition evaluations for $k$ features, for each feature and each comparative analysis
* Shapley value explanations are not to be interpreted as local in the sense of gradients or neighborhood. (Bilodeau et al. 2024) For example, a positive Shapley value doesn’t mean that increasing the feature would increase the prediction. Instead, the Shapley value has to be interpreted with respect to the reference dataset that was used for the estimation.
* Like many other permutation-based interpretation methods, the Shapley value method suffers from inclusion of unrealistic data instances when features are correlated.

---
---


**SHAP v. Shapley values**

Source: [Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, by Christoph Molnar](https://christophm.github.io/interpretable-ml-book/shap.html)

*"SHAP (SHapley Additive exPlanations) by Lundberg and Lee (2017) is a method to explain individual predictions. SHAP is based on the game-theoretically optimal Shapley values. I recommend reading the chapter on Shapley values first.*

*To understand why SHAP is a thing and not just an extension of the Shapley values chapter, a bit of history: In 1953, Lloyd Shapley introduced the concept of Shapley values for game theory. Shapley values for explaining machine learning predictions were suggested for the first time by Štrumbelj and Kononenko (2011) and Štrumbelj and Kononenko (2014). However, they didn’t become so popular. A few years later, Lundberg and Lee (2017) proposed SHAP, which was basically a new way to estimate Shapley values for interpreting machine learning predictions, along with a theory connecting Shapley values with LIME and other post-hoc attribution methods, and a bit of additional theory on Shapley values.*

*You might say that SHAP is just a rebranding of Shapley values (which is true), but that would miss the fact that SHAP also marks a change in popularity and usage of Shapley values, and introduced new ways of estimating Shapley values and aggregating them in new ways. Also, SHAP brought Shapley values to text and image models."*

---

**[SHAP](https://github.com/shap/shap#deep-learning-example-with-gradientexplainer-tensorflowkeraspytorch-models) (SHapley Additive exPlanations)**

SHAP assigns each feature an importance value for a particular prediction

[SHAP_documentation](https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html)

SHAP is represented as an additive feature attribution method (a linear model), connecting LIME and Shapley values. Specifically, SHAP defines an explanation model as a linear function of binary variables representing the presence (1) or absence (0) of features. The model is defined as:
$$g(z^{\prime })=\phi _{0}+\sum _{j=1}^{M}\phi _{j}z_{j}^{\prime }$$
where:
* $g$ is the explanation model.
* $z^{\prime }=(z_1',z_2',...., z_M')^T \in \{0,1\}^{M}$ is the coalition vector.
* $M$ is the number of features, the maximum coalition size.
* $\phi _{j}\in \mathbb{R}$ is the Shapley value (SHAP value) for feature $j$.
* $\phi _{0}=\mathbb{E}[f(X)]$ is the base value, or the average model output over the training dataset.

The above equation represents the idea that the model's output can be expressed as the sum of a baseline value and the contribution of each feature.

**Note**: The features of the explanation model can differ from the features of the prediction model, e.g., for image data, the images are not represented at the pixel level, but aggregated into superpixels.

**Note**: The properties of Shapley values apply here too:
* Local accuracy or efficiency: For a specific input instance $x$, the SHAP values add up to the difference between the model's prediction for that instance, $f(x)$, and the base value, $\phi _{0}$; i.e., by setting the coalition vector to a vector of 1's we get
$$\sum _{j=1}^{M}\phi _{j}=f(x)-\phi _{0}$$
* Satisfies the properties of symmetry, dummy, and additivity.

---
**SHAP estimation methods**  
Calculating the exact Shapley value is computationally intensive, as it requires evaluating $2^{M}$ coalitions for each instance. KernelSHAP, TreeSHAP, and permutation methods are used instead. KernelSHAP helps explain how SHAP connects Shapley values with LIME, but it is not computationally efficient and is not commonly used. The permutation method is more common. TreeSHAP is specific to tree-based ML models, e.g., decision trees, random forests, and gradient-boosted trees. We will skip TreeSHAP.

---
**KernelSHAP**:   
Works by creating a simplified linear regression model to explain the model's predictions. It samples a subset of coalitions to train the linear model instead of evaluating all possible coalitions. Note that this is similar to the LIME concept, except that it uses a SHAP kernel for the proximity measure.

**Steps of KernelSHAP:**
* Sample coalition vectors $z_k'\in \{0,1\}^M$, $k\in\{1,\ldots, K\}$ (1 = feature present in coalition, 0 = feature absent). $M$ is the number of features. $K$ is the number of coalitions to sample.
* Get the prediction for each $z_k'$ by first converting it to the **original feature space**, $h_x(z_k')$, and then applying the prediction model $\hat{f}$: $\hat{f}(h_x(z_k'))$.
* Compute the weight for each coalition with the **SHAP kernel weight function**.
* Find the **Shapley values** $\phi_j$ by solving for the coefficients of a weighted linear regression model that minimizes the **loss function**.

**SHAP kernel's weight function:**
$$\pi _{x^{\prime }}(z^{\prime })=\frac{(M-1)}{\binom{M}{|z^{\prime }|}\,|z^{\prime }|\,(M-|z^{\prime }|)}$$
* $\pi _{x^{\prime }}(z^{\prime })$ is the weight for the coalition vector $z^{\prime }$.
* $M$  is the number of features.
* $|z^{\prime }|$ is the number of non-zero elements in $z^{\prime }$ (the size of the coalition).

*SHAP kernel weights the sampled coalitions to give more importance to coalitions with very few or almost all features*
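
The shape of the kernel is easy to see by tabulating it for a small $M$. In the sketch below the function name `shap_kernel_weight` and the choice `M = 6` are illustrative assumptions; the empty and full coalitions are returned as infinite because the formula divides by zero there.

```python
from math import comb

def shap_kernel_weight(M, size):
    """SHAP kernel weight for a coalition with `size` present features out of M."""
    if size == 0 or size == M:
        return float("inf")      # the formula divides by zero at these two sizes
    return (M - 1) / (comb(M, size) * size * (M - size))

M = 6                            # hypothetical number of features
weights = {size: shap_kernel_weight(M, size) for size in range(1, M)}

for size, w in weights.items():
    print(f"|z'| = {size}: weight = {w:.4f}")
```

The weights are symmetric in coalition size and smallest near $|z'| = M/2$, matching the statement that very small and very large coalitions count most.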

**Loss function:**

$$Loss(f,g,\pi _{x^{\prime }})=\sum _{z^{\prime }\in Z}[f(h_{x}(z^{\prime }))-g(z^{\prime })]^{2}\pi _{x^{\prime }}(z^{\prime })$$
* $Z$ is the training data.
* $g$ is the weighted linear regression model $$g(z^{\prime })=\phi _{0}+\sum _{j=1}^{M}\phi _{j}z_{j}^{\prime }$$
* $h_x$ for tabular data: maps a coalition to a valid instance; for present features (1), it maps to the feature values of $x$. For absent features (0), it maps to the values of a randomly sampled data instance. [See Figure 18.1](https://christophm.github.io/interpretable-ml-book/shap.html#fig-shap-simplified-feature)
*  $h_x$ for image data: maps coalitions of superpixels (sp) to images. Superpixels are groups of pixels. For present features (1), $h_x$ returns the corresponding part of the original image. For absent features (0), $h_x$ greys out the corresponding area.  [See Figure 18.2](https://christophm.github.io/interpretable-ml-book/shap.html#fig-shap-images). Assigning the average color of surrounding pixels or similar would also be an option.
* The resulting coefficients $\phi _{j}$ of this linear model are the approximate SHAP values

**Note**: The big difference between SHAP and LIME is the weighting of the instances in the regression model. LIME weights the instances according to how close they are to the original instance. SHAP weights the sampled instances according to the weight the coalition would get in the Shapley value estimation. Small coalitions (few 1’s) and large coalitions (i.e. many 1’s) get the largest weights. The intuition behind it is: We learn most about individual features if we can study their effects in isolation. If a coalition consists of a single feature, we can learn about this feature’s isolated main effect on the prediction. If a coalition consists of all but one feature, we can learn about this feature’s total effect (main effect plus feature interactions). If a coalition consists of half the features, we learn little about an individual feature’s contribution, as there are many possible coalitions with half of the features.
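
The KernelSHAP steps can be sketched end to end for a tiny problem. This is not the `shap` library's implementation. It makes several simplifying assumptions: all $2^M$ coalitions are enumerated instead of sampled, absent features are mapped to a single fixed `reference` vector, and the infinite weights of the empty and full coalitions are approximated by a large finite `big_weight`. The `model` is hypothetical.

```python
import numpy as np
from itertools import product
from math import comb

def model(X):
    """Hypothetical model with an interaction between features 0 and 1."""
    return 3.0 * X[:, 0] + 2.0 * X[:, 1] + X[:, 0] * X[:, 1] - X[:, 2]

def kernel_shap(predict_fn, x, reference, big_weight=1e6):
    M = x.size
    # Step 1: coalition vectors z' (here every vector in {0,1}^M).
    Zp = np.array(list(product([0, 1], repeat=M)), dtype=float)
    # Step 2: h_x(z') maps present features to x and absent ones to the reference.
    X_mapped = np.where(Zp == 1, x, reference)
    y = predict_fn(X_mapped)
    # Step 3: SHAP kernel weights; empty/full coalitions get a large finite weight.
    sizes = Zp.sum(axis=1).astype(int)
    pi = np.array([
        big_weight if s in (0, M)
        else (M - 1) / (comb(M, int(s)) * int(s) * (M - int(s)))
        for s in sizes
    ])
    # Step 4: weighted linear regression g(z') = phi_0 + sum_j phi_j z'_j.
    design = np.column_stack([np.ones(len(Zp)), Zp])
    sw = np.sqrt(pi)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]

x = np.array([1.0, 2.0, 3.0])
reference = np.zeros(3)          # assumed reference values for absent features
phi0, phi = kernel_shap(model, x, reference)

print("phi_0:", phi0)
print("phi:  ", phi)
print("phi_0 + sum(phi):", phi0 + phi.sum(), "| f(x):", model(x[None, :])[0])
```

For this toy model the regression coefficients come out close to `(4, 5, -3)`, the exact Shapley values under the same reference, and `phi_0 + sum(phi)` reproduces `f(x)` up to the small error introduced by the finite `big_weight`.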


SHAP computational tools
* SHAP web-browser-based visualization tool: [SHAP tool - Poloclub](https://poloclub.github.io/webshap/?model=image)

