## SHAP Values

### A brief introduction to Shapley values

SHAP values are generated through the [SHAP library](https://shap.readthedocs.io/en/latest/index.html) and are approximations of [Shapley values](https://en.wikipedia.org/wiki/Shapley_value), a concept derived from game theory. A very abbreviated explanation of how these values are generated: for every model decision passed to the explainer, the explainer considers each feature in turn and estimates how the decision changes when that feature is removed. For a more in-depth explanation, consider this [summary article](https://towardsdatascience.com/understanding-how-ime-shapley-values-explains-predictions-d75c0fceca5a).
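
To make the "remove a feature and see what changes" idea concrete, here is a minimal sketch that computes exact Shapley values for a toy three-feature model by enumerating every feature subset. The model `toy_model`, the instance `x`, and the all-zero `baseline` that stands in for a "removed" feature are illustrative assumptions. They are not part of rsmexplain or the SHAP library, which approximate this computation instead of enumerating it.

```python
from itertools import combinations
from math import factorial


def toy_model(features):
    """Hypothetical model: two linear terms plus one interaction term."""
    a, b, c = features
    return 2.0 * a + 1.0 * b + 3.0 * a * c


def exact_shapley(model, x, baseline):
    """Exact Shapley values; a 'removed' feature is replaced by its baseline value."""
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` keep their real values
        masked = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(masked)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                # Marginal contribution of feature i to this coalition
                total += weight * (value(present | {i}) - value(present))
        phi.append(total)
    return phi


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi)
```

The interaction term is split evenly between the two features involved, and the three values add up to the difference between the model output for `x` and for `baseline`. Enumeration needs a number of model evaluations that grows exponentially with the number of features, which is why practical explainers rely on approximations.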

Rsmexplain by default uses the [Sampling](https://shap.readthedocs.io/en/latest/generated/shap.explainers.Sampling.html#shap.explainers.Sampling) explainer, which computes shap values using the method described in [this paper](https://link.springer.com/article/10.1007/s10115-013-0679-x).

The sampling explainer is model agnostic, meaning it should in principle work for any type of model. Rsmexplain currently only supports regressors. 
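
The sampling idea can be sketched without the SHAP library. The function below is a simplified Monte Carlo estimate in the spirit of the linked paper; it is not the code that `shap.explainers.Sampling` runs. The names `sampling_shap`, `predict`, `background` and `n_samples` are chosen for illustration only. Because the function only ever calls `predict`, it shows why such an approach is model agnostic.

```python
import numpy as np


def sampling_shap(predict, x, background, n_samples=2000, seed=0):
    """Monte Carlo estimate of the Shapley values for a single instance x."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    phi = np.zeros(n_features)
    for i in range(n_features):
        with_i = np.empty((n_samples, n_features))
        without_i = np.empty((n_samples, n_features))
        for s in range(n_samples):
            # Random background row supplies values for "removed" features
            z = background[rng.integers(len(background))]
            # Random feature order; features before i take their values from x
            order = rng.permutation(n_features)
            pos = int(np.flatnonzero(order == i)[0])
            keep = order[:pos]
            row = z.copy()
            row[keep] = x[keep]
            without_i[s] = row  # feature i still comes from the background
            row[i] = x[i]
            with_i[s] = row     # identical row, but feature i comes from x
        # Average change in the prediction caused by switching feature i on
        phi[i] = np.mean(predict(with_i) - predict(without_i))
    return phi


# Hypothetical linear regressor used only to exercise the sketch
weights = np.array([1.0, -2.0, 0.5])


def predict(X):
    return X @ weights


background = np.random.default_rng(1).normal(size=(50, 3))
x = np.array([1.0, 2.0, 3.0])
phi = sampling_shap(predict, x, background)
```

For a linear model like this one, the estimate for each feature should land close to its weight times the difference between the feature value and the background mean, which gives a simple sanity check.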


### How to read shap values

Shap values are additive representations of a feature's impact on a model decision. The sum of all shap values and the base value for a prediction should yield the actual model output.
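
As a small worked example of this additivity, the sketch below uses made-up numbers for a single prediction; none of these values come from rsmexplain.

```python
# Hypothetical numbers for one prediction (not produced by rsmexplain)
base_value = 3.2                      # expected model output over the background data
shap_values = [0.75, -0.4, 0.0, 0.15]  # one contribution per feature

# Additivity: base value plus all feature contributions gives the model output
reconstructed_output = base_value + sum(shap_values)
print(round(reconstructed_output, 2))
```

With a `shap.Explanation` object, the corresponding quantities are stored in its `values` and `base_values` attributes.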

A shap value for a feature can be considered that feature's contribution to the decision during that specific prediction. By averaging the absolute shap values of a feature over all predictions, we obtain that feature's average impact for the data that was passed to the explainer. These absolute mean shap values are saved in "/csv_files/mean_shap_values.csv".
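
The sketch below shows why the absolute value is taken before averaging. The matrix and the feature names are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

# Hypothetical shap matrix: rows are predictions, columns are features
feature_names = ["grammar", "length", "spelling"]  # illustrative names
shap_matrix = np.array([
    [0.5, -0.2, 0.0],
    [-0.7, 0.1, 0.0],
    [0.3, -0.3, 0.0],
])

# Plain mean: positive and negative contributions partly cancel out
plain_mean = shap_matrix.mean(axis=0)

# Absolute mean: measures how strongly a feature moves predictions, in either direction
abs_mean = np.abs(shap_matrix).mean(axis=0)

for name, plain, absolute in zip(feature_names, plain_mean, abs_mean):
    print(f"{name}: mean={plain:.3f}, abs. mean={absolute:.3f}")
```

Here the first feature has a plain mean close to 0 although it shifts every prediction noticeably, which is the information the absolute mean preserves.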



### Things to consider

Rsmexplain can only generate shap values for the data passed in the "explainable_data" and "range" parameters. If the dataset passed is small, then the values derived cannot be considered representative of the model as a whole. Plots that display mean values for your shap values should be taken with a grain of salt if your passed data was small, or not representative of the typical data the model deals with.

As long as a sufficiently large background set was passed, the individual values for predictions can be considered trustworthy.

If you wish to investigate your shap values by hand, please refer to files in "/csv_files/".

If you wish to use the generated shap Explanation object, you may unpickle "explanation.pkl". Your initial row ids are stored in "ids.pkl" in a dictionary format of \{array index: actual index\}.
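
A minimal sketch of reading these files follows. Since the location of your output directory is not known here, the example writes a stand-in "ids.pkl" with made-up ids to a temporary directory and reads it back; `load_pickle` is a hypothetical helper, not an rsmexplain function.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load a pickled object from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for rsmexplain output: a hypothetical ids.pkl in a temporary directory
output_dir = Path(tempfile.mkdtemp())
with open(output_dir / "ids.pkl", "wb") as handle:
    pickle.dump({0: "RESPONSE_17", 1: "RESPONSE_42"}, handle)

ids = load_pickle(output_dir / "ids.pkl")

# Row 1 of the explanation arrays corresponds to this original row id
original_id = ids[1]
print(original_id)
```

The same helper can be pointed at "explanation.pkl". Unpickling an Explanation object requires the `shap` package to be importable in your environment.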

### An overview of your shap values

This is a quick text overview of your shap values. Please refer to the Plots section for visualizations.



#### Absolute Mean Shap Values

The top 5 features in terms of absolute mean impact were:

In [None]:
display(mean_values[0:5])

The following features have an absolute mean shap value of 0:

In [None]:
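# Select the features whose absolute mean shap value is exactly 0.
# An empty selection does not raise an exception, so check for it explicitly.
value0 = mean_values.loc[mean_values['abs. mean shap value'] == 0]
if value0.empty:
    display(Markdown("No features with a mean value of 0 found."))
else:
    display(value0)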

If features appear in the above list with a mean shap value of 0, then those features did not contribute to the model decisions. If the data set passed was large and representative of the data the model usually encounters, then this may mean that those features are not useful for the model.

Before you draw conclusions, make sure that those features were not simply set to 0 in all data instances that were passed to the model, since that can produce the same effect.
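
One way to run this check is sketched below. The data frame `features` and its column names are hypothetical stand-ins for the data you passed to rsmexplain.

```python
import pandas as pd

# Hypothetical feature data passed to the explainer (illustrative names and values)
features = pd.DataFrame({
    "grammar": [0.2, 0.5, 0.9],
    "length": [0.0, 0.0, 0.0],
    "spelling": [1.0, 0.0, 0.3],
})

# Columns where every instance is 0; zero shap values here may be a data artifact
all_zero = [name for name in features.columns if (features[name] == 0).all()]
print(all_zero)
```

Any feature reported here should be excluded from conclusions about usefulness until it has been explained on data where it actually varies.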

#### Absolute Max Shap Values

Here are the top 5 features in terms of absolute maximal impact:

In [None]:
display(max_values[0:5])

Features that appear in the above list but not among the top 5 in terms of absolute mean impact have large shap values for a few predictions, but a lower average impact overall.
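
The difference between the two rankings can be illustrated with two made-up features: one with a steady contribution and one that is silent except for a single outlier. All numbers are hypothetical.

```python
import numpy as np

# Hypothetical shap values for 10 predictions
steady = np.full(10, 0.3)   # moderate contribution to every prediction
spiky = np.zeros(10)
spiky[4] = 2.0              # large contribution to a single prediction only

shap_matrix = np.column_stack([steady, spiky])

abs_mean = np.abs(shap_matrix).mean(axis=0)  # about 0.3 and 0.2
abs_max = np.abs(shap_matrix).max(axis=0)    # about 0.3 and 2.0

top_by_mean = int(np.argmax(abs_mean))  # index of the steady feature
top_by_max = int(np.argmax(abs_max))    # index of the spiky feature
print(top_by_mean, top_by_max)
```

The spiky feature ranks first by absolute maximum but second by absolute mean, which is the pattern described above.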