# [SHAP Values](https://towardsdatascience.com/explain-your-model-with-the-shap-values-bc36aac4de3d) - (SHapley Additive exPlanations)

Objective: model transparency and interpretability.

## What is it?
A feature's SHAP value is its Shapley value: the average of its marginal contributions to the prediction across all permutations (orderings) of the features.

The "error" is the difference between the actual value and the prediction.
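
The averaging over permutations can be made concrete with a brute-force sketch. The payoff table below is a made-up toy "model" over two features, and `shapley_values` is a hypothetical helper written for illustration; it is not how the `shap` library computes values, which uses much faster algorithms for tree models.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical payoffs: the toy model's output for each subset of features.
payoff = {
    frozenset(): 0.0,
    frozenset({"alcohol"}): 4.0,
    frozenset({"sulphates"}): 2.0,
    frozenset({"alcohol", "sulphates"}): 10.0,
}

phi = shapley_values(["alcohol", "sulphates"], payoff.__getitem__)
print(phi)  # {'alcohol': 6.0, 'sulphates': 4.0}
```

The two values sum to 10.0, the payoff of the full feature set minus the payoff of the empty set. This additivity is what the "Additive" in SHAP refers to.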

## Benefits
1. Global interpretability - SHAP values show how much each predictor contributes to the target, either positively or negatively.
2. Local interpretability - each observation gets its own set of SHAP values, which lets us pinpoint and contrast the impact of each factor on a single prediction.
3. SHAP values can be calculated for any tree-based model, while other methods use linear regression models as surrogate models.
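
The difference between the local and global views can be sketched with a small matrix of SHAP values. The numbers below are invented for illustration, not taken from the wine model. Averaging the absolute values per feature is one common way to get a global ranking, assumed here rather than prescribed by the source.

```python
import numpy as np

# Hypothetical SHAP values: one row per observation, one column per feature.
feature_names = ["alcohol", "sulphates", "pH"]
shap_matrix = np.array([
    [0.30, -0.10, 0.02],
    [-0.20, 0.05, -0.01],
    [0.40, 0.15, 0.03],
])

# Local view: the signed contributions for a single observation (row 0).
local = dict(zip(feature_names, shap_matrix[0].tolist()))

# Global view: mean absolute SHAP value per feature, largest first.
global_importance = np.abs(shap_matrix).mean(axis=0)
ranking = [feature_names[i] for i in np.argsort(-global_importance)]

print(local)
print(ranking)
```

The local view keeps the sign, so it shows whether a feature pushed one prediction up or down. The global view discards the sign and only ranks features by overall magnitude.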

#### Note: Model interpretability does NOT mean `causality`

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor

np.random.seed(0)  # Fix the global seed so the train/test split is reproducible


In [8]:
df = pd.read_csv('winequality-red.csv', sep=';') # Load the data
# The target variable is 'quality'.

In [10]:
Y = df['quality']
X = df[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
        'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
        'pH', 'sulphates', 'alcohol']]
# Split the data into train and test data:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6, random_state=0, n_estimators=10)
model.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=6,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=10,
                      n_jobs=None, oob_score=False, random_state=0, verbose=0,
                      warm_start=False)

In [11]:
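import shap  # Third-party package; install it first with `pip install shap`

# Compute one SHAP value per feature for every training observation:
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# Plot the features ranked by their overall impact on the predictions:
shap.summary_plot(shap_values, X_train, plot_type="bar")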

ModuleNotFoundError: No module named 'shap'
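
The error above means the `shap` package is not available in the environment the notebook kernel runs in. A minimal sketch for checking this before importing, using only the standard library; `is_installed` is a hypothetical helper name, not part of any package:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("shap"):
    print("shap is not installed; try `pip install shap` "
          "(or `%pip install shap` in a notebook cell)")
```

After installing, the kernel may need a restart before `import shap` succeeds.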