## Shapley Values to explain Machine Learning outcomes
## Illustration with the Adult Dataset

This demo is from the [`shap` package documentation](https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/model_agnostic/Census%20income%20classification%20with%20scikit-learn.html).



In [None]:
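import sklearn
import sklearn.model_selection  # provides train_test_split
import sklearn.neighbors  # provides KNeighborsClassifier
import numpy as np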

In [None]:
try:
    import shap
except ImportError:
    # %pip installs into the environment of the running kernel
    %pip install shap
    import shap

Load the Adult dataset, which is used to predict whether an adult earns more than $50K a year, based on 1994 US Census data.

In [None]:
X, y = shap.datasets.adult()
X["Occupation"] *= 1000  # to show the impact of feature scale on KNN predictions
X_display, y_display = shap.datasets.adult(display=True)
X_train, X_valid, y_train, y_valid = sklearn.model_selection.train_test_split(
    X, y, test_size=0.2, random_state=7
)
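The comment on `X["Occupation"] *= 1000` refers to the fact that KNN ranks neighbors by distance, so a feature on a much larger scale can dominate the choice of neighbors. The following is a minimal sketch with made-up two-feature points (not drawn from the Adult data), assuming Euclidean distance, which is the default metric of `KNeighborsClassifier`. The helper `nearest` is a hypothetical name used only for this illustration.

```python
import numpy as np

# Hypothetical points: column 0 plays the role of an ordinary feature,
# column 1 the role of the feature that gets inflated.
query = np.array([0.0, 0.0])
points = np.array([
    [1.0, 0.0],  # differs from the query in feature 0 only
    [0.0, 0.5],  # differs from the query in feature 1 only
])


def nearest(query, points):
    """Return the index of the point closest to query (Euclidean distance)."""
    distances = np.linalg.norm(points - query, axis=1)
    return int(np.argmin(distances))


before = nearest(query, points)

# Multiply feature 1 by 1000, as the notebook does with "Occupation".
scale = np.array([1.0, 1000.0])
after = nearest(query * scale, points * scale)

print(before, after)
```

Before scaling, the second point is the nearest neighbor. After scaling, its distance along feature 1 grows from 0.5 to 500, so the first point becomes the nearest one.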

Fit a k-nearest neighbors (KNN) classifier with scikit-learn's default settings

In [None]:

knn = sklearn.neighbors.KNeighborsClassifier()
knn.fit(X_train, y_train)

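The following cell wraps the model's predicted probability for the positive class in a function `f`, then explains the first 1,000 validation rows against a single background row made of the training medians. For every explained row, SHAP evaluates the model on many combinations of "present" and "absent" features, so the cell takes a couple of minutes to run.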

In [None]:
##  THIS CELL TAKES A LONG TIME TO RUN  
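def f(x):
    # Predicted probability of the positive class (income above $50K)
    return knn.predict_proba(x)[:, 1]


# Background data: a single row holding the median of each training feature
med = X_train.median().values.reshape((1, X_train.shape[1]))

# Let shap pick an explanation algorithm for this function and background
explainer = shap.Explainer(f, med)
# Explain the first 1,000 validation rows
shap_values = explainer(X_valid.iloc[0:1000, :])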
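To make the idea of iterating over feature combinations concrete, here is a brute-force sketch of exact Shapley values for a tiny hypothetical model. It is not how `shap.Explainer` is implemented (the library chooses its own algorithm). The names `exact_shapley` and `toy_model` are assumptions introduced for illustration. As in the cell above, a feature that is "absent" from a coalition is replaced by its value in a single baseline row.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    features = range(n)

    def value(coalition):
        # Features in the coalition come from x, the rest from the baseline.
        masked = [x[i] if i in coalition else baseline[i] for i in features]
        return f(masked)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model with an interaction between features 0 and 1.
def toy_model(v):
    return 2.0 * v[0] + v[0] * v[1] + 3.0 * v[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi)
```

The contributions come out to approximately 3, 1 and 9: the interaction term is split equally between features 0 and 1, and the values sum to `toy_model(x) - toy_model(baseline)`. The number of model calls grows as 2^n in the number of features, which is why practical explainers rely on sampling or other shortcuts for wider tables.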

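Now you can visualize the impact of each feature: the waterfall plot breaks down a single prediction, and the beeswarm plot summarizes the SHAP values across all explained rows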

In [None]:
shap.plots.waterfall(shap_values[0])
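A waterfall plot relies on the additive property of SHAP values: the base value plus the per-feature contributions equals the model output for that row. The sketch below reproduces that arithmetic with made-up numbers (they are hypothetical and not taken from the KNN model above). For a real explanation, the corresponding quantities are available as `shap_values[0].base_values` and `shap_values[0].values`.

```python
import numpy as np

# Hypothetical numbers for one prediction, for illustration only.
base_value = 0.20                               # output at the background row
contributions = np.array([0.15, -0.05, 0.30])   # one SHAP value per feature
feature_names = ["feature_a", "feature_b", "feature_c"]

# Running total after adding each contribution in turn.
running = base_value + np.cumsum(contributions)
prediction = running[-1]

for name, phi, total in zip(feature_names, contributions, running):
    print(f"{name}: {phi:+.2f} -> {total:.2f}")
```

Each printed line corresponds to one bar of a waterfall plot: a step up or down from the previous total, ending at the prediction for that row.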


In [None]:
shap.plots.beeswarm(shap_values)


And finally, you can see how the impact of an individual feature varies along the range of values for that feature

In [None]:
shap.plots.scatter(shap_values[:, "Education-Num"])


In [None]:
shap.plots.heatmap(shap_values)
