Lambda School Data Science

*Unit 2, Sprint 3, Module 4*

---


# Model Interpretation 2

You will use your portfolio project dataset for all assignments this sprint.

## Assignment

Complete these tasks for your project, and document your work.

- [ ] Continue to iterate on your project: data cleaning, exploratory visualization, feature engineering, modeling.
- [ ] Make a Shapley force plot to explain at least 1 individual prediction.
- [ ] Share at least 1 visualization (of any type) on Slack.

If you aren't ready to make a Shapley force plot with your own dataset today, that's okay. You can practice this objective with any dataset you've worked with previously.
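Whichever dataset you use, it helps to remember what a force plot draws: a base value plus one signed contribution per feature, which together add up to the model's prediction for that row. The sketch below computes exact Shapley values for a tiny, hypothetical price function by averaging each feature's marginal contribution over every feature ordering. The function `toy_price`, the feature names, and the single `baseline` row are assumptions made for illustration. The SHAP library instead averages over a background dataset and uses much faster algorithms, so treat this as a way to build intuition rather than as what `shap` does internally.

```python
from itertools import permutations


def exact_shapley(predict, row, baseline):
    """Average each feature's marginal contribution over all feature orderings.

    Features not yet "switched on" in an ordering keep their baseline value.
    """
    names = list(row)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in ordering:
            current[name] = row[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical toy model with an interaction term (not the project model).
def toy_price(card):
    return (0.5 + 2.0 * card['is_rare'] + 0.1 * card['cmc']
            + 0.3 * card['is_rare'] * card['cmc'])


baseline = {'is_rare': 0, 'cmc': 2}   # reference row (assumed)
card = {'is_rare': 1, 'cmc': 4}       # row being explained (assumed)

contributions = exact_shapley(toy_price, card, baseline)
base_value = toy_price(baseline)
prediction = toy_price(card)

# The contributions account for the whole gap between base value and prediction.
print(contributions)
print(base_value + sum(contributions.values()), prediction)
```

Enumerating every ordering grows factorially with the number of features, which is why this approach is only practical for toy examples.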

## Stretch Goals
- [ ] Make Shapley force plots to explain at least 4 individual predictions.
    - If your project is binary classification, you can explain a true positive, a true negative, a false positive, and a false negative.
    - If your project is regression, you can explain a high prediction with low error, a low prediction with low error, a high prediction with high error, and a low prediction with high error.
- [ ] Use Shapley values to display verbal explanations of individual predictions.
- [ ] Use the SHAP library for other visualization types.
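For the verbal-explanation stretch goal, one possible approach is to rank a row's Shapley values by absolute size and turn the largest ones into sentences. The helper below is a minimal sketch in plain Python. The function name `explain_in_words`, its parameters, and the example numbers are all hypothetical; in practice the base value and per-feature values would come from a SHAP explainer fitted to your own model.

```python
def explain_in_words(base_value, shap_values, feature_values, top_n=3):
    """Build a short text explanation from one row's Shapley values.

    base_value: the model's average prediction (the force plot's base value).
    shap_values: dict mapping feature name to its Shapley value for this row.
    feature_values: dict mapping feature name to the row's raw value.
    """
    prediction = base_value + sum(shap_values.values())
    lines = [f'Predicted {prediction:.2f} (average prediction: {base_value:.2f}).']
    # Largest absolute contributions first.
    ranked = sorted(shap_values, key=lambda name: abs(shap_values[name]), reverse=True)
    for name in ranked[:top_n]:
        direction = 'raised' if shap_values[name] > 0 else 'lowered'
        lines.append(
            f'- {name} = {feature_values[name]} {direction} '
            f'the prediction by {abs(shap_values[name]):.2f}'
        )
    return '\n'.join(lines)


# Made-up numbers for illustration only.
text = explain_in_words(
    base_value=0.80,
    shap_values={'rarity': 0.45, 'cmc': -0.10, 'set': 0.05},
    feature_values={'rarity': 'rare', 'cmc': 4.0, 'set': 'war'},
)
print(text)
```

Sorting by absolute value keeps the strongest negative contributions in the explanation alongside the positive ones.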

The [SHAP repo](https://github.com/slundberg/shap) has examples for many visualization types, including:

- Force Plot, individual predictions
- Force Plot, multiple predictions
- Dependence Plot
- Summary Plot
- Summary Plot, Bar
- Interaction Values
- Decision Plots

We covered the first type (force plot, individual predictions) during the lesson. The [Kaggle microcourse](https://www.kaggle.com/dansbecker/advanced-uses-of-shap-values) shows two more. Experiment and see what you can learn!


## Links
- [Kaggle / Dan Becker: Machine Learning Explainability — SHAP Values](https://www.kaggle.com/learn/machine-learning-explainability)
- [Christoph Molnar: Interpretable Machine Learning — Shapley Values](https://christophm.github.io/interpretable-ml-book/shapley.html)
- [SHAP repo](https://github.com/slundberg/shap) & [docs](https://shap.readthedocs.io/en/latest/)

In [None]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Applied-Modeling/master/data/'
    !pip install category_encoders==2.*
    !pip install eli5
    !pip install pdpbox
    !pip install shap

# If you're working locally:
else:
    DATA_PATH = '../data/'

In [46]:
DATA_PATH = '../../file (2)'

In [40]:
import shap
import pdpbox
import eli5
import category_encoders as ce
import pandas as pd
import numpy as np

In [61]:
df = pd.read_csv(DATA_PATH)

In [76]:
print(df.shape)
df.head()

(1186, 20)


Unnamed: 0.1,Unnamed: 0,id,name,released_at,uri,scryfall_uri,image_uris,mana_cost,cmc,type_line,colors,color_identity,set,rarity,flavor_text,artist,power,toughness,loyalty,price
0,0,e0f83824-43c6-4101-88fd-9109958b23e2,Ravnica at War,2019-05-03,https://api.scryfall.com/cards/e0f83824-43c6-4...,https://scryfall.com/card/war/28/ravnica-at-wa...,{'small': 'https://img.scryfall.com/cards/smal...,{3}{W},4.0,Sorcery,['W'],['W'],war,rare,The heart of Ravnica disappeared before anyone...,Adam Paquette,,,,0.18
1,1,280f2a85-1900-460b-a768-164fc2dea636,Pteramander,2019-01-25,https://api.scryfall.com/cards/280f2a85-1900-4...,https://scryfall.com/card/rna/47/pteramander?u...,{'small': 'https://img.scryfall.com/cards/smal...,{U},1.0,Creature — Salamander Drake,['U'],['U'],rna,uncommon,,Simon Dominic,1.0,1.0,,0.44
2,2,aa686c34-1c11-469f-93c2-f9891aea521f,Veil of Summer,2019-07-12,https://api.scryfall.com/cards/aa686c34-1c11-4...,https://scryfall.com/card/m20/198/veil-of-summ...,{'small': 'https://img.scryfall.com/cards/smal...,{G},1.0,Instant,['G'],['G'],m20,uncommon,,Lake Hurwitz,,,,6.45
3,3,884c47fa-7060-48da-995c-e4037640a208,Keldon Raider,2019-07-12,https://api.scryfall.com/cards/884c47fa-7060-4...,https://scryfall.com/card/m20/146/keldon-raide...,{'small': 'https://img.scryfall.com/cards/smal...,{2}{R}{R},4.0,Creature — Human Warrior,['R'],['R'],m20,common,Keldon raiders' spoils are limited to what the...,Chris Seaman,4.0,3.0,,0.03
4,4,ae2998a1-1713-467e-a08e-0efd8720aa5b,"Yorvo, Lord of Garenbrig",2019-10-04,https://api.scryfall.com/cards/ae2998a1-1713-4...,https://scryfall.com/card/eld/185/yorvo-lord-o...,{'small': 'https://img.scryfall.com/cards/smal...,{G}{G}{G},3.0,Legendary Creature — Giant Noble,['G'],['G'],eld,rare,,Zack Stella,0.0,0.0,,0.44


In [82]:
target = 'price'
features = df.columns.drop(target)

# Keep only cards priced at 10.0 or less.
df = df.loc[df[target] <= 10.0]

# Columns excluded from modeling (identifiers, URLs, artist, loyalty).
unimportant_modeling_features = ['id', 'uri', 'scryfall_uri', 'image_uris', 'artist', 'loyalty']
features = features.drop(unimportant_modeling_features)

In [83]:
from sklearn.model_selection import train_test_split

# Two successive 80/20 splits give roughly 64% train, 16% validation, and 20% test.
train, test = train_test_split(df, train_size=0.80, test_size=0.20, random_state=0)
train, val = train_test_split(train, train_size=0.80, test_size=0.20, random_state=0)

In [84]:
X_train = train[features]
y_train = train[target]

X_val = val[features]
y_val = val[target]

X_test = test[features]
y_test = test[target]

In [85]:
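# Ordinal-encode the categorical columns. Fit on the training set only,
# then apply the same mapping to the validation and test sets.
encoder = ce.OrdinalEncoder()

X_train = encoder.fit_transform(X_train)
X_val = encoder.transform(X_val)
X_test = encoder.transform(X_test)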

In [86]:
from scipy.stats import randint, uniform
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV


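# Search space: integers in [50, 500) for n_estimators, a fixed list of
# depths, and a fraction between 0 and 1 of the features for max_features.
param_distributions = {
    'n_estimators': randint(50, 500),
    'max_depth': [5, 10, 15, 20, None],
    'max_features': uniform(0, 1),
}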

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42), 
    param_distributions=param_distributions, 
    n_iter=5, 
    cv=2, 
    scoring='neg_mean_absolute_error', 
    verbose=10, 
    return_train_score=True, 
    n_jobs=-1 
)

search.fit(X_train, y_train);

Fitting 2 folds for each of 5 candidates, totalling 10 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 out of  10 | elapsed:    6.4s remaining:    6.4s
[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    7.1s remaining:    3.0s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    7.4s finished


In [87]:
print('Best hyperparameters', search.best_params_)
print('Cross-validation MAE', -search.best_score_)
model = search.best_estimator_

Best hyperparameters {'max_depth': 5, 'max_features': 0.6550933015727188, 'n_estimators': 288}
Cross-validation MAE 0.7967997374309064
