Lambda School Data Science

*Unit 2, Sprint 3, Module 4*

---


# Model Interpretation 2

You will use your portfolio project dataset for all assignments this sprint.

## Assignment

Complete these tasks for your project, and document your work.

- [ ] Continue to iterate on your project: data cleaning, exploratory visualization, feature engineering, modeling.
- [ ] Make a Shapley force plot to explain at least 1 individual prediction.
- [ ] Share at least 1 visualization (of any type) on Slack.

If you aren't ready to make a Shapley force plot with your own dataset today, that's okay. You can practice this objective with any dataset you've worked with previously.

## Stretch Goals
- [ ] Make Shapley force plots to explain at least 4 individual predictions.
    - If your project is Binary Classification, you can do a True Positive, True Negative, False Positive, False Negative.
    - If your project is Regression, you can do a high prediction with low error, a low prediction with low error, a high prediction with high error, and a low prediction with high error.
- [ ] Use Shapley values to display verbal explanations of individual predictions.
- [ ] Use the SHAP library for other visualization types.
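
For the "verbal explanations" stretch goal, one possible approach is to rank a row's SHAP values by absolute size and turn the largest ones into sentences. The sketch below is only an illustration: the helper name `explain_in_words` and all the numbers are hypothetical, and it assumes you have already pulled one row's feature values and SHAP values into plain dicts.

```python
def explain_in_words(feature_values, shap_values, top_n=3):
    """Return one sentence per feature, strongest contributions first.

    feature_values: dict of feature name -> value for one row
    shap_values: dict of feature name -> SHAP value for that same row
    """
    # Rank features by the size of their contribution, ignoring the sign
    ranked = sorted(shap_values, key=lambda name: abs(shap_values[name]), reverse=True)
    sentences = []
    for name in ranked[:top_n]:
        direction = "raises" if shap_values[name] > 0 else "lowers"
        sentences.append(
            f"{name} = {feature_values[name]} {direction} the prediction "
            f"(SHAP {shap_values[name]:+.2f})"
        )
    return sentences


# Hypothetical values for illustration only, not taken from a fitted model
feature_values = {"drink_level": 2, "weight": 62, "hijos": 1}
shap_values = {"drink_level": 0.40, "weight": -0.75, "hijos": 0.05}

for sentence in explain_in_words(feature_values, shap_values, top_n=2):
    print(sentence)
```

Sorting by absolute value keeps strong negative contributions from being buried under weak positive ones.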

The [SHAP repo](https://github.com/slundberg/shap) has examples for many visualization types, including:

- Force Plot, individual predictions
- Force Plot, multiple predictions
- Dependence Plot
- Summary Plot
- Summary Plot, Bar
- Interaction Values
- Decision Plots

We just did the first type during the lesson. The [Kaggle microcourse](https://www.kaggle.com/dansbecker/advanced-uses-of-shap-values) shows two more. Experiment and see what you can learn!


## Links
- [Kaggle / Dan Becker: Machine Learning Explainability — SHAP Values](https://www.kaggle.com/learn/machine-learning-explainability)
- [Christoph Molnar: Interpretable Machine Learning — Shapley Values](https://christophm.github.io/interpretable-ml-book/shapley.html)
- [SHAP repo](https://github.com/slundberg/shap) & [docs](https://shap.readthedocs.io/en/latest/)

In [None]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Applied-Modeling/master/data/'
    !pip install category_encoders==2.*
    !pip install eli5
    !pip install pdpbox
    !pip install shap

# If you're working locally:
else:
    DATA_PATH = '../data/'

In [2]:
url = 'https://raw.githubusercontent.com/JeanFraga/DS-Unit-2-Applied-Modeling/master/data/Restaurant_Consumer_Data_merged'

import pandas as pd

df = pd.read_csv(url)
df.head()

Unnamed: 0,userID,placeID,rating,food_rating,service_rating,Ulatitude,Ulongitude,smoker,drink_level,dress_preference,...,Rpayment_Diners_Club,Rpayment_Discover,Rpayment_Japan_Credit_Bureau,Rpayment_MasterCard-Eurocard,Rpayment_VISA,Rpayment_Visa,Rpayment_bank_debit_cards,Rpayment_cash,Rpayment_checks,Rpayment_gift_certificates
0,U1077,135085,2,2,2,22.156469,-100.98554,False,2,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
1,U1077,135038,2,2,1,22.156469,-100.98554,False,2,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
2,U1077,132825,2,2,2,22.156469,-100.98554,False,2,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
3,U1077,135060,1,2,2,22.156469,-100.98554,False,2,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,U1077,135027,0,1,1,22.156469,-100.98554,False,2,0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0


In [3]:
from sklearn.model_selection import train_test_split

target1 = 'rating'
target2 = 'food_rating'
target3 = 'service_rating'

X = df.drop(columns=[target1, target2, target3])
y1 = df[target1]
y2 = df[target2]
y3 = df[target3]

# Split the features and all three targets in one call so the rows stay aligned
X_train, X_test, y1_train, y1_test, y2_train, y2_test, y3_train, y3_test = train_test_split(
    X, y1, y2, y3, test_size=0.2, random_state=7)
X_train.shape, X_test.shape, y1_train.shape, y1_test.shape, y2_train.shape, y2_test.shape, y3_train.shape, y3_test.shape

((928, 495), (233, 495), (928,), (233,), (928,), (233,), (928,), (233,))

In [4]:
print(y1_train.value_counts(normalize=True))
print(y2_train.value_counts(normalize=True))
print(y3_train.value_counts(normalize=True))

2    0.415948
1    0.358836
0    0.225216
Name: rating, dtype: float64
2    0.437500
1    0.329741
0    0.232759
Name: food_rating, dtype: float64
1    0.365302
2    0.358836
0    0.275862
Name: service_rating, dtype: float64


In [5]:
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

dummy = DummyClassifier(strategy='stratified', random_state=7)
# predict() takes the test features, not the test labels
y1_pred = dummy.fit(X_train, y1_train).predict(X_test)
print(accuracy_score(y1_test, y1_pred))
y2_pred = dummy.fit(X_train, y2_train).predict(X_test)
print(accuracy_score(y2_test, y2_pred))
y3_pred = dummy.fit(X_train, y3_train).predict(X_test)
print(accuracy_score(y3_test, y3_pred))

0.3218884120171674
0.40772532188841204
0.296137339055794
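
As a rough sanity check on these baseline scores: a stratified dummy guesses each class at its training frequency, so its expected accuracy is approximately the sum of the squared class proportions. This assumes the test set has the same class mix as the training set, which is only approximately true here. The sketch below uses the `rating` proportions printed earlier and compares them with the majority-class alternative.

```python
# Training-set class proportions for `rating`, copied from the value_counts output
proportions = {2: 0.415948, 1: 0.358836, 0: 0.225216}

# Stratified guessing is right for class c with probability p_c * p_c
# (assumes the test set has the same class mix as the training set)
expected_stratified = sum(p * p for p in proportions.values())

# Always predicting the most common class is right with probability max(p_c)
expected_majority = max(proportions.values())

print(f"stratified: {expected_stratified:.3f}")
print(f"majority:   {expected_majority:.3f}")
```

The stratified figure comes out near 0.35 and the majority figure near 0.42, so a single stratified run over 233 test rows can be expected to scatter around the lower value.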


In [6]:
from sklearn.metrics import classification_report
print(classification_report(y1_test, y1_pred))
print(classification_report(y2_test, y2_pred))
print(classification_report(y3_test, y3_pred))

              precision    recall  f1-score   support

           0       0.12      0.16      0.13        45
           1       0.35      0.32      0.33        88
           2       0.43      0.40      0.41       100

    accuracy                           0.32       233
   macro avg       0.30      0.29      0.29       233
weighted avg       0.34      0.32      0.33       233

              precision    recall  f1-score   support

           0       0.33      0.40      0.36        50
           1       0.32      0.33      0.33        73
           2       0.52      0.46      0.49       110

    accuracy                           0.41       233
   macro avg       0.39      0.40      0.39       233
weighted avg       0.42      0.41      0.41       233

              precision    recall  f1-score   support

           0       0.20      0.24      0.22        59
           1       0.35      0.33      0.34        87
           2       0.32      0.30      0.31        87

    accuracy        

In [9]:
import category_encoders as ce
#from sklearn.impute import SimpleImputer
#from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

encoder = ce.OrdinalEncoder()
X_train_encoded = encoder.fit_transform(X_train)
X_test_encoded = encoder.transform(X_test)

eval_set = [(X_train_encoded, y2_train), 
            (X_test_encoded, y2_test)]

model = XGBClassifier(
    n_estimators=1000, # <= 1000 trees, depends on early stopping
    max_depth=15,       # try deeper trees because of high cardinality categoricals
    learning_rate=.1, # try higher learning rate
    n_jobs=-1
)

model.fit(X_train_encoded, y2_train, eval_set=eval_set, 
          eval_metric=['merror'], early_stopping_rounds=100)

[0]	validation_0-merror:0.096983	validation_1-merror:0.407725
Multiple eval metrics have been passed: 'validation_1-merror' will be used for early stopping.

Will train until validation_1-merror hasn't improved in 100 rounds.
[1]	validation_0-merror:0.076509	validation_1-merror:0.446352
[2]	validation_0-merror:0.076509	validation_1-merror:0.44206
[3]	validation_0-merror:0.067888	validation_1-merror:0.446352
[4]	validation_0-merror:0.063578	validation_1-merror:0.424893
[5]	validation_0-merror:0.056034	validation_1-merror:0.44206
[6]	validation_0-merror:0.049569	validation_1-merror:0.433476
[7]	validation_0-merror:0.046336	validation_1-merror:0.416309
[8]	validation_0-merror:0.039871	validation_1-merror:0.420601
[9]	validation_0-merror:0.036638	validation_1-merror:0.424893
[10]	validation_0-merror:0.03125	validation_1-merror:0.424893
[11]	validation_0-merror:0.02694	validation_1-merror:0.429185
[12]	validation_0-merror:0.023707	validation_1-merror:0.424893
[13]	validation_0-merror:0.0226

[140]	validation_0-merror:0	validation_1-merror:0.390558
[141]	validation_0-merror:0	validation_1-merror:0.390558
[142]	validation_0-merror:0	validation_1-merror:0.390558
[143]	validation_0-merror:0	validation_1-merror:0.390558
[144]	validation_0-merror:0	validation_1-merror:0.386266
[145]	validation_0-merror:0	validation_1-merror:0.390558
[146]	validation_0-merror:0	validation_1-merror:0.386266
[147]	validation_0-merror:0	validation_1-merror:0.390558
[148]	validation_0-merror:0	validation_1-merror:0.390558
[149]	validation_0-merror:0	validation_1-merror:0.390558
[150]	validation_0-merror:0	validation_1-merror:0.390558
[151]	validation_0-merror:0	validation_1-merror:0.390558
[152]	validation_0-merror:0	validation_1-merror:0.39485
[153]	validation_0-merror:0	validation_1-merror:0.39485
[154]	validation_0-merror:0	validation_1-merror:0.39485
[155]	validation_0-merror:0	validation_1-merror:0.39485
[156]	validation_0-merror:0	validation_1-merror:0.39485
[157]	validation_0-merror:0	validati

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=15,
              min_child_weight=1, missing=None, n_estimators=1000, n_jobs=-1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [12]:
import eli5
from eli5.sklearn import PermutationImportance
transformers = make_pipeline(
    ce.OrdinalEncoder(), 
#    SimpleImputer(strategy='median')
)

permuter = PermutationImportance(
    model, 
    scoring='accuracy', 
    n_iter=5, 
    random_state=42
)

X_train_transformed = transformers.fit_transform(X_train)
X_test_transformed = transformers.transform(X_test)


# model = RandomForestClassifier(n_estimators=100, random_state=7, n_jobs=-1)
# model.fit(X_train_transformed, y1_train)

permuter.fit(X_test_transformed, y2_test)

feature_names = X_test.columns.tolist()
pd.Series(permuter.feature_importances_, feature_names).sort_values(ascending=False)

eli5.show_weights(
    permuter, 
    top=None, # show permutation importances for all features
    feature_names=feature_names
)

Weight,Feature
0.0464  ± 0.0268,latitude
0.0335  ± 0.0284,userID
0.0326  ± 0.0358,longitude
0.0318  ± 0.0150,drink_level
0.0318  ± 0.0246,color
0.0283  ± 0.0103,dress_preference
0.0283  ± 0.0221,interest
0.0283  ± 0.0088,personality
0.0258  ± 0.0224,Ulongitude
0.0258  ± 0.0196,Ulatitude
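
To make the weights above easier to read, here is a minimal pure-Python sketch of the idea behind permutation importance: shuffle one column, re-score the model, and record how much accuracy drops. It is not eli5's implementation. The toy model, data, and helper names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, column, n_iter=5, seed=42):
    """Mean drop in accuracy after shuffling one column, over n_iter shuffles."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_iter):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in place of the original
        permuted = [{**r, column: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_iter


# Toy stand-in for a fitted model: it only ever looks at `drink_level`
def toy_predict(row):
    return 1 if row["drink_level"] >= 1 else 0


rows = [{"drink_level": i % 3, "noise": (i * 7) % 5} for i in range(60)]
labels = [toy_predict(r) for r in rows]

used = permutation_importance(toy_predict, rows, labels, "drink_level")
ignored = permutation_importance(toy_predict, rows, labels, "noise")
print(used, ignored)
```

A column the model never reads gets an importance of exactly zero, while shuffling a column it relies on lowers the score.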


In [34]:
print('Shape before removing features:', X_train.shape)
minimum_importance = 0
# Keep only the features whose permutation importance is strictly positive
mask = permuter.feature_importances_ > minimum_importance
features2 = X_train.columns[mask]
X2_train = X_train_transformed[features2].copy()

print('Shape after removing features:', X2_train.shape)

Shape before removing features: (928, 495)
Shape after removing features: (928, 54)


In [38]:
X2_test = X_test_transformed[features2].copy()

# pipeline = make_pipeline(
#     ce.OrdinalEncoder(), 
#     #SimpleImputer(strategy='median'), 
#     RandomForestClassifier(n_estimators=100, random_state=7, n_jobs=-1)
# )
eval_set2 = [(X2_train, y2_train), 
            (X2_test, y2_test)]
# Fit on train, score on val
model.fit(X2_train, y2_train, eval_set=eval_set2, 
          eval_metric=['merror'], early_stopping_rounds=100)
#model.fit(X2_train, y2_train)
#print('Validation Accuracy', pipeline.score(X2_test_transformed, y2_test))

[0]	validation_0-merror:0.110991	validation_1-merror:0.420601
Multiple eval metrics have been passed: 'validation_1-merror' will be used for early stopping.

Will train until validation_1-merror hasn't improved in 100 rounds.
[1]	validation_0-merror:0.075431	validation_1-merror:0.44206
[2]	validation_0-merror:0.056034	validation_1-merror:0.437768
[3]	validation_0-merror:0.051724	validation_1-merror:0.424893
[4]	validation_0-merror:0.043103	validation_1-merror:0.412017
[5]	validation_0-merror:0.038793	validation_1-merror:0.412017
[6]	validation_0-merror:0.037716	validation_1-merror:0.412017
[7]	validation_0-merror:0.034483	validation_1-merror:0.403433
[8]	validation_0-merror:0.02694	validation_1-merror:0.390558
[9]	validation_0-merror:0.020474	validation_1-merror:0.39485
[10]	validation_0-merror:0.017241	validation_1-merror:0.386266
[11]	validation_0-merror:0.015086	validation_1-merror:0.386266
[12]	validation_0-merror:0.011853	validation_1-merror:0.386266
[13]	validation_0-merror:0.010

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=15,
              min_child_weight=1, missing=None, n_estimators=1000, n_jobs=-1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [40]:
permuter.fit(X2_test, y2_test)

feature_names = X2_test.columns.tolist()
pd.Series(permuter.feature_importances_, feature_names).sort_values(ascending=False)

eli5.show_weights(
    permuter, 
    top=None, # show permutation importances for all features
    feature_names=feature_names
)

Exception ignored in: <function Booster.__del__ at 0x000001E90E8D69D8>
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\xgboost\core.py", line 957, in __del__
    if self.handle is not None:
AttributeError: 'Booster' object has no attribute 'handle'


Weight,Feature
0.0609  ± 0.0064,latitude
0.0515  ± 0.0154,color
0.0489  ± 0.0150,drink_level
0.0455  ± 0.0088,Ulatitude
0.0343  ± 0.0163,birth_year
0.0326  ± 0.0228,Ulongitude
0.0292  ± 0.0226,personality
0.0283  ± 0.0193,interest
0.0283  ± 0.0207,weight
0.0258  ± 0.0292,longitude


In [44]:
model = XGBClassifier(n_estimators=1000, n_jobs=-1)
model.fit(X2_train, y2_train, eval_set=eval_set2, eval_metric='merror', 
          early_stopping_rounds=100)

[0]	validation_0-merror:0.44181	validation_1-merror:0.433476
Multiple eval metrics have been passed: 'validation_1-merror' will be used for early stopping.

Will train until validation_1-merror hasn't improved in 100 rounds.
[1]	validation_0-merror:0.4375	validation_1-merror:0.437768
[2]	validation_0-merror:0.435345	validation_1-merror:0.437768
[3]	validation_0-merror:0.427802	validation_1-merror:0.433476
[4]	validation_0-merror:0.421336	validation_1-merror:0.450644
[5]	validation_0-merror:0.407328	validation_1-merror:0.424893
[6]	validation_0-merror:0.380388	validation_1-merror:0.39485
[7]	validation_0-merror:0.380388	validation_1-merror:0.416309
[8]	validation_0-merror:0.378233	validation_1-merror:0.416309
[9]	validation_0-merror:0.377155	validation_1-merror:0.416309
[10]	validation_0-merror:0.363147	validation_1-merror:0.424893
[11]	validation_0-merror:0.357759	validation_1-merror:0.403433
[12]	validation_0-merror:0.352371	validation_1-merror:0.403433
[13]	validation_0-merror:0.3459

[128]	validation_0-merror:0.200431	validation_1-merror:0.407725
[129]	validation_0-merror:0.195043	validation_1-merror:0.407725
[130]	validation_0-merror:0.190733	validation_1-merror:0.407725
[131]	validation_0-merror:0.190733	validation_1-merror:0.407725
[132]	validation_0-merror:0.192888	validation_1-merror:0.403433
[133]	validation_0-merror:0.189655	validation_1-merror:0.407725
[134]	validation_0-merror:0.1875	validation_1-merror:0.403433
[135]	validation_0-merror:0.19181	validation_1-merror:0.403433
[136]	validation_0-merror:0.1875	validation_1-merror:0.403433
[137]	validation_0-merror:0.188578	validation_1-merror:0.403433
Stopping. Best iteration:
[37]	validation_0-merror:0.284483	validation_1-merror:0.377682



XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=1000, n_jobs=-1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

## To run a Shapley model I had to make my target binary

In [114]:
# Collapse the three food_rating classes into binary: 0 stays 0, ratings 1 and 2 become 1
y2_train_bool = (y2_train > 0).astype(int)
y2_test_bool = (y2_test > 0).astype(int)

In [120]:
eval_set3 = [(X2_train, y2_train_bool), 
            (X2_test, y2_test_bool)]

model = XGBClassifier(n_estimators=1000, n_jobs=-1)
model.fit(X2_train, y2_train_bool, eval_set=eval_set3, eval_metric='auc', 
          early_stopping_rounds=100)

[0]	validation_0-auc:0.677747	validation_1-auc:0.676612
Multiple eval metrics have been passed: 'validation_1-auc' will be used for early stopping.

Will train until validation_1-auc hasn't improved in 100 rounds.
[1]	validation_0-auc:0.677747	validation_1-auc:0.676612
[2]	validation_0-auc:0.753043	validation_1-auc:0.725355
[3]	validation_0-auc:0.780453	validation_1-auc:0.709126
[4]	validation_0-auc:0.781669	validation_1-auc:0.714372
[5]	validation_0-auc:0.787235	validation_1-auc:0.730601
[6]	validation_0-auc:0.787847	validation_1-auc:0.733388
[7]	validation_0-auc:0.797129	validation_1-auc:0.739454
[8]	validation_0-auc:0.797577	validation_1-auc:0.743169
[9]	validation_0-auc:0.814425	validation_1-auc:0.759399
[10]	validation_0-auc:0.815368	validation_1-auc:0.759618
[11]	validation_0-auc:0.819236	validation_1-auc:0.763771
[12]	validation_0-auc:0.822468	validation_1-auc:0.766831
[13]	validation_0-auc:0.827384	validation_1-auc:0.768087
[14]	validation_0-auc:0.837127	validation_1-auc:0.7647

[142]	validation_0-auc:0.96758	validation_1-auc:0.83082
[143]	validation_0-auc:0.96823	validation_1-auc:0.831475
[144]	validation_0-auc:0.968457	validation_1-auc:0.831913
[145]	validation_0-auc:0.968548	validation_1-auc:0.831475
[146]	validation_0-auc:0.968893	validation_1-auc:0.831803
[147]	validation_0-auc:0.969231	validation_1-auc:0.831366
[148]	validation_0-auc:0.969673	validation_1-auc:0.828415
[149]	validation_0-auc:0.970018	validation_1-auc:0.82918
[150]	validation_0-auc:0.970694	validation_1-auc:0.826995
[151]	validation_0-auc:0.970558	validation_1-auc:0.824918
[152]	validation_0-auc:0.970818	validation_1-auc:0.826011
[153]	validation_0-auc:0.970753	validation_1-auc:0.826011
[154]	validation_0-auc:0.970844	validation_1-auc:0.826885
[155]	validation_0-auc:0.97087	validation_1-auc:0.825792
[156]	validation_0-auc:0.97111	validation_1-auc:0.827432
[157]	validation_0-auc:0.971149	validation_1-auc:0.827104
[158]	validation_0-auc:0.971221	validation_1-auc:0.825902
[159]	validation_0-a

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=1000, n_jobs=-1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [123]:
from sklearn.metrics import roc_auc_score
class_index = 1
y_pred_proba = model.predict_proba(X2_test)[:, class_index]
print(f'Test ROC AUC for class {class_index}:')
print(roc_auc_score(y2_test_bool, y_pred_proba)) # Ranges from 0-1, higher is better

Test ROC AUC for class 1:
0.8442622950819672


In [124]:
permuter.fit(X2_test, y2_test_bool)

feature_names = X2_test.columns.tolist()
pd.Series(permuter.feature_importances_, feature_names).sort_values(ascending=False)

eli5.show_weights(
    permuter, 
    top=None, # show permutation importances for all features
    feature_names=feature_names
)

Weight,Feature
0.0506  ± 0.0422,weight
0.0361  ± 0.0139,drink_level
0.0335  ± 0.0522,userID
0.0275  ± 0.0088,interest
0.0240  ± 0.0177,longitude
0.0232  ± 0.0258,Ulatitude
0.0215  ± 0.0180,hijos
0.0155  ± 0.0419,latitude
0.0137  ± 0.0148,price
0.0103  ± 0.0177,Ulongitude


In [166]:
testing_if_works = df['placeID']

train_id, test_id = train_test_split(testing_if_works, test_size=0.2, random_state=7)

In [167]:
test_id

641     135055
669     132866
1093    132955
590     135025
109     134976
         ...  
507     135045
98      132723
63      132869
284     135055
938     135062
Name: placeID, Length: 233, dtype: int64

In [171]:
df2 = pd.DataFrame({
    #'user_id': userID,
    'placeID': test_id,
    'pred_proba': y_pred_proba, 
    'status_group': y2_test_bool
})

df2 = df2.merge(
     X2_test[['placeID', 'userID', 'weight', 'drink_level', 'interest', 'hijos', 'birth_year']], 
     how='left'
)
df2.head()

Unnamed: 0,placeID,pred_proba,status_group,userID,weight,drink_level,interest,hijos,birth_year
0,135055,0.94126,1,137,65,0,4,1,1986
1,135055,0.94126,1,114,50,2,2,1,1990
2,135055,0.94126,1,17,52,1,4,1,1991
3,132866,0.923983,1,59,68,2,3,1,1989
4,132866,0.923983,1,12,65,2,1,3,1990
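
One thing to watch in the merge above: with no `on` argument, `merge` joins on the columns the two frames share, which here appears to be only `placeID`. When a place shows up in more than one test row, every prediction for that place is paired with every user who rated it, so the result can have more rows than the test set. The toy sketch below (hypothetical data) contrasts that with joining on the shared row index.

```python
import pandas as pd

# Toy stand-in for the test features: place 135055 is rated by two different users
features = pd.DataFrame(
    {"placeID": [135055, 132866, 135055], "userID": [137, 59, 114]},
    index=[641, 669, 284],
)
preds = pd.DataFrame(
    {"placeID": features["placeID"], "pred_proba": [0.94, 0.92, 0.31]},
    index=features.index,
)

# Joining on placeID pairs every prediction with every user of that place
by_key = preds.merge(features, how="left")

# Joining on the shared row index keeps one row per test observation
by_index = preds.join(features.drop(columns="placeID"))

print(len(by_key), len(by_index))
```

The index-aligned version also preserves the original row labels, which makes it easier to look the same observation up again in `X_test`.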


In [173]:
rating_one = df2['status_group'] == 1
rating_zero = ~rating_one
right = (rating_one) == (df2['pred_proba'] > 0.50)
wrong = ~right
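
The same masks can be combined to pick one example from each confusion-matrix quadrant, as the stretch goal suggests. This is a small sketch on hypothetical data. The column names mirror `df2`, and the 0.50 threshold is the one used above.

```python
import pandas as pd

# Hypothetical predictions for illustration; column names mirror df2 above
toy = pd.DataFrame({
    "pred_proba":   [0.91, 0.12, 0.67, 0.35],
    "status_group": [1,    0,    0,    1],
})

actual_pos = toy["status_group"] == 1
predicted_pos = toy["pred_proba"] > 0.50

quadrants = {
    "true positive":  toy[actual_pos & predicted_pos],
    "true negative":  toy[~actual_pos & ~predicted_pos],
    "false positive": toy[~actual_pos & predicted_pos],
    "false negative": toy[actual_pos & ~predicted_pos],
}

for name, rows in quadrants.items():
    print(name, rows.index.tolist())
```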

In [175]:
df2[rating_one & right].sample(n=30, random_state=1).sort_values(by='pred_proba')

Unnamed: 0,placeID,pred_proba,status_group,userID,weight,drink_level,interest,hijos,birth_year
86,135071,0.509959,1,15,62,2,1,1,1991
87,135071,0.509959,1,29,87,2,3,1,1989
677,135042,0.598199,1,5,66,1,4,2,1988
164,135043,0.625148,1,27,74,0,4,1,1991
289,135032,0.655571,1,69,75,2,3,1,1991
577,135025,0.808054,1,91,65,1,3,1,1989
695,134986,0.829585,1,48,53,0,3,1,1992
133,135038,0.860538,1,20,55,1,1,3,1991
647,135064,0.870104,1,137,65,0,4,1,1986
278,135041,0.879547,1,26,49,2,4,1,1991


In [179]:
row = X_test[features2].iloc[[86]]
row.T

Unnamed: 0,952
userID,U1090
placeID,135085
Ulatitude,22.1585
Ulongitude,-100.984
smoker,False
drink_level,2
dress_preference,3
ambience,1
marital_status,1
hijos,1


In [None]:
import shap

explainer = shap.TreeExplainer(model)
# `transformers` was fit on every column, so reuse the already-encoded row from X2_test
row_processed = X2_test.iloc[[86]]
shap_values = explainer.shap_values(row_processed)

shap.initjs()
shap.force_plot(
    base_value=explainer.expected_value, 
    shap_values=shap_values, 
    features=row
)
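
When reading the force plot, it helps to know that for an XGBoost binary classifier `TreeExplainer` explains the raw margin by default, so the base value and SHAP values are in log-odds rather than probabilities. The base value plus the row's SHAP values adds up to the model's log-odds output for that row. The numbers below are hypothetical and only illustrate the arithmetic.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical numbers for illustration, not output from the model above
base_value = 0.8                      # average model output, in log-odds
row_shap_values = [0.9, -1.2, 0.3]    # one SHAP value per feature for one row

# SHAP values are additive: base value + contributions = model output for the row
log_odds = base_value + sum(row_shap_values)
probability = sigmoid(log_odds)
print(f"log-odds {log_odds:.2f} -> probability {probability:.3f}")
```

If you would rather see probabilities on the plot itself, `shap.force_plot` accepts `link='logit'`, which applies this same transformation to the axis.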

In [None]:
!pip install shap