# LIME with XGBoost

In this notebook, we will again use the Titanic dataset, but this time we will use the LIME package to explain the predictions of an XGBoost model. 

In [None]:
# Install the necessary libraries

# !pip install -q dalex xgboost lime

In [None]:
import dalex as dx
import xgboost
import lime

import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")

### Load and Preprocess Data

`lime.lime_tabular.LimeTabularExplainer` expects categorical variables to be integer-encoded and described through the following parameters:

- `categorical_features` – list of indices (ints) corresponding to the categorical columns. Everything else will be considered continuous. Values in these columns MUST be integers.

- `categorical_names` – map from int to list of names, where `categorical_names[x][y]` represents the name of the yth value of column x.
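
To make the two parameters concrete, here is a minimal sketch of how they could be built with pandas. The `toy` data frame and its values are hypothetical (it is not the Titanic dataset), and `pd.Categorical` is just one possible way to produce the integer codes.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the encoding
toy = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "gender": ["male", "female", "female", "male"],
    "embarked": ["Southampton", "Cherbourg", "Southampton", "Queenstown"],
})

categorical_columns = ["gender", "embarked"]
encoded = toy.copy()
categorical_names = {}

for column in categorical_columns:
    categories = pd.Categorical(toy[column])
    column_index = toy.columns.get_loc(column)
    # Replace the strings with integer codes (0, 1, 2, ...)
    encoded[column] = categories.codes
    # categorical_names[x][y] is the name of the y-th value of column x
    categorical_names[column_index] = list(categories.categories)

# Indices of the categorical columns; everything else is treated as continuous
categorical_features = sorted(categorical_names)

print(categorical_features)
print(categorical_names)
print(encoded)
```

`encoded.values`, `categorical_features` and `categorical_names` would then be the inputs passed to `LimeTabularExplainer`.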

XGBoost, on the other hand, expects categorical variables to be of the strict `category` type.

The challenge is to make the two work together. As a first approach, let's use one-hot encoding.

In [None]:
df = dx.datasets.load_titanic()

X = df.drop(columns='survived')
# dtype=int keeps the dummy columns numeric (recent pandas versions default to bool)
X = pd.get_dummies(X, columns=['gender', 'class', 'embarked'], drop_first=True, dtype=int)
y = df.survived

In [None]:
# Split the data
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Train the Model

In [None]:
model = xgboost.XGBClassifier(
    n_estimators=200,
    max_depth=4,
    use_label_encoder=False,
    eval_metric="logloss"
)
model.fit(X_train, y_train)

### Explain the Model with LIME & dalex

dalex wraps the original lime package and exposes LIME under a unified API.

dalex aims to improve the user's convenience by:

- combining `LimeTabularExplainer` and `explain_instance()` into a single `predict_surrogate()` method,
- automatically setting some of the lime parameters based on `explainer.data`, `explainer.model_type`, etc.


In [None]:
explainer = dx.Explainer(model, X_train, y_train, label='XGBoost')

In [None]:
explainer.model_performance(cutoff=y.mean())

In [None]:
observation = X.iloc[[0]]
explainer.predict(observation)

In [None]:
predict_fn = lambda x: model.predict_proba(x).astype(float)
explanation = explainer.predict_surrogate(observation, predict_fn=predict_fn)

In [None]:
explanation.result

In [None]:
explanation.plot()

Be careful! The LIME algorithm, like many other explanation methods, involves randomness, so repeated runs can produce different explanations.

In [None]:
import random
import matplotlib.pyplot as plt

for seed in range(4):
    random.seed(seed)
    np.random.seed(seed)
    exp = explainer.predict_surrogate(observation, predict_fn=predict_fn)
    exp.plot(return_figure=True)
    plt.title(f'Explanation for observation id0 assuming random seed is {seed}')
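
To see where this randomness comes from, the following is a simplified, NumPy-only sketch of a LIME-style local surrogate. It is not the lime package's actual implementation: `black_box`, `local_surrogate`, the Gaussian perturbations and the kernel are all assumptions chosen for illustration. The point is that the surrogate is fitted on randomly sampled perturbations, so its coefficients depend on the seed.

```python
import numpy as np

def black_box(X):
    # Hypothetical nonlinear model standing in for a predicted probability
    return 1.0 / (1.0 + np.exp(-(X[:, 0] * X[:, 1] - X[:, 2] ** 2)))

def local_surrogate(x, seed, n_samples=200, kernel_width=0.75):
    rng = np.random.default_rng(seed)
    # 1. Sample random perturbations around the observation
    Z = x + rng.normal(size=(n_samples, x.size))
    # 2. Query the black-box model on the perturbed points
    y = black_box(Z)
    # 3. Weight each perturbed point by its proximity to the observation
    distances = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # 4. Fit a weighted linear model (with intercept) by least squares
    A = np.column_stack([np.ones(n_samples), Z])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], y * sqrt_w, rcond=None)
    # Drop the intercept; the remaining coefficients are the "explanation"
    return coef[1:]

x = np.array([1.0, 2.0, 0.5])
for seed in range(3):
    print(seed, np.round(local_surrogate(x, seed), 3))
```

Fixing the seed makes the sketch reproducible, while changing it changes the coefficients. When using lime directly, `LimeTabularExplainer` also accepts a `random_state` argument for the same purpose.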

### Explain with LIME

In [None]:
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns,
    mode='classification',
)

In [None]:
lime_explanation = lime_explainer.explain_instance(
    # explain_instance expects a 1d NumPy array, not a pandas Series
    data_row=observation.iloc[0].to_numpy(),
    predict_fn=model.predict_proba
)

In [None]:
lime_explanation.as_list()

In [None]:
lime_explanation.as_pyplot_figure()

In [None]:
lime_explanation.show_in_notebook()