## Local Interpretable Model-agnostic Explanations

LIME (Local Interpretable Model-agnostic Explanations) is a model-agnostic technique for explaining the predictions of machine learning models. It generates local explanations for individual predictions. Below is an example of how to use LIME with a Gradient Boosting Classifier on the same "Diabetes" dataset, with the target binarized at its median.

In this code:

* We load the "Diabetes" dataset and train a Gradient Boosting Classifier, just like in the previous examples.

* We create a LIME explainer using LimeTabularExplainer. LIME requires the training data, and we set the mode to "classification" since binarizing the target turns the task into a classification problem.

* We choose a prediction to explain; in this case, the prediction for the test data point at index 1 (`sample_ind = 1`).

* We explain the prediction by calling `explain_instance` on the LIME explainer with the model's `predict_proba` as the prediction function, which generates an explanation.

* Finally, we visualize the explanation using the show_in_notebook method.

This will provide a local explanation for how the Gradient Boosting model arrived at its prediction for the selected data point. You can use LIME to explain predictions for other data points as well.
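To make the idea of a "local explanation" more concrete, here is a minimal NumPy sketch of the general recipe LIME follows: perturb the instance, query the model, weight the perturbed samples by proximity, and fit a weighted linear model. This is an illustration only, not the LIME library's actual implementation. The function names `black_box` and `local_surrogate`, the toy model, and the values chosen for `scale`, `kernel_width`, and `alpha` are all assumptions made for this example.

```python
import numpy as np


def black_box(X):
    """Hypothetical nonlinear model standing in for any trained predictor."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2


def local_surrogate(predict_fn, x, n_samples=5000, scale=0.5,
                    kernel_width=0.75, alpha=1.0, seed=0):
    """Fit a proximity-weighted ridge regression around the instance x."""
    rng = np.random.default_rng(seed)
    # 1. Perturb: draw Gaussian samples centred on the instance.
    Z = x + scale * rng.standard_normal((n_samples, x.size))
    # 2. Query the black-box model on the perturbed samples.
    y = predict_fn(Z)
    # 3. Weight each sample by its proximity to x (exponential kernel).
    d = np.linalg.norm(Z - x, axis=1)
    w = np.sqrt(np.exp(-(d ** 2) / kernel_width ** 2))
    # 4. Solve the weighted ridge problem in closed form.
    #    The first column is the intercept, which is left unpenalised.
    A = np.column_stack([np.ones(n_samples), Z - x])
    penalty = alpha * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    coef = np.linalg.solve(A.T @ (w[:, None] * A) + penalty, A.T @ (w * y))
    return coef[0], coef[1:]


x = np.array([0.0, 1.0])
intercept, slopes = local_surrogate(black_box, x)
print("local intercept:", intercept)
print("local slopes:", slopes)
```

The slopes play the role of the per-feature contributions that LIME reports: they describe how the model behaves near `x`, not globally. For this toy model they should land close to the gradient at `x`, which is roughly (1, 2).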

In [None]:
# First we install LIME
!pip install lime
# upgrading sklearn to the latest version
!pip install scikit-learn --upgrade

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from lime.lime_tabular import LimeTabularExplainer

In [None]:
random_state = 42

# Load the Diabetes dataset
diabetes_data = load_diabetes(scaled=False)
X = pd.DataFrame(diabetes_data.data, columns=diabetes_data.feature_names)
y = diabetes_data.target

# Set a threshold for binary classification (e.g., using the median of y)
threshold = np.median(y)
y_binary = (y > threshold).astype(int)  # 1 for high risk, 0 for low risk

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2, random_state=random_state)

In [None]:
# Create and fit the Gradient Boosting Classifier model
model = GradientBoostingClassifier(n_estimators=100, random_state=random_state)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate accuracy score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on Test Data:", accuracy)

In [None]:
# Create a LIME explainer with feature names.
# kernel_width: width of the exponential kernel. If None, defaults to sqrt(number of columns) * 0.75.
# kernel: function that transforms an array of distances into an array of proximity values (floats).
#   The similarity kernel takes Euclidean distances and the kernel width as input and outputs weights between 0 and 1.
#   If None, defaults to an exponential kernel: np.sqrt(np.exp(-(d ** 2) / kernel_width ** 2))
explainer = LimeTabularExplainer(X_train.values, mode="classification", training_labels=y_train, 
                                 feature_names=X.columns, categorical_features=[1],
                                 sample_around_instance=True, kernel_width=None, random_state=42)
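As a small illustration of the kernel comments in the cell above, the sketch below evaluates the default exponential kernel formula for a few distances. The helper names `default_kernel_width` and `exponential_kernel` and the example distances are assumptions made for this illustration; they are not part of the LIME API.

```python
import numpy as np


def default_kernel_width(n_features):
    """Default width quoted above: sqrt(number of columns) * 0.75."""
    return np.sqrt(n_features) * 0.75


def exponential_kernel(distances, kernel_width):
    """Default kernel quoted above: maps distances to proximity weights."""
    return np.sqrt(np.exp(-(distances ** 2) / kernel_width ** 2))


# The Diabetes dataset has 10 features.
width = default_kernel_width(10)

# Made-up distances between the explained instance and perturbed samples.
distances = np.array([0.0, 1.0, 2.0, 5.0])
weights = exponential_kernel(distances, width)

for d, w in zip(distances, weights):
    print(f"distance={d:.1f} -> weight={w:.3f}")
```

A perturbed sample at distance 0 gets the full weight of 1, and the weight decays smoothly as the sample moves away. A smaller `kernel_width` makes the explanation more local, since distant samples then contribute less to the surrogate fit.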

In [None]:
# Explain the prediction
sample_ind = 1
# model_regressor: sklearn regressor to be used in the explanation. Defaults to Ridge regression in LimeBase.
#   Must have model_regressor.coef_ and accept 'sample_weight' as a parameter to model_regressor.fit().
# sampling_method: method to sample synthetic data. Defaults to Gaussian sampling. Can also use Latin Hypercube Sampling.
explanation = explainer.explain_instance(data_row=X_test.iloc[sample_ind].values, predict_fn=model.predict_proba)

# Visualize the explanation
explanation.show_in_notebook()


**Exercise 3.1:** Have a look at the documentation of LimeTabularExplainer (at https://lime-ml.readthedocs.io/en/latest/lime.html#lime.lime_tabular.LimeTabularExplainer) and explain_instance (at https://lime-ml.readthedocs.io/en/latest/lime.html#lime.lime_tabular.LimeTabularExplainer.explain_instance), and study their different options for input arguments.

**Exercise 3.2:** Try the LIME explainer with `sample_around_instance=False` and with `sample_around_instance=True` for a few (5) samples. Do the results differ? What are the pros and cons of each option?