# Interpretable Machine Learning
## Exercise Sheet: 7
## This exercise sheet covers chapters 9.2 and 9.3 from the IML book by Christoph Molnar
Kristin Blesch (blesch@leibniz-bips.de)<br>
Niklas Koenen (koenen@leibniz-bips.de)
<hr style="border:1.5px solid gray"> </hr>

# 1) LIME 

**a)** Please describe the general idea behind LIME and how the procedure works. Make sure to explain why this method is considered local and what a surrogate model is.

**Solution:**

Relevant passages of text from 9.2:
"LIME focuses on training local surrogate models to explain individual predictions."
"Local surrogate models are interpretable models that are used to explain individual predictions of black box machine learning models."
"LIME tests what happens to the predictions when you give variations of your data into the machine learning model. LIME generates a new dataset consisting of perturbed samples and the corresponding predictions of the black box model. On this new dataset LIME then trains an interpretable model, which is weighted by the proximity of the sampled instances to the instance of interest."
"The recipe for training local surrogate models:
- Select your instance of interest for which you want to have an explanation of its black box prediction.
- Perturb your dataset and get the black box predictions for these new points.
- Weight the new samples according to their proximity to the instance of interest.
- Train a weighted, interpretable model on the dataset with the variations.
- Explain the prediction by interpreting the local model."
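
The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used by the `lime` package: `black_box` is a hypothetical toy model, the perturbations are Gaussian noise around the instance, the proximity weights come from an exponential kernel chosen for illustration, and the local model is a weighted ridge regression solved in closed form.

```python
import numpy as np

def black_box(X):
    # Hypothetical black box: nonlinear in feature 0, linear in feature 1
    return X[:, 0] ** 2 + 3.0 * X[:, 1]

def local_surrogate(predict_fn, x, n_samples=5000, scale=0.5,
                    kernel_width=0.75, alpha=1e-3, seed=0):
    """Fit a weighted ridge regression around the instance x (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Perturb the instance and get the black box predictions for the new points
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict_fn(Z)
    # Weight the new samples by their proximity to the instance of interest
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Train a weighted, interpretable model: ridge regression on centered features
    design = np.hstack([np.ones((n_samples, 1)), Z - x])
    penalty = alpha * np.eye(design.shape[1])
    penalty[0, 0] = 0.0  # do not penalize the intercept
    lhs = design.T @ (weights[:, None] * design) + penalty
    rhs = design.T @ (weights * y)
    beta = np.linalg.solve(lhs, rhs)
    return beta[0], beta[1:]

# Select the instance of interest and explain its prediction
x0 = np.array([1.0, 0.0])
intercept, coefs = local_surrogate(black_box, x0)
print("local effects:", np.round(coefs, 2))
```

The coefficients are read as local effects: near `x0`, they should come out close to the slopes of the toy model at that point (2 for feature 0, 3 for feature 1), even though the model is not linear globally.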

**b)** LIME with tabular data

For an application of LIME, consider the gradient-boosted California housing model you fitted last week (run the code below).

Use LIME to give local explanations of the first and the fifth instance of the test data. Use a local ridge regression (default), discretize the data, and give an explanation using only 4 features. Visualize your results with a suitable plot and interpret the output.

**Solution:**

In [1]:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Get data
cal_housing = fetch_california_housing()
X = pd.DataFrame(cal_housing.data, columns=cal_housing.feature_names)
y = cal_housing.target

# Get train and test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

# Train model and predict on the training data
# (the experimental import is only required for scikit-learn < 1.0)
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingRegressor

model = HistGradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)
model.predict(X_train)

array([0.76154181, 1.62701009, 2.59560924, ..., 2.73226183, 4.32333119,
       0.59749722])

In [None]:
# explanation using LIME
import numpy as np
import lime.lime_tabular


In [None]:
explainer = lime.lime_tabular.LimeTabularExplainer(np.array(X_train),
                    feature_names=cal_housing.feature_names, 
                    discretize_continuous = True,              
                    verbose=True, mode='regression')

In [None]:
# LIME expects the instance as a 1d NumPy array, hence .values
exp = explainer.explain_instance(X_test.iloc[0].values,
     model.predict,
     num_features=4) # first test instance, using 4 features only
expe = explainer.explain_instance(X_test.iloc[4].values,
     model.predict, num_features=4) # fifth test instance, using 4 features only
expe.as_pyplot_figure()
exp.as_pyplot_figure()

# 2) Counterfactual Explanations


**a)** What is a counterfactual explanation? Explain the concept using an example. Furthermore, describe how this perspective applies to the context of interpretable machine learning: Do we explain the causal structure of the data? What makes a counterfactual explanation a good explanation?


**Solution:**

Relevant passages of text from 9.3: 
"A counterfactual explanation describes a causal situation in the form: “If X had not occurred, Y would not have occurred”." 
"Thinking in counterfactuals requires imagining a hypothetical reality that contradicts the observed facts (for example, a world in which I have not drunk the hot coffee), hence the name “counterfactual”."
"Even if in reality the relationship between the inputs and the outcome to be predicted might not be causal, we can see the inputs of a model as the cause of the prediction. The inputs cause the prediction (not necessarily reflecting the real causal relation of the data)."
A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output.
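
The "smallest change" idea can be made concrete with the loss function by Wachter et al. that chapter 9.3 discusses. As a sketch, let $\hat{f}$ be the model, $x$ the instance of interest, $x'$ the counterfactual candidate, $y'$ the desired prediction, and $\lambda$ a weight that balances the two terms:

$$L(x, x', y', \lambda) = \lambda \cdot \left(\hat{f}(x') - y'\right)^2 + d(x, x'), \qquad d(x, x') = \sum_{j=1}^{p} \frac{|x_j - x'_j|}{\mathrm{MAD}_j}$$

Here $\mathrm{MAD}_j$ is the median absolute deviation of feature $j$, which puts all features on a comparable scale. The first term rewards reaching the desired prediction, the second penalizes moving away from the original instance. Sparsity, diversity, and plausibility of the counterfactual are not captured by this loss.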

What makes a counterfactual explanation a good explanation:
"An obvious first requirement is that a counterfactual instance produces the predefined prediction as closely as possible. [...] counterfactual should be as similar as possible to the instance regarding feature values [...] The counterfactual should not only be close to the original instance, but should also change as few features as possible. [...] Third, it is often desirable to generate multiple diverse counterfactual explanations so that the decision-subject gets access to multiple viable ways of generating a different outcome. [...] The last requirement is that a counterfactual instance should have feature values that are likely."


**b)** Consider the model for California housing from task 1 b). Try to find at least 2 counterfactual explanations for the first instance in the test set that would at least double the predicted value.
What are the advantages and disadvantages of counterfactual explanations? Use the counterfactual explanations you found to illustrate these advantages and disadvantages.

In [3]:
X_test.iloc[0]

MedInc           4.151800
HouseAge        22.000000
AveRooms         5.663073
AveBedrms        1.075472
Population    1551.000000
AveOccup         4.180593
Latitude        32.580000
Longitude     -117.050000
Name: 14740, dtype: float64

In [8]:
model.predict(X_test.iloc[0:1]) # predicted value that should be doubled

array([1.58096968])

**Manipulate the input to get a counterfactual prediction, i.e. set new values for certain variables such that the predicted value at least doubles (trial and error):**

In [99]:
manipulated_instance = X_test.iloc[0:1].assign(MedInc=2.0) # set MedInc to 2, all other features unchanged
model.predict(manipulated_instance) # lowering MedInc lowers the prediction -> increase MedInc instead

array([1.06671371])

In [100]:
manipulated_instance = X_test.iloc[0:1].assign(MedInc=8.0) # set MedInc to 8, all other features unchanged
model.predict(manipulated_instance) # prediction now above the desired level (2 * 1.58 = 3.16)

array([3.5203337])

In [97]:
manipulated_instance = X_test.iloc[0:1].assign(Longitude=-150.0) # set Longitude to -150, all other features unchanged
model.predict(manipulated_instance) # also above the desired level

array([3.60136852])

Advantages listed in 9.3: "The interpretation of counterfactual explanations is very clear. [...] The counterfactual method does not require access to the data or the model. [...] The method works also with systems that do not use machine learning. [...] The counterfactual explanation method is relatively easy to implement."
Disadvantages listed in 9.3: "For each instance you will usually find multiple counterfactual explanations (Rashomon effect)."

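As the example above shows, a sequence of trial and error in replacing individual feature values yields counterfactuals that at least double the predicted value. The result is easy to communicate ("had the median income been 8 instead of 4.15, the predicted value would have more than doubled"), and finding it only required calling the prediction function. However, many such combinations are possible (Rashomon effect), and some values may be unrealistic: a longitude of -150, for instance, lies far west of California.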
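
The manual trial and error can also be written as a small grid search that changes one feature at a time. The sketch below is an illustration only: `toy_predict` is a hypothetical stand-in for `model.predict`, the function name `single_feature_counterfactuals` and the candidate grids are assumptions, and distance is measured per feature on the raw scale.

```python
import numpy as np

def toy_predict(X):
    # Hypothetical stand-in for model.predict:
    # rises with feature 0 and falls with feature 1
    return 0.5 * X[:, 0] - 0.25 * X[:, 1] + 1.0

def single_feature_counterfactuals(predict_fn, x, target, grids):
    """For each feature, return the candidate value closest to the original
    that lifts the prediction to at least `target`, all else held fixed."""
    found = {}
    for j, candidates in grids.items():
        # Try candidates in order of increasing distance from the original value
        for value in sorted(candidates, key=lambda v: abs(v - x[j])):
            x_new = x.copy()
            x_new[j] = value
            if predict_fn(x_new[None, :])[0] >= target:
                found[j] = float(value)
                break
    return found

x = np.array([2.0, 4.0])
baseline = toy_predict(x[None, :])[0]
target = 2 * baseline  # the prediction should at least double
grids = {0: np.arange(0.0, 11.0, 1.0), 1: np.arange(0.0, 11.0, 1.0)}
counterfactuals = single_feature_counterfactuals(toy_predict, x, target, grids)
print(counterfactuals)
```

Each entry of the result is a different counterfactual for the same instance, which mirrors the Rashomon effect: the search says nothing about which of them is realistic or actionable, so the candidates still have to be judged by hand.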