# Interpretable Machine Learning
## Exercise Sheet: 4
## This exercise sheet covers Chapters 6, 8.1 and 8.2 of the IML book by Christoph Molnar

Kristin Blesch (blesch@leibniz-bips.de)<br>
Niklas Koenen (koenen@leibniz-bips.de)
<hr style="border:1.5px solid gray"> </hr>

# 1) Partial Dependence Plot (PDP)

In this task, you will explore Partial Dependence Plots (PDPs). To do so, you will train a multilayer perceptron (MLP) on the regression dataset `sklearn.datasets.fetch_california_housing` and then implement the PDP method yourself.

## a) Data
Load the data and familiarize yourself with it by extracting the features $X$ as a `pandas.DataFrame` and the target $Y$. Then answer the following questions:

- What are the features and what are their types (numeric, binary, categorical)?
- What is the target outcome?
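To answer the question about feature types systematically, one option is to combine each column's dtype with its number of distinct values. The helper below is a hypothetical heuristic (its name and rules are assumptions, not a pandas feature): a column with exactly two distinct values is labeled binary, other numeric columns numeric, and everything else categorical. It is shown on a small made-up DataFrame; for the exercise you would pass `X` instead.

```python
import pandas as pd


def describe_feature_types(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical heuristic: label each column as binary, numeric or categorical."""
    rows = []
    for col in df.columns:
        n_unique = df[col].nunique()
        if n_unique == 2:
            kind = "binary"  # exactly two distinct values
        elif pd.api.types.is_numeric_dtype(df[col]):
            kind = "numeric"
        else:
            kind = "categorical"  # strings, objects, pandas categoricals, ...
        rows.append({"feature": col, "dtype": str(df[col].dtype),
                     "n_unique": n_unique, "type": kind})
    return pd.DataFrame(rows)


# Made-up toy data, only to illustrate the helper
toy = pd.DataFrame({
    "age": [1.0, 5.0, 12.0, 30.0],
    "has_pool": [0, 1, 0, 1],
    "region": ["north", "south", "east", "north"],
})
summary = describe_feature_types(toy)
print(summary)
```

A numeric dtype alone does not prove that a feature is continuous (integer-coded categories would still be labeled numeric), so it is worth reading the dataset description in the `DESCR` attribute as well.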

**Solution:**

In [None]:
import pandas as pd
from sklearn.datasets import fetch_california_housing

# Get features
X = ... # to do

# Get target
Y = ... # to do

## b) Train MLP
Split the data into training and test sets using the function `train_test_split` and train the predefined MLP model. Then calculate the $R^2$ value on the test data using the [score](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor.score) method.

**Hint:** Use the model's `fit(X_train, y_train)` method.
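The split/fit/score workflow can be sketched on synthetic data from `make_regression` (an assumption, so that the example runs without downloading the housing data). The sketch also checks that `score` of a regressor returns the same value as `sklearn.metrics.r2_score` and as the formula $R^2 = 1 - SS_{res}/SS_{tot}$. The `random_state` values, the 20 % test share and the reduced `n_quantiles=100` (the default of 1000 exceeds the small sample size used here) are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer

# Synthetic stand-in for the housing data
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Hold out 20 % of the rows as test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = make_pipeline(
    QuantileTransformer(n_quantiles=100),
    MLPRegressor(hidden_layer_sizes=(50, 50), learning_rate_init=0.01,
                 early_stopping=True, random_state=0),
)
model.fit(X_train, y_train)

# R^2 three ways: built-in score, sklearn.metrics, and the formula
r2_builtin = model.score(X_test, y_test)
y_pred = model.predict(X_test)
r2_metric = r2_score(y_test, y_pred)
r2_formula = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(f"R^2 = {r2_builtin:.3f}")
```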

**Solution:**

In [None]:
# Create train and test split
from sklearn.model_selection import train_test_split

#
# to do
#

# Define model
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer
from sklearn.neural_network import MLPRegressor

model = make_pipeline(
    QuantileTransformer(),
    MLPRegressor(
        hidden_layer_sizes=(50, 50), learning_rate_init=0.01, early_stopping=True
    )
)

# Train the model on the training data

#
# to do
#

# Calculate the R^2 value

#
# to do
#

## c) Implement PDP
**i)** Now let's apply the method to the feature `HouseAge`. Create a PDP for this feature.

**Hint:** Determine the minimum and maximum of this feature and create a grid of 100 evenly spaced values between them (`np.linspace(min, max, 100)`). In a loop, set the feature to each grid value for all rows of the test data and compute the average model prediction. Finally, plot the average predictions against the grid values with `plt.plot`.
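The procedure from the hint can be sketched as a small function. The names `manual_pdp`, `x0` and `x1` are hypothetical, and the demo uses a `LinearRegression` fitted on made-up, noise-free data $y = 3x_0 + 2x_1$, because there the PDP of $x_0$ has to be a straight line with slope 3, which makes the result easy to check. For the exercise, you would pass the trained pipeline, the test DataFrame and `"HouseAge"`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def manual_pdp(model, X: pd.DataFrame, feature: str, n_grid: int = 100):
    """Brute-force partial dependence of `model` on a single column of X."""
    grid = np.linspace(X[feature].min(), X[feature].max(), n_grid)
    pd_values = np.empty(n_grid)
    X_mod = X.copy()  # never overwrite the caller's data
    for i, value in enumerate(grid):
        X_mod[feature] = value  # same value for every row, other columns unchanged
        pd_values[i] = model.predict(X_mod).mean()  # average over all rows
    return grid, pd_values


# Made-up, noise-free demo data
rng = np.random.default_rng(0)
X = pd.DataFrame({"x0": rng.uniform(0, 10, 200), "x1": rng.uniform(-1, 1, 200)})
y = 3 * X["x0"] + 2 * X["x1"]
model = LinearRegression().fit(X, y)

grid, pd_values = manual_pdp(model, X, "x0")
slope = np.polyfit(grid, pd_values, 1)[0]
print(f"PDP slope: {slope:.3f}")
```

Plotting is then a single `plt.plot(grid, pd_values)`; labeling the axes with the feature name and "average prediction" makes the curve easier to read.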

**Solution:**

In [None]:
import numpy as np
import matplotlib.pyplot as plt

#
# to do
#

**ii)** Create another PDP for this feature with scikit-learn's [`PartialDependenceDisplay`](https://scikit-learn.org/stable/modules/generated/sklearn.inspection.PartialDependenceDisplay.html#sklearn-inspection-partialdependencedisplay) (e.g. via `PartialDependenceDisplay.from_estimator`) and use it to check your implementation.

**Solution:**

In [None]:
from sklearn.inspection import PartialDependenceDisplay

#
# to do
#

**iii)** What do these plots tell us? Can they indicate the importance of the feature? If not, how can the importance be obtained?
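A PDP on its own shows the shape of an effect, not its size relative to other features. One importance measure discussed in the IML book is the spread of the PD curve: for a numeric feature, the sample standard deviation of its PD values over the grid. The sketch below illustrates this under stated assumptions (made-up data, a linear model, the hypothetical helper `pd_curve`). Because it only looks at the averaged curve, a feature that acts mainly through interactions can still receive a low score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def pd_curve(model, X, j, n_grid=50):
    """Brute-force PD values of column j on an evenly spaced grid."""
    grid = np.linspace(X[:, j].min(), X[:, j].max(), n_grid)
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v  # overwrite feature j in every row
        values.append(model.predict(X_mod).mean())
    return np.array(values)


# Made-up data: x0 matters much more than x1
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 2))
y = 5 * X[:, 0] + 0.5 * X[:, 1]
model = LinearRegression().fit(X, y)

# PDP-based importance: sample standard deviation (ddof=1) of the PD values
importance = [np.std(pd_curve(model, X, j), ddof=1) for j in range(X.shape[1])]
print(importance)
```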

**Solution:**

# 2) Accumulated Local Effects (ALE) Plots

## a) Theory


**i)** What is the problem with PDPs, and why can't we blindly trust PDP-based feature importance?
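One way to see the problem is to count how many of the artificial data points a PDP evaluates lie far from the observed data. In the sketch below (made-up data, two strongly correlated features, and an arbitrary threshold of 0.3 for "far"), essentially no observed row has $|x_1 - x_0| > 0.3$, yet once $x_0$ is overwritten with grid values, roughly half of the evaluated rows do. The model's predictions at those points are extrapolations, and both the PDP and any importance derived from it inherit them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x0 = rng.uniform(0, 1, n)
x1 = x0 + rng.normal(0, 0.05, n)  # x1 is almost a copy of x0
X = np.column_stack([x0, x1])

threshold = 0.3  # arbitrary cut-off for an "unrealistic" combination

# Share of observed rows with an unrealistic combination
observed_share = np.mean(np.abs(X[:, 1] - X[:, 0]) > threshold)

# Share of the rows a PDP for x0 evaluates (x0 replaced by each grid value z)
grid = np.linspace(0, 1, 11)
pdp_share = np.mean([np.mean(np.abs(X[:, 1] - z) > threshold) for z in grid])

print(f"observed: {observed_share:.3f}, PDP grid: {pdp_share:.3f}")
```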

**Solution:**

**ii)** Explain the concepts of PDP, M-Plots and ALE plots and their differences. What are the respective advantages and disadvantages of each method compared to the other two?
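The three methods differ mainly in which data they average over. The NumPy sketch below computes all three for feature $x_0$ of a known additive function $f(x) = x_0 + x_1$ (a stand-in for a trained model) on made-up data in which $x_1 \approx x_0$. The PDP averages over all rows with $x_0$ overwritten; the M-plot averages the predictions of the rows that actually fall into each $x_0$ bin; the ALE averages prediction differences across each bin's edges for those rows and accumulates them. Quantile bins and unweighted centering are simplifying assumptions; libraries may bin and center differently.

```python
import numpy as np


def f(X):
    # Stand-in "model": additive, the true effect of x0 has slope 1
    return X[:, 0] + X[:, 1]


rng = np.random.default_rng(42)
n = 2000
x0 = rng.uniform(0, 1, n)
x1 = x0 + rng.normal(0, 0.05, n)  # strongly correlated with x0
X = np.column_stack([x0, x1])

n_bins = 10
edges = np.quantile(x0, np.linspace(0, 1, n_bins + 1))
bin_idx = np.clip(np.searchsorted(edges, x0, side="right") - 1, 0, n_bins - 1)

# PDP: overwrite x0 in all rows, then average
pdp = np.array([f(np.column_stack([np.full(n, z), x1])).mean() for z in edges])

# M-plot: average prediction of the rows that lie in each bin
centers = np.array([x0[bin_idx == k].mean() for k in range(n_bins)])
m_plot = np.array([f(X[bin_idx == k]).mean() for k in range(n_bins)])

# ALE: mean prediction difference across each bin, accumulated and centered
effects = []
for k in range(n_bins):
    rows = X[bin_idx == k]
    upper, lower = rows.copy(), rows.copy()
    upper[:, 0], lower[:, 0] = edges[k + 1], edges[k]
    effects.append(np.mean(f(upper) - f(lower)))
ale = np.concatenate([[0.0], np.cumsum(effects)])
ale -= ale.mean()  # simplified (unweighted) centering

pdp_slope = np.polyfit(edges, pdp, 1)[0]
m_slope = np.polyfit(centers, m_plot, 1)[0]
ale_slope = np.polyfit(edges, ale, 1)[0]
print(f"PDP: {pdp_slope:.2f}, M-plot: {m_slope:.2f}, ALE: {ale_slope:.2f}")
```

Here the M-plot attributes the effect of the correlated $x_1$ to $x_0$ (slope near 2), while PDP and ALE both recover slope 1; with correlated features and a model that is not additive, PDP and ALE can diverge as well.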

**Solution:**

## b) Example: ALE Plot

**i)** Install the package [ALEPython](https://github.com/blent-ai/ALEPython) from GitHub and create an ALE plot for the feature `HouseAge`, using the model from task **1)**.

**Hint:** See the 'Install' and 'Usage' sections on the GitHub page.

**Solution:**

In [None]:
# Install package
# In a notebook cell, a line starting with '!' is run as a shell command.

#
# to do
#

In [None]:
# Create the ALE plot (set 'monte_carlo = False')

#
# to do
#

**ii)** How does the ALE plot differ from the PDP in task **1)**? Give possible reasons for the similarities or differences.
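One possible reason for differences can be explored in a small experiment: when the model contains an interaction and the features are correlated, PDP and ALE answer different questions. The sketch below (made-up data, the toy function $f(x) = x_0 x_1$ as a stand-in model, simplified quantile binning, no centering since only slopes are compared) compares the slopes of both curves in the highest $x_0$ bin. With independent features the two slopes are close; with $x_1 \approx x_0$ the PDP still uses the overall mean of $x_1$ (slope roughly 0.5), while the ALE uses the $x_1$ values that actually occur at high $x_0$ (slope roughly 1).

```python
import numpy as np


def f(X):
    # Stand-in "model" with an interaction between x0 and x1
    return X[:, 0] * X[:, 1]


def pdp_and_ale(X, n_bins=10):
    """PDP and uncentered ALE of feature 0, both evaluated at quantile bin edges."""
    edges = np.quantile(X[:, 0], np.linspace(0, 1, n_bins + 1))
    bin_idx = np.clip(np.searchsorted(edges, X[:, 0], side="right") - 1, 0, n_bins - 1)
    pdp = []
    for z in edges:
        X_mod = X.copy()
        X_mod[:, 0] = z  # overwrite x0 in every row
        pdp.append(f(X_mod).mean())
    effects = []
    for k in range(n_bins):
        rows = X[bin_idx == k]  # only rows whose x0 falls into bin k
        upper, lower = rows.copy(), rows.copy()
        upper[:, 0], lower[:, 0] = edges[k + 1], edges[k]
        effects.append(np.mean(f(upper) - f(lower)))
    ale = np.concatenate([[0.0], np.cumsum(effects)])
    return edges, np.array(pdp), ale


rng = np.random.default_rng(1)
n = 2000
x0 = rng.uniform(0, 1, n)
datasets = {
    "independent": np.column_stack([x0, rng.uniform(0, 1, n)]),
    "correlated": np.column_stack([x0, x0 + rng.normal(0, 0.05, n)]),
}

slopes = {}
for name, X in datasets.items():
    edges, pdp, ale = pdp_and_ale(X)
    width = edges[-1] - edges[-2]
    # Slope of each curve across the last (highest-x0) bin
    slopes[name] = ((pdp[-1] - pdp[-2]) / width, (ale[-1] - ale[-2]) / width)
    print(name, "PDP slope: %.2f, ALE slope: %.2f" % slopes[name])
```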

**Solution:**