# Partial Dependence Plots

While feature importance shows **which** variables most affect predictions, partial dependence plots show **how** a feature affects predictions.

This is useful for answering questions like: *Controlling for all other house features, what impact do longitude and latitude have on home prices?*
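To make the idea of "controlling for all other features" concrete, here is a minimal sketch of how a one-dimensional partial dependence curve can be computed by hand. For each candidate value of the feature, every row is given that value, the model predicts, and the predictions are averaged. The names `partial_dependence_1d` and `toy_predict` are hypothetical and only stand in for a fitted model; this is not the code that `pdpbox` runs internally.

```python
import numpy as np

def partial_dependence_1d(predict_fn, X, feature_idx, grid):
    """Average prediction when column `feature_idx` is forced to each grid value."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()               # leave the original data untouched
        X_mod[:, feature_idx] = value  # every row gets the same feature value
        averages.append(predict_fn(X_mod).mean())
    return np.array(averages)

# Hypothetical stand-in for a fitted model: the score jumps once column 0
# reaches 1, and column 1 adds a small independent contribution.
def toy_predict(X):
    return np.where(X[:, 0] >= 1, 0.7, 0.2) + 0.01 * X[:, 1]

X_toy = np.array([[0, 5], [2, 10], [1, 15]])
grid = [0, 1, 2, 3]
pd_values = partial_dependence_1d(toy_predict, X_toy, feature_idx=0, grid=grid)
print(pd_values)
```

Because the other columns keep their observed values in every row, the averaged curve isolates the effect of the chosen feature as the model sees it.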

In [None]:
# Assumption: the pdp_isolate / pdp_plot calls used below follow the pdpbox 0.2.x API
!pip install "pdpbox<0.3"

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv('https://raw.githubusercontent.com/DataScienceUB/ExplainableDataScience/master/FIFA%202018%20Statistics.csv')
y = (data['Man of the Match'] == "Yes")  # Convert from string "Yes"/"No" to binary
feature_names = [i for i in data.columns if data[i].dtype in [np.int64]]
X = data[feature_names]
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
tree_model = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_split=5).fit(train_X, train_y)

In [None]:
from matplotlib import pyplot as plt
from pdpbox import pdp, get_dataset, info_plots

# Create the data that we will plot
pdp_goals = pdp.pdp_isolate(model=tree_model, 
                            dataset=val_X, 
                            model_features=feature_names, 
                            feature='Goal Scored')

# Plot the partial dependence of the prediction on 'Goal Scored'
pdp.pdp_plot(pdp_goals, 'Goal Scored')
plt.show()

The `y` axis shows the change in the prediction relative to the prediction at the baseline, i.e. the leftmost value of the feature.

The blue shaded area indicates the level of confidence: the narrower the band, the more consistent the estimated effect.
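The baseline-relative reading of the `y` axis can be sketched as follows. The example builds one prediction curve per row, subtracts each curve's value at the leftmost grid point, and then averages. It also computes the spread of the centered curves across rows, which is one plausible way to draw a band around the mean. Whether `pdpbox` computes its shaded area exactly this way is an assumption here, and `ice_curves` and `toy_predict` are hypothetical names.

```python
import numpy as np

def ice_curves(predict_fn, X, feature_idx, grid):
    """One prediction curve per row: shape (n_rows, n_grid_values)."""
    X = np.asarray(X, dtype=float)
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature_idx] = value   # force the feature to this grid value
        curves[:, j] = predict_fn(X_mod)
    return curves

# Hypothetical model: column 0 only matters when column 1 is above 10.
def toy_predict(X):
    return 0.2 + 0.1 * X[:, 0] * (X[:, 1] > 10)

X_toy = np.array([[0, 5], [1, 20]])
grid = [0, 1, 2]

curves = ice_curves(toy_predict, X_toy, feature_idx=0, grid=grid)
centered = curves - curves[:, [0]]   # change relative to the leftmost grid value
centered_pdp = centered.mean(axis=0) # the centered partial dependence curve
spread = centered.std(axis=0)        # assumed stand-in for the shaded band

print(centered_pdp)
print(spread)
```

By construction the centered curve starts at zero, so every other point reads directly as "change in prediction compared with the baseline value".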

From this particular graph, the explanation we can produce is: **Scoring a goal substantially increases your chances of winning "Man of the Match." But extra goals beyond the first appear to have little impact on predictions.**

Here is another example plot:

In [None]:
feature_to_plot = 'Distance Covered (Kms)'
pdp_dist = pdp.pdp_isolate(model=tree_model, dataset=val_X, model_features=feature_names, feature=feature_to_plot)

pdp.pdp_plot(pdp_dist, feature_to_plot)
plt.show()

**Question**: What is the explanation you can produce from this graph?

*Your answer here*