## Introduction

One of the most basic questions we might ask of a model is *What features have the biggest impact on predictions?*

This concept is called **feature importance**. I've seen feature importance used effectively many times for every purpose in the list of use cases above.

There are multiple ways to measure feature importance. Some approaches answer subtly different versions of the question above. Other approaches have documented shortcomings.

In this lesson, we'll focus on *permutation importance*. Compared to most other approaches, permutation importance is:

* Fast to calculate
* Widely used and understood
* Consistent with properties we would want a feature importance measure to have


## How it works

The process is as follows:

1. Get a trained model
2. Shuffle the values in a single column and make predictions using the resulting dataset. Use these predictions and the true target values to calculate how much the loss function suffered from the shuffling. That performance deterioration measures the importance of the variable you just shuffled.
3. Return the data to the original order (undoing the shuffle from step 2). Then repeat step 2 with the next column in the dataset, until you have calculated the importance of each column.
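
The steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the eli5 implementation: the toy data, the least-squares "model", and the helper name `permutation_importance_manual` are all made up for the example.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Loss function used to measure how much predictions deteriorate."""
    return float(np.mean((y_true - y_pred) ** 2))


def permutation_importance_manual(predict, X, y, metric, rng):
    """Hypothetical helper: increase in loss after shuffling each column once."""
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        # Step 2: shuffle one column in a copy, so the original order
        # is preserved for the next column (step 3).
        X_shuffled = X.copy()
        X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
        importances[col] = metric(y, predict(X_shuffled)) - baseline
    return importances


# Assumed toy data: column 0 drives the target, column 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Step 1: get a trained model (here, ordinary least squares).
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def predict(data):
    return data @ coef


importances = permutation_importance_manual(
    predict, X_test, y_test, mean_squared_error, rng
)
print(importances)
```

Shuffling the informative column should raise the error substantially, while shuffling the noise column should change it very little.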

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

In [2]:
# Load the FIFA 2018 match statistics
data = pd.read_csv("./data/FIFA_2018_Statistics.csv")
data.head()

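| | Date | Team | Opponent | Goal Scored | Ball Possession % | Attempts | On-Target | Off-Target | Blocked | Corners | ... | Yellow Card | Yellow & Red | Red | Man of the Match | 1st Goal | Round | PSO | Goals in PSO | Own goals | Own goal Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14-06-2018 | Russia | Saudi Arabia | 5 | 40 | 13 | 7 | 3 | 3 | 6 | ... | 0 | 0 | 0 | Yes | 12.0 | Group Stage | No | 0 | | |
| 1 | 14-06-2018 | Saudi Arabia | Russia | 0 | 60 | 6 | 0 | 3 | 3 | 2 | ... | 0 | 0 | 0 | No | | Group Stage | No | 0 | | |
| 2 | 15-06-2018 | Egypt | Uruguay | 0 | 43 | 8 | 3 | 3 | 2 | 0 | ... | 2 | 0 | 0 | No | | Group Stage | No | 0 | | |
| 3 | 15-06-2018 | Uruguay | Egypt | 1 | 57 | 14 | 4 | 6 | 4 | 5 | ... | 0 | 0 | 0 | Yes | 89.0 | Group Stage | No | 0 | | |
| 4 | 15-06-2018 | Morocco | Iran | 0 | 64 | 13 | 3 | 6 | 4 | 5 | ... | 1 | 0 | 0 | No | | Group Stage | No | 0 | 1.0 | 90.0 |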


In [3]:
# convert from string "Yes/No" to bool
y = (data['Man of the Match'] == "Yes")
y[:5]
     

0     True
1    False
2    False
3     True
4    False
Name: Man of the Match, dtype: bool

In [4]:
feature_names = [i for i in data.columns if data[i].dtype in [np.int64]]
feature_names

['Goal Scored',
 'Ball Possession %',
 'Attempts',
 'On-Target',
 'Off-Target',
 'Blocked',
 'Corners',
 'Offsides',
 'Free Kicks',
 'Saves',
 'Pass Accuracy %',
 'Passes',
 'Distance Covered (Kms)',
 'Fouls Committed',
 'Yellow Card',
 'Yellow & Red',
 'Red',
 'Goals in PSO']

In [5]:
X = data[feature_names]

In [6]:
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=1)

In [7]:
model = RandomForestClassifier(random_state=0).fit(train_X, train_y)



#### Here is how to calculate and show importance with the [eli5](https://eli5.readthedocs.io/en/latest/) library:

In [8]:
import eli5
from eli5.sklearn import PermutationImportance

In [9]:
perm = PermutationImportance(model, random_state=1).fit(test_X, test_y)
eli5.show_weights(perm, feature_names=test_X.columns.tolist())

| Weight | Feature |
|---|---|
| 0.0750 ± 0.1159 | Goal Scored |
| 0.0625 ± 0.0791 | Corners |
| 0.0437 ± 0.0500 | Distance Covered (Kms) |
| 0.0375 ± 0.0729 | On-Target |
| 0.0375 ± 0.0468 | Free Kicks |
| 0.0187 ± 0.0306 | Blocked |
| 0.0125 ± 0.0750 | Pass Accuracy % |
| 0.0125 ± 0.0500 | Yellow Card |
| 0.0063 ± 0.0468 | Saves |
| 0.0063 ± 0.0250 | Offsides |


### Interpreting Permutation Importances

The values towards the top are the most important features, and those towards the bottom matter least.

The *first number in each row* shows *how much model performance decreased with a random shuffling* (in this case, using *"accuracy"* as the performance metric).

Like most things in data science, there is some randomness in the exact performance change from shuffling a column. We measure the amount of randomness in our permutation importance calculation by repeating the process with multiple shuffles. The *number after the ± measures how performance varied from one reshuffling to the next*.

You'll occasionally see *negative values* for permutation importances. In those cases, the predictions on the shuffled (or noisy) data happened to be more accurate than the predictions on the real data. This happens when the feature didn't matter (it should have had an importance close to 0), but random chance made the predictions on the shuffled data more accurate. This is more common with small datasets, like the one in this example, because there is more room for luck.
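
To see the effect of repeated shuffles in isolation, here is a small sketch that uses scikit-learn's `permutation_importance` function instead of eli5 (a substitute chosen for the illustration, not the method used in this notebook). The synthetic data is an assumption: only the first column determines the label, and the other two are noise. `n_repeats` sets how many times each column is reshuffled.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Assumed toy data: only column 0 determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = X[:, 0] > 0

train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=1)
model = RandomForestClassifier(random_state=0).fit(train_X, train_y)

# Shuffle each column 10 times, scoring with accuracy after every shuffle.
result = permutation_importance(
    model, test_X, test_y, scoring="accuracy", n_repeats=10, random_state=1
)

# result.importances has one row per feature and one column per shuffle;
# the mean and standard deviation summarize each row.
for i in range(X.shape[1]):
    print(
        f"feature {i}: {result.importances_mean[i]:.4f} "
        f"+/- {result.importances_std[i]:.4f}"
    )
```

The two noise columns should land near zero, and with a test set this small an individual shuffle can push them slightly below zero by chance.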

In our example, the most important feature was **Goal Scored**. That seems sensible. Soccer fans may have some intuition about whether the ordering of the other variables is surprising or not.