# Permutation Importance Introduction

One of the most basic questions we might ask of a model is: What features have the biggest impact on predictions?

This concept is called feature importance.

There are multiple ways to measure feature importance. Some approaches answer subtly different versions of the question above. Other approaches have documented shortcomings.

In this lesson, we'll focus on permutation importance. Compared to most other approaches, permutation importance is:

- fast to calculate,
- widely used and understood, and
- consistent with properties we would want a feature importance measure to have.


Randomly re-ordering a single column should cause less accurate predictions, since the resulting data no longer corresponds to anything observed in the real world. Model accuracy especially suffers if we shuffle a column that the model relied on heavily for predictions. In this case, shuffling height at age 10 would cause terrible predictions. If we shuffled socks owned instead, the resulting predictions wouldn't suffer nearly as much.

With this insight, the process is as follows:

- Get a trained model.
- Shuffle the values in a single column, make predictions using the resulting dataset. Use these predictions and the true target values to calculate how much the loss function suffered from shuffling. That performance deterioration measures the importance of the variable you just shuffled.
- Return the data to the original order (undoing the shuffle from step 2). Now repeat step 2 with the next column in the dataset, until you have calculated the importance of each column.


- <b>Permutation importance is calculated after a model has been fitted upon validation dataset (Post-hog)</b>

In [1]:
!jupyter nbextension enable --py widgetsnbextension 

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: [32mOK[0m


In [62]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.feature_extraction import DictVectorizer

import xgboost as xgb
from xgboost import XGBRFClassifier as xgbclassifier
from xgboost import DMatrix

import eli5
from eli5.sklearn import PermutationImportance, explain_prediction
from eli5 import show_prediction

import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

%matplotlib ipympl

In [37]:
df = pd.read_csv('../data/titanic.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,0,1,0.0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,1,2,1.0,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,2,3,1.0,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,3,4,1.0,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,4,5,0.0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [97]:
'''
Important: Permutation Importance can only use numbers

Normally, XGBoost only works with number, for this example
we just exclude text values for simplicity, but we can use encoding or mapping to 
convert text values into number values 
Name, Sex, Ticket, Cabin, Embarked
'''
df = df.loc[:, ~df.columns.isin(['Unnamed: 0', 'PassengerId', 'Name', 'Cabin', 'Ticket'])] #exclude Name, Cabin and Ticket 

'''
mapping categorical into numerical
'''
df.Sex = df.Sex.map({'male': 1, 'female': 0})
df.Embarked = df.Embarked.map({'S': 0, 'C': 1, 'Q': 1})

In [98]:
df.head()

Unnamed: 0,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
0,0.0,3,1,22.0,1,0,7.25,0.0
1,1.0,1,0,38.0,1,0,71.2833,1.0
2,1.0,3,0,26.0,0,0,7.925,0.0
3,1.0,1,0,35.0,1,0,53.1,0.0
4,0.0,3,1,35.0,0,0,8.05,0.0


In [99]:
'''
train test split
'''
X = df.drop('Survived', axis='columns')
y = df.Survived
X.fillna(-999, inplace=True)
y.fillna(0, inplace=True)
X_train, X_validation, y_train, y_validation = train_test_split(X, y, 
                                                                train_size=0.75,
                                                                random_state=42)

In [100]:
params = {'base_score': 0.5, 
          'booster': 'gbtree',
          'max_depth': 3,
          'eval_metric': 'logloss'}
model = xgbclassifier(**params)
model.fit(X_train, y_train)

XGBRFClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                colsample_bytree=1, enable_categorical=False,
                eval_metric='logloss', gamma=0, gpu_id=-1, importance_type=None,
                interaction_constraints='', max_delta_step=0, max_depth=3,
                min_child_weight=1, missing=nan, monotone_constraints='()',
                n_estimators=100, n_jobs=4, num_parallel_tree=100,
                objective='binary:logistic', predictor='auto', random_state=0,
                reg_alpha=0, scale_pos_weight=1, tree_method='exact',
                validate_parameters=1, verbosity=None)

***

## Permutation Importances using scikit-learn wrapper

In [101]:
'''
Permutation importances uses trained model (using train dataset) upon validation dataset
and can only use float values
'''
pi_params = {
    'estimator': model, 
    'random_state': 42}
permutation_importances = PermutationImportance(**pi_params)
permutation_importances.fit(X_validation, y_validation)

PermutationImportance(estimator=XGBRFClassifier(base_score=0.5,
                                                booster='gbtree',
                                                colsample_bylevel=1,
                                                colsample_bytree=1,
                                                enable_categorical=False,
                                                eval_metric='logloss', gamma=0,
                                                gpu_id=-1, importance_type=None,
                                                interaction_constraints='',
                                                max_delta_step=0, max_depth=3,
                                                min_child_weight=1, missing=nan,
                                                monotone_constraints='()',
                                                n_estimators=100, n_jobs=4,
                                                num_parallel_tree=100,
                                        

In [102]:
'''
show weight permutation importances for features, only works with numeric features
'''
vec = DictVectorizer()
eli5.show_weights(permutation_importances, 
                  vec=vec, 
                  feature_names=X_validation.columns.tolist())

Weight,Feature
0.0909  ± 0.0383,Sex
0.0482  ± 0.0213,Pclass
0.0006  ± 0.0060,Age
0  ± 0.0000,Embarked
0  ± 0.0000,Parch
0  ± 0.0000,SibSp
-0.0024  ± 0.0046,Fare


In [103]:
'''
explain prediction
'''
sample = X_validation.iloc[1, :]
show_prediction(model, sample, show_feature_values=True)

Contribution?,Feature,Value
0.989,<BIAS>,1.0
0.516,Sex,1.0
0.032,Age,42.0
0.005,Parch,0.0
0.001,Embarked,0.0
-0.008,SibSp,0.0
-0.169,Fare,26.55
-0.229,Pclass,1.0
