<a href="https://colab.research.google.com/github/KyleHaggin/DnD-class-predictor/blob/master/Models_and_Work.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science, Unit 2: Predictive Modeling

# Applied Modeling, Module 2

You will use your portfolio project dataset for all assignments this sprint.

## Assignment

Complete these tasks for your project, and document your work.

- [ ] Plot the distribution of your target. 
    - Classification problem: Are your classes imbalanced? If so, don't rely on accuracy alone.
    - Regression problem: Is your target skewed? If so, let's discuss in Slack.
- [ ] Continue to clean and explore your data. Make exploratory visualizations.
- [ ] Fit a model. Does it beat your baseline?
- [ ] Try xgboost.
- [ ] Get your model's permutation importances.

You should try to complete an initial model today, because for the rest of the week we'll be making model interpretation visualizations.


## Reading

Top recommendations in _**bold italic:**_

#### Permutation Importances
- _**[Kaggle / Dan Becker: Machine Learning Explainability](https://www.kaggle.com/dansbecker/permutation-importance)**_
- [Christoph Molnar: Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/feature-importance.html)

#### (Default) Feature Importances
- [Ando Saabas: Selecting good features, Part 3, Random Forests](https://blog.datadive.net/selecting-good-features-part-iii-random-forests/)
- [Terence Parr, et al: Beware Default Random Forest Importances](https://explained.ai/rf-importance/index.html)

#### Gradient Boosting
- [A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning](https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/)
- _**[A Kaggle Master Explains Gradient Boosting](http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/)**_
- [_An Introduction to Statistical Learning_](http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf) Chapter 8
- [Gradient Boosting Explained](http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html)
- _**[Boosting](https://www.youtube.com/watch?v=GM3CDQfQ4sw) (2.5 minute video)**_

In [1]:
import os, sys
in_colab = 'google.colab' in sys.modules

# If you're in Colab...
if in_colab:
    # Pull files from Github repo
    os.chdir('/content')
    !git init .
    !git remote add origin https://github.com/LambdaSchool/DS-Unit-2-Applied-Modeling.git
    !git pull origin master
    # Install packages in Colab
    !pip install category_encoders==2.0.0
    !pip install eli5==0.10.1
    !pip install pandas-profiling==2.3.0
    !pip install pdpbox==0.2.0
    !pip install plotly==4.1.1
    !pip install shap==0.30.0
    
    # Install required python packages
    !pip install -r requirements.txt
    
    # Change into directory for module
    os.chdir('module2')

Reinitialized existing Git repository in /content/.git/
fatal: remote origin already exists.
From https://github.com/LambdaSchool/DS-Unit-2-Applied-Modeling
 * branch            master     -> FETCH_HEAD
Already up to date.


In [2]:
# import important libraries
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import category_encoders as ce 
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score
import eli5
from eli5.sklearn import PermutationImportance

Using TensorFlow backend.


In [0]:
# import the dataset
df = pd.read_csv('https://raw.githubusercontent.com/oganm/dndstats/master/docs/uniqueTable.tsv',
                 sep='\t')

In [4]:
# double check the import worked correctly
df.head()

Unnamed: 0,name,race,background,date,class,justClass,subclass,level,feats,HP,AC,Str,Dex,Con,Int,Wis,Cha,alignment,skills,weapons,spells,day,processedAlignment,good,lawful,processedRace,processedSpells,processedWeapons,levelGroup
0,22bf79,Deep Gnome,Acolyte,2018-07-24T22:37:38Z,Warlock 20,Warlock,The Celestial,20,,260,15,18,18,20,22,24,20,,Sleight of Hand|Stealth|History|Religion|Insig...,Dagger|Dagger,Light*0|Sacred Flame*0|Eldritch Blast*0|Presti...,07 24 18,,,,Gnome,Light*0|Sacred Flame*0|Eldritch Blast*0|Presti...,Dagger,19-20
1,b1bd6b,Human,Hermit,2018-08-01T01:25:29Z,Monk 20,Monk,Way of the Four Elements,20,Mobile|Observant,200,20,13,20,15,12,21,11,,Acrobatics|History|Religion|Insight|Medicine,Dagger|Dart|Unarmed strike,,07 31 18,,,,Human,,Dagger|Dart|Unarmed Strike,19-20
2,e9a755,Wood Elf,Guild Artisan,2018-07-09T17:23:10Z,Wizard 20,Wizard,School of Conjuration,20,Spell Sniper|Elven Accuracy|Elemental Adept|Wa...,122,13,10,16,14,20,14,12,,Arcana|Insight|Medicine|Perception|Persuasion,Quarterstaff,Blade Ward*0|Firebolt*0|Green-Flame Blade*0|Ra...,07 09 18,,,,Elf,Blade Ward*0|Fire Bolt*0|Green-Flame Blade*0|R...,Quarterstaff,19-20
3,68bf99,Kenku,Clan Crafter,2018-06-17T23:06:53Z,Warlock 20,Warlock,The Great Old One,20,Actor|War Caster,220,16,12,20,16,14,14,21,,Stealth|Arcana|History|Religion|Insight|Deception,"Crossbow, light|Dagger|Dagger",,06 17 18,,,,Kenku,,"Crossbow, Light|Dagger",19-20
4,7b7c1c,Human,Folk Hero,2018-09-08T19:42:47Z,Cleric 20,Cleric,War Domain,20,,203,24,20,16,20,12,20,15,,Religion|Animal Handling|Insight|Survival,Spear|ThePowerhead|GaeBolga,Divine Favor*1|Shield of Faith*1|Magic Weapon*...,09 08 18,,,,Human,Divine Favor*1|Shield of Faith*1|Magic Weapon*...,Spear,19-20


In [0]:
# create a has spells feature

# create a function that returns true if there is a spell or false if there is not
def check_spells(item):
  # check if the value is a float. This works because np.nan is a float value and the value will be a string if there are spells
  if isinstance(item, float):
    # if nan is found return false (no spells)
    return False
  else:
    # else return true (spell found)
    return True

# apply the function to the dataframe
df['has_spells'] = df['processedSpells'].apply(check_spells)

In [6]:
df.head()

Unnamed: 0,name,race,background,date,class,justClass,subclass,level,feats,HP,AC,Str,Dex,Con,Int,Wis,Cha,alignment,skills,weapons,spells,day,processedAlignment,good,lawful,processedRace,processedSpells,processedWeapons,levelGroup,has_spells
0,22bf79,Deep Gnome,Acolyte,2018-07-24T22:37:38Z,Warlock 20,Warlock,The Celestial,20,,260,15,18,18,20,22,24,20,,Sleight of Hand|Stealth|History|Religion|Insig...,Dagger|Dagger,Light*0|Sacred Flame*0|Eldritch Blast*0|Presti...,07 24 18,,,,Gnome,Light*0|Sacred Flame*0|Eldritch Blast*0|Presti...,Dagger,19-20,True
1,b1bd6b,Human,Hermit,2018-08-01T01:25:29Z,Monk 20,Monk,Way of the Four Elements,20,Mobile|Observant,200,20,13,20,15,12,21,11,,Acrobatics|History|Religion|Insight|Medicine,Dagger|Dart|Unarmed strike,,07 31 18,,,,Human,,Dagger|Dart|Unarmed Strike,19-20,False
2,e9a755,Wood Elf,Guild Artisan,2018-07-09T17:23:10Z,Wizard 20,Wizard,School of Conjuration,20,Spell Sniper|Elven Accuracy|Elemental Adept|Wa...,122,13,10,16,14,20,14,12,,Arcana|Insight|Medicine|Perception|Persuasion,Quarterstaff,Blade Ward*0|Firebolt*0|Green-Flame Blade*0|Ra...,07 09 18,,,,Elf,Blade Ward*0|Fire Bolt*0|Green-Flame Blade*0|R...,Quarterstaff,19-20,True
3,68bf99,Kenku,Clan Crafter,2018-06-17T23:06:53Z,Warlock 20,Warlock,The Great Old One,20,Actor|War Caster,220,16,12,20,16,14,14,21,,Stealth|Arcana|History|Religion|Insight|Deception,"Crossbow, light|Dagger|Dagger",,06 17 18,,,,Kenku,,"Crossbow, Light|Dagger",19-20,False
4,7b7c1c,Human,Folk Hero,2018-09-08T19:42:47Z,Cleric 20,Cleric,War Domain,20,,203,24,20,16,20,12,20,15,,Religion|Animal Handling|Insight|Survival,Spear|ThePowerhead|GaeBolga,Divine Favor*1|Shield of Faith*1|Magic Weapon*...,09 08 18,,,,Human,Divine Favor*1|Shield of Faith*1|Magic Weapon*...,Spear,19-20,True


In [0]:
# create a has feat feature

# create a function that returns true if it has a feat and false if there is not
def check_feats(item):
  if isinstance(item, float):
    return False
  else:
    return True

# apply the function to the dataframe
df['has_feats'] = df['feats'].apply(check_feats)
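Both helpers depend on `read_csv` turning empty cells into `np.nan` (a float) while non-empty lists stay strings. Under that assumption, pandas `Series.notna()` produces the same flags in one vectorized call. A minimal sketch on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: missing spell/feat lists are NaN, present ones are strings
toy = pd.DataFrame({
    'processedSpells': ['Light*0|Sacred Flame*0', np.nan, 'Fire Bolt*0'],
    'feats': [np.nan, 'Mobile|Observant', 'Spell Sniper'],
})

# notna() is True wherever a value is present, matching check_spells / check_feats
toy['has_spells'] = toy['processedSpells'].notna()
toy['has_feats'] = toy['feats'].notna()
print(toy[['has_spells', 'has_feats']])
```

On the real data this would read `df['has_spells'] = df['processedSpells'].notna()` and `df['has_feats'] = df['feats'].notna()`.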

In [0]:
# create a HP per level feature
df['HP_per_level'] = df['HP'] / df['level']

In [9]:
df.head()

Unnamed: 0,name,race,background,date,class,justClass,subclass,level,feats,HP,AC,Str,Dex,Con,Int,Wis,Cha,alignment,skills,weapons,spells,day,processedAlignment,good,lawful,processedRace,processedSpells,processedWeapons,levelGroup,has_spells,has_feats,HP_per_level
0,22bf79,Deep Gnome,Acolyte,2018-07-24T22:37:38Z,Warlock 20,Warlock,The Celestial,20,,260,15,18,18,20,22,24,20,,Sleight of Hand|Stealth|History|Religion|Insig...,Dagger|Dagger,Light*0|Sacred Flame*0|Eldritch Blast*0|Presti...,07 24 18,,,,Gnome,Light*0|Sacred Flame*0|Eldritch Blast*0|Presti...,Dagger,19-20,True,False,13.0
1,b1bd6b,Human,Hermit,2018-08-01T01:25:29Z,Monk 20,Monk,Way of the Four Elements,20,Mobile|Observant,200,20,13,20,15,12,21,11,,Acrobatics|History|Religion|Insight|Medicine,Dagger|Dart|Unarmed strike,,07 31 18,,,,Human,,Dagger|Dart|Unarmed Strike,19-20,False,True,10.0
2,e9a755,Wood Elf,Guild Artisan,2018-07-09T17:23:10Z,Wizard 20,Wizard,School of Conjuration,20,Spell Sniper|Elven Accuracy|Elemental Adept|Wa...,122,13,10,16,14,20,14,12,,Arcana|Insight|Medicine|Perception|Persuasion,Quarterstaff,Blade Ward*0|Firebolt*0|Green-Flame Blade*0|Ra...,07 09 18,,,,Elf,Blade Ward*0|Fire Bolt*0|Green-Flame Blade*0|R...,Quarterstaff,19-20,True,True,6.1
3,68bf99,Kenku,Clan Crafter,2018-06-17T23:06:53Z,Warlock 20,Warlock,The Great Old One,20,Actor|War Caster,220,16,12,20,16,14,14,21,,Stealth|Arcana|History|Religion|Insight|Deception,"Crossbow, light|Dagger|Dagger",,06 17 18,,,,Kenku,,"Crossbow, Light|Dagger",19-20,False,True,11.0
4,7b7c1c,Human,Folk Hero,2018-09-08T19:42:47Z,Cleric 20,Cleric,War Domain,20,,203,24,20,16,20,12,20,15,,Religion|Animal Handling|Insight|Survival,Spear|ThePowerhead|GaeBolga,Divine Favor*1|Shield of Faith*1|Magic Weapon*...,09 08 18,,,,Human,Divine Favor*1|Shield of Faith*1|Magic Weapon*...,Spear,19-20,True,False,10.15


In [10]:
# drop unneeded columns: unique IDs and timestamps vary row to row and carry no generalizable signal
drop_columns_variance = ['name', 'date', 'day']
df = df.drop(columns = drop_columns_variance)

# drop columns that leak the target ('class' and 'subclass' reveal the class directly)
drop_columns_leak = ['subclass', 'class']
df = df.drop(columns = drop_columns_leak)

# drop columns due to duplication ('race' is covered by the cleaner 'processedRace')
drop_columns_dup = ['race']
df = df.drop(columns = drop_columns_dup)

# check the head
df.head()

Unnamed: 0,background,justClass,level,feats,HP,AC,Str,Dex,Con,Int,Wis,Cha,alignment,skills,weapons,spells,processedAlignment,good,lawful,processedRace,processedSpells,processedWeapons,levelGroup,has_spells,has_feats,HP_per_level
0,Acolyte,Warlock,20,,260,15,18,18,20,22,24,20,,Sleight of Hand|Stealth|History|Religion|Insig...,Dagger|Dagger,Light*0|Sacred Flame*0|Eldritch Blast*0|Presti...,,,,Gnome,Light*0|Sacred Flame*0|Eldritch Blast*0|Presti...,Dagger,19-20,True,False,13.0
1,Hermit,Monk,20,Mobile|Observant,200,20,13,20,15,12,21,11,,Acrobatics|History|Religion|Insight|Medicine,Dagger|Dart|Unarmed strike,,,,,Human,,Dagger|Dart|Unarmed Strike,19-20,False,True,10.0
2,Guild Artisan,Wizard,20,Spell Sniper|Elven Accuracy|Elemental Adept|Wa...,122,13,10,16,14,20,14,12,,Arcana|Insight|Medicine|Perception|Persuasion,Quarterstaff,Blade Ward*0|Firebolt*0|Green-Flame Blade*0|Ra...,,,,Elf,Blade Ward*0|Fire Bolt*0|Green-Flame Blade*0|R...,Quarterstaff,19-20,True,True,6.1
3,Clan Crafter,Warlock,20,Actor|War Caster,220,16,12,20,16,14,14,21,,Stealth|Arcana|History|Religion|Insight|Deception,"Crossbow, light|Dagger|Dagger",,,,,Kenku,,"Crossbow, Light|Dagger",19-20,False,True,11.0
4,Folk Hero,Cleric,20,,203,24,20,16,20,12,20,15,,Religion|Animal Handling|Insight|Survival,Spear|ThePowerhead|GaeBolga,Divine Favor*1|Shield of Faith*1|Magic Weapon*...,,,,Human,Divine Favor*1|Shield of Faith*1|Magic Weapon*...,Spear,19-20,True,False,10.15


In [11]:
df['justClass'].value_counts()

Fighter                           104
Rogue                              87
Cleric                             79
Barbarian                          73
Paladin                            72
Ranger                             64
Sorcerer                           56
Wizard                             51
Monk                               50
Druid                              48
Bard                               37
Warlock                            36
Fighter|Barbarian                   9
Paladin|Sorcerer                    7
Fighter|Rogue                       6
Ranger|Rogue                        6
Fighter|Warlock                     6
Bard|Warlock                        5
Barbarian|Druid                     4
Rogue|Fighter                       4
Rogue|Warlock                       3
Mystic                              3
Sorcerer|Warlock                    3
Wizard|Fighter                      3
Barbarian|Fighter                   3
Rogue|Wizard                        2
Ranger|Warlo

In [0]:
# get our target
target = 'justClass'
targetAllowed = ['Fighter', 'Rogue', 'Cleric', 'Barbarian', 'Paladin', 'Ranger', 'Sorcerer', 'Wizard', 'Monk', 'Druid', 'Bard', 'Warlock']

In [0]:
# keep only single-class characters in the allowed list (drops multiclass and rare classes)
df = df.loc[df[target].isin(targetAllowed)]
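`targetAllowed` is hard-coded from the counts above. A sketch of deriving it instead, assuming the rule is "single class (no `|`) with at least `min_count` rows"; `allowed_classes` and `min_count=30` are hypothetical choices, not the author's. Because `value_counts()` sorts in descending order and the rows past the truncation point have at most 2 characters each, a threshold of 30 on the real counts keeps the same twelve classes.

```python
import pandas as pd

def allowed_classes(labels: pd.Series, min_count: int = 30) -> list:
    """Return single-class labels (no '|') that occur at least min_count times."""
    counts = labels.value_counts()  # sorted from most to least frequent
    single = counts[~counts.index.str.contains('|', regex=False)]
    return single[single >= min_count].index.tolist()

# Hypothetical sample mirroring the shape of the real counts
labels = pd.Series(['Fighter'] * 40 + ['Rogue'] * 35
                   + ['Fighter|Barbarian'] * 9 + ['Mystic'] * 3)
print(allowed_classes(labels))  # ['Fighter', 'Rogue']
```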

In [14]:
df['justClass'].value_counts()

Fighter      104
Rogue         87
Cleric        79
Barbarian     73
Paladin       72
Ranger        64
Sorcerer      56
Wizard        51
Monk          50
Druid         48
Bard          37
Warlock       36
Name: justClass, dtype: int64
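For the first assignment task, the filtered counts can be turned into class shares and an imbalance ratio. A short sketch using the numbers printed above; `counts.plot.bar()` would draw the distribution if matplotlib is available.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series({
    'Fighter': 104, 'Rogue': 87, 'Cleric': 79, 'Barbarian': 73,
    'Paladin': 72, 'Ranger': 64, 'Sorcerer': 56, 'Wizard': 51,
    'Monk': 50, 'Druid': 48, 'Bard': 37, 'Warlock': 36,
})

proportions = counts / counts.sum()            # share of each class
imbalance_ratio = counts.max() / counts.min()  # largest class vs smallest class

print(counts.sum())                  # 757
print(round(proportions.max(), 4))   # 0.1374
print(round(imbalance_ratio, 2))     # 2.89
# counts.plot.bar()  # bar chart of the target distribution (needs matplotlib)
```

The largest class is roughly 2.9 times the smallest, a moderate imbalance, and its 13.7% share is exactly the majority-class baseline accuracy this notebook reports.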

In [0]:
# get our features
train_features = df.drop(columns=target)
numeric_features = train_features.select_dtypes(include='number').columns.tolist()
cardinality = train_features.select_dtypes(exclude='number').nunique()
categorical_features = cardinality[cardinality <= 75].index.tolist()
features = numeric_features + categorical_features
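The feature-selection cell splits columns into numeric ones and non-numeric ones with at most 75 unique values. Wrapping it in a function makes the cutoff easy to vary; `split_features` and the toy frame are hypothetical. Boolean columns such as `has_spells` are not a `'number'` dtype in pandas, which is why they land on the categorical side of the `features` list printed in the next cell.

```python
import pandas as pd

def split_features(frame: pd.DataFrame, max_cardinality: int = 75):
    """Return (numeric, low-cardinality non-numeric) column names."""
    numeric = frame.select_dtypes(include='number').columns.tolist()
    cardinality = frame.select_dtypes(exclude='number').nunique()
    categorical = cardinality[cardinality <= max_cardinality].index.tolist()
    return numeric, categorical

toy = pd.DataFrame({
    'level': [20, 5, 11],
    'has_spells': [True, False, True],  # bool is not a 'number' dtype in pandas
    'name': ['a1', 'b2', 'c3'],         # 3 unique values, above the cutoff below
})
numeric, categorical = split_features(toy, max_cardinality=2)
print(numeric, categorical)  # ['level'] ['has_spells']
```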

In [16]:
features

['level',
 'HP',
 'AC',
 'Str',
 'Dex',
 'Con',
 'Int',
 'Wis',
 'Cha',
 'HP_per_level',
 'background',
 'alignment',
 'processedAlignment',
 'good',
 'lawful',
 'processedRace',
 'levelGroup',
 'has_spells',
 'has_feats']

In [17]:
# majority class check
majority_class = df[target].mode()[0]  # mode() returns a Series; take its first label
print('Majority class is', majority_class)
y_pred = [majority_class] * len(df)
accuracy_score(df[target], y_pred)

Majority class is Fighter


0.13738441215323646
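The baseline above is scored on the whole dataset. Once the data is split, a stricter baseline takes the majority class from the training labels only and scores it on the validation labels, which makes it directly comparable to the models' validation accuracies. A minimal sketch; `majority_baseline` and the toy labels are hypothetical:

```python
import pandas as pd

def majority_baseline(y_train: pd.Series, y_val: pd.Series) -> float:
    """Predict the most frequent training label for every validation row; return accuracy."""
    majority = y_train.mode()[0]  # mode() returns a Series; [0] takes the label
    return float((y_val == majority).mean())

y_train_toy = pd.Series(['Fighter', 'Fighter', 'Rogue', 'Cleric'])
y_val_toy = pd.Series(['Fighter', 'Rogue', 'Fighter', 'Cleric'])
print(majority_baseline(y_train_toy, y_val_toy))  # 0.5
```

After the split, `majority_baseline(y_train, y_val)` gives the number the tree, forest and XGBoost models need to beat.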

In [0]:
# train test split the data (80/20)
train, val = train_test_split(df, train_size=0.80, test_size=.20,
                               stratify=df[target], random_state=42)

In [0]:
X_train = train[features]
y_train = train[target]
X_val = val[features]
y_val = val[target]

In [20]:
# fit a pipeline (Decision Tree)
pipelineTree = make_pipeline(
    ce.OneHotEncoder(use_cat_names=True),
    SimpleImputer(strategy='mean'),
    StandardScaler(),
    DecisionTreeClassifier(max_depth=3)
)

pipelineTree.fit(X_train, y_train)

Pipeline(memory=None,
         steps=[('onehotencoder',
                 OneHotEncoder(cols=['background', 'alignment',
                                     'processedAlignment', 'good', 'lawful',
                                     'processedRace', 'levelGroup'],
                               drop_invariant=False, handle_missing='value',
                               handle_unknown='value', return_df=True,
                               use_cat_names=True, verbose=0)),
                ('simpleimputer',
                 SimpleImputer(add_indicator=False, copy=True, fill_value=None,
                               mis...
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('decisiontreeclassifier',
                 DecisionTreeClassifier(class_weight=None, criterion='gini',
                                        max_depth=3, max_features=None,
                                        max_leaf_nodes=None,
                                        m

In [21]:
# validation accuracy (Decision Tree)
y_pred_tree = pipelineTree.predict(X_val)
print('Validation Accuracy', accuracy_score(y_val, y_pred_tree))

Validation Accuracy 0.4276315789473684
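The assignment warns against relying on accuracy alone when classes are imbalanced (here 104 Fighters vs 36 Warlocks after filtering). Balanced accuracy, the mean of per-class recall, weights every class equally. A minimal pandas sketch; scikit-learn also provides `sklearn.metrics.balanced_accuracy_score` for the same quantity.

```python
import pandas as pd

def balanced_accuracy(y_true, y_pred) -> float:
    """Mean of per-class recall: each true class counts equally, however many rows it has."""
    frame = pd.DataFrame({'true': list(y_true), 'pred': list(y_pred)})
    frame['correct'] = frame['true'] == frame['pred']
    per_class_recall = frame.groupby('true')['correct'].mean()
    return float(per_class_recall.mean())

y_true_toy = ['Fighter'] * 8 + ['Warlock'] * 2
y_pred_toy = ['Fighter'] * 10  # always predict the majority class
plain = sum(t == p for t, p in zip(y_true_toy, y_pred_toy)) / len(y_true_toy)
print(plain)                                      # 0.8
print(balanced_accuracy(y_true_toy, y_pred_toy))  # 0.5
```

For example, `balanced_accuracy(y_val, y_pred_tree)` would score the decision tree this way.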


In [22]:
y_pred_tree

array(['Bard', 'Cleric', 'Cleric', 'Paladin', 'Fighter', 'Fighter',
       'Cleric', 'Bard', 'Sorcerer', 'Sorcerer', 'Bard', 'Bard',
       'Sorcerer', 'Barbarian', 'Cleric', 'Barbarian', 'Cleric',
       'Sorcerer', 'Wizard', 'Cleric', 'Bard', 'Cleric', 'Fighter',
       'Cleric', 'Wizard', 'Bard', 'Sorcerer', 'Barbarian', 'Wizard',
       'Fighter', 'Cleric', 'Wizard', 'Cleric', 'Sorcerer', 'Wizard',
       'Cleric', 'Fighter', 'Sorcerer', 'Bard', 'Fighter', 'Fighter',
       'Wizard', 'Wizard', 'Bard', 'Wizard', 'Cleric', 'Bard', 'Cleric',
       'Cleric', 'Cleric', 'Barbarian', 'Sorcerer', 'Fighter', 'Paladin',
       'Wizard', 'Sorcerer', 'Fighter', 'Sorcerer', 'Fighter', 'Cleric',
       'Sorcerer', 'Wizard', 'Wizard', 'Fighter', 'Cleric', 'Bard',
       'Cleric', 'Cleric', 'Fighter', 'Sorcerer', 'Wizard', 'Cleric',
       'Sorcerer', 'Cleric', 'Cleric', 'Sorcerer', 'Sorcerer', 'Paladin',
       'Cleric', 'Cleric', 'Sorcerer', 'Wizard', 'Cleric', 'Barbarian',
       'Fighter', 'F

In [23]:
# fit a pipeline (Random Forest)
pipelineForest = make_pipeline(
    ce.OneHotEncoder(use_cat_names=True),
    SimpleImputer(strategy='mean'),
    StandardScaler(),
    RandomForestClassifier(n_estimators=50, random_state=42, n_jobs=-1)
)

pipelineForest.fit(X_train, y_train)

Pipeline(memory=None,
         steps=[('onehotencoder',
                 OneHotEncoder(cols=['background', 'alignment',
                                     'processedAlignment', 'good', 'lawful',
                                     'processedRace', 'levelGroup'],
                               drop_invariant=False, handle_missing='value',
                               handle_unknown='value', return_df=True,
                               use_cat_names=True, verbose=0)),
                ('simpleimputer',
                 SimpleImputer(add_indicator=False, copy=True, fill_value=None,
                               mis...
                ('randomforestclassifier',
                 RandomForestClassifier(bootstrap=True, class_weight=None,
                                        criterion='gini', max_depth=None,
                                        max_features='auto',
                                        max_leaf_nodes=None,
                                        min_impurity_dec

In [24]:
# validation accuracy (Random Forest)
y_pred_forest = pipelineForest.predict(X_val)
print('Validation Accuracy', accuracy_score(y_val, y_pred_forest))

Validation Accuracy 0.6381578947368421


In [25]:
y_pred_forest

array(['Bard', 'Monk', 'Rogue', 'Barbarian', 'Fighter', 'Fighter',
       'Ranger', 'Sorcerer', 'Rogue', 'Paladin', 'Paladin', 'Paladin',
       'Cleric', 'Fighter', 'Fighter', 'Ranger', 'Fighter', 'Cleric',
       'Wizard', 'Rogue', 'Warlock', 'Paladin', 'Fighter', 'Monk',
       'Cleric', 'Fighter', 'Rogue', 'Barbarian', 'Wizard', 'Barbarian',
       'Cleric', 'Rogue', 'Barbarian', 'Wizard', 'Wizard', 'Ranger',
       'Fighter', 'Rogue', 'Rogue', 'Fighter', 'Fighter', 'Bard', 'Rogue',
       'Rogue', 'Wizard', 'Ranger', 'Sorcerer', 'Fighter', 'Monk',
       'Barbarian', 'Barbarian', 'Sorcerer', 'Barbarian', 'Paladin',
       'Wizard', 'Sorcerer', 'Fighter', 'Sorcerer', 'Fighter', 'Fighter',
       'Bard', 'Wizard', 'Rogue', 'Fighter', 'Monk', 'Sorcerer',
       'Paladin', 'Sorcerer', 'Fighter', 'Sorcerer', 'Sorcerer', 'Monk',
       'Sorcerer', 'Cleric', 'Cleric', 'Rogue', 'Sorcerer', 'Paladin',
       'Monk', 'Rogue', 'Sorcerer', 'Rogue', 'Barbarian', 'Barbarian',
       'Barbarian'

In [0]:
transformers = make_pipeline(
    ce.OneHotEncoder(use_cat_names=True),
    SimpleImputer(strategy='mean'),
    StandardScaler()
)

# transform the data: fit on the training set only, then apply to both sets
X_train_transformed = transformers.fit_transform(X_train)
X_val_transformed = transformers.transform(X_val)

eval_set = [(X_train_transformed, y_train),
            (X_val_transformed, y_val)]

In [0]:
# model = RandomForestClassifier(n_estimators=50, random_state=42, n_jobs=-1)
# model.fit(X_train_transformed, y_train)

In [28]:
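# stop when validation_1 (the validation set) merror hasn't improved for 10 rounds;
# newer xgboost releases take early_stopping_rounds in the constructor instead of fit()
model = XGBClassifier(n_estimators=1000, n_jobs=-1)
model.fit(X_train_transformed, y_train, eval_set=eval_set, early_stopping_rounds=10)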

[0]	validation_0-merror:0.363636	validation_1-merror:0.453947
Multiple eval metrics have been passed: 'validation_1-merror' will be used for early stopping.

Will train until validation_1-merror hasn't improved in 10 rounds.
[1]	validation_0-merror:0.350413	validation_1-merror:0.440789
[2]	validation_0-merror:0.335537	validation_1-merror:0.407895
[3]	validation_0-merror:0.319008	validation_1-merror:0.407895
[4]	validation_0-merror:0.307438	validation_1-merror:0.381579
[5]	validation_0-merror:0.307438	validation_1-merror:0.394737
[6]	validation_0-merror:0.280992	validation_1-merror:0.375
[7]	validation_0-merror:0.282645	validation_1-merror:0.361842
[8]	validation_0-merror:0.27438	validation_1-merror:0.368421
[9]	validation_0-merror:0.259504	validation_1-merror:0.342105
[10]	validation_0-merror:0.254545	validation_1-merror:0.335526
[11]	validation_0-merror:0.246281	validation_1-merror:0.355263
[12]	validation_0-merror:0.242975	validation_1-merror:0.335526
[13]	validation_0-merror:0.23471

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=1000, n_jobs=-1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)
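The log shows XGBoost watching `validation_1-merror` and stopping once it fails to improve for 10 rounds. A sketch of that patience rule, assuming "improved" means strictly lower error; it mimics the behavior described in the log, not XGBoost's own code. Because the validation set both chooses the stopping point and reports the final accuracy, that accuracy is likely somewhat optimistic.

```python
def early_stopping(val_errors, patience=10):
    """Return (best_round, stop_round); stop_round is None if patience never ran out.

    Assumes an improvement means strictly lower error than the best so far.
    """
    best_round, best_error = 0, val_errors[0]
    for rnd, err in enumerate(val_errors):
        if err < best_error:
            best_round, best_error = rnd, err
        elif rnd - best_round >= patience:
            return best_round, rnd
    return best_round, None

# validation_1-merror for rounds 0-12, copied from the (truncated) log above
logged = [0.453947, 0.440789, 0.407895, 0.407895, 0.381579, 0.394737, 0.375,
          0.361842, 0.368421, 0.342105, 0.335526, 0.355263, 0.335526]
print(early_stopping(logged))  # (10, None): best so far is round 10, still training
```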

In [29]:
# Validation accuracy
y_pred = model.predict(X_val_transformed)
print('Validation Accuracy', accuracy_score(y_val, y_pred))

Validation Accuracy 0.6842105263157895


In [30]:
y_pred

array(['Bard', 'Monk', 'Rogue', 'Paladin', 'Fighter', 'Fighter', 'Monk',
       'Bard', 'Rogue', 'Paladin', 'Paladin', 'Warlock', 'Sorcerer',
       'Rogue', 'Fighter', 'Fighter', 'Fighter', 'Cleric', 'Wizard',
       'Rogue', 'Warlock', 'Druid', 'Fighter', 'Monk', 'Druid',
       'Barbarian', 'Sorcerer', 'Barbarian', 'Wizard', 'Fighter',
       'Cleric', 'Monk', 'Barbarian', 'Sorcerer', 'Wizard', 'Rogue',
       'Fighter', 'Rogue', 'Warlock', 'Fighter', 'Fighter', 'Bard',
       'Wizard', 'Rogue', 'Wizard', 'Ranger', 'Barbarian', 'Fighter',
       'Monk', 'Barbarian', 'Barbarian', 'Sorcerer', 'Barbarian',
       'Paladin', 'Wizard', 'Sorcerer', 'Barbarian', 'Sorcerer',
       'Barbarian', 'Wizard', 'Bard', 'Wizard', 'Wizard', 'Fighter',
       'Rogue', 'Warlock', 'Cleric', 'Druid', 'Fighter', 'Sorcerer',
       'Wizard', 'Monk', 'Sorcerer', 'Cleric', 'Cleric', 'Rogue',
       'Sorcerer', 'Fighter', 'Druid', 'Rogue', 'Paladin', 'Rogue',
       'Wizard', 'Sorcerer', 'Barbarian', 'Barbar

In [0]:
# permuter = PermutationImportance(
#     model, 
#     scoring='accuracy',
#     n_iter=3,
#     random_state=42
# )

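# permuter.fit(X_val_transformed, y_val)
# # the model was trained on one-hot encoded columns, so the names must come from the
# # encoder's output, not X_val (assumes the imputer drops no all-NaN columns)
# feature_names = transformers.named_steps['onehotencoder'].transform(X_val).columns.tolist()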

# eli5.show_weights(
#     permuter,
#     top=None, # show importance of all features
#     feature_names=feature_names
# )
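The commented-out cell uses eli5's `PermutationImportance` with `n_iter=3`. The idea: shuffle one feature's values in the validation data, re-score, and treat the drop in accuracy as that feature's importance. A plain NumPy sketch of the idea, not eli5's implementation; the `predict` callable interface and the toy data are assumptions for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=3, seed=42):
    """Mean drop in accuracy when each column of X is shuffled independently.

    predict: callable mapping a 2-D array to predicted labels (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # shuffling breaks the column's link to y while keeping its distribution
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[col] = np.mean(drops)
    return importances

# Toy check: the "model" only looks at column 0, so column 1 should score exactly zero
rng = np.random.default_rng(0)
X_toy = rng.random((200, 2))
y_toy = X_toy[:, 0] > 0.5

def toy_predict(X):
    return X[:, 0] > 0.5

importances_toy = permutation_importance(toy_predict, X_toy, y_toy)
print(importances_toy)
```

With this notebook's objects it could be called as `permutation_importance(model.predict, X_val_transformed, y_val)`, pairing each score with the one-hot encoded column names.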

In [0]:
# the row to analyze in the SHAP force plot
# change row_index to inspect a different character
row_index = 25
row = X_train.iloc[[row_index]]

In [0]:
# import shap

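# explainer = shap.TreeExplainer(model)
# row_processed = transformers.transform(row)
# shap_values = explainer.shap_values(row_processed)

# # for a multiclass model, shap_values and expected_value hold one entry per class,
# # so pick the class to explain (an index into model.classes_)
# class_index = 0
# shap.initjs()
# shap.force_plot(
#     base_value=explainer.expected_value[class_index],
#     shap_values=shap_values[class_index],
#     features=row_processed
# )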