
Feature Importance chart with selected feature names with scores #10

Closed
VinayChaudhari1996 opened this issue Apr 8, 2022 · 2 comments

Comments

@VinayChaudhari1996

Hi @cerlymarco ,

  1. How do I get feature names with their scores like this (traditional XGBoost)?
  2. And what is the scoring scale on the X-axis?

[image: example XGBoost feature importance chart, img-src: https://user-images.githubusercontent.com/42869040/162376574-03869b81-f11e-4d1f-8bea-eddb714d39b0.png]

Thanks

Originally posted by @VinayChaudhari1996 in #4 (comment)

@cerlymarco
Owner

Hi, you can reproduce this plot in a very straightforward way. Below is a dummy example:

import pandas as pd
import matplotlib.pyplot as plt
from shaphypetune import BoostRFE
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Dummy binary classification data
X, y = make_classification(n_samples=6000, n_features=20, n_classes=2,
                           n_informative=4, n_redundant=6, random_state=0)
X = pd.DataFrame(X, columns=[f"c{c}" for c in range(X.shape[1])])

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, shuffle=False)

clf_xgb = XGBClassifier(n_estimators=150, random_state=0, verbosity=0, n_jobs=-1)

# Recursive feature elimination with XGBoost as the base estimator
model = BoostRFE(clf_xgb, min_features_to_select=1, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)

selected_features = model.support_                          # boolean mask of the selected columns
feature_importance = model.estimator_.feature_importances_  # importances of the final fitted estimator
sort_idx = feature_importance.argsort()

plt.barh(X.columns[selected_features][sort_idx], feature_importance[sort_idx])
plt.xlabel("XGBoost Feature Importance")
plt.show()

[image: resulting horizontal bar chart of the selected features' importances]

According to the XGBoost documentation, the x-axis represents the feature importance values obtained with the “gain”, “weight”, “cover”, “total_gain”, or “total_cover” criterion.
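For reference, a small sketch of how the criterion can be picked; importance_type is a standard XGBClassifier parameter, and the raw per-criterion scores are available from the underlying Booster via get_score (assumes the fitted model from the example above):

# Choose the criterion that feature_importances_ will use
clf_xgb = XGBClassifier(n_estimators=150, random_state=0,
                        importance_type="weight", n_jobs=-1)

# Raw per-criterion scores from the fitted Booster (dict: feature -> score),
# using `model` from the example above
booster = model.estimator_.get_booster()
print(booster.get_score(importance_type="total_gain"))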

That's all.
If you support the project, don't forget to leave a star ;-)

@VinayChaudhari1996 (Author) commented Apr 8, 2022

Hey @cerlymarco, thanks a lot :) I have a few questions:

  1. Is the X-axis normalized to an aggregated 0-1 scale for all of “gain”, “weight”, “cover”, “total_gain”, and “total_cover”? (See the quick check sketched below.)

  2. In plain XGBoost, what is the default X-axis scale type? It did not seem to be normalized at all, so which criterion is it actually: gain, weight, cover, total_gain, or total_cover?
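A quick way to probe this empirically, as a sketch only, assuming the fitted model from the example above: to the best of my understanding, the scikit-learn wrapper's feature_importances_ is normalized to sum to 1, while the Booster's get_score values are raw, which is why xgboost.plot_importance (which shows raw "weight" split counts by default) does not look normalized.

booster = model.estimator_.get_booster()

# Raw per-criterion scores are NOT normalized (e.g. "weight" is a split count)
for imp_type in ("gain", "weight", "cover", "total_gain", "total_cover"):
    raw = booster.get_score(importance_type=imp_type)
    print(imp_type, sum(raw.values()))

# The scikit-learn wrapper's feature_importances_ IS normalized to sum to ~1
print(model.estimator_.feature_importances_.sum())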

I tried this for XGBoost before, but for now I just changed

fit_model_obj.feature_importances_[sorted_idx] ---> fit_model_obj.estimator_.feature_importances_[sorted_idx]

Is that the right approach for both XGBoost and Boruta?
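For the Boruta case, a hedged sketch of the same pattern, assuming shap-hypetune's BoostBoruta exposes support_ and estimator_ after fitting just like BoostRFE (data and clf_xgb as in the example above; max_iter and perc are assumed BoostBoruta parameters, adjust to your installed version):

from shaphypetune import BoostBoruta

# Boruta-style selection with the same XGBoost base estimator
boruta = BoostBoruta(clf_xgb, max_iter=100, perc=100)
boruta.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
           early_stopping_rounds=6, verbose=0)

selected = boruta.support_                            # boolean mask over the original columns
importances = boruta.estimator_.feature_importances_  # estimator refit on the selected features
sort_idx = importances.argsort()

plt.barh(X.columns[selected][sort_idx], importances[sort_idx])
plt.xlabel("XGBoost Feature Importance")
plt.show()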


Thanks.
