
Feature Importance chart with selected feature names with scores #10

Closed
VinayChaudhari1996 opened this issue Apr 8, 2022 · 2 comments

Comments

@VinayChaudhari1996

Hi @cerlymarco ,

  1. How do I get feature names with their scores like this (traditional XGBoost)?
  2. And what is the scoring scale on the X-axis?

[image: example XGBoost feature importance chart, img-src: https://user-images.githubusercontent.com/42869040/162376574-03869b81-f11e-4d1f-8bea-eddb714d39b0.png]

Thanks

Originally posted by @VinayChaudhari1996 in #4 (comment)

@cerlymarco
Owner

Hi, you can reproduce this plot in a very straightforward way. Below is a dummy example:

import pandas as pd
import matplotlib.pyplot as plt
from shaphypetune import BoostRFE
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Dummy binary classification data
X, y = make_classification(n_samples=6000, n_features=20, n_classes=2,
                           n_informative=4, n_redundant=6, random_state=0)
X = pd.DataFrame(X, columns=[f"c{c}" for c in range(X.shape[1])])

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, shuffle=False)

clf_xgb = XGBClassifier(n_estimators=150, random_state=0, verbosity=0, n_jobs=-1)

# Recursive feature elimination with XGBoost as the base estimator
model = BoostRFE(clf_xgb, min_features_to_select=1, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)

selected_features = model.support_                          # boolean mask of the selected columns
feature_importance = model.estimator_.feature_importances_  # importances of the final fitted estimator
sort_idx = feature_importance.argsort()

plt.barh(X.columns[selected_features][sort_idx], feature_importance[sort_idx])
plt.xlabel("XGBoost Feature Importance")
plt.show()

[image: resulting horizontal bar chart of the selected features' importances]

According to the XGBoost documentation, the x-axis represents the feature importance values obtained with the “gain”, “weight”, “cover”, “total_gain”, or “total_cover” criterion.
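For reference, a small sketch of how the criterion can be picked; importance_type is a standard XGBClassifier parameter, and the raw per-criterion scores are available from the underlying Booster via get_score (assumes the fitted model from the example above):

# Choose the criterion that feature_importances_ will use
clf_xgb = XGBClassifier(n_estimators=150, random_state=0,
                        importance_type="weight", n_jobs=-1)

# Raw per-criterion scores from the fitted Booster (dict: feature -> score),
# using `model` from the example above
booster = model.estimator_.get_booster()
print(booster.get_score(importance_type="total_gain"))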

That's all.
If you support the project, don't forget to leave a star ;-)

@VinayChaudhari1996 (Author) commented Apr 8, 2022

Hey @cerlymarco, thanks a lot :) I have a few questions:

  1. Is the X-axis normalized to an aggregated 0-1 scale for all of “gain”, “weight”, “cover”, “total_gain”, and “total_cover”? (See the quick check sketched below.)

  2. In plain XGBoost, what is the default X-axis scale type? It did not seem to be normalized at all, so which criterion is it actually: gain, weight, cover, total_gain, or total_cover?
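A quick way to probe this empirically, as a sketch only, assuming the fitted model from the example above: to the best of my understanding, the scikit-learn wrapper's feature_importances_ is normalized to sum to 1, while the Booster's get_score values are raw, which is why xgboost.plot_importance (which shows raw "weight" split counts by default) does not look normalized.

booster = model.estimator_.get_booster()

# Raw per-criterion scores are NOT normalized (e.g. "weight" is a split count)
for imp_type in ("gain", "weight", "cover", "total_gain", "total_cover"):
    raw = booster.get_score(importance_type=imp_type)
    print(imp_type, sum(raw.values()))

# The scikit-learn wrapper's feature_importances_ IS normalized to sum to ~1
print(model.estimator_.feature_importances_.sum())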

I tried this for XGBoost before, but for now I just changed

fit_model_obj.feature_importances_[sorted_idx] ---> fit_model_obj.estimator_.feature_importances_[sorted_idx]

Is that the right approach for both XGBoost and Boruta?
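For the Boruta case, a hedged sketch of the same pattern, assuming shap-hypetune's BoostBoruta exposes support_ and estimator_ after fitting just like BoostRFE (data and clf_xgb as in the example above; max_iter and perc are assumed BoostBoruta parameters, adjust to your installed version):

from shaphypetune import BoostBoruta

# Boruta-style selection with the same XGBoost base estimator
boruta = BoostBoruta(clf_xgb, max_iter=100, perc=100)
boruta.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
           early_stopping_rounds=6, verbose=0)

selected = boruta.support_                            # boolean mask over the original columns
importances = boruta.estimator_.feature_importances_  # estimator refit on the selected features
sort_idx = importances.argsort()

plt.barh(X.columns[selected][sort_idx], importances[sort_idx])
plt.xlabel("XGBoost Feature Importance")
plt.show()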


Thanks.
