Why is the default value for feature_importance 'weight' in Python but R uses 'gain'? #2706

Closed
bbennett36 opened this issue Sep 13, 2017 · 3 comments

bbennett36 commented Sep 13, 2017

I was reading through the docs and noticed that the R-package section
http://xgboost.readthedocs.io/en/latest/R-package/discoverYourData.html#feature-importance
says the following:

"The column Gain provide the information we are looking for."

"Frequency is a simpler way to measure the Gain. It just counts the number of times a feature is used in all generated trees. You should not use it (unless you know why you want to use it)."
I'm assuming Frequency corresponds to 'weight' in the Python package (correct me if I'm wrong).

If we shouldn't be using weight/frequency to check feature importance, why is it the default parameter in XGBoost?

This is a bit concerning as someone who just started learning XGBoost: I was using the default plot_importance to find the best features, but now that seems misleading, since it doesn't default to the best option for the job. Unless I'm missing something, 'gain' should be the default parameter.
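For reference, a minimal sketch (the data and parameters are made up for illustration) of requesting gain-based importance explicitly in the Python package, since both Booster.get_score and plot_importance take an importance_type argument that defaults to 'weight':

```python
import numpy as np
import xgboost as xgb

# Toy data, purely for illustration
X = np.random.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)

dtrain = xgb.DMatrix(X, label=y, feature_names=[f"f{i}" for i in range(5)])
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=20)

# Default: split counts ('weight', i.e. Frequency in the R docs)
print(booster.get_score(importance_type="weight"))

# Gain-based importance, which the R vignette recommends
print(booster.get_score(importance_type="gain"))

# plot_importance accepts importance_type as well; it defaults to 'weight'
xgb.plot_importance(booster, importance_type="gain")
```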

@pommedeterresautee
Member

There are many ways to look at feature importance. Gain is more informative than just frequency.
Please check this paper for some recent developments: https://arxiv.org/abs/1706.06060
It's going to be implemented in XGBoost (#2438).
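For readers who want to see what per-prediction attributions look like once that lands, a rough sketch, assuming an xgboost build where the pred_contribs option of Booster.predict is available:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 4)
y = (X[:, 1] > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)

# Per-row feature contributions; the last column is the bias term.
contribs = booster.predict(dtrain, pred_contribs=True)

# Averaging absolute contributions gives a model-level importance measure.
print(np.abs(contribs[:, :-1]).mean(axis=0))
```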

@bbennett36
Author

Sounds good, I'll check it out. Thank you!


HugoDLopes commented Nov 15, 2017

Hi @pommedeterresautee,
I've run into the same issue today. feature_importances_ defaulting to weight in the Python package can be really misleading. I then dug into the code and noticed that XGBoost's definition of feature importance is indeed the weight.

The sklearn classifiers (RandomForest or GradientBoosting) don't use this type of feature importance. Shouldn't it be more similar to sklearn, since the purpose of feature_importances_ is to resemble the sklearn implementation?

Currently the weight is not very helpful, since it is far from the actual predictive contribution of a feature to the whole model.
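To make the concern concrete, a small sketch (assuming the sklearn wrapper's feature_importances_ is weight-based, as discussed above, and that get_booster() is available) comparing it with a normalised gain-based ranking pulled from the underlying booster:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(500, 4)
y = (X[:, 0] > 0.5).astype(int)

clf = xgb.XGBClassifier(n_estimators=30).fit(X, y)

# Weight-based ranking exposed through the sklearn-style attribute
print(clf.feature_importances_)

# Gain-based ranking, normalised to sum to 1 like sklearn's
# RandomForest/GradientBoosting feature_importances_
gain = clf.get_booster().get_score(importance_type="gain")
total = sum(gain.values())
print({name: value / total for name, value in gain.items()})
```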

lock bot locked this conversation as resolved and limited it to collaborators on Oct 25, 2018