XGBClassifier has no attribute get_fscore #757

Closed
aewhatley opened this issue Jan 18, 2016 · 17 comments

@aewhatley commented Jan 18, 2016

It looks like XGBClassifier in xgboost.sklearn does not have get_fscore, and it does not have feature_importances_ like other sklearn estimators do. I think some kind of feature importance metric should be incorporated into this model, or, if one already exists, it should be better documented.

@davidgasquez commented Jan 25, 2016

Hi! Today I ran into the same issue, and this is the workaround I made to get feature_importances_ like the sklearn ensemble methods have.

import numpy as np
from xgboost import XGBClassifier


class XGBFeatureImportances(XGBClassifier):
    """A custom XGBClassifier with feature importances computation.

    This class implements XGBClassifier and also computes feature importances
    based on the fscores. Implementing the feature_importances_ property
    allows us to use `SelectFromModel` with XGBClassifier.
    """

    def __init__(self, n_features, *args, **kwargs):
        """Init method adding n_features."""
        super(XGBFeatureImportances, self).__init__(*args, **kwargs)
        self._n_features = n_features

    @property
    def n_features(self):
        """Number of features in the training data."""
        return self._n_features

    @n_features.setter
    def n_features(self, value):
        self._n_features = value

    @property
    def feature_importances_(self):
        """Return the feature importances.

        Returns
        -------
        feature_importances_ : array, shape = [n_features]
        """
        booster = self.booster()  # self.get_booster() on newer xgboost
        fscores = booster.get_fscore()

        importances = np.zeros(self.n_features)

        # get_fscore keys look like 'f0', 'f1', ...; strip the 'f' prefix
        # to recover the column index.
        for k, v in fscores.items():
            importances[int(k[1:])] = v

        return importances

I haven't figured out how to get the number of initial features (which would simplify this child class) without explicitly passing it to the class constructor.

With this class you can run sklearn.feature_selection.SelectFromModel with XGBClassifier.

I would be pleased to add this to XGBClassifier once I find a smarter way to handle the n_features issue.

@thusithaC commented Apr 6, 2016

@davidgasquez Thanks for the code, but I'm not sure how to use it. Could you kindly attach some example usage code as well? Thanks!

@davidgasquez commented Apr 7, 2016

Of course!

The idea is to fit a model with XGBFeatureImportances instead of the base XGBClassifier. You need to provide the class an n_features parameter, as that's a trivial way to know the number of features and removes the need to compute it. A simple example would be:

clf = XGBFeatureImportances(n_features, ...)
clf.fit(X, y)
importances = clf.feature_importances_

Once you have your classifier working, you can use sfm = SelectFromModel(clf) and you're ready to fit and transform as you want; see the fuller sketch below.
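For completeness, a minimal end-to-end sketch (the data, the extra hyperparameters, and the variable names here are placeholders, not from this thread):

import numpy as np
from sklearn.feature_selection import SelectFromModel

X = np.random.rand(100, 20)
y = np.random.randint(0, 2, size=100)

clf = XGBFeatureImportances(n_features=X.shape[1], max_depth=3, n_estimators=50)
clf.fit(X, y)

# SelectFromModel reads clf.feature_importances_ and keeps the features
# whose importance exceeds the threshold (the mean importance, by default).
sfm = SelectFromModel(clf, prefit=True)
X_reduced = sfm.transform(X)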

@krishnateja614 commented May 31, 2016

@davidgasquez Thanks for your code. I tried it the way you described and defined the model like this:

fimp_xgb_model = XGBFeatureImportances(n_features=158, base_score=0.5,
                                       colsample_bytree=0.5, gamma=0.017,
                                       learning_rate=0.15, max_delta_step=0,
                                       max_depth=8, min_child_weight=3,
                                       n_estimators=300, nthread=-1,
                                       objective='binary:logistic', seed=0,
                                       silent=True, subsample=0.9)

and fit it like this:

fimp_xgb_model.fit(x_train, y_train)

But I'm getting this runtime error:

RuntimeError: scikit-learn estimators should always specify their parameters
in the signature of their __init__ (no varargs).
<class '__main__.XGBFeatureImportances'> with constructor
(self, n_features, *args, **kwargs) doesn't follow this convention

Can you help me out with this?

@davidgasquez commented Jun 1, 2016

Hi @krishnateja614,

Quick question: which versions of Python, pandas, and scikit-learn are you using?

@krishnateja614 commented Jun 1, 2016

python 2.7.11
pandas 0.18.0
scikit-learn 0.17.1
Anaconda 4.0.0 64 bit

@davidgasquez commented Jun 1, 2016

It seems like one of the latest scikit-learn updates broke my code.

If you want to get the feature importances from your model, I'd advise just using:

import numpy as np

fimp_xgb_model = XGBClassifier(base_score=0.5, colsample_bytree=0.5,
                               gamma=0.017, learning_rate=0.15, max_delta_step=0,
                               max_depth=8, min_child_weight=3, n_estimators=300,
                               nthread=-1, objective='binary:logistic', seed=0,
                               silent=True, subsample=0.9)

fimp_xgb_model.fit(x_train, y_train)

fscores = fimp_xgb_model.booster().get_fscore()

importances = np.zeros(158)  # 158 = number of features in the train set
for k, v in fscores.items():  # keys look like 'f0', 'f1', ...
    importances[int(k[1:])] = v
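If you still prefer the subclass approach, here's a rough sketch that avoids the varargs scikit-learn complains about. The class name and parameter subset are just illustrative; extend the signature with whatever XGBClassifier parameters you need:

import numpy as np
from xgboost import XGBClassifier


class XGBFeatureImportancesV2(XGBClassifier):
    """Variant whose __init__ lists every parameter explicitly, which is
    what the scikit-learn estimator convention requires (no *args/**kwargs).
    """

    def __init__(self, n_features=None, max_depth=3, learning_rate=0.1,
                 n_estimators=100, objective='binary:logistic'):
        super(XGBFeatureImportancesV2, self).__init__(
            max_depth=max_depth, learning_rate=learning_rate,
            n_estimators=n_estimators, objective=objective)
        self.n_features = n_features

    @property
    def feature_importances_(self):
        fscores = self.booster().get_fscore()  # get_booster() on newer xgboost
        importances = np.zeros(self.n_features)
        for k, v in fscores.items():  # keys look like 'f0', 'f1', ...
            importances[int(k[1:])] = v
        return importances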
@krishnateja614 commented Jun 1, 2016

Thank you, will try it out and let you know.

@hminle commented Nov 27, 2016

Hi @davidgasquez, could you please explain what 158 means in the above code snippet?

importances = np.zeros(158)

Thanks a lot!

@davidgasquez commented Nov 28, 2016

Hey there @hminle! The line importances = np.zeros(158) creates a vector of size 158 filled with zeros. You can find more information in the NumPy docs.

The number 158 is just the number of features of that specific example model. This array will later contain the relative importance of each feature. To get the length of this array, you can use the number of columns in the train set (usually the same as the number of features).
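For example, assuming x_train is your training matrix (a NumPy array or pandas DataFrame):

import numpy as np

# Size the importance vector from the training data instead of hardcoding it.
importances = np.zeros(x_train.shape[1])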

@OnTheRicky commented Mar 19, 2017

Is it possible to implement the get_score() function from xgboost.core?
I want to be able to get the different importance types: 'weight', 'gain', and 'cover'.

@wwwxmu commented Aug 23, 2017

Sorry, I'm running into this error:

fimp_xgb_model.fit(X_train, y_train)
fscore = fimp_xgb_model.booster().get_fscore()

in <module>()
      1 fimp_xgb_model = gs.best_estimator_
      2 fimp_xgb_model.fit(X_train,y_train)
----> 3 fscore = fimp_xgb_model.booster().get_fscore()
      4
      5 # importances = np.zeros(158)

TypeError: 'str' object is not callable

@davidgasquez commented Aug 23, 2017

Hey @wwwxmu! The best thing to do here would be to print fimp_xgb_model after each step and check that it's not a string.
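If your xgboost is a recent release, booster is probably a plain string hyperparameter such as 'gbtree', which would explain the TypeError when you try to call it; the Booster object is returned by get_booster() instead:

# Old API: model.booster() returned the Booster object.
# Newer xgboost: `booster` is a string parameter ('gbtree', 'gblinear', ...),
# so model.booster() raises TypeError: 'str' object is not callable.
fscores = fimp_xgb_model.get_booster().get_fscore()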

@xxqcheers commented Aug 24, 2017

I hit the same TypeError: 'str' object is not callable:

feat_imp_temp = pd.Series(alg.booster().get_fscore()).sort_values(ascending=False)

Traceback (most recent call last):
  File "D:/Program/XGB-CVR-pre/pureDemo/XgbGridSearch.py", line 91, in <module>
    modelfit_origianl(xgb1, train, test, predictors)
  File "D:/Program/XGB-CVR-pre/pureDemo/XgbGridSearch.py", line 72, in modelfit_origianl
    feat_imp_temp = pd.Series(alg.booster().get_fscore()).sort_values(ascending=False)
TypeError: 'str' object is not callable

@wwwxmu commented Aug 25, 2017

@davidgasquez Thank you! The problem has been solved. I reinstalled xgboost, and there are no errors any more.

@tqchen closed this Dec 30, 2017

@arnehuang commented Jan 25, 2018

In case anyone stumbles across this from Google, you can get the values from:

.get_booster().get_score(importance_type='weight')
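For example, assuming a fitted XGBClassifier:

from xgboost import XGBClassifier

model = XGBClassifier().fit(X_train, y_train)

booster = model.get_booster()
for importance_type in ('weight', 'gain', 'cover'):
    print(importance_type, booster.get_score(importance_type=importance_type))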

@quantumds commented Jun 20, 2018

Based on @arnehuang's fantastic contribution: for anyone trying to fix the Analytics Vidhya XGBoost lines of code that are wrong, you should replace

feat_imp = pd.Series(alg.booster().get_fscore()).sort_values(ascending=False)

with

feat_imp = pd.Series(alg.get_booster().get_score(importance_type='weight')).sort_values(ascending=False)

As well, take care of the print lines that are in Python 2.x format (parentheses are needed: print()), and of response variables that are hardcoded as 'Disbursed'. Thanks @arnehuang.
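A minimal sketch of the corrected line in context (alg is the fitted classifier; the plot call needs matplotlib and is optional):

import pandas as pd

# get_booster() replaces the old booster() accessor on current xgboost.
feat_imp = pd.Series(
    alg.get_booster().get_score(importance_type='weight')
).sort_values(ascending=False)

feat_imp.plot(kind='bar', title='Feature Importances')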

@lock (lock bot) locked as resolved and limited conversation to collaborators Oct 25, 2018
