XGBClassifier has no attribute get_fscore #757

Closed · aewhatley opened this issue Jan 18, 2016 · 17 comments
@aewhatley

It looks like XGBClassifier in xgboost.sklearn does not have get_fscore, and it does not have feature_importances_ like other sklearn estimators do. I think some kind of feature importance metric should be incorporated into this model, or, if it already exists, it should be better documented.

@davidgasquez

Hi! Today I ran into the same issue, and this is the workaround I made to get feature_importances_ like the sklearn ensemble methods have.

import numpy as np
from xgboost.sklearn import XGBClassifier


class XGBFeatureImportances(XGBClassifier):
    """A custom XGBClassifier with feature importances computation.

    This class implements XGBClassifier and also computes feature importances
    based on the fscores. Implementing the feature_importances_ property allows
    us to use `SelectFromModel` with XGBClassifier.
    """

    def __init__(self, n_features, *args, **kwargs):
        """Init method adding n_features."""
        super(XGBFeatureImportances, self).__init__(*args, **kwargs)
        self._n_features = n_features

    @property
    def n_features(self):
        """Number of classes to predict."""
        return self._n_features

    @n_features.setter
    def n_features(self, value):
        self._n_features = value

    @property
    def feature_importances_(self):
        """Return the feature importances.

        Returns
        -------
        feature_importances_ : array, shape = [n_features]
        """
        booster = self.booster()
        fscores = booster.get_fscore()

        importances = np.zeros(self.n_features)

        # Keys look like 'f0', 'f1', ...; strip the leading 'f' to get the
        # column index. (Use .items() instead of .iteritems() on Python 3.)
        for k, v in fscores.iteritems():
            importances[int(k[1:])] = v

        return importances

I haven't figured out how to get the number of initial features (which would simplify this child class) without explicitly passing it to the class constructor.

With this class you can run sklearn.feature_selection.SelectFromModel with XGBClassifier.

I would be pleased to add this to XGBClassifier, once I get a smarter way to handle the n_features issue.
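One possible direction, sketched here as an untested idea rather than a finished fix: override fit() and infer the count from the training matrix, so nothing extra has to be passed to the constructor:

    def fit(self, X, y, **kwargs):
        # Infer the feature count from the training matrix (columns = features)
        # instead of requiring it in the constructor.
        self._n_features = X.shape[1]
        return super(XGBFeatureImportances, self).fit(X, y, **kwargs)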

@thusithaC

@davidgasquez Thanks for the code, but I'm not sure how to use this. Could you kindly attach some example usage code as well? Thanks!

@davidgasquez

Of course!

The idea is to fit a model with XGBFeatureImportances instead of the base XGBClassifier. You need to give the class an n_features parameter, since that's a trivial way to know the number of features and removes the need to compute it. A simple example would be:

clf = XGBFeatureImportances(n_features, ...)
clf.fit(X, y)
importances = clf.feature_importances_

Once you have your classifier working, you can use sfm = SelectFromModel(clf) and you're ready to fit and transform your data as you want.
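For reference, a minimal end-to-end sketch (X, y, and the threshold choice are placeholders; SelectFromModel lives in sklearn.feature_selection):

import numpy as np
from sklearn.feature_selection import SelectFromModel

# n_features comes straight from the training matrix's column count.
n_features = X.shape[1]

clf = XGBFeatureImportances(n_features=n_features)
clf.fit(X, y)

# Keep only the features whose importance is above the median importance.
sfm = SelectFromModel(clf, threshold='median', prefit=True)
X_reduced = sfm.transform(X)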

@krishnateja614

@davidgasquez Thanks for your code. I tried the way you described and defined the model like this:

fimp_xgb_model = XGBFeatureImportances(n_features=158, base_score=0.5,
                                       colsample_bytree=0.5, gamma=0.017,
                                       learning_rate=0.15, max_delta_step=0,
                                       max_depth=8, min_child_weight=3,
                                       n_estimators=300, nthread=-1,
                                       objective='binary:logistic', seed=0,
                                       silent=True, subsample=0.9)

fitting it like
fimp_xgb_model.fit(x_train, y_train)

But I'm getting this runtime error:

RuntimeError: scikit-learn estimators should always specify their parameters
in the signature of their __init__ (no varargs).
<class '__main__.XGBFeatureImportances'> with constructor
(self, n_features, *args, **kwargs) doesn't follow this convention.

Can you help me out with this?

@davidgasquez

Hi @krishnateja614,

Quick question, which version of Python, Pandas and Sklearn are you using?

@krishnateja614

python 2.7.11
pandas 0.18.0
scikit-learn 0.17.1
Anaconda 4.0.0 64 bit

@davidgasquez

It seems like one of the latest scikit-learn updates broke my code.

If you want to get the feature importances from your model, I'd advise just using:

import numpy as np

fimp_xgb_model = XGBClassifier(base_score=0.5, colsample_bytree=0.5,
                               gamma=0.017, learning_rate=0.15, max_delta_step=0,
                               max_depth=8, min_child_weight=3, n_estimators=300,
                               nthread=-1, objective='binary:logistic', seed=0,
                               silent=True, subsample=0.9)

fimp_xgb_model.fit(x_train,y_train)

fscores = fimp_xgb_model.booster().get_fscore()

# 158 is the number of features in this example's training set.
importances = np.zeros(158)
for k, v in fscores.iteritems():
    importances[int(k[1:])] = v
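Alternatively, if you want the subclass itself to keep working with newer scikit-learn, the general fix is to drop the varargs and declare every hyperparameter explicitly in __init__, since sklearn inspects that signature for get_params(). A rough, untested sketch showing only a few of XGBClassifier's parameters (a real version would list them all; the class name is just for illustration):

class XGBFeatureImportancesFixed(XGBClassifier):
    """Variant whose __init__ declares parameters explicitly (no varargs)."""

    def __init__(self, n_features=None, max_depth=3, learning_rate=0.1,
                 n_estimators=100, objective='binary:logistic'):
        super(XGBFeatureImportancesFixed, self).__init__(
            max_depth=max_depth, learning_rate=learning_rate,
            n_estimators=n_estimators, objective=objective)
        self.n_features = n_features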

@krishnateja614

Thank you, will try it out and let you know.

@hminle commented Nov 27, 2016

Hi @davidgasquez, could you please explain what 158 means in the above code snippet?
importances = np.zeros(158)
Thanks a lot.

@davidgasquez

Hey there @hminle! The line importances = np.zeros(158) creates a vector of size 158 filled with zeros. You can find more information in the NumPy docs.

The number 158 is just the number of features of that specific example model. The array will later hold the relative importance of each feature. To get the length of this array, you can use the number of columns in the training set (usually the same as the number of features).
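For instance, assuming x_train is a pandas DataFrame or 2-D NumPy array of training features:

import numpy as np

# One slot per feature: the array length comes from the training columns.
importances = np.zeros(x_train.shape[1])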

@OnTheRicky

Is it possible to implement the get_score() function from xgboost/core?
I want to be able to get the different importance types: importance_type in ['weight', 'gain', 'cover'].

@wwwxmu commented Aug 23, 2017

Sorry, I'm running into this error:

fimp_xgb_model.fit(X_train, y_train)
fscore = fimp_xgb_model.booster().get_fscore()

in <module>()
      1 fimp_xgb_model = gs.best_estimator_
      2 fimp_xgb_model.fit(X_train, y_train)
----> 3 fscore = fimp_xgb_model.booster().get_fscore()
      4
      5 # importances = np.zeros(158)

TypeError: 'str' object is not callable

@davidgasquez

Hey @wwwxmu! The best thing to do here would be to print fimp_xgb_model after each step and check that it's not a string.
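For what it's worth: in more recent xgboost releases the sklearn wrapper has a booster hyperparameter that is a plain string such as 'gbtree', so calling fimp_xgb_model.booster() raises exactly this TypeError. A quick check under that assumption:

print(type(fimp_xgb_model.booster))  # str on newer versions -> not callable
booster = fimp_xgb_model.get_booster()  # the accessor newer versions provide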

@xxqcheers

TypeError: 'str' object is not callable

feat_imp_temp = pd.Series(alg.booster().get_fscore()).sort_values(ascending=False)

Traceback (most recent call last):
  File "D:/Program/XGB-CVR-pre/pureDemo/XgbGridSearch.py", line 91, in <module>
    modelfit_origianl(xgb1, train, test, predictors)
  File "D:/Program/XGB-CVR-pre/pureDemo/XgbGridSearch.py", line 72, in modelfit_origianl
    feat_imp_temp = pd.Series(alg.booster().get_fscore()).sort_values(ascending=False)
TypeError: 'str' object is not callable

@wwwxmu commented Aug 25, 2017

@davidgasquez Thank you! The problem has been solved. I reinstalled xgboost, and there are no errors any more.

@tqchen tqchen closed this as completed Dec 30, 2017
@arnehuang commented Jan 25, 2018

In case anyone stumbles across this from Google, you can get the values from:
.get_booster().get_score(importance_type='weight')
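A complete snippet under that API (model, X_train, and y_train are placeholder names):

from xgboost import XGBClassifier

model = XGBClassifier()
model.fit(X_train, y_train)

# Maps feature names ('f0', 'f1', ...) to scores; importance_type can
# also be 'gain' or 'cover'.
scores = model.get_booster().get_score(importance_type='weight')
print(scores)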

@quantumds commented Jun 20, 2018

Based on @arnehuang's fantastic contribution: for anyone trying to fix the Analytics Vidhya XGBoost lines of code that are now wrong, replace feat_imp = pd.Series(alg.booster().get_fscore()).sort_values(ascending=False) with feat_imp = pd.Series(alg.get_booster().get_score(importance_type='weight')).sort_values(ascending=False). Also take care of print lines that are in Python 2.x format (Python 3 requires parentheses: print()), and of response variables hardcoded as 'Disbursed'. Thanks @arnehuang.
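In context, the corrected tutorial line would look something like this (alg being the fitted XGBClassifier, following the tutorial's naming):

import pandas as pd

# Sorted Series of importances keyed by feature name, highest first.
feat_imp = pd.Series(
    alg.get_booster().get_score(importance_type='weight')
).sort_values(ascending=False)
feat_imp.plot(kind='bar', title='Feature Importances')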

@lock lock bot locked as resolved and limited conversation to collaborators Oct 25, 2018