[R] xgb.cv doesn't return feature names #5018

Open
nettoyoussef opened this issue Nov 6, 2019 · 2 comments
Labels
cross-validation Issues related to cross validation implementation in XGBoost. type: bug type: r-package

Comments

nettoyoussef commented Nov 6, 2019

Hi all,

Long-time fan of your work on the XGBoost algorithm and implementation. It is super fast and memory-friendly.

I found a problem when inspecting feature importance for models trained with the xgb.cv function: the fold models saved with the callback cb.cv.predict(save_models = TRUE) don't carry the feature names.

I noticed this while trying to plot the importance with xgb.plot.importance, where the features show up as numbers instead of names.
Do the numbers follow the Python convention of counting columns (i.e., starting from 0)?
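
XGBoost identifies features internally by zero-based column index (model dumps refer to them as f0, f1, ...), so a booster without stored names can only report those indices. Assuming that is what the plotted numbers are, the minimal base-R sketch below maps them back to column names by shifting each index by one. `index_to_name` is a hypothetical helper, not part of xgboost; passing `feature_names` to `xgb.importance` achieves essentially the same mapping.

```r
# Hypothetical helper (not part of xgboost): translate zero-based feature
# indices, as reported for an unnamed booster, into column names.
index_to_name <- function(idx, feature_names) {
  # R vectors are 1-based, so shift each zero-based index by one.
  feature_names[as.integer(idx) + 1L]
}

feature_names <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Returns c("Petal.Length", "Petal.Width", "Sepal.Length")
index_to_name(c("2", "3", "0"), feature_names)
```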

Here is a minimal reproducible example (MRE):

Xgboost version: xgboost_0.90.0.2 (R package)

data(iris)
library(xgboost)
library(dplyr)

# Binary problem: versicolor (0) vs. virginica (1)
iris <- filter(iris, Species != 'setosa')
features <- as.matrix(iris[, !grepl('Species', colnames(iris))])
label <- ifelse(iris$Species == 'virginica', 1, 0)

# xgb.cv assigns the folds with R's RNG, so set the seed on the R side
set.seed(1)

model <- xgboost::xgb.cv(
                      data = features
                    , label = label
                    , nfold = 5
                    , nrounds = 25
                    , metrics = list("auc")
                    , stratified = TRUE
                    , verbose = TRUE
                      # keep the per-fold boosters in model$models
                    , callbacks = list(cb.cv.predict(save_models = TRUE))
                    , params = list(
                        eta = 0.1
                        , max_depth = 10
                        , objective = "binary:logistic"
                        , colsample_bytree = 0.5
                        , subsample = 0.5
                        , nthread = 2
                        , seed = 1
                        )
                    )

# The resulting plot labels the features with numbers instead of column names
importance <- xgb.importance(model = model$models[[1]])
xgboost::xgb.plot.importance(importance)


@nettoyoussef nettoyoussef changed the title xgb.cv doesn't return features names xgb.cv doesn't return feature names Nov 6, 2019
@nettoyoussef (Author)

There is a workaround: pass the feature names explicitly, as shown here:

importance <- xgb.importance(model = model$models[[1]], feature_names = colnames(features))

Still, I think the original problem should be fixed.

@trivialfis trivialfis changed the title xgb.cv doesn't return feature names [R] xgb.cv doesn't return feature names Apr 16, 2020
@hcho3 hcho3 self-assigned this Sep 8, 2020
@trivialfis trivialfis added the cross-validation Issues related to cross validation implementation in XGBoost. label Oct 18, 2021
@trivialfis (Member)

Sorry for the delay.

Note to self: we need to store the feature names in the booster once UBJSON support is merged. This line sets the feature names for the booster:

bst$feature_names <- colnames(dtrain)
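
Until xgb.cv stores the names itself, the same assignment can be mirrored in user code on each fold booster. This is a minimal sketch, assuming the 0.90-era R API used in this issue (`xgb.cv` with `cb.cv.predict(save_models = TRUE)` returning the fold boosters in `$models`) and assuming `xgb.importance` falls back to `model$feature_names` when no `feature_names` argument is given. `name_cv_models` is a hypothetical helper, not part of xgboost, and newer releases of the R package may name the callback and result fields differently.

```r
library(xgboost)

# Hypothetical helper (not part of xgboost): copy the training column names
# onto every fold booster saved by cb.cv.predict(save_models = TRUE),
# mirroring what `bst$feature_names <- colnames(dtrain)` does for one booster.
name_cv_models <- function(cv_model, feature_names) {
  cv_model$models <- lapply(cv_model$models, function(bst) {
    bst$feature_names <- feature_names
    bst
  })
  cv_model
}

# Small binary problem built from iris (versicolor vs. virginica).
iris2 <- iris[iris$Species != "setosa", ]
features <- as.matrix(iris2[, 1:4])
label <- as.numeric(iris2$Species == "virginica")

set.seed(1)  # fold assignment uses R's RNG
cv <- xgb.cv(
  params = list(objective = "binary:logistic", eta = 0.1, max_depth = 3),
  data = features, label = label, nfold = 5, nrounds = 10,
  verbose = FALSE,
  callbacks = list(cb.cv.predict(save_models = TRUE))
)

cv <- name_cv_models(cv, colnames(features))

# No feature_names argument needed once the boosters carry the names
importance <- xgb.importance(model = cv$models[[1]])
```

Doing the assignment once on the cv result, rather than passing `feature_names` to every `xgb.importance` call, keeps later plotting code unchanged.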
