[R] Implement feature weights. #7660

Merged 1 commit on Feb 16, 2022
7 changes: 7 additions & 0 deletions R-package/R/xgb.DMatrix.R
@@ -287,6 +287,13 @@ setinfo.xgb.DMatrix <- function(object, name, info, ...) {
    .Call(XGDMatrixSetInfo_R, object, name, as.integer(info))
    return(TRUE)
  }
  if (name == "feature_weights") {
    if (length(info) != ncol(object)) {
      stop("The number of feature weights must equal the number of columns in the input data")
    }
    .Call(XGDMatrixSetInfo_R, object, name, as.numeric(info))
    return(TRUE)
  }
  stop("setinfo: unknown info name ", name)
  return(FALSE)
}
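As a usage note, here is a minimal sketch of how this new ``setinfo`` branch could be exercised on an existing DMatrix; the toy data and weight values are made up for illustration.

```r
library(xgboost)

# Toy data: 100 rows, 5 numeric features (illustrative only).
x <- matrix(rnorm(100 * 5), nrow = 100)
y <- rnorm(100)
dtrain <- xgb.DMatrix(data = x, label = y)

# One weight per column; larger weights make a column more likely to be
# picked when colsample_by* sampling is active.
setinfo(dtrain, "feature_weights", c(1, 1, 1, 5, 5))

# Passing a vector whose length differs from ncol(dtrain) now triggers the
# error added in this patch.
```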
27 changes: 27 additions & 0 deletions R-package/tests/testthat/test_feature_weights.R
@@ -0,0 +1,27 @@
library(xgboost)

context("feature weights")

test_that("training with feature weights works", {
  nrows <- 1000
  ncols <- 9
  set.seed(2022)
  x <- matrix(rnorm(nrows * ncols), nrow = nrows)
  y <- rowSums(x)
  weights <- seq(from = 1, to = ncols)

  test <- function(tm) {
    names <- paste0("f", 1:ncols)
    xy <- xgb.DMatrix(data = x, label = y, feature_weights = weights)
    params <- list(colsample_bynode = 0.4, tree_method = tm, nthread = 1)
    model <- xgb.train(params = params, data = xy, nrounds = 32)
    importance <- xgb.importance(model = model, feature_names = names)
    expect_equal(dim(importance), c(ncols, 4))
    # Heavier-weighted columns should be sampled, and therefore split on, more often.
    importance <- importance[order(importance$Feature)]
    expect_lt(importance[1, Frequency], importance[9, Frequency])
  }

  for (tm in c("hist", "approx", "exact")) {
    test(tm)
  }
})
7 changes: 3 additions & 4 deletions doc/parameter.rst
@@ -115,10 +115,9 @@ Parameters for Tree Booster
'colsample_bynode':0.5}`` with 64 features will leave 8 features to choose from at
each split.

-  On Python interface, when using ``hist``, ``gpu_hist`` or ``exact`` tree method, one
-  can set the ``feature_weights`` for DMatrix to define the probability of each feature
-  being selected when using column sampling. There's a similar parameter for ``fit``
-  method in sklearn interface.
+  Using the Python or the R package, one can set the ``feature_weights`` for DMatrix to
+  define the probability of each feature being selected when using column sampling.
+  There is a similar parameter for the ``fit`` method in the sklearn interface.

* ``lambda`` [default=1, alias: ``reg_lambda``]

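To make the documented behaviour concrete, here is a minimal R sketch mirroring the new test above: ``feature_weights`` is passed to the ``xgb.DMatrix`` constructor and column sampling is enabled with ``colsample_bynode``. The data, weights, and parameter values are illustrative only.

```r
library(xgboost)

set.seed(2022)
x <- matrix(rnorm(1000 * 9), nrow = 1000)
y <- rowSums(x)

# One weight per column, increasing from 1 to 9, so later columns should be
# selected more often when colsample_bynode draws a subset at each split.
dtrain <- xgb.DMatrix(data = x, label = y, feature_weights = seq_len(9))

params <- list(colsample_bynode = 0.4, tree_method = "hist", nthread = 1)
model <- xgb.train(params = params, data = dtrain, nrounds = 32)

# Importance should skew towards the heavily weighted columns.
xgb.importance(model = model, feature_names = paste0("f", 1:9))
```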