helper function #17
Comments
I think I have the same problem with check_newdata() when using bl1 %X% bl2 with a variable that has more than one class and using predict() with newdata. I think the error occurs whenever the base-learner contains more than one variable and at least one of those variables has more than one class. Consider the following minimal example (based on the mboost help pages):
library(mboost)
data("volcano", package = "datasets")
vol <- as.vector(volcano)
x1 <- 1:nrow(volcano)
x2 <- 1:ncol(volcano)
x <- expand.grid(x1, x2)
# generate a variable with more than one class
x$factorz <- I(gl(2, 2, length=nrow(x)))
modx <- mboost(vol ~ bbs(Var2, df = 3, knots = 10) %X%
bols(factorz, df = 3), data = x,
control = boost_control(nu = 0.25))
# try to predict the data, gives error...
test <- predict(modx, newdata=x)
# ... as this comparison in check_newdata() is not possible,
# even if the data is the same
sapply(x, class) == sapply(x, class)
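The failure can be reproduced without mboost at all. When one column carries more than one class, sapply() cannot simplify its result to a character vector and returns a list, and comparing two lists with `==` is not implemented in base R. A minimal sketch:

```r
# Reproduce the check_newdata() comparison failure with base R only.
df <- data.frame(x = 1:4)
df$z <- I(gl(2, 2))   # column with two classes: "AsIs" and "factor"

cls <- sapply(df, class)
is.list(cls)          # TRUE: the class vectors have different lengths,
                      # so sapply() cannot simplify to a character vector

# Comparing two lists with `==` throws an error, which is what
# breaks predict() inside check_newdata():
tryCatch(cls == cls, error = function(e) conditionMessage(e))
```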
This is an ugly hack, but it works for my example. I simply override the current version of check_newdata() with a function that does not check the classes of the variables in newdata at all:
my_check_newdata <- function(newdata, blg, mf, to.data.frame = TRUE) {
nm <- names(blg)
if (!all(nm %in% names(newdata)))
stop(sQuote("newdata"),
" must contain all predictor variables,",
" which were used to specify the model.")
if (!class(newdata) %in% c("list", "data.frame"))
stop(sQuote("newdata"), " must be either a data.frame or a list")
if (any(duplicated(nm))) ## removes duplicates
nm <- unique(nm)
#if (!all(sapply(newdata[nm], class) == sapply(mf, class)))
# warning("Some variables in ", sQuote("newdata"),
# " do not have the same class as in the original data set",
# call. = FALSE)
## subset data
mf <- newdata[nm]
if (is.list(mf) && to.data.frame)
mf <- as.data.frame(mf)
return(mf)
}
library(mboost)
## write my version of check_newdata() into the namespace of mboost
assignInNamespace("check_newdata", my_check_newdata, ns="mboost", envir=as.environment("package:mboost"))
# check whether it worked
mboost:::check_newdata
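A less drastic alternative than dropping the check entirely would be to compare the class vectors column by column with identical(), which copes with multi-class columns. This is only a sketch, not the fix mboost actually adopted, and classes_match() is a hypothetical helper:

```r
# Hypothetical helper: element-wise class comparison that tolerates
# columns with more than one class (e.g. "AsIs" "factor").
classes_match <- function(newdata, mf, nm = names(mf)) {
  all(mapply(function(a, b) identical(class(a), class(b)),
             newdata[nm], mf[nm]))
}

df <- data.frame(x = 1:4)
df$z <- I(gl(2, 2))
classes_match(df, df)   # TRUE, even though z has two classes
```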
Thanks! I did a similar hack by replacing if (!all(sapply(newdata[nm], class) == sapply(mf, class)))
Thank you very much, I think your hack is nicer than mine. I just had to replace nd with newdata to make it work, i.e. replacing
Thanks for the bug report. I've used a different (but similar) fix, as I had already modified this function. Please check your code with this package and let me know if the error persists.
Thanks for the fix. Everything works fine for me!
Thanks a lot, it works for me as well!
I am currently using the ctm package from R-Forge, which also implements boosted models for ordinal outcomes (i.e., ordered factors). The predict.ctm() function throws an error, probably because of the function check_newdata() in mboost/R/helpers.R, in the line:
if (!all(sapply(newdata[nm], class) == sapply(mf, class)))
This will give an error whenever the outcome variable is an ordered factor, i.e. of class c("ordered", "factor").
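This is the same multi-class situation as in the bbs/bols example above: an ordered factor carries two classes, so sapply() again returns a list and the `==` comparison fails. For example:

```r
# An ordered factor has two classes, triggering the same comparison problem.
y <- gl(3, 1, ordered = TRUE)
class(y)   # "ordered" "factor"
```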