predict.mboost with newdata when argument index was used in bl #15

sbrockhaus · 2015-08-11T09:15:13Z

I am having some trouble with predict.mboost() when passing the argument newdata, though I think this is mainly a documentation issue. I fit a model using mboost() and then want to call predict.mboost() with the argument newdata.
As far as I understand, it is not possible to pass a list instead of a data.frame to newdata, even when the model was fitted with data in list form, as happens when using the argument index (an exception is the use of %O%, which I could not find in the help either).
But the real question is what happens to the index in the prediction. I wrote a small code example for this, and to me it looks like the index variable does not have any influence on the prediction. This was somewhat unexpected, so I wanted to ask why this is the case and whether you could add a comment on it to the documentation.

library(mboost)

### modified example from mboost-help 
data("volcano", package = "datasets")
layout(matrix(1:2, ncol = 2))

## estimate mean of image per row treating image as matrix
image(volcano, main = "data")
x1 <- 1:nrow(volcano)
x2 <- 1:ncol(volcano)
vol <- as.vector(volcano)

## create dataset containing only one direction and 
## an index variable for the other direction
datList <- list(vol=vol, x2=x2, id=rep(x2, each=length(x1)) )

## fit the volcano data only in one direction using index
modid <- mboost(vol ~ bbs(x2, index=id, df = 3, knots = 10), 
                data = datList, control = boost_control(nu = 0.25))
modid[250]

volfid <- matrix(fitted(modid), nrow = nrow(volcano))
image(volfid, main = "fitted")

## try to predict the original data in list form
## gives an error, as newdata has to be a data.frame 
## (if %O% is not part of the base-learner);
## wrapped in try() so that the rest of the script still runs
pred <- try(predict(modid, newdata = datList))

## use a data.frame as newdata
## does the index-variable have any influence on the prediction?
newd <- data.frame(x2=datList$x2[1:5], id=1)
pred1 <- predict(modid, newdata=newd)

newd <- data.frame(x2=datList$x2[1:5])  ## id=1:length(x2)
pred2 <- predict(modid, newdata=newd)

## apparently not! can predict without passing index-variable
all(pred1==pred2)


hofnerb · 2015-08-11T12:47:09Z

Thanks, @sbrockhaus. I just had a look at your problems.

Regarding the index argument: as you already state yourself, it is not necessary for prediction. It is only used to estimate the model. Essentially, index can be seen as analogous to case weights, i.e., we repeat each observation as often as it appears in index.
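
To make the analogy with case weights concrete, here is a minimal base R sketch. It does not use mboost and is not how mboost implements index internally; it uses a plain linear design matrix with made-up values (x_unique, index, and y are hypothetical). It shows that expanding the design matrix via an index gives the same least-squares coefficients as a weighted fit on the unique rows, and that predicting for new x values needs no index at all.

```r
set.seed(1)

## hypothetical data: 4 unique covariate values, 7 observations
x_unique <- c(1, 2, 3, 4)
index    <- c(1, 1, 2, 3, 3, 3, 4)   # row of x_unique used by each observation
y        <- rnorm(length(index))

## design matrix on the unique values, expanded by the index
X_small <- cbind(1, x_unique)
X_full  <- X_small[index, , drop = FALSE]

## (a) least squares on the expanded design
coef_full <- qr.solve(X_full, y)

## (b) weighted least squares on the unique rows:
##     weights = number of repetitions, response = group means
grp  <- factor(index, levels = seq_along(x_unique))
w    <- as.vector(table(grp))
ybar <- as.vector(tapply(y, grp, mean))
coef_weighted <- solve(crossprod(X_small * w, X_small),
                       crossprod(X_small, w * ybar))

## both approaches give the same coefficients
same_coef <- isTRUE(all.equal(as.vector(coef_full), as.vector(coef_weighted)))
print(same_coef)

## prediction for new covariate values only needs x, not an index
x_new    <- c(1.5, 2.5)
pred_new <- as.vector(cbind(1, x_new) %*% coef_full)
print(pred_new)
```

The index only determines how often each row of the design matrix enters the fit; once the coefficients are estimated, it plays no further role.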

Often, index isn't specified directly by the user but is used internally to speed up computations if nrow(data) exceeds options("mboost_indexmin") (10000 by default). See also the section Global Options in ?bols.

Note that index doesn't have to be included in the data set as it is not really part of the data itself. You wouldn't necessarily add weights as a column to the data frame.

Having said this, the remaining problem is prediction with newdata = list(). This can also be seen in a much simpler example:

data("bodyfat", package = "TH.data")
## convert data to list
bf <- as.list(bodyfat)
mod <- mboost(DEXfat ~ btree(age) + bols(waistcirc) + bbs(hipcirc), data = bf)

## use first two rows of data as new data set (again as list)
nd <- as.list(bodyfat[1:2,])
predict(mod, newdata = nd)

## using a data frame works
nd <- bodyfat[1:2,]
predict(mod, newdata = nd)

I am now investigating where we have to change the subsetting of newdata. Instead of newdata[, nm, drop = FALSE], which gives an error for lists, we can always use newdata[nm], which works for both lists and data frames.
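
The difference between the two subsetting styles can be checked in base R alone. The sketch below uses a small made-up data set (the values are hypothetical, and nm stands for the variable name of a base-learner); it only illustrates the subsetting behaviour and is not the actual change made in mboost.

```r
## hypothetical new data, once as data frame and once as list
nd_df   <- data.frame(age = c(57, 65), waistcirc = c(100, 99.5))
nd_list <- as.list(nd_df)
nm      <- "age"

## matrix-style subsetting works for data frames ...
sub_df_matrix <- nd_df[, nm, drop = FALSE]

## ... but fails for lists, which have no dimensions
sub_list_matrix <- tryCatch(nd_list[, nm, drop = FALSE],
                            error = function(e) "error")

## list-style subsetting works for both
sub_df   <- nd_df[nm]     # one-column data frame
sub_list <- nd_list[nm]   # list with one element

print(sub_list_matrix)
print(identical(sub_df[[nm]], sub_list[[nm]]))
```

Because a data frame is itself a list of columns, newdata[nm] keeps the container type in both cases: a data frame stays a data frame, and a list stays a list.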

sbrockhaus · 2015-08-12T06:44:17Z

Thank you very much for the explanation!

hofnerb closed this as completed in 5103029 Aug 11, 2015
