unable to use predict on a (new) dataset #13

Closed
romunov opened this issue Jul 23, 2015 · 2 comments

@romunov romunov commented Jul 23, 2015

Lorcan Treanor reported on Stack Overflow that he is unable to use predict() with newdata. I installed the development version and get the same error.

```r
library("mboostDevel")

### fitting multinomial logit model via a linear array model
X0 <- K0 <- diag(nlevels(iris$Species) - 1)
colnames(X0) <- levels(iris$Species)[-nlevels(iris$Species)]
mlm <- mboost(Species ~ bols(Sepal.Length, df = 2) %O%
                buser(X0, K0, df = 2), data = iris,
              family = Multinomial())
round(predict(mlm, type = "response", newdata = iris), 2)
```

```
Error in `[.data.frame`(newdata, nm) : undefined columns selected
```

@sbrockhaus sbrockhaus commented Jul 23, 2015

The problem is that predict() does not work with newdata when buser() was used to fit the model.
If the model is fitted with bols() instead of buser(), prediction with newdata works.

```r
### fitting multinomial logit model via a linear array model
X0 <- K0 <- diag(nlevels(iris$Species) - 1)
colnames(X0) <- levels(iris$Species)[-nlevels(iris$Species)]
mlm <- mboost(Species ~ bols(Sepal.Length, df = 2) %O%
                buser(X0, K0, df = 2), data = iris,
              family = Multinomial())

## fails with "undefined columns selected" because buser() was used;
## wrapped in try() so that the rest of the script still runs
try(round(predict(mlm, type = "response", newdata = iris), 2))
## works without newdata
pred <- round(predict(mlm, type = "response"), 2)

## set up data that contain a dummy variable,
## so that you can use bols() instead of buser()
myiris <- as.list(iris)
myiris$dummy <- factor(1:2)

## fitting the model without buser()
mlm2 <- mboost(Species ~ bols(Sepal.Length, df = 2) %O%
                bols(dummy, df = 2, contrasts.arg = "contr.dummy"), data = myiris,
              family = Multinomial())

pred2 <- round(predict(mlm2, type = "response", newdata = myiris), 2)

## compare the predictions of the two models
all(pred2 == pred)

### compare design and penalty matrices of the two models
extract(mlm, "design")[[1]][2]
extract(mlm, "penalty")

extract(mlm2, "design")[[1]][2]
extract(mlm2, "penalty")

## look at offset
## mlm2$offset
```
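
The error text itself comes from base R's data-frame indexing and can be reproduced without mboost. A minimal sketch, assuming (this is not confirmed in the thread) that predict() selects columns of newdata by the base-learners' variable names, so that a name such as X0, which exists only in the workspace and not as a column of iris, triggers the failure. The vector nm below is purely illustrative:

```r
## Assumption: predict() looks up variable names as columns of newdata.
## "X0" is an illustrative name that is not a column of iris.
nm <- c("Sepal.Length", "X0")
msg <- tryCatch(
  {
    iris[nm]   # same kind of call as `[.data.frame`(newdata, nm)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

In an English locale, msg holds the same "undefined columns selected" text as in the report above.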

@romunov romunov closed this Jul 23, 2015

@hofnerb hofnerb commented Jul 23, 2015

Thanks, @sbrockhaus.

Just one addition: to make predictions for genuinely new data, the dummy must always be kept as is. For example, a prediction for only the first observation can be made as follows:

```r
newdata <- as.list(iris[1,])
## define a dummy vector with one factor level less than the outcome
newdata$dummy <- factor(1:(nlevels(iris$Species) - 1))
pred3 <- round(predict(mlm2, type = "response", newdata = newdata), 2)

## check results
pred2[1,]
pred3
```
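
The same pattern should carry over to several new observations at once. The following is only a sketch, assuming an installed mboost version that provides contr.dummy (the thread itself used mboostDevel). It refits mlm2 so that the block is self-contained; new_obs and pred_new are illustrative names:

```r
library("mboost")  # assumption: installed version supports contr.dummy

## refit the bols()-only model from the comment above
myiris <- as.list(iris)
myiris$dummy <- factor(1:(nlevels(iris$Species) - 1))
mlm2 <- mboost(Species ~ bols(Sepal.Length, df = 2) %O%
                 bols(dummy, df = 2, contrasts.arg = "contr.dummy"),
               data = myiris, family = Multinomial())

## new data: only the covariate rows change, the dummy stays the same
new_obs <- as.list(iris[c(1, 51, 101), ])
new_obs$dummy <- factor(1:(nlevels(iris$Species) - 1))
pred_new <- predict(mlm2, type = "response", newdata = new_obs)
round(pred_new, 2)
```

The point is that the dummy describes the outcome categories, not the observations, so it does not grow or shrink with the number of rows in the new data.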