bols for factors with unobserved levels breaks #47

Closed
hofnerb opened this issue Aug 24, 2016 · 5 comments

@hofnerb hofnerb commented Aug 24, 2016

library("mboost")
z <- factor(sample(1:5, 100, replace = TRUE), levels = 1:6)
y <- rnorm(100)
mboost(y ~ bols(z))
## Error in solve.default(XtX, crossprod(X, y), LINPACK = FALSE) : 
##  Lapack routine dgesv: system is exactly singular: U[6,6] = 0

z <- droplevels(z)
mboost(y ~ bols(z)) # works

Thus, perhaps we should use droplevels() within bols and issue a warning if any levels are dropped.
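
The proposal above could look roughly like the following. This is only a minimal sketch in base R: `drop_unused_levels()` is a hypothetical helper name, not a function in mboost, and it does not show what the actual fix inside bols does.

```r
# Hypothetical helper (not part of mboost): drop unobserved factor levels
# and warn about which ones were removed.
drop_unused_levels <- function(x) {
  stopifnot(is.factor(x))
  observed <- as.character(unique(x[!is.na(x)]))
  unused <- setdiff(levels(x), observed)
  if (length(unused) > 0) {
    warning("Dropped unobserved level(s): ", paste(unused, collapse = ", "))
    x <- droplevels(x)
  }
  x
}

# Level 6 is declared but never observed
z <- factor(rep(1:5, 20), levels = 1:6)
z2 <- suppressWarnings(drop_unused_levels(z))
levels(z2)
```

Warning instead of dropping silently keeps the user informed that the fitted model has fewer coefficients than the factor has declared levels.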

@hofnerb hofnerb added the bug label Aug 24, 2016
@hofnerb hofnerb self-assigned this Aug 24, 2016
@hofnerb hofnerb closed this in f3fe8d9 Aug 24, 2016

@sbrockhaus sbrockhaus commented Aug 24, 2016

Reading your issue, I wondered what happens within cvrisk() when a factor level is empty in one of the folds. I constructed the following example:

set.seed(123)
z <- factor(sample(1:5, 100, replace = TRUE), levels = 1:6)
y <- rnorm(100)
m <- mboost(y ~ bols(z))

## Create resampling folds 
myfolds <- cv(model.weights(m), "kfold")

# In the first fold, set the weights of all observations with factor level 1 to 0;
# thus, this factor level is empty in this fold
myfolds[ z == 1 , 1] <- 0 

## cvrisk does not work for first fold
cv1 <- cvrisk(m, folds = myfolds)

## fit the model of the first fold by hand 
## works fine by dropping factor level
y_fold1 <- y[myfolds[ ,1] == 1]
z_fold1 <- z[myfolds[ ,1] == 1]
m_fold1 <- mboost(y_fold1 ~ bols(z_fold1))

## try to fit the same model using weights, breaks with error
m_fold1 <- mboost(y ~ bols(z), weights = myfolds[ , 1])

## Error in solve.default(XtX, crossprod(X, y), LINPACK = FALSE) : 
## system is computationally singular: reciprocal condition number = 2.43337e-18

Do you think this is a problem?
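
To see why zero weights cause the same kind of failure as an unobserved level, one can check which levels carry no weight in a fold. The sketch below uses base R only; `empty_levels()` is a hypothetical helper, and the plain dummy coding from `model.matrix()` is an assumption for illustration, not necessarily the design matrix that bols builds.

```r
# Hypothetical helper: levels of z whose total weight in a fold is zero
# (or that have no observations at all).
empty_levels <- function(z, w) {
  stopifnot(is.factor(z), length(z) == length(w))
  wsum <- tapply(w, z, sum)  # NA for levels without any observation
  names(wsum)[is.na(wsum) | wsum == 0]
}

z <- factor(rep(1:5, 20))
w <- rep(1, length(z))
w[z == 1] <- 0               # level 1 gets zero weight, as in the first fold

empty_levels(z, w)

# Weighted cross-product X'WX under a plain dummy coding (assumption):
# the diagonal entry for the zero-weight level is 0, so the system is singular.
X <- model.matrix(~ z - 1)
XtWX <- crossprod(X * w, X)
diag(XtWX)
```

A check like this, run on each fold's weight vector, shows in advance which folds cannot be fitted with the full set of levels.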

@sbrockhaus sbrockhaus reopened this Aug 24, 2016

@sbrockhaus sbrockhaus commented Aug 24, 2016

@davidruegamer does this change in the mboost package affect your resampling using brandom()?


@davidruegamer davidruegamer commented Aug 24, 2016

@sbrockhaus you mean the bootstrapped "confidence intervals" for which resampling is done at the subject level? I actually called droplevels() by hand, and as I do not have to validate each sample (I am just extracting coefficients), there should be no problem.


@hofnerb hofnerb commented Aug 25, 2016

We modified cvrisk() such that it doesn't break if single folds break. This was considered reasonable, as the remaining folds should usually be sufficient. I see no problem when we drop empty levels and cvrisk() is used to estimate the optimal stopping iteration. On the contrary, results are now based on more folds and are thus more representative.
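
The idea of letting single folds fail without aborting the whole run can be sketched as follows. This is not the cvrisk() source: `risk_per_fold()` and `fit_and_score()` are hypothetical names, and the weighted least-squares fit on a dummy coding is only a stand-in for a boosting fit.

```r
# Hypothetical sketch: evaluate each fold inside tryCatch() so that one
# failing fold produces a warning instead of stopping the whole run.
risk_per_fold <- function(folds, fit_and_score) {
  res <- lapply(seq_len(ncol(folds)), function(k) {
    tryCatch(fit_and_score(folds[, k]),
             error = function(e) {
               warning("fold ", k, " failed: ", conditionMessage(e))
               NULL
             })
  })
  ok <- !vapply(res, is.null, logical(1))
  list(risk = unlist(res[ok]), n_ok = sum(ok))
}

# Toy data and a stand-in fit (assumption: weighted least squares)
z <- factor(rep(1:5, 20))
y <- sin(seq_along(z))
fit_and_score <- function(w) {
  X <- model.matrix(~ z - 1)
  # solve() errors if a level has zero total weight in this fold
  beta <- solve(crossprod(X * w, X), crossprod(X * w, y))
  mean((y - X %*% beta)[w == 0]^2)  # out-of-bag mean squared error
}

folds <- matrix(1, nrow = length(z), ncol = 3)
folds[z == 1, 1] <- 0    # fold 1: level 1 is empty, so the fit fails
folds[1:30, 2] <- 0
folds[71:100, 3] <- 0

out <- risk_per_fold(folds, fit_and_score)  # warns about fold 1
out$n_ok
```

The risk is then summarised over the folds that could be fitted, which matches the reasoning above that the remaining folds are usually sufficient.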

Regarding confidence intervals:

  1. You know that there is a function implementing this? See ?confint.mboost.
    The function was described in B. Hofner, T. Kneib, T. Hothorn (2016). "A Unified Framework of Constrained Regression". Statistics and Computing. 26:1-14. DOI 10.1007/s11222-014-9520-y
  2. If you construct CIs for factor variables, droplevels() might be a problem; yet not using droplevels() is a problem as well. The question is: What does it actually mean if a level was dropped? Is it equal to the level being estimated as zero? As I use predictions, this should be somehow manageable. What were your considerations, @davidruegamer?

Currently, the following code breaks:

### check confidence intervals for factors with very small level frequencies
z <- factor(c(sample(1:5, 100, replace = TRUE), 6), levels = 1:6)
y <- rnorm(101)
mod <- mboost(y ~ bols(z))
confint(mod)

@hofnerb hofnerb commented Aug 25, 2016

I moved this to a new issue as it touches a similar yet distinct problem. The original issue was solved with the update.
