Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducing glmnet results runs into issues with indicator variables and ultrasparse lambdas #124

Closed
GeorBe opened this issue Dec 22, 2023 · 3 comments

Comments

@GeorBe
Copy link

GeorBe commented Dec 22, 2023

Describe the bug
Differences in the table of estimates
| CVXR.est| GLMNET.est|
|:-----------|--------:|----------:|
|(Intercept) | -0.285| -0.017|
|V1 | 0.000| -0.277|
|V2 | 0.000| 0.000|
|V3 | 0.000| 0.000|
|V4 | 0.000| -0.280|
|V5 | 0.000| 0.000|
|V6 | 0.000| 0.000|
|V7 | 0.000| 0.000|
|V8 | 0.000| 0.000|
|V9 | 0.000| 0.000|
|V10 | 0.000| 0.000|
|V11 | 0.000| 0.000|
|V12 | 0.000| 0.000|
|V13 | 0.000| 0.000|
|V14 | -0.065| -0.046|
|V15 | 0.000| 0.000|
|V16 | 0.000| 0.000|
|V17 | 0.000| 0.000|
|V18 | 0.000| 0.000|
|V19 | 0.000| 0.005|
|V20 | 0.000| 0.000|
To Reproduce
set.seed(123)
n2 <- 100; p2 <- 20; thresh <- 1e-12; lambda2 <- .05
x2 <- matrix(rnorm(n2* 10), n2, p2/2)
x21 <- matrix(sample(x=c(0,1),size=n2*10,replace=TRUE),n2,p2/2)
x2 <- cbind(x21,x2)
xDesign2 <- cbind(1, x2)
y2 <- sample(x = c(0, 1), size = n2, replace = TRUE)

lambda2 <- .045

fit2 <- glmnet(x2, y2, lambda = lambda2, thresh = thresh, family = "binomial")

beta2 <- Variable(p2 + 1)
obj2 <- (sum(xDesign2[y2 <= 0, ] %% beta2) + sum(logistic(-xDesign2 %% beta2))) / n2 +
lambda2 * p_norm(beta2[-1], 1)
prob2 <- Problem(Minimize(obj2))
result2 <- solve(prob2, FEASTOL = thresh, RELTOL = thresh, ABSTOL = thresh)

est.table2 <- data.frame("CVXR.est" = result2$getValue(beta2), "GLMNET.est" = as.vector(coef(fit2)))
rownames(est.table2) <- rownames(coef(fit2))
knitr::kable(est.table2, digits = 3)

Expected behavior
Minor differences as described in the Tutorial page
Output
If applicable, include program output. If reporting a program crash, please include the entire stack trace.

Version
If you use reprex, include the session info via reprex(..., session_info = TRUE). Otherwise, paste the output of sessionInfo() in your R session in a verbatim section.
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_Switzerland.utf8 LC_CTYPE=English_Switzerland.utf8
[3] LC_MONETARY=English_Switzerland.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Switzerland.utf8

time zone: Europe/Zurich
tzcode source: internal

1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] knitr_1.45 CVXR_1.0-11 RCAL_2.0 trust_0.1-8 WeightIt_0.14.2 cobalt_4.5.2
[7] glmnet_4.1-8 Matrix_1.6-1.1 hdm_0.3.1 mboost_2.9-9 stabs_0.6-4 dplyr_1.1.4
[13] haven_2.5.4 rando_0.2.0 polycor_0.8-1 copula_1.1-3 mnormt_2.1.1 mvtnorm_1.2-4

loaded via a namespace (and not attached):
[1] gtable_0.3.4 pspline_1.0-19 shape_1.4.6 xfun_0.41 ggplot2_3.4.4
[6] lattice_0.21-9 numDeriv_2016.8-1.1 tools_4.3.2 quadprog_1.5-8 vctrs_0.6.5
[11] generics_0.1.3 stats4_4.3.2 tibble_3.2.1 fansi_1.0.6 pkgconfig_2.0.3
[16] nnls_1.5 lifecycle_1.0.4 compiler_4.3.2 munsell_0.5.0 codetools_0.2-19
[21] ECOSolveR_0.5.5 gmp_0.7-3 Formula_1.2-5 pillar_1.9.0 crayon_1.5.2
[26] admisc_0.34 iterators_1.0.14 rpart_4.1.21 foreach_1.5.2 tidyselect_1.2.0
[31] inum_1.0-5 forcats_1.0.0 splines_4.3.2 pcaPP_2.0-4 gsl_2.1-8
[36] grid_4.3.2 colorspace_2.1-0 cli_3.6.2 magrittr_2.0.3 survival_3.5-7
[41] utf8_1.2.4 libcoin_1.0-10 Rmpfr_0.9-4 scales_1.3.0 backports_1.4.1
[46] bit64_4.0.5 bit_4.0.5 hms_1.1.3 ADGofTest_0.3 stabledist_0.7-1
[51] rlang_1.1.2 Rcpp_1.0.11 partykit_1.2-20 glue_1.6.2 R6_2.5.1
Additional context
Add any other context about the problem here.

Lastly

Don't forget to preview your report using the Preview tab. Unless we can see things clearly marked up, your report will most likely percolate down to the bottom of the pile.

@bnaras
Copy link
Collaborator

bnaras commented Dec 22, 2023

You have to be careful when comparing with glmnet results, because glmnet standardizes the design matrix but returns the coefficients on the original scale. See the standardization documentation of the glmnet function. So if you standardize the design matrix and pass to CVXR and multiply the returned coefficients by the standard deviations, it will match closely. The intercept will also have to be appropriately transformed by means of an easy formula that you can derive.

Further note that glmnet uses active set methods, so coefficients can be exact zeros. That may not be the case with CVXR, unless one uses an active set solver with CVXR.

@GeorBe
Copy link
Author

GeorBe commented Dec 23, 2023 via email

@bnaras
Copy link
Collaborator

bnaras commented Dec 24, 2023

Great. Please close the ticket when you get a chance.
PS. Also thanks for your suggestion: we will also be updating all the examples as part of a new release that we are currently working on.

@GeorBe GeorBe closed this as completed Jan 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants