Reproducing glmnet results runs into issues with indicator variables and ultrasparse lambdas #124

GeorBe · 2023-12-22T19:48:18Z

Describe the bug
Differences in the table of estimates
| CVXR.est| GLMNET.est|
|:-----------|--------:|----------:|
|(Intercept) | -0.285| -0.017|
|V1 | 0.000| -0.277|
|V2 | 0.000| 0.000|
|V3 | 0.000| 0.000|
|V4 | 0.000| -0.280|
|V5 | 0.000| 0.000|
|V6 | 0.000| 0.000|
|V7 | 0.000| 0.000|
|V8 | 0.000| 0.000|
|V9 | 0.000| 0.000|
|V10 | 0.000| 0.000|
|V11 | 0.000| 0.000|
|V12 | 0.000| 0.000|
|V13 | 0.000| 0.000|
|V14 | -0.065| -0.046|
|V15 | 0.000| 0.000|
|V16 | 0.000| 0.000|
|V17 | 0.000| 0.000|
|V18 | 0.000| 0.000|
|V19 | 0.000| 0.005|
|V20 | 0.000| 0.000|
To Reproduce
set.seed(123)
n2 <- 100; p2 <- 20; thresh <- 1e-12; lambda2 <- .05
x2 <- matrix(rnorm(n2* 10), n2, p2/2)
x21 <- matrix(sample(x=c(0,1),size=n2*10,replace=TRUE),n2,p2/2)
x2 <- cbind(x21,x2)
xDesign2 <- cbind(1, x2)
y2 <- sample(x = c(0, 1), size = n2, replace = TRUE)

lambda2 <- .045

fit2 <- glmnet(x2, y2, lambda = lambda2, thresh = thresh, family = "binomial")

beta2 <- Variable(p2 + 1)
obj2 <- (sum(xDesign2[y2 <= 0, ] %% beta2) + sum(logistic(-xDesign2 %% beta2))) / n2 +
lambda2 * p_norm(beta2[-1], 1)
prob2 <- Problem(Minimize(obj2))
result2 <- solve(prob2, FEASTOL = thresh, RELTOL = thresh, ABSTOL = thresh)

est.table2 <- data.frame("CVXR.est" = result2$getValue(beta2), "GLMNET.est" = as.vector(coef(fit2)))
rownames(est.table2) <- rownames(coef(fit2))
knitr::kable(est.table2, digits = 3)

Expected behavior
Minor differences as described in the Tutorial page
Output
If applicable, include program output. If reporting a program crash, please include the entire stack trace.

Version
If you use reprex, include the session info via reprex(..., session_info = TRUE). Otherwise, paste the output of sessionInfo() in your R session in a verbatim section.
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_Switzerland.utf8 LC_CTYPE=English_Switzerland.utf8
[3] LC_MONETARY=English_Switzerland.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Switzerland.utf8

time zone: Europe/Zurich
tzcode source: internal

1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] knitr_1.45 CVXR_1.0-11 RCAL_2.0 trust_0.1-8 WeightIt_0.14.2 cobalt_4.5.2
[7] glmnet_4.1-8 Matrix_1.6-1.1 hdm_0.3.1 mboost_2.9-9 stabs_0.6-4 dplyr_1.1.4
[13] haven_2.5.4 rando_0.2.0 polycor_0.8-1 copula_1.1-3 mnormt_2.1.1 mvtnorm_1.2-4

loaded via a namespace (and not attached):
[1] gtable_0.3.4 pspline_1.0-19 shape_1.4.6 xfun_0.41 ggplot2_3.4.4
[6] lattice_0.21-9 numDeriv_2016.8-1.1 tools_4.3.2 quadprog_1.5-8 vctrs_0.6.5
[11] generics_0.1.3 stats4_4.3.2 tibble_3.2.1 fansi_1.0.6 pkgconfig_2.0.3
[16] nnls_1.5 lifecycle_1.0.4 compiler_4.3.2 munsell_0.5.0 codetools_0.2-19
[21] ECOSolveR_0.5.5 gmp_0.7-3 Formula_1.2-5 pillar_1.9.0 crayon_1.5.2
[26] admisc_0.34 iterators_1.0.14 rpart_4.1.21 foreach_1.5.2 tidyselect_1.2.0
[31] inum_1.0-5 forcats_1.0.0 splines_4.3.2 pcaPP_2.0-4 gsl_2.1-8
[36] grid_4.3.2 colorspace_2.1-0 cli_3.6.2 magrittr_2.0.3 survival_3.5-7
[41] utf8_1.2.4 libcoin_1.0-10 Rmpfr_0.9-4 scales_1.3.0 backports_1.4.1
[46] bit64_4.0.5 bit_4.0.5 hms_1.1.3 ADGofTest_0.3 stabledist_0.7-1
[51] rlang_1.1.2 Rcpp_1.0.11 partykit_1.2-20 glue_1.6.2 R6_2.5.1
Additional context
Add any other context about the problem here.

Lastly

Don't forget to preview your report using the Preview tab. Unless we can see things clearly marked up, your report will most likely percolate down to the bottom of the pile.

The text was updated successfully, but these errors were encountered:

bnaras · 2023-12-22T21:39:22Z

You have to be careful when comparing with glmnet results, because glmnet standardizes the design matrix but returns the coefficients on the original scale. See the standardization documentation of the glmnet function. So if you standardize the design matrix and pass to CVXR and multiply the returned coefficients by the standard deviations, it will match closely. The intercept will also have to be appropriately transformed by means of an easy formula that you can derive.

Further note that glmnet uses active set methods, so coefficients can be exact zeros. That may not be the case with CVXR, unless one uses an active set solver with CVXR.

GeorBe · 2023-12-23T16:53:47Z

Thank you! I thought about it after I posted. Now I've re-run setting standardize=FALSE in glmnet and got perfect agreement. However, in your tutorial examples page these issues are not mentioned. I simply followed the example. Thank you so much, Georgina Den fredag 22 december 2023 kl. 22:39:35 CET, Balasubramanian Narasimhan ***@***.***> skrev: You have to be careful when comparing with glmnet results, because glmnet standardizes the design matrix but returns the coefficients on the original scale. See the standardization documentation of the glmnet function. So if you standardize the design matrix and pass to CVXR and multiply the returned coefficients by the standard deviations, it will match closely. The intercept will also have to be appropriately transformed by means of an easy formula that you can derive. Further note that glmnet uses active set methods, so coefficients can be exact zeros. That may not be the case with CVXR, unless one uses an active set solver with CVXR. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: ***@***.***>

bnaras · 2023-12-24T13:55:05Z

Great. Please close the ticket when you get a chance.
PS. Also thanks for your suggestion: we will also be updating all the examples as part of a new release that we are currently working on.

GeorBe closed this as completed Jan 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reproducing glmnet results runs into issues with indicator variables and ultrasparse lambdas #124

Reproducing glmnet results runs into issues with indicator variables and ultrasparse lambdas #124

GeorBe commented Dec 22, 2023

bnaras commented Dec 22, 2023

GeorBe commented Dec 23, 2023 via email

bnaras commented Dec 24, 2023 •

edited

Loading

Reproducing glmnet results runs into issues with indicator variables and ultrasparse lambdas #124

Reproducing glmnet results runs into issues with indicator variables and ultrasparse lambdas #124

Comments

GeorBe commented Dec 22, 2023

bnaras commented Dec 22, 2023

GeorBe commented Dec 23, 2023 via email

bnaras commented Dec 24, 2023 • edited Loading

bnaras commented Dec 24, 2023 •

edited

Loading