-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducing glmnet results runs into issues with indicator variables and ultrasparse lambdas #124
Comments
You have to be careful when comparing with Further note that |
Thank you! I thought about it after I posted. Now I've re-run setting standardize=FALSE in glmnet and got perfect agreement.
However, in your tutorial examples page these issues are not mentioned. I simply followed the example.
Thank you so much,
Georgina
Den fredag 22 december 2023 kl. 22:39:35 CET, Balasubramanian Narasimhan ***@***.***> skrev:
You have to be careful when comparing with glmnet results, because glmnet standardizes the design matrix but returns the coefficients on the original scale. See the standardization documentation of the glmnet function. So if you standardize the design matrix and pass to CVXR and multiply the returned coefficients by the standard deviations, it will match closely. The intercept will also have to be appropriately transformed by means of an easy formula that you can derive.
Further note that glmnet uses active set methods, so coefficients can be exact zeros. That may not be the case with CVXR, unless one uses an active set solver with CVXR.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Great. Please close the ticket when you get a chance. |
Describe the bug
Differences in the table of estimates
| CVXR.est| GLMNET.est|
|:-----------|--------:|----------:|
|(Intercept) | -0.285| -0.017|
|V1 | 0.000| -0.277|
|V2 | 0.000| 0.000|
|V3 | 0.000| 0.000|
|V4 | 0.000| -0.280|
|V5 | 0.000| 0.000|
|V6 | 0.000| 0.000|
|V7 | 0.000| 0.000|
|V8 | 0.000| 0.000|
|V9 | 0.000| 0.000|
|V10 | 0.000| 0.000|
|V11 | 0.000| 0.000|
|V12 | 0.000| 0.000|
|V13 | 0.000| 0.000|
|V14 | -0.065| -0.046|
|V15 | 0.000| 0.000|
|V16 | 0.000| 0.000|
|V17 | 0.000| 0.000|
|V18 | 0.000| 0.000|
|V19 | 0.000| 0.005|
|V20 | 0.000| 0.000|
To Reproduce
set.seed(123)
n2 <- 100; p2 <- 20; thresh <- 1e-12; lambda2 <- .05
x2 <- matrix(rnorm(n2* 10), n2, p2/2)
x21 <- matrix(sample(x=c(0,1),size=n2*10,replace=TRUE),n2,p2/2)
x2 <- cbind(x21,x2)
xDesign2 <- cbind(1, x2)
y2 <- sample(x = c(0, 1), size = n2, replace = TRUE)
lambda2 <- .045
fit2 <- glmnet(x2, y2, lambda = lambda2, thresh = thresh, family = "binomial")
beta2 <- Variable(p2 + 1)
obj2 <- (sum(xDesign2[y2 <= 0, ] %% beta2) + sum(logistic(-xDesign2 %% beta2))) / n2 +
lambda2 * p_norm(beta2[-1], 1)
prob2 <- Problem(Minimize(obj2))
result2 <- solve(prob2, FEASTOL = thresh, RELTOL = thresh, ABSTOL = thresh)
est.table2 <- data.frame("CVXR.est" = result2$getValue(beta2), "GLMNET.est" = as.vector(coef(fit2)))
rownames(est.table2) <- rownames(coef(fit2))
knitr::kable(est.table2, digits = 3)
Expected behavior
Minor differences as described in the Tutorial page
Output
If applicable, include program output. If reporting a program crash, please include the entire stack trace.
Version
If you use reprex, include the session info via
reprex(..., session_info = TRUE)
. Otherwise, paste the output ofsessionInfo()
in your R session in a verbatim section.R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_Switzerland.utf8 LC_CTYPE=English_Switzerland.utf8
[3] LC_MONETARY=English_Switzerland.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Switzerland.utf8
time zone: Europe/Zurich
tzcode source: internal
1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.45 CVXR_1.0-11 RCAL_2.0 trust_0.1-8 WeightIt_0.14.2 cobalt_4.5.2
[7] glmnet_4.1-8 Matrix_1.6-1.1 hdm_0.3.1 mboost_2.9-9 stabs_0.6-4 dplyr_1.1.4
[13] haven_2.5.4 rando_0.2.0 polycor_0.8-1 copula_1.1-3 mnormt_2.1.1 mvtnorm_1.2-4
loaded via a namespace (and not attached):
[1] gtable_0.3.4 pspline_1.0-19 shape_1.4.6 xfun_0.41 ggplot2_3.4.4
[6] lattice_0.21-9 numDeriv_2016.8-1.1 tools_4.3.2 quadprog_1.5-8 vctrs_0.6.5
[11] generics_0.1.3 stats4_4.3.2 tibble_3.2.1 fansi_1.0.6 pkgconfig_2.0.3
[16] nnls_1.5 lifecycle_1.0.4 compiler_4.3.2 munsell_0.5.0 codetools_0.2-19
[21] ECOSolveR_0.5.5 gmp_0.7-3 Formula_1.2-5 pillar_1.9.0 crayon_1.5.2
[26] admisc_0.34 iterators_1.0.14 rpart_4.1.21 foreach_1.5.2 tidyselect_1.2.0
[31] inum_1.0-5 forcats_1.0.0 splines_4.3.2 pcaPP_2.0-4 gsl_2.1-8
[36] grid_4.3.2 colorspace_2.1-0 cli_3.6.2 magrittr_2.0.3 survival_3.5-7
[41] utf8_1.2.4 libcoin_1.0-10 Rmpfr_0.9-4 scales_1.3.0 backports_1.4.1
[46] bit64_4.0.5 bit_4.0.5 hms_1.1.3 ADGofTest_0.3 stabledist_0.7-1
[51] rlang_1.1.2 Rcpp_1.0.11 partykit_1.2-20 glue_1.6.2 R6_2.5.1
Additional context
Add any other context about the problem here.
Lastly
Don't forget to preview your report using the
Preview
tab. Unless we can see things clearly marked up, your report will most likely percolate down to the bottom of the pile.The text was updated successfully, but these errors were encountered: