Leukemia Data #207

Open
szcf-weiya opened this issue Sep 27, 2019 · 3 comments

@szcf-weiya
Owner

Paper: Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., … Lander, E. S. (1999). Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 286(5439), 531–537. https://doi.org/10.1126/science.286.5439.531
Data: http://portals.broadinstitute.org/cgi-bin/cancer/publications/view/43
Applications in ESL: Section 18.4
image

szcf-weiya added a commit that referenced this issue Sep 27, 2019
@szcf-weiya
Owner Author

Reproduce Fig. 18.5

raw_data
scaled_data

@szcf-weiya
Owner Author

R version

> lasso.path

Call:  glmnet(x = t(train_X), y = train_y, family = "binomial", lambda = grid) 

       Df    %Dev    Lambda
  [1,]  1 0.02309 0.3679000
  [2,]  1 0.03381 0.3642000
  [3,]  1 0.04433 0.3605000
  [4,]  2 0.05532 0.3567000
  [5,]  2 0.06634 0.3530000
...
 [96,] 16 0.96520 0.0151900
 [97,] 17 0.97370 0.0114700
 [98,] 17 0.98230 0.0077610
 [99,] 18 0.99070 0.0040480
[100,] 23 0.99920 0.0003355
> elnet.path

Call:  glmnet(x = t(train_X), y = train_y, family = "binomial", alpha = 0.8,      lambda = grid) 

       Df   %Dev    Lambda
  [1,]  4 0.2187 0.3679000
  [2,]  4 0.2269 0.3642000
  [3,]  4 0.2350 0.3605000
  [4,]  4 0.2432 0.3567000
  [5,]  4 0.2512 0.3530000
...
 [96,] 29 0.9700 0.0151900
 [97,] 30 0.9773 0.0114700
 [98,] 32 0.9846 0.0077610
 [99,] 37 0.9919 0.0040480
[100,] 45 0.9993 0.0003355

Julia version

julia> lasso_path

Logistic GLMNet Solution Path (100 solutions for 7129 predictors in 4994 passes):
─────────────────────────────────
       df    pct_dev            λ
─────────────────────────────────
  [1]  21  0.999225   0.000335463
  [2]  18  0.990733   0.00404803 
  [3]  17  0.982281   0.00776059 
  [4]  17  0.973758   0.0114732  
  [5]  16  0.9652     0.0151857  
...
 [96]   2  0.0662522  0.353029   
 [97]   2  0.0552272  0.356742   
 [98]   1  0.0443347  0.360454   
 [99]   1  0.0338104  0.364167   
[100]   1  0.0230916  0.367879   
─────────────────────────────────
julia> elnet_path

Logistic GLMNet Solution Path (100 solutions for 7129 predictors in 4683 passes):
────────────────────────────────
       df   pct_dev            λ
────────────────────────────────
  [1]  46  0.99932   0.000335463
  [2]  37  0.991922  0.00404803 
  [3]  32  0.984638  0.00776059 
  [4]  30  0.97734   0.0114732  
  [5]  29  0.970059  0.0151857  
...
 [96]   4  0.251197  0.353029   
 [97]   4  0.243146  0.356742   
 [98]   4  0.235044  0.360454   
 [99]   4  0.226891  0.364167   
[100]   4  0.218684  0.367879   
────────────────────────────────

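Not much difference. The Julia output lists the path in the opposite order (smallest λ first), but on the rows shown the deviance explained agrees to within about 1e-4, and Df differs only at the smallest λ (23 vs. 21 for the lasso, 45 vs. 46 for the elastic net). This is expected: the Julia version is just a wrapper of the Fortran code, and the R version is essentially a wrapper for the Fortran code as well.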
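The comparison can be checked directly from the printed rows. The following is a minimal base-R sketch, not the code used in this issue: the numbers are copied from the two outputs above and matched by λ, and the definition of `grid` is an assumption inferred from the printed endpoints (0.367879 ≈ exp(-1), 0.000335463 ≈ exp(-8)) and the equal spacing between consecutive λ values.

```r
# Assumed lambda grid (inferred from the printed values, not taken from the
# author's script): 100 equally spaced values from exp(-1) down to exp(-8).
grid <- seq(exp(-1), exp(-8), length.out = 100)

# Indices of the rows that appear in the R printout.
idx <- c(1:5, 96:100)

# Lambda values as printed by Julia, reordered to match the R row order.
lambda_printed <- c(0.367879, 0.364167, 0.360454, 0.356742, 0.353029,
                    0.0151857, 0.0114732, 0.00776059, 0.00404803, 0.000335463)

# Lasso path: Df and deviance explained, copied from the outputs above.
r_lasso <- data.frame(
  df  = c(1, 1, 1, 2, 2, 16, 17, 17, 18, 23),
  dev = c(0.02309, 0.03381, 0.04433, 0.05532, 0.06634,
          0.96520, 0.97370, 0.98230, 0.99070, 0.99920)
)
julia_lasso <- data.frame(
  df  = c(1, 1, 1, 2, 2, 16, 17, 17, 18, 21),
  dev = c(0.0230916, 0.0338104, 0.0443347, 0.0552272, 0.0662522,
          0.9652, 0.973758, 0.982281, 0.990733, 0.999225)
)

# Elastic-net path (alpha = 0.8), copied from the outputs above.
r_elnet <- data.frame(
  df  = c(4, 4, 4, 4, 4, 29, 30, 32, 37, 45),
  dev = c(0.2187, 0.2269, 0.2350, 0.2432, 0.2512,
          0.9700, 0.9773, 0.9846, 0.9919, 0.9993)
)
julia_elnet <- data.frame(
  df  = c(4, 4, 4, 4, 4, 29, 30, 32, 37, 46),
  dev = c(0.218684, 0.226891, 0.235044, 0.243146, 0.251197,
          0.970059, 0.97734, 0.984638, 0.991922, 0.99932)
)

# Largest gap between the assumed grid and the printed lambdas.
grid_gap <- max(abs(grid[idx] - lambda_printed))

# Largest gap in deviance explained between the R and Julia fits.
lasso_dev_gap <- max(abs(r_lasso$dev - julia_lasso$dev))
elnet_dev_gap <- max(abs(r_elnet$dev - julia_elnet$dev))

# Rows (positions within idx) where the number of nonzero coefficients differs.
lasso_df_diff <- which(r_lasso$df != julia_lasso$df)
elnet_df_diff <- which(r_elnet$df != julia_elnet$df)

print(c(grid_gap = grid_gap, lasso_dev_gap = lasso_dev_gap,
        elnet_dev_gap = elnet_dev_gap))
print(list(lasso = idx[lasso_df_diff], elnet = idx[elnet_df_diff]))
```

Only the ten rows shown in the truncated printouts are compared, so this says nothing about rows 6 to 95 of either path.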

szcf-weiya added a commit that referenced this issue Sep 27, 2019
@szcf-weiya
Owner Author

Reproduce Fig. 18.6

err_and_dev_vs_log_lambda

szcf-weiya added a commit that referenced this issue Sep 27, 2019