Incorrect R-squared values for lasso models #2
Previous results were based on an incorrect method for retrieving the R-squared of lasso models. See #2 for more detail. After evaluating several methods (see https://gist.github.com/dhimmel/588d64a73fa4fef02c8f/a256479897a1a9bc63b5b7985df1e1b2ad8fd1e8), a fix was implemented in 65f4d8c. Analysis was rerun with fix. Only files that meaningfully changed were committed.
Corrected R2 values
I updated our analysis with the correct lasso R2 values. The old (incorrect) and new (correct) values are:
For all four cancers, the faulty method underestimated the lasso R2. The underestimation was minimal for lung cancer and largest for prostate cancer. The new values are more concordant with the best-subset R2 values. As expected, the best-subset values are still higher, but the discrepancy is now smaller.
The conclusions of our study are not affected by this change. To put the change in context, the old values suggested that the best-subset approach overfit more than the new values indicate. However, the main conclusions we drew from the lasso approach were based on the models themselves, which were not affected by this issue. Essentially, the lasso models now appear to explain slightly more variation in cancer incidence.
Errors in the publication
Accordingly, the following paragraph of the paper has errors. The bolded values should respectively be replaced with 69%, 55%, 32%, 15%:
In addition, the R2 column of Table 3 should be updated according to the above table. You may notice that the old lasso R2 values for the colorectal and prostate models differ slightly between the paragraph and the table. The table contains the accurate old (incorrect) values; the two paragraph values were not properly updated in the manuscript text at 4352de6.
Comments added on PeerJ
I added comments on the online PeerJ article using the questions feature. Now both Table 3 and Paragraph 37 reference the inaccuracy and link to this issue.
The `glmnet` upgrade to version 2 introduced a bug where the `methods` package is not properly loaded. In the course of diagnosing that issue, I discovered a second issue which was brought to light by the upgrade. I briefly mentioned the second issue before knowing its cause:
Now, I have tracked down the cause. We were improperly computing R2 values for our lasso models. I corrected the faulty code after evaluating several methods for the R2 computation.
Prior to the fix, we were extracting R2 values directly from a `cv.glmnet` object. This class is poorly documented and the glmnet vignette now cautions:
So essentially, we were reporting an R2 for a model based on a λ evaluated during cross-validation, but not the model with the optimal λ that we intended. Our faulty method for extracting R2 started throwing an error due to a `glmnet` update that brought a:
We will keep this thread updated with information on this issue.
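
To illustrate why the choice of λ matters for the reported R2, here is a minimal Python sketch. It is not the code from this repository or from the fix commit: the toy data, the λ grid, the `intended` value, and the helper names `r_squared` and `lasso_1d` are all assumptions made up for illustration. It fits a one-predictor lasso in closed form (soft-thresholding) at several λ values and computes the training R2 as 1 - SS_res / SS_tot for each fit.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def lasso_1d(x, y, lam):
    """Closed-form lasso fit for one predictor with an unpenalized intercept.

    Minimizes (1 / 2n) * sum((y - b0 - b * x)^2) + lam * |b|.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    var = sum((xi - mean_x) ** 2 for xi in x) / n
    # Soft-threshold the covariance, then rescale by the predictor variance.
    shrunk = max(abs(cov) - lam, 0.0) * (1 if cov > 0 else -1)
    slope = shrunk / var
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (made up for illustration; not from the study).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]

lambdas = [2.0, 1.0, 0.5, 0.1]  # hypothetical regularization path
intended = 0.1                  # stand-in for the cross-validated optimum

r2_by_lambda = {}
for lam in lambdas:
    b0, b1 = lasso_1d(x, y, lam)
    r2_by_lambda[lam] = r_squared(y, [b0 + b1 * xi for xi in x])

for lam in lambdas:
    marker = " (intended)" if lam == intended else ""
    print(f"lambda={lam}: R2={r2_by_lambda[lam]:.3f}{marker}")
```

In this toy setting the training R2 falls as λ grows, so reading the value at a larger λ than the intended one understates it. Whether that is exactly the mechanism behind the underestimates reported in this thread is an assumption; the linked gist and commit are the authoritative record of the methods that were compared.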