-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
method.CC class fails with duplicated algorithms #99
Comments
Thanks David, this seems fair to me. Also helpful to know that CC_* are being used. Out of curiosity, is there an advantage to using the CC_ algorithms over NNLS? Maybe the fact that it's a true convex combination makes it theoretically preferable? |
Sure thing. I haven't noticed an appreciable difference in performance comparing NNLS vs. CC (though I haven't studied it directly). I think the CC method is better motivated by the theory in that the convex weights minimize cross-validated risk directly. NNLS (as far as I can tell) minimizes cv-risk over non-negative weights and then re-scales by the sum. However, |
It occurs to me that if you're interested in deploying this, it might be worth benchmarking the line colDup <- which(duplicated(Z, MARGIN = 2)) in situations with large Z. A faster option is to check for duplicated values in Thanks for the response! |
Good point, maybe the best approach is check for duplication in cvRisk, and if there is duplication use duplicated() on Z to find the duplicated algorithms. Could even restrict the duplicated() analysis to the subset of Z with duplicates in cvRisk. |
Yep, that's probably the smart way to do it. Just a bit of bookkeeping to add. |
I just ran into a bug with the CC methods. They don't like it when the columns of
Z
are duplicated.solve.QP
throws an error becauseD
is not pd. While it could be user error if methods end up duplicated (see my example below), it can also pop up e.g., whenSL.glm
andSL.step
are used andSL.step
lands on the full model. Here's an example that reproduces:Here's my proposed fix -- get rid of duplicated columns in
Z
before callingcompute
, throw a warning, and add in 0 weight for algorithms that are duplicated.The text was updated successfully, but these errors were encountered: