Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nan #63

Closed
wants to merge 3 commits into from
Closed

Nan #63

wants to merge 3 commits into from

Conversation

theadityasam
Copy link
Collaborator

The NAN issue will be fixed and cv function will be made functional

@theadityasam theadityasam mentioned this pull request May 29, 2019
@tdhock
Copy link
Collaborator

tdhock commented May 29, 2019

TODO for the cv function ... please do NOT re-shuffle folds if there is one that has either all left censored outputs or all right censored outputs. in that case please just stop with an informative error that tells the user they need to manually specify the foldid argument. here is some similar code in one of my other packages, https://github.com/tdhock/penaltyLearning/blob/master/R/IntervalRegression.R#L189

@tdhock
Copy link
Collaborator

tdhock commented May 29, 2019

also @anujkhare can you please give @theadityasam push access to your repo so he can create future branches/PRs in your repo? (I think it would make it simpler to keep track of the project if everything is in your repo)

@theadityasam
Copy link
Collaborator Author

theadityasam commented May 29, 2019

Okay, won't re-shuffle the folds. Also, for the censored data error message, if I implement a method to check in the R code itself, I'll have to go through the dataset twice - once for checking whether the model can be fit and then again while assigning the censorship types in get_censoring_types of iregnet_fit.cpp which might affect the run time of iregnet.
https://github.com/theadityasam/iregnet/blob/nan/src/iregnet_fit.cpp#L401

I believe, it would be better to write a new piece of R code that checks for the censorship condition and assigns the censorship type done similar to what is done in
https://github.com/theadityasam/iregnet/blob/nan/src/iregnet_fit.cpp#L401
The result will be then passed as argument to the C++ code. This way, we don't need to go through the dataset twice.

@tdhock
Copy link
Collaborator

tdhock commented May 29, 2019

checking in R code will not result in any significant slowdown if you use vector operations

@theadityasam
Copy link
Collaborator Author

theadityasam commented May 29, 2019

Okayy, will write a new R snippet for the checking

@tdhock
Copy link
Collaborator

tdhock commented Jun 5, 2019

try debugging using print statements or gdb https://tdhock.github.io/blog/2019/gdb/

@anujkhare
Copy link
Owner

@tdhock @theadityasam investigated the cases where NaNs are still produced. I think that they're cases where the unregularized solution does not exist. survreg also produces inf values.

The ideal behavior is that we should fit till the lambda value that is well-behaved, throw a warning for the next lambda that produces a NaN, and then return an iregnet object still. There may be better ways to calculate the lambda path, but I'm not sure yet.

@theadityasam - anything to add? We should make the corresponding changes for this.

@anujkhare
Copy link
Owner

There is a reference to this in section 2.3 of this paper about glmnet. For the cases where the number of covariates (p) > number of samples (n), the unregularized solution is undefined (beta shoots to inf).

They ignore solutions for lambda values close to 0 in such cases.

@tdhock
Copy link
Collaborator

tdhock commented Jun 16, 2019

that is interesting. do you have any concrete p>n examples that can be coded as tests? it makes sense to me to only consider large lambda values in that case, and still return an iregnet object, with a warning.

@anujkhare anujkhare mentioned this pull request Jun 24, 2019
@anujkhare
Copy link
Owner

Closing since this has been merged into the cv branch: #54 .

@anujkhare anujkhare closed this Jun 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants