Fix floating point issues (Issue #538) #589
Conversation
… of floats (ratio of samples)
mfeurer left a comment
Thanks a lot for your fix. Could you please also add a unit test to test/test_evaluation/test_train_evaluator.py?
Codecov Report
@@ Coverage Diff @@
## development #589 +/- ##
===============================================
- Coverage 78.65% 78.61% -0.04%
===============================================
Files 130 130
Lines 10119 10120 +1
===============================================
- Hits 7959 7956 -3
- Misses 2160 2164 +4
Continue to review full report at Codecov.
@mfeurer I committed some changes and added unit tests. I am uncertain about the expected behaviour for multilabel classification tasks. Should I keep this behaviour for both the shuffle and no-shuffle versions? Should I give train_size as a ratio of the flattened or the non-flattened version?
Thanks for adding the tests. Looking at them, they appear to work on rather simple data. Do they fail without your fix? If not, could you please alter the data such that the unit tests would fail without your fix? Also, could you please remove the pytest_cache files from the pull request?
y is only ravelled for non-multiclass problems
The non-flattened version.
Yes, they fail without my fix. The issue was that for some train_sizes this line
Yeah, sorry about that. Done
I changed the solution to a simpler version that seems to work fine too. Instead of providing the number of samples as the train and test sizes (integers), I provide only the test_size (float). This way, sklearn won't do any problematic computation and train_size will always be the complement of test_size. If you would rather go with the previous version I can revert the changes, but I think it is more readable and simpler this way.
Thanks for your comments and code updates. The code looks much simpler now. Could you please fix the PEP8 issues, then I'd be happy to merge.
Done!
I just deleted a .c file which I believe sneaked in by accident. Thanks a lot!
Fixes #538: floating point issues when choosing different holdout set train sizes.
Pass train_size and test_size as integers (numbers of samples) instead of passing floats (ratios). This way we prevent the floating point errors in the multiplication (n_samples * train_size / test_size) that occurred in some specific cases.
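As a minimal illustration of the rounding error the description refers to (plain Python, not auto-sklearn code):

```python
n_samples = 100
train_size = 0.29  # requested fraction of the samples

# The naive multiplication lands just below the true value, because
# 0.29 has no exact binary floating point representation:
product = n_samples * train_size
print(product)       # 28.999999999999996
print(int(product))  # 28 -- one sample short of the expected 29

# Computing the integer sample count once up front, and passing
# integers downstream, sidesteps the repeated float arithmetic:
n_train = int(round(n_samples * train_size))
print(n_train)       # 29
```

Truncating the slightly-too-small product drops a sample, which is exactly the kind of off-by-one the integer-based approach avoids.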