Add test_size parameter to ClassImbalanceDataCheck #3341
Codecov Report
Coverage Diff

| | main | #3341 | +/- |
|---|---:|---:|---:|
| Coverage | 99.6% | 99.0% | -0.6% |
| Files | 329 | 329 | |
| Lines | 31977 | 32000 | +23 |
| Hits | 31847 | 31649 | -198 |
| Misses | 130 | 351 | +221 |
Continue to review full report at Codecov.
chukarsten left a comment
All this increasing of the targets makes me wonder whether some consolidation could be done in the evalml/tests/data_checks_tests/test_class_imbalance_data_check.py module, perhaps by pushing some of those synthetic targets out of the individual tests and up to the module level or into a conftest.py further up. Other than that, looks gucci.
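A minimal sketch of the consolidation suggested above, assuming the tests only need plain lists of class labels: one module-level factory builds each synthetic target from per-class counts, so individual tests stop hand-writing long label lists. The names `make_imbalanced_target`, `BALANCED_TARGET`, and `SEVERELY_IMBALANCED_TARGET` are hypothetical and not part of evalml.

```python
from typing import Dict, List


def make_imbalanced_target(counts: Dict[int, int], repeat: int = 1) -> List[int]:
    """Hypothetical helper: build a synthetic target from per-class counts.

    Args:
        counts: Mapping of class label -> number of occurrences.
        repeat: Multiplier applied to every count, so a test can scale a
            small pattern up without changing its class proportions.

    Returns:
        A flat list of labels, grouped by class in insertion order.
    """
    if repeat < 1:
        raise ValueError("repeat must be a positive integer")
    target: List[int] = []
    for label, count in counts.items():
        # List repetition keeps the ratio between classes fixed as repeat grows.
        target.extend([label] * (count * repeat))
    return target


# Hypothetical module-level constants that several tests could share
# instead of redefining the same synthetic targets inline.
BALANCED_TARGET = make_imbalanced_target({0: 5, 1: 5})
SEVERELY_IMBALANCED_TARGET = make_imbalanced_target({0: 99, 1: 1}, repeat=2)
```

Moving the factory into a `conftest.py` and exposing it through a `@pytest.fixture` would make it available to every test module below that directory; a plain module-level function is the lighter option if only one test module uses it.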
then we consider this severely imbalanced. Must be greater than 0. Defaults to 100.
num_cv_folds (int): The number of cross-validation folds. Must be positive. Choose 0 to ignore this warning. Defaults to 3.
test_size (None, float, int): Percentage of test set size. Used to calculate class imbalance prior to splitting the
data into training and validation/test sets.
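To make the intent of `test_size` concrete, here is a minimal sketch, not evalml's implementation. It assumes the check flags any class with fewer than `2 * num_cv_folds` instances, and that, because the check runs before the data is split, this minimum is divided by `test_size` so it still holds once only a `test_size` fraction of the rows remains. The function name `classes_below_minimum`, the `(0, 1]` range for `test_size`, and the exact scaling rule are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Optional


def classes_below_minimum(
    y: Iterable[Hashable],
    num_cv_folds: int = 3,
    test_size: Optional[float] = None,
) -> List[Hashable]:
    """Hypothetical helper: return the classes with too few instances.

    Assumption (not confirmed by the source): the per-class minimum is
    2 * num_cv_folds, scaled up by 1 / test_size when test_size is given.
    """
    if test_size is None:
        test_size = 1  # No split planned: use the full-data minimum.
    if not 0 < test_size <= 1:
        raise ValueError("test_size must be in the interval (0, 1]")
    # Round up so a fractional requirement still demands a whole instance.
    min_count = math.ceil(2 * num_cv_folds / test_size)
    counts = Counter(y)
    # Counter preserves first-seen order, so results follow label order in y.
    return [label for label, n in counts.items() if n < min_count]
```

With `num_cv_folds=3`, a class with 10 instances passes on the full data (minimum 6) but is flagged when `test_size=0.5` (minimum 12); setting `num_cv_folds=0` drives the minimum to 0, so nothing is flagged, in line with the "Choose 0 to ignore this warning" wording in the docstring.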
Hyper nit: won't you need a Raises: block for the new ValueError? I guess not, given that the existing ValueError isn't flagged either.
+1 to the ValueError. Lame that our linter doesn't catch it: we move our docstring from `__init__` to the class docstring (IIRC for the docs), so the linter doesn't pick it up automatically 😢
My guess is that darglint doesn't check for raises blocks in constructors. @angela97lin do you know?
Lol, Angela and I commented at the same time. Adding the missing Raises block!
Yeah @freddyaboulton, I don't think it does a good job with class docstrings in general (I recently came across a case where we added a new parameter and it didn't complain about the missing docstring).
If we kept this docstring in the constructor (`__init__`), it might actually pick it up, but IIRC our docs would then no longer look like this: https://evalml.alteryx.com/en/stable/autoapi/evalml/data_checks/class_imbalance_data_check/index.html#evalml.data_checks.class_imbalance_data_check.ClassImbalanceDataCheck
We'd have to add `__init__` back and readers would need to click through to the `__init__` method to see the parameters, rather than seeing them directly on the class page.
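For reference, a Google-style docstring that documents a constructor's `ValueError` in the class docstring, the placement discussed in this thread, could look like the sketch below. `ToyImbalanceCheck` is a hypothetical stand-in rather than evalml's `ClassImbalanceDataCheck`, and its validation rules are assumptions; the point is only where the `Raises:` section sits when `__init__` itself carries no docstring.

```python
from typing import Optional


class ToyImbalanceCheck:
    """Hypothetical check used only to illustrate docstring layout.

    Args:
        num_cv_folds (int): Number of cross-validation folds. Must be non-negative.
            Choose 0 to ignore this warning. Defaults to 3.
        test_size (None, float): Fraction of rows in the test split. Must be in (0, 1].
            Defaults to None.

    Raises:
        ValueError: If num_cv_folds is negative.
        ValueError: If test_size is not None and not in (0, 1].
    """

    def __init__(self, num_cv_folds: int = 3, test_size: Optional[float] = None):
        # Validation lives in __init__, but it is documented in the class docstring.
        if num_cv_folds < 0:
            raise ValueError("num_cv_folds must be non-negative")
        if test_size is not None and not 0 < test_size <= 1:
            raise ValueError("test_size must be in (0, 1]")
        self.num_cv_folds = num_cv_folds
        self.test_size = test_size
```

Because the `Args:` and `Raises:` sections are attached to the class rather than to `__init__`, a docstring linter that matches sections against a function's signature and `raise` statements may not cross-check them, which is consistent with the behavior described above.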
angela97lin left a comment
Clever use of numpy array multiplication to update the tests, LGTM! Thanks @freddyaboulton 😁
Pull Request Description
Fixes #3334
After creating the pull request: to pass the release_notes_updated check, you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding ``:pr:`123` ``.