Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test_size parameter to ClassImbalanceDataCheck #3341

Merged
merged 5 commits into from
Mar 1, 2022

Conversation

freddyaboulton
Copy link
Contributor

Pull Request Description

Fixes #3334


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@freddyaboulton freddyaboulton force-pushed the 3334-add-test-size-class-imbalance branch from 3c7213c to 50e7bdc Compare February 17, 2022 22:55
@codecov
Copy link

codecov bot commented Feb 17, 2022

Codecov Report

Merging #3341 (370104d) into main (f4a671d) will decrease coverage by 0.7%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3341     +/-   ##
=======================================
- Coverage   99.6%   99.0%   -0.6%     
=======================================
  Files        329     329             
  Lines      31977   32000     +23     
=======================================
- Hits       31847   31649    -198     
- Misses       130     351    +221     
Impacted Files Coverage Δ
evalml/data_checks/class_imbalance_data_check.py 100.0% <100.0%> (ø)
...ta_checks_tests/test_class_imbalance_data_check.py 100.0% <100.0%> (ø)
evalml/automl/pipeline_search_plots.py 17.9% <0.0%> (-82.1%) ⬇️
...l/tests/automl_tests/test_pipeline_search_plots.py 23.3% <0.0%> (-76.7%) ⬇️
...ests/automl_tests/test_automl_search_regression.py 74.3% <0.0%> (-20.4%) ⬇️
.../automl_tests/test_automl_search_classification.py 83.8% <0.0%> (-12.6%) ⬇️
evalml/tests/automl_tests/test_automl_utils.py 91.1% <0.0%> (-8.9%) ⬇️
...lml/tests/automl_tests/test_iterative_algorithm.py 92.3% <0.0%> (-7.7%) ⬇️
evalml/automl/utils.py 98.5% <0.0%> (-1.5%) ⬇️
evalml/automl/automl_search.py 99.6% <0.0%> (-0.1%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4a671d...370104d. Read the comment docs.

Copy link
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this increasing of the targets make me wonder if there's any consolidation that can be done in the evalml/tests/data_checks_tests/test_class_imbalance_data_check.py module to perhaps push some of those synthetic targets out of the individual tests and either to the module level or to a conftest.py further up. But other than that, looks gucci.

@@ -22,9 +24,11 @@ class ClassImbalanceDataCheck(DataCheck):
min_samples (int): The minimum number of samples per accepted class. If the minority class is both below the threshold and min_samples,
then we consider this severely imbalanced. Must be greater than 0. Defaults to 100.
num_cv_folds (int): The number of cross-validation folds. Must be positive. Choose 0 to ignore this warning. Defaults to 3.
test_size (None, float, int): Percentage of test set size. Used to calculate class imbalance prior to splitting the
data into training and validation/test sets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hyper nit - won't you need a Raises: block for the new ValueError? I guess not given the existing ValueError isn't complained about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to ValueError, lame that it's not caught by our linter because we move our docstring from init to our class docstring (IIRC for docs) so it doesn't automatically pick it up 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that darglint doesn't check for raises blocks in constructors. @angela97lin do you know?

Lol Angela and I commented at the same time. Adding the missing Raises block!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah @freddyaboulton, I don't think it does a good job for class docstrings in general (I think I recently came across a case where we added a new parameter but it didn't complain about the missing docstring).

If we had this docstring in the constructor (init) it might actually pick it up, but IIRC, then our docs won't look like this: https://evalml.alteryx.com/en/stable/autoapi/evalml/data_checks/class_imbalance_data_check/index.html#evalml.data_checks.class_imbalance_data_check.ClassImbalanceDataCheck

We'd have to add back init and click on the init method link specifically to see the parameters, rather than just at the class page.

Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever use of numpy array multiplication to update the tests, LGTM! Thanks @freddyaboulton 😁

@@ -22,9 +24,11 @@ class ClassImbalanceDataCheck(DataCheck):
min_samples (int): The minimum number of samples per accepted class. If the minority class is both below the threshold and min_samples,
then we consider this severely imbalanced. Must be greater than 0. Defaults to 100.
num_cv_folds (int): The number of cross-validation folds. Must be positive. Choose 0 to ignore this warning. Defaults to 3.
test_size (None, float, int): Percentage of test set size. Used to calculate class imbalance prior to splitting the
data into training and validation/test sets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to ValueError, lame that it's not caught by our linter because we move our docstring from init to our class docstring (IIRC for docs) so it doesn't automatically pick it up 😢

@freddyaboulton freddyaboulton enabled auto-merge (squash) March 1, 2022 21:20
@freddyaboulton freddyaboulton merged commit f7853d5 into main Mar 1, 2022
@chukarsten chukarsten mentioned this pull request Mar 3, 2022
@freddyaboulton freddyaboulton deleted the 3334-add-test-size-class-imbalance branch May 13, 2022 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add optional test_size parameter to ClassImbalanceDataCheck
3 participants