Add unit tests for standard metric objectives #741

Merged on May 11, 2020 (24 commits)

@angela97lin angela97lin commented May 4, 2020

Closes #619

Adds the following tests:

  • input contains NaN
  • input contains inf
  • inputs are different lengths
  • inputs have zero length
  • probabilities are not in [0, 1] range
  • binary classification objective that doesn't need probability estimates gets input with more than two unique values
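
A minimal sketch of input validation that would trip each of the cases listed above. This is not the code merged in this PR: the function name `validate_inputs`, its `is_probability` and `binary_labels` flags, and most of the error messages are assumptions made for illustration, and `DimensionMismatchError` is redefined here as a stand-in for the exception named in the diff.

```python
import numpy as np


class DimensionMismatchError(Exception):
    """Stand-in for the exception named in the PR diff (assumed definition)."""


def validate_inputs(y_true, y_predicted, is_probability=False, binary_labels=False):
    """Raise if the inputs hit any of the failure cases listed in the PR.

    Hypothetical helper: the flags describe what kind of objective is being
    scored and are not taken from the evalml API.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)

    # Zero-length inputs
    if len(y_true) == 0 or len(y_predicted) == 0:
        raise ValueError("Length of inputs is 0")

    # Inputs of different lengths
    if len(y_predicted) != len(y_true):
        raise DimensionMismatchError(
            "Inputs have mismatched lengths: y_predicted has length {}, "
            "y_true has length {}".format(len(y_predicted), len(y_true))
        )

    # NaN or inf anywhere in either input (isfinite is False for both)
    if not np.isfinite(y_true).all():
        raise ValueError("y_true contains NaN or infinity")
    if not np.isfinite(y_predicted).all():
        raise ValueError("y_predicted contains NaN or infinity")

    # Probability estimates must lie in [0, 1]
    if is_probability and ((y_predicted < 0).any() or (y_predicted > 1).any()):
        raise ValueError("y_predicted contains values outside [0, 1]")

    # A binary objective scoring labels should see at most two distinct values
    if binary_labels and len(np.unique(y_predicted)) > 2:
        raise ValueError("y_predicted contains more than two unique values")
```

Each unit test can then pass one malformed input and assert that the matching exception is raised, for example `validate_inputs([0, 1], [0, 1, 1])` for the mismatched-length case.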

@angela97lin angela97lin self-assigned this May 4, 2020

codecov bot commented May 4, 2020

Codecov Report

Merging #741 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #741      +/-   ##
==========================================
+ Coverage   99.36%   99.38%   +0.01%     
==========================================
  Files         151      151              
  Lines        5378     5529     +151     
==========================================
+ Hits         5344     5495     +151     
  Misses         34       34              

Impacted Files                                         Coverage Δ
evalml/exceptions/exceptions.py                        100.00% <ø> (ø)
evalml/objectives/standard_metrics.py                  100.00% <ø> (ø)
evalml/exceptions/__init__.py                          100.00% <100.00%> (ø)
...alml/objectives/binary_classification_objective.py  100.00% <100.00%> (ø)
evalml/objectives/objective_base.py                    100.00% <100.00%> (ø)
...alml/tests/objective_tests/test_fraud_detection.py  100.00% <100.00%> (ø)
evalml/tests/objective_tests/test_lead_scoring.py      100.00% <100.00%> (ø)
...lml/tests/objective_tests/test_standard_metrics.py  100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c5c8846...93a1f66.

@angela97lin angela97lin marked this pull request as ready for review May 4, 2020 21:28
@angela97lin angela97lin requested a review from dsherry May 4, 2020 21:28
@angela97lin angela97lin removed the request for review from dsherry May 4, 2020 22:14
@angela97lin angela97lin marked this pull request as draft May 4, 2020 22:14
@angela97lin angela97lin marked this pull request as ready for review May 5, 2020 14:25
@angela97lin angela97lin requested a review from dsherry May 5, 2020 14:25
    raise ValueError("Length of inputs is 0")
if len(y_predicted) != len(y_true):
    raise DimensionMismatchError("Inputs have mismatched dimensions: y_predicted has shape {}, y_true has shape {}".format(len(y_predicted), len(y_true)))
if np.any(np.isnan(y_true)) or np.any(np.isinf(y_true)):
Contributor

FYI I think you can say np.isnan(y_true).any()

Contributor Author

@dsherry Oh, I used np.any because of our previous discussion about numpy methods being faster?

Contributor

Both call np.any! np.isnan will return a np.array. I believe np.any(value) and value.any() call the same method. Also lol this was a total nit-pick on my part, feel free to ignore 😂
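
A small sketch of the two spellings discussed in this thread, using a made-up `y_true` array. It only shows that both forms return the same answer on the boolean array produced by `np.isnan`; it makes no claim about which one is faster or how NumPy dispatches internally.

```python
import numpy as np

# Made-up data for illustration only
y_true = np.array([0.0, 1.0, np.nan, 1.0])

mask = np.isnan(y_true)        # boolean ndarray, True where the value is NaN

function_form = np.any(mask)   # module-level function
method_form = mask.any()       # ndarray method

# Both spellings agree on an ndarray
assert function_form == method_form

# One practical difference: np.any also accepts plain sequences,
# whereas a Python list has no .any() method.
list_result = np.any([False, True])
```

Since `np.isnan` always returns an array here, the choice between the two is a matter of style, which matches the reviewer's note that this is a nit-pick.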

@dsherry dsherry left a comment

@angela97lin this is great! It's nice to have this sort of thing in the codebase :) this will help people quickly diagnose problems with their data and problem setup. Nice going!

I left comments but I have two main suggestions. First, I left a discussion about what we should do for nans/infs in the input. Second, can you please also add this input validation to our custom objectives (fraud cost and lead scoring) too?

I could imagine us wanting something similar in the plot metrics, but let's handle that another time, in a separate PR.

@angela97lin angela97lin requested a review from dsherry May 7, 2020 19:46
@angela97lin angela97lin merged commit eee3ea6 into master May 11, 2020
@dsherry dsherry deleted the 619_obj_unit_tests branch October 29, 2020 23:49

Successfully merging this pull request may close these issues.

Add unit tests for standard metric objectives and plot metrics