Add additional checks to InvalidTargetDataCheck to handle invalid target data types #929

Merged
merged 8 commits into from Jul 16, 2020

Conversation

angela97lin
Contributor

@angela97lin angela97lin commented Jul 13, 2020

Closes #916

@codecov

codecov bot commented Jul 14, 2020

Codecov Report

Merging #929 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #929   +/-   ##
=======================================
  Coverage   99.85%   99.86%           
=======================================
  Files         170      170           
  Lines        8565     8593   +28     
=======================================
+ Hits         8553     8581   +28     
  Misses         12       12           
| Impacted Files | Coverage Δ |
|---|---|
| evalml/data_checks/invalid_targets_data_check.py | 100.00% <100.00%> (ø) |
| ...ta_checks_tests/test_invalid_targets_data_check.py | 100.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c2bad29...b691d74.

@angela97lin angela97lin marked this pull request as ready for review Jul 14, 2020
return [DataCheckError("{} row(s) ({}%) of target values are null".format(null_rows.sum(), null_rows.mean() * 100), self.name)]
if null_rows.any():
    messages.append(DataCheckError("{} row(s) ({}%) of target values are null".format(null_rows.sum(), null_rows.mean() * 100), self.name))
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64', 'bool']
Contributor

@freddyaboulton freddyaboulton Jul 14, 2020

We're also defining numerics in PR #917. Might be best to define it once in a common location and then import it wherever it's needed.

Contributor Author

@angela97lin angela97lin Jul 14, 2020

Agreed! And same with having supported_data_types somewhere common.
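
A minimal sketch of what "define once in a common location" could look like. The constant name `NUMERIC_DTYPE_NAMES` and the helper `is_numeric_dtype_name` are hypothetical, chosen only for illustration; they are not evalml's actual API, and the PR does not say where the shared definition ended up. The dtype names themselves are copied from the `numerics` list in the diff.

```python
# Hypothetical shared module (names are illustrative, not evalml's API).
# The dtype names are the ones listed in the `numerics` list in this PR.
NUMERIC_DTYPE_NAMES = (
    'int16', 'int32', 'int64',
    'float16', 'float32', 'float64',
    'bool',
)


def is_numeric_dtype_name(dtype):
    """Return True if str(dtype) is one of the shared numeric dtype names.

    Accepts either a plain string such as 'int64' or any object whose
    string form is a dtype name (for example a pandas Series' `.dtype`).
    """
    return str(dtype) in NUMERIC_DTYPE_NAMES
```

Each data check would then import the helper (or the constant) instead of redefining the list, so adding a dtype later means changing one place.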

value_counts = y.value_counts()
if len(value_counts) == 2 and y.dtype in numerics:
    unique_values = value_counts.index.tolist()
    if set(unique_values) != set([0, 1]):
Collaborator

@dsherry dsherry Jul 15, 2020

What happens if unique_values has float dtype? Does this still work properly?

Contributor Author

@angela97lin angela97lin Jul 16, 2020

Yup! But added test_invalid_target_data_check_numeric_binary_classification_valid_float so we don't have to wonder 😄
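
For readers wondering why the float case works: in plain Python, `0.0 == 0` and `1.0 == 1`, and equal numbers hash equally, so a set of floats compares equal to `set([0, 1])`. The sketch below shows only that set comparison on plain lists; the helper name `has_zero_one_labels` is hypothetical and is not part of the PR, which performs the check on `value_counts.index.tolist()`.

```python
def has_zero_one_labels(unique_values):
    """Return True if the unique values are exactly {0, 1}.

    Hypothetical helper mirroring the set comparison in the diff.
    Ints, floats, and bools that equal 0 and 1 all pass, because
    Python treats 0 == 0.0 == False (and 1 == 1.0 == True) as equal
    set members.
    """
    return set(unique_values) == set([0, 1])


print(has_zero_one_labels([0, 1]))          # True
print(has_zero_one_labels([0.0, 1.0]))      # True
print(has_zero_one_labels([False, True]))   # True
print(has_zero_one_labels([1.0, 2.0]))      # False
```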

Collaborator

@dsherry dsherry left a comment

👍 🎆

Contributor

@freddyaboulton freddyaboulton left a comment

@angela97lin Good to merge, but please remember to consolidate the definition of numerics in #932!

@angela97lin angela97lin merged commit 07d5bea into main Jul 16, 2020
2 checks passed
@dsherry dsherry mentioned this pull request Jul 16, 2020
@angela97lin angela97lin deleted the 916_invalid_targets_data_check branch Sep 24, 2020
Development

Successfully merging this pull request may close these issues.

Add a DataCheck to handle invalid target data types
3 participants