Add problematic target data check #814

angela97lin · 2020-05-27T18:23:30Z

Closes #710 by adding InvalidTargetsDataCheck and appending it to the DefaultDataChecks run by AutoML.

InvalidTargetsDataCheck currently only checks if there are any NaN/None values in the target labels.
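
For illustration, a minimal sketch of how NaN/None values in a target can be detected with pandas. The helper name `target_has_nulls` is hypothetical and not part of this PR; the actual check lives in evalml/data_checks/invalid_targets_data_check.py and may differ.

```python
import numpy as np
import pandas as pd


def target_has_nulls(y):
    """Return True if the target contains any NaN/None value.

    Hypothetical helper for illustration only; not the evalml API.
    """
    # Accept lists or arrays as well as Series.
    if not isinstance(y, pd.Series):
        y = pd.Series(y)
    # isnull() flags both None and np.nan.
    return bool(y.isnull().any())


# Example usage: both None and np.nan count as missing.
print(target_has_nulls([0, 1, None]))
print(target_has_nulls(pd.Series([0, 1, np.nan])))
print(target_has_nulls([0, 1, 1]))
```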

codecov · 2020-05-27T21:07:23Z

Codecov Report

Merging #814 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #814   +/-   ##
=======================================
  Coverage   99.67%   99.67%           
=======================================
  Files         186      188    +2     
  Lines        7295     7338   +43     
=======================================
+ Hits         7271     7314   +43     
  Misses         24       24

Impacted Files	Coverage Δ
evalml/data_checks/__init__.py	`100.00% <100.00%> (ø)`
evalml/data_checks/default_data_checks.py	`100.00% <100.00%> (ø)`
evalml/data_checks/invalid_targets_data_check.py	`100.00% <100.00%> (ø)`
evalml/tests/data_checks_tests/test_data_checks.py	`100.00% <100.00%> (ø)`
...ta_checks_tests/test_invalid_targets_data_check.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c55e109...9fbcb54.

dsherry · 2020-06-02T20:19:30Z

evalml/data_checks/invalid_targets_data_check.py

+            y = pd.Series(y)
+        null_rows = y.isnull()
+        error_msg = "Row '{}' contains a null value"
+        return [DataCheckError(error_msg.format(row_index), self.name) for row_index, row_value in null_rows.items() if row_value]


I don't think we need an error for each row. Let's just return one error saying the target contains a missing value, yeah?

kmax12 · Jun 3, 2020

What if we put the count and % of the null values in the error message? 1 row vs. 50% of rows is a very different error.

angela97lin · Jun 3, 2020

@dsherry @kmax12 Something along the lines of "1 row(s) (50%) of rows are null"?

kmax12 · Jun 3, 2020

That's what I was thinking.
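
The suggestion in this thread (a single error carrying the count and percentage of null rows, rather than one error per row) could look roughly like the sketch below. The function name `summarize_null_targets` and the exact message wording are assumptions for illustration, not the code that was merged.

```python
import pandas as pd


def summarize_null_targets(y):
    """Return one message describing null targets, or None if there are none.

    Hypothetical helper; the merged check wraps its message in a DataCheckError.
    """
    y = pd.Series(y)
    null_rows = y.isnull()
    num_null = int(null_rows.sum())
    if num_null == 0:
        return None
    # Percentage of rows whose target is missing.
    pct_null = 100.0 * num_null / len(y)
    return "{} row(s) ({:.1f}%) of target values are null".format(num_null, pct_null)


# Example usage: one null out of two rows.
print(summarize_null_targets([1, None]))
```

Reporting the count and percentage together lets a caller tell a single stray missing label apart from a target that is mostly empty.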

dsherry

Good stuff. I think we should return a single error though.

dsherry

Almost there, left another comment about what info is included in the error

kmax12

LGTM

dsherry

Looks great! 🚢

init

7c04bb6

angela97lin self-assigned this May 27, 2020

angela97lin added 4 commits May 27, 2020 15:13

test and changelog

35bcb59

add to api ref

8bc8d3e

test formatting

9e8ec84

replace nan with none

5b1b82f

angela97lin added 8 commits May 27, 2020 17:20

add to defaultdatachecks

4b3e87b

Merge branch 'master' into 710_target_check

505210c

remove detect from name

83e904c

update file names

c45a089

merging

076f740

Merge branch 'master' into 710_target_check

63d4e36

merging

4bfffea

cleanup

84edf15

angela97lin marked this pull request as ready for review June 1, 2020 15:13

angela97lin requested a review from dsherry June 1, 2020 15:13

dsherry reviewed Jun 2, 2020

evalml/data_checks/invalid_targets_data_check.py (outdated, resolved)

dsherry reviewed Jun 2, 2020

evalml/tests/data_checks_tests/test_data_checks.py (resolved)

dsherry suggested changes Jun 2, 2020

angela97lin added 2 commits June 2, 2020 18:05

address comments, convert to just one error for all rows

8445a36

add tests

f2b8bc9

angela97lin requested a review from dsherry June 2, 2020 22:15

dsherry reviewed Jun 3, 2020

evalml/data_checks/invalid_targets_data_check.py (outdated, resolved)

dsherry reviewed Jun 3, 2020

evalml/tests/data_checks_tests/test_data_checks.py (resolved)

dsherry suggested changes Jun 3, 2020

dsherry reviewed Jun 3, 2020

evalml/data_checks/invalid_targets_data_check.py (outdated, resolved)

dsherry reviewed Jun 3, 2020

evalml/data_checks/invalid_targets_data_check.py (resolved)

cleanup message for error

1d6519a

angela97lin added 2 commits June 3, 2020 14:21

fix daatachecks tests

633e1f7

fix test

d4fd69c

angela97lin requested review from dsherry and kmax12 June 3, 2020 20:27

kmax12 approved these changes Jun 4, 2020

Merge branch 'master' into 710_target_check

9fbcb54

dsherry approved these changes Jun 4, 2020

angela97lin merged commit f21a2aa into master Jun 4, 2020

angela97lin deleted the 710_target_check branch June 4, 2020 15:27

angela97lin mentioned this pull request Jun 30, 2020

Release v0.11.0 #901

Merged

dsherry mentioned this pull request Jun 30, 2020

SimpleImputer fails if data contains None instead of np.nan #540

Closed
