TargetLeakageDataCheck maintains user logical types #2711

eccabay · 2021-08-30T18:37:40Z

codecov · 2021-08-30T18:43:23Z

Codecov Report

Merging #2711 (8cac7ff) into main (3ae1500) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2711     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        301     301             
  Lines      27600   27607      +7     
=======================================
+ Hits       27556   27563      +7     
  Misses        44      44

Impacted Files	Coverage Δ
evalml/data_checks/target_leakage_data_check.py	`100.0% <100.0%> (ø)`
...ata_checks_tests/test_target_leakage_data_check.py	`100.0% <100.0%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3ae1500...8cac7ff.

freddyaboulton

Thank you @eccabay !!

freddyaboulton · 2021-08-30T20:59:50Z

evalml/data_checks/target_leakage_data_check.py

-            cols_to_compare = infer_feature_types(
-                pd.DataFrame({col: X[col], str(col) + "y": y})
-            )
+            logical_types = {col: type(X.ww.logical_types[col]), str(col) + "y": y_type}


I think we can simplify this to:

            cols_to_compare = X.ww[[col]]
            cols_to_compare.ww[str(col) + "y"] = y

What I also like about this implementation is that it preserves other parts of the schema too, such as semantic tags and metadata. For this data check, preserving the logical types is enough, but we should get in the habit of preserving as much of the schema as possible in our implementations.

freddyaboulton · 2021-08-30T21:01:49Z

evalml/tests/data_checks_tests/test_target_leakage_data_check.py

+
+    X.ww.init(logical_types={"A": "Unknown", "B": "Double"})
+    warnings = TargetLeakageDataCheck().validate(X, y)["warnings"]
+    assert not any(w["message"].startswith("Column 'A'") for w in warnings)


Maybe we should add a comment explaining that mutual information is not supported for Unknown logical types, so those columns should not be included.

I'm also thinking we should just mock ww.mutual_information() and verify the logical types are consistent there. Not sure how tricky that would be, though.

eccabay added 2 commits August 30, 2021 14:23

Add logical type carryover in _calculate_mutual_information

9dc2f9e

Update releasenotes

a46ec68

Merge branch 'main' into 2683_targetleakage_types

da1f2af

eccabay marked this pull request as ready for review August 30, 2021 20:15

auto-assign bot assigned eccabay Aug 30, 2021

eccabay requested review from angela97lin, dsherry, bchen1116, chukarsten and freddyaboulton August 30, 2021 20:16

freddyaboulton approved these changes Aug 30, 2021

View reviewed changes

chukarsten approved these changes Aug 31, 2021

View reviewed changes

eccabay added 4 commits August 31, 2021 11:17

Logic simplification from PR

52a99e3

Merge branch 'main' into 2683_targetleakage_types

26857bc

lint fix

b38ebc5

Small change to re-trigger checks

8cac7ff

eccabay merged commit 635f3e7 into main Aug 31, 2021

eccabay deleted the 2683_targetleakage_types branch August 31, 2021 20:45

chukarsten mentioned this pull request Sep 1, 2021

Release v0.32.0 #2729

Merged
