TargetLeakageDataCheck wipes user-selected logical types #2683

freddyaboulton · 2021-08-23T19:41:30Z

Repro

import pandas as pd
import woodwork as ww
from evalml.data_checks import TargetLeakageDataCheck

X = pd.DataFrame({"A": pd.Series([1, 2, 3]),
                  "B": pd.Series([4, 5, 6])})
y = pd.Series([1, 2, 3])

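# Explicitly mark column A as Unknown, a type Woodwork excludes from mutual information.
X.ww.init(logical_types={"A": "Unknown", "B": "Double"})

# This assertion passes, i.e. a leakage warning is raised for column A even though it is Unknown.
warnings = TargetLeakageDataCheck().validate(X, y)["warnings"]
assert any(w["message"].startswith("Column 'A'") for w in warnings)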

Woodwork only allows mutual information on numeric, categorical, boolean, and datetime features. If the TargetLeakageDataCheck preserved the typing information set by the user, the mutual information for column A would not be calculated, so no warning should be raised for it.

The issue is that the TargetLeakageDataCheck creates a new dataframe without going through the ww accessor, which wipes the user-selected logical types by mistake.


freddyaboulton added the bug and priority labels Aug 23, 2021

eccabay self-assigned this Aug 30, 2021

eccabay mentioned this issue Aug 30, 2021

TargetLeakageDataCheck maintains user logical types #2711

Merged

eccabay closed this as completed in #2711 Aug 31, 2021
