TargetLeakageDataCheck wipes user-selected logical types #2683

Closed
freddyaboulton opened this issue Aug 23, 2021 · 0 comments · Fixed by #2711

Repro

import pandas as pd
import woodwork as ww
from evalml.data_checks import TargetLeakageDataCheck

X = pd.DataFrame({"A": pd.Series([1, 2, 3]),
                  "B": pd.Series([4, 5, 6])})
y = pd.Series([1, 2, 3])

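# The user explicitly types column A as Unknown, which Woodwork excludes from mutual information.
X.ww.init(logical_types={"A": "Unknown", "B": "Double"})

# This assertion passes, which demonstrates the bug: a warning is raised for
# column A even though its user-selected type should have excluded it.
warnings = TargetLeakageDataCheck().validate(X, y)['warnings']
assert any(w['message'].startswith("Column 'A'") for w in warnings)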

Woodwork only computes mutual information for numeric, categorical, boolean, and datetime features. If TargetLeakageDataCheck preserved the logical types set by the user, it would not compute mutual information for column A (typed as Unknown), so no warning should be raised for that column.

The cause is that TargetLeakageDataCheck creates a new dataframe without going through the ww accessor, which discards the user-selected logical types.

@freddyaboulton added the bug and priority labels Aug 23, 2021
@eccabay eccabay self-assigned this Aug 30, 2021