-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TargetLeakageDataCheck maintains user logical types #2711
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2711 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 301 301
Lines 27600 27607 +7
=======================================
+ Hits 27556 27563 +7
Misses 44 44
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @eccabay !!
cols_to_compare = infer_feature_types( | ||
pd.DataFrame({col: X[col], str(col) + "y": y}) | ||
) | ||
logical_types = {col: type(X.ww.logical_types[col]), str(col) + "y": y_type} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simplify this with this:
cols_to_compare = X.ww[[col]]
cols_to_compare.ww[str(col) + "y"] = y
What I also like about this is that this implementation will also preserve other parts of the schema, like semantic tags and metadata. For the sake of this data check, I think preserving the logical types is enough but we should get in the habit of preserving as much of the schema as possible in our implementation.
|
||
X.ww.init(logical_types={"A": "Unknown", "B": "Double"}) | ||
warnings = TargetLeakageDataCheck().validate(X, y)["warnings"] | ||
assert not any(w["message"].startswith("Column 'A'") for w in warnings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should add a comment explaining that mutual information is not supported for Unknown logical types so they should not be included.
I'm also thinking we should just mock ww.mutual information()
and verify the logical types are consistent there? Not sure how tricky that would be though.
Closes #2683