-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update TargetLeakageDataCheck to use Woodwork's dependence_dict #3728
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3728 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 339 339
Lines 34786 34810 +24
=======================================
+ Hits 34655 34679 +24
Misses 131 131
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
|
||
Args: | ||
pct_corr_threshold (float): The correlation threshold to be considered leakage. Defaults to 0.95. | ||
method (string): The method to determine correlation. Use 'mutual' for mutual information, otherwise 'pearson' for Pearson correlation. Defaults to 'mutual'. | ||
method (string): The method to determine correlation. Use 'max' for the maximum correlation, or for specific correlation metrics, use their name (ie 'mutual_info' for mutual information, 'pearson' for Pearson correlation, etc). | ||
Defaults to 'mutual_info'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default to mutual_info
for now as we fix a bug in WW. Will change to max
once the bug is fixed and new release is out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider this a breaking change, since mutual
is no longer accepted as an input to this Data Check?
Pearson correlation returns a value in [-1, 1], while mutual information returns a value in [0, 1]. | ||
If `method='mutual_info'`, this data check uses mutual information and supports all target and feature types. | ||
Other correlation metrics only support binary with numeric and boolean dtypes, and return a value in [-1, 1], while mutual information returns a value in [0, 1]. | ||
Correlation metrics available can be found in Woodwork's `dependence_dict method <https://woodwork.alteryx.com/en/stable/generated/woodwork.table_accessor.WoodworkTableAccessor.dependence_dict.html#woodwork.table_accessor.WoodworkTableAccessor.dependence_dict>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs look good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! Just left a few comments
|
||
Args: | ||
pct_corr_threshold (float): The correlation threshold to be considered leakage. Defaults to 0.95. | ||
method (string): The method to determine correlation. Use 'mutual' for mutual information, otherwise 'pearson' for Pearson correlation. Defaults to 'mutual'. | ||
method (string): The method to determine correlation. Use 'max' for the maximum correlation, or for specific correlation metrics, use their name (ie 'mutual_info' for mutual information, 'pearson' for Pearson correlation, etc). | ||
Defaults to 'mutual_info'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider this a breaking change, since mutual
is no longer accepted as an input to this Data Check?
evalml/tests/data_checks_tests/test_target_leakage_data_check.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - just some docstring and error raising nits
Update data checks