Add util function to drop rows with NaN values in the target #487
Codecov Report
@@            Coverage Diff             @@
##           master     #487      +/-   ##
==========================================
+ Coverage   98.67%   98.68%   +0.01%
==========================================
  Files         113      114       +1
  Lines        3985     4026      +41
==========================================
+ Hits         3932     3973      +41
  Misses         53       53
@angela97lin this is a valuable addition, thanks for doing it! I'm sure we'll end up having automl add this to pipelines in the future if the data checks say there are I've added #337 to the data checks project -- see ticket for further discussion. Let's update this PR to not close #337.
Looks good! Just have a concern that there will be mismatched sizes between X and y when used with a pipeline.
Left a couple suggestions. Thanks for adding this, will be good!
Looks great, 🚢 !
Closes #335 by introducing a util function that will drop any rows with NaN values in y.
Note: this has been updated to drop rows from both X and y only where y contains a NaN. Dropping rows for every NaN that appears in X didn't seem to make the most sense, since there could be so many of them, and that's what imputation is for. So this util function checks for NaN values in y and drops the corresponding rows in X :)
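For readers skimming the thread, here is a minimal sketch of the behavior described above. It is not the code from this PR: the function name `drop_rows_with_nan_target` is hypothetical, and the sketch assumes X and y are pandas objects (or convertible to them) of equal length, matched by position rather than by index label.

```python
import pandas as pd


def drop_rows_with_nan_target(X, y):
    """Hypothetical helper: drop rows of X and y wherever y is NaN.

    NaN values that appear only in X are left alone (imputation can
    handle those later).
    """
    X = pd.DataFrame(X)
    y = pd.Series(y)
    # Build a positional boolean mask so X and y need not share an index.
    keep = y.notna().to_numpy()
    return X.loc[keep], y.loc[keep]


X = pd.DataFrame({"a": [1, 2, 3, 4], "b": [None, 5.0, 6.0, 7.0]})
y = pd.Series([0.0, float("nan"), 1.0, None])

X_t, y_t = drop_rows_with_nan_target(X, y)
print(X_t)
print(y_t)
```

Because both return values are filtered with the same mask, X and y stay the same length, which is the mismatch concern raised in review. Note that the NaN in column `b` of the first row is kept, since only y decides which rows are dropped.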