Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add util function to drop rows with NaN values in the target #487

Merged
merged 26 commits into from Mar 28, 2020

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Mar 12, 2020

Closes #335 by introducing a util function that will drop any rows with NaN values in y.

Note: this is updated to only drop rows in X and y for every NaN row in y. I figured it wouldn't always make the most sense to drop rows in X for any NaN value that might appear in X since there could be so many, and that's what imputation is for. So this util function instead checks for NaN values in y and drops the corresponding rows in X :)

@codecov
Copy link

codecov bot commented Mar 12, 2020

Codecov Report

Merging #487 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #487      +/-   ##
==========================================
+ Coverage   98.67%   98.68%   +0.01%     
==========================================
  Files         113      114       +1     
  Lines        3985     4026      +41     
==========================================
+ Hits         3932     3973      +41     
  Misses         53       53              
Impacted Files Coverage Δ
evalml/pipelines/components/__init__.py 100.00% <ø> (ø)
evalml/preprocessing/utils.py 100.00% <100.00%> (ø)
...lml/tests/preprocessing_tests/test_drop_na_rows.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89f7eae...5a99de4. Read the comment docs.

@angela97lin angela97lin changed the title [WIP] Add transformer to drop rows with NaN values Add transformer to drop rows with NaN values Mar 18, 2020
@angela97lin angela97lin requested a review from dsherry Mar 18, 2020
@dsherry
Copy link
Collaborator

dsherry commented Mar 19, 2020

@angela97lin this is a valuable addition, thanks for doing it! I'm sure we'll end up having automl add this to pipelines in the future if the data checks say there are NaN values.

I've added #337 to the data checks project -- see ticket for further discussion. Let's update this PR to not close #337.

@angela97lin angela97lin requested a review from dsherry Mar 19, 2020
@angela97lin angela97lin requested a review from kmax12 Mar 20, 2020
Copy link
Contributor

@jeremyliweishih jeremyliweishih left a comment

Looks good! Just have a concern that there will be mismatched sizes between X and y when used with a pipeline.

@angela97lin angela97lin removed request for dsherry and kmax12 Mar 24, 2020
@angela97lin angela97lin changed the title Add transformer to drop rows with NaN values Add util function to drop rows with NaN values Mar 25, 2020
@angela97lin angela97lin requested a review from dsherry Mar 25, 2020
evalml/preprocessing/utils.py Outdated Show resolved Hide resolved
@dsherry dsherry changed the title Add util function to drop rows with NaN values Add util function to drop rows with NaN target values Mar 26, 2020
@dsherry dsherry changed the title Add util function to drop rows with NaN target values Add util function to drop rows with NaN values in the target Mar 26, 2020
evalml/preprocessing/utils.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@dsherry dsherry left a comment

Left a couple suggestions. Thanks for adding this, will be good!

@angela97lin angela97lin requested a review from dsherry Mar 27, 2020
Copy link
Collaborator

@dsherry dsherry left a comment

Looks great, 🚢 !

@angela97lin angela97lin merged commit 9ba98f3 into master Mar 28, 2020
2 checks passed
@angela97lin angela97lin deleted the 335_dropna_transformer branch Mar 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add transformer that drops rows with nulls (in a given column)
3 participants