Skip to content

drop_duplicates() support for positional subset parameter#5410

Merged
jcrist merged 4 commits intodask:masterfrom
WesRoach:drop-duplicates-subset
Sep 16, 2019
Merged

drop_duplicates() support for positional subset parameter#5410
jcrist merged 4 commits intodask:masterfrom
WesRoach:drop-duplicates-subset

Conversation

@WesRoach
Copy link
Copy Markdown
Contributor

  • Tests added / passed
  • Passes black dask / flake8 dask

@WesRoach
Copy link
Copy Markdown
Contributor Author

fixes #2735

def drop_duplicates(self, split_every=None, split_out=1, **kwargs):
def drop_duplicates(self, subset=None, split_every=None, split_out=1, **kwargs):
# Let pandas error on bad inputs
self._meta_nonempty.drop_duplicates(**kwargs)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WesRoach Should this call include subset parameter? In case of non-existing column(s) provided in subset parameter this needs to error out?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you need to add the subset parameter here.

self._meta_nonempty.drop_duplicates(subset=subset, **kwargs)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This drop_duplicates is used by both DataFrame and Series. If this self._meta_nonempty.drop_duplicates(subset=subset, **kwargs) was added it would error out for series as Series.drop_duplicates doesn't accept subset param.
So check if subset present, then call drop_duplicates with subset else call without it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for feedback! Let me know if this looks correct. New to project - feedback is very welcome.

@jcrist
Copy link
Copy Markdown
Member

jcrist commented Sep 15, 2019

Thanks for the PR @WesRoach, overall this looks good, just the small fix needed above.

@jcrist
Copy link
Copy Markdown
Member

jcrist commented Sep 16, 2019

LGTM, will merge on tests pass.

@jcrist jcrist merged commit 6aab98c into dask:master Sep 16, 2019
@WesRoach WesRoach deleted the drop-duplicates-subset branch September 16, 2019 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants