drop_duplicates() support for positional subset parameter #5410
jcrist merged 4 commits into dask:master from WesRoach:drop-duplicates-subset
... breaks Series drop_duplicates
Move subset kwarg back to kwargs
Fixes #2735
dask/dataframe/core.py
Outdated
- def drop_duplicates(self, split_every=None, split_out=1, **kwargs):
+ def drop_duplicates(self, subset=None, split_every=None, split_out=1, **kwargs):
      # Let pandas error on bad inputs
      self._meta_nonempty.drop_duplicates(**kwargs)
@WesRoach Should this call include the subset parameter? If subset names column(s) that don't exist, shouldn't this error out?
Yeah, you need to add the subset parameter here.
self._meta_nonempty.drop_duplicates(subset=subset, **kwargs)
This drop_duplicates is used by both DataFrame and Series. Adding self._meta_nonempty.drop_duplicates(subset=subset, **kwargs) would raise an error for Series, because Series.drop_duplicates doesn't accept a subset parameter.
So check whether subset was provided: if it was, call drop_duplicates with subset; otherwise call it without.
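A minimal sketch of the conditional call suggested above, using plain pandas objects in place of dask's `self._meta_nonempty`. The helper name `validate_drop_duplicates` is hypothetical and is not the code that was merged; it only illustrates forwarding `subset` when the caller supplied one.

```python
import pandas as pd


def validate_drop_duplicates(meta, subset=None, **kwargs):
    """Hypothetical helper: let pandas raise on bad inputs.

    `meta` stands in for dask's `self._meta_nonempty`.
    """
    if subset is not None:
        # Forward subset only when it was supplied, so the Series path
        # never receives a keyword that Series.drop_duplicates rejects.
        kwargs = dict(kwargs, subset=subset)
    return meta.drop_duplicates(**kwargs)


frame = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 4]})
series = pd.Series([1, 1, 2])

# DataFrame path: subset is forwarded and validated by pandas.
deduped_frame = validate_drop_duplicates(frame, subset=["a"])

# Series path: no subset given, so none is forwarded.
deduped_series = validate_drop_duplicates(series)

# A subset naming a column that does not exist raises in pandas.
try:
    validate_drop_duplicates(frame, subset=["missing"])
    bad_column_error = None
except KeyError as exc:
    bad_column_error = exc
```

With this shape, a bad column name fails up front through pandas, while the Series call keeps working because it never receives `subset`.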
Thanks for the feedback! Let me know if this looks correct. I'm new to the project, so feedback is very welcome.
Thanks for the PR @WesRoach. Overall this looks good; it just needs the small fix noted above.
LGTM, will merge once tests pass.
black dask / flake8 dask