New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for normalize in value_counts #7342
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @jsignell ! My only real concern is that split_out
>1 may not be supported. Related test coverage would clarify if this is an issue.
|
||
if split_out > 1 and normalize: | ||
aggregate_kwargs["length"] = ( | ||
len(self) if dropna is False else len(self.dropna()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is acceptable. Basically I needed a way to pass the total len into the split_out combine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable to me - I guess an "optimal" solution for multi-partition results would be to collect/aggregate/combine the counts within the first aca pass. However, I'm not sure it is completely worth the additional code complexity to do this, because the result would need to be quite large to benefit (and split_out is often 1 anyway).
|
||
if split_out > 1 and normalize: | ||
aggregate_kwargs["length"] = ( | ||
len(self) if dropna is False else len(self.dropna()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable to me - I guess an "optimal" solution for multi-partition results would be to collect/aggregate/combine the counts within the first aca pass. However, I'm not sure it is completely worth the additional code complexity to do this, because the result would need to be quite large to benefit (and split_out is often 1 anyway).
Ok I will merge this if it passes the regular tests. |
black dask
/flake8 dask