DataFrameGroupBy.value_counts method raises an error when at least a partition is empty. #7065
Comments
Well, that is interesting. I think this might be a pandas bug. You can't do a groupby value_counts on an empty DataFrame in pandas either:
    import pandas as pd
    pd.DataFrame(columns=["A", "B"]).groupby("A")["B"].value_counts()
Thank you, @jsignell. Yes, pandas does not allow that, but I am not sure whether this behaviour is intended by the developers or is just a bug. Is this something you can find out? Otherwise, I can open an issue on the pandas GitHub page. Meanwhile, do you know a way to ensure that none of the partitions are empty?
Thank you, @quasiben. I saw that post but I was wondering if there was a method to do that.
Yeah I am also not sure about that. I would expect this kind of thing to work on an empty dataframe though. So raising an issue on pandas would be good.
Maybe that is the wrong approach. What if we only run the function on partitions that are non-empty? That seems like it would be fine for value_counts.
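To make the "only run on non-empty partitions" idea concrete, here is a minimal sketch in plain pandas. It assumes the partitions can be modelled as a list of pandas DataFrames; the helper name `value_counts_skip_empty` is hypothetical and is not part of dask or pandas.

```python
import pandas as pd


def value_counts_skip_empty(partitions, by, col):
    """Grouped value_counts over a list of frames, skipping empty ones.

    Hypothetical helper for illustration only; not a dask API.
    """
    # Run the per-partition step only where there is data.
    chunks = [
        part.groupby(by)[col].value_counts()
        for part in partitions
        if len(part) > 0
    ]
    if not chunks:
        # Every partition was empty: return an empty integer Series.
        return pd.Series(dtype="int64")
    # Sum the partial counts that share the same (group, value) pair.
    return pd.concat(chunks).groupby(level=[0, 1]).sum()


partitions = [
    pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 1, 2]}),
    pd.DataFrame(columns=["A", "B"]),  # empty partition
    pd.DataFrame({"A": ["x"], "B": [1]}),
]
combined = value_counts_skip_empty(partitions, "A", "B")
```

Skipping works for value_counts because an empty partition contributes nothing to any count, so leaving it out does not change the summed result.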
We can probably make our version of value_counts robust though? Or maybe we can raise an issue / ask a question upstream.
Done: pandas-dev/pandas#39172
Does this have to be done manually or is there a simple way to do it?
I looked into this for a little bit. My first attempt was to use a special function on each chunk, so instead of using dask/dask/dataframe/groupby.py (lines 1932 to 1938 in 1a52a0c), we could use something like:

    def _value_counts(x, **kwargs):
        if len(x):
            return M.value_counts(x, **kwargs)
        else:
            return pd.Series()

That seems to work locally. Another option would be to add a kwarg to
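For readers who want to try the guard outside of dask, the following is a rough standalone sketch using only pandas. It assumes the chunk passed in is a pandas SeriesGroupBy, for which `len()` returns the number of groups, and it calls the method directly rather than through dask's `M` helper. The name `guarded_value_counts` and the explicit `dtype="int64"` on the empty result are assumptions made here, not part of the proposal above.

```python
import pandas as pd


def guarded_value_counts(grouped, **kwargs):
    """Call value_counts only when the groupby has at least one group.

    Illustrative stand-in for the per-chunk function; not dask's code.
    """
    if len(grouped):  # number of groups in the SeriesGroupBy
        return grouped.value_counts(**kwargs)
    # Assumption: an explicitly typed empty Series as the fallback.
    return pd.Series(dtype="int64")


empty = pd.DataFrame(columns=["A", "B"])
full = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 1, 2]})

empty_result = guarded_value_counts(empty.groupby("A")["B"])
full_result = guarded_value_counts(full.groupby("A")["B"])
```

Note that the empty fallback here has a plain index rather than the two-level index of a real result, so whether it combines cleanly with the other chunks in dask's aggregation step would still need to be checked.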
@jsignell, it seems that it's a bug in pandas: pandas-dev/pandas#39172 (comment)
What happened:
DataFrameGroupBy.value_counts gives an error if there is at least one empty partition.
What you expected to happen:
I expect that no error is raised and that the result matches the corresponding pandas DataFrame result.
Minimal Complete Verifiable Example:
The output of the above script is the following.
There seems to be an error when at least one of the partitions is empty.
Anything else we need to know?:
No.
Environment: