Dataframe isin function with strange behaviour on large lists #3198
Comments
bump @mrocklin any ideas?
I agree that there should be a right way to do this. Currently it isn't supported. It seems like a bug to me. I agree that it should be fixed.
I ran into a related problem today: I found that setting the index, creating a dataframe from filter_set, and doing a merge was faster and less error-prone.
@mrocklin is there any update on this? You mentioned it seemed like a bug to you. Was that ever fixed?
We usually close issues after they have been fixed (though not always). If the issue is still open then I suspect that the bug remains. If you wanted to verify this you could look through git history or try out the failing example above on master. Unfortunately I'm not personally able to keep track of all bugs. If you have any interest in helping to resolve this issue that would be welcome.
Thanks for following up on this @bnaul!
This issue is based on this stackoverflow question and motivated by @mrocklin's answer.
I was experimenting with filtering dask dataframes using the isin function with large lists to check against:
I then tried different ways to filter A from dask_df with filter_list:
Each one of them gave me the warning:
I think there should be a 'right way' to do this, which should not produce the warning and should also be in the docs.