-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement min_count for the DataFrame methods sum and prod #4090
Conversation
Tests for Python 3.5 have failed, but it does not seem to be directly related to changes in this pull request. It was |
Thanks for the contribution @bartbroere . CC'ing @jrbourbeau for the failing test, I suspect that this is due to a recent PR. Speaking of tests ... @bartbroere would you be willing to add tests that verify the functionality that you've added in this PR? There should be plenty of examples in |
Thanks for the PR @bartbroere! The failing test is not related to changes in this PR. Hmmmm looks like the failing test was just added in #4086. I'll open up a new issue for this. |
Thanks for the quick reply and for looking into the failing tests. When developing I had a scratch file that I can convert to a proper test for the new functionality. |
for axis in axes: | ||
for min_count in [0, 1, 2, 3]: | ||
assert_eq(df.sum(min_count=min_count, axis=axis), | ||
ddf.sum(min_count=min_count, axis=axis).compute()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to call .compute()
here. See the bottom section of http://docs.dask.org/en/latest/develop.html#test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend that you also test prod
here, perhaps just with a second assert_eq
statement in the same for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for the hint. I have made the tests without the call to .compute()
now.
This looks fine to me. Merging in a few hours if there are no further comments (cc @TomAugspurger ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrame changes look good. I'm not sure why the changes from #4093 are still showing up in this diff though.
I've rebased on master and removed the errant commit. I've force-pushed to your branch @bartbroere . I hope that you don't mind. Merging on a green light from CI. |
I don't mind the force-push of course. The diff now makes more sense. Good to hear it's probably ready to merge. |
Thanks @bartbroere! |
As mentioned in #3981 dask DataFrames do not support the relatively new
min_count
parameter on the methodssum
andprod
. I've tried to address that with this pull request.flake8 dask