groupby "every" assumes forward sorted data #847
Comments
(For anyone trying to repro this: it requires a suitably large input. A small handcrafted input that fits into one |
I now notice that this is related to the broader issue filed at #501. In any case, it's worth having as a standalone issue, as it should be fixed by a forthcoming PR.
I've confirmed with @henridf that this issue is no longer necessary, so I'm closing it. What's happened is that the change in #893 makes sure this problem is avoided by not emitting any bins until the end. Here's verification. The original issue still existed as of
Then with the commit
While unique, note that they are not sorted by default, so a downstream |
This is all correct, but adding one additional piece of context just in case anyone came across this issue without other context on what we're doing with groupby: in general, #893 does indeed wait till EOF to emit all bins. But that PR also included a feature for early-emitting bins when the incoming stream is sorted in the primary grouping key. In the case of |
If records presented to an "every X" groupby are not in forward order, then that groupby will emit multiple groups for a given time bin. For example:
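A minimal sketch of the failure mode in Python, assuming a simplified streaming aggregator that emits a bin as soon as the bin key changes. This is not the project's actual implementation: `naive_every` and `buffered_every` are hypothetical names, timestamps are plain integers, and the aggregation is a simple count. It only illustrates why out-of-order input yields several groups for one time bin, and why holding all bins until end of input (the approach the comments attribute to #893) avoids that.

```python
def naive_every(timestamps, width):
    """Count records per time bin, emitting a bin whenever the bin key changes.

    Hypothetical sketch: only correct if timestamps arrive in forward order.
    """
    emitted = []
    current_bin, count = None, 0
    for ts in timestamps:
        bin_start = ts - ts % width  # floor the timestamp to its bin
        if bin_start != current_bin:
            if current_bin is not None:
                emitted.append((current_bin, count))  # early emit
            current_bin, count = bin_start, 0
        count += 1
    if current_bin is not None:
        emitted.append((current_bin, count))  # flush the last open bin
    return emitted


def buffered_every(timestamps, width):
    """Count records per time bin, emitting nothing until end of input."""
    counts = {}
    for ts in timestamps:
        bin_start = ts - ts % width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    # Bins come out in first-seen order, not sorted order.
    return list(counts.items())


unsorted_input = [0, 12, 3, 15, 7]  # bins of width 10, not in forward order
naive = naive_every(unsorted_input, 10)
buffered = buffered_every(unsorted_input, 10)
print(naive)     # bins 0 and 10 each appear more than once
print(buffered)  # one entry per bin
```

With forward-sorted input the two functions agree; they diverge only when a record for an already-emitted bin arrives later. The buffered variant yields unique bins in first-seen order, so a downstream sort is still needed if ordered output matters.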