[BEAM-11050] Duplicate accumulator if it appears in multiple windows. #13061
R: @iemejia |
Accumulators can be mutated during merging by the combine fn so we must ensure that we use a unique instance of the accumulator per window.
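A minimal sketch of the failure mode described above, in plain Java rather than the actual Beam or Spark runner code. All names here (`AccumulatorPerWindowSketch`, `SumAccumulator`, `mergeInPlace`, `mergePerWindow`) are hypothetical, and `copy()` is only a stand-in for however the runner duplicates an accumulator. It assumes a merge that mutates and returns its first input, and one accumulator instance that was assigned to two windows.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccumulatorPerWindowSketch {

  /** Hypothetical mutable accumulator holding a running sum. */
  static final class SumAccumulator {
    long sum;

    SumAccumulator(long sum) {
      this.sum = sum;
    }

    /** Stand-in for duplicating the accumulator, e.g. via an encode/decode round trip. */
    SumAccumulator copy() {
      return new SumAccumulator(sum);
    }
  }

  /** Merges by mutating and returning the first accumulator in the list. */
  static SumAccumulator mergeInPlace(List<SumAccumulator> accumulators) {
    SumAccumulator first = accumulators.get(0);
    for (int i = 1; i < accumulators.size(); i++) {
      first.sum += accumulators.get(i).sum;
    }
    return first;
  }

  /** Merges two windows that both contain the same logical accumulator (value 5). */
  static Map<String, Long> mergePerWindow(boolean duplicatePerWindow) {
    SumAccumulator shared = new SumAccumulator(5);
    Map<String, List<SumAccumulator>> byWindow = new LinkedHashMap<>();
    // Either reuse the same instance in both windows or give each window its own copy.
    byWindow.put(
        "w1", Arrays.asList(duplicatePerWindow ? shared.copy() : shared, new SumAccumulator(1)));
    byWindow.put(
        "w2", Arrays.asList(duplicatePerWindow ? shared.copy() : shared, new SumAccumulator(2)));

    Map<String, Long> merged = new LinkedHashMap<>();
    for (Map.Entry<String, List<SumAccumulator>> entry : byWindow.entrySet()) {
      merged.put(entry.getKey(), mergeInPlace(entry.getValue()).sum);
    }
    return merged;
  }

  public static void main(String[] args) {
    System.out.println(mergePerWindow(false)); // {w1=6, w2=8}: w2 sees w1's mutation
    System.out.println(mergePerWindow(true)); // {w1=6, w2=7}: each window is independent
  }
}
```

With a shared instance, the merge for `w1` changes the value that `w2` later reads, so `w2` ends up with 8 instead of 7. Copying only when an accumulator lands in more than one window keeps the extra cost limited to that case.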
Run Spark StructuredStreaming ValidatesRunner |
Run Java PreCommit |
This LGTM, but I would prefer that @echauchot take a look before merging. He has been optimizing this code for a while, so it is better to make him aware of the issue and of the minor performance hit from the extra encoding. |
I would prefer to take the fix now and optimize for performance afterwards. The implementation I suggested only duplicates the accumulator when a value is in multiple windows, which is uncommon in practice. |
Good point. If @echauchot has a suggestion or a better way to do this, we can improve it in the future. At least this fixes the test breakage and produces correct results. |
LGTM
@lukecwik I don't see why this change is necessary, for two reasons:
|
@echauchot The ValidatesRunner (VR) tests were breaking on this (I don't know why; maybe the tests were improved). That is why Luke opened this PR: it was needed at least for correctness. You can reproduce this by reverting this PR and running the tests:
produces
Something odd I noticed: if you run the single test on its own, it passes, so there may be some interleaving issue with other tests. The VR suite of the Structured Streaming Runner has been broken since September 10 because of this issue and BEAM-11023. |
thanks @iemejia for the context. Strange |