Fix RowCoderGenerator to use the encodingPositions when encoding and decoding the bit set representing null fields. #32389
R: @reuvenlax
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment
Also added internal dataflow tests verifying that this fixes update with reordered schemas with null fields.
@Abacn @reuvenlax friendly ping
thanks, left a few questions
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java
@VisibleForTesting
static void clearRowCoderCache() {
  synchronized (setOverridesLock) {
GENERATED_CODERS is already a synchronized map, so it usually does not need to be wrapped in a synchronized block. I see that "setOverridesLock" is used in other places, which is probably the reason. If so, consider adding a comment noting this?
Coders can be read without the synchronized block, but they are written with the extra synchronization because ConcurrentHashMap's computeIfAbsent is not reentrant (unlike synchronized) and the get/insert pattern is possibly racy.
That said, since the coders that are read are cached, I think I will just switch to regular maps under synchronization and drop the ConcurrentHashMap.
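To illustrate the "regular maps under synchronization" idea, here is a minimal sketch. It is not the code in RowCoderGenerator: the class `CoderCacheSketch`, the `getOrBuild` method, and the string "coders" are hypothetical stand-ins. It relies on two things I am confident of: Java's `synchronized` is reentrant for the owning thread, and the `ConcurrentHashMap.computeIfAbsent` documentation says the mapping function must not update other mappings of the same map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a cache of "coders" (plain strings here) kept in a
// regular HashMap and guarded by one lock for both reads and writes.
final class CoderCacheSketch {
  private static final Object LOCK = new Object();
  private static final Map<String, String> CACHE = new HashMap<>();
  static int builds = 0; // counts cache misses, for illustration only

  // nestedSchema maps a schema name to the schema it nests (assumption:
  // at most one nested schema, no cycles).
  static String getOrBuild(String schema, Map<String, String> nestedSchema) {
    synchronized (LOCK) {
      String cached = CACHE.get(schema);
      if (cached != null) {
        return cached;
      }
      String nested = nestedSchema.get(schema);
      // Building a coder may need the nested coder first. This call re-enters
      // LOCK on the same thread, which synchronized allows.
      String inner = nested == null ? "" : getOrBuild(nested, nestedSchema);
      String built = "coder(" + schema + (inner.isEmpty() ? "" : ", " + inner) + ")";
      builds++;
      CACHE.put(schema, built);
      return built;
    }
  }

  static void clear() {
    synchronized (LOCK) {
      CACHE.clear();
      builds = 0;
    }
  }

  public static void main(String[] args) {
    Map<String, String> nestedSchema = new HashMap<>();
    nestedSchema.put("outer", "inner");
    System.out.println(getOrBuild("outer", nestedSchema));
  }
}
```

The trade-off in this sketch is that every read takes the lock, which is acceptable only if callers cache the returned coder, as the comment above suggests they do.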
@@ -425,7 +538,7 @@ static Row decodeDelegate(
       // in which case we drop the extra fields.
       if (encodingPos < coders.length) {
         int rowIndex = encodingPosToIndex[encodingPos];
-        if (nullFields.get(rowIndex)) {
+        if (nullFields.get(encodingPos)) {
Was this a bug? rowIndex and encodingPos look different.
Yes. This, along with the other nullFields fix above, is the purpose of this PR, which fixes #32388.
The stack trace and synchronization changes were added because the initial belief was that the encoding corruption was caused by late-arriving overrides. Since that could still be an issue, I think we should keep those changes, but I can move them to a separate PR if you'd prefer.
Note that unless there are encoding overrides, rowIndex and encodingPos are equal. The improved unit tests catch the issue; the previous tests with encoding overrides didn't have null fields and thus missed it.
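The mismatch can be reproduced in a small standalone sketch. This is not Beam's generated coder: `NullBitSetSketch`, `Encoded`, `encode`, `decode`, and the `indexByEncodingPos` flag are hypothetical, and the sketch assumes the encoder sets null bits by encoding position. With identity positions both decoders agree; with reordered positions and a null field, only the decoder that reads the bit set by encoding position round-trips the row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class NullBitSetSketch {
  // Hypothetical encoded form: null bits plus non-null values in encoding order.
  static final class Encoded {
    final BitSet nullFields;
    final List<Object> values;

    Encoded(BitSet nullFields, List<Object> values) {
      this.nullFields = nullFields;
      this.values = values;
    }
  }

  // encodingPositions[rowIndex] = encodingPos; this builds the inverse mapping.
  static int[] invert(int[] encodingPositions) {
    int[] encodingPosToIndex = new int[encodingPositions.length];
    for (int rowIndex = 0; rowIndex < encodingPositions.length; rowIndex++) {
      encodingPosToIndex[encodingPositions[rowIndex]] = rowIndex;
    }
    return encodingPosToIndex;
  }

  static Encoded encode(Object[] row, int[] encodingPositions) {
    int[] encodingPosToIndex = invert(encodingPositions);
    BitSet nullFields = new BitSet(row.length);
    List<Object> values = new ArrayList<>();
    for (int encodingPos = 0; encodingPos < row.length; encodingPos++) {
      Object value = row[encodingPosToIndex[encodingPos]];
      if (value == null) {
        nullFields.set(encodingPos); // bit is keyed by encoding position
      } else {
        values.add(value);
      }
    }
    return new Encoded(nullFields, values);
  }

  // indexByEncodingPos=false mimics the pre-fix lookup by rowIndex.
  static Object[] decode(Encoded encoded, int[] encodingPositions, boolean indexByEncodingPos) {
    int[] encodingPosToIndex = invert(encodingPositions);
    Object[] row = new Object[encodingPositions.length];
    int next = 0;
    for (int encodingPos = 0; encodingPos < row.length; encodingPos++) {
      int rowIndex = encodingPosToIndex[encodingPos];
      boolean isNull = encoded.nullFields.get(indexByEncodingPos ? encodingPos : rowIndex);
      row[rowIndex] = isNull ? null : encoded.values.get(next++);
    }
    return row;
  }

  public static void main(String[] args) {
    Object[] row = {"a", null, "c"};
    int[] reordered = {2, 0, 1};
    Encoded encoded = encode(row, reordered);
    System.out.println(Arrays.toString(decode(encoded, reordered, true)));
    System.out.println(Arrays.toString(decode(encoded, reordered, false)));
  }
}
```

For the row `{"a", null, "c"}` with positions `{2, 0, 1}`, the lookup by encoding position restores `[a, null, c]`, while the lookup by row index yields `[null, c, a]`. A test with overrides but no null fields would not notice the difference, since the bit set is then empty under either indexing.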
…decoding the bit set representing null fields. (70ab14b to a52564a)
thank you!
Add tests, covering both encoding and decoding, that fail without the change.
Also add tests that cover the static position overrides, which were not tested previously.
Some other cleanup to help debug other possible encoding positions issues in the future:
fixes #32388