[SPARK-15822][SQL] Prevent byte array backed classes from referencing freed memory #13723
What changes were proposed in this pull request?
`UTF8String` and all `Unsafe*` classes are backed by either on-heap or off-heap byte arrays. The code-generated version of `SortMergeJoin` buffers the left-hand side join keys during iteration. This was problematic in off-heap mode when one of the keys is a `UTF8String` (or any other `Unsafe*` object) and the left-hand side iterator was exhausted and released its memory: the buffered keys then referenced freed memory. Using one of these buffered keys causes segfaults and all kinds of other undefined behavior.

This PR fixes the problem by creating copies of the buffered variables. I have added a general method to the `CodeGenerator` for this. I have checked all places in which this could happen, and only `SortMergeJoin` had this problem.

This PR is largely based on the work of @robbinspg, and he should be credited for it.
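To make the failure mode concrete, here is a minimal, self-contained Java model of it. It does not use Spark and is not the code in this PR: `ByteView` and `Page` are hypothetical stand-ins for an `Unsafe*`-style object that points into memory it does not own and for the memory released by an exhausted iterator. Overwriting the page with `?` stands in for freed memory being reused; real freed off-heap memory may instead fault on access. Buffering the view keeps a dangling reference, while buffering a copy does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BufferedKeyDemo {

    // Hypothetical stand-in for an Unsafe*-style object: it does not own its
    // bytes, it only points at a region (base array + offset + length).
    static final class ByteView {
        final byte[] base;
        final int offset;
        final int length;

        ByteView(byte[] base, int offset, int length) {
            this.base = base;
            this.offset = offset;
            this.length = length;
        }

        // Copy the referenced bytes into a fresh array owned by the new view,
        // so it no longer depends on the lifetime of the original memory.
        ByteView copy() {
            byte[] owned = Arrays.copyOfRange(base, offset, offset + length);
            return new ByteView(owned, 0, length);
        }

        @Override
        public String toString() {
            return new String(base, offset, length, StandardCharsets.UTF_8);
        }
    }

    // Hypothetical memory page owned by an iterator. free() overwrites the
    // bytes with '?' to simulate the memory being released and reused.
    static final class Page {
        final byte[] bytes;

        Page(String contents) {
            this.bytes = contents.getBytes(StandardCharsets.UTF_8);
        }

        ByteView view(int offset, int length) {
            return new ByteView(bytes, offset, length);
        }

        void free() {
            Arrays.fill(bytes, (byte) '?');
        }
    }

    // Buffers two keys that point into the page, then frees the page and
    // reads the buffered keys back.
    static List<String> bufferKeys(boolean copyBeforeBuffering) {
        Page page = new Page("abcdef");
        List<ByteView> buffered = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            ByteView key = page.view(i * 3, 3);
            buffered.add(copyBeforeBuffering ? key.copy() : key);
        }
        // The left-hand side iterator is exhausted and releases its memory.
        page.free();
        List<String> out = new ArrayList<>();
        for (ByteView k : buffered) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("without copy: " + bufferKeys(false)); // [???, ???]
        System.out.println("with copy:    " + bufferKeys(true));  // [abc, def]
    }
}
```

In this model, the cost of the fix is one allocation and byte copy per buffered key; values that are only read while their backing memory is still live need no copy.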
closes #13707
How was this patch tested?
Manually tested on problematic workloads.