[SPARK-15822][SQL] Prevent byte array backed classes from referencing freed memory #13723

Closed
wants to merge 7 commits into from

hvanhovell
Contributor

What changes were proposed in this pull request?

UTF8String and all Unsafe* classes are backed by either on-heap or off-heap byte arrays. The code-generated version of SortMergeJoin buffers the left-hand-side join keys during iteration. This is problematic in off-heap mode when one of the keys is a UTF8String (or any other Unsafe* object) and the left-hand-side iterator has been exhausted (and has released its memory): the buffered keys then reference freed memory. Using one of these buffered keys causes segfaults and all kinds of other undefined behavior.

This PR fixes this problem by creating copies of the buffered variables. I have added a general method to the CodeGenerator for this. I have checked all places in which this could happen, and only SortMergeJoin had this problem.
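
The failure mode and the fix can be illustrated with a minimal, self-contained sketch. This is not the generated SortMergeJoin code or the new CodeGenerator method: `ByteView`, `Page`, and `readAfterFree` are hypothetical names invented for illustration, and "freeing" is simulated on-heap by overwriting the page's bytes, because a real off-heap use-after-free is undefined behavior and cannot be shown deterministically.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BufferedKeyDemo {

    /** Hypothetical stand-in for a byte-array-backed value such as UTF8String. */
    static final class ByteView {
        private final byte[] base;
        private final int offset;
        private final int length;

        ByteView(byte[] base, int offset, int length) {
            this.base = base;
            this.offset = offset;
            this.length = length;
        }

        /** Returns a view backed by its own private array, independent of the source. */
        ByteView copy() {
            return new ByteView(Arrays.copyOfRange(base, offset, offset + length), 0, length);
        }

        @Override
        public String toString() {
            return new String(base, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical stand-in for memory owned by the left-hand-side iterator. */
    static final class Page {
        private final byte[] bytes;

        Page(String contents) {
            this.bytes = contents.getBytes(StandardCharsets.UTF_8);
        }

        /** Hands out a view that shares this page's bytes (no copy). */
        ByteView view() {
            return new ByteView(bytes, 0, bytes.length);
        }

        /** Simulates releasing the memory: the old contents are no longer valid. */
        void free() {
            Arrays.fill(bytes, (byte) '?');
        }
    }

    /** Buffers a key, frees the page it came from, then reads the buffered key. */
    static String readAfterFree(boolean copyBeforeBuffering) {
        Page page = new Page("key-1");
        ByteView key = page.view();
        ByteView buffered = copyBeforeBuffering ? key.copy() : key;
        page.free(); // the iterator is exhausted and releases its memory
        return buffered.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAfterFree(false)); // prints "?????" (stale view)
        System.out.println(readAfterFree(true));  // prints "key-1" (copy survives)
    }
}
```

In this sketch the uncopied view silently returns garbage, which is the benign on-heap analogue of the segfaults seen off-heap; copying before buffering decouples the key's lifetime from the iterator's memory at the cost of one extra allocation per buffered key.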

This PR is largely based on the work of @robbinspg and he should be credited for this.

closes #13707

How was this patch tested?

Manually tested on problematic workloads.

@hvanhovell
Contributor Author

cc @davies @rxin @sameeragarwal

@SparkQA

SparkQA commented Jun 17, 2016

Test build #60674 has finished for PR 13723 at commit 5c33eac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jun 17, 2016

lgtm.

@davies
Contributor

davies commented Jun 17, 2016

LGTM,
Merging this into master and 2.0, thanks!

asfgit pushed a commit that referenced this pull request Jun 17, 2016
…g freed memory

## What changes were proposed in this pull request?
`UTF8String` and all `Unsafe*` classes are backed by either on-heap or off-heap byte arrays. The code-generated version of `SortMergeJoin` buffers the left-hand-side join keys during iteration. This is problematic in off-heap mode when one of the keys is a `UTF8String` (or any other `Unsafe*` object) and the left-hand-side iterator has been exhausted (and has released its memory): the buffered keys then reference freed memory. Using one of these buffered keys causes segfaults and all kinds of other undefined behavior.

This PR fixes this problem by creating copies of the buffered variables. I have added a general method to the `CodeGenerator` for this. I have checked all places in which this could happen, and only `SortMergeJoin` had this problem.

This PR is largely based on the work of robbinspg and he should be credited for this.

closes #13707

## How was this patch tested?
Manually tested on problematic workloads.

Author: Pete Robbins <robbinspg@gmail.com>
Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #13723 from hvanhovell/SPARK-15822-2.

(cherry picked from commit 5ada606)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
@asfgit asfgit closed this in 5ada606 Jun 17, 2016
5 participants