[SPARK-22143][SQL][BRANCH-2.2] Fix memory leak in OffHeapColumnVector #19378

Closed
wants to merge 3 commits into from

hvanhovell
Contributor

This is a backport of 02bb068.

What changes were proposed in this pull request?

WritableColumnVector does not close its child column vectors. This can cause memory leaks in OffHeapColumnVector, because the memory allocated by a vector's children is never freed. The leak can be especially bad for string columns, which use a child byte column vector.
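
To illustrate the idea, here is a minimal sketch of a parent vector releasing its children in `close()`. It is a simplified model, not the actual patch: `ChildCloseSketch`, `SketchVector`, `stringColumn`, and the `ALLOCATED` counter are hypothetical names, the byte sizes are arbitrary, and no real off-heap memory is allocated.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified model of a column vector that owns child vectors; not Spark's classes. */
public class ChildCloseSketch {
    // Stands in for off-heap memory accounting (assumption: nothing native is allocated).
    private static final AtomicLong ALLOCATED = new AtomicLong();

    public static long allocatedBytes() {
        return ALLOCATED.get();
    }

    public static final class SketchVector implements AutoCloseable {
        private final long bytes;
        private SketchVector[] children;
        private boolean closed;

        SketchVector(long bytes, SketchVector... children) {
            this.bytes = bytes;
            this.children = children;
            ALLOCATED.addAndGet(bytes); // record this vector's own buffer
        }

        @Override
        public void close() {
            if (closed) {
                return; // closing twice must not release memory twice
            }
            closed = true;
            // Release the children first; skipping this loop is the leak described above.
            if (children != null) {
                for (SketchVector child : children) {
                    if (child != null) {
                        child.close();
                    }
                }
                children = null;
            }
            ALLOCATED.addAndGet(-bytes); // release this vector's own buffer
        }
    }

    /** Hypothetical string column: an offsets buffer plus a child byte vector. */
    public static SketchVector stringColumn(int rows) {
        SketchVector byteData = new SketchVector(rows * 16L);
        return new SketchVector(rows * 8L, byteData);
    }

    public static void main(String[] args) {
        SketchVector col = stringColumn(1024);
        System.out.println("allocated before close: " + allocatedBytes());
        col.close();
        System.out.println("allocated after close: " + allocatedBytes());
    }
}
```

In this model, removing the loop over `children` would leave the child byte vector's bytes counted as allocated after the parent is closed, which mirrors the leak for string columns.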

How was this patch tested?

I have updated the existing tests to always use both on-heap and off-heap vectors. Testing and diagnosis were done locally.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes apache#19367 from hvanhovell/SPARK-22143.

# Conflicts:
#	sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
#	sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala
#	sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
@SparkQA

SparkQA commented Sep 28, 2017

Test build #82273 has finished for PR 19378 at commit 20540ec.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 28, 2017

Test build #82279 has finished for PR 19378 at commit a82c494.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Sep 28, 2017
This is a backport of 02bb068.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #19378 from hvanhovell/SPARK-22143-2.2.
@hvanhovell
Contributor Author

Merging to 2.2

@hvanhovell hvanhovell closed this Sep 28, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
This is a backport of apache@02bb068.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes apache#19378 from hvanhovell/SPARK-22143-2.2.