[SPARK-21860][core]Improve memory reuse for heap memory in `HeapMemoryAllocator` #19077

10110346 · 2017-08-29T08:52:44Z

What changes were proposed in this pull request?

In HeapMemoryAllocator, memory is allocated from a pool whose key is the requested memory size.
Some sizes, such as 1025 bytes, 1026 bytes, ..., 1032 bytes, can be treated as the same, because we allocate memory in multiples of 8 bytes.
Keying the pool on the 8-byte-aligned size lets such requests share pooled memory, which improves memory reuse.
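
The idea described above can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the actual HeapMemoryAllocator: the class `AlignedPoolSketch` and its methods are made-up names, and it leaves out the weak references and pooling threshold that the real allocator uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Spark code): a free-list pool keyed by aligned size.
final class AlignedPoolSketch {
  // Key: size in bytes rounded up to a multiple of 8. Value: free arrays of that size.
  private final Map<Long, Deque<long[]>> pool = new HashMap<>();

  // Round a byte count up to the next multiple of 8.
  static long alignToWord(long size) {
    return ((size + 7) / 8) * 8;
  }

  // Reuse a pooled array with the same aligned size if one exists, else allocate.
  long[] allocate(long size) {
    long alignedSize = alignToWord(size);
    Deque<long[]> free = pool.get(alignedSize);
    if (free != null && !free.isEmpty()) {
      return free.pop();
    }
    return new long[(int) (alignedSize / 8)];
  }

  // Return an array to the pool under its aligned size (length * 8 bytes).
  void free(long[] array) {
    pool.computeIfAbsent(array.length * 8L, k -> new ArrayDeque<>()).push(array);
  }
}
```

With this keying, a request for 1032 bytes can reuse an array that was freed after a 1025-byte request, since both round up to 1032 bytes (129 words).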

How was this patch tested?

Existing tests and added unit tests

srowen · 2017-08-29T09:43:30Z

This and the JIRA need a better title

srowen

I don't feel so qualified to review this, but it does make sense. In practice, is it common to allocate arrays whose sizes are within 8 bytes of each other? Just wondering if we have any idea of the impact.

srowen · 2017-08-29T09:45:28Z

common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java

@@ -47,24 +47,25 @@ private boolean shouldPool(long size) {

  @Override
  public MemoryBlock allocate(long size) throws OutOfMemoryError {
-    if (shouldPool(size)) {
+    int arraySize = (int)((size + 7) / 8);
+    if (shouldPool(arraySize * 8)) {


Factor out arraySize * 8 as alignedSize or something

SparkQA · 2017-08-29T11:28:15Z

Test build #81208 has finished for PR 19077 at commit f168325.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-08-29T12:11:20Z

Test build #81212 has finished for PR 19077 at commit 6862403.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-08-30T10:55:46Z

Test build #81251 has finished for PR 19077 at commit b00b685.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jerryshao · 2017-08-30T13:39:12Z

common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java

        if (pool != null) {
          while (!pool.isEmpty()) {
            final WeakReference<MemoryBlock> blockReference = pool.pop();
            final MemoryBlock memory = blockReference.get();
            if (memory != null) {
-              assert (memory.size() == size);
+              assert ((int)((memory.size() + 7) / 8) == arraySize);
+              memory.resetSize(size);


Hmm, from my understanding the size of MemoryBlock is always the actual size, not the aligned size, so it looks like we don't need to reset the size here.

Yes, the size of MemoryBlock is always the actual size. When we reuse a previously pooled block, it still carries the size of the earlier request, so we should set the size of the MemoryBlock to the actual size of the new request.

I got it, thanks for the explanation.
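
To make the exchange above concrete, here is a small sketch of the reuse path. It assumes stand-in types: `BlockSketch` and `ReuseSketch` are hypothetical and only mimic the shape of the diff (a pool of weak references, a live-reference check, then a size reset). It is not the real MemoryBlock API.

```java
import java.lang.ref.WeakReference;
import java.util.LinkedList;

// Hypothetical stand-in for a memory block (not Spark's MemoryBlock).
final class BlockSketch {
  final long[] array;  // backing storage, always a whole number of 8-byte words
  long size;           // logical size in bytes, as requested by the current owner

  BlockSketch(long[] array, long size) {
    this.array = array;
    this.size = size;
  }
}

final class ReuseSketch {
  // Pop pooled blocks until a live one is found, then stamp it with the new size.
  static BlockSketch reuse(LinkedList<WeakReference<BlockSketch>> pool, long size) {
    while (!pool.isEmpty()) {
      BlockSketch block = pool.pop().get();
      if (block != null) {   // null means the GC has already reclaimed the block
        block.size = size;   // the previous owner may have requested a different size
        return block;
      }
    }
    return null;             // nothing reusable; the caller allocates a fresh block
  }
}
```

Without the reset, a block pooled after a 1025-byte request would still report 1025 bytes when handed out for a 1030-byte request.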

jerryshao · 2017-08-30T13:41:01Z

Can you please add some unit tests to verify your changes?

10110346 · 2017-08-31T01:35:50Z

@jerryshao Thanks, I will add unit tests.

SparkQA · 2017-08-31T04:57:06Z

Test build #81272 has finished for PR 19077 at commit fc8b895.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-08-31T04:59:27Z

Test build #81273 has finished for PR 19077 at commit ba5717e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jerryshao · 2017-08-31T13:49:45Z

This PR generally looks fine to me. My concern is whether this change will have a subtle impact on the code that relies on it.

CC @JoshRosen for review.

JoshRosen · 2017-09-01T02:11:59Z

Just curious: do you know where we are allocating these close-in-size chunks of memory? I understand the motivation, but I'd like to know what's causing this pattern. I think the original idea here was that most allocations would come from a small set of sizes (usually the page size, or a configurable buffer size) and would not generally be arbitrarily sized allocations.

JoshRosen · 2017-09-01T02:15:44Z

common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java

@@ -47,23 +47,29 @@ private boolean shouldPool(long size) {

  @Override
  public MemoryBlock allocate(long size) throws OutOfMemoryError {
-    if (shouldPool(size)) {
+    int arraySize = (int)((size + 7) / 8);


You might be able to use ByteArrayMethods.roundNumberOfBytesToNearestWord for this, which we've done for similar rounding elsewhere. It makes it a bit easier to spot what's happening.

But the input parameter of roundNumberOfBytesToNearestWord is an int.

Maybe we should make the method handle long values.
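
For illustration, a word-rounding helper that accepts long values could look something like the sketch below. The class `WordRounding` and the method `roundUpToWord` are hypothetical names chosen for this example; this is not the signature or body of the Spark method discussed above.

```java
// Hypothetical helper (illustrative name and signature, not Spark's API).
final class WordRounding {
  // Round a non-negative byte count up to the next multiple of 8.
  static long roundUpToWord(long numBytes) {
    long remainder = numBytes & 0x07;  // numBytes mod 8 for non-negative input
    return remainder == 0 ? numBytes : numBytes + (8 - remainder);
  }
}
```

Taking a long avoids the narrowing cast at the call site, so sizes above Integer.MAX_VALUE round correctly instead of overflowing.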

10110346 · 2017-09-01T03:06:48Z

@jerryshao @JoshRosen Yes, allocations would not generally be arbitrarily sized. Basically, we allocate memory in multiples of 4 or 8 bytes; even so, I think this change is still beneficial.
Also, I think this change will not affect the code that relies on it, because MemoryBlock is not changed.

viirya · 2017-09-07T03:37:31Z

common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java

@@ -47,23 +48,29 @@ private boolean shouldPool(long size) {

  @Override
  public MemoryBlock allocate(long size) throws OutOfMemoryError {
-    if (shouldPool(size)) {
+    long alignedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(size);


Maybe minor, but some allocations that were not pooled before will now go through the pooling mechanism, e.g. POOLING_THRESHOLD_BYTES - 1.

Yeah, I think it's acceptable.
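
A small worked example of that edge case follows. Everything in it is an assumption made for illustration: the threshold value of 1 MiB, the `>=` comparison in `shouldPool`, and the class name `ThresholdSketch` are not taken from the diff shown here.

```java
// Hypothetical sketch of the threshold edge case (values and rule are assumed).
final class ThresholdSketch {
  // Assumed threshold, for illustration only.
  static final long POOLING_THRESHOLD_BYTES = 1024 * 1024;

  // Assumed pooling rule: pool anything at or above the threshold.
  static boolean shouldPool(long size) {
    return size >= POOLING_THRESHOLD_BYTES;
  }

  // Round a byte count up to the next multiple of 8.
  static long align(long size) {
    return ((size + 7) / 8) * 8;
  }
}
```

Under these assumptions, a request of POOLING_THRESHOLD_BYTES - 1 bytes falls below the threshold when checked by raw size, but its aligned size equals the threshold, so checking the aligned size makes it eligible for pooling.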

viirya · 2017-09-07T04:00:43Z

common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java

@@ -48,6 +48,13 @@ public long size() {
  }

  /**
+   * Reset the size of the memory block.
+   */


It is dangerous to reset to an invalid size. We should add a check here or put a WARNING in the method comment.

Thanks, I will add a check.
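
One possible shape for such a check is sketched below. This is an assumption about what a guard could look like, using a hypothetical class `ResizableBlockSketch` backed by a long array; the check that actually went into the PR is not shown in this thread.

```java
// Hypothetical sketch of a guarded size reset (not Spark's MemoryBlock).
final class ResizableBlockSketch {
  private final long[] array;  // backing storage in 8-byte words
  private long size;           // logical size in bytes

  ResizableBlockSketch(long[] array, long size) {
    this.array = array;
    this.size = size;
  }

  long size() {
    return size;
  }

  // Reject sizes that are negative or exceed the capacity of the backing array.
  void resetSize(long newSize) {
    long capacity = array.length * 8L;
    if (newSize < 0 || newSize > capacity) {
      throw new IllegalArgumentException(
          "Invalid size " + newSize + " for a block with capacity " + capacity);
    }
    this.size = newSize;
  }
}
```

Bounding the new size by the backing array's capacity keeps a reused block from claiming more bytes than it actually owns.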

SparkQA · 2017-09-07T06:00:21Z

Test build #81491 has finished for PR 19077 at commit 729df24.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-09-07T07:04:47Z

Test build #81492 has finished for PR 19077 at commit 0c6647c.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

10110346 · 2017-09-07T07:29:17Z

retest this please

SparkQA · 2017-09-07T10:34:22Z

Test build #81508 has finished for PR 19077 at commit 0c6647c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2017-10-06T18:05:28Z

gentle ping @jerryshao for review

jerryshao

The change itself looks fine to me. However, I'm not sure whether there's any potential impact on the code that relies on it; hopefully someone could take another look.

jerryshao · 2017-10-09T03:30:18Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

      val unsafeArraySizeInBytes =
        UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
-        ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
+        ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes).toInt


Minor: why don't we inline this instead of creating a new variable?

The size of this line is larger than 200 bytes

We should really inline that.

srowen reviewed Aug 29, 2017

10110346 force-pushed the headmemoptimize branch from f168325 to 6862403 Compare August 29, 2017 10:00

10110346 changed the title ~~[SPARK-21860][core]Optimize heap memory allocator~~ [SPARK-21860][core]Improve memory reuse for heap memory in HeapMemoryAllocator Aug 29, 2017

10110346 force-pushed the headmemoptimize branch from 6862403 to b00b685 Compare August 30, 2017 07:44

jerryshao reviewed Aug 30, 2017

10110346 force-pushed the headmemoptimize branch 2 times, most recently from fc8b895 to ba5717e Compare August 31, 2017 01:43

dubin555 approved these changes Aug 31, 2017

JoshRosen reviewed Sep 1, 2017

10110346 force-pushed the headmemoptimize branch from ba5717e to 729df24 Compare September 7, 2017 02:43

viirya reviewed Sep 7, 2017

10110346 force-pushed the headmemoptimize branch from 729df24 to 0c6647c Compare September 7, 2017 04:22

jerryshao reviewed Oct 9, 2017

zzcclp added a commit to zzcclp/spark that referenced this pull request Sep 20, 2019

[EXT][SPARK-21860][CORE][FOLLOWUP] fix java style error apache#20558

290b040

PR-apache#19077 introduced a Java style error (too long line). Quick fix.
