
[SPARK-14290][SPARK-13352][CORE][backport-1.6] avoid significant memory copy in Netty's tran… #12296

@liyezhang556520 (Contributor)

commented Apr 11, 2016

What changes were proposed in this pull request?

When Netty transfers data that is not a `FileRegion`, the data is passed as a `ByteBuf`. If the data is large, a significant performance problem arises because `sun.nio.ch.IOUtil.write` makes a memory copy under the hood: CPU usage stays at 100% while network throughput remains very low.

In this PR, if the data size is large, we split it into small chunks for each `WritableByteChannel.write()` call, which avoids the wasteful large memory copy. Because the data cannot be written in a single write, `transferTo` will be called multiple times.
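The chunking idea can be sketched as follows. This is a hedged illustration, not the actual patch: the class name `ChunkedWrite`, the helper `writeChunk`, and the 256 KiB `NIO_BUFFER_LIMIT` constant are assumptions made for this example (the real change lives in `network/common`'s `MessageWithHeader.java`). Capping the bytes offered to each `write()` call keeps the copy made inside `sun.nio.ch.IOUtil.write` small:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class ChunkedWrite {
    // Hypothetical chunk limit for illustration; the constant in the patch may differ.
    private static final int NIO_BUFFER_LIMIT = 256 * 1024;

    // Write at most NIO_BUFFER_LIMIT bytes per write() call by temporarily
    // narrowing the buffer's limit, so the underlying copy stays small.
    static int writeChunk(WritableByteChannel channel, ByteBuffer buf) throws IOException {
        int originalLimit = buf.limit();
        try {
            int chunk = Math.min(buf.remaining(), NIO_BUFFER_LIMIT);
            buf.limit(buf.position() + chunk);
            return channel.write(buf);
        } finally {
            buf.limit(originalLimit); // restore so the caller can keep writing
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        WritableByteChannel channel = Channels.newChannel(out);
        ByteBuffer data = ByteBuffer.allocate(1024 * 1024); // 1 MiB payload
        int calls = 0;
        while (data.hasRemaining()) { // transferTo is re-invoked similarly until done
            writeChunk(channel, data);
            calls++;
        }
        System.out.println("bytes=" + out.size() + " calls=" + calls); // prints: bytes=1048576 calls=4
    }
}
```

Running `main` against an in-memory channel writes the 1 MiB buffer in four calls, mirroring how `transferTo` is invoked repeatedly until the whole message is written.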

How was this patch tested?

Spark unit test and manual test.
Manual test:
`sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length`

For more details, please refer to SPARK-14290

@liyezhang556520 (Contributor, Author) commented Apr 11, 2016

cc @davies

@liyezhang556520 liyezhang556520 changed the title [SPARK-14290][CORE][backport-1.6] avoid significant memory copy in Netty's tran… [SPARK-14290][SPARK-13352][CORE][backport-1.6] avoid significant memory copy in Netty's tran… Apr 11, 2016

@SparkQA commented Apr 11, 2016

Test build #55516 has finished for PR 12296 at commit 9e37e7c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@davies (Contributor) commented Apr 11, 2016

LGTM, merging this into branch-1.6, thanks!

asfgit pushed a commit that referenced this pull request Apr 11, 2016

[SPARK-14290] [SPARK-13352] [CORE] [BACKPORT-1.6] avoid significant memory copy in Netty's tran…


Author: Zhang, Liye <liye.zhang@intel.com>

Closes #12296 from liyezhang556520/apache-branch-1.6-spark-14290.

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 12, 2016

[SPARK-14290] [SPARK-13352] [CORE] [BACKPORT-1.6] avoid significant memory copy in Netty's tran…


Author: Zhang, Liye <liye.zhang@intel.com>

Closes apache#12296 from liyezhang556520/apache-branch-1.6-spark-14290.

(cherry picked from commit baf2985)

Conflicts:
	network/common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java