
Netty performance tracking #1161

Closed
zuston opened this issue Aug 21, 2023 · 3 comments · Fixed by #1162

Comments
@zuston (Member) commented Aug 21, 2023

Netty performance tracking

Sub-tasks

  1. [#1161] improvement: Reduce the data copy #1162
  2. [Improvement] Return the direct byte buffer when reading localfile data via Netty #1163
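Sub-task 2 is about handing the direct buffer from a localfile read straight to the network layer instead of copying through the heap. A minimal sketch of that idea with plain `java.nio` (class and method names here are illustrative, not Uniffle's actual API):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LocalFileDirectRead {
    // Read a slice of a local file into a direct buffer so it can be
    // passed downstream without an intermediate heap byte[] copy.
    static ByteBuffer readDirect(Path file, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(length);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (buf.hasRemaining()) {
                // Position-based read: fills buf without moving the channel position.
                if (ch.read(buf, offset + buf.position()) < 0) {
                    break; // EOF before the requested length was available
                }
            }
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("shuffle", ".data");
        Files.write(tmp, new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        ByteBuffer data = readDirect(tmp, 4, 4);
        System.out.println(data.isDirect());  // true
        System.out.println(data.remaining()); // 4
        Files.delete(tmp);
    }
}
```

With a transport like Netty, such a direct buffer can be wrapped and written out without ever materializing the bytes on the Java heap.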

Benchmark

Tested with both the gRPC-based and the Netty-based server.

Environment

Software: Uniffle master / Hadoop 3.2.2 / Spark 3.1.2

Hardware: 96 cores, 512 GB memory, 4 × 1 TB SSD per machine, 8 GB/s network bandwidth

Hadoop YARN cluster: 1 ResourceManager + 40 NodeManagers, each machine with 4 × 1 TB SSD

Uniffle cluster: 1 Coordinator + 5 Shuffle Servers, each machine with 4 × 1 TB SSD

Configuration

Spark conf

spark.executor.instances 400
spark.executor.cores 1
spark.executor.memory 2g
spark.shuffle.manager org.apache.spark.shuffle.RssShuffleManager
spark.rss.storage.type MEMORY_LOCALFILE

Uniffle gRPC-based server conf

JVM XMX=200g

...
rss.server.buffer.capacity 100g
rss.server.read.buffer.capacity 20g
rss.server.flush.thread.alive 20
rss.server.flush.threadPool.size 50
rss.server.high.watermark.write 80
rss.server.low.watermark.write 70
...

Uniffle Netty-based server conf

XMX_SIZE="140g"
MAX_DIRECT_MEMORY_SIZE=200g

...
rss.server.buffer.capacity 100g
rss.server.read.buffer.capacity 20g
rss.server.flush.thread.alive 20
rss.server.flush.threadPool.size 50
rss.server.high.watermark.write 80
rss.server.low.watermark.write 70
...
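The `rss.server.high.watermark.write` / `rss.server.low.watermark.write` settings above are percentages of the buffer capacity. A hypothetical sketch of the usual watermark semantics (flushing starts once usage crosses the high watermark and stops once it drops below the low one; the class and method names are illustrative, not Uniffle's actual code):

```java
public class WatermarkDemo {
    // Flushing should begin once used memory reaches highPct% of capacity.
    static boolean shouldStartFlush(long used, long capacity, int highPct) {
        return used * 100 >= capacity * highPct;
    }

    // Flushing should stop once used memory falls to lowPct% of capacity or below.
    static boolean shouldStopFlush(long used, long capacity, int lowPct) {
        return used * 100 <= capacity * lowPct;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long capacity = 100 * gb; // matches rss.server.buffer.capacity 100g
        System.out.println(shouldStartFlush(85 * gb, capacity, 80)); // true
        System.out.println(shouldStopFlush(65 * gb, capacity, 70));  // true
    }
}
```

The gap between the two watermarks keeps the server from oscillating between flushing and accepting writes.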

Report

type          5T (run with 400 executors)
grpc-based    3.6 min / 5.9 min
netty-based   3.4 min / 7.7 min

I also found that the Spark executor GC time with the Netty-based Uniffle server is higher than with the gRPC-based one.

@jerqi (Contributor) commented Aug 21, 2023

Could you enable spark.rss.client.off.heap.memory.enable?

jerqi added a commit to jerqi/incubator-uniffle that referenced this issue Aug 21, 2023
@zuston (Member, Author) commented Aug 22, 2023

> Could you enable spark.rss.client.off.heap.memory.enable?

That option only takes effect for HDFS.

@zuston (Member, Author) commented Aug 22, 2023

Another problem: remote fetches from localfile via Netty are unstable; compared with gRPC, they take much more time.

@zuston changed the title from "[Improvement] netty performance tracking" to "Netty performance tracking" on Aug 22, 2023
zuston pushed a commit that referenced this issue Aug 22, 2023
…buffer len (#1162)

### What changes were proposed in this pull request?

If we use off-heap memory and then call the `getData` method, the off-heap memory is copied into heap memory. So we should avoid calling it in Netty mode.
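The copy described above can be shown with plain `java.nio` (a hedged sketch; `copyToHeap` is a hypothetical helper standing in for the copying path, not Uniffle's actual `getData`):

```java
import java.nio.ByteBuffer;

public class DirectCopyDemo {
    // Copying path: materialize a direct (off-heap) buffer as a fresh heap
    // byte[]. Every call allocates heap garbage proportional to the data size,
    // which is what drives up executor GC time.
    static byte[] copyToHeap(ByteBuffer direct) {
        byte[] heap = new byte[direct.remaining()];
        direct.duplicate().get(heap); // duplicate() so the source's position is untouched
        return heap;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        direct.putLong(42L);
        direct.flip();

        byte[] copied = copyToHeap(direct);
        System.out.println(copied.length);  // 8

        // Zero-copy alternative: pass a view of the direct buffer downstream
        // instead of a heap copy; no heap allocation for the payload.
        ByteBuffer view = direct.slice();
        System.out.println(view.isDirect()); // true
    }
}
```

Keeping the payload in the direct buffer end to end avoids both the extra copy and the heap allocation.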

### Why are the changes needed?

Fix: #1161

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
Code Review