
[Improvement] Use ByteString#asReadOnlyByteBuffer to reduce the memory allocation and GC pressure #674

Open
zuston opened this issue Mar 2, 2023 · 15 comments

@zuston
Member

zuston commented Mar 2, 2023


What would you like to be improved?

In our production environment, the shuffle server keeps being stopped by full GCs. Even after giving it more memory, the problem still occurs, which confused me. So I suspected some memory abuse exists in Uniffle.

After digging into the gRPC service API of sendShuffleData, I found something abnormal. Please see this part of the code:

```java
private ShufflePartitionedBlock[] toPartitionedBlock(List<ShuffleBlock> blocks) {
  if (blocks == null || blocks.size() == 0) {
    return new ShufflePartitionedBlock[]{};
  }
  ShufflePartitionedBlock[] ret = new ShufflePartitionedBlock[blocks.size()];
  int i = 0;
  for (ShuffleBlock block : blocks) {
    ret[i] = new ShufflePartitionedBlock(
        block.getLength(),
        block.getUncompressLength(),
        block.getCrc(),
        block.getBlockId(),
        block.getTaskAttemptId(),
        block.getData().toByteArray()); // allocates a fresh byte[] copy for every block
    i++;
  }
  return ret;
}
```

A fresh byte[] is created on every invocation of block.getData().toByteArray(), which causes heavy GC when the data-receiving rate is high. And this has been confirmed in our dashboard.

Besides, a byte[] is also created when reading shuffle data from memory, which may not be the main trigger of this, but it should be fixed as well.

How should we improve?

We could use block.getData().asReadOnlyByteBuffer() instead of toByteArray() to avoid the unnecessary memory allocation.
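
A minimal sketch of the change, assuming ShufflePartitionedBlock gains an overload that accepts a ByteBuffer (the current constructor takes a byte[], so that overload is a hypothetical addition):

```java
private ShufflePartitionedBlock[] toPartitionedBlock(List<ShuffleBlock> blocks) {
  if (blocks == null || blocks.isEmpty()) {
    return new ShufflePartitionedBlock[]{};
  }
  ShufflePartitionedBlock[] ret = new ShufflePartitionedBlock[blocks.size()];
  int i = 0;
  for (ShuffleBlock block : blocks) {
    ret[i] = new ShufflePartitionedBlock(
        block.getLength(),
        block.getUncompressLength(),
        block.getCrc(),
        block.getBlockId(),
        block.getTaskAttemptId(),
        // read-only view over the gRPC buffer: no copy, no new byte[]
        block.getData().asReadOnlyByteBuffer());
    i++;
  }
  return ret;
}
```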

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@zuston
Member Author

zuston commented Mar 2, 2023

I think this is a critical problem for Uniffle and I will fix this ASAP. cc @jerqi @advancedxy @xianjingfeng

@zuston zuston self-assigned this Mar 2, 2023
@jerqi
Contributor

jerqi commented Mar 3, 2023

Will it block the release of version 0.7?

@zuston
Member Author

zuston commented Mar 3, 2023

Will it block the release of version 0.7?

No.

@advancedxy
Contributor

We could use block.getData().asReadOnlyByteBuffer() instead of toByteArray() to avoid the unnecessary memory allocation.

This sounds good to me. Did you apply this in your prod env?

However, I believe it would require some careful management where this data is reused: the underlying buffer won't be released until the data is flushed.
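
A small sketch of the behavior to watch out for: asReadOnlyByteBuffer() returns a view that shares the ByteString's backing storage rather than a copy, so the buffer stays reachable as long as any block holds the view:

```java
import com.google.protobuf.ByteString;
import java.nio.ByteBuffer;

public class ByteStringViewDemo {
  public static void main(String[] args) {
    // e.g. one 14M shuffle block received over gRPC
    ByteString data = ByteString.copyFrom(new byte[14 * 1024 * 1024]);

    byte[] copy = data.toByteArray();              // allocates a second 14M array
    ByteBuffer view = data.asReadOnlyByteBuffer(); // wraps the existing storage, no copy

    // The view keeps `data` reachable: its memory can only be reclaimed after
    // every ShufflePartitionedBlock referencing it has been flushed and dropped.
    System.out.println(copy.length + " bytes copied vs a view of " + view.remaining() + " shared bytes");
  }
}
```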

@zuston
Member Author

zuston commented Mar 4, 2023

This sounds good to me. Did you apply this in your prod env?

No.

I have to say that after applying this optimization, the GC pauses are still serious in my point-to-point writing benchmark. So maybe this is not the main problem. Anyway, it is still an improvement.

BTW, I want to share the benchmark results; I think Uniffle's users will benefit from them.

Shuffle server environment

  1. Xmx 20G
  2. buffer capacity 10G
  3. read capacity 5G

Shuffle writer program

  1. Use Spark to write shuffle data with 20 executors. Each executor writes 1G in total, 14M per request.
| Java version | ShuffleServer GC | Max pause time | Throughput |
| --- | --- | --- | --- |
| 8 | G1 | 30s | 0.3 |
| 11 | G1 | 2.5s | 0.8 |
| 18 | G1 | 2.5s | 0.8 |
| 18 | ZGC | 0.2ms | 0.99997 |

It's amazing to me. I think we should drop the Java 8 version.
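
For reference, a sketch of the JVM options such a setup might use; the exact flags used in the benchmark aren't recorded here, so treat these as assumptions (note that on JDK 11 ZGC is still experimental and must be unlocked):

```
# JDK 8 baseline (G1)
-Xmx20g -XX:+UseG1GC

# JDK 11: ZGC is experimental and needs the unlock flag
-Xmx20g -XX:+UnlockExperimentalVMOptions -XX:+UseZGC

# JDK 15 and later (e.g. 17/18): ZGC is production-ready
-Xmx20g -XX:+UseZGC
```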

@advancedxy
Contributor

It's amazing to me. I think we should drop the Java 8 version.

Is there any parameter tuning for Java 8 that achieves a performance gain similar to Java 11's?
As we discussed earlier, Hadoop 2 and various Hadoop 3 versions support Java 8 only (https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions), so I don't think we can drop Java 8 support. I believe that's why Spark still supports Java 8.

So I believe we cannot simply change the default JDK version to 11 as you proposed in #683. However, it's always possible to build rss-server with alternative JDK 11-based images.

@zuston
Member Author

zuston commented Mar 5, 2023

Is there any parameter tuning for Java 8 that achieves a performance gain similar to Java 11's?

I didn't apply any parameter tuning.

So I believe we cannot simply change the default JDK version to 11 as you proposed in #683. However, it's always possible to build rss-server with alternative JDK 11-based images.

Yes, we won't directly drop Java 8. But we'd better recommend using Java 11 or a higher version, especially for the shuffle server, to avoid performance issues for users.

Maybe this should be highlighted in the README.

@advancedxy
Contributor

I didn't apply any parameter tuning.

Could you do some GC parameter tuning if possible, since we are going to use Java 8 for a long time?

Yes, we won't directly drop Java 8. But we'd better recommend using Java 11 or a higher version, especially for the shuffle server, to avoid performance issues for users.
Maybe this should be highlighted in the README.

I'm fine with promoting Java 11, especially for shuffle servers that don't use HDFS storage.
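
For anyone picking up the JDK 8 tuning question, a sketch of common G1 knobs one might start from; these are generic suggestions, not flags validated on Uniffle:

```
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200               # pause-time target; G1 treats this as a goal, not a guarantee
-XX:InitiatingHeapOccupancyPercent=35  # start concurrent marking earlier under heavy allocation
-XX:G1HeapRegionSize=32m               # larger regions reduce humongous allocations for big shuffle blocks
-XX:+ParallelRefProcEnabled            # parallelize reference processing
-XX:+AlwaysPreTouch                    # pre-touch heap pages to avoid first-touch stalls
```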

@advancedxy
Contributor

It occurred to me that Uniffle only uses the HDFS client, so it might be possible to use JDK 11 to run the 3.x Hadoop HDFS client.

But I didn't see any article about this setup... 😭

@zuston
Member Author

zuston commented Mar 9, 2023

It occurred to me that Uniffle only uses the HDFS client, so it might be possible to use JDK 11 to run the 3.x Hadoop HDFS client.

But I didn't see any article about this setup... 😭

I have upgraded the Uniffle shuffle server to JDK 11, and it works well with the 3.1.1 HDFS client.

I have also tested Java 17, but it failed due to Kerberos.

@zuston
Member Author

zuston commented Mar 9, 2023

JDK 11 has been supported in Hadoop 3.3.3; refer to https://issues.apache.org/jira/browse/HADOOP-15338

@advancedxy
Contributor

JDK 11 has been supported in Hadoop 3.3.3; refer to https://issues.apache.org/jira/browse/HADOOP-15338

Yeah, I know. But Hadoop 3.3.3 was released only about a year ago, which means it isn't widely adopted or used yet.

I have upgraded the Uniffle shuffle server to JDK 11, and it works well with the 3.1.1 HDFS client.

This is good news. Was it smooth enough that no configuration changes were needed, or did we need some tricks to make it work?

@zuston
Member Author

zuston commented Mar 9, 2023

This is good news. Was it smooth enough that no configuration changes were needed, or did we need some tricks to make it work?

Nothing needed to change.

zuston added a commit that referenced this issue Mar 9, 2023
…ile (#683)

### What changes were proposed in this pull request?

Use JDK 11 as the default Java version in the Dockerfile when deploying shuffle servers on K8s.

### Why are the changes needed?

Due to the GC problems mentioned in #674, I upgraded the JVM from JDK 8 to JDK 11 for the shuffle server. After benchmarking, I found the effect of the upgrade remarkable. It also works well with Hadoop 3.1.1.

Based on the above practice, it's better to use JDK 11 as the default Java version to improve stability.

### Does this PR introduce _any_ user-facing change?

Yes

### How was this patch tested?

Have tested in production env.

Co-authored-by: zhangjunfan <zhangjunfan@qiyi.com>
@zuston
Member Author

zuston commented Mar 9, 2023

Share some performance-testing thoughts about using point-to-point tests to find performance problems.

  1. For this case, I wrote a Spark job that directly creates some mock data and writes it to a single shuffle server using ShuffleServerGrpcClient, rather than leveraging Spark's own shuffle mechanism, which has too long a pipeline.

I think some important features or improvements could be tested in the above way (a rough sketch follows below); it is more controllable.

WDYT? @advancedxy
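
A rough sketch of what such a point-to-point harness could look like; the ShuffleServerGrpcClient constructor and method usage below are illustrative assumptions and are left commented out, not the actual Uniffle API:

```java
import java.util.Random;

// Hypothetical point-to-point write loop: mock blocks are sent straight to one
// shuffle server, bypassing Spark's shuffle write path entirely.
public class PointToPointBench {
  public static void main(String[] args) {
    int blockSize = 14 * 1024 * 1024;  // 14M per request, as in the benchmark
    long totalPerExecutor = 1L << 30;  // 1G per executor
    byte[] payload = new byte[blockSize];
    new Random(42).nextBytes(payload);

    // Illustrative only: the real client lives in Uniffle's client module.
    // ShuffleServerGrpcClient client = new ShuffleServerGrpcClient(host, port);
    long requests = 0;
    for (long sent = 0; sent < totalPerExecutor; sent += blockSize) {
      // client.sendShuffleData(...); // hypothetical call, one 14M block per request
      requests++;
    }
    System.out.println("would send " + requests + " requests of " + blockSize + " bytes each");
  }
}
```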

zuston added a commit to zuston/incubator-uniffle that referenced this issue Mar 14, 2023
advancedxy pushed a commit to advancedxy/incubator-uniffle that referenced this issue Mar 21, 2023
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this issue Apr 5, 2023
@zuston zuston added the good first issue label Jul 10, 2023
@zuston
Member Author

zuston commented Jul 10, 2023

Marking this as a good first issue. Feel free to pick it up!
