
[Improvement] Use ByteString#asReadOnlyByteBuffer to reduce the memory allocation and GC pressure #674

Open
zuston opened this issue Mar 2, 2023 · 15 comments

@zuston
Member

zuston commented Mar 2, 2023


What would you like to be improved?

In our production environment, the shuffle server keeps being stopped by full GCs. Even after giving it more memory, the problem still occurs, which confused me. So I suspected some memory abuse exists in Uniffle.

After digging into the gRPC service API of sendShuffleData, I found something abnormal. Please see this part of the code:

```java
private ShufflePartitionedBlock[] toPartitionedBlock(List<ShuffleBlock> blocks) {
  if (blocks == null || blocks.size() == 0) {
    return new ShufflePartitionedBlock[]{};
  }
  ShufflePartitionedBlock[] ret = new ShufflePartitionedBlock[blocks.size()];
  int i = 0;
  for (ShuffleBlock block : blocks) {
    ret[i] = new ShufflePartitionedBlock(
        block.getLength(),
        block.getUncompressLength(),
        block.getCrc(),
        block.getBlockId(),
        block.getTaskAttemptId(),
        block.getData().toByteArray()); // allocates a fresh byte[] copy for every block
    i++;
  }
  return ret;
}
```

A fresh byte[] is created on every invocation of block.getData().toByteArray(), which causes heavy GC when the data-receiving rate is high. And this has been confirmed in our dashboard.

Besides, a byte[] is also created when reading shuffle data from memory, which may not be the main trigger of this, but it should be fixed as well.

How should we improve?

We could use block.getData().asReadOnlyByteBuffer() instead of toByteArray() to avoid the unnecessary memory allocation.
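
A minimal sketch of the change, assuming ShufflePartitionedBlock gains an overload that accepts a ByteBuffer (the current constructor takes a byte[], so that overload is a hypothetical addition):

```java
private ShufflePartitionedBlock[] toPartitionedBlock(List<ShuffleBlock> blocks) {
  if (blocks == null || blocks.isEmpty()) {
    return new ShufflePartitionedBlock[]{};
  }
  ShufflePartitionedBlock[] ret = new ShufflePartitionedBlock[blocks.size()];
  int i = 0;
  for (ShuffleBlock block : blocks) {
    ret[i] = new ShufflePartitionedBlock(
        block.getLength(),
        block.getUncompressLength(),
        block.getCrc(),
        block.getBlockId(),
        block.getTaskAttemptId(),
        // read-only view over the gRPC buffer: no copy, no new byte[]
        block.getData().asReadOnlyByteBuffer());
    i++;
  }
  return ret;
}
```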

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@zuston
Member Author

zuston commented Mar 2, 2023

I think this is a critical problem for Uniffle and I will fix this ASAP. cc @jerqi @advancedxy @xianjingfeng

@zuston zuston self-assigned this Mar 2, 2023
@jerqi
Contributor

jerqi commented Mar 3, 2023

Will it block the release of version 0.7?

@zuston
Member Author

zuston commented Mar 3, 2023

Will it block the release of version 0.7?

No.

@advancedxy
Contributor

We could use block.getData().asReadOnlyByteBuffer() instead of toByteArray() to avoid the unnecessary memory allocation.

This sounds good to me. Did you apply this in your prod env?

However, I believe it would require some careful management where this data is reused: the underlying buffer won't be released until the data is flushed.
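
A small sketch of the behavior to watch out for: asReadOnlyByteBuffer() returns a view that shares the ByteString's backing storage rather than a copy, so the buffer stays reachable as long as any block holds the view:

```java
import com.google.protobuf.ByteString;
import java.nio.ByteBuffer;

public class ByteStringViewDemo {
  public static void main(String[] args) {
    // e.g. one 14M shuffle block received over gRPC
    ByteString data = ByteString.copyFrom(new byte[14 * 1024 * 1024]);

    byte[] copy = data.toByteArray();              // allocates a second 14M array
    ByteBuffer view = data.asReadOnlyByteBuffer(); // wraps the existing storage, no copy

    // The view keeps `data` reachable: its memory can only be reclaimed after
    // every ShufflePartitionedBlock referencing it has been flushed and dropped.
    System.out.println(copy.length + " bytes copied vs a view of " + view.remaining() + " shared bytes");
  }
}
```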

@zuston
Member Author

zuston commented Mar 4, 2023

This sounds good to me. Did you apply this in your prod env?

No.

I have to say that after applying this optimization, the GC pauses are still serious in my point-to-point writing benchmark. So maybe this is not the main problem. Anyway, it is still an improvement.

BTW, I want to share the benchmark results; I think Uniffle's users will benefit from them.

Shuffle server environment

  1. Xmx 20G
  2. buffer capacity 10G
  3. read capacity 5G

Shuffle writer program

  1. Use Spark to write shuffle data with 20 executors. Each executor writes 1G in total, 14M per request.
| Java version | ShuffleServer GC | Max pause time | Throughput |
| --- | --- | --- | --- |
| 8 | G1 | 30s | 0.3 |
| 11 | G1 | 2.5s | 0.8 |
| 18 | G1 | 2.5s | 0.8 |
| 18 | ZGC | 0.2ms | 0.99997 |

It's amazing to me. I think we should drop the Java 8 version.
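
For reference, a sketch of the JVM options such a setup might use; the exact flags used in the benchmark aren't recorded here, so treat these as assumptions (note that on JDK 11 ZGC is still experimental and must be unlocked):

```
# JDK 8 baseline (G1)
-Xmx20g -XX:+UseG1GC

# JDK 11: ZGC is experimental and needs the unlock flag
-Xmx20g -XX:+UnlockExperimentalVMOptions -XX:+UseZGC

# JDK 15 and later (e.g. 17/18): ZGC is production-ready
-Xmx20g -XX:+UseZGC
```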

@advancedxy
Contributor

It's amazing to me. I think we should drop the Java 8 version.

Is there any parameter tuning for Java 8 that achieves a performance gain similar to Java 11's?
As we discussed earlier, Hadoop 2 and various Hadoop 3 versions support Java 8 only (https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions), so I don't think we can drop Java 8 support. I believe that's why Spark still supports Java 8.

So I believe we cannot simply change the default JDK version to 11 as you proposed in #683. However, it's always possible to build rss-server with alternative JDK 11-based images.

@zuston
Member Author

zuston commented Mar 5, 2023

Is there any parameter tuning for Java 8 that achieves a performance gain similar to Java 11's?

I didn't apply any parameter tuning.

So I believe we cannot simply change the default JDK version to 11 as you proposed in #683. However, it's always possible to build rss-server with alternative JDK 11-based images.

Yes, we won't directly drop Java 8. But we'd better recommend using Java 11 or a higher version, especially for the shuffle server, to avoid performance issues for users.

Maybe this should be highlighted in the README.

@advancedxy
Contributor

I didn't apply any parameter tuning.

Could you do some GC parameter tuning if possible, since we are going to use Java 8 for a long time?

Yes, we won't directly drop Java 8. But we'd better recommend using Java 11 or a higher version, especially for the shuffle server, to avoid performance issues for users.
Maybe this should be highlighted in the README.

I'm fine with promoting Java 11, especially for shuffle servers that don't use HDFS storage.
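
For anyone picking up the JDK 8 tuning question, a sketch of common G1 knobs one might start from; these are generic suggestions, not flags validated on Uniffle:

```
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200               # pause-time target; G1 treats this as a goal, not a guarantee
-XX:InitiatingHeapOccupancyPercent=35  # start concurrent marking earlier under heavy allocation
-XX:G1HeapRegionSize=32m               # larger regions reduce humongous allocations for big shuffle blocks
-XX:+ParallelRefProcEnabled            # parallelize reference processing
-XX:+AlwaysPreTouch                    # pre-touch heap pages to avoid first-touch stalls
```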

@advancedxy
Contributor

It occurred to me that Uniffle only uses the HDFS client, so it might be possible to use JDK 11 to run the 3.x Hadoop HDFS client.

But I didn't see any article about this setup... 😭

@zuston
Member Author

zuston commented Mar 9, 2023

It occurred to me that Uniffle only uses the HDFS client, so it might be possible to use JDK 11 to run the 3.x Hadoop HDFS client.

But I didn't see any article about this setup... 😭

I have upgraded the Uniffle shuffle server to JDK 11, and it works well with the 3.1.1 HDFS client.

I have also tested Java 17, but it failed due to Kerberos.

@zuston
Member Author

zuston commented Mar 9, 2023

JDK 11 has been supported in Hadoop 3.3.3; refer to https://issues.apache.org/jira/browse/HADOOP-15338

@advancedxy
Contributor

JDK 11 has been supported in Hadoop 3.3.3; refer to https://issues.apache.org/jira/browse/HADOOP-15338

Yeah, I know. But Hadoop 3.3.3 was released only about a year ago, which means it isn't widely adopted or used yet.

I have upgraded the Uniffle shuffle server to JDK 11, and it works well with the 3.1.1 HDFS client.

This is good news. Was it smooth enough that no configuration changes were needed, or did we need some tricks to make it work?

@zuston
Member Author

zuston commented Mar 9, 2023

This is good news. Was it smooth enough that no configuration changes were needed, or did we need some tricks to make it work?

Nothing needed to change.

zuston added a commit that referenced this issue Mar 9, 2023
…ile (#683)

### What changes were proposed in this pull request?

Use JDK 11 as the default Java version in the Dockerfile when deploying shuffle servers on K8s.

### Why are the changes needed?

Due to the GC problems mentioned in #674, I upgraded the JVM from JDK 8 to JDK 11 for the shuffle server. After benchmarking, I found the effect of the upgrade remarkable. It also works well with Hadoop 3.1.1.

Based on the above practice, it's better to use JDK 11 as the default Java version to improve stability.

### Does this PR introduce _any_ user-facing change?

Yes

### How was this patch tested?

Have tested in production env.

Co-authored-by: zhangjunfan <zhangjunfan@qiyi.com>
@zuston
Member Author

zuston commented Mar 9, 2023

Share some performance-testing thoughts about using point-to-point tests to find performance problems.

  1. For this case, I wrote a Spark job that directly creates some mock data and writes it to a single shuffle server using ShuffleServerGrpcClient, rather than leveraging Spark's own shuffle mechanism, which has too long a pipeline.

I think some important features or improvements could be tested in the above way (a rough sketch follows below); it is more controllable.

WDYT? @advancedxy
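
A rough sketch of what such a point-to-point harness could look like; the ShuffleServerGrpcClient constructor and method usage below are illustrative assumptions and are left commented out, not the actual Uniffle API:

```java
import java.util.Random;

// Hypothetical point-to-point write loop: mock blocks are sent straight to one
// shuffle server, bypassing Spark's shuffle write path entirely.
public class PointToPointBench {
  public static void main(String[] args) {
    int blockSize = 14 * 1024 * 1024;  // 14M per request, as in the benchmark
    long totalPerExecutor = 1L << 30;  // 1G per executor
    byte[] payload = new byte[blockSize];
    new Random(42).nextBytes(payload);

    // Illustrative only: the real client lives in Uniffle's client module.
    // ShuffleServerGrpcClient client = new ShuffleServerGrpcClient(host, port);
    long requests = 0;
    for (long sent = 0; sent < totalPerExecutor; sent += blockSize) {
      // client.sendShuffleData(...); // hypothetical call, one 14M block per request
      requests++;
    }
    System.out.println("would send " + requests + " requests of " + blockSize + " bytes each");
  }
}
```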

zuston added a commit to zuston/incubator-uniffle that referenced this issue Mar 14, 2023
advancedxy pushed a commit to advancedxy/incubator-uniffle that referenced this issue Mar 21, 2023
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this issue Apr 5, 2023
@zuston zuston added the good first issue label Jul 10, 2023
@zuston
Member Author

zuston commented Jul 10, 2023

Marking this as a good first issue. Feel free to pick it up!
