[Improvement] Use ByteString#asReadOnlyByteBuffer to reduce the memory allocation and GC pressure #674
I think this is a critical problem for Uniffle and I will fix this ASAP. cc @jerqi @advancedxy @xianjingfeng
Will it block the release of version 0.7?
No.
This sounds good to me. Have you applied this in your prod env? However, I believe it would require some careful management when this data is reused: the backing data won't be released until it is flushed.
No. I have to say that even after applying this optimization, the GC pause is still serious in my point-to-point writing benchmark, so maybe this is not the main problem. Anyway, it is still an improvement. BTW, I want to share the benchmark results; I think Uniffle's users will benefit from them.
Shuffle server environment
Shuffle Writer program
It's amazing to me. I think we should drop the Java 8 version.
Is there any parameter tuning for Java 8 that achieves a performance gain similar to Java 11's? I believe we cannot simply change the default JDK version to 11 as you proposed in #683. However, it's always possible to build rss-server with alternative JDK 11-based images.
I didn't apply any parameter tuning.
Yes, we won't directly drop Java 8, but we'd better recommend using Java 11 or a higher version, especially for the shuffle server, to avoid performance issues for users. Maybe this should be underlined in the README.
Could you do some GC parameter tuning if possible, since we are going to use Java 1.8 for a long time?
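The thread doesn't record which flags were tried. For reference, a commonly cited starting point for JDK 8 GC tuning on an allocation-heavy server is sketched below; every value here is an illustrative assumption, not a recommendation from this thread, and the log path is hypothetical. The numbers must be tuned against the actual heap size and receive rate.

```
# Hypothetical JDK 8 GC settings for an allocation-heavy shuffle server.
-XX:+UseG1GC                               # G1 usually pauses less than ParallelGC on large heaps
-XX:MaxGCPauseMillis=200                   # pause-time target; G1 sizes the young gen to meet it
-XX:InitiatingHeapOccupancyPercent=45      # start concurrent marking earlier to avoid full GC
-XX:+ParallelRefProcEnabled                # parallelize reference processing
-XX:+PrintGCDetails -Xloggc:/path/to/gc.log  # keep GC logs to verify the effect of each change
```

Whatever the values, GC logs before and after each change are the only reliable way to confirm the pauses actually shrank.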
I'm fine with promoting Java 11 use, especially for shuffle servers that don't use HDFS storage.
It occurred to me that Uniffle only uses the HDFS client, so it might be possible to use JDK 11 to run a 3.x Hadoop HDFS client. But I didn't see any article about this setup... 😭
I have upgraded to JDK 11 in the Uniffle shuffle server, and it works well with the 3.1.1 HDFS client. I have also tested Java 17, but it failed due to Kerberos.
JDK 11 has been supported in Hadoop 3.3.3; refer to: https://issues.apache.org/jira/browse/HADOOP-15338
Yeah, I know. But Hadoop 3.3.3 was released only about one year ago, which means it isn't widely adopted or used yet.
This is good news. Is it smooth enough that not even configuration changes are needed, or do we need some tricks to make it work?
Nothing needed to be changed.
Use JDK11 as the default java version in Dockerfile (#683)

### What changes were proposed in this pull request?
Use JDK11 as the default java version in the Dockerfile when deploying shuffle servers on K8s.

### Why are the changes needed?
Because of the GC problems mentioned in #674, I upgraded the JVM version from JDK8 to JDK11 for the shuffle server. After benchmarking, I found the effect of the upgrade is remarkable, and it also works well with Hadoop 3.1.1. Based on the above practice, it's better to use JDK11 as the default java version to improve stability.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Tested in a production env.

Co-authored-by: zhangjunfan <zhangjunfan@qiyi.com>
Let me share some performance-testing thoughts about using point-to-point tests to find performance problems. I think some important features or improvements could be tested this way; it's more controllable. WDYT? @advancedxy
Mark this as a good first issue. Feel free to pick it up!
What would you like to be improved?
In our production environment, the shuffle server frequently stops due to full GC. Even after adding more memory, it still occurs, which confused me. So I suspect some memory abuse exists in Uniffle.
After digging into the gRPC service API of
sendShuffleData
, I found something abnormal. Please see this part of the code:
incubator-uniffle/server/src/main/java/org/apache/uniffle/server/ShuffleServerGrpcService.java
Lines 751 to 769 in 1fbdfe5
A byte[] is always created by the invocation of
block.getData().toByteArray()
, which will cause heavy GC if data is received at a high rate. This has been proved in our dashboard. Besides, a byte[] is also created when reading shuffle data in memory; that may not be the main trigger, but it should be fixed as well.
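The copy-per-block pattern described above can be sketched with plain JDK types; protobuf's ByteString isn't needed to see the effect. Everything here (the handler name, the block size, the loop count) is a hypothetical illustration, not Uniffle code.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyPerBlockSketch {
    // Mimics the problematic path: every incoming block's payload is copied
    // into a fresh byte[], the way ByteString#toByteArray does.
    static byte[] handleBlock(byte[] payload) {
        return payload.clone(); // one new heap allocation per block
    }

    public static void main(String[] args) {
        byte[] payload = new byte[64 * 1024]; // hypothetical 64 KiB block
        List<byte[]> held = new ArrayList<>();
        // At a high receive rate, each block adds another array, so
        // allocation (and GC work) grows linearly with throughput.
        for (int i = 0; i < 1_000; i++) {
            held.add(handleBlock(payload));
        }
        System.out.println(held.size());                // 1000
        System.out.println(held.get(0) != held.get(1)); // true: no sharing between copies
    }
}
```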
How should we improve?
We could use
block.getData().asReadOnlyByteBuffer()
to replace
toByteArray()
to avoid unnecessary memory allocation.
Are you willing to submit PR?
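A minimal sketch of the proposed direction, using only JDK types: `ByteBuffer.wrap(...).asReadOnlyBuffer()` plays the role of `ByteString#asReadOnlyByteBuffer` here, exposing the bytes without copying them. The class and method names are illustrative. It also demonstrates the caveat raised earlier in the thread: the view aliases the original storage, so that storage must not be reused or released until the data has been flushed.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ReadOnlyViewSketch {
    // Copying path: a fresh byte[] per call (analogous to ByteString#toByteArray).
    static byte[] copyOf(byte[] backing) {
        return Arrays.copyOf(backing, backing.length);
    }

    // Zero-copy path: a read-only view over the existing storage
    // (analogous to ByteString#asReadOnlyByteBuffer).
    static ByteBuffer readOnlyView(byte[] backing) {
        return ByteBuffer.wrap(backing).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] block = {1, 2, 3, 4};

        byte[] copy = copyOf(block);           // extra allocation + memcpy
        ByteBuffer view = readOnlyView(block); // no data copy, no mutation allowed

        System.out.println(view.isReadOnly());       // true
        System.out.println(view.get(0) == block[0]); // true: same bytes, no copy

        // Caveat: the view aliases the original array. If the buffer is
        // reused before the view is flushed, the flushed data is corrupted.
        block[0] = 9;
        System.out.println(view.get(0)); // 9: the mutation is visible through the view
        System.out.println(copy[0]);     // 1: the defensive copy is unaffected
    }
}
```

This is why the earlier comment about "careful management when this data is reused" matters: the zero-copy view trades an allocation for a lifetime constraint on the backing buffer.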