[Bug] Reading local shuffle data in high-pressure scenarios may lead to high system load #1596
Comments
I will use a thread pool to control the concurrency when reading shuffle data. If anyone has any good suggestions, please feel free to let me know.
Read concurrency is not limited in the current codebase. +1 for this issue.
We once measured the number of write threads: 2 threads per HDD or 3 threads per SSD gives the best performance.
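For illustration, here is a minimal sketch of the thread-pool idea discussed above. The `LocalShuffleReadExecutor` class is hypothetical and not part of the current codebase; the per-disk thread counts simply reuse the numbers mentioned in the comment about write threads.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not in the Uniffle codebase): bounds the number of
// concurrent local shuffle data reads so disk contention stays manageable.
public class LocalShuffleReadExecutor {
  private final ThreadPoolExecutor readPool;

  public LocalShuffleReadExecutor(int hddCount, int ssdCount, int queueCapacity) {
    // Sizing borrowed from the comment above: 2 threads per HDD, 3 per SSD.
    int maxConcurrentReads = 2 * hddCount + 3 * ssdCount;
    this.readPool = new ThreadPoolExecutor(
        maxConcurrentReads, maxConcurrentReads,
        60L, TimeUnit.SECONDS,
        // Bounded queue: once it is full, further submissions are rejected
        // (default AbortPolicy), which back-pressures callers instead of
        // letting unlimited reads pile onto the disks.
        new LinkedBlockingQueue<>(queueCapacity));
  }

  public void submitRead(Runnable readTask) {
    readPool.execute(readTask);
  }
}
```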
… release readMemory (#1605)

### What changes were proposed in this pull request?

1. Add a `ChannelFutureListener` and use its callback mechanism to release `readMemory` only after the `writeAndFlush` method has truly completed.
2. Change the descriptions of the configurations `rss.server.buffer.capacity.ratio` and `rss.server.read.buffer.capacity.ratio`.

### Why are the changes needed?

This is actually a bug, introduced by PR #879. The issue has been present since the very beginning, when the Netty feature was first integrated. Fix #1596.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I don't think we need new tests. Tested in our env. The new log will be:

```
[2024-03-26 23:11:51.039] [epollEventLoopGroup-3-158] [INFO] ShuffleServerNettyHandler.operationComplete - Successfully executed getLocalShuffleData for appId[application_1703049085550_7359933_1711463990606], shuffleId[0], partitionId[1328], offset[0], length[14693742]. Took 1457 ms and retrieved 14693742 bytes of data
[2024-03-26 23:11:51.040] [epollEventLoopGroup-3-130] [INFO] ShuffleServerNettyHandler.operationComplete - Successfully executed getMemoryShuffleData for appId[application_1703049085550_7359933_1711463990606], shuffleId[0], partitionId[1262]. Took 1 ms and retrieved 0 bytes of data
[2024-03-26 23:11:51.068] [epollEventLoopGroup-3-177] [INFO] ShuffleServerNettyHandler.operationComplete - Successfully executed getLocalShuffleIndex for appId[application_1703049085550_7359933_1711463990606], shuffleId[0], partitionId[1366]. Took 918 ms and retrieved 1653600 bytes of data
```
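As a minimal sketch of the listener-based release described in item 1: the helper class, its signature, and the `releaseReadMemory` callback below are illustrative stand-ins for the server's actual buffer accounting, not the PR's exact code.

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import java.util.function.LongConsumer;

public final class ReadMemoryRelease {

  // Illustrative only: write the response, then release the tracked read
  // memory in the listener callback, i.e. only once writeAndFlush has
  // actually completed (successfully or not).
  public static void writeAndRelease(Channel channel, ByteBuf response,
                                     long readMemoryBytes, LongConsumer releaseReadMemory) {
    ChannelFuture future = channel.writeAndFlush(response);
    future.addListener((ChannelFutureListener) f -> {
      // Releasing here instead of immediately after writeAndFlush avoids
      // freeing readMemory before the data has been handed to the socket.
      releaseReadMemory.accept(readMemoryBytes);
      if (!f.isSuccess()) {
        // A real handler would log f.cause() and send a failure response.
      }
    });
  }
}
```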
…nnel is writable (#1641)

### What changes were proposed in this pull request?

1. Send failed responses only when the channel is writable.
2. Print debug logs when the data is successfully sent, reducing log output.
3. Reduce the duplicated error log.

### Why are the changes needed?

A follow-up PR for: #1605.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
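A minimal sketch of the writability check from item 1; the class and method names here are illustrative, not the PR's actual code.

```java
import io.netty.channel.Channel;

public final class FailedResponseSender {

  // Illustrative only: skip sending an error response when the channel's
  // outbound buffer is already above its high-water mark, since queuing
  // more writes onto an unwritable channel only grows the backlog.
  public static void sendFailedResponseIfWritable(Channel channel, Object errorResponse) {
    if (channel.isWritable()) {
      channel.writeAndFlush(errorResponse);
    }
    // else: drop the response; a real implementation would log this at debug level.
  }
}
```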
Is this error message also affected by this problem? @rickyma
No.
Code of Conduct
Search before asking
Describe the bug
When Netty is enabled, the current timing of releasing `readMemory` is incorrect, because `client.getChannel().writeAndFlush()` is asynchronous. If we release `readMemory` directly, it is released before it has a chance to take effect, and the file reading only occurs after the `writeAndFlush` method is called. We should add a `ChannelFutureListener` and use its callback mechanism to release `readMemory`; this ensures the release happens only after `writeAndFlush` has truly completed.

We also haven't set a limit on the maximum number of concurrent requests (for reading local shuffle data) that can be processed at the same time; we only control the maximum `readCapacity` of the buffer used when reading local shuffle data. This approach falls short. In our tests, we discovered that the metric `read_used_buffer_size` only reached about 3.75 GB, yet the system load on the shuffle server was already high. This suggests that relying solely on `readCapacity` to manage the reading of local shuffle data is inadequate.
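For illustration, a minimal sketch of one way to cap the number of in-flight local-read requests in addition to the existing `readCapacity` accounting; the class and limit are hypothetical, not a change that exists in the codebase.

```java
import java.util.concurrent.Semaphore;

// Hypothetical admission control: bounds how many getLocalShuffleData
// requests may be processed concurrently, independently of how much read
// buffer memory they consume.
public class LocalReadConcurrencyLimiter {
  private final Semaphore permits;

  public LocalReadConcurrencyLimiter(int maxConcurrentReads) {
    this.permits = new Semaphore(maxConcurrentReads);
  }

  // Returns false if the server is already at its read-concurrency limit,
  // letting the caller reject or defer the request instead of piling on load.
  public boolean tryStartRead() {
    return permits.tryAcquire();
  }

  public void finishRead() {
    permits.release();
  }
}
```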
Affects Version(s)
master
Uniffle Server Log Output
No response
Uniffle Engine Log Output
No response
Uniffle Server Configurations
No response
Uniffle Engine Configurations
No response
Additional context
No response
Are you willing to submit PR?