[Improvement] Introduce local allocation buffer to store blocks in memory #1727

Open
3 tasks done
xianjingfeng opened this issue May 21, 2024 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@xianjingfeng
Member

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently we put the shuffle data into off-heap memory in the shuffle server, but I found it still occupies a lot of heap memory.
The following is the output of jmap -histo:

   num     #instances         #bytes  class name (module)
   1:     189601376    16684921088  io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeDirectByteBuf
   2:     189860728    15188858240  java.nio.DirectByteBuffer (java.base@11.0.1)
   3:     189605871    13651622712  jdk.internal.ref.Cleaner (java.base@11.0.1)
   4:     189018520    10585037120  org.apache.uniffle.common.ShufflePartitionedBlock
   5:     189605871     7584234840  java.nio.DirectByteBuffer$Deallocator (java.base@11.0.1)

From the above results, we can see that the main cause of the high heap usage is the sheer number of blocks, and there are so many blocks because each block is very small.

How should we improve?

Introduce a local allocation buffer, similar to MSLAB in HBase.
Refer to: https://hbase.apache.org/book.html#gcpause
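For illustration, here is a minimal sketch of the idea (the LocalAllocationBuffer class, its 4 MB chunk size, and its allocate method are assumptions for this sketch, not existing Uniffle or HBase code): instead of allocating one buffer per small block, block payloads are copied into a few large shared chunks, so the heap tracks a handful of chunk buffers rather than one DirectByteBuffer, Cleaner, and Deallocator per block.

```java
// Minimal sketch of an MSLAB-style local allocation buffer.
// The class and method names are illustrative only; they are not existing Uniffle code.
import java.nio.ByteBuffer;

public class LocalAllocationBuffer {
  // Assumed chunk size; a real implementation would make this configurable.
  private static final int CHUNK_SIZE = 4 * 1024 * 1024;

  private ByteBuffer currentChunk;

  /** Copies a small block payload into the current chunk and returns a slice over it. */
  public synchronized ByteBuffer allocate(byte[] blockData) {
    if (blockData.length > CHUNK_SIZE) {
      // Oversized blocks keep their own buffer, similar to how MSLAB skips large cells.
      return ByteBuffer.wrap(blockData.clone());
    }
    if (currentChunk == null || currentChunk.remaining() < blockData.length) {
      // Retire the current chunk and allocate a fresh one (off-heap here, matching Netty mode).
      currentChunk = ByteBuffer.allocateDirect(CHUNK_SIZE);
    }
    int start = currentChunk.position();
    currentChunk.put(blockData);
    ByteBuffer slice = currentChunk.duplicate();
    slice.position(start);
    slice.limit(start + blockData.length);
    // The returned slice is a view over the shared chunk; it does not own its own
    // native memory, so no per-block Cleaner/Deallocator is created.
    return slice.slice();
  }
}
```

With chunks of a few megabytes, millions of tiny per-block allocations collapse into a few thousand long-lived chunks, which is the same fragmentation/GC pattern MSLAB addresses in HBase.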

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@xianjingfeng
Member Author

@jerqi @zuston @advancedxy @rickyma PTAL.
I've been quite busy recently, so if anyone is interested in this, you're welcome to pick it up.

@xianjingfeng xianjingfeng added the help wanted Extra attention is needed label May 21, 2024
@rickyma
Contributor

rickyma commented May 21, 2024

This issue seems feasible. I'll take a look first. We need this too.

Currently, there are a few things we can do to reduce the number of small blocks (an example configuration follows the list):

  1. Set spark.rss.writer.buffer.spill.size to a higher value, e.g. 1g or 2g, to make blocks larger.
  2. Set rss.client.memory.spill.ratio to less than 0.5, e.g. 0.3, so that larger blocks spill first.
  3. Set spark.rss.writer.buffer.size to a larger value, e.g. 10m; refer to [FEATURE] support generating larger block size during shuffle map task by spill partial partitions data #1594 (comment).
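For example, the three settings above could look roughly like this in a job's configuration (the values are taken from the examples in this comment, not tuned recommendations, and the exact property keys should be verified against your client version):

```
spark.rss.writer.buffer.spill.size  2g
rss.client.memory.spill.ratio       0.3
spark.rss.writer.buffer.size        10m
```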

rickyma added a commit to rickyma/incubator-uniffle that referenced this issue May 30, 2024
zuston pushed a commit that referenced this issue Jun 7, 2024
…buffer flush to mitigate GC issues (#1759)

### What changes were proposed in this pull request?

Introduce block number threshold when flushing a single buffer, mitigating GC/OOM issues from potential excessive small blocks.

### Why are the changes needed?

For: #1727.

In a production environment, the Uniffle server may run jobs with various unreasonable configurations. These jobs might have a large number of partitions (tens of thousands, hundreds of thousands, or even millions), might have been manually configured with a very small spill size, or might be affected by other causes. This can ultimately send a large number of small blocks to the server, and the server has no choice but to keep them in heap memory for a long time, simply because **_their data size does not meet the conditions for flushing_**. This can cause severe garbage collection issues on the server side, and in extreme cases it can even lead to out-of-heap-memory errors.

In Netty mode, we use off-heap memory to store shuffle data. However, when facing jobs with extremely unreasonable configurations, the total size of the on-heap reference objects for these blocks may even exceed the size of the data stored off-heap. This can make the server very unstable.
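As a rough illustration of the flush condition described above (the names below are made up for this sketch and do not mirror the actual code in #1759), a buffer is flushed when either its accumulated data size or its block count crosses a threshold:

```java
// Simplified sketch of a size-or-count flush trigger; field and method names are illustrative.
class ShuffleBufferSketch {
  private long dataSizeInBytes = 0;
  private long blockCount = 0;

  private final long sizeThresholdInBytes;
  private final long blockNumberThreshold;

  ShuffleBufferSketch(long sizeThresholdInBytes, long blockNumberThreshold) {
    this.sizeThresholdInBytes = sizeThresholdInBytes;
    this.blockNumberThreshold = blockNumberThreshold;
  }

  /** Records one appended block. */
  void append(long blockSizeInBytes) {
    dataSizeInBytes += blockSizeInBytes;
    blockCount++;
  }

  /** Flush when either the data size or the number of small blocks gets too large. */
  boolean shouldFlush() {
    return dataSizeInBytes >= sizeThresholdInBytes || blockCount >= blockNumberThreshold;
  }
}
```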

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.