[Improvement] Introduce local allocation buffer to store blocks in memory #1727

Open
3 tasks done
xianjingfeng opened this issue May 21, 2024 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@xianjingfeng
Member

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently we put the shuffle data into off-heap memory in the shuffle server, but I found it still occupies a lot of heap memory.
The following is the output of jmap -histo:

   num     #instances         #bytes  class name (module)
   1:     189601376    16684921088  io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeDirectByteBuf
   2:     189860728    15188858240  java.nio.DirectByteBuffer (java.base@11.0.1)
   3:     189605871    13651622712  jdk.internal.ref.Cleaner (java.base@11.0.1)
   4:     189018520    10585037120  org.apache.uniffle.common.ShufflePartitionedBlock
   5:     189605871     7584234840  java.nio.DirectByteBuffer$Deallocator (java.base@11.0.1)

From the above results, we can see that the main cause of the high heap usage is the sheer number of blocks, and there are so many blocks because each block is very small.

How should we improve?

Introduce a local allocation buffer, similar to MSLAB in HBase.
Refer to: https://hbase.apache.org/book.html#gcpause
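For illustration, here is a minimal sketch of the idea (the LocalAllocationBuffer class, its 4 MB chunk size, and its allocate method are assumptions for this sketch, not existing Uniffle or HBase code): instead of allocating one buffer per small block, block payloads are copied into a few large shared chunks, so the heap tracks a handful of chunk buffers rather than one DirectByteBuffer, Cleaner, and Deallocator per block.

```java
// Minimal sketch of an MSLAB-style local allocation buffer.
// The class and method names are illustrative only; they are not existing Uniffle code.
import java.nio.ByteBuffer;

public class LocalAllocationBuffer {
  // Assumed chunk size; a real implementation would make this configurable.
  private static final int CHUNK_SIZE = 4 * 1024 * 1024;

  private ByteBuffer currentChunk;

  /** Copies a small block payload into the current chunk and returns a slice over it. */
  public synchronized ByteBuffer allocate(byte[] blockData) {
    if (blockData.length > CHUNK_SIZE) {
      // Oversized blocks keep their own buffer, similar to how MSLAB skips large cells.
      return ByteBuffer.wrap(blockData.clone());
    }
    if (currentChunk == null || currentChunk.remaining() < blockData.length) {
      // Retire the current chunk and allocate a fresh one (off-heap here, matching Netty mode).
      currentChunk = ByteBuffer.allocateDirect(CHUNK_SIZE);
    }
    int start = currentChunk.position();
    currentChunk.put(blockData);
    ByteBuffer slice = currentChunk.duplicate();
    slice.position(start);
    slice.limit(start + blockData.length);
    // The returned slice is a view over the shared chunk; it does not own its own
    // native memory, so no per-block Cleaner/Deallocator is created.
    return slice.slice();
  }
}
```

With chunks of a few megabytes, millions of tiny per-block allocations collapse into a few thousand long-lived chunks, which is the same fragmentation/GC pattern MSLAB addresses in HBase.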

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@xianjingfeng
Member Author

@jerqi @zuston @advancedxy @rickyma PTAL.
I've been quite busy recently, so if anyone is interested in this, you're welcome to pick it up.

@xianjingfeng xianjingfeng added the help wanted Extra attention is needed label May 21, 2024
@rickyma
Contributor

rickyma commented May 21, 2024

This issue seems feasible. I'll take a look first. We need this too.

Currently, there are a few things we can do to reduce the number of small blocks (an example configuration follows the list):

  1. Set spark.rss.writer.buffer.spill.size to a higher value, e.g. 1g or 2g, to make blocks larger.
  2. Set rss.client.memory.spill.ratio to less than 0.5, e.g. 0.3, so that larger blocks spill first.
  3. Set spark.rss.writer.buffer.size to a larger value, e.g. 10m; refer to [FEATURE] support generating larger block size during shuffle map task by spill partial partitions data #1594 (comment).
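For example, the three settings above could look roughly like this in a job's configuration (the values are taken from the examples in this comment, not tuned recommendations, and the exact property keys should be verified against your client version):

```
spark.rss.writer.buffer.spill.size  2g
rss.client.memory.spill.ratio       0.3
spark.rss.writer.buffer.size        10m
```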

rickyma added a commit to rickyma/incubator-uniffle that referenced this issue May 30, 2024
zuston pushed a commit that referenced this issue Jun 7, 2024
…buffer flush to mitigate GC issues (#1759)

### What changes were proposed in this pull request?

Introduce block number threshold when flushing a single buffer, mitigating GC/OOM issues from potential excessive small blocks.

### Why are the changes needed?

For: #1727.

In a production environment, the Uniffle server may run jobs with various unreasonable configurations. These jobs might have a large number of partitions (tens of thousands, hundreds of thousands, or even millions), might have been manually configured with a very small spill size, or might be affected by other causes. This can ultimately send a large number of small blocks to the server, and the server has no choice but to keep them in heap memory for a long time, simply because **_their data size does not meet the conditions for flushing_**. This can cause severe garbage collection issues on the server side, and in extreme cases it can even lead to out-of-heap-memory errors.

In Netty mode, we use off-heap memory to store shuffle data. However, when facing jobs with extremely unreasonable configurations, the total size of the on-heap reference objects for these blocks may even exceed the size of the data stored off-heap. This can make the server very unstable.
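As a rough illustration of the flush condition described above (the names below are made up for this sketch and do not mirror the actual code in #1759), a buffer is flushed when either its accumulated data size or its block count crosses a threshold:

```java
// Simplified sketch of a size-or-count flush trigger; field and method names are illustrative.
class ShuffleBufferSketch {
  private long dataSizeInBytes = 0;
  private long blockCount = 0;

  private final long sizeThresholdInBytes;
  private final long blockNumberThreshold;

  ShuffleBufferSketch(long sizeThresholdInBytes, long blockNumberThreshold) {
    this.sizeThresholdInBytes = sizeThresholdInBytes;
    this.blockNumberThreshold = blockNumberThreshold;
  }

  /** Records one appended block. */
  void append(long blockSizeInBytes) {
    dataSizeInBytes += blockSizeInBytes;
    blockCount++;
  }

  /** Flush when either the data size or the number of small blocks gets too large. */
  boolean shouldFlush() {
    return dataSizeInBytes >= sizeThresholdInBytes || blockCount >= blockNumberThreshold;
  }
}
```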

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.