[Improvement] Introduce local allocation buffer to store blocks in memory #1727
Labels: help wanted

Comments
@jerqi @zuston @advancedxy @rickyma PTAL.

This issue seems feasible. I'll take a look first. We need this too. Currently, there are a few things that we can do to make blocks smaller:
rickyma added a commit to rickyma/incubator-uniffle that referenced this issue on May 30, 2024: "…when flushing a single buffer to reduce gc"
zuston pushed a commit that referenced this issue on Jun 7, 2024: "…buffer flush to mitigate GC issues (#1759)"

### What changes were proposed in this pull request?

Introduce a block number threshold when flushing a single buffer, mitigating the GC/OOM issues caused by a potentially excessive number of small blocks.

### Why are the changes needed?

For: #1727. In a production environment, the Uniffle server may run jobs with various unreasonable configurations. These jobs might have a large number of partitions (tens of thousands, hundreds of thousands, or even millions), or they might have been manually configured with a very small spill size, among other causes. This can ultimately send a large number of small blocks to the server, and the server has no choice but to keep them in heap memory for a long time, simply because **_their data size does not meet the conditions for flushing_**. This can cause severe garbage collection issues on the server side and, in extreme cases, can even lead to out-of-heap-memory errors. In Netty mode, we use off-heap memory to store shuffle data. However, when facing jobs with extremely unreasonable configurations, the total size of the block reference objects kept in heap memory by the server may even exceed the size of the data stored off-heap. This can bring great instability to the server.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
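As a hedged illustration of the threshold idea described above (this is not Uniffle's actual code; the class and field names below are hypothetical), a flush policy can trigger on block count as well as on accumulated size:

```java
/**
 * Minimal sketch of a dual-threshold flush policy, assuming hypothetical
 * names; Uniffle's real buffer manager is more involved.
 */
class BufferFlushPolicy {
  private final long sizeThresholdBytes;
  private final int blockNumberThreshold;

  BufferFlushPolicy(long sizeThresholdBytes, int blockNumberThreshold) {
    this.sizeThresholdBytes = sizeThresholdBytes;
    this.blockNumberThreshold = blockNumberThreshold;
  }

  /**
   * Flush when either the accumulated data size or the number of buffered
   * blocks crosses its threshold, so a buffer full of tiny blocks no longer
   * lingers in heap memory just because its total byte size stays small.
   */
  boolean shouldFlush(long bufferedBytes, int bufferedBlockCount) {
    return bufferedBytes >= sizeThresholdBytes
        || bufferedBlockCount >= blockNumberThreshold;
  }
}
```

With a count-based trigger, buffers holding many small blocks get flushed even though they would never reach the size threshold alone.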
What would you like to be improved?
Currently we put the shuffle data into off-heap memory in the shuffle server, but I found that it still occupies a lot of heap memory.

The following is the output of `jmap -histo` [histogram screenshot omitted]. From those results, we can see that the main cause of the high memory usage is that there are too many blocks, and the reason there are so many blocks is that each block is very small.
How should we improve?
Introduce a local allocation buffer like MSLAB in HBase. Refer: https://hbase.apache.org/book.html#gcpause
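A minimal sketch of the MSLAB idea in Java, assuming hypothetical names (`LocalAllocationBuffer`, `BlockRef`) rather than HBase's or Uniffle's actual classes: small blocks are copied into large shared chunks, so the heap holds a handful of big arrays instead of millions of short-lived small objects, and uniform chunk sizes reduce old-generation fragmentation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * MSLAB-style local allocation buffer (illustrative sketch): small blocks
 * are copied into large fixed-size chunks so the GC scans a few big arrays
 * instead of millions of tiny ones.
 */
class LocalAllocationBuffer {
  private static final int CHUNK_SIZE = 2 * 1024 * 1024; // 2 MiB per chunk
  private final List<byte[]> chunks = new ArrayList<>();
  private byte[] current;
  private int offset;

  /** Copies {@code data} into chunk storage and returns a handle into it. */
  synchronized BlockRef copy(byte[] data) {
    if (data.length > CHUNK_SIZE) {
      // Oversized blocks keep their own array; the buffer only helps small ones.
      return new BlockRef(data, 0, data.length);
    }
    if (current == null || offset + data.length > CHUNK_SIZE) {
      // Start a new chunk when the current one cannot fit this block.
      current = new byte[CHUNK_SIZE];
      chunks.add(current);
      offset = 0;
    }
    System.arraycopy(data, 0, current, offset, data.length);
    BlockRef ref = new BlockRef(current, offset, data.length);
    offset += data.length;
    return ref;
  }

  /** Lightweight handle pointing into a shared chunk. */
  record BlockRef(byte[] chunk, int offset, int length) {}
}
```

Under this scheme, per-block heap overhead shrinks to one small handle per block, which should directly reduce the object counts observed in the `jmap -histo` output above.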
Are you willing to submit PR?