I have searched in the issues and found no similar issues.
Describe the bug
When testing a large data set with the current code on the master branch, an io.netty.util.internal.OutOfDirectMemoryError is thrown.
Is there a memory leak?
I ran an end-to-end test. With a concurrency of 5 and 500G of data, the OOM exception did not appear, but with a concurrency of 40 and 500G of data, it did. This suggests the OOM is caused by actual memory usage being larger than used_buffer rather than by a memory leak.
When the line ByteBuf data = ByteBufUtils.readSlice(byteBuf); executes in org.apache.uniffle.common.netty.protocol.Decoders#decodeShuffleBlockInfo, it extends the life cycle of the ByteBuf allocated inside the Netty framework, so when TransportFrameDecoder#channelRead executes frame.release();, that ByteBuf cannot be released. I am not sure whether the ByteBuf object created by the Netty framework contains other content as well. Judging from the current test results, this seems to make off-heap memory usage much larger than used_buffer.
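To make the suspected mechanism concrete, here is a minimal toy model of reference-counted buffers. It deliberately does not use Netty: Frame, Slice, and pinnedBytes are hypothetical names, the 16 MiB frame and 1 KiB slice sizes are illustrative, and the assumption that the slice holds a reference on its parent is mine, since the report does not show how ByteBufUtils.readSlice behaves internally. It only illustrates how a small retained view can keep a much larger backing buffer alive after the decoder has released its own reference.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of reference-counted buffers. This is NOT Netty's API. */
public class SliceRetentionSketch {

    /** Stands in for a frame buffer allocated by the network layer. */
    static final class Frame {
        final byte[] memory;
        private final AtomicInteger refCnt = new AtomicInteger(1);

        Frame(int capacity) {
            this.memory = new byte[capacity];
        }

        void retain() {
            refCnt.incrementAndGet();
        }

        /** Returns true when the last reference is dropped. */
        boolean release() {
            return refCnt.decrementAndGet() == 0;
        }

        boolean isFreed() {
            return refCnt.get() == 0;
        }
    }

    /** A view over part of a frame. Assumption: it retains the whole parent frame. */
    static final class Slice {
        final Frame parent;
        final int offset;
        final int length;

        Slice(Frame parent, int offset, int length) {
            parent.retain();
            this.parent = parent;
            this.offset = offset;
            this.length = length;
        }

        void release() {
            parent.release();
        }
    }

    /** Bytes of the frame that are still held, regardless of how small the slice is. */
    static long pinnedBytes(Frame frame) {
        return frame.isFreed() ? 0 : frame.memory.length;
    }

    public static void main(String[] args) {
        Frame frame = new Frame(16 * 1024 * 1024);   // illustrative frame size
        Slice block = new Slice(frame, 0, 1024);     // only 1 KiB is accounted for

        frame.release();                             // the decoder drops its reference
        System.out.println("accounted=" + block.length + " pinned=" + pinnedBytes(frame));
        // prints "accounted=1024 pinned=16777216"

        block.release();                             // the slice owner finally lets go
        System.out.println("pinned after slice release=" + pinnedBytes(frame));
        // prints "pinned after slice release=0"
    }
}
```

In this model the accounting sees 1024 bytes while 16777216 bytes stay allocated until the slice is released, which is the kind of gap between used_buffer and actual off-heap usage described above.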
Affects Version(s)
master
Uniffle Server Log Output
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used:85899345920, max:85899345920)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:802)
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:731)
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:632)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:607)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:202)
at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:186)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:136)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:126)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:395)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:140)
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:150)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
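The byte counts in the first line of the trace are easier to read in binary units. The small sketch below (DirectMemoryFigures and humanReadable are hypothetical names used only for illustration) converts them: the failed request was 16 MiB, and both used and max are 80 GiB, so the direct memory limit was completely exhausted when the allocation was attempted.

```java
public class DirectMemoryFigures {
    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    /** Formats a byte count as whole GiB or MiB when it divides evenly. */
    static String humanReadable(long bytes) {
        if (bytes % GIB == 0) {
            return (bytes / GIB) + " GiB";
        }
        if (bytes % MIB == 0) {
            return (bytes / MIB) + " MiB";
        }
        return bytes + " B";
    }

    public static void main(String[] args) {
        long requested = 16_777_216L;      // "failed to allocate 16777216 byte(s)"
        long used = 85_899_345_920L;       // "used:85899345920"
        long max = 85_899_345_920L;        // "max:85899345920"

        System.out.println("requested = " + humanReadable(requested)); // requested = 16 MiB
        System.out.println("used      = " + humanReadable(used));      // used      = 80 GiB
        System.out.println("max       = " + humanReadable(max));       // max       = 80 GiB
        System.out.println("headroom  = " + (max - used) + " B");      // headroom  = 0 B
    }
}
```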
…1151)
### What changes were proposed in this pull request?
Fix io.netty.util.internal.OutOfDirectMemoryError.
### Why are the changes needed?
Fix io.netty.util.internal.OutOfDirectMemoryError.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing UTs.
Co-authored-by: leixianming <leixianming@didiglobal.com>
Code of Conduct
Search before asking
Uniffle Engine Log Output
No response
Uniffle Server Configurations
Uniffle Engine Configurations
No response
Additional context
Test script:
Are you willing to submit PR?