[Bug] IllegalReferenceCountException: refCnt: 0 in netty encode message #1127

Closed
2 of 3 tasks
zuston opened this issue Aug 10, 2023 · 11 comments · Fixed by #1150

Comments

@zuston
Member

zuston commented Aug 10, 2023

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

23/08/09 18:22:47 ERROR util.ByteBufUtils: Failed to copy. size: 120228
23/08/09 18:22:47 ERROR netty.MessageEncoder: Unexpected exception during process encode!
org.apache.uniffle.common.exception.RssException: org.apache.uniffle.io.netty.util.IllegalReferenceCountException: refCnt: 0
    at org.apache.uniffle.common.util.ByteBufUtils.copyByteBuf(ByteBufUtils.java:72)
    at org.apache.uniffle.common.ShuffleBlockInfo.copyDataTo(ShuffleBlockInfo.java:157)
    at org.apache.uniffle.common.netty.protocol.Encoders.encodeShuffleBlockInfo(Encoders.java:46)
    at org.apache.uniffle.common.netty.protocol.SendShuffleDataRequest.encodePartitionData(SendShuffleDataRequest.java:116)
    at org.apache.uniffle.common.netty.protocol.SendShuffleDataRequest.encode(SendShuffleDataRequest.java:80)
    at org.apache.uniffle.common.netty.MessageEncoder.write(MessageEncoder.java:54)
    at org.apache.uniffle.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
    at org.apache.uniffle.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
    at org.apache.uniffle.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
    at org.apache.uniffle.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
    at org.apache.uniffle.io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:302)
    at org.apache.uniffle.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
    at org.apache.uniffle.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
    at org.apache.uniffle.io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at org.apache.uniffle.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at org.apache.uniffle.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
    at org.apache.uniffle.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at org.apache.uniffle.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at org.apache.uniffle.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at org.apache.uniffle.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.uniffle.io.netty.util.IllegalReferenceCountException: refCnt: 0
    at org.apache.uniffle.io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1454)
    at org.apache.uniffle.io.netty.buffer.UnpooledHeapByteBuf.array(UnpooledHeapByteBuf.java:147)
    at org.apache.uniffle.io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:526)
    at org.apache.uniffle.io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:193)
    at org.apache.uniffle.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1104)
    at org.apache.uniffle.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1096)
    at org.apache.uniffle.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1087)
    at org.apache.uniffle.common.util.ByteBufUtils.copyByteBuf(ByteBufUtils.java:68)
    ... 20 more

Affects Version(s)

master

Uniffle Server Log Output

No response

Uniffle Engine Log Output

No response

Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@zuston
Member Author

zuston commented Aug 10, 2023

cc @leixm @xumanbu

@zuston
Member Author

zuston commented Aug 10, 2023

ShuffleBlockInfo's copyDataTo will be invoked multiple times. But I think the data in it will already have been released by then?
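
To make the suspected failure mode concrete, here is a minimal sketch of how a reference-counted buffer rejects a second copy once its count has dropped to 0. `CountedBuffer` and `RefCountDemo` are hypothetical names used only for illustration. This is a simplified model, not Netty's `ByteBuf`; it only mimics the `ensureAccessible` check that appears in the stack trace.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, hypothetical model of a reference-counted buffer.
// It is NOT Netty's ByteBuf; it only mimics the refCnt check.
final class CountedBuffer {
    private final byte[] data;
    private final AtomicInteger refCnt = new AtomicInteger(1);

    CountedBuffer(byte[] data) {
        this.data = data;
    }

    // Same idea as ensureAccessible() in the trace: fail once the count is 0.
    private void ensureAccessible() {
        if (refCnt.get() == 0) {
            throw new IllegalStateException("refCnt: 0");
        }
    }

    // Copies the content into dest, like a copyDataTo-style call.
    int copyTo(byte[] dest) {
        ensureAccessible();
        System.arraycopy(data, 0, dest, 0, data.length);
        return data.length;
    }

    void retain() {
        ensureAccessible();
        refCnt.incrementAndGet();
    }

    void release() {
        ensureAccessible();
        refCnt.decrementAndGet();
    }
}

final class RefCountDemo {
    // Returns true if the second copy fails because the buffer was released.
    static boolean secondCopyFailsAfterRelease() {
        CountedBuffer buf = new CountedBuffer(new byte[] {1, 2, 3});
        byte[] dest = new byte[3];
        buf.copyTo(dest);      // first copy succeeds
        buf.release();         // count drops to 0
        try {
            buf.copyTo(dest);  // second copy hits the released buffer
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondCopyFailsAfterRelease());
    }
}
```

In this model, an extra `retain()` before the first `release()` would keep the buffer readable for the second copy. Whether that matches the real code path is exactly what this issue needs to establish.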

@jerqi
Contributor

jerqi commented Aug 10, 2023

> ShuffleBlockInfo's copyDataTo will be invoked multiple times. But I think the data in it will already have been released by then?

In some cases the memory won't be released, namely when users don't allow it to be released.

@jerqi
Contributor

jerqi commented Aug 10, 2023

public class UnpooledDirectByteBuf extends AbstractReferenceCountedByteBuf {
  private final ByteBufAllocator alloc;
  ByteBuffer buffer;
  private ByteBuffer tmpNioBuf;
  private int capacity;
  private boolean doNotFree; // when true, the underlying buffer is not freed on deallocation
  // ... rest of the class omitted
}

The variable doNotFree matters.

@zuston
Member Author

zuston commented Aug 10, 2023

I couldn't find anywhere that we set this variable. Also, the concrete ByteBuf is UnpooledHeapByteBuf, not UnpooledDirectByteBuf.

@jerqi
Contributor

jerqi commented Aug 10, 2023

> I couldn't find anywhere that we set this variable. Also, the concrete ByteBuf is UnpooledHeapByteBuf, not UnpooledDirectByteBuf.

If we use a HeapByteBuf, then off-heap memory isn't enabled.

@jerqi
Contributor

jerqi commented Aug 10, 2023

UnsafeByteBufUtil

Oh, it's just temporary data.

@zuston
Member Author

zuston commented Aug 10, 2023

What do you mean? Sorry, I'm not familiar with Netty.

@jerqi
Contributor

jerqi commented Aug 10, 2023

> What do you mean? Sorry, I'm not familiar with Netty.

My mistake, ignore it.

@zuston
Member Author

zuston commented Aug 15, 2023

Could you help check this failure? @leixm

@leixm
Contributor

leixm commented Aug 15, 2023

Sure, I will check this failure.

zuston added a commit that referenced this issue Aug 16, 2023
…ta in client side (#1150)

### What changes were proposed in this pull request?

Data should be released only after it has been sent.
However, under the current implementation, all blocks are released in the last event.
If events are executed out of order, block data that has not yet been sent is released prematurely, which is wrong.

### Why are the changes needed?

Fix: #1127 

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

1. Existing UTs
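
The out-of-order release described in the commit message can be reproduced with a small simulation. This is a sketch under stated assumptions, not Uniffle code: `Block`, `SendEvent` and `ReleaseOrderDemo` are hypothetical names, and the per-event release shown as the alternative is only one plausible reading of the fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a block whose data can be released.
final class Block {
    boolean released = false;
    boolean sentWhileValid = false;

    void send() {
        // A send only counts as successful if the data is still unreleased.
        sentWhileValid = !released;
    }
}

// Hypothetical send event: a batch of blocks plus a completion hook.
final class SendEvent {
    private final List<Block> blocks;
    private final Runnable onComplete;

    SendEvent(List<Block> blocks, Runnable onComplete) {
        this.blocks = blocks;
        this.onComplete = onComplete;
    }

    void run() {
        blocks.forEach(Block::send);
        onComplete.run();
    }
}

final class ReleaseOrderDemo {
    // Runs two events in reverse order and returns how many blocks
    // were sent while their data was still valid.
    static int run(boolean releasePerEvent) {
        List<Block> first = Arrays.asList(new Block(), new Block());
        List<Block> second = Arrays.asList(new Block());
        List<Block> all = new ArrayList<>(first);
        all.addAll(second);

        Runnable releaseFirst = () -> first.forEach(b -> b.released = true);
        Runnable releaseSecond = () -> second.forEach(b -> b.released = true);
        Runnable releaseAll = () -> all.forEach(b -> b.released = true);
        Runnable nothing = () -> { };

        // Old behaviour: only the last event releases, and it releases everything.
        // Assumed alternative: each event releases just the blocks it sent.
        SendEvent e1 = new SendEvent(first, releasePerEvent ? releaseFirst : nothing);
        SendEvent e2 = new SendEvent(second, releasePerEvent ? releaseSecond : releaseAll);

        // Out-of-order execution: the "last" event runs first.
        e2.run();
        e1.run();

        int ok = 0;
        for (Block b : all) {
            if (b.sentWhileValid) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // release-all-in-last-event
        System.out.println(run(true));  // release-per-event
    }
}
```

With release-all-in-last-event, only the block belonging to the event that ran first is sent while valid (1 of 3). With per-event release, all 3 blocks are sent before their data is released, regardless of execution order.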