
Consolidate Netty channel flushes to mitigate syscall overhead #3383

Merged — merged 1 commit into apache:master on Jul 26, 2022

Conversation

@merlimat (Contributor) commented Jul 3, 2022

Motivation

When we are writing/reading a lot of small entries, one of the main bottlenecks in the BK client/server becomes CPU usage. A big part of it is caused by the syscall overhead when writing to the socket.

The reason is that many independent (and small) operations happen on the connection, and we call writeAndFlush() for each of them, causing many write() calls on the socket.

Netty has a mechanism, FlushConsolidationHandler, to consolidate flushes on the channel, and it improves the handling of many small entries.
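
For illustration, a minimal sketch (not the exact diff of this PR) of what enabling that mechanism looks like, using the `FlushConsolidationHandler(1024, true)` configuration discussed below; the initializer class name is hypothetical:

```java
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.flush.FlushConsolidationHandler;

// Sketch: install Netty's flush consolidation at the head of the pipeline.
// Flushes triggered while a read is in progress are held back and merged
// (here, up to 1024 of them), so many small writeAndFlush() calls result
// in far fewer write() syscalls on the socket.
public class BookieChannelInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        // explicitFlushAfterFlushes = 1024, consolidateWhenNoReadInProgress = true
        ch.pipeline().addLast("consolidation", new FlushConsolidationHandler(1024, true));
        // ... protocol codecs and request handlers follow
    }
}
```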

@hangc0276 (Contributor) left a comment

Great job!

codelipenghui pushed a commit to apache/pulsar that referenced this pull request Jul 4, 2022
…ll overhead (#16361)

### Motivation

This is similar to apache/bookkeeper#3383.

When we are writing/reading a lot of small entries, one of the main bottlenecks in the Pulsar client/broker becomes CPU usage. A big part of it is caused by the syscall overhead when writing to the socket.

The reason is that many independent (and small) operations happen on the connection, and we call `writeAndFlush()` for each of them, causing many `write()` calls on the socket.

@eolivelli (Contributor) left a comment

Shouldn't we make this configurable?

@eolivelli self-requested a review — July 4, 2022 12:06
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this pull request Jul 5, 2022
…ll overhead (apache#16361)

@merlimat (Contributor, Author) commented Jul 5, 2022

> Shouldn't we make this configurable?

@eolivelli In our tests there is no downside to keeping this enabled even in cases where syscall overhead is not dominant, so I believe it shouldn't be necessary.

@dlg99 (Contributor) commented Jul 5, 2022

> In our tests there is no downside to keeping this enabled even in cases where syscall overhead is not dominant

@merlimat Can you share the test results with and without the change, e.g. under high- and low-load scenarios?
netty/netty@e3c8a92 says: "The only "downside" may be a bit higher latency in the case of where only one flush is triggered by the user."
Unfortunately it is not obvious how much higher "a bit higher" is, nor how it is affected by the choice of parameters (1024 in the current case vs. the 256 the handler uses as its default).
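
For reference, the two configurations being compared — a sketch against Netty's public `FlushConsolidationHandler` API, not code from this PR:

```java
import io.netty.handler.flush.FlushConsolidationHandler;

public class FlushConfigExample {
    public static void main(String[] args) {
        // Netty's defaults: flush after 256 pending flushes, and do not
        // consolidate flushes issued while no read is in progress.
        FlushConsolidationHandler defaults = new FlushConsolidationHandler();

        // The configuration used here: a larger window (1024) to increase the
        // chance of merging flushes, and consolidateWhenNoReadInProgress = true,
        // which defers flushes from outside the read path to a scheduled task
        // on the event loop instead of flushing immediately.
        FlushConsolidationHandler tuned = new FlushConsolidationHandler(1024, true);

        System.out.println(defaults + " vs " + tuned);
    }
}
```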

@dlg99 requested a review from jvrao — July 5, 2022 15:49
wuxuanqicn pushed a commit to wuxuanqicn/pulsar that referenced this pull request Jul 14, 2022
…ll overhead (apache#16361)

@hangc0276 mentioned this pull request on Jul 20, 2022

@StevenLuMT (Contributor) left a comment

I have a question: is FlushConsolidationHandler(1024, true) optimal in different environments?

@merlimat (Contributor, Author)

@eolivelli @dlg99 @StevenLuMT

We have done a lot of perf tests with this flush-merger enabled in BK and Pulsar.

  1. At low request rates (e.g. when batching is involved), there is absolutely no visible difference in either latency or CPU usage.
  2. At high request rates there is a marked improvement in both latency and CPU utilization, due to the reduced number of syscall invocations and, ultimately, to bigger TCP frames being sent and received.

@StevenLuMT The 1024 value is there to increase the chances of a bigger consolidation. It will not affect latency.

A couple of examples with low-rate scenarios:

Persistent topic:

2022-07-04T01:54:25,174+0000 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer -
    Throughput produced:     292 msg ---     10.0 msg/s ---      0.1 Mbit/s  --- failure      0.0 msg/s
    --- Latency: mean:   2.673 ms - med:   2.652 - 95pct:   3.044 - 99pct:   3.155 -
         99.9pct:   3.261 - 99.99pct:   3.261 - Max:   3.261

Non-Persistent topic:

2022-07-04T01:55:28,361+0000 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer -
    Throughput produced:     195 msg ---     10.0 msg/s ---      0.1 Mbit/s  --- failure      0.0 msg/s 
    --- Latency: mean:   0.703 ms - med:   0.675 - 95pct:   0.978 - 99pct:   1.041 - 
          99.9pct:   1.241 - 99.99pct:   1.241 - Max:   1.241

@merlimat (Contributor, Author)

@eolivelli @dlg99 PTAL again

@hangc0276 (Contributor)

Ping @eolivelli, would you please help take a look? Thanks.

@StevenLuMT (Contributor) left a comment

lgtm

@StevenLuMT (Contributor)
> We have done a lot of perf tests with this flush-merger enabled in BK and Pulsar. […]

Thanks, looks nice.

@zymap (Member) commented Jul 26, 2022

@eolivelli Would you please take another look?

@eolivelli (Contributor) left a comment

LGTM

Sorry for the late reply.

Thanks for your clarifications.

@eolivelli merged commit de70f4f into apache:master on Jul 26, 2022
zymap pushed a commit that referenced this pull request Aug 2, 2022
@zymap (Member) commented Sep 8, 2022

Hi @merlimat, we found a failing test in Pulsar which was caused by this fix.
I wrote a test to verify that. It fails when opening a ledger with recovery, here:

```
2022-09-08T18:12:24,810 - INFO - [bookie-io-13-1:ReadEntryProcessor@61] - Thread name bookie-io-13-1 @ processPacket
2022-09-08T18:12:24,810 - WARN - [bookie-io-13-1:ReadEntryProcessor@71] - Ledger: 0 fenced by: /127.0.0.1:65535
2022-09-08T18:12:24,810 - INFO - [bookie-io-13-1:ReadEntryProcessor@86] - Fence result is not empty
2022-09-08T18:12:24,810 - INFO - [bookie-io-13-1:ReadEntryProcessor@87] - Thread name bookie-io-13-1 @ processPacket, fence result
2022-09-08T18:12:27,318 - INFO - [bookie-io-13-1:PacketProcessorBase@145] - Thread name bookie-io-13-1 @ send response and wait | org.apache.bookkeeper.proto.BookieProtocol$ReadResponse
2022-09-08T18:12:27,318 - ERROR - [bookie-io-13-1:ReadEntryProcessor@113] - Unexpected exception reading at 0:-1 : DefaultChannelPromise@ab4113(incomplete)
io.netty.util.concurrent.BlockingOperationException: DefaultChannelPromise@ab4113(incomplete)
    at io.netty.util.concurrent.DefaultPromise.checkDeadLock(DefaultPromise.java:462) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPromise.checkDeadLock(DefaultChannelPromise.java:159) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:247) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:131) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:30) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at org.apache.bookkeeper.proto.PacketProcessorBase.sendResponseAndWait(PacketProcessorBase.java:147) ~[classes/:?]
    at org.apache.bookkeeper.proto.PacketProcessorBase.sendReadReqResponse(PacketProcessorBase.java:80) ~[classes/:?]
    at org.apache.bookkeeper.proto.ReadEntryProcessor.sendResponse(ReadEntryProcessor.java:140) ~[classes/:?]
    at org.apache.bookkeeper.proto.ReadEntryProcessor.sendFenceResponse(ReadEntryProcessor.java:146) ~[classes/:?]
    at org.apache.bookkeeper.proto.ReadEntryProcessor.handleReadResultForFenceRead(ReadEntryProcessor.java:169) ~[classes/:?]
    at org.apache.bookkeeper.proto.ReadEntryProcessor.processPacket(ReadEntryProcessor.java:88) ~[classes/:?]
    at org.apache.bookkeeper.proto.PacketProcessorBase.safeRun(PacketProcessorBase.java:175) ~[classes/:?]
    at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[classes/:?]
    at org.apache.bookkeeper.proto.BookieRequestProcessor.processReadRequest(BookieRequestProcessor.java:665) ~[classes/:?]
    at org.apache.bookkeeper.proto.BookieRequestProcessor.processRequest(BookieRequestProcessor.java:372) ~[classes/:?]
    at org.apache.bookkeeper.proto.BookieRequestHandler.channelRead(BookieRequestHandler.java:78) ~[classes/:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at org.apache.bookkeeper.proto.AuthHandler$ServerSideHandler.channelRead(AuthHandler.java:86) ~[classes/:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at org.apache.bookkeeper.proto.BookieProtoEncoding$RequestDecoder.channelRead(BookieProtoEncoding.java:447) ~[classes/:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
2022-09-08T18:12:27,318 - INFO - [bookie-io-13-1:ReadEntryProcessor@121] - Thread name bookie-io-13-1 @ processPacket, send response
2022-09-08T18:12:27,765 - INFO - [bookie-io-13-1:PacketProcessorBase@145] - Thread name bookie-io-13-1 @ send response and wait | org.apache.bookkeeper.proto.BookieProtocol$ReadResponse
2022-09-08T18:12:27,765 - ERROR - [bookie-io-13-1:SafeRunnable@38] - Unexpected throwable caught
io.netty.util.concurrent.BlockingOperationException: DefaultChannelPromise@50605c50(incomplete)
    at io.netty.util.concurrent.DefaultPromise.checkDeadLock(DefaultPromise.java:462) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPromise.checkDeadLock(DefaultChannelPromise.java:159) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:247) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:131) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:30) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at org.apache.bookkeeper.proto.PacketProcessorBase.sendResponseAndWait(PacketProcessorBase.java:147) ~[classes/:?]
    at org.apache.bookkeeper.proto.PacketProcessorBase.sendReadReqResponse(PacketProcessorBase.java:80) ~[classes/:?]
    at org.apache.bookkeeper.proto.ReadEntryProcessor.sendResponse(ReadEntryProcessor.java:140) ~[classes/:?]
    at org.apache.bookkeeper.proto.ReadEntryProcessor.processPacket(ReadEntryProcessor.java:122) ~[classes/:?]
    at org.apache.bookkeeper.proto.PacketProcessorBase.safeRun(PacketProcessorBase.java:175) ~[classes/:?]
    at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[classes/:?]
    at org.apache.bookkeeper.proto.BookieRequestProcessor.processReadRequest(BookieRequestProcessor.java:665) ~[classes/:?]
    at org.apache.bookkeeper.proto.BookieRequestProcessor.processRequest(BookieRequestProcessor.java:372) ~[classes/:?]
    at org.apache.bookkeeper.proto.BookieRequestHandler.channelRead(BookieRequestHandler.java:78) ~[classes/:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at org.apache.bookkeeper.proto.AuthHandler$ServerSideHandler.channelRead(AuthHandler.java:86) ~[classes/:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at org.apache.bookkeeper.proto.BookieProtoEncoding$RequestDecoder.channelRead(BookieProtoEncoding.java:447) ~[classes/:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299) ~[netty-codec-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
2022-09-08T18:12:27,766 - ERROR - [bookie-io-13-1:BookieRequestHandler@68] - Unhandled exception occurred in I/O thread or handler on [id: 0xbf0f987a, L:/127.0.0.1:65522 - R:/127.0.0.1:65535]
io.netty.channel.socket.ChannelOutputShutdownException: Channel output shutdown
    at io.netty.channel.AbstractChannel$AbstractUnsafe.shutdownOutput(AbstractChannel.java:650) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:953) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.flushNow(FlushConsolidationHandler.java:204) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.flushIfNeeded(FlushConsolidationHandler.java:197) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.resetReadAndFlushIfNeeded(FlushConsolidationHandler.java:192) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.channelReadComplete(FlushConsolidationHandler.java:145) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:410) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:397) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelReadComplete(AbstractChannelHandlerContext.java:390) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelReadComplete(DefaultChannelPipeline.java:1415) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:410) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:397) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelReadComplete(DefaultChannelPipeline.java:925) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:171) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
    at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1454) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.buffer.UnpooledHeapByteBuf.nioBuffer(UnpooledHeapByteBuf.java:305) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.buffer.AbstractDerivedByteBuf.nioBuffer(AbstractDerivedByteBuf.java:122) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.buffer.AbstractDerivedByteBuf.internalNioBuffer(AbstractDerivedByteBuf.java:117) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.ChannelOutboundBuffer.nioBuffers(ChannelOutboundBuffer.java:447) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:399) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:931) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    ... 26 more
2022-09-08T18:12:27,767 - INFO - [bookkeeper-io-18-1:PerChannelBookieClient@1245] - Disconnected from bookie channel [id: 0x5b9f1585, L:/127.0.0.1:65535 ! R:/127.0.0.1:65522]
2022-09-08T18:12:27,767 - ERROR - [bookie-io-13-1:BookieRequestHandler@68] - Unhandled exception occurred in I/O thread or handler on [id: 0xbf0f987a, L:/127.0.0.1:65522 ! R:/127.0.0.1:65535]
io.netty.channel.socket.ChannelOutputShutdownException: Channel output shutdown
    at io.netty.channel.AbstractChannel$AbstractUnsafe.shutdownOutput(AbstractChannel.java:650) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:953) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.flushNow(FlushConsolidationHandler.java:204) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.flushIfNeeded(FlushConsolidationHandler.java:197) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.resetReadAndFlushIfNeeded(FlushConsolidationHandler.java:192) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.channelReadComplete(FlushConsolidationHandler.java:145) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:410) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:397) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelReadComplete(AbstractChannelHandlerContext.java:390) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelReadComplete(DefaultChannelPipeline.java:1415) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:410) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:397) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelReadComplete(DefaultChannelPipeline.java:925) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:171) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
    at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1454) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.buffer.UnpooledHeapByteBuf.nioBuffer(UnpooledHeapByteBuf.java:305) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.buffer.AbstractDerivedByteBuf.nioBuffer(AbstractDerivedByteBuf.java:122) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.buffer.AbstractDerivedByteBuf.internalNioBuffer(AbstractDerivedByteBuf.java:117) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.ChannelOutboundBuffer.nioBuffers(ChannelOutboundBuffer.java:447) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:399) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:931) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    ... 26 more
2022-09-08T18:12:27,768 - WARN - [bookie-io-13-1:ReferenceCountUtil@118] - Failed to release a message: UnpooledDuplicatedByteBuf(freed)
io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
    at io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:83) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:147) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.buffer.AbstractDerivedByteBuf.release0(AbstractDerivedByteBuf.java:98) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.buffer.AbstractDerivedByteBuf.release(AbstractDerivedByteBuf.java:94) ~[netty-buffer-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.ReferenceCountUtil.release(ReferenceCountUtil.java:90) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.ReferenceCountUtil.safeRelease(ReferenceCountUtil.java:116) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:306) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.closeOutboundBufferForShutdown(AbstractChannel.java:672) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.shutdownOutput(AbstractChannel.java:666) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:953) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.flushNow(FlushConsolidationHandler.java:204) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.flushIfNeeded(FlushConsolidationHandler.java:197) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.resetReadAndFlushIfNeeded(FlushConsolidationHandler.java:192) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.channelReadComplete(FlushConsolidationHandler.java:145) ~[netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:410) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:397) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelReadComplete(AbstractChannelHandlerContext.java:390) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelReadComplete(DefaultChannelPipeline.java:1415) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:410) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:397) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelReadComplete(DefaultChannelPipeline.java:925) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:171) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.77.Final.jar:4.1.77.Final]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
2022-09-08T18:12:27,769 - INFO - [bookie-io-13-1:BookieRequestHandler@59] - Channels disconnected: [id: 0xbf0f987a, L:/127.0.0.1:65522 ! R:/127.0.0.1:65535]
2022-09-08T18:12:27,770 - ERROR - [BookKeeperClientWorker-OrderedExecutor-0-0:ReadLastConfirmedOp@151] - While readLastConfirmed ledger: 0 did not hear success responses from all quorums, QuorumCoverage(e:1,w:1,a:1) = [-8] bk returned code -10 : Error while recovering ledger error
```

It looks like the await conflicts with the flush consolidation. I found that if setNumHighPriorityWorkerThreads > 0, the test passes. I suspect there is a deadlock, but I haven't found other evidence. Do you have any ideas?
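
For context, a minimal sketch (hypothetical, not BookKeeper's actual code) of the pattern the stack traces point at: blocking on a write future from the channel's own event loop. With `FlushConsolidationHandler` in the pipeline, a flush issued during `channelRead()` is deferred until `channelReadComplete()`, so the write cannot complete while the I/O thread is blocked waiting for it; Netty detects the event-loop `await()` in `DefaultPromise.checkDeadLock()` and throws `BlockingOperationException`:

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;

public final class AwaitOnEventLoopExample {
    // Hypothetical illustration of the anti-pattern visible in the traces above.
    static void sendResponseAndWait(Channel channel, Object response) throws InterruptedException {
        ChannelFuture future = channel.writeAndFlush(response);
        // On the channel's I/O thread the consolidated flush is deferred until
        // channelReadComplete(), and blocking here would prevent it from ever
        // running; Netty throws BlockingOperationException rather than
        // deadlocking silently.
        if (channel.eventLoop().inEventLoop()) {
            throw new IllegalStateException("must not block on a write future from the event loop");
        }
        future.await(); // safe only off the event loop
    }

    private AwaitOnEventLoopExample() {}
}
```

Performing the blocking wait on a separate worker thread avoids the conflict, which would be consistent with the observation that setting setNumHighPriorityWorkerThreads > 0 makes the test pass.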

hangc0276 pushed a commit to hangc0276/bookkeeper that referenced this pull request Nov 7, 2022
hangc0276 pushed a commit to hangc0276/bookkeeper that referenced this pull request Nov 7, 2022
nicoloboschi pushed a commit to datastax/bookkeeper that referenced this pull request Jan 11, 2023
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024