Pre-check
Search before asking
Apache Dubbo Component
Java SDK (apache/dubbo)
Dubbo Version
Dubbo Java 3.3 (also affects 3.2.x). Netty 4.1.x.
Steps to reproduce this issue
Scenario: Provider has a broken Netty dependency (e.g., incompatible netty-buffer version causing NoClassDefFoundError: Could not initialize class io.netty.buffer.PooledUnsafeDirectByteBuf).
- Consumer connects to Provider. The TCP three-way handshake succeeds (it is handled by the OS kernel and is unaffected by the Netty bug). NettyClient.doConnect() only waits for TCP handshake completion, so it considers the connection successful. DubboInvoker is created and added to validInvokers.
- TLS handshake begins asynchronously: Consumer sends ClientHello.
- Provider's Netty read loop tries to allocate a ByteBuf to read the incoming data → NoClassDefFoundError is thrown.
- Netty's NioByteUnsafe.handleReadException() fires pipeline.fireExceptionCaught(cause) but does not close the channel (Netty only auto-closes for IOException or OutOfMemoryError).
- The exception reaches SslServerTlsHandler.exceptionCaught(), which only logs the error; it neither closes the channel nor propagates the exception.
- The channel remains TCP-active but is completely non-functional at the application layer.
- Consumer's DubboInvoker.isAvailable() returns true (it only checks channel.isActive()), so the invoker is never removed from validInvokers.
- All RPC requests routed to this Provider time out after 10 seconds.
What you expected to happen
When SslServerTlsHandler.exceptionCaught() is invoked, the channel should be closed (via ctx.close()), just like the userEventTriggered() method in the same class already does on TLS handshake failure. This would allow:
- The Consumer to detect channelInactive → isConnected()=false → isAvailable()=false
- Dubbo's addInvalidateInvoker mechanism to remove the broken invoker from validInvokers
- The self-healing loop to work as designed
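The recovery chain listed above can be illustrated with a toy model. This is a minimal sketch in plain Java under stated assumptions: `ToyConnection`, `ToyInvoker`, and `evictUnavailable` are hypothetical stand-ins invented for illustration, not Dubbo or Netty types, and Dubbo's real addInvalidateInvoker bookkeeping is not reproduced here. It only shows why an availability check that mirrors the connection state can evict an invoker once the channel is actually closed, and cannot before that.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: these classes are hypothetical stand-ins, not Dubbo or Netty types.
final class InvokerHealingDemo {
    private InvokerHealingDemo() {}

    /** Stand-in for a connection whose only state is "active or not". */
    static final class ToyConnection {
        private boolean active = true;
        boolean isActive() { return active; }
        void close() { active = false; } // what a channelInactive event would imply
    }

    /** Stand-in for an invoker whose availability mirrors the connection state. */
    static final class ToyInvoker {
        final String name;
        final ToyConnection connection;

        ToyInvoker(String name, ToyConnection connection) {
            this.name = name;
            this.connection = connection;
        }

        boolean isAvailable() { return connection.isActive(); }
    }

    /** Removes unavailable invokers from the valid list and returns them. */
    static List<ToyInvoker> evictUnavailable(List<ToyInvoker> validInvokers) {
        List<ToyInvoker> evicted = new ArrayList<>();
        for (Iterator<ToyInvoker> it = validInvokers.iterator(); it.hasNext();) {
            ToyInvoker invoker = it.next();
            if (!invoker.isAvailable()) {
                it.remove();
                evicted.add(invoker);
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        ToyConnection broken = new ToyConnection();
        List<ToyInvoker> valid = new ArrayList<>();
        valid.add(new ToyInvoker("healthy", new ToyConnection()));
        valid.add(new ToyInvoker("broken", broken));

        // Without a close, the broken invoker looks healthy and stays in the list.
        System.out.println("evicted before close: " + evictUnavailable(valid).size());

        // Once the channel is closed, the availability check can see it and evict.
        broken.close();
        System.out.println("evicted after close: " + evictUnavailable(valid).size());
    }
}
```

In this model the broken invoker can only be evicted after `close()` is called, which mirrors the report's point that the Consumer has nothing to react to while the Provider keeps the channel open.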
Current behavior of SslServerTlsHandler.exceptionCaught() (line 60-68):
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
        logger.error(INTERNAL_ERROR, "unknown error in remoting module", "",
                "TLS negotiation failed when trying to accept new connection.", cause);
        // BUG: no ctx.close() and no ctx.fireExceptionCaught(cause)
        // The exception is silently swallowed, channel stays open but broken
    }
Compare with userEventTriggered() in the same class (line 81-89), which correctly closes the channel:
    } else {
        logger.error(INTERNAL_ERROR, "", "",
                "TLS negotiation failed when trying to accept new connection.",
                handshakeEvent.cause());
        ctx.close(); // ← correctly closes the channel
    }
Similarly, SslClientTlsHandler.userEventTriggered() on the Consumer side fires ctx.fireExceptionCaught() on TLS failure but does not close the channel, which can also lead to half-open connections.
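The difference between the swallowing handler and a closing handler can be sketched without Netty on the classpath. The following is a toy pipeline in plain Java; `ToyChannel`, `ToyHandler`, `fire`, and the three handler constants are all hypothetical names made up for this illustration and do not correspond to real Netty or Dubbo classes. The closing variant here also passes the exception on to the next handler; whether the real fix should do that in addition to calling ctx.close() is a design choice this sketch does not settle.

```java
import java.util.ArrayList;
import java.util.List;

// Toy pipeline only: hypothetical stand-ins, not Netty's ChannelPipeline or handlers.
final class ExceptionChainDemo {
    private ExceptionChainDemo() {}

    /** Stand-in for a channel: it only tracks whether it is still active. */
    static final class ToyChannel {
        private boolean active = true;
        boolean isActive() { return active; }
        void close() { active = false; }
    }

    /** Stand-in for an inbound handler. Returning true passes the exception on. */
    interface ToyHandler {
        boolean exceptionCaught(ToyChannel channel, Throwable cause, List<String> trace);
    }

    /** Models the reported behavior: log only, no close, no propagation. */
    static final ToyHandler SWALLOWING_TLS = (channel, cause, trace) -> {
        trace.add("tls:logged");
        return false;
    };

    /** Models the requested behavior: log, close the channel, pass the exception on. */
    static final ToyHandler CLOSING_TLS = (channel, cause, trace) -> {
        trace.add("tls:logged");
        channel.close();
        trace.add("tls:closed");
        return true;
    };

    /** Models the next handler in the chain, which just records that it was reached. */
    static final ToyHandler SERVER_HANDLER = (channel, cause, trace) -> {
        trace.add("server:reached");
        return false;
    };

    /** Walks the handlers in order until one of them stops propagation. */
    static List<String> fire(ToyChannel channel, Throwable cause, ToyHandler... handlers) {
        List<String> trace = new ArrayList<>();
        for (ToyHandler handler : handlers) {
            if (!handler.exceptionCaught(channel, cause, trace)) {
                break;
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        Throwable cause = new NoClassDefFoundError("io.netty.buffer.PooledUnsafeDirectByteBuf");

        ToyChannel zombie = new ToyChannel();
        System.out.println(fire(zombie, cause, SWALLOWING_TLS, SERVER_HANDLER)
                + " active=" + zombie.isActive());

        ToyChannel closed = new ToyChannel();
        System.out.println(fire(closed, cause, CLOSING_TLS, SERVER_HANDLER)
                + " active=" + closed.isActive());
    }
}
```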
Anything else
Root cause analysis:
The exception propagation chain breaks at SslServerTlsHandler.exceptionCaught():
Netty read loop: allocate ByteBuf → NoClassDefFoundError
↓
NioByteUnsafe.handleReadException() → pipeline.fireExceptionCaught(cause)
(Netty does NOT auto-close: NoClassDefFoundError is not IOException/OutOfMemoryError)
↓
SslServerTlsHandler.exceptionCaught() → logs error, BUT:
✗ Does NOT call ctx.close()
✗ Does NOT call ctx.fireExceptionCaught(cause)
→ Exception is silently swallowed
→ Channel remains TCP-active but application-dead
↓
NettyServerHandler.exceptionCaught() → NEVER reached (exception stopped above)
↓
Consumer side: channel still active → isAvailable()=true → invoker never removed
→ Continuous timeout on every RPC call routed to this Provider
This is not limited to NoClassDefFoundError — any non-IOException/non-OutOfMemoryError exception during the Netty read loop would trigger the same behavior, leaving the channel in a zombie state.
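To make the scope of affected exceptions concrete, the auto-close rule described in this report can be written as a small predicate. This is a minimal sketch in plain Java, not Netty's actual code: `ReadErrorPolicy` and `autoCloses` are hypothetical names, and the rule encoded is only the one stated above (the read loop closes the channel itself for IOException or OutOfMemoryError).

```java
import java.io.IOException;

// Hypothetical model of the auto-close rule described in this report.
// It is not Netty code; it only encodes "close for IOException or OutOfMemoryError".
final class ReadErrorPolicy {
    private ReadErrorPolicy() {}

    /** Returns true if, per the rule above, the read loop would close the channel itself. */
    static boolean autoCloses(Throwable cause) {
        return cause instanceof IOException || cause instanceof OutOfMemoryError;
    }

    public static void main(String[] args) {
        Throwable[] samples = {
            new IOException("connection reset"),
            new OutOfMemoryError("direct buffer"),
            new NoClassDefFoundError("io.netty.buffer.PooledUnsafeDirectByteBuf"),
            new IllegalStateException("decoder bug")
        };
        for (Throwable sample : samples) {
            System.out.println(sample.getClass().getSimpleName()
                    + " -> autoCloses=" + autoCloses(sample));
        }
    }
}
```

Every cause for which this predicate is false relies on a handler in the pipeline to close the channel, which is the gap this issue describes.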
Are you willing to submit a pull request to fix on your own?
Code of Conduct