Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Race condition while instantiating new NioTcpMessageChannel causing silent Packet drops subsequently #67
The NioTcpMessageProcessor.createMessageChannel() method has unguarded Critical section which can cause problem when multiple MSS Executor threads try to create new NioTcpMessageChannel instance at same moment.
This method internally calls NioTcpMessageChannel constructor which further creates a new SocketChannel through NIOHandler.createOrReuseSocket() method. The socket creation logic in NIOHandler.createOrReuseSocket() is threadsafe; so when multiple threads calls NioTcpMessageProcessor.createMessageChannel() at same instant, only 1 socketChannel is created which gets mapped to multiple instances of NioTcpMessageChannel.
It has been observed that this simultaneous attempt to create NioTcpMessageChannel instance by multiple threads can cause different in-compatible mappings to be added to ConnectionOrientedMessageProcessor.messageChannels and NioTcpMessageChannel.channelMap structures (which has been observed to happen 1 out of every 7-8 times with multiple parallel threads), which further on leads to packet drops destined to that target once the SocketChannel gets cleaned up on receipt of TCP FIN packet (leading to IOException on read operation). In such a scenario, on cleanup of SocketChannel, the socket is closed and removed from NioHandler/NioTcpMessageChannel structures as well but its entry in ConnectionOrientedMessageProcessor.messageChannels structure is not cleaned up due to following additional check in ConnectionOrientedMessageProcessor.remove() method
if (messageChannels.get(key) == messageChannel)
This causes a invalid NioTcpMessageChannel instance to stay in messageChannels map with its SocketChannel instance in close state. When next connection attempt is made to same target (with same key used to lookup messageChannels), the dangling NioTcpMessageChannel instance is used for NIO operation and causes a Dead socket operation and cleanedup. Then a new socket is created for the target but it somehow fails with null return value on call to change.socket.keyFor(selector) in ConnectionOrientedMessageProcessor.ProcessorTask.run() method. This causes failure to send any further packets to that tcp target and no error reported at all.
I am using MSS 2.0.0 with tomcat 7 wherein this issue was observed, but this issue would be present in master as well as the relevant code blocks have no apparant fix added.
I am also attaching the logfile with filtered statements for these 3 main classes where this issue was observed. There are 2 MSS threads trying to create new NioTcpMessageChannel instances:
Uploaded logs for classes NioTcpMessageChannel and NioTcpMessageProcessor and NIOHandler