New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition while instantiating new NioTcpMessageChannel causing silent Packet drops subsequently #67

Closed
gkumar78 opened this Issue Mar 2, 2016 · 4 comments

Comments

Projects
None yet
3 participants
@gkumar78

gkumar78 commented Mar 2, 2016

The NioTcpMessageProcessor.createMessageChannel() method has unguarded Critical section which can cause problem when multiple MSS Executor threads try to create new NioTcpMessageChannel instance at same moment.

This method internally calls NioTcpMessageChannel constructor which further creates a new SocketChannel through NIOHandler.createOrReuseSocket() method. The socket creation logic in NIOHandler.createOrReuseSocket() is threadsafe; so when multiple threads calls NioTcpMessageProcessor.createMessageChannel() at same instant, only 1 socketChannel is created which gets mapped to multiple instances of NioTcpMessageChannel.

It has been observed that this simultaneous attempt to create NioTcpMessageChannel instance by multiple threads can cause different in-compatible mappings to be added to ConnectionOrientedMessageProcessor.messageChannels and NioTcpMessageChannel.channelMap structures (which has been observed to happen 1 out of every 7-8 times with multiple parallel threads), which further on leads to packet drops destined to that target once the SocketChannel gets cleaned up on receipt of TCP FIN packet (leading to IOException on read operation). In such a scenario, on cleanup of SocketChannel, the socket is closed and removed from NioHandler/NioTcpMessageChannel structures as well but its entry in ConnectionOrientedMessageProcessor.messageChannels structure is not cleaned up due to following additional check in ConnectionOrientedMessageProcessor.remove() method

if (messageChannels.get(key) == messageChannel)
this.messageChannels.remove(key);

This causes a invalid NioTcpMessageChannel instance to stay in messageChannels map with its SocketChannel instance in close state. When next connection attempt is made to same target (with same key used to lookup messageChannels), the dangling NioTcpMessageChannel instance is used for NIO operation and causes a Dead socket operation and cleanedup. Then a new socket is created for the target but it somehow fails with null return value on call to change.socket.keyFor(selector) in ConnectionOrientedMessageProcessor.ProcessorTask.run() method. This causes failure to send any further packets to that tcp target and no error reported at all.

I am using MSS 2.0.0 with tomcat 7 wherein this issue was observed, but this issue would be present in master as well as the relevant code blocks have no apparant fix added.

I am also attaching the logfile with filtered statements for these 3 main classes where this issue was observed. There are 2 MSS threads trying to create new NioTcpMessageChannel instances:

  1. Instance gov.nist.javax.sip.stack.NioTcpMessageChannel@591cac4b is created by MSS-Executor-Thread-28 and saved in ConnectionOrientedMessageProcessor.messageChannels with key tcp:10.152.151.162:55080
  2. Instance gov.nist.javax.sip.stack.NioTcpMessageChannel@69739da1 created by MSS-Executor-Thread-29 and saved in NioTcpMessageChannel.channelMap against the key of created socketChannel. Hence, this instance is used for every read & write on the socket.
@gkumar78

This comment has been minimized.

gkumar78 commented Mar 2, 2016

logs_race_condition_niotcpmessagechannel_creation.txt

Uploaded logs for classes NioTcpMessageChannel and NioTcpMessageProcessor and NIOHandler

@deruelle deruelle added the bug label Mar 2, 2016

@deruelle deruelle added this to the 2.0.0 milestone Mar 2, 2016

@deruelle

This comment has been minimized.

Member

deruelle commented Mar 2, 2016

Thanks we will look into it as time permits. Let us know if you would like to contribute a fix/ pull request for that with non regression test, that would help a lot in taking care of it.

@gkumar78

This comment has been minimized.

gkumar78 commented Mar 4, 2016

Submitted pull request #69 with simple fix.

Used Double checked locking to avoid thread blocking for normal flow i.e. when there is already a NioTcpMessageChannel present for destination tcp target

@deruelle

This comment has been minimized.

Member

deruelle commented Mar 4, 2016

Merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment