Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PeerGroup.handlePeerDeath() + BlockingClient.socket.close() called twice #1765

Open
oscarguindzberg opened this issue Apr 1, 2019 · 6 comments

Comments

@oscarguindzberg
Copy link
Contributor

oscarguindzberg commented Apr 1, 2019

When using BlockingClient instances, PeerGroup.handlePeerDeath() will be called twice if there is a connectivity problem during connection setup to a peer.

  1. BlockingClient constructor creates a thread. That thread has a finally statement. Assuming a connection problem, an exception will be thrown and the finally statement will be executed. That calls connection.connectionClosed(), i.e Peer.connectionClosed(). That calls listener.onPeerDisconnected. One of the listeners is PeerGroup.PeerStartupListener. That calls PeerGroup.handlePeerDeath().

  2. At the same time, Peer extends AbstractTimeoutHandler. When the peer is created, it is configured to issue a timeout if version ack msg was not received after N time. That calls Peer.timeoutOccurred() too which eventually ends up calling PeerGroup.handlePeerDeath().

I have a suspect in the same scenario BlockingClient's socket might be attempted to be closed twice.

  1. BlockingClient constructor created thread, connection problem, finally statement, socket.close(). There is a catch that ignores the exception if socket.close() fails.
  2. Peer.timeoutOccurred(), PeerSocketHandler.timeoutOccurred(), PeerSocketHandler.close(), writeTarget.closeConnection() i.e. BlockingClient.closeConnection(), socket.close(). There is a catch that throws an exception if socket.close() fails.

There is a race condition between BlockingClient's socket timeout (defaults to BlockingClientManager.connectTimeoutMillis i.e. 1 second) and Peer version ack timeout (default to PeerGroup.DEFAULT_CONNECT_TIMEOUT_MILLIS i.e. 5 seconds )

@oscarguindzberg
Copy link
Contributor Author

@schildbach would you confirm my analysis is right?

@schildbach
Copy link
Member

I've never looked at how the ClientConnectionManagers work. I think PeerGroup has used NioClientManager for ages. Is there a specific reason you're using BlockingClientManager?

@schildbach
Copy link
Member

Ah, I guess Tor is the reason?

@oscarguindzberg
Copy link
Contributor Author

yes, Tor is the reason.

@schildbach
Copy link
Member

schildbach commented Jul 22, 2019

Hmm, I'm not sure how to continue with this.

BlockingClient(Manager) was mostly written by @TheBlueMatt via 534cec9, but I don't expect him to return to look into such issues. Also, judging by his commit description he was mostly focused on NIO and blocking I/O was just a byproduct (or maybe even just a leftover from the ancient Netty code).

I guess we will need someone who takes care of the networking code. Maybe explore ways to run Tor over NIO? Or maybe switch over to okio?

@oscarguindzberg
Copy link
Contributor Author

I am taking some time off so I won't follow up on this subject in the near future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants