Add P2PClientSupervisor #4509

solo1g · 2022-07-14T16:51:15Z

Fixes logging for errors thrown in P2PClientActor.
P2PClient waits for a response to a message such as getheaders by sending an ExpectResponseCommand to the node. For headers, two messages were sent back to back: first getheaders, then an ExpectResponseCommand that schedules a timeout function to run some time in the future. The problem was that the peer's response could arrive between these two messages. I noticed, especially in tests, that the response to getheaders sometimes arrived before the ExpectResponseCommand was sent. This is fixed now: the timer is started before getheaders is even sent, so no messages are missed.
The second fix concerns disconnecting peers: we now ignore messages once we have started disconnecting from a peer.
The third concerns the use of Await in some places. I had quite a big misunderstanding about foreach, which your earlier comments cleared up, and I'm now following the function signatures properly. Await was introduced in the actor's postStop, since that returns Unit, and in the schedulers and runnables I'm using, since they also expect a function with a Unit return type.
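
The ordering fix described above can be illustrated without Akka. This is only a minimal sketch, not the code from this PR: `ResponseTracker`, `expect`, and `onResponse` are hypothetical names, and a `Promise` stands in for the timeout timer that the real actor schedules.

```scala
import scala.collection.mutable
import scala.concurrent.Promise

// Hypothetical stand-in for the actor's bookkeeping: a reply only counts as
// "matched" if an expectation was registered before the reply arrived.
final class ResponseTracker {
  private val pending = mutable.Map.empty[String, Promise[String]]

  // Register the expectation. In real code the timeout would be scheduled here.
  def expect(request: String): Promise[String] = {
    val p = Promise[String]()
    pending.update(request, p)
    p
  }

  // Called when a reply arrives. Returns false if nobody was waiting for it.
  def onResponse(request: String, reply: String): Boolean =
    pending.remove(request) match {
      case Some(p) =>
        p.success(reply)
        true
      case None =>
        false
    }
}

object TimerBeforeSend {
  // Simulates a peer that answers immediately, like the fast peers seen in tests.
  private def send(tracker: ResponseTracker, request: String): Boolean =
    tracker.onResponse(request, "headers")

  // Old ordering: send first, register afterwards. The reply is missed.
  def sendThenExpect(): Boolean = {
    val tracker = new ResponseTracker
    val matched = send(tracker, "getheaders")
    tracker.expect("getheaders")
    matched
  }

  // New ordering: register first, then send. The reply is matched.
  def expectThenSend(): Boolean = {
    val tracker = new ResponseTracker
    tracker.expect("getheaders")
    send(tracker, "getheaders")
  }
}
```

With an immediately answering peer, `sendThenExpect()` reports an unmatched reply while `expectThenSend()` reports a matched one, which is the race the description refers to.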

node/src/main/scala/org/bitcoins/node/networking/P2PClient.scala

node/src/main/scala/org/bitcoins/node/PeerFinder.scala

node/src/main/scala/org/bitcoins/node/PeerManager.scala

solo1g · 2022-08-02T10:50:20Z

I'll be looking into the failed compact filter header sync mac test and fix it in another PR. This PR, concerned with adding a supervisor, should be ready now.

Christewart

Overall good job, this is looking much more async safe! 🎉

Christewart · 2022-08-03T17:50:13Z

node/src/main/scala/org/bitcoins/node/networking/P2PClient.scala

      case Terminated(actor) if actor == peerConnection =>
        reconnect()
    }

+  private def ignoreNetworkMessages(


Can you add a scaladoc of why we would want to ignore certain network messages?

So what happens if we receive a normal network message when we were expecting a specific message?

For instance:

1. I send an inventory message to request a bitcoin transaction.

2. A bitcoin block gets mined, and I receive a spontaneous blockmsg.

3. My peer sends me the txmsg with the transaction I requested in (1).

Does (2), the block message, get dropped on the floor here?

Not all requests expect a response; at the moment only getheaders, getcfilters, and getcfheaders do. But I get what you're asking: in such a case both are processed. Only a matching response cancels the timer, so if I asked for headers and got a block, it will process the block and keep waiting for headers.
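
The behaviour described in this reply can be sketched as follows. The names `PeerMsg`, `HeadersMsg`, `BlockMsg`, and `ResponseWaiter` are hypothetical and exist only for illustration; the real logic lives in the actor and the PeerMessageReceiver.

```scala
// Hypothetical message types, for illustration only.
sealed trait PeerMsg
case object HeadersMsg extends PeerMsg
case object BlockMsg extends PeerMsg

// Every incoming message is processed, but only the expected one
// stops the wait (i.e. would cancel the timeout timer).
final class ResponseWaiter(expected: PeerMsg) {
  private var waiting = true
  private var processedCount = 0

  def isWaiting: Boolean = waiting
  def processed: Int = processedCount

  def receive(msg: PeerMsg): Unit = {
    processedCount += 1                  // unsolicited messages are still handled
    if (msg == expected) waiting = false // only a matching response ends the wait
  }
}
```

After asking for headers, a spontaneous `BlockMsg` increments `processed` but leaves `isWaiting` true; a later `HeadersMsg` ends the wait.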

As for why and when we ignore messages: this only happens once we have started disconnecting from the peer. initializeDisconnect does not stop the actor in a single actor message; it first sends a Tcp.Close to the peerConnection, and when the close completes it is intercepted by the actor, which is then stopped. The relevant part is that the actor may still process a message that is in its queue while it is disconnecting, and that was causing some issues. We never ignore any message during normal operation.
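
As a rough sketch of that two-step shutdown, the state machine below drops messages only between the start of the disconnect and the actual stop. It uses plain Scala instead of Akka, and `PeerLink`, `LinkState`, and `onClosed` are hypothetical names; only `initializeDisconnect` mirrors a name mentioned in this thread.

```scala
// Hypothetical connection states, for illustration only.
sealed trait LinkState
case object LinkConnected extends LinkState
case object LinkDisconnecting extends LinkState
case object LinkStopped extends LinkState

final class PeerLink {
  private var state: LinkState = LinkConnected
  private var handled = Vector.empty[String]

  def handledMessages: Vector[String] = handled

  // Step 1: request the close. The link is not stopped yet.
  def initializeDisconnect(): Unit =
    if (state == LinkConnected) state = LinkDisconnecting

  // Step 2: the close has completed, so the link can stop.
  def onClosed(): Unit = {
    state = LinkStopped
  }

  // Messages still queued during steps 1 and 2 are ignored.
  def onNetworkMessage(msg: String): Boolean = state match {
    case LinkConnected =>
      handled = handled :+ msg
      true
    case LinkDisconnecting | LinkStopped =>
      false
  }
}
```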

Christewart · 2022-08-03T17:51:45Z

node/src/main/scala/org/bitcoins/node/networking/P2PClient.scala

-  def handleExpectResponse(msg: NetworkPayload): Unit = {
-    currentPeerMsgHandlerRecv =
-      currentPeerMsgHandlerRecv.handleExpectResponse(msg)
+  def handleExpectResponse(msg: NetworkPayload): Future[Unit] = {


Can you add a scaladoc to this with an example of an expected response, and what we do when we did not expect a response (I presume we call ignoreNetworkMessages)?

node/src/main/scala/org/bitcoins/node/networking/peer/PeerMessageReceiver.scala

Christewart · 2022-08-03T18:11:03Z

node/src/main/scala/org/bitcoins/node/PeerFinder.scala

@@ -130,17 +131,20 @@ case class PeerFinder(
            .filterNot(p => skipPeers().contains(p) || _peerData.contains(p))

          logger.debug(s"Trying next set of peers $peers")
-          peers.foreach(tryPeer)
+          val peersF = Future.sequence(peers.map(tryPeer))
+          Await.result(peersF, 10.seconds)


Is there any benefit here to using Await.result compared to just mapping on peersF like so

peersF.failed.foreach(err => logger.error(s"Failed to connect to all peers", err))

What is the significance of the 10.seconds timeout?

My solution at least doesn't block the thread, and I don't believe anything depends on peersF being completed within 10 seconds.

Nothing special about 10.seconds; Await needed a duration and 10.seconds is a generous one. But you are correct, there's nothing dependent on it.

As for why Await, the idea was that otherwise this block of code would be completed and scheduler will start the next countdown while things are still happening. With Await, the next countdown would only start once the future is done. ~~Runnable run in their own thread~~ so I felt using Await is good enough.

I'm not really sure which would be better and, to be fair, I don't think it really matters; both will do just fine. I will make this change so that every Await we have in the code base, as far as node is concerned, is inside an actor.
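
For comparison, the non-blocking alternative discussed here might look like the sketch below. `NonBlockingConnect`, `connectAll`, and the `logError` callback are hypothetical names; `tryPeer` is passed in as a plain function rather than being the PeerFinder method.

```scala
import scala.concurrent.{ExecutionContext, Future}

object NonBlockingConnect {
  // Starts all connection attempts and logs a failure without blocking the
  // calling thread. The caller gets the combined future back and may ignore it.
  def connectAll(
      peers: Seq[String],
      tryPeer: String => Future[Unit],
      logError: Throwable => Unit)(implicit
      ec: ExecutionContext): Future[Seq[Unit]] = {
    val peersF = Future.sequence(peers.map(tryPeer))
    // Runs only if at least one attempt failed.
    peersF.failed.foreach(logError)
    peersF
  }
}
```

The trade-off raised in the reply still applies: nothing here prevents a scheduler from starting the next round while `peersF` is still pending.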

> As for why Await, the idea was that otherwise this block of code would be completed and scheduler will start the next countdown while things are still happening. With Await, the next countdown would only start once the future is done. Runnable run in their own thread so I felt using Await is good enough.

This is a good point. Maybe we should add

    val isConnectionSchedulerRunning = new AtomicBoolean(false)

    system.scheduler.scheduleWithFixedDelay(
      initialDelay = initialDelay,
      delay = nodeAppConfig.tryNextPeersInterval) { () =>
      if (isConnectionSchedulerRunning.compareAndSet(false, true)) {
        logger.debug(s"Cache size: ${_peerData.size}. ${_peerData.keys}")
        if (_peersToTry.size < 32)
          _peersToTry.pushAll(getPeersFromDnsSeeds)

        val peers = (for { _ <- 1 to 32 } yield _peersToTry.pop()).distinct
          .filterNot(p => skipPeers().contains(p) || _peerData.contains(p))

        logger.debug(s"Trying next set of peers $peers")
        val peersF = Future.sequence(peers.map(tryPeer))
        peersF.onComplete {
          case Success(_) =>
            isConnectionSchedulerRunning.set(false)
          case Failure(err) =>
            isConnectionSchedulerRunning.set(false)
            logger.error(s"Failed to connect to peers", err)
        }
      } else {
        logger.warn(
          s"Previous connection scheduler is still running, skipping this run, it will run again in ${nodeAppConfig.tryNextPeersInterval}")
      }
    }

Note: the Scala compiler treats an anonymous function as a Runnable if its type signature is () => Unit, so you don't need the overhead of new Runnable { ... }.
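
The guard suggested above can be isolated into a small, scheduler-independent sketch. `GuardedTask` and `tick` are hypothetical names, and the work is passed in as a function instead of being the PeerFinder connection logic; this is an illustration of the pattern under those assumptions, not the code that was merged.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{ExecutionContext, Future}

// Skips a run while the previous run's future is still pending.
final class GuardedTask(work: () => Future[Unit])(implicit
    ec: ExecutionContext) {
  private val running = new AtomicBoolean(false)

  // Returns true if this tick started the work, false if it was skipped.
  def tick(): Boolean =
    if (running.compareAndSet(false, true)) {
      // Release the guard on success and on failure alike.
      work().onComplete(_ => running.set(false))
      true
    } else {
      false
    }

  // A lambda of shape () => Unit is accepted where a Runnable is expected
  // (SAM conversion), so no `new Runnable { ... }` is needed.
  val asRunnable: Runnable = () => { tick(); () }
}
```

One caveat of this sketch: if `work()` throws synchronously instead of returning a failed future, the guard is never released, so real code would want to handle that case.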

node/src/main/scala/org/bitcoins/node/PeerFinder.scala

Christewart · 2022-08-03T18:13:57Z

node/src/main/scala/org/bitcoins/node/PeerFinder.scala


-    AsyncUtil
+    val waitStopF = AsyncUtil
      .retryUntilSatisfied(_peerData.isEmpty,


Why wait for _peerData.isEmpty? Is it not good enough for the closeF future to be completed?

The keys of the peerData map correspond to running actors. The close called here has a return type of Unit; it just sends a message to the client to stop the actor and updates the state of PeerMessageReceiver accordingly. The actor actually stops some time in the future, and the key-value pair is removed from the map in the actor's postStop callback, so this is just an additional check to ensure that all actors have actually stopped.
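
To make the polling idea concrete, here is a minimal standard-library sketch of "retry until a condition holds". It is not the bitcoin-s `AsyncUtil` implementation; `RetrySketch` and `retryUntil` are hypothetical names, and the interval and retry limit are arbitrary parameters.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Future, Promise}
import scala.concurrent.duration.FiniteDuration
import scala.util.{Failure, Success, Try}

object RetrySketch {
  // Re-checks `condition` every `interval` until it is true or the tries run out.
  def retryUntil(
      condition: => Boolean,
      interval: FiniteDuration,
      maxTries: Int): Future[Unit] = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val done = Promise[Unit]()

    def finish(result: Try[Unit]): Unit = {
      done.tryComplete(result)
      scheduler.shutdown() // let the polling thread exit
    }

    def attempt(triesLeft: Int): Unit =
      Try(condition) match {
        case Success(true) => finish(Success(()))
        case Failure(err)  => finish(Failure(err))
        case Success(false) if triesLeft <= 1 =>
          finish(Failure(
            new RuntimeException(s"Condition not met after $maxTries tries")))
        case Success(false) =>
          scheduler.schedule(
            new Runnable { def run(): Unit = attempt(triesLeft - 1) },
            interval.toMillis,
            TimeUnit.MILLISECONDS)
          ()
      }

    attempt(maxTries)
    done.future
  }
}
```

In the situation described above, the condition would be that the peer map is empty, which only becomes true after each actor's postStop has removed its entry.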

Christewart · 2022-08-04T13:24:57Z

These CI failures will be fixed in #4565

https://github.com/bitcoin-s/bitcoin-s/runs/7672367084?check_suite_focus=true#step:5:3640

Christewart · 2022-08-04T13:27:29Z

node/src/main/scala/org/bitcoins/node/networking/P2PClient.scala

+    * Currently, such a situation is not meant to happen.
+    */
+  def handleExpectResponse(msg: NetworkPayload): Future[Unit] = {
+    require(msg.isInstanceOf[ExpectsResponse],


nit: It's always nice to put the unexpected thing you receive in the error message, i.e.

require(msg.isInstanceOf[ExpectsResponse], s"Tried to wait for response to message which is not a query, got=$msg")
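
A small standalone illustration of this nit, using a hypothetical `RequireDemo.check` rather than the real `handleExpectResponse`: the `s` prefix is what makes `$msg` expand to the offending value, and `require` throws an `IllegalArgumentException` carrying that message.

```scala
object RequireDemo {
  // Rejects anything that is not a String and reports what was received.
  def check(msg: Any): Unit =
    require(msg.isInstanceOf[String], s"Expected a String, got=$msg")
}
```

Without the `s` prefix the message would contain the literal text `$msg`, which defeats the purpose of including the unexpected value.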

Christewart

Great job @shreyanshyad 🎉

solo1g force-pushed the add-supervisor branch from 057cd76 to fcec3da Compare July 14, 2022 17:25

Christewart added the node work for the node project label Jul 14, 2022

Christewart added this to the 2.0 milestone Jul 14, 2022

Christewart requested changes Jul 14, 2022

solo1g force-pushed the add-supervisor branch from 9b71038 to 487571d Compare July 15, 2022 10:44

solo1g requested a review from Christewart July 15, 2022 11:55

Christewart requested changes Jul 16, 2022

solo1g force-pushed the add-supervisor branch 2 times, most recently from 48be02f to b88ed90 Compare July 21, 2022 14:27

solo1g force-pushed the add-supervisor branch 2 times, most recently from 6e4d53e to d2159e6 Compare August 2, 2022 08:29

solo1g requested a review from Christewart August 2, 2022 10:42

solo1g mentioned this pull request Aug 2, 2022

Neutrino Node stalls on syncing #4511

Closed

Christewart requested changes Aug 3, 2022

solo1g requested a review from Christewart August 4, 2022 13:01

Christewart requested changes Aug 4, 2022

solo1g added 5 commits August 4, 2022 19:35

add P2PClientSupervisor

4ac93a7

changes from comments: made P2PClient Future

f4cc278

empty commit to see if mac failures are consistent on ci

a0807e0

changes from comments

bd59e3f

changes from comments

37ab92d

solo1g force-pushed the add-supervisor branch from 818f036 to 37ab92d Compare August 4, 2022 14:06

solo1g requested a review from Christewart August 4, 2022 14:33

Christewart approved these changes Aug 4, 2022

Christewart merged commit c4d3580 into bitcoin-s:master Aug 4, 2022

Christewart modified the milestones: 2.0, 1.9.3 Aug 6, 2022

Christewart modified the milestones: 1.9.2, 2.0, 1.9.3 Aug 6, 2022

Christewart added this to In Progress in 1.9.3 via automation Aug 6, 2022

Christewart moved this from In Progress to Done in 1.9.3 Aug 6, 2022
