Find and switch peers #4408
Conversation
Force-pushed from ea29d12 to 6a14c00.
So this should be working as intended now. I will leave comments on all the changes before taking it off draft. Meanwhile, this can be tested by running with the following config:
changes timeouts, fix issue with ipv6 dns seeds, initDisconnect when disconnected bug, dbPeers order fix
```
@@ -166,13 +166,27 @@ object GetDataMessage extends Factory[GetDataMessage] {
    GetDataMessage(Seq(inventory))
  }

  sealed trait ExpectsResponse {
```
So I needed a way to tell if the peer failed to respond to a query used in sync, e.g. getheaders, so that I can switch to a different peer and continue sync from there. For a NetworkPayload that extends this trait, an ExpectResponse message is sent to the P2PClient right after sending the message. That message initiates a timer in PeerMessageReceiver that notifies us if the query timed out.
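To make that mechanism concrete, here is a minimal self-contained sketch of the pattern described above. The class `ResponseTimeoutTracker`, the example payloads, and the 20-second default are illustrative assumptions, not the actual bitcoin-s implementation:

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}
import scala.collection.concurrent.TrieMap

// Hypothetical stand-ins for the real message hierarchy.
sealed trait NetworkPayload
// Marker trait for payloads that expect a reply from the peer.
sealed trait ExpectsResponse extends NetworkPayload
case object GetHeadersMessage extends ExpectsResponse
case object PingMessage extends NetworkPayload

// Starts a timer per query; fires onTimeout unless a response cancels it.
final class ResponseTimeoutTracker(onTimeout: NetworkPayload => Unit,
                                   timeoutSecs: Long = 20) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  private val pending = TrieMap.empty[NetworkPayload, ScheduledFuture[_]]

  // Called right after a query is sent to the peer.
  def onQuerySent(msg: NetworkPayload): Unit = msg match {
    case e: ExpectsResponse =>
      val task = scheduler.schedule(
        new Runnable { def run(): Unit = onTimeout(e) },
        timeoutSecs,
        TimeUnit.SECONDS)
      pending.put(e, task)
    case _ => () // payloads that expect no reply are not tracked
  }

  // Called when the matching response arrives: cancel the timer.
  def onResponseReceived(query: NetworkPayload): Unit =
    pending.remove(query).foreach(_.cancel(false))
}
```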
Was this common when testing? This is an interesting blog post about the bitcoin p2p network possibly slowing down: https://blog.lopp.net/is-bitcoin-network-slowing-down/
It details the logic Bitcoin Core has for detecting peers that are stalling, and is probably worth a read. It seems the behavior detailed there is what you experienced as well?
Interesting 👀.
Not very common, but I did see this a few times. Instead of staying connected and not responding, what I found more common was for the peer to straight up disconnect us after initialization. Reconnection would succeed, followed by the same behavior.
Current behavior is for timeouts to complete instantly and requery another peer (the same one if that's all we have) in case of a disconnection, and to wait for the whole duration in case the peer is still connected. This covers both cases.
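A hedged sketch of that dual behavior: the same timeout future completes after the full duration while connected, but is completed immediately when a disconnect is observed. Names and the default duration here are assumptions for illustration, not the PR's code:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Future, Promise}

// One timeout per outstanding query.
final class QueryTimeout(durationSecs: Long = 20) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  private val timedOut = Promise[Unit]()

  // While connected, the peer gets the whole duration to answer.
  scheduler.schedule(
    new Runnable { def run(): Unit = timedOut.trySuccess(()) },
    durationSecs,
    TimeUnit.SECONDS)

  // On disconnection there is no point waiting: complete instantly
  // so the caller can requery another peer right away.
  def onDisconnect(): Unit = timedOut.trySuccess(())

  // Completes when the query should be considered failed.
  def expired: Future[Unit] = timedOut.future
}
```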
```
  f <- isAllDisconnectedF(node)
} yield f
disconnF.map(assert(_))

def peerManager = node.peerManager
```
Did away with using indexes and added a check to ensure we have 2 peers.
node-test/src/test/scala/org/bitcoins/node/UnsyncedNeutrinoNodeTest.scala
```
}

//todo: the disconnection should rather originate from the remote client than from our side
ignore must "sync with another second peer if the first one is disconnected" in {
```
Reconnect after stopping was throwing an exception about a closed port. I'm looking into it, but didn't want to block review on minor stuff, so I marked the test as ignored for the moment.
node/src/main/resources/postgresql/node/migration/V3__peer_table.sql
```diff
@@ -26,7 +26,7 @@ case class NeutrinoNode(
     nodeConfig: NodeAppConfig,
     chainConfig: ChainAppConfig,
     actorSystem: ActorSystem,
-    configPeersOverride: Vector[Peer] = Vector.empty)
+    paramPeers: Vector[Peer] = Vector.empty)
```
So we don't override the config peers anymore; rather, we use both. This was one of the suggestions earlier, so I added it here.
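For illustration, combining rather than overriding could look like the sketch below; the `Peer` type, the `configPeers` name, and the helper method are assumptions for this example, not code from the PR:

```scala
// Hypothetical Peer type for the example.
case class Peer(host: String, port: Int)

// Peers passed as constructor parameters are appended to the ones from
// the config file instead of replacing them; duplicates are dropped.
def allPeers(paramPeers: Vector[Peer], configPeers: Vector[Peer]): Vector[Peer] =
  (paramPeers ++ configPeers).distinct
```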
```diff
   val sendCompactFilterHeaderMsgF = {
-    peerManager.randomPeerMsgSenderWithCompactFilters
+    syncPeerMsgSender
```
So unless the peers are all ideal, we can't really use a random one; besides, without parallel sync there's not much point in switching. The previous change was also mine, so I've fixed it now. I would add this back later with properly working parallel sync.
```
}
isInitializedF

def send(msg: NetworkPayload, peer: Peer): Future[Unit] = {
  peerManager.peerData(peer).peerMessageSender.sendMsg(msg)
```
All of this is now in PeerManager.
```diff
-  onReconnect = node.sync
+  onReconnect = node.peerManager.onReconnect,
+  onStop = node.peerManager.onP2PClientStopped,
+  maxReconnectionTries = 4
```
Changed max attempts to 4, as this would roughly match the time that we spend on each peer waiting for it to initialize. I guess all of these related constants need to be put together.
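As a back-of-the-envelope illustration only: if reconnection delays doubled from a one-second base (an assumption made here for the arithmetic; the actual delays are defined in the bitcoin-s code), 4 tries would bound the total wait at roughly 15 seconds:

```scala
// All numbers here are assumptions for the arithmetic, not P2PClient's values.
val baseDelaySecs = 1
val maxReconnectionTries = 4

// Total time spent across all reconnection attempts: 1 + 2 + 4 + 8 = 15 seconds.
val totalWaitSecs = (0 until maxReconnectionTries).map(i => baseDelaySecs << i).sum
```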
```
extends P2PLogger {

  /** Returns peers by querying each dns seed once. These will be IPv4 addresses. */
  def getPeersFromDnsSeeds: Vector[Peer] = {
```
Moved this from the manager to a new Finder to separate things.
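A minimal sketch of the kind of DNS-seed discovery `getPeersFromDnsSeeds` performs, assuming a hypothetical seed list and `Peer` type; the real logic, including the ipv6 dns seed fix mentioned in the commit message, lives in `PeerFinder`:

```scala
import java.net.InetAddress
import scala.util.Try

case class Peer(host: String, port: Int)

// Example mainnet seeds; the actual list comes from the node's params.
val dnsSeeds = Vector("seed.bitcoin.sipa.be", "dnsseed.bluematt.me")

// Query each seed once and turn every resolved address into a Peer.
def getPeersFromDnsSeeds(defaultPort: Int = 8333): Vector[Peer] =
  dnsSeeds.flatMap { seed =>
    // A seed that fails to resolve simply contributes no peers.
    Try(InetAddress.getAllByName(seed).toVector)
      .getOrElse(Vector.empty)
      .map(addr => Peer(addr.getHostAddress, defaultPort))
  }
```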
```
logger.debug(s"Removing persistent peer $peer")
val client = peerData(peer).client
_peerData.remove(peer)
//so we need to remove it from the map for connected peers so no more request could be sent to it but we before
```
When we don't want a peer as one of our persistent ones, we remove it from the peerData map, but still keep it around to check that the client actor for it actually stopped.
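The shape of that bookkeeping, as a sketch with hypothetical types (`Client` and the `awaitingStop` set are stand-ins; the real code uses the P2PClient actor and its stop callback):

```scala
import scala.collection.mutable

case class Peer(host: String)
final class Client { def stop(): Unit = () } // stand-in for the actor handle

final class PeerTracker {
  // Active peers: requests may only be sent to peers in this map.
  private val peerData = mutable.Map.empty[Peer, Client]
  // Clients removed from peerData whose actor stop is not yet confirmed.
  private val awaitingStop = mutable.Set.empty[Client]

  def removePersistentPeer(peer: Peer): Unit =
    peerData.remove(peer).foreach { client =>
      awaitingStop += client // keep the handle until the stop is confirmed
      client.stop()
    }

  // Invoked from the stop callback (cf. onP2PClientStopped above).
  def onClientStopped(client: Client): Unit =
    awaitingStop -= client
}
```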
I was able to sync correctly on c5f0f9c with the default out-of-the-box settings 🎉. I'm going to start playing with the various settings introduced in this PR, but wanted to document that things do seem to work out of the box.
Sync worked correctly for me on c5f0f9c with these settings
Things work on c5f0f9c for me with IBD and running for a few hours with the following settings
IIUC what the log means, it should be reworded to say something about disconnecting. Also, does this mean they disconnected from us, or that we disconnected from them?
Was this common? Ideally, you should not be seeing this. This basically means somewhere down the line a peer got "lost" from the peerData map.

Edit: Fixed this. It happened when a peer, while still initializing, was received again in an addr message and another actor was created for it, while peerData still had only one entry for it.
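The fix described amounts to making actor creation idempotent per peer. A hedged sketch of that idea (the names and the `getOrElseUpdate` choice are mine, not the PR's):

```scala
import scala.collection.mutable

case class Peer(host: String)
final class Client // stand-in for the per-peer actor

final class PeerRegistry {
  private val peerData = mutable.Map.empty[Peer, Client]

  // A peer seen again in an addr message while still initializing must
  // not get a second actor: reuse the existing entry if there is one.
  def clientFor(peer: Peer, mkClient: Peer => Client): Client =
    peerData.getOrElseUpdate(peer, mkClient(peer))
}
```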
* fix node test shared fixtures bug
  The cached bitcoind fixtures were used and stopped in UnsyncedNeutrinoNodeTest, which causes an error if NeutrinoNodeTest is run at the same time on high performant systems, which is why it escaped CI. Merges NeutrinoNodeTest and UnsyncedNeutrinoNodeTest.
* fix possible issues with PeerMessageReceiverTest
  This reverts commit 55e7caf.
* fix filter sync issue when wallet creation time indicates already synced
* move switch peer test to NeutrinoNodeWithUnachedBitcoindTest
Should do the following:

Key changes:

* A new `PeerFinder` which deals with peer discovery. Moved relevant parts from `PeerManager` to `PeerFinder`.
* Timeouts for `Initializing` and a new state `Waiting` in `PeerMessageReceiver`.
* Changed the `InitializeDisconnect` state to now disconnect when connecting too. Added a `StopReconnect` state to `PeerMessageReceiver` to stop reconnects from our side when not connected.
* Refactoring in `P2PClient` and `PeerMessageReceiver`.
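Read together, the list above implies a receiver state machine roughly like the following sketch. Constructor fields are omitted, and every state not named in this PR description is an assumption; the real definitions live in bitcoin-s:

```scala
// Sketch only; the actual states are defined in PeerMessageReceiverState.
sealed trait PeerMessageReceiverState
object PeerMessageReceiverState {
  case object Preconnection extends PeerMessageReceiverState // assumed pre-existing
  case object Initializing extends PeerMessageReceiverState  // now subject to a timeout
  case object Normal extends PeerMessageReceiverState        // assumed pre-existing
  case object Waiting extends PeerMessageReceiverState       // new: query sent, awaiting response
  case object InitializeDisconnect extends PeerMessageReceiverState // disconnect even mid-connect
  case object StopReconnect extends PeerMessageReceiverState // new: suppress reconnect from our side
  case object Disconnected extends PeerMessageReceiverState  // assumed pre-existing
}
```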