peer/server: Split handshake to synchronous func.#3632

Open
davecgh wants to merge 6 commits into decred:master from davecgh:peer_sync_handshake

@davecgh davecgh commented Mar 3, 2026

This requires #3631.

The current design where the handshake happens asynchronously when the async I/O is started is less than ideal and is quite brittle. It also significantly complicates everything as evidenced by several minor bugs over the years that have resulted from faulty assumptions which directly stem from its asynchronous nature.

As an example of the complexity it causes, a number of additional flags are required that relate solely to the handshake: whether the version is known, whether the verack has been received, and whether the handshake is done. Then, because it's all happening asynchronously, later code has to be vigilant about checking that those events have happened.
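To make that concrete, here is a minimal sketch of the kind of defensive check the flags force on callers. The `asyncPeer` type, its fields, and `negotiatedVersion` are hypothetical names for illustration and are not the actual peer package code.

```go
package main

import (
	"fmt"
	"sync"
)

// asyncPeer is a hypothetical stand-in for a peer whose handshake completes
// in the background.  The three flags mirror the state described above.
type asyncPeer struct {
	mtx            sync.Mutex
	versionKnown   bool
	verAckReceived bool
	handshakeDone  bool
	protocolVer    uint32
}

// negotiatedVersion shows the check every caller needs while the handshake
// may still be in flight: the value is only meaningful once the flag is set.
func (p *asyncPeer) negotiatedVersion() (uint32, bool) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	if !p.versionKnown {
		return 0, false
	}
	return p.protocolVer, true
}

func main() {
	p := &asyncPeer{}
	if _, ok := p.negotiatedVersion(); !ok {
		fmt.Println("version not known yet; caller must bail out or retry")
	}
}
```

With a blocking handshake that must succeed before async I/O starts, a check like this has nothing left to guard against.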

All of this complexity can entirely be avoided by simply requiring a successful synchronous handshake to take place prior to starting async I/O.

With that in mind, this significantly reworks the way the handshake is handled so that it happens via a separate blocking method and removes async handlers which are no longer required as a result.

The changes have been split into a series of commits to ease review. Each commit fully compiles, passes all tests, and describes the changes in detail.

The following is a high level overview of the changes:

  • Introduce programmatically detectable errors consistent with other code throughout the repository
  • Accept connection in constructors instead of a separate AssociateConnection
  • Move the handshake code to a separate blocking method named Handshake that accepts a callback to invoke with the
    received version message
    • The new method returns an error that callers can use to reliably detect a failed handshake
    • The callback can return an error to cause the handshake to fail and pass the error along to the caller
  • Make the initial handshake block until both the version and verack message are received
  • Any further received version or verack messages in the async I/O handlers are now unconditionally an error
  • Removes the OnVersion and OnVerAck async listeners that no longer apply
  • Removes the VersionKnown, VerAckReceived, and HandshakeDone methods and associated internal fields that no longer apply
  • Updates the calling server code to thread the overall process context down to the Handshake and Run methods
  • Adds several additional tests for correctness
  • Updates the example to clearly show the new semantics
  • Includes extra documentation to elucidate the exact requirements for establishing a new peer as well as exactly which properties the caller can and can't rely on during the handshake
  • Other docs and test cleanup to help make them a little more modern

@davecgh davecgh added this to the 2.2.0 milestone Mar 3, 2026
@davecgh davecgh force-pushed the peer_sync_handshake branch 3 times, most recently from 147855d to 6bdd445 Compare March 4, 2026 01:36
@davecgh davecgh force-pushed the peer_sync_handshake branch 2 times, most recently from 488cb21 to 1daea09 Compare March 6, 2026 18:29
return nil, 0, err
}
return hash, 234439, nil
// Repeat, but in the other direction so the outbound peer has the error.
Member

What's the benefit of repeating? Isn't this just testing the same code path twice?

Member Author

@davecgh davecgh Mar 10, 2026

It is calling the same funcs, but in a different order. That isn't at all obvious, especially at this point in the sequence of commits since the handshake is all happening asynchronously. I think maybe it is more obvious in a later commit when it switches over to the synchronous handshake.

The outbound peer always goes first, so, in the case of the first sequence, the inbound peer is already established and blocking on readRemoteVersionMsg until the outbound peer sends the version message at which point the handshake process proceeds and it ends up failing to produce its own local version message.

In the other direction, the peer fails when attempting to go first (via writeLocalVersionMsg).

So, in other words, it forces them both to fail for slightly different reasons depending on which one is going first.

@davecgh davecgh force-pushed the peer_sync_handshake branch from 1daea09 to 26d472b Compare March 9, 2026 23:45
davecgh added 3 commits March 9, 2026 18:50
This corrects the version in README.md to the most recent released
version and brings the documentation in doc.go to more modern standards.

This does some basic test cleanup and modernizes some of the peer tests
as follows:

- Consolidates the mock peer config used throughout the tests
- Consolidates and simplifies the mock pipe creation
- Marks peer state tests as a helper
- Uses t.Fatalf where appropriate
- Removes additional newlines in failure strings

The majority of the tests in TestOutboundPeer are not actually testing
anything because nothing is checked.  This moves the one thing that is
being tested into a separate test func and removes the rest since it is
already tested elsewhere.
@davecgh davecgh force-pushed the peer_sync_handshake branch from 26d472b to d9eeafe Compare March 10, 2026 01:49
Member Author

@davecgh davecgh left a comment

Thanks for the thorough review. I've addressed most of the feedback. I'll address the rest a bit later.

@davecgh davecgh force-pushed the peer_sync_handshake branch from d9eeafe to 2841d7f Compare March 10, 2026 01:52
Due to legacy reasons that no longer apply, connections are currently
associated with a peer after the constructors have been called via
AssociateConnection.

This modifies the code to instead accept the connections in the inbound
and outbound constructors and exports the Start method in its place.

Ultimately, the goal is to split the handshake into a separate method
and convert the lifecycle over to use contexts.
@davecgh davecgh force-pushed the peer_sync_handshake branch from 2841d7f to 8fed4dd Compare March 10, 2026 02:56
davecgh added 2 commits March 10, 2026 00:17
The current design where the handshake happens asynchronously when the
async I/O is started is less than ideal and is quite brittle.  It also
significantly complicates everything as evidenced by several minor bugs
over the years that have resulted from faulty assumptions which directly
stem from its asynchronous nature.

For an example of some of the complexity it causes, it means that a
bunch of additional flags are required that solely relate to the
handshake.  Namely, whether or not the version is known, whether the
verack has been received, and whether the handshake is done.  Then,
because it's all happening asynchronously, later code has to be vigilant
about checking that those events have happened.

All of this complexity can entirely be avoided by simply requiring a
successful synchronous handshake to take place prior to starting async
I/O.

With that in mind, this significantly reworks the way the handshake is
handled so that it happens via a separate blocking method and removes
async handlers which are no longer required as a result.

The following is a high level overview of the changes:

- Introduce programmatically detectable errors consistent with other
  code throughout the repository
- Move the handshake code to a separate blocking method named Handshake
  that accepts a callback to invoke with the received version message
  - The new method returns an error that callers can use to reliably
    detect a failed handshake
  - The callback can return an error to cause the handshake to fail
    and pass the error along to the caller
- Make the initial handshake block until both the version and verack
  message are received
- Any further received version or verack messages in the async I/O
  handlers are now unconditionally an error
- Removes the OnVersion and OnVerAck async listeners that no longer apply
- Updates the calling server code to thread the overall process context
  down to the handshake and Run methods
- Adds several additional tests for correctness
- Updates the example to clearly show the new semantics
- Includes extra documentation to elucidate the exact requirements for
  establishing a new peer as well as exactly which properties the caller
  can and can't rely on during the handshake

Now that the handshake is required to take place prior to starting
async I/O processing, the version and verack messages are guaranteed to
have been seen for a successful handshake.

Given that, this removes the related fields and methods since they are
no longer needed.
@davecgh davecgh force-pushed the peer_sync_handshake branch from 8fed4dd to 4bd63a7 Compare March 10, 2026 05:18