-
Notifications
You must be signed in to change notification settings - Fork 35.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: fix intermittent failure in p2p_v2_earlykeyresponse #29352
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clever find! will ACK after this is fixed.
@@ -59,6 +59,8 @@ def data_received(self, t): | |||
# check that data can be received on recvbuf only when mismatch from V1_PREFIX happens (send_net_magic = False) | |||
assert self.v2_state.can_data_be_received and not self.v2_state.send_net_magic | |||
|
|||
def on_open(self): | |||
self.connection_made = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1d04d58: i think we should call it a different name - self.connected
or something like that? i'm still getting a crash with the sleep statement because python is confusing connection_made
variable and callback function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, renamed to connection_opened
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
I think you need to also initialize the newly introduced variable in the PeerEarlyKey
constructor, otherwise the following can happen in the wait_until
loop:
AttributeError: 'PeerEarlyKey' object has no attribute 'connection_made2'
This fixes a possible race between the python NetworkThread and the actual test, which will both call initiate_v2_handshake.
1d04d58
to
9642aef
Compare
Fixed both issues, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we will not better off having a simpler initiate_v2_handshake
in this case and avoiding sync + async logic. Given we end up having to manually send raw messages and accessing peer1
internal state anyway, I think it would be simpler to send everything manually:
class TestEncryptedP2PState(EncryptedP2PState):
def __init__(self):
super().__init__(initiating=True, net='regtest')
self.magic_sent = False
def initiate_v2_handshake(self):
self.privkey_ours, self.ellswift_ours = ellswift_create()
self.sent_garbage = random.randbytes(random.randrange(4096))
# We'll take care to do the handshake manually
return b""
class PeerEarlyKey(P2PInterface):
"""Custom implementation of P2PInterface which uses modified v2 P2P protocol functions for testing purposes."""
def __init__(self):
super().__init__()
self.v2_state = None
def connection_made(self, transport):
"""64 bytes ellswift is sent in 2 parts during `initial_v2_handshake()`"""
self.v2_state = TestEncryptedP2PState()
super().connection_made(transport)
def data_received(self, t):
# check that data can be received on recvbuf only when mismatch from V1_PREFIX happens
assert self.v2_state.magic_sent
class P2PEarlyKey(BitcoinTestFramework):
def set_test_params(self):
self.num_nodes = 1
self.extra_args = [["-v2transport=1", "-peertimeout=3"]]
def run_test(self):
self.log.info('Sending ellswift bytes in parts to ensure that response from responder is received only when')
self.log.info('ellswift bytes have a mismatch from the 16 bytes(network magic followed by "version\\x00\\x00\\x00\\x00\\x00")')
node0 = self.nodes[0]
self.log.info('Sending first 4 bytes of ellswift which match network magic')
self.log.info('If a response is received, assertion failure would happen in our custom data_received() function')
peer1 = node0.add_p2p_connection(PeerEarlyKey(), wait_for_verack=False, send_version=False, supports_v2_p2p=True)
# Send only the network magic first
peer1.send_raw_message(b"\xfa\xbf\xb5\xda")
peer1.v2_state.magic_sent = True
self.log.info('Sending remaining ellswift and garbage which are different from V1_PREFIX. Since a response is')
self.log.info('expected now, our custom data_received() function wouldn\'t result in assertion failure')
# Send the remaining bytes, if any bytes were received before this point `data_received` would have resulted in an assertion fail
peer1.send_raw_message(peer1.v2_state.ellswift_ours[4:] + peer1.v2_state.sent_garbage)
peer1.wait_for_disconnect(timeout=5)
self.log.info('successful disconnection when MITM happens in the key exchange phase')
if __name__ == '__main__':
P2PEarlyKey().main()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK 9642aef
Reproduced the issue and verified its fix by adding an artificial delay in initiate_v2_handshake
, as suggested in the PR description.
(With artificial delays of 3 seconds or more, the test fails again with an IOError('Not connected')
, but this is expected as the TestNode closes the connection if there is no activity: ... [InactivityCheck] [net] socket no message in first 3 seconds ...
)
Sounds reasonable to me, but I'd like to refer that to @stratospher, who I believe has plans to extend / rewrite the test anyway (see the WIP branch linked here), and just fix the CI here straightforwardly. |
tested ACK 9642aef.
yes, will look into it! extending this test for more v2 disconnection scenarios in this WIP branch |
ACK 9642aef |
The test fails intermittently, see https://cirrus-ci.com/task/6403578080788480?logs=ci#L3521 and #24748 (comment).
I think it's because of a race between the python NetworkThread and the actual
test, which will both call
initiate_v2_handshake
. I could reproduce it by adding a sleep intoinitiate_v2_handshake
after the lineself.sent_garbage = random.randbytes(garbage_len)
.Fix this by waiting for the first
initiate_v2_handshake
to have finished before calling it a second time.