Summary
linux64_tsan-test / Test source intermittently fails in p2p_node_network_limited.py --v2transport with AssertionError: Error: peer disconnected. This is not caused by PR-specific code in dashpay/dash#7230; the same head SHA passed on rerun without any branch changes.
Evidence
- Failing TSAN job on PR head e7a422bbab60e271a504b3401b6a444cea1c27f1:
- Passing rerun on the exact same head SHA:
Failure mode
The failure happens here:
File "test/functional/p2p_node_network_limited.py", line 83, in run_test
self.connect_nodes(0, 2)
...
AssertionError: Error: peer disconnected
Combined logs show node 0 immediately disconnecting node 2 after node 2 requests a block below the NODE_NETWORK_LIMITED threshold:
ProcessGetBlockData [net] Ignore block request below NODE_NETWORK_LIMITED threshold, disconnect peer=2
connect_nodes() is still waiting for the outbound peer to stay connected long enough to exchange a pong, so the helper fails with peer disconnected.
Diagnosis
This looks timing-sensitive / transport-sensitive rather than PR-specific:
- PR #7230 only changes src/node/interfaces.cpp and src/wallet/wallet.cpp.
- The failing test is test/functional/p2p_node_network_limited.py.
- The exact same PR head passed on rerun, so there is no deterministic wallet-side regression here.
The likely issue is that the test currently assumes connect_nodes(0, 2) will remain connected long enough for the helper handshake, but under TSAN + --v2transport the pruned node can disconnect node 2 quickly enough that the helper trips first.
Reproduction ideas
I have not yet reproduced this outside CI. The closest reproduction path is to loop the test under a slow / TSAN-like environment:
python3 test/functional/test_runner.py p2p_node_network_limited.py --v2transport
or repeatedly rerun the TSAN functional shard in CI until the timing window appears.
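The loop could be scripted roughly as follows. This is only a sketch: run_until_failure and TEST_CMD are hypothetical names, not part of the test framework, and the script assumes it is started from the repository root so that the relative test_runner.py path resolves. It does not slow the node down by itself, so it is most useful against a TSAN build.

```python
import subprocess
import sys

# Same command as above, split into argv form (assumes the repo root as cwd).
TEST_CMD = [
    sys.executable,
    "test/functional/test_runner.py",
    "p2p_node_network_limited.py",
    "--v2transport",
]


def run_until_failure(cmd, max_runs=50):
    """Run cmd repeatedly; return the 1-based index of the first failing run.

    Returns None if all max_runs attempts exit with status 0.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Keep the tail of the output so the traceback is visible.
            print(f"run {attempt} failed with exit code {result.returncode}")
            print(result.stdout[-2000:])
            print(result.stderr[-2000:])
            return attempt
    return None
```

Calling run_until_failure(TEST_CMD) stops at the first failing run and prints the end of its output, which should include the AssertionError if the timing window was hit.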
Suggested direction
Harden the test so it does not rely on connect_nodes() succeeding when the scenario itself can legitimately trigger a fast disconnect. For example, make the unsynced-node phase explicitly tolerate the disconnect and assert the expected postcondition (node2 stays at height 0) without requiring a stable pong handshake first.
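One possible shape for that hardening is sketched below. The helper connect_tolerating_disconnect is hypothetical and does not exist in the test framework; the two stand-in connect functions only imitate the two outcomes seen in CI. The sketch assumes the early disconnect surfaces as an AssertionError whose message contains "peer disconnected", as in the traceback above.

```python
def connect_tolerating_disconnect(connect, get_height, expected_height=0):
    """Attempt a connection, tolerate an early disconnect, check the postcondition.

    connect: callable that performs the connection and may raise AssertionError.
    get_height: callable returning the current block height of the unsynced node.
    Returns True if the peer disconnected before the handshake completed.
    """
    try:
        connect()
        disconnected_early = False
    except AssertionError as exc:
        # Only swallow the specific failure described in this report.
        if "peer disconnected" not in str(exc):
            raise
        disconnected_early = True

    # The postcondition that matters: the unsynced node made no progress.
    height = get_height()
    assert height == expected_height, f"unexpected height {height}"
    return disconnected_early


# Stand-ins for the two outcomes (illustration only).
def flaky_connect():
    raise AssertionError("Error: peer disconnected")


def stable_connect():
    return None


tolerated = connect_tolerating_disconnect(flaky_connect, lambda: 0)
not_needed = connect_tolerating_disconnect(stable_connect, lambda: 0)
```

In the real test, connect would presumably wrap self.connect_nodes(0, 2) and get_height would wrap self.nodes[2].getblockcount(). Whether catching the assertion is preferable to changing the helper itself is a design choice left open here.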