test: p2p_node_network_limited.py --v2transport intermittently disconnects during connect_nodes #7288

@thepastaclaw

Summary

The linux64_tsan-test / Test source CI job intermittently fails in p2p_node_network_limited.py --v2transport with AssertionError: Error: peer disconnected. The failure does not appear to be caused by the changes in dashpay/dash#7230: the same head SHA passed on rerun without any branch changes.

Evidence

Failure mode

The failure happens here:

File "test/functional/p2p_node_network_limited.py", line 83, in run_test
    self.connect_nodes(0, 2)
...
AssertionError: Error: peer disconnected

Combined logs show node 0 immediately disconnecting node 2 after node 2 requests a block below the NODE_NETWORK_LIMITED threshold:

ProcessGetBlockData [net] Ignore block request below NODE_NETWORK_LIMITED threshold, disconnect peer=2

connect_nodes() is still waiting for the outbound peer to stay connected long enough to exchange a pong, so the helper fails with peer disconnected.

Diagnosis

This looks timing-sensitive / transport-sensitive rather than PR-specific:

  • PR #7230 only changes src/node/interfaces.cpp and src/wallet/wallet.cpp.
  • The failing test is test/functional/p2p_node_network_limited.py.
  • The exact same PR head passed on rerun, so there is no deterministic wallet-side regression here.

The likely issue is that the test assumes the connection opened by connect_nodes(0, 2) will stay up long enough for the helper's handshake to complete. Under TSAN + --v2transport, the pruned node (node 0) can disconnect node 2 quickly enough that the helper fails first.

Reproduction ideas

I have not reproduced this locally outside CI yet. The closest reproduction path is to loop the test under a slow / TSAN-like environment:

python3 test/functional/test_runner.py p2p_node_network_limited.py --v2transport

or repeatedly rerun the TSAN functional shard in CI until the timing window appears.
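
The local loop could be scripted roughly as follows. This is only a sketch: run_until_failure and CMD are hypothetical names, the run count is arbitrary, and it assumes test_runner.py exits with a non-zero status when a test fails and is invoked from the repository root.

```python
import subprocess
import sys

# Same command as above, split into argv form (assumes the current
# working directory is the repository root).
CMD = [
    sys.executable,
    "test/functional/test_runner.py",
    "p2p_node_network_limited.py",
    "--v2transport",
]


def run_until_failure(cmd, max_runs=50, runner=subprocess.run):
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run succeeded. The runner parameter exists only
    so the loop can be exercised without launching real processes.
    """
    for attempt in range(1, max_runs + 1):
        result = runner(cmd)
        if result.returncode != 0:
            return attempt
    return None
```

Calling run_until_failure(CMD) stops at the first failing run, so that run's logs are the most recent ones left behind. A clean result from this loop on a fast machine would not rule out the race, since the window seems to depend on TSAN-level slowdown.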

Suggested direction

Harden the test so it does not rely on connect_nodes() succeeding when the scenario itself can legitimately trigger a fast disconnect. For example, make the unsynced-node phase explicitly tolerate the disconnect and assert the expected postcondition (node2 stays at height 0) without requiring a stable pong handshake first.
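
One possible shape for that hardening, as a sketch rather than a proposed patch: connect_tolerating_disconnect is a hypothetical helper, and the connect and get_height callables stand in for self.connect_nodes(0, 2) and a height query on node 2 such as self.nodes[2].getblockcount. Matching on the "peer disconnected" message is an assumption based on the traceback above.

```python
def connect_tolerating_disconnect(connect, get_height, expected_height=0):
    """Try to connect, accept an early disconnect, then check the postcondition.

    connect: zero-argument callable that performs the connection attempt
        (hypothetically, lambda: self.connect_nodes(0, 2)).
    get_height: zero-argument callable returning the unsynced node's height.

    Returns True if connect() completed, False if the peer was dropped
    before the helper finished. Any other AssertionError is re-raised.
    """
    try:
        connect()
        connected = True
    except AssertionError as e:
        # Only the fast-disconnect case is expected here; anything else
        # is still a real failure and must propagate.
        if "peer disconnected" not in str(e):
            raise
        connected = False

    height = get_height()
    assert height == expected_height, (
        f"unsynced node advanced to height {height}, expected {expected_height}"
    )
    return connected
```

Either outcome of the connection attempt is accepted; only the height check decides pass or fail. A real change would probably keep whatever bounded wait the test already performs before checking the height, so that the postcondition is not asserted too early.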
