Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peer connection test cases #2010

Closed
kilrau opened this issue Dec 1, 2020 · 1 comment · Fixed by #2015
Closed

Peer connection test cases #2010

kilrau opened this issue Dec 1, 2020 · 1 comment · Fixed by #2015
Assignees
Labels
P1 top priority p2p Peer to peer networking

Comments

@kilrau
Copy link
Contributor

kilrau commented Dec 1, 2020

This issue is to collect a set of peer connection test cases with the goal to reliably reproduce issues regarding connecting/re-connecting to peers that continue to pop up: #2009 (comment)

@kilrau kilrau added p2p Peer to peer networking P1 top priority labels Dec 1, 2020
@sangaman sangaman self-assigned this Dec 2, 2020
@sangaman
Copy link
Collaborator

sangaman commented Dec 2, 2020

In my tests, I encountered the following behavior that I believe is directly related to the comment linked above.

  1. Start xud and connect to an outbound peer.
  2. Turn off wifi and wait a bit, peer remains in the local xud list. This is an issue in itself that requires some more investigation, but I'm guessing this has to do with the way TCP is designed.
  3. Restart xud and re-enable wifi. Reconnection attempts happen but they get rejected because the peer thinks it is already connected to us and does not accept the connection. This happens in a loop.

I'll look into why these "zombie" peers are remaining in the peers list even when connectivity is lost. Our ping/pong packets should be disconnecting these peers when the ping doesn't receive a corresponding pong, so my first thought is that something isn't working correctly there.

As a quick overview of the behavior we expect:

So can anyone please describe how P2P connection establishing / connection supporting / connection restoring cases work in xud?

Nodes advertise listening addresses and gossip other known nodes and their respective listening addresses. A given node will listen for inbound peers on its inbound addresses and also attempt outbound connections for known nodes with listening addresses. Once connected, nodes exchange ping and pong packets on a timer, and most packets have a time threshold by which they expect a response. If a node expects a response from a peer but doesn't get one in time, it should disconnect that peer and send it a final packet explaining that it's disconnecting due to stalling. In case two peers lose connection (due to stalling, for example) the peer that originally initiated the connection should attempt to reconnect.

sangaman added a commit that referenced this issue Dec 3, 2020
This fixes a regression introduced in #1994 that prevents enabling the
stall timer for peers that checks whether peers are not responding to
our packets in a reasonable timeframe and disconnecting those that are
not. Without these checks, peers that have gone offline may remain in
the list of connected peers, resulting in unexpected behavior and also
preventing those peers from establishing new connections with us once
they come back online.

Closes #2010.
raladev pushed a commit that referenced this issue Dec 3, 2020
This fixes a regression introduced in #1994 that prevents enabling the
stall timer for peers that checks whether peers are not responding to
our packets in a reasonable timeframe and disconnecting those that are
not. Without these checks, peers that have gone offline may remain in
the list of connected peers, resulting in unexpected behavior and also
preventing those peers from establishing new connections with us once
they come back online.

Closes #2010.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 top priority p2p Peer to peer networking
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants