Improve libp2p connection logs #986

Closed
corverroos opened this issue Aug 17, 2022 · 0 comments
Labels
enhancement New feature or request

Comments

@corverroos
Contributor

corverroos commented Aug 17, 2022

Problem to be solved

We currently log libp2p connection issues in three places:

  • p2p/ping.go
  • p2p/sender.go
  • p2p/relay.go

However, most connection logs are swarm.ErrDialBackoff errors. These provide no information and are just noise, and they hide the actual reason for a connection issue, which makes debugging and resolving network issues very hard.

Proposed solution

Clean up libp2p errors:

  • Add a function p2p.ConvertErr(error) error that converts libp2p swarm.DialErrors to our error format.
  • Remove noisy variables from message
  • Pick the first error in the batch (add the other errors as fields)
  • Add a p2p.IsNoisyErr function that returns true if the error is swarm.ErrDialBackoff
  • Do not log noisy errors in p2p/sender.go, p2p/relay.go
  • Add tests for all of these

Refactor ping.go:

  • Use expbackoff for better backoff and retry and less noisy logs.
  • Maybe set hysteresis=1
  • When p2p.IsNoisyErr returns true, call host.Connect with network.WithForceDirectDial to obtain the real error.
  • Test this manually via compose
corverroos added the enhancement New feature or request label Aug 17, 2022
obol-bulldozer bot pushed a commit that referenced this issue Aug 21, 2022
Refactors the p2p connectedness logging:
 - Use the ping service as the component that logs whether peer X is connected.
 - Extract dial error reasons per address
 - Attempt to resolve "dial backoff" errors into "real" dial errors.
 - Only log when reasons change or every 10min.

category: refactor
ticket: #986