Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide if we need shorter timeouts on PeerSet fanout requests #3136

Closed
Tracked by #2310
teor2345 opened this issue Dec 3, 2021 · 5 comments
Closed
Tracked by #2310

Decide if we need shorter timeouts on PeerSet fanout requests #3136

teor2345 opened this issue Dec 3, 2021 · 5 comments
Labels
A-network Area: Network protocol updates or fixes C-bug Category: This is a bug I-integration-fail Continuous integration fails, including build and test failures I-slow Problems with performance or responsiveness S-needs-investigation Status: Needs further investigation

Comments

@teor2345
Copy link
Collaborator

teor2345 commented Dec 3, 2021

TODO

After #2214 is done, check if Zebra still has this issue, and decide how important this ticket is.

Motivation

Zebra does a lot of PeerSet fanouts that never get a reply.

This can contribute to CI failures like:

test activate_mempool_mainnet ... FAILED

Dec 02 21:33:44.650 INFO zebrad::components::mempool::crawler: Failed to crawl peer for mempool transactions: request timed out
Dec 02 21:33:44.650 INFO zebrad::components::mempool::crawler: Failed to crawl peer for mempool transactions: request timed out
Dec 02 21:33:44.650 INFO zebrad::components::mempool::crawler: Failed to crawl peer for mempool transactions: request timed out

0: stdout of command did not contain any matches for the given regex
"stopping at configured height"

https://github.com/ZcashFoundation/zebra/runs/4400724206?check_suite_focus=true#step:14:1727

https://github.com/ZcashFoundation/zebra/runs/4400724206?check_suite_focus=true#step:14:1727

Designs

  • Put a shorter timeout on PeerSet fanouts - see the CandidateSet::update timeout for an example

Related Work

@teor2345 teor2345 added C-bug Category: This is a bug S-needs-triage Status: A bug report needs triage P-Medium I-slow Problems with performance or responsiveness I-integration-fail Continuous integration fails, including build and test failures I-usability Zebra is hard to understand or use A-network Area: Network protocol updates or fixes labels Dec 3, 2021
@teor2345 teor2345 changed the title Stop PeerSet fanouts taking a long time to fail If PeerSet fanouts that don't get a response, make them fail much faster Dec 3, 2021
@teor2345
Copy link
Collaborator Author

teor2345 commented Dec 5, 2021

@conradoplg
Copy link
Contributor

@teor2345
Copy link
Collaborator Author

There are 4 fanouts in Zebra:

  • mempool
  • sync obtain tips
  • sync extend tips
  • candidate set addresses (already has a shorter timeout)

You can see them all here - you might need to search in each file:
https://github.com/ZcashFoundation/zebra/search?q=FANOUT

As part of this PR, we need to test that the peers become ready after the shorter timeout. I think there might be some peer set hang bugs in Zebra, but PR #3200 should help fix or diagnose them.

@teor2345 teor2345 changed the title If PeerSet fanouts that don't get a response, make them fail much faster Decide if we need shorter timeouts on PeerSet fanout requests Dec 14, 2021
@teor2345 teor2345 added S-needs-investigation Status: Needs further investigation S-blocked Status: Blocked on other tasks and removed I-usability Zebra is hard to understand or use labels Dec 14, 2021
@mpguerra
Copy link
Contributor

@teor2345 teor2345 added P-Low and removed P-Medium labels Dec 15, 2021
@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Dec 17, 2021
@teor2345
Copy link
Collaborator Author

teor2345 commented Jan 6, 2022

We should fix the underlying PeerSet bugs, rather than trying to work around them using timeouts. (Because timeouts can cause other bugs.)

@teor2345 teor2345 closed this as completed Jan 6, 2022
@mpguerra mpguerra removed the S-blocked Status: Blocked on other tasks label Jan 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-network Area: Network protocol updates or fixes C-bug Category: This is a bug I-integration-fail Continuous integration fails, including build and test failures I-slow Problems with performance or responsiveness S-needs-investigation Status: Needs further investigation
Projects
None yet
Development

No branches or pull requests

4 participants