Revert "net: Avoid duplicate getheaders requests." (PR #8054) #8306
Conversation
This reverts commit f93c2a1, which can cause synchronization to get stuck.
I observed a testnet node persistently stuck (even across restarts) in a case where its best header chain was invalid, the public network had a best valid chain with much more work than that invalid header chain, and connecting to the valid chain took more than one headers message. Suhas identified this PR as the probable cause, and on revert the node immediately became unstuck. Considering how near we are to release, I think simply reverting this is the right action for now. The issue it was fixing should have been rare and largely inconsequential on the Bitcoin network.
utACK. We should tag this for 0.13.0.
I ran into this problem, and this fixed it.
I've run into at least two people on IRC with this issue (in addition to Patrick).
Even if the issue my original patch fixed is rare in practice, I would like to see it fixed. I understand that it is best to roll the change back now, after finding the issue before the upcoming release. Can I open an issue to track a "fixed fix" for 0.14? Also, I do not yet fully understand how a node could become stuck with the patch, even if it is on an invalid chain. Does anyone have a good explanation for what the issue is exactly?
@domob1812 One example: before reverting this patch, if there are two competing forks with tips A and B, a node is at tip A, the fork point C between A and B is more than 2000 blocks in the past, and the node already has the first 2000 headers from C toward B but no later ones, then it's possible that the hasHeaders check added by #8054 would prevent the node from ever learning about tip B, causing chain sync to fail.
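The failure mode above can be illustrated with a toy simulation (this is not Bitcoin Core code; all names and the fixed locator intersection are simplifying assumptions). The `skip_when_no_new` flag models the #8054 behavior of only requesting the next headers batch when the previous one contained unseen headers:

```cpp
#include <algorithm>
#include <cassert>

// Toy model: a peer serves headers for fork B up to height peer_tip; the
// local node already knows fork-B headers up to height `known`, but its
// block locator keeps intersecting at the fork point because its best
// header chain is the other (invalid) fork.
const int MAX_HEADERS = 2000;

// Returns the height the node knows up to after the sync loop finishes.
int sync(int fork_point, int known, int peer_tip, bool skip_when_no_new) {
    int cursor = fork_point;  // locator intersection, stuck at the fork point
    while (true) {
        int count = std::min(MAX_HEADERS, peer_tip - cursor);  // headers served
        bool has_new = cursor + count > known;                 // any unseen?
        known = std::max(known, cursor + count);
        if (count < MAX_HEADERS) break;           // peer has nothing more
        if (skip_when_no_new && !has_new) break;  // #8054 check: stop -> stuck
        cursor += count;                          // request the next batch
    }
    return known;
}
```

With `fork_point = 0`, `known = 2000`, and a peer tip at 5000, the #8054 variant stalls at height 2000 because the first full batch is entirely already-known, while the pre-#8054 behavior keeps requesting until it reaches the peer's tip.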
Thanks @sdaftuar, makes sense. I'll think about it.
@sdaftuar I know it's been a long time since the initial patch, but what do you think: if we limit @domob1812's initial fix to IBD only, like:
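The snippet referenced above did not survive extraction. As a hypothetical sketch of the shape of such a compromise (toy names, not Bitcoin Core's): suppress the follow-up `getheaders` only when the batch contained no new headers *and* the node is in initial block download, so a synced node on a bad fork can still recover:

```cpp
#include <cassert>

// Toy decision function for whether to request the next headers batch.
// `count` is how many headers the last message carried, `max_headers` is
// the protocol maximum per message (2000 in Bitcoin).
bool should_request_more(int count, int max_headers,
                         bool has_new_headers, bool in_ibd) {
    if (count != max_headers) return false;        // peer is out of headers
    if (in_ibd && !has_new_headers) return false;  // skip duplicates, IBD only
    return true;  // outside IBD, always follow up (avoids the stuck case)
}
```

Under this sketch, the duplicate-request optimisation applies only while syncing from scratch, and the #8306 fork scenario, which involves an already-synced node, keeps the pre-#8054 behavior.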
Will it cause sync stuck case described here #8306 (comment) ? I'm trying to solve duplicate getheaders requests issue in ZCash and Komodo, bcz in these chains duplicate getheaders requests causes really huge overhead (additional traffic download), bcz of bigger blockheader size (1488 size). Just an example:
As we are see here same So, any advice is appreciated. p.s. Limiting initial fix with |
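To make the overhead claim concrete, a back-of-the-envelope estimate using the figures from the comment above (160 headers per Zcash headers message, roughly 1,488 bytes per serialized header; both treated as approximations here):

```cpp
#include <cassert>

// Approximate wire cost of redundant Zcash headers messages.
int redundant_bytes(int messages) {
    const int HEADERS_PER_MSG = 160;  // Zcash's per-message headers limit
    const int HEADER_SIZE = 1488;     // ~bytes per Equihash block header
    return messages * HEADERS_PER_MSG * HEADER_SIZE;
}
```

That is about 238 KB per fully redundant batch, so a run of duplicated requests over thousands of blocks adds up quickly compared to Bitcoin's 80-byte headers.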
Zcashd will blindly request more block headers as long as it got 160 block headers in response to a previous query, even if those headers are already known. To dodge this behavior, return slightly fewer than the maximum, to get it to go away.

https://github.com/zcash/zcash/blob/0ccc885371e01d844ebeced7babe45826623d9c2/src/main.cpp#L6274-L6280

Without this change, communication between a partially-synced `zebrad` and a fully-synced `zcashd` looked like this:

1. `zebrad` connects to `zcashd`, which sends an initial `getheaders` request;
2. `zebrad` correctly computes the intersection of the provided block locator with the node's current chain and returns the 160 following headers;
3. `zcashd` does not check whether it already has those headers; it assumes any provided headers are new and re-validates them;
4. `zcashd` assumes that because `zebrad` responded with 160 headers, the `zebrad` node is ahead of it, and requests the next 160 headers;
5. because block locators are sparse, the intersection between the `zcashd` and `zebrad` chains is likely well behind the `zebrad` tip, so this process continues for thousands of blocks.

To avoid this problem, we return slightly fewer than the protocol maximum (158 rather than 160, to guard against off-by-one errors in zcashd). This does not interfere with use of the returned headers by peers that do check them, but it does prevent `zcashd` from trying to download thousands of block headers it already has.

This problem does not occur in the `zcashd<->zcashd` case only because `zcashd` does not respond to `getheaders` messages while it is syncing. Implementing that behavior in Zebra would be more complicated, however, because we don't have a distinct "initial block sync" state (we do poll-based syncing continuously), and we don't have shared global variables to modify to set that state.
Relevant links (thanks @str4d):
- The PR that introduced this behavior: https://github.com/bitcoin/bitcoin/pull/4468/files#r17026905
- bitcoin/bitcoin#6861
- bitcoin/bitcoin#6755
- bitcoin/bitcoin#8306 (comment)
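The workaround described above can be sketched as a small cap on the response size (toy code, not Zebra's actual Rust implementation; the 160 and 158 figures come from the commit message above):

```cpp
#include <algorithm>
#include <cassert>

// Cap a getheaders response just below the protocol maximum, so a peer
// that treats "full batch" as "more headers available" stops
// re-requesting headers it already has.
int headers_to_send(int available) {
    const int MAX_HEADERS_RESULTS = 160;  // Zcash's per-message limit
    const int MARGIN = 2;  // guard against off-by-one checks in peers
    return std::min(available, MAX_HEADERS_RESULTS - MARGIN);  // at most 158
}
```

Because `zcashd` only keeps iterating while it receives exactly the maximum, any response of 158 headers ends the redundant re-request loop while still delivering useful headers to peers that check what they already have.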