Revert "net: Avoid duplicate getheaders requests." PR #8054 #8306

Merged
merged 1 commit into from
Jul 6, 2016
Merged

Revert "net: Avoid duplicate getheaders requests." PR #8054 #8306

merged 1 commit into from
Jul 6, 2016

Conversation

gmaxwell (Contributor) commented Jul 5, 2016

This reverts commit f93c2a1.

This can cause synchronization to get stuck.

gmaxwell (Contributor, Author) commented Jul 5, 2016

I observed a testnet node persistently stuck (even across restarts) in a case where its best header chain was invalid, the public network had a best valid chain with much more work than that invalid header chain, and it took more than one headers message to connect it.

Suhas identified this PR as the probable cause, and on revert the node immediately became unstuck. Considering how near we are to release, I think simply reverting this is the right action currently. The issue it was fixing should have been rare and largely inconsequential on the Bitcoin network.

sdaftuar (Member) commented Jul 5, 2016

utACK. We should tag this for 0.13.0.

pstratem (Contributor) commented Jul 5, 2016

I ran into this problem; this fixed it.

gmaxwell (Contributor, Author) commented Jul 6, 2016

I've run into at least two people on IRC with this issue (in addition to Patrick).

laanwj added the P2P label Jul 6, 2016
laanwj merged commit 4fbdc43 into bitcoin:master Jul 6, 2016
laanwj added a commit that referenced this pull request Jul 6, 2016

4fbdc43 Revert "net: Avoid duplicate getheaders requests." PR #8054 (Gregory Maxwell)
domob1812 (Contributor) commented

Even if the issue my original patch fixed is rare in practice, I would like to see it fixed. I understand that it is best to roll the change back after finding the issue now before the upcoming release, though. Can I open an issue to track a "fixed fix" for 0.14? Also, I do not yet fully understand how a node could become stuck with the patch even if it is on an invalid chain. Does anyone have a good explanation for what the issue is exactly?

sdaftuar (Member) commented Jul 10, 2016

@domob1812 One example: before reverting this patch, suppose there are two competing forks with tips A and B, the node is at tip A, the fork point C between A and B is more than 2000 blocks in the past, and the node already has the first 2000 headers from C towards B but none beyond them. Then the hasNewHeaders check added by #8054 could prevent the node from ever learning about tip B, causing chain sync to fail (see the sketch below).
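
For concreteness, here is a simplified sketch of the check that #8054 added to the headers handler (paraphrased from the 0.13-era main.cpp, not verbatim):

// Simplified sketch of the #8054 logic in the headers handler.
// 'headers' is the vector of block headers just received from pfrom.
bool hasNewHeaders = (mapBlockIndex.count(headers.back().GetHash()) == 0);

if (nCount == MAX_HEADERS_RESULTS && pindexLast && hasNewHeaders) {
    // A full batch arrived and its last header was previously unknown,
    // so ask the peer to continue from pindexLast.
    // In the fork scenario above, the node already has all 2000 headers
    // from C towards B, so hasNewHeaders is false, this branch is never
    // taken, and the node never requests the headers that would connect B.
    pfrom->PushMessage("getheaders", chainActive.GetLocator(pindexLast), uint256());
}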

domob1812 (Contributor) commented

Thanks @sdaftuar, makes sense. I'll think about it.

DeckerSU commented

@sdaftuar I know it's been a long time since the initial patch, but what do you think: if we limit @domob1812's original fix to IBD only, like:

bool hasNewHeaders = true;
if (IsInitialBlockDownload()) {
    // Only apply the duplicate-headers check while in IBD; outside IBD,
    // a full batch is always treated as a reason to keep fetching.
    hasNewHeaders = (mapBlockIndex.count(headers.back().GetHash()) == 0);
}
...
if (nCount == MAX_HEADERS_RESULTS && pindexLast && hasNewHeaders) {
...
    // Continue the headers sync from the last header we connected.
    pfrom->PushMessage("getheaders", chainActive.GetLocator(pindexLast), uint256());
}

Would it cause the stuck-sync case described here: #8306 (comment)?

I'm trying to solve the duplicate getheaders requests issue in Zcash and Komodo, because in these chains duplicate getheaders requests cause really huge overhead (extra download traffic) due to the much larger block header size (1488 bytes per header entry). Just an example:

2020-04-16 02:03:29 more getheaders (1641598) to end to peer=1 (startheight:1836680)
2020-04-16 02:03:29 sending: getheaders (1061 bytes) peer=1
2020-04-16 02:03:29 received: headers (238081 bytes) peer=2
2020-04-16 02:03:29 more getheaders (1641598) to end to peer=2 (startheight:1836680)
2020-04-16 02:03:29 sending: getheaders (1061 bytes) peer=2
2020-04-16 02:03:29 sending: getheaders (1061 bytes) peer=3
2020-04-16 02:03:29 getheaders (1641598) 08685ebc864678e16e9094fafce2119a9c63aae7872dc6fcc974f29cfc201e1e to peer=3
...
2020-04-16 02:03:29 more getheaders (1641598) to end to peer=5 (startheight:1836680)
2020-04-16 02:03:29 sending: getheaders (1061 bytes) peer=5
2020-04-16 02:03:29 received: headers (238081 bytes) peer=5

As we can see here, the same getheaders request was sent to peers 1, 2, and 5, and we got three identical replies (MAX_HEADERS_RESULTS x size(CBlockHeader) = 160 x 1488 = 238,080 bytes, plus the one-byte count prefix = 238,081 bytes each; see the sketch below). With ~1.8M blocks in the chain this can cause up to ~500 GB of download traffic during IBD and the initial headers sync stage, while the entire blockchain is only ~25-30 GB. I did a small test-lab experiment, syncing from scratch from 2 peers on a local 1 Gbps network: it downloaded 1 GB of duplicate headers in just a few minutes.
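
As a sanity check on that per-message size (a back-of-the-envelope sketch; the constants are the values cited above, and the one-byte CompactSize prefix assumes a count below 253):

#include <cstdio>

int main() {
    const int MAX_HEADERS_RESULTS = 160; // per-message header limit cited above
    const int HEADER_ENTRY_SIZE = 1488;  // bytes per header entry, as cited above
    // A headers message payload is a CompactSize count (1 byte for n < 253)
    // followed by n serialized header entries.
    const int payload = 1 + MAX_HEADERS_RESULTS * HEADER_ENTRY_SIZE;
    std::printf("headers payload: %d bytes\n", payload); // prints 238081, matching the log
    return 0;
}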

So, any advice is appreciated.

P.S. Limiting the original fix with IsInitialBlockDownload() seems to work; download traffic during the initial headers sync is reduced significantly, but I just want to make sure there are no pitfalls.

hdevalence added a commit to ZcashFoundation/zebra that referenced this pull request Dec 2, 2020
Zcashd will blindly request more block headers as long as it got 160
block headers in response to a previous query, EVEN IF THOSE HEADERS ARE
ALREADY KNOWN.  To dodge this behavior, return slightly fewer than the
maximum, to get it to go away.

https://github.com/zcash/zcash/blob/0ccc885371e01d844ebeced7babe45826623d9c2/src/main.cpp#L6274-L6280

Without this change, communication between a partially-synced `zebrad`
and fully-synced `zcashd` looked like this:

1.  `zebrad` connects to `zcashd`, which sends an initial `getheaders`
    request;

2.  `zebrad` correctly computes the intersection of the provided block
    locator with the node's current chain and returns 160 following
    headers;

3.  `zcashd` does not check whether it already has those headers and
    assumes that any provided headers are new and re-validates them;

4.  `zcashd` assumes that because `zebrad` responded with 160 headers,
    the `zebrad` node is ahead of it, and requests the next 160 headers.

5.  Because block locators are sparse, the intersection between the
    `zcashd` and `zebrad` chains is likely well behind the `zebrad` tip,
    so this process continues for thousands of blocks.

To avoid this problem, we return slightly fewer than the protocol
maximum (158 rather than 160, to guard against off-by-one errors in
zcashd).  This does not interfere with use of the returned headers by
peers that check the headers, but does prevent `zcashd` from trying to
download thousands of block headers it already has.

This problem does not occur in the `zcashd<->zcashd` case only because
`zcashd` does not respond to `getheaders` messages while it is syncing.
However, implementing this behavior in Zebra would be more complicated,
because we don't have a distinct "initial block sync" state (we do
poll-based syncing continuously) and we don't have shared global
variables to modify to set that state.

Relevant links (thanks @str4d):

- The PR that introduced this behavior: https://github.com/bitcoin/bitcoin/pull/4468/files#r17026905
- bitcoin/bitcoin#6861
- bitcoin/bitcoin#6755
- bitcoin/bitcoin#8306 (comment)
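
For illustration, the responder-side workaround described in this commit message could look roughly like the following (a sketch in C++ for consistency with the earlier snippets; Zebra itself is written in Rust, and NextOnActiveChain is a hypothetical helper for walking the active chain):

// Hypothetical sketch of the responder-side cap.
// MAX_HEADERS_RESULTS is the requester's continuation threshold (160 in Zcash);
// replying with fewer entries keeps a blind requester from asking for more.
const size_t MAX_HEADERS_RESULTS = 160;
const size_t RESPONSE_CAP = MAX_HEADERS_RESULTS - 2; // 158: margin for off-by-one bugs

std::vector<CBlockHeader> BuildHeadersReply(const CBlockIndex* pindexStart) {
    std::vector<CBlockHeader> reply;
    for (const CBlockIndex* pindex = pindexStart;
         pindex != nullptr && reply.size() < RESPONSE_CAP;
         pindex = NextOnActiveChain(pindex)) { // hypothetical chain-walk helper
        reply.push_back(pindex->GetBlockHeader());
    }
    return reply;
}
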
dconnolly pushed a commit to ZcashFoundation/zebra that referenced this pull request Dec 3, 2020
bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022