
p2p: For assumeutxo, download snapshot chain before background chain #29519

Open · wants to merge 1 commit into base: master

Conversation

mzumsande (Contributor)

After loading a snapshot, pindexLastCommonBlock is usually already set to some block for existing peers. That means we'd continue syncing the background chain from those peers instead of prioritising the snapshot chain, which defeats the purpose of doing assumeutxo in the first place. Only existing peers are affected by this bug.

DrahtBot (Contributor) commented Feb 29, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type | Reviewers
ACK  | Sjors


fanquake (Member) commented Mar 5, 2024

Note that since v27 will not ship with mainnet assumeutxo, this isn't a blocker; a fix can still be backported to 27.x if deemed necessary.

Sjors (Member) commented Mar 11, 2024

Concept ACK

ac547ea looks reasonable, but it would be nice to somehow add a test for it.

What happens once we've reached the tip and need more peers for background sync? And once background sync is done, is it updated to the tip again?

Would it make sense to introduce pindexLastCommonBackgroundSyncBlock so we have more control over what peers to dedicate to the tip vs background (which might take weeks on some devices)?

mzumsande (Contributor, Author) commented Mar 11, 2024

> ac547ea looks reasonable, but it would be nice to somehow add a test for it.

I tested this by syncing on signet, but I'll think about a functional test; I imagine it would be hard to prevent race conditions in a functional test because we'd:

  • need to first connect to a peer that has the entire chain and start syncing the background chain
  • then load the snapshot (at the same time the peer would continue syncing the background chain)
  • then check that we switch to requesting from the snapshot chain (while the bg chain hopefully hasn't finished downloading yet)

It might be possible though if that peer was a P2PConnection() (that could stall / withhold certain blocks) instead of a real node.

> What happens once we've reached the tip and need more peers for background sync? And once background sync is done, is it updated to the tip again?
>
> Would it make sense to introduce pindexLastCommonBackgroundSyncBlock so we have more control over what peers to dedicate to the tip vs background (which might take weeks on some devices)?

We have this logic in SendMessages: it first tries FindNextBlocksToDownload, which is meant for the active chainstate, and, if that doesn't fill up the vToDownload buffer (e.g. because we are at or near the tip), then calls TryDownloadingHistoricalBlocks specifically for the background chainstate. I think this existing separation should work well; the problem this PR fixes is that FindNextBlocksToDownload would pick historical blocks from the background chain when it shouldn't, because higher-priority blocks are available.

Sjors (Member) commented Mar 12, 2024

I can see how it would be difficult to test, indeed.

I encountered this scenario while testing #29370 on signet. All my peers were only downloading the background state. I only got headers for the tip chain, but it didn't progress. The workaround there was to turn the network off and on, which is consistent with your observation that only existing peers are affected.

utACK ac547ea

fjahr (Contributor) commented Mar 25, 2024

The code and reasoning look good to me. I have been trying to write a test, but I am currently unable to reproduce the behavior described before the change, i.e. to make the test fail on master.

Here is the test so far: fjahr@478fb28

This now does fail on master, but only because it also checks for the debug log I have added here. This log confirms that the test does trigger the behavior change: on master the last common block is set once, while after the change here it is set twice within FindNextBlocksToDownload. But, of course, we would rather have the test fail at the asserts at the end, and that is not the case so far.

@mzumsande do you have an idea why this doesn't reproduce the behavior you have seen on master?

mzumsande (Contributor, Author) commented Mar 25, 2024

> @mzumsande do you have an idea why this doesn't reproduce the behavior you have seen on master?

I tested your branch, and I see the same - this is really interesting, and it works differently than I thought when I opened the PR!

I think the following happens in PeerManagerImpl::FindNextBlocks (after being called from FindNextBlocksToDownload()):
We process up to 1024 blocks ahead of the current pindexWalk:

int nWindowEnd = state->pindexLastCommonBlock->nHeight + BLOCK_DOWNLOAD_WINDOW;

But due to the lines

bitcoin/src/net_processing.cpp, lines 1497 to 1502 in 2102c97:

    if (pindex->nStatus & BLOCK_HAVE_DATA || (activeChain && activeChain->Contains(pindex))) {
        if (activeChain && pindex->HaveNumChainTxs()) {
            state->pindexLastCommonBlock = pindex;
        }
        continue;
    }

we never ask for the blocks below the snapshot block here: even though we don't have their data, they are now part of the active chain, so they are skipped instead.
In the test, we immediately ask for the blocks after the snapshot instead, which is exactly the desired behavior.

So why did I observe a different behavior on signet?
I think it only works like this in the test because there, the snapshot is just 100 blocks away from pindexLastCommonBlock.

If this weren't the case, we'd request nothing in FindNextBlocksToDownload() and only move our pindexLastCommonBlock ahead by a maximum of 1024. Then we'd have capacity left for this peer and would actually ask for the historical blocks via TryDownloadingHistoricalBlocks().

So I think what would actually happen on mainnet and signet is that each time we can process a block request with an existing peer, we'd move pindexLastCommonBlock 1024 blocks ahead and then request historical blocks, until pindexLastCommonBlock has reached the snapshot. So we'd continue downloading the background chain for a while (a few minutes?) with existing peers until we switch over to the snapshot chain (instead of never switching over, as I originally thought when opening this PR).

Still, I think this behavior is not great, and the fix is an improvement: not only do we start downloading the snapshot chain immediately, we also do much less iterating in FindNextBlocksToDownload, where we would otherwise move pindexLastCommonBlock ahead without actually downloading anything.

@fanquake fanquake removed this from the 27.0 milestone Apr 16, 2024