New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IsInitialBlockDownload for testnet #7514

Merged
merged 1 commit into from Apr 28, 2016

Conversation

Projects
None yet
6 participants
@jmacwhyte
Contributor

jmacwhyte commented Feb 11, 2016

Problem: When running in testnet, a node can get stuck in IsInitialBlockDownload mode which will prevent it from responding to any GETHEADERS requests. This can make testing difficult! This problem does not happen on mainnet.

Theory: Testnet is a little wonky with the 20-minute difficulty reset condition. The nChainWork calculation can cause pindexBestHeader to be set with an old block that has more work than chainActive.Tip(). If that block is more than nMaxTipAge old, IsInitialBlockDownload will always return true.

Suggested solution: The current code works fine for mainnet, so I added a check to see if we are using testnet, and if so use chainActive.Tip() instead of pindexBestHeader when checking to see how far along we are with the block download. It's my thought that if the current tip is less than one day old (the nMaxTipAge default), it's probably safe to release IsInitialBlockDownload regardless of the age of pindexBestHeader.

[Edit: As per the comments below, I removed the check for testnet and instead used the greater of chainActive or BestHeader.]

@laanwj laanwj added the P2P label Feb 11, 2016

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Feb 11, 2016

Member

Are we sure this condition cannot happen on mainnet?

Introducing testnet-specific behavior is bad, as it makes testnet less useful as a proxy for testing mainnet. Also comparing strings here isn't acceptable - if you really need to do this it should be a flag on CChainParams. But I hope we can come up with a general solution.

Member

laanwj commented Feb 11, 2016

Are we sure this condition cannot happen on mainnet?

Introducing testnet-specific behavior is bad, as it makes testnet less useful as a proxy for testing mainnet. Also comparing strings here isn't acceptable - if you really need to do this it should be a flag on CChainParams. But I hope we can come up with a general solution.

@jmacwhyte

This comment has been minimized.

Show comment
Hide comment
@jmacwhyte

jmacwhyte Feb 11, 2016

Contributor

Thanks for the feedback. I agree, testnet-specific behavior should be avoided if at all possible, but the testnet-specific difficulty rules seem to be what is causing the problem. Maybe another solution would be to have the code that resets testnet difficulty after 20 minutes also reset the work value for pindexBestHeader so a new block with less work is able to replace it. I'm not sure where that happens, though, so I haven't been able to test it out.

It's also possible my theory isn't exactly correct. Currently, my pindexBestHash is set to a block with a height 2000 below the current tip, but the nChainWork is greater than the current tip. It doesn't show on any block explorers, so maybe it was set to a very high difficulty then orphaned.

In theory the same thing could happen on mainnet, but with the amount of mining power out there I can't imagine someone mining an orphaned block that still has more POW than the main chain 2000 blocks later.

Thoughts?

Contributor

jmacwhyte commented Feb 11, 2016

Thanks for the feedback. I agree, testnet-specific behavior should be avoided if at all possible, but the testnet-specific difficulty rules seem to be what is causing the problem. Maybe another solution would be to have the code that resets testnet difficulty after 20 minutes also reset the work value for pindexBestHeader so a new block with less work is able to replace it. I'm not sure where that happens, though, so I haven't been able to test it out.

It's also possible my theory isn't exactly correct. Currently, my pindexBestHash is set to a block with a height 2000 below the current tip, but the nChainWork is greater than the current tip. It doesn't show on any block explorers, so maybe it was set to a very high difficulty then orphaned.

In theory the same thing could happen on mainnet, but with the amount of mining power out there I can't imagine someone mining an orphaned block that still has more POW than the main chain 2000 blocks later.

Thoughts?

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell Feb 11, 2016

Member

It's certainly possible for mainnet to have a chain with more blocks but lower total work.

Member

gmaxwell commented Feb 11, 2016

It's certainly possible for mainnet to have a chain with more blocks but lower total work.

@jmacwhyte

This comment has been minimized.

Show comment
Hide comment
@jmacwhyte

jmacwhyte Feb 11, 2016

Contributor

If that were the case, wouldn't the chain with more work be chosen as the active chain?

Contributor

jmacwhyte commented Feb 11, 2016

If that were the case, wouldn't the chain with more work be chosen as the active chain?

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell Feb 11, 2016

Member

Yes, and that should be true for testnet too.

Member

gmaxwell commented Feb 11, 2016

Yes, and that should be true for testnet too.

@jmacwhyte

This comment has been minimized.

Show comment
Hide comment
@jmacwhyte

jmacwhyte Feb 11, 2016

Contributor

Can we therefore assume chainActive is what we always believe to the be the "correct" chain? If so, can we just check to see if chainActive is within 24 hours of us when doing the IsInitialBlockDownload check (instead of pindexBestHeader)? That's another solution I had thought of, but didn't know if it would have other implications on mainnet.

Contributor

jmacwhyte commented Feb 11, 2016

Can we therefore assume chainActive is what we always believe to the be the "correct" chain? If so, can we just check to see if chainActive is within 24 hours of us when doing the IsInitialBlockDownload check (instead of pindexBestHeader)? That's another solution I had thought of, but didn't know if it would have other implications on mainnet.

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Feb 11, 2016

Member

chainActive is the currently best fully validated chain, always.

I think this issue can be fixed by using std::max(chainActive.Tip()->GetBlockTime(), pindexBestHeader->GetBlockTime()) instead of just pindexBestHeader->GetBlockTime().

Member

sipa commented Feb 11, 2016

chainActive is the currently best fully validated chain, always.

I think this issue can be fixed by using std::max(chainActive.Tip()->GetBlockTime(), pindexBestHeader->GetBlockTime()) instead of just pindexBestHeader->GetBlockTime().

@jmacwhyte

This comment has been minimized.

Show comment
Hide comment
@jmacwhyte

jmacwhyte Feb 12, 2016

Contributor

I think that's a great idea. As long as it only triggers until the node thinks it's close to finishing the initial block download, this method will accomplish its purpose. Either using the active chain or the best header should work.

I have updated the pull request.

Contributor

jmacwhyte commented Feb 12, 2016

I think that's a great idea. As long as it only triggers until the node thinks it's close to finishing the initial block download, this method will accomplish its purpose. Either using the active chain or the best header should work.

I have updated the pull request.

@luke-jr

This comment has been minimized.

Show comment
Hide comment
@luke-jr

luke-jr Feb 13, 2016

Member

The nChainWork calculation can cause pindexBestHeader to be set with an old block that has more work than chainActive.Tip().

If it's an old block (ie, presumably one we have), shouldn't it be the active tip???

Member

luke-jr commented Feb 13, 2016

The nChainWork calculation can cause pindexBestHeader to be set with an old block that has more work than chainActive.Tip().

If it's an old block (ie, presumably one we have), shouldn't it be the active tip???

@jmacwhyte

This comment has been minimized.

Show comment
Hide comment
@jmacwhyte

jmacwhyte Feb 13, 2016

Contributor

It should be, but for some reason my node's BestHeader wasn't even in the
active chain. Where is the 20-minute difficulty reset rule defined? Is it
possible something in there is facilitating high-POW orphans?

On Fri, Feb 12, 2016 at 6:49 PM Luke-Jr notifications@github.com wrote:

The nChainWork calculation can cause pindexBestHeader to be set with an
old block that has more work than chainActive.Tip().

If it's an old block (ie, presumably one we have), shouldn't it be the
active tip???


Reply to this email directly or view it on GitHub
#7514 (comment).

Contributor

jmacwhyte commented Feb 13, 2016

It should be, but for some reason my node's BestHeader wasn't even in the
active chain. Where is the 20-minute difficulty reset rule defined? Is it
possible something in there is facilitating high-POW orphans?

On Fri, Feb 12, 2016 at 6:49 PM Luke-Jr notifications@github.com wrote:

The nChainWork calculation can cause pindexBestHeader to be set with an
old block that has more work than chainActive.Tip().

If it's an old block (ie, presumably one we have), shouldn't it be the
active tip???


Reply to this email directly or view it on GitHub
#7514 (comment).

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Feb 13, 2016

Member
Member

sipa commented Feb 13, 2016

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Feb 16, 2016

Member

Needs ACKs before merge

Member

laanwj commented Feb 16, 2016

Needs ACKs before merge

@jmacwhyte

This comment has been minimized.

Show comment
Hide comment
@jmacwhyte

jmacwhyte Feb 23, 2016

Contributor

I will do some more research on the deeper problem and present what I can find, if anything.

Contributor

jmacwhyte commented Feb 23, 2016

I will do some more research on the deeper problem and present what I can find, if anything.

@jmacwhyte jmacwhyte changed the title from Fix IsInitialBlockDownload for testnet to [WIP] Fix IsInitialBlockDownload for testnet Feb 23, 2016

@dcousens

This comment has been minimized.

Show comment
Hide comment
@dcousens

dcousens Feb 24, 2016

Contributor

utACK

Contributor

dcousens commented Feb 24, 2016

utACK

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa Mar 5, 2016

Member

I'm not convinced the current code is sufficient to solve all problems related to the wonkyness of testnet mining, but it won't hurt. utACK.

Member

sipa commented Mar 5, 2016

I'm not convinced the current code is sufficient to solve all problems related to the wonkyness of testnet mining, but it won't hurt. utACK.

@jmacwhyte

This comment has been minimized.

Show comment
Hide comment
@jmacwhyte

jmacwhyte Mar 11, 2016

Contributor

I'm no longer able to reproduce the problem, as my node's chain tip and best header are now in sync. I agree there is probably a deeper problem here, but I think the proposed changes are acceptable because they help testnet nodes act more like mainnet in some fringe cases without affecting anything other than isInitialBlockDownload (which should only matter when nodes are first getting started).

I have been running my node with the proposed changes and haven't noticed any adverse effects. Can I ACK my own proposal? :)

Contributor

jmacwhyte commented Mar 11, 2016

I'm no longer able to reproduce the problem, as my node's chain tip and best header are now in sync. I agree there is probably a deeper problem here, but I think the proposed changes are acceptable because they help testnet nodes act more like mainnet in some fringe cases without affecting anything other than isInitialBlockDownload (which should only matter when nodes are first getting started).

I have been running my node with the proposed changes and haven't noticed any adverse effects. Can I ACK my own proposal? :)

@laanwj laanwj changed the title from [WIP] Fix IsInitialBlockDownload for testnet to Fix IsInitialBlockDownload for testnet Apr 28, 2016

@laanwj laanwj merged commit 8aa7226 into bitcoin:master Apr 28, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Apr 28, 2016

Merge #7514: Fix IsInitialBlockDownload for testnet
8aa7226 Fix IsInitialBlockDownload to play nice with testnet (jmacwhyte)

@dgenr8 dgenr8 referenced this pull request Jan 10, 2017

Merged

Cherries #184

sickpig referenced this pull request in sickpig/BitcoinUnlimited Feb 27, 2017

Merge #7514: Fix IsInitialBlockDownload for testnet
8aa7226 Fix IsInitialBlockDownload to play nice with testnet (jmacwhyte)

gandrewstone referenced this pull request in BitcoinUnlimited/BitcoinUnlimited Mar 1, 2017

Merge pull request #318 from sickpig/port/core-pr-7514-Fix-IsInitialB…
…lockDownload-testnet

Port Core #7514: Fix IsInitialBlockDownload for testnet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment