Fixes for p2p-compactblocks.py test timeouts on travis (#8842) #9058

Merged
merged 3 commits into from Nov 11, 2016

Projects

None yet

6 participants

@ryanofsky
Contributor
ryanofsky commented Nov 1, 2016 edited

Fixes for #8842. There are three commits:

  • Commit [qa] Fix bug in compactblocks v2 merge fixes a p2p-compactblocks.py timeout related to, but not exactly the same as the timeout reported in #8842.
  • Commit [qa] Fix stale data bug in test_compactblocks_not_at_tip adds a missing variable initialization that isn't currently a problem, but would lead to test failures after the next commit.
  • Commit Modify getblocktxn handler not to drop requests for old blocks is the actual fix for the timeout reported in #8842.
@fanquake fanquake added the Tests label Nov 2, 2016
ryanofsky added some commits Oct 24, 2016
@ryanofsky @ryanofsky ryanofsky [qa] Fix bug in compactblocks v2 merge
Bug caused the wait_for_block_announcement to be called on the wrong node,
leading to nondeterminism and occasional test failures. Bug was introduced in
merge commit:

d075479 Merge #8882: [qa] Fix race conditions in p2p-compactblocks.py and sendheaders.py

Underlying commits which conflicted were:

27acfc1 [qa] Update p2p-compactblocks.py for compactblocks v2
6976db2 [qa] Another attempt to fix race condition in p2p-compactblocks.py

The first commit changed the test_compactblock_construction function signature
and second commit added code which wasn't updated during the merge to use the
new arguments.

Suhas Daftuar <sdaftuar@chaincode.com> noticed the bug and suggested the fix.
47e9659
@ryanofsky @ryanofsky ryanofsky [qa] Fix stale data bug in test_compactblocks_not_at_tip
Clear test_node.last_block before requesting blocks in the
compactblocks_not_at_tip test so comparisons won't fail if a blocks were received
before the test started.

The bug doesn't currently cause any problems due to the order tests run, but
this will change in the next commit.
55bfddc
@laanwj laanwj added P2P and removed Tests labels Nov 3, 2016
@laanwj
Member
laanwj commented Nov 3, 2016

I'm not sure I understand this: is the protocol change needed just to fix the test?

Nit: this should not be labeled "tests" if it also involves a protocol change: if there is a protocol change that would be the primary focus, I was also confused by the title.

@ryanofsky
Contributor
ryanofsky commented Nov 3, 2016 edited

I'll add more detail to the third commit message. Making nodes always respond to getblocktxn requests instead of silently ignoring requests for old blocks (depth > 10) fixes a race in the syncing protocol that only happens in the test environment, because the test environment generates many new blocks very quickly, which wouldn't happen in the real world.

@TheBlueMatt
Contributor

Note that while this is incredibly unlikely on main net, due to the bursts there, this could conceivably happen on test net.

On November 3, 2016 6:20:13 AM EDT, Russell Yanofsky notifications@github.com wrote:

I'll add more detail to the third commit message. Making nodes always
respond to getblocktxn requests instead of silently ignoring requests
for old blocks (depth > 10) fixes a race in the syncing protocol that
only happens in the test environment, because the test environment
generates new blocks very quickly, which wouldn't happen in the real
world.

You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
#9058 (comment)

@TheBlueMatt
Contributor

Concept ACK

On November 1, 2016 10:54:19 AM EDT, Russell Yanofsky notifications@github.com wrote:

See commit messages for details. There are two test fixes and a
protocol change. Corresponding BIP update is
bitcoin/bips#469.
You can view, comment on, or merge this pull request online at:

#9058

-- Commit Summary --

  • [qa] Fix bug in compactblocks v2 merge
  • [qa] Fix stale data bug in test_compactblocks_not_at_tip
  • Modify getblocktxn not to drop requests for old blocks

-- File Changes --

M qa/rpc-tests/p2p-compactblocks.py (18)
M src/main.cpp (12)

-- Patch Links --

https://github.com/bitcoin/bitcoin/pull/9058.patch
https://github.com/bitcoin/bitcoin/pull/9058.diff

You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
#9058

@ryanofsky
Contributor

Updated the commit message. For completeness, I'll mention that I did also implement an alternate fix for for the timeouts at ryanofsky@36692c9 (Get rid of MAX_GETBLOCKTXN_DEPTH constant).

@ryanofsky ryanofsky referenced this pull request in bitcoin/bips Nov 3, 2016
Merged

Allow block responses to getblocktxn requests #469

src/main.cpp
+ CInv vInv;
+ vInv.type = State(pfrom->GetId())->fWantsCmpctWitness ? MSG_WITNESS_BLOCK : MSG_BLOCK;
+ vInv.hash = req.blockhash;
+ pfrom->vRecvGetData.push_front(vInv);
@sdaftuar
sdaftuar Nov 7, 2016 Contributor

Perhaps this should be inserted at the end rather than the beginning of vRecvGetData, to maintain ordering with other block requests.

@ryanofsky
ryanofsky Nov 7, 2016 Contributor

Thanks, switched to push_back.

@ryanofsky @ryanofsky ryanofsky Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.
dac53b5
@sdaftuar
Contributor
sdaftuar commented Nov 9, 2016

utACK dac53b5

Note that the behavior this is changing could also be problematic for users on mainnet, say if you had an edge router whose network went down for a couple hours -- on resumption it might relay the first of a batch of blocks as a compact block, but then ignore the subsequent GETBLOCKTXN if its tip rapidly advances in the meantime as it catches up.

Also, I think this has been mentioned elsewhere, but this issue doesn't just affect the p2p-compactblocks.py test, but many other of our python tests as well (as an example, I've been dealing with sporadic failures due to this issue in the pruning test this evening).

@laanwj
Member
laanwj commented Nov 11, 2016

Going to merge this, as the random errors in the segwit test are starting to be really annoying.

@laanwj laanwj merged commit dac53b5 into bitcoin:master Nov 11, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@laanwj laanwj added a commit that referenced this pull request Nov 11, 2016
@laanwj laanwj Merge #9058: Fixes for p2p-compactblocks.py test timeouts on travis (#…
…8842)


dac53b5 Modify getblocktxn handler not to drop requests for old blocks (Russell Yanofsky)
55bfddc [qa] Fix stale data bug in test_compactblocks_not_at_tip (Russell Yanofsky)
47e9659 [qa] Fix bug in compactblocks v2 merge (Russell Yanofsky)
7977a11
LogPrint("net", "Peer %d sent us a getblocktxn for a block > %i deep", pfrom->id, MAX_BLOCKTXN_DEPTH);
+ CInv vInv;
@TheBlueMatt
TheBlueMatt Nov 11, 2016 Contributor

Oops, usually v refers to the type being a vector, but this is just a single inv.

@ryanofsky
ryanofsky Nov 14, 2016 Contributor

Thanks, corrected in #9160

@ryanofsky ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Nov 14, 2016
@ryanofsky ryanofsky [trivia] Fix hungarian variable name
Follow up to comment bitcoin#9058 (comment)
1811547
@ryanofsky ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Nov 14, 2016
@ryanofsky ryanofsky [trivial] Fix hungarian variable name
Follow up to comment bitcoin#9058 (comment)
ec34648
@sdaftuar
Contributor

@TheBlueMatt Should this be tagged for backport to 0.13?

@TheBlueMatt
Contributor

It's largely a fix for QA instability.... What is our normal process for that?

On November 19, 2016 1:50:27 PM PST, Suhas Daftuar notifications@github.com wrote:

@TheBlueMatt Should this be tagged for backport to 0.13?

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#9058 (comment)

@sdaftuar
Contributor

Well usually QA instability is addressed by changing the tests, and I think we tend to be pretty liberal about backporting test changes.

Personally I think it makes sense to backport this behavior change, as I mentioned above I think this is basically a bugfix since it could be triggered in some mainnet configurations as well, and I think the patch here is small and low risk. Not sure if everyone would agree with my assessment though...?

@TheBlueMatt
Contributor

Yea, I'm OK with this being backported.

@MarcoFalke MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this pull request Nov 20, 2016
@ryanofsky @MarcoFalke ryanofsky + MarcoFalke [qa] Fix bug in compactblocks v2 merge
Bug caused the wait_for_block_announcement to be called on the wrong node,
leading to nondeterminism and occasional test failures. Bug was introduced in
merge commit:

d075479 Merge #8882: [qa] Fix race conditions in p2p-compactblocks.py and sendheaders.py

Underlying commits which conflicted were:

27acfc1 [qa] Update p2p-compactblocks.py for compactblocks v2
6976db2 [qa] Another attempt to fix race condition in p2p-compactblocks.py

The first commit changed the test_compactblock_construction function signature
and second commit added code which wasn't updated during the merge to use the
new arguments.

Suhas Daftuar <sdaftuar@chaincode.com> noticed the bug and suggested the fix.

Github-Pull: #9058
Rebased-From: 47e9659
2ba5d78
@MarcoFalke MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this pull request Nov 20, 2016
@ryanofsky @MarcoFalke ryanofsky + MarcoFalke [qa] Fix stale data bug in test_compactblocks_not_at_tip
Clear test_node.last_block before requesting blocks in the
compactblocks_not_at_tip test so comparisons won't fail if a blocks were received
before the test started.

The bug doesn't currently cause any problems due to the order tests run, but
this will change in the next commit.

Github-Pull: #9058
Rebased-From: 55bfddc
286e548
@MarcoFalke MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this pull request Nov 20, 2016
@ryanofsky @MarcoFalke ryanofsky + MarcoFalke Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Github-Pull: #9058
Rebased-From: dac53b5
Github-Pull: #9160
Rebased-From: ec34648
573b575
@MarcoFalke MarcoFalke referenced this pull request Nov 20, 2016
Merged

[qa] 0.13.2 Backports #9191

@MarcoFalke MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this pull request Nov 20, 2016
@ryanofsky @MarcoFalke ryanofsky + MarcoFalke Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Github-Pull: #9058
Rebased-From: dac53b5
Github-Pull: #9160
Rebased-From: ec34648
c65fb7c
@MarcoFalke MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this pull request Nov 20, 2016
@ryanofsky @MarcoFalke ryanofsky + MarcoFalke Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Github-Pull: #9058
Rebased-From: dac53b5
Github-Pull: #9160
Rebased-From: ec34648
e846166
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment