p2p-compactblocks.py fails randomly #8842

Closed
jl2012 opened this Issue Sep 29, 2016 · 12 comments

Projects

None yet

6 participants

@jl2012
Member
jl2012 commented Sep 29, 2016

It seems p2p-compactblocks.py fails randomly

From: https://travis-ci.org/bitcoin/bitcoin/jobs/163788152 from #8841
The PR is trivial and totally unrelated to compact block

p2p-compactblocks.py:
Initializing test directory /tmp/test3yd_2q3h/2
start_node: bitcoind started, waiting for RPC to come up
start_node: RPC succesfully started
MiniNode: Connecting to Bitcoin Node IP # 127.0.0.1:11016
Testing SENDCMPCT p2p message... 
Testing compactblock headers and shortIDs are correct...
Assertion failed: 
Stopping nodes
Not cleaning up dir /tmp/test3yd_2q3h/2
Failed
stderr:
  File "/home/travis/build/bitcoin/bitcoin/qa/rpc-tests/test_framework/test_framework.py", line 151, in main
    self.run_test()
  File "/home/travis/build/bitcoin/bitcoin/build/../qa/rpc-tests/p2p-compactblocks.py", line 614, in run_test
    self.test_compactblock_construction()
  File "/home/travis/build/bitcoin/bitcoin/build/../qa/rpc-tests/p2p-compactblocks.py", line 264, in test_compactblock_construction
    assert(self.test_node.last_cmpctblock is not None)
Pass: False, Duration: 7 s
@MarcoFalke
Member

Thanks for the issue! I saw this failure earlier at least twice (only on travis, though)

@MarcoFalke MarcoFalke added the Tests label Sep 29, 2016
@sdaftuar
Contributor

Oops, I'll take a look...

@MarcoFalke MarcoFalke closed this in #8854 Oct 1, 2016
@MarcoFalke
Member
@MarcoFalke MarcoFalke reopened this Oct 2, 2016
@sdaftuar
Contributor
sdaftuar commented Oct 3, 2016

Ugh, I will take another look, thanks for noticing.

@sdaftuar
Contributor
sdaftuar commented Oct 3, 2016

Ok I think I see what I did wrong in #8854 -- syncing with a ping doesn't ensure that all block announcements have made it to us, just that any that have been announced have been processed.

I'm working on a new fix (my idea is to replace that sync_with_ping() with logic to wait until the tip has been announced), but I'm having trouble reproducing the problem.

@MarcoFalke
Member

Please let me know if this is still an issue.

@MarcoFalke MarcoFalke closed this Oct 11, 2016
@jnewbery
Contributor

@sdaftuar - p2p-compactblocks just failed in a travis build for me: https://travis-ci.org/bitcoin/bitcoin/builds/170131004 . The next build (which was just a squash of the same commits) succeeded: https://travis-ci.org/bitcoin/bitcoin/builds/170143716 , so it appears there's still an intermittent problem with the compact blocks testcase. Sadly there doesn't appear to be much logging in travis and I can't reproduce this locally.

@jnewbery
Contributor
@ryanofsky
Contributor

Looked into the failure, but so far couldn't find a problem in the test setup that would cause it.

The failure happened in line 138 of the sync_blocks util function when the two test nodes failed to sync after 60 seconds. The place where the failing sync_blocks was called was line 809 of run_test after segwit was activated on node 1. Node 1 should have added more blocks than node 0 at that point, so some kind of failure must have happened to cause node 0 not to receive (or accept) the new blocks from node 1, and the sync_block function to time out.

Since I don't have any good theories on what would cause node 0 not to sync with node 1, I'm currently running the test in a loop to try to reproduce the problem.

I added some debug prints and made the following change to avoid a different (unrelated) intermittent test failure:

--- a/qa/rpc-tests/p2p-compactblocks.py
+++ b/qa/rpc-tests/p2p-compactblocks.py
@@ -300,8 +300,8 @@ class CompactBlocksTest(BitcoinTestFramework):
             assert(segwit_tx_generated) # check that our test is not broken

         # Wait until we've seen the block announcement for the resulting tip
-        tip = int(self.nodes[0].getbestblockhash(), 16)
-        assert(self.test_node.wait_for_block_announcement(tip))
+        tip = int(node.getbestblockhash(), 16)
+        assert(test_node.wait_for_block_announcement(tip))

         # Now mine a block, and look at the resulting compact block.
         test_node.clear_block_announcement()
@laanwj laanwj reopened this Oct 25, 2016
@ryanofsky
Contributor

Update: running the test overnight on my workstation showed the sync_blocks call returning true after 3 loop iterations in 1060 out of 1089 runs, and returning true after 2 loop iterations in the other 29 runs. It never got anywhere close to the 60 iterations that would have been required to cause the timeout.

@ryanofsky ryanofsky added a commit to ryanofsky/bitcoin that referenced this issue Oct 27, 2016
@ryanofsky ryanofsky Modify getblocktxn not to drop requests for old blocks
Current getblocktxn behavior of ignoring requests for old blocks might be
causing occasional sync timeouts during tests reported in
bitcoin#8842. Deal with this by sending block
responses instead of no responses at all.

Corresponding BIP-152 update in
ryanofsky/bips@getbloc.

Change suggested by Suhas Daftuar <sdaftuar@chaincode.com>.
6109d86
@ryanofsky ryanofsky added a commit to ryanofsky/bitcoin that referenced this issue Nov 1, 2016
@ryanofsky @ryanofsky ryanofsky + ryanofsky Modify getblocktxn not to drop requests for old blocks
Current getblocktxn behavior of ignoring requests for old blocks might be
causing occasional sync timeouts during tests reported in
bitcoin#8842. Deal with this by sending block
responses instead of no responses at all.

Corresponding BIP-152 update in
ryanofsky/bips@getbloc.

Change suggested by Suhas Daftuar <sdaftuar@chaincode.com>.
6116439
@ryanofsky ryanofsky added a commit to ryanofsky/bitcoin that referenced this issue Nov 3, 2016
@ryanofsky @ryanofsky ryanofsky + ryanofsky Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.
f6c7943
@ryanofsky ryanofsky added a commit to ryanofsky/bitcoin that referenced this issue Nov 7, 2016
@ryanofsky @ryanofsky ryanofsky + ryanofsky Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.
dac53b5
@laanwj laanwj added a commit that referenced this issue Nov 11, 2016
@laanwj laanwj Merge #9058: Fixes for p2p-compactblocks.py test timeouts on travis (#…
…8842)


dac53b5 Modify getblocktxn handler not to drop requests for old blocks (Russell Yanofsky)
55bfddc [qa] Fix stale data bug in test_compactblocks_not_at_tip (Russell Yanofsky)
47e9659 [qa] Fix bug in compactblocks v2 merge (Russell Yanofsky)
7977a11
@ryanofsky
Contributor

This issue should be closed now that #9058 is merged.

@laanwj laanwj closed this Nov 11, 2016
@laanwj
Member
laanwj commented Nov 11, 2016

Yes, let's be optimistic and close it :)

@MarcoFalke MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this issue Nov 20, 2016
@ryanofsky @MarcoFalke ryanofsky + MarcoFalke Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Github-Pull: #9058
Rebased-From: dac53b5
Github-Pull: #9160
Rebased-From: ec34648
573b575
@MarcoFalke MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this issue Nov 20, 2016
@ryanofsky @MarcoFalke ryanofsky + MarcoFalke Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Github-Pull: #9058
Rebased-From: dac53b5
Github-Pull: #9160
Rebased-From: ec34648
c65fb7c
@MarcoFalke MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this issue Nov 20, 2016
@ryanofsky @MarcoFalke ryanofsky + MarcoFalke Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Github-Pull: #9058
Rebased-From: dac53b5
Github-Pull: #9160
Rebased-From: ec34648
e846166
@dagurval dagurval added a commit to dagurval/bitcoinxt that referenced this issue Jan 3, 2017
@ryanofsky @dagurval ryanofsky + dagurval Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin/bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Conflicts:
	src/main.cpp
81ba25c
@dagurval dagurval added a commit to dagurval/bitcoinxt that referenced this issue Jan 3, 2017
@ryanofsky @dagurval ryanofsky + dagurval Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin/bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Conflicts:
	src/main.cpp
4f1f2b7
@dagurval dagurval added a commit to dagurval/bitcoinxt that referenced this issue Jan 3, 2017
@ryanofsky @dagurval ryanofsky + dagurval Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin/bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Conflicts:
	src/main.cpp
10b20ff
@dagurval dagurval added a commit to dagurval/bitcoinxt that referenced this issue Jan 3, 2017
@ryanofsky @dagurval ryanofsky + dagurval Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin/bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Conflicts:
	src/main.cpp
3a4ca41
@dagurval dagurval added a commit to dagurval/bitcoinxt that referenced this issue Jan 6, 2017
@ryanofsky @dagurval ryanofsky + dagurval Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin/bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Conflicts:
	src/main.cpp
bf6e954
@dagurval dagurval added a commit to dagurval/bitcoinxt that referenced this issue Jan 11, 2017
@ryanofsky @dagurval ryanofsky + dagurval Modify getblocktxn handler not to drop requests for old blocks
The current getblocktxn implementation drops and ignores requests for old
blocks, which causes occasional sync_block timeouts during the
p2p-compactblocks.py test as reported in
bitcoin/bitcoin#8842.

The p2p-compactblocks.py test setup creates many new blocks in a short
period of time, which can lead to getblocktxn requests for blocks below the
hardcoded depth limit of 10 blocks. This commit changes the getblocktxn
handler not to ignore these requests, so the peer nodes in the test setup
will reliably be able to sync.

The protocol change is documented in BIP-152 update "Allow block responses
to getblocktxn requests" at bitcoin/bips#469.

The protocol change is not expected to affect nodes running outside the test
environment, because there shouldn't normally be lots of new blocks being
rapidly added that need to be synced.

Conflicts:
	src/main.cpp
75de407
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment