Pruning regression test failures #476

Open
ftrader opened this issue Apr 26, 2017 · 15 comments
ftrader commented Apr 26, 2017

I've always had trouble executing the pruning.py test successfully. I'm opening this issue to sort out these failures and to see whether others also have trouble with this test.

To test, one needs to re-enable the test (it is currently Disabled in qa/pull-tester/rpc-tests.py).
Change the line which says:

Disabled('pruning', "too much disk"),

to simply read:

'pruning',

and then execute qa/pull-tester/rpc-tests.py pruning
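
For clarity, the change amounts to swapping the Disabled(...) wrapper for the bare script name in the runner's test list. A minimal sketch, assuming the entry sits in a list like the one below (the list name and neighbouring entries are illustrative, not the actual file contents):

# qa/pull-tester/rpc-tests.py -- sketch only; list name and neighbours are illustrative
testScripts = [
    # ...
    # Disabled('pruning', "too much disk"),  # before: the runner skips this entry
    'pruning',                               # after: the runner executes pruning.py
    # ...
]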

The test takes a long time, which is why it was disabled even though it is already in the 'extended' test suite, which doesn't get run automatically in Travis CI.

The failures I got in the past looked like timeout / connection related issues. Yesterday I tried again, based on the latest dev branch HEAD, on a Debian 7 x86_64 system.
I've attached the failure output from the test as a file. The failure is slightly different from what I've seen before: this time an actual assertion in the client was triggered.
I will run the same test on release and post results, as well as execute it on an Ubuntu 16.10 VM for both release and dev.

I hope others can also try executing this test on a diverse range of systems and report whether it succeeds for them. Best to do it overnight, and remember to either set the terminal scrollback to a sufficient length, run it under script(1) to capture the output, or otherwise redirect the output to a file.

20170426_pruning_fail_on_bu_dev_debian7.txt

sickpig commented Apr 26, 2017

I'm going to test it on both release and dev.

ftrader commented Apr 26, 2017

Testing the 'release' branch on Debian 7, it went into a loop after the test output
Mine 24 (stale) blocks on Node 1, followed by 25 (main chain) block reorg from Node 0, for 12 rounds
It seemed to be looping at these heights:
[1095, 1094, 1095]
with such a line printed roughly every second. I might retry this later, but will first try Ubuntu 16.10.
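
The repeated bracketed line looks like the per-node block counts printed while the test framework's sync_blocks helper (qa/rpc-tests/test_framework/util.py, see the tracebacks further down) waits for all nodes to reach the same height. A minimal sketch of that polling loop, assuming it prints the counts on each pass (simplified; not the exact helper code):

import time

def sync_blocks(rpc_connections, wait=1):
    # Poll every node's getblockcount until they all agree.
    while True:
        counts = [x.getblockcount() for x in rpc_connections]
        print(counts)  # assumed source of the repeated "[1095, 1094, 1095]" lines
        if counts == [counts[0]] * len(counts):
            break      # all nodes report the same height -> synced
        time.sleep(wait)  # nodes never converge -> this loops indefinitely, as seen here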

Output log:
20170426_pruning_loop_on_bu_release_debian7.txt

ftrader commented Apr 26, 2017

Test of 'release' (e78ce5e) looped after first step on Ubuntu 16.10 x86_64 (yakkety yak)

20170426_pruning_loop_on_bu_release_ubuntu16_10.txt

Test of 'dev' (b22d70e) failed in a similar manner on Ubuntu 16.10 x86_64 as on Debian 7 x86_64:

20170426_pruning_fail_on_bu_dev_ubuntu16_10.txt

sickpig commented Apr 26, 2017

The release branch (current HEAD) test looped:

[594, 594, 593]
[594, 594, 593]
[594, 594, 593]

Attached are the debug logs of the nodes used in the test:

nodes_debug.tar.gz

sickpig commented Apr 26, 2017

dev@HEAD

[1320, 1320, 1320]
Usage can be over target because of high stale rate: 638.6924505233765
Check that we can survive a 288 block reorg still
Current block height: 1320
Invalidating block at height: 1033 7599124c75931612b61d567c7e8d7be57756cb6b22e3e53b7de092e9295adb65
New best height 1032
Generating new longer chain of 300 more blocks
Reconnect nodes
[1320, 1332, 1320]
[1320, 1332, 1320]
[1320, 1332, 1033]
bitcoind: main.cpp:4894: void CheckBlockIndex(const Consensus::Params&): Assertion `foundInUnlinked' failed.
Unexpected exception caught during testing: ConnectionRefusedError(111, 'Connection refused')
  File "/home/sickpig/code/BU/bu-src/qa/rpc-tests/test_framework/test_framework.py", line 184, in main
    self.run_test()
  File "/home/sickpig/code/BU/bu-src/qa/rpc-tests/pruning.py", line 333, in run_test
    (self.forkheight,self.forkhash) = self.reorg_test() #(1033, )
  File "/home/sickpig/code/BU/bu-src/qa/rpc-tests/pruning.py", line 192, in reorg_test
    sync_blocks(self.nodes[0:3])
  File "/home/sickpig/code/BU/bu-src/qa/rpc-tests/test_framework/util.py", line 158, in sync_blocks
    counts = [ x.getblockcount() for x in rpc_connections ]
  File "/home/sickpig/code/BU/bu-src/qa/rpc-tests/test_framework/util.py", line 158, in <listcomp>
    counts = [ x.getblockcount() for x in rpc_connections ]
  File "/home/sickpig/code/BU/bu-src/qa/rpc-tests/test_framework/coverage.py", line 49, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)

nodes_debug_log_dev.tar.gz
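
For context, the reorg step that is running when the assertion fires (reorg_test in pruning.py, per the traceback) roughly matches the log lines above; the ConnectionRefusedError afterwards is just fallout, since the assertion aborts bitcoind and the next RPC call finds nobody listening. A hedged sketch of the flow, reconstructed from the log rather than copied from pruning.py (invalidateblock/generate are standard bitcoind RPCs; connect_nodes_bi/sync_blocks are the usual test_framework.util helpers):

# Rough sketch of the reorg_test flow seen in the log; not the actual pruning.py code.
def reorg_test(self):
    height = self.nodes[1].getblockcount()       # "Current block height: 1320"
    invalidheight = height - 287                 # fork far enough back that surviving needs 288 blocks of undo data
    badhash = self.nodes[1].getblockhash(invalidheight)
    self.nodes[1].invalidateblock(badhash)       # "Invalidating block at height: 1033 ..."
    self.nodes[1].generate(300)                  # "Generating new longer chain of 300 more blocks"
    connect_nodes_bi(self.nodes, 0, 1)           # "Reconnect nodes"
    connect_nodes_bi(self.nodes, 1, 2)
    sync_blocks(self.nodes[0:3])                 # hangs or errors out here once a node asserts
    return (invalidheight, badhash)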

ftrader commented Apr 26, 2017

A rollback to 6a06bda achieved a test PASS for sickpig (from Slack chat feedback).
I tried the same on my Ubuntu 16.10 VM, but it wasn't successful on the first run, nor on a repeat (test ongoing for Debian 7).
Here's the Ubuntu failure log of the first run (timeout):
20170426_pruning_fail_on_bu_6a06bda_ubuntu16_10.txt

The second failed run was the same failure, but at heights [1344, 1344, 1344].

sickpig commented Apr 27, 2017

@ftrader: tested again with the repo rolled back to 6a06bda and it passes:

[1551, 1551, 1551]
[1552, 1551, 1551]
[1552, 1552, 1552]
Usage should be below target: 447.3477268218994
Test that we can rerequest a block we previously pruned if needed for a reorg
Will need to redownload block 1033
Rewind node 0 to prev main chain to mine longer chain to trigger redownload. Blocks needed: 233



Verify node 2 reorged back to the main chain, some blocks of which it had to redownload
Done
Stopping nodes
Cleaning up
Tests successful
Duration: 1895 s


Test                                                Status    Time (s)
----------------------------------------------------------------------
pruning.py                                          PASS         1895
----------------------------------------------------------------------
                                              Total time (s):    1895

1 test(s) passed / 0 test(s) failed / 1 test(s) executed
0 test(s) disabled / 0 test(s) skipped due to platform

sickpig commented Apr 27, 2017

@ptschip @ftrader
just tested dev + #479 + #482 and got another failure:

[1320, 1295, 1306]
[1320, 1310, 1320]
[1320, 1320, 1320]
Usage can be over target because of high stale rate: 638.6925411224365
Check that we can survive a 288 block reorg still
Current block height: 1320
Invalidating block at height: 1033 492748de4c1b2b3a147a09bae348715dc34cb83284227290a18fcf9562101f44
New best height 1032
Generating new longer chain of 300 more blocks
Reconnect nodes
[1320, 1332, 1320]
[1320, 1332, 1320]
[1320, 1332, 1032]
bitcoind: main.cpp:4900: void CheckBlockIndex(const Consensus::Params&): Assertion `foundInUnlinked' failed.
Unexpected exception caught during testing: ConnectionRefusedError(111, 'Connection refused')
  File "/home/sickpig/Bitcoin/BU/bu-src/qa/rpc-tests/test_framework/test_framework.py", line 184, in main
    self.run_test()
  File "/home/sickpig/Bitcoin/BU/bu-src/qa/rpc-tests/pruning.py", line 333, in run_test
    (self.forkheight,self.forkhash) = self.reorg_test() #(1033, )
  File "/home/sickpig/Bitcoin/BU/bu-src/qa/rpc-tests/pruning.py", line 192, in reorg_test
    sync_blocks(self.nodes[0:3])
  File "/home/sickpig/Bitcoin/BU/bu-src/qa/rpc-tests/test_framework/util.py", line 158, in sync_blocks
    counts = [ x.getblockcount() for x in rpc_connections ]
  File "/home/sickpig/Bitcoin/BU/bu-src/qa/rpc-tests/test_framework/util.py", line 158, in <listcomp>
    counts = [ x.getblockcount() for x in rpc_connections ]
  File "/home/sickpig/Bitcoin/BU/bu-src/qa/rpc-tests/test_framework/coverage.py", line 49, in __call__
...
...
  File "/usr/lib/python3.5/socket.py", line 711, in create_connection
    raise err
  File "/usr/lib/python3.5/socket.py", line 702, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
Duration: 1729 s


Test                                                Status    Time (s)
----------------------------------------------------------------------
pruning.py                                          FAILED       1729
----------------------------------------------------------------------
                                              Total time (s):    1729

0 test(s) passed / 1 test(s) failed / 1 test(s) executed
0 test(s) disabled / 0 test(s) skipped due to platform

These are the nodes' debug.log files:
node_logs_dev.tar.gz

ftrader commented Apr 27, 2017

A third run of 6a06bda on the Ubuntu VM failed as well, for the same timeout reason as before.
A run of 6a06bda on Debian 7 also failed with the same timeout error.

Haven't tried with any PR patches yet.

ptschip commented Apr 27, 2017 via email

ftrader commented Apr 27, 2017

It would be nice if we could have more people testing this on VMs, specifically commit 6a06bda, which passed here for @sickpig.
It could of course be something to do with my specific virtualization environment, but I don't see how that could easily generalize to the various other VM environments.

ftrader commented Apr 28, 2017

VM positive result: Test passed for commit 6a06bda on a cloud server running Debian 8 on QEMU.
Storage was SSD (plenty of free disk space); the machine had 2 GB RAM and one vCPU (Xeon E5-2697 v3 @ 2.60GHz) rated at 5199.99 bogomips.

Total time (s): 5890

ftrader commented Apr 28, 2017

Also passed on the aforementioned Debian 8 QEMU box with release branch 1.0.1.4 (9d36bec), with the #482 commit a145250 merged in.

Total time (s): 5868

ftrader commented May 12, 2017

I was able to produce a block sync timeout failure on the Core 0.14 branch too.

Raised issue 10393 on Bitcoin Core.

sickpig commented Aug 14, 2017

Just an update on this one: I'm still able to reproduce the problem on current dev.
