Python P2P testing framework #5981

sdaftuar · 2015-04-07T18:19:36Z

Motivated by the discussion in #4545, I've been working on building a python testing framework that will allow us to write tests that exercise the p2p code in bitcoind, so that we can perform end-to-end testing and perform comparisons of behavior across bitcoind versions. This code builds on the existing RPC testing functionality, so that tests using both RPC calls and p2p messages can be written.

I've split this pull into 3 commits to try to make this easier to review.

The first commit adds a file called mininode.py, which I grabbed from jgarzik's mini-node branch of the pynode repository, and modified. This file does a few things:

Defines a class, NodeConn, which manages connectivity to a bitcoin node.
Defines a callback prototype, NodeConnCB, which is used for message delivery. The idea is that you can write a class that inherits from NodeConnCB and pass it in to a NodeConn object, and then be notified when events of interest arrive.
Defines all the data structures from bitcoin core that pass over the network, eg CBlock, CTransaction, etc.
Defines all the serialization/deserialization code.

It's possible to write (crude) tests using mininode.py alone, if you want. Also, mininode supports testing outside of regtest (it can communicate on testnet and mainnet as well), so that theoretically allows for a broader category of testing than we currently can do. All the tests I've been working on have been focused on regtest, however.

Also, in the first commit, I provide one example test (maxblocksinflight.py) that shows how you can use mininode. This is largely a proof-of-concept example, but it does test something useful: 0.10 used to fail this test until #5507 was merged.

The second commit adds support for a comparison-tool style testing framework, similar to the comparison-tool framework that we use from bitcoinj in the pull-tester. The code in this commit is designed to add structure to test writing, so that tests are both easier to read and write (compared to the free-form structure of maxblocksinflight.py). I include in this commit one example test written using the framework, which tests the processing of two different types of invalid blocks.

EDIT: I forgot to mention that I use py-leveldb in blockstore.py, a file I introduced in this commit that provides disk-backed storage for blocks. I think I had to manually install that package on my machine, so I wanted to flag this dependency in case others think this would be a problem.

The third commit adds some script processing routines (script.py and its dependency, bignum.py -- the latter could probably be removed with a little work) which I have copied and modified from python-bitcoinlib. With these additional tools, I was able to write a test, script_test.py, which reads all the script tests from the unit test data directory (script_valid.json and script_invalid.json) and inserts them in blocks delivered to two nodes, and for each test case compares whether the nodes agree on whether the block containing the test transaction passes consensus checks. This test is very slow (perhaps 40 minutes to run), so this is largely a proof of concept that demonstrates the kinds of tests we ought to be able to write. (Unfortunately, this test makes use of RPC calls that only have existed since 0.10, so it is not possible to run this test in its current form to compare 0.9-or-older versions).

If this framework looks okay, then I think a future project would be to re-implement the pull-tester's comparison test in this framework.

I realize this is a lot of code, so if there's anything more I can do to aid in review please let me know.

jonasschnelli · 2015-04-07T18:29:13Z

Nice!
conceptual ACK.
Have plans to test this soon.

laanwj · 2015-04-08T06:24:29Z

Nice work!

I forgot to mention that I use py-leveldb in blockstore.py

I'd rather not add any external dependencies for the tests. Do we need any kind of persistent block storage in the tests? If we really do, I'd rather hack something together with python.

sdaftuar · 2015-04-09T14:38:13Z

@laanwj I'd prefer to leave the disk-backed storage there, even though it's probably not needed for the existing tests I've included in this pull, because it will be needed to write longer tests like the comparison test which currently runs in the pull-tester.

I pushed up a commit that replaces leveldb with dbm, which I believe is a standard python package; does that seem like an acceptable option?

laanwj · 2015-04-09T15:42:43Z

Sure, using a python standard package is fine, my gripe was with using an external dependency. Using disk-backed storage is no problem as long as subsequent tests don't interfere with each other.

laanwj · 2015-04-16T09:33:56Z

Maybe add the new tests (whenever they don't take too long) to the pull tester e.g. qa/pull-tester/rpc-tests.sh. For example invalidblockrequest.py seems to finish quickly.

The bipdersig-p2p.py test seems to hang indefinitely here at:

Initializing test directory /tmp/testBxTNIB
MiniNode: Connecting to Bitcoin Node IP # 127.0.0.1:11916
Test 1: PASS [98]

sdaftuar · 2015-04-17T17:39:18Z

@laanwj The bipdersig-p2p.py test was failing because this needed rebase, due to the setgenerate ()/generate() regtest rpc change (now fixed).

I've also shortened maxblocksinflight.py, and added both it and invalidblockrequest.py to the pull tester's rpc tests.

EDIT: I see that this errored out in travis; will investigate.

laanwj · 2015-04-21T08:05:48Z

Ah yes, the generate switcharoo. I think we should add an explicit error when setgenerate() is used on regtest, so that bugs like this are tripped up immediately instead of through vague timeouts.

The travis error seems to occur on 64-bit Linux - no detailed information unfortunately

Running testscript maxblocksinflight.py...

Initializing test directory /tmp/testIyXJqc

No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.

laanwj · 2015-04-28T08:24:27Z

I'd really like to merge this. Do you have trouble solving the travis error? If so, maybe @theuni can help.

sdaftuar · 2015-04-28T16:35:43Z

This needed to be rebased anyway, so I did that and now travis succeeds. I'm not sure what to make of that, since I didn't actually fix anything, and it seems odd to think that this was just a spurious travis failure that happened to only affect a new test I've introduced in this pull.

I'll try rebasing again, since I wanted to fold the 4th and 5th commits in anyway, and when I push we'll get one more data point about how travis does with the new tests.

mininode.py provides a framework for connecting to a bitcoin node over the p2p network. NodeConn is the main object that manages connectivity to a node and provides callbacks; the interface for those callbacks is defined by NodeConnCB. Defined also are all data structures from bitcoin core that pass on the network (CBlock, CTransaction, etc), along with de-/serialization functions. maxblocksinflight.py is an example test using this framework that tests whether a node is limiting the maximum number of in-flight block requests. This also adds support to util.py for specifying the binary to use when starting nodes (for tests that compare the behavior of different bitcoind versions), and adds maxblocksinflight.py to the pull tester.

comptool.py creates a tool for running a test suite on top of the mininode p2p framework. It supports two types of tests: those for which we expect certain behavior (acceptance or rejection of a block or transaction) and those for which we are just comparing that the behavior of 2 or more nodes is the same. blockstore.py defines BlockStore and TxStore, which provide db-backed maps between block/tx hashes and the corresponding block or tx. blocktools.py defines utility functions for creating and manipulating blocks and transactions. invalidblockrequest.py is an example test in the comptool framework, which tests the behavior of a single node when sent two different types of invalid blocks (a block with a duplicated transaction and a block with a bad coinbase value).

script.py is modified from the code in python-bitcoinlib, and provides tools for manipulating and creating CScript objects. bignum.py is a dependency for script.py script_test.py is an example test that uses the script tools for running a test that compares the behavior of two nodes, in a comptool- style test, for each of the test cases in the bitcoin unit test script files, script_valid.json and script_invalid.json. (This test is very slow to run, but is a proof of concept for how we can write tests to compare consensus-critical behavior between different versions of bitcoind.) bipdersig-p2p.py is another example test in the comptool framework, which tests deployment of BIP DERSIG for a single node. It uses the script.py tools for manipulating signatures to be non-DER compliant.

sdaftuar · 2015-04-28T19:56:18Z

Well, on the first attempt, travis failed again with the 10-minute no-data-received timeout, but this time it was not involving the new p2p tests (it died while the java comparison tool was running). I amended the last commit to force a re-run, and it ran successfully, so now I am a little more willing to believe that these may just be random travis failures that I'm encountering, and not a problem in the code introduced here.

theuni · 2015-04-28T20:00:13Z

I came too late.. what was the original error?
Edit: nevermind, posted same time.

Adds printing to the console before/after calls to bitcoin-cli -rpcwait, if the PYTHON_DEBUG environment variable is initialized.

sdaftuar · 2015-04-29T14:57:32Z

After discussing with @theuni on IRC, I added a commit that adds some optional debugging (which I enabled for travis) to calls to bitcoin-cli -rpcwait (inside util.py), and then another addressing the bug he caught with the default binary names.

laanwj · 2015-04-29T16:03:39Z

ACK

2703412 Fix default binary in p2p tests to use environment variable (Suhas Daftuar) 29bff0e Add some travis debugging for python scripts (Suhas Daftuar) d76412b Add script manipulation tools for use in mininode testing framework (Suhas Daftuar) b93974c Add comparison tool test runner, built on mininode (Suhas Daftuar) 6c1d1ba Python p2p testing framework (Suhas Daftuar)

jonasschnelli · 2015-04-30T13:06:24Z

1st: this is impressive work. Nice!
I think the quick merge honors this job.

Nevertheless here are some first post-ACK reports (more detailed report will follow):

Running maxblocksinflight.py does fail on my local machine:

Running testscript maxblocksinflight.py...
Initializing test directory /var/folders/hp/kb9p9q8x4k3_z_ccy588hxrc0000gn/T/testVtWBqX
MiniNode: Connecting to Bitcoin Node IP # 127.0.0.1:11650
Round 0: success (total requests: 8)
Round 1: success (total requests: 16)
Round 2: success (total requests: 16)
Round 3: success (total requests: 16)
Stopping nodes
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/Users/jonasschnelli/Documents/bitcoin/_bitcoin/qa/rpc-tests/mininode.py", line 1237, in run
    asyncore.loop(0.1, True)
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/asyncore.py", line 216, in loop
    poll_fun(timeout, map)
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/asyncore.py", line 145, in poll
    r, w, e = select.select(r, w, e, timeout)
error: (9, 'Bad file descriptor')

same for invalidblockrequest.py

Running testscript invalidblockrequest.py...
Initializing test directory /var/folders/hp/kb9p9q8x4k3_z_ccy588hxrc0000gn/T/testqJOevS
MiniNode: Connecting to Bitcoin Node IP # 127.0.0.1:11506
Test 1: PASS [1]
Test 2: PASS [101]
Assertion failed: Test failed at test 3
  File "/Users/jonasschnelli/Documents/bitcoin/_bitcoin/qa/rpc-tests/test_framework.py", line 119, in main
    self.run_test()
  File "/Users/jonasschnelli/Documents/bitcoin/_bitcoin/qa/rpc-tests/invalidblockrequest.py", line 39, in run_test
    test.run()
  File "/Users/jonasschnelli/Documents/bitcoin/_bitcoin/qa/rpc-tests/comptool.py", line 284, in run
    raise AssertionError("Test failed at test %d" % test_number)
Stopping nodes

jonasschnelli · 2015-04-30T13:31:39Z

qa/rpc-tests/invalidblockrequest.py

+    ''' Can either run this test as 1 node with expected answers, or two and compare them. 
+        Change the "outcome" variable from each TestInstance object to only do the comparison. '''
+    def __init__(self):
+        self.num_nodes = 1


Is it somehow possible to set this to 2 if --refbinary is set?

sdaftuar · 2015-04-30T13:38:44Z

@jonasschnelli Thanks, and thanks for running the tests too. Looks to me like the first error message is happening in how the test cleans up, that probably just needs to be made more robust; I'll take a look. The second error message is more concerning though, as the substantive part of the test is failing.

Could you run invalidblockrequest.py --tracerpc (which will produce a lot of debug output, basically every p2p message sent or received, along with the rpc calls) and upload it somewhere for me to analyze? That should help me debug what's going on here.

jonasschnelli · 2015-04-30T14:25:21Z

qa/rpc-tests/mininode.py

+
+class NetworkThread(Thread):
+    def run(self):
+        asyncore.loop(0.1, True)


After discussion this on IRC it turned out, that OSes without polling support (OSX) have problems with timeouts below 1 because they always use select.select().
Should be changed to asyncore.loop(1, True)

jonasschnelli · 2015-04-30T14:32:54Z

Post merge ACK

sdaftuar · 2015-04-30T15:19:50Z

@jonasschnelli FYI - I did some more digging, and I think I've identified a race condition in the invalidblockrequest.py test (which you triggered the first time you ran, and got the assertion failure at test 3).

I'll PR a cleanup of these issues shortly.

laanwj added the Tests label Apr 8, 2015

sdaftuar force-pushed the python-testing-framework-4 branch from 6070e7b to 9fbbdb9 Compare April 11, 2015 09:58

sdaftuar force-pushed the python-testing-framework-4 branch from 9fbbdb9 to cde34ff Compare April 17, 2015 16:58

sdaftuar force-pushed the python-testing-framework-4 branch 2 times, most recently from b7f3812 to 96c34c1 Compare April 28, 2015 16:07

sdaftuar added 2 commits April 28, 2015 12:38

sdaftuar force-pushed the python-testing-framework-4 branch from 96c34c1 to 17f463d Compare April 28, 2015 18:43

sdaftuar force-pushed the python-testing-framework-4 branch from 17f463d to d76412b Compare April 28, 2015 19:13

sdaftuar added 2 commits April 28, 2015 16:51

Add some travis debugging for python scripts

29bff0e

Adds printing to the console before/after calls to bitcoin-cli -rpcwait, if the PYTHON_DEBUG environment variable is initialized.

Fix default binary in p2p tests to use environment variable

2703412

laanwj merged commit 2703412 into bitcoin:master Apr 30, 2015

jonasschnelli reviewed Apr 30, 2015
View reviewed changes

This was referenced Apr 30, 2015

P2P testing cleanups #6089

Merged

Python P2P testing bugfixes: rework locking to eliminate race conditions and fix comptool.py #6094

Merged

petertodd mentioned this pull request May 3, 2015

Mempool only CHECKLOCKTIMEVERIFY (BIP65) verification #5496

Closed

jonasschnelli mentioned this pull request May 8, 2015

[QA] restructure rpc tests directory #6097

Merged

sdaftuar mentioned this pull request May 10, 2015

Build tool failures due to use of un-maintained script #6119

Closed

sdaftuar mentioned this pull request Nov 16, 2015

Remove unmaintained example test script_test.py #7029

Merged

laanwj mentioned this pull request Dec 8, 2015

Comparison tool successor? #4545

Closed

jnewbery mentioned this pull request Jun 15, 2017

Retire the comparison test framework #10603

Closed

8 tasks

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python P2P testing framework #5981

Python P2P testing framework #5981

sdaftuar commented Apr 7, 2015

jonasschnelli commented Apr 7, 2015

laanwj commented Apr 8, 2015

sdaftuar commented Apr 9, 2015

laanwj commented Apr 9, 2015

laanwj commented Apr 16, 2015

sdaftuar commented Apr 17, 2015

laanwj commented Apr 21, 2015

laanwj commented Apr 28, 2015

sdaftuar commented Apr 28, 2015

sdaftuar commented Apr 28, 2015

theuni commented Apr 28, 2015

sdaftuar commented Apr 29, 2015

laanwj commented Apr 29, 2015

jonasschnelli commented Apr 30, 2015

jonasschnelli Apr 30, 2015

sdaftuar commented Apr 30, 2015

jonasschnelli Apr 30, 2015

jonasschnelli commented Apr 30, 2015

sdaftuar commented Apr 30, 2015

Python P2P testing framework #5981

Python P2P testing framework #5981

Conversation

sdaftuar commented Apr 7, 2015

jonasschnelli commented Apr 7, 2015

laanwj commented Apr 8, 2015

sdaftuar commented Apr 9, 2015

laanwj commented Apr 9, 2015

laanwj commented Apr 16, 2015

sdaftuar commented Apr 17, 2015

laanwj commented Apr 21, 2015

laanwj commented Apr 28, 2015

sdaftuar commented Apr 28, 2015

sdaftuar commented Apr 28, 2015

theuni commented Apr 28, 2015

sdaftuar commented Apr 29, 2015

laanwj commented Apr 29, 2015

jonasschnelli commented Apr 30, 2015

jonasschnelli Apr 30, 2015

Choose a reason for hiding this comment

sdaftuar commented Apr 30, 2015

jonasschnelli Apr 30, 2015

Choose a reason for hiding this comment

jonasschnelli commented Apr 30, 2015

sdaftuar commented Apr 30, 2015