Slow initial synchronization #235

Closed
adamkrellenstein opened this issue Aug 14, 2014 · 28 comments

@adamkrellenstein
Member

@mcelrath: "The counterparty database is exceedingly slow to synchronize with the bitcoin database. It is taking about 35s per block, and given where it is in the blockchain, this will take 12 days to synchronize. It appears this is being caused by bitcoind doing I/O, not counterpartyd doing processing (my CPU is only about 10% loaded). Could counterparty be optimized to not hammer bitcoind so hard? Could bitcoind be optimized to not hammer the disk so hard?" (https://github.com/CounterpartyXCP/Counterparty/issues/5)

See also #128.

@adamkrellenstein
Member Author

We haven't done much profiling here, so I'm not sure of where the bottlenecks are.

It's possible that parallelising the requests may help (see #128).

Or we could use a library to read the block files? (#39)

@ouziel-slama
Contributor

Some months ago I profiled block following with cProfile: more than 90% of the time was spent opening the JSON-RPC connection. So yes, using a library to read the block files (#39) and avoiding JSON-RPC for the initial sync seems to be the best solution.
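
For anyone wanting to repeat this kind of measurement, here is a minimal sketch of profiling a block-following loop with cProfile. The functions `fake_rpc_call`, `parse_block` and `follow_blocks` are hypothetical stand-ins, not counterpartyd code, and the RPC round trip is simulated with a sleep.

```python
import cProfile
import io
import pstats
import time


def fake_rpc_call():
    """Stand-in for a JSON-RPC round trip (hypothetical; just sleeps)."""
    time.sleep(0.01)


def parse_block():
    """Stand-in for the local processing done per block (hypothetical)."""
    return sum(i * i for i in range(1000))


def follow_blocks(n_blocks):
    """Toy loop: one simulated RPC call plus some parsing per block."""
    for _ in range(n_blocks):
        fake_rpc_call()
        parse_block()


def profile_report(func, *args, top=5):
    """Run func under cProfile; return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_report(follow_blocks, 5)
print(report)
```

In the printed table, the `cumtime` column shows where the wall time goes. In this toy setup the simulated RPC call should dominate, which is the same pattern described above.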

@mcelrath

I've created an initial implementation using python 3.4's new asyncio and the aiohttp library instead of requests. It appears to be 2-3 times faster at initial synchronization, but this is heavily dependent on disk caching. It needs further testing, especially verifying that I haven't screwed up anything else in the process. I don't see how to run test/test_.py...

Ultimately, the initial sync is I/O bound, and bitcoind is the bottleneck. While a 2x-3x improvement is nice, we're still talking about many days to sync the initial database, which I think is unacceptable. A better solution would be to read the bitcoin blocks directly, bypassing bitcoind.

https://github.com/mcelrath/counterpartyd/tree/asyncio
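
The branch itself is not reproduced here. As a rough sketch of the general idea (keeping several requests in flight instead of one at a time), the following uses modern `async`/`await` syntax rather than the Python 3.4 style, and simulates the RPC round trip with `asyncio.sleep` so that it runs without bitcoind or aiohttp. The names `fetch_block` and `fetch_range` and the limit of 4 concurrent requests are assumptions made for illustration.

```python
import asyncio
import time


async def fetch_block(height, delay=0.05):
    """Simulated RPC round trip; a real client would POST to bitcoind here."""
    await asyncio.sleep(delay)
    return {"height": height}


async def fetch_range(start, stop, max_in_flight=4):
    """Fetch blocks [start, stop) with at most max_in_flight requests pending."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(height):
        async with semaphore:
            return await fetch_block(height)

    # gather preserves argument order, so results come back sorted by height
    return await asyncio.gather(*(bounded(h) for h in range(start, stop)))


started = time.perf_counter()
blocks = asyncio.run(fetch_range(278000, 278008))
elapsed = time.perf_counter() - started
print(len(blocks), "blocks in", round(elapsed, 2), "s")
```

With eight simulated requests and four allowed in flight, the run should take roughly two round trips instead of eight. Against a real bitcoind the gain depends on how quickly the node itself can serve the requests.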

@adamkrellenstein
Member Author

That's a start!

You run the test suite with py.test in the counterpartyd/ directory, with counterpartyd --testnet running in the background.

@adamkrellenstein
Member Author

Also, for those interested, you can bootstrap the database with a file from here: http://bootstrap.counterparty.co/

@adamkrellenstein
Member Author

@mcelrath, would you mind making a pull request for these changes (from develop)?

@mcelrath

I saw that in the README.md but there is no py.test in the repository.

I'd be happy to submit a pull request but I'd like to test it a bit more and make sure I haven't screwed up something else. Also, if you guys are happy depending on python 3.4 and aiohttp (which is an external package... asyncio is part of python 3.4) then there's a strong argument for dumping Tornado in favor of it for the API. I'm not entirely sure the existence of both in my repository hasn't screwed up the API, as they both have event loops.

@robby-d
Contributor

robby-d commented Aug 17, 2014

Thanks a lot for this. For anyone interested in the challenge, reading the block files directly would be great too (and we could use it as the preferred alternative to RPC-based sync if the block files were available).

@adamkrellenstein
Member Author

py.test comes from a separate package: pytest.

@xnova, what do you think about depending on Python 3.4, aiohttp?

@robby-d
Contributor

robby-d commented Aug 18, 2014

If we have to depend on Python 3.4, it may break Windows compatibility. There is an APSW compiled for py3.4, but there is not a pycrypto I can find (at least from voidspace) compiled for py3.4 (only py3.3).

EDIT: this guy talks about how he made one from scratch: http://flintux.wordpress.com/2014/04/30/pycrypto-for-python-3-4-on-windows-7-64bit/

This is an issue short term at least... I'm going to see how using docker under windows works out...if it's really slick I think I may want to move all windows people to just use that instead of the native windows builds...but otherwise we can keep the windows builds

@mcelrath

I got pytest installed, but the vast majority of tests fail even on master and develop. :-( (Only 5/43 pass) poke poke.

I know Python 3.4 is still pretty new; the Windows packages will appear eventually. The problem of syncing with the bitcoin database will only get worse with time, though. This change can be merged later, but it certainly would help adoption.

I'm going to take a stab at reading the bitcoin dat files directly too. I doubt counterparty will be synced to my bitcoind by tomorrow, even with my asyncio changes. ;-)

@adamkrellenstein
Member Author

Run py.test with -x so it just stops after the first failure. What's the error? All of the tests pass for me.

That'd be great if you could make it read the block files directly!

@mcelrath

Arg, I think it's failing because my database isn't caught up with bitcoind. :-(

@ghost

ghost commented Aug 18, 2014

Guys, I created this soon after the problems with the 64-bit version of Python surfaced. It just needs to be made official if there's a desire to support building the rare and unavailable binary Python packages, which may become taxing.
https://wiki.counterparty.co/w/Counterparty_with_64-bit_Python_3.4

@mcelrath

I'm having quite a difficult time finding a way to parse bitcoind's blk?????.dat files in python. bitcoin-abe has its code deeply tied with the SQL database it dumps into, and separating them would be difficult. There don't seem to be a whole lot of other options. Short of writing one from scratch, anyone have any ideas?
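
For a sense of what a from-scratch reader would involve, here is a minimal sketch. It assumes the usual on-disk framing of a 4-byte network magic, a 4-byte little-endian length, then the serialized block, whose first 80 bytes are the header. The names `iter_raw_blocks` and `parse_header` are made up for this example. It does not decode transactions, and it makes no attempt to put blocks in chain order, which the files are not guaranteed to follow.

```python
import hashlib
import io
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # network magic preceding each block
HEADER_SIZE = 80  # fixed size of a serialized block header


def iter_raw_blocks(stream, magic=MAINNET_MAGIC):
    """Yield raw block payloads from a blk?????.dat-style byte stream."""
    while True:
        prefix = stream.read(8)
        if len(prefix) < 8 or prefix[:4] != magic:
            return  # end of data, zero padding, or an unexpected layout
        (size,) = struct.unpack("<I", prefix[4:])
        payload = stream.read(size)
        if len(payload) < size:
            return  # truncated record (the file may still be being written)
        yield payload


def parse_header(raw_block):
    """Decode the 80-byte header and compute the block hash."""
    header = raw_block[:HEADER_SIZE]
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack(
        "<I32s32sIII", header
    )
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return {
        "version": version,
        "prev_hash": prev_hash[::-1].hex(),  # shown in reversed byte order
        "merkle_root": merkle_root[::-1].hex(),
        "timestamp": timestamp,
        "bits": bits,
        "nonce": nonce,
        "hash": digest[::-1].hex(),
    }


# Demo input: the mainnet genesis block header, framed like an on-disk record.
genesis_header = bytes.fromhex(
    "01000000" + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49" + "ffff001d" + "1dac2b7c"
)
record = MAINNET_MAGIC + struct.pack("<I", len(genesis_header)) + genesis_header
for raw in iter_raw_blocks(io.BytesIO(record + record)):
    print(parse_header(raw)["hash"])
```

A real reader would open each `blk?????.dat` file in binary mode and pass the file object as `stream`. Going from raw blocks to what counterpartyd needs would still require transaction parsing and ordering by `prev_hash`, which is where most of the work lies.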

@adamkrellenstein
Member Author

What about python-libbitcoin, or libbitcoin (through Obelisk)?

@mcelrath

python-libbitcoin only talks to bitcoind, it does not read the blk?????.dat files directly. python-obelisk looks promising, thanks!

@robby-d
Contributor

robby-d commented Aug 19, 2014

@mcelrath

BTW your wiki says it takes 7-12 hours to sync with bitcoind. Is that old info? Or do I have a really slow hard drive? (probably both...)

@adamkrellenstein
Member Author

I'm pretty sure that Armory requires two on-disk copies of the blockchain.

@adamkrellenstein
Member Author

@mcelrath, I don't know how long it takes these days, though the time should be roughly proportional to the number of Bitcoin (not Counterparty) transactions since block 278000.

@mcelrath

FWIW, I've printed out the times to load blocks, on my computer it's 5.89s (wall time) per block (average over the last 28k blocks or so). That comes out to 2.6 days using my asyncio code at the current block height. (A lot better than my estimate using the original code, but still...) This should finish by tomorrow morning and I can run the tests and give you guys a pull request.
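
The arithmetic behind an estimate like this is simple enough to sketch. The block count used here is an assumption: the thread does not state the chain height at the time, so 38,000 blocks after block 278000 is an illustrative figure chosen to be consistent with the numbers quoted above.

```python
SECONDS_PER_DAY = 86_400


def sync_days(blocks_remaining, seconds_per_block):
    """Estimated wall-clock days to process the remaining blocks at a fixed rate."""
    return blocks_remaining * seconds_per_block / SECONDS_PER_DAY


def blocks_in(days, seconds_per_block):
    """Inverse: how many blocks fit in the given number of days at that rate."""
    return days * SECONDS_PER_DAY / seconds_per_block


# Assumed for illustration: about 38,000 blocks to process after block 278000.
blocks_remaining = 38_000
estimate = sync_days(blocks_remaining, 5.89)
print(round(estimate, 1), "days")
```

Because the per-block time is an average over blocks of very different sizes, this is only a rough linear extrapolation.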

@ouziel-slama
Contributor

maybe this can help :
https://github.com/gavinandresen/bitcointools/blob/master/dbdump.py

@ouziel-slama
Contributor

see also this:
https://github.com/joric/pyblockchain/blob/master/pyblockchain.py

@robby-d
Contributor

robby-d commented Aug 22, 2014

FYI, the windows build is updated to use python 3.4.1, so we should be good here from a windows standpoint

@mcelrath

I've now changed the usage of requests in api.py and util.py to use aiohttp instead. However I've wrapped asynchronous calls in these to behave synchronously.

What this all means is that if you call functions in bitcoin.py, they may return a generator, which asynchronously waits for results from bitcoind so counterpartyd can go do something else while it's waiting. Bitcoind has by default 4 threads handling RPC calls (you can raise this with its rpcthreads parameter in bitcoin.conf). To get these calls to behave synchronously, wrap these calls in the new function util.aiorun(x, timeout). See lots of examples in api.py and util.py. Long story short, this allows counterpartyd to run bitcoind as fast as it can go, so it isn't limited by one request at a time. This mostly affects blocks.list_tx and blocks.follow which are now asynchronous coroutines.
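
To illustrate the idea of a synchronous wrapper around an asynchronous call, here is a minimal sketch. It is not the `util.aiorun` from the branch: `run_sync` is a hypothetical name, it uses today's `asyncio.run` and `asyncio.wait_for` rather than the Python 3.4 API, and the RPC calls are simulated.

```python
import asyncio


def run_sync(awaitable, timeout=None):
    """Block until the awaitable finishes; raise asyncio.TimeoutError if too slow."""

    async def _with_timeout():
        return await asyncio.wait_for(awaitable, timeout)

    return asyncio.run(_with_timeout())


async def get_block_count():
    """Simulated asynchronous RPC call (hypothetical; returns a made-up value)."""
    await asyncio.sleep(0.01)
    return 123456


async def never_answers():
    """Simulated RPC call that hangs, to exercise the timeout path."""
    await asyncio.sleep(60)


print(run_sync(get_block_count(), timeout=1.0))

try:
    run_sync(never_answers(), timeout=0.05)
    timed_out = False
except asyncio.TimeoutError:
    timed_out = True
print("timed out:", timed_out)
```

One limitation of this approach is that `asyncio.run` cannot be called from code that is already running inside an event loop, so a wrapper like this only suits call sites that are themselves synchronous.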

A separate question is whether this asynchronicity should be extended to counterpartyd's API server (possibly replacing Tornado). I don't think counterpartyd will be performance limited in the foreseeable future, and we can cross that bridge when we come to it.

All tests pass except test_log and test_json_rpc but those don't pass for me on the develop branch either, and the former is related to what's below...

Remaining to do before a Pull Request:

  1. Fix logging spam with --verbose (conflicting event loops, I think)

@weex

weex commented Sep 1, 2014

The -dbcache switch to bitcoind may help, since it increases the amount of memory bitcoind allocates to database caching and so reduces disk I/O. Using it with 128 or 256 should help, but it would be interesting to know the exact speedup on an initial sync from bootstrap.dat, stock vs. 256.

ouziel-slama pushed a commit to ouziel-slama/counterpartyd that referenced this issue Oct 23, 2014
ouziel-slama pushed a commit to ouziel-slama/counterpartyd that referenced this issue Oct 23, 2014
@adamkrellenstein
Member Author

In progress solution: #362
