
libevent-based http server #5677

Merged
merged 16 commits on Sep 4, 2015

@laanwj (Member) commented Jan 18, 2015

  • Replace usage of boost::asio with libevent2. boost::asio is not part of C++11, so unlike other parts of boost there is no forwards-compatibility reason to stick with it. Together with #4738 (convert json_spirit to UniValue), this rids Bitcoin Core of the worst offenders with regard to compile-time slowness.
  • Replace the spit-and-duct-tape HTTP server with evhttp. Front-end HTTP handling is done by libevent; a work queue (with configurable depth and parallelism) is used to handle application requests.
  • Wrap the HTTP request in a C++ class; this makes the application code mostly HTTP-server-neutral.
  • Refactor RPC to move all HTTP-specific code to a separate file. In theory this allows building without the HTTP server but with another RPC backend, e.g. Qt's debug console (currently not implemented) or future RPC mechanisms people may want to use.
  • HTTP dispatch mechanism: services (e.g., RPC, REST) register which URL paths they want to handle.
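The request-wrapping and dispatch points above can be illustrated with a rough, standard-library-only C++ sketch: a server-neutral request interface plus a prefix-based dispatch table that services such as RPC and REST could register with. All names here (Request, StringRequest, Dispatcher, Register) are hypothetical and not taken from this pull request; a real backend would implement Request on top of evhttp's request object.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical server-neutral view of a request; an evhttp-backed class
// (or a debug-console backend) would implement this interface.
class Request {
public:
    virtual ~Request() = default;
    virtual std::string GetURI() const = 0;
    virtual void WriteReply(int status, const std::string& body) = 0;
};

// Minimal in-memory implementation, e.g. for tests or non-HTTP backends.
class StringRequest : public Request {
public:
    explicit StringRequest(std::string uri) : uri_(std::move(uri)) {}
    std::string GetURI() const override { return uri_; }
    void WriteReply(int s, const std::string& b) override { status = s; body = b; }
    int status = 0;
    std::string body;
private:
    std::string uri_;
};

// A handler receives the request and the part of the path after the prefix.
using Handler = std::function<void(Request&, const std::string&)>;

class Dispatcher {
public:
    // exact_match: the whole path must equal prefix (e.g. "/" for JSON-RPC);
    // otherwise any path starting with prefix matches (e.g. "/rest/").
    void Register(std::string prefix, bool exact_match, Handler handler) {
        entries_.push_back({std::move(prefix), exact_match, std::move(handler)});
    }

    // Runs the first matching handler; replies 404 if nothing matches.
    bool Dispatch(Request& req) const {
        const std::string uri = req.GetURI();
        for (const Entry& e : entries_) {
            const bool match = e.exact_match
                ? uri == e.prefix
                : uri.compare(0, e.prefix.size(), e.prefix) == 0;
            if (match) {
                e.handler(req, uri.substr(e.prefix.size()));
                return true;
            }
        }
        req.WriteReply(404, "");
        return false;
    }

private:
    struct Entry {
        std::string prefix;
        bool exact_match;
        Handler handler;
    };
    std::vector<Entry> entries_;
};
```

Because handlers only see the Request interface, the same RPC or REST code could in principle be driven by a non-HTTP frontend; query-string handling is deliberately left out of this sketch.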

By using a proven, high-performance asynchronous networking library (also used by Tor) and HTTP server, problems such as #5674, #5655, #344 should be avoided.

What works? bitcoind, bitcoin-cli, bitcoin-qt. Unit tests and RPC/REST tests pass. The aim for now is everything but SSL support.

Configuration options:

  • -rpcthreads: repurposed as "number of work handler threads". Still defaults to 4.
  • -rpcworkqueue: maximum depth of the work queue. When this is reached, new requests will return a 500 Internal Server Error.
  • -rpctimeout: inactivity time, in seconds, after which to disconnect a client.
  • -debug=http: low-level http activity logging

(Due to the separation of RPC and HTTP server, renaming these options may make sense, but I've left that out for backwards compatibility.)
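The interplay of -rpcthreads and -rpcworkqueue can be illustrated with a small, standard-library-only C++ sketch. It is not the code in this pull request; the names BoundedWorkQueue and StartWorkers are hypothetical. Enqueue refuses new items once max_depth items are already waiting, which is the point at which the server would answer with a 500.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: a bounded queue of work items serviced by a fixed
// pool of worker threads.
class BoundedWorkQueue {
public:
    explicit BoundedWorkQueue(std::size_t max_depth) : max_depth_(max_depth) {
        assert(max_depth > 0);
    }

    // Called from the network thread. Returns false when max_depth items are
    // already waiting; the caller would then reply "500 Internal Server Error".
    bool Enqueue(std::function<void()> item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.size() >= max_depth_) return false;
        queue_.push_back(std::move(item));
        cond_.notify_one();
        return true;
    }

    // Worker loop: waits for items and runs each one outside the lock.
    // After Interrupt(), remaining items are drained, then the loop exits.
    void Run() {
        for (;;) {
            std::function<void()> item;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cond_.wait(lock, [this] { return !running_ || !queue_.empty(); });
                if (queue_.empty()) return;  // only reachable once interrupted
                item = std::move(queue_.front());
                queue_.pop_front();
            }
            item();
        }
    }

    // Asks all workers to exit once the queue is empty.
    void Interrupt() {
        std::lock_guard<std::mutex> lock(mutex_);
        running_ = false;
        cond_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::deque<std::function<void()>> queue_;
    const std::size_t max_depth_;
    bool running_ = true;
};

// Starts n worker threads (the role played by -rpcthreads).
std::vector<std::thread> StartWorkers(BoundedWorkQueue& queue, int n) {
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&queue] { queue.Run(); });
    }
    return workers;
}
```

In this sketch the libevent thread would call Enqueue for each application request and reply with a 500 when it returns false. Draining queued items on Interrupt() is a simplifying choice here; the actual implementation need not behave the same way on shutdown.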

TODO:

  • Build system (currently hardcodes libraries, so this will definitely not pass Travis) (thanks @theuni)
  • REST and RPC register their own request handlers respectively
  • Qt debug console must register an RPCTimerInterface (to make timeouts in the debug console work with -server=0)
  • Interrupt/Shutdown flow needs to be cleaned up
  • The message "[warn] event_active: event has no event_base set." sometimes appears on the console. It seems to be harmless, but it is weird (see @ajweiss's comments)

@theuni (Member) commented Jan 20, 2015

Very nice! Before looking over the work itself, I wanted to be sure that libevent was viable for all of our build targets.

See here for the build-system work. This should be enough to get Travis passing, I'd think:
https://github.com/theuni/bitcoin/commits/5677

@laanwj (Member, Author) commented Jan 20, 2015

@cfields Will pull that in, thanks a lot!

@luke-jr (Member) commented Jan 20, 2015

Hm, nothing special needed for longpolling?

@laanwj (Member, Author) commented Jan 20, 2015

@luke-jr I don't think so. The current implementation should work. Of course it would be better to release the worker thread while longpolling and turn "new block" into an event trigger, but I leave that as a challenge for later.
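One way to picture the "event trigger" idea, as a hypothetical C++ sketch: instead of a worker thread blocking until a new block arrives, the longpoll request parks a reply callback, and the new-block notification completes all parked requests. The class and method names are invented for illustration; nothing here is in the pull request.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: longpoll requests park a reply callback instead of
// blocking a worker thread; the "new block" event completes them all.
class LongPollWaiters {
public:
    using Reply = std::function<void(const std::string& best_block_hash)>;

    // Called by the request handler; the worker thread is free right after.
    void Park(Reply reply) {
        std::lock_guard<std::mutex> lock(mutex_);
        waiters_.push_back(std::move(reply));
    }

    // Called from the new-block notification. Completes every parked request
    // and returns how many were completed.
    std::size_t NotifyNewBlock(const std::string& hash) {
        std::vector<Reply> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready.swap(waiters_);
        }
        for (const Reply& reply : ready) reply(hash);  // invoked outside the lock
        return ready.size();
    }

    std::size_t Pending() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return waiters_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<Reply> waiters_;
};
```

A real longpoll would also check that the new tip differs from the one the client already knows, and would have to hand the reply back to the HTTP event loop safely; both are left out here.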

@laanwj force-pushed the 2015_01_evhttpd branch 2 times, most recently from c527672 to 8d45fe4 on January 20, 2015 05:53
@laanwj (Member, Author) commented Jan 20, 2015

The failure in the "32-bit + dash" build is strange:

FAIL: qt/test/test_bitcoin-qt

I'm not sure how this can be affected at all (passes fine here), but I'll check.

On Win32/64 it still tries to link against libevent_pthreads. IIRC there is no separate thread library on Windows; evthread_use_windows_threads is part of the core library there:

/usr/bin/x86_64-w64-mingw32-ld: cannot find -levent_pthreads

@theuni (Member) commented Jan 20, 2015

Blah, sorry, missed that one.

@laanwj (Member, Author) commented Jan 20, 2015

It's easy to miss those sneaky qt unit tests. Windows passes now!

That leaves the 32-bit + dash case, which is not an intermittent issue

FAIL! : PaymentServerTests::paymentServerTests() Compared values are not the same
Actual (merchant):
Expected (QString("testmerchant.org")): testmerchant.org
Loc: [qt/test/paymentservertests.cpp(84)]

... no clue how this happens yet. The Qt tests don't use the RPC mechanism. My gut feeling is that it is some interaction with OpenSSL, which, apart from verification, is now an indirect dependency through Qt? Not sure; why it only happens in this test is a mystery to me.

@theuni (Member) commented Jan 20, 2015

@laanwj Yes, it has some interaction with qt:

  • ./configure --with-gui=qt4: fine.
  • ./configure --with-gui=qt5: fine.
  • make -C depends; ./configure --prefix=$(pwd)/depends/x86_64-unknown-linux-gnu: busted.
  • make -C depends USE_LINUX_STATIC_QT5=1; ./configure --prefix=$(pwd)/depends/x86_64-unknown-linux-gnu: fine.

@jonasschnelli (Contributor)

Tested the Gitian build.
Binaries to test: https://builds.jonasschnelli.ch/pulls/5677/

@jtimon (Contributor) commented Jan 21, 2015

Concept ACK

@laanwj force-pushed the 2015_01_evhttpd branch 2 times, most recently from c532cb5 to 6f1259a on January 23, 2015 09:07
@laanwj (Member, Author) commented Jan 23, 2015

Now that the code is stable it is time for some benchmarking.
I found a nice scriptable framework for HTTP benchmarking, wrk. Some results.
These benchmarks were taken at the default settings (4 worker threads, 16 depth work queue).

GET request to invalid URL

These are handled by evhttp itself, so this is the baseline.

$  ./wrk -t12 -c15 -d10s http://127.0.0.1:18332/inv
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   375.53us   66.86us   6.31ms   93.03%
    Req/Sec     2.67k   121.28     3.11k    80.98%
  303293 requests in 10.00s, 36.73MB read
  Non-2xx or 3xx responses:
    404: 303293
Requests/sec:  30343.08
Transfer/sec:      3.68MB

As expected, we get 404s.

GET request to /

These are dispatched to a worker thread. Some more latency is expected. In the worker thread these error out early, as GET is not a valid method for JSON RPC.

$ ./wrk -t12 -c15 -d10s http://127.0.0.1:18332/
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   472.96us   44.12us   1.28ms   93.31%
    Req/Sec     2.14k   127.02     2.33k    82.90%
  245025 requests in 10.00s, 41.59MB read
  Non-2xx or 3xx responses:
    405: 245025
Requests/sec:  24514.30
Transfer/sec:      4.16MB

As expected: lots of 405 (invalid method for URL) results.

RPC getgenerate requests

Post getgenerate requests. This is an extremely cheap RPC call.
Script:

wrk.method = "POST"
wrk.body   = "{\"method\":\"getgenerate\",\"params\":[],\"id\":1}\n"
wrk.headers["Content-Type"] = "application/json"
wrk.headers["Authorization"] = "Basic XXX"
$ ./wrk -t12 -c15 -d10s -s getgenerate.lua http://127.0.0.1:18332/
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   571.28us   59.77us   2.09ms   91.89%
    Req/Sec     1.82k   110.56     2.00k    80.32%
  204481 requests in 9.99s, 28.28MB read
  Non-2xx or 3xx responses:
Requests/sec:  20463.77
Transfer/sec:      2.83MB

All responses successful.
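The "Basic XXX" value in these scripts is a redacted credential. To reproduce the runs, the header value for HTTP Basic authentication (RFC 7617) is "Basic " followed by the Base64 encoding of user:password. A small C++ sketch for producing it; the function names are hypothetical and the credentials in the check are placeholders.

```cpp
#include <cstddef>
#include <string>

// Standard Base64 (RFC 4648 alphabet, with '=' padding).
std::string Base64Encode(const std::string& in) {
    static const char kTable[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters.
    for (; i + 2 < in.size(); i += 3) {
        unsigned v = (static_cast<unsigned char>(in[i]) << 16) |
                     (static_cast<unsigned char>(in[i + 1]) << 8) |
                     static_cast<unsigned char>(in[i + 2]);
        out += kTable[(v >> 18) & 0x3F];
        out += kTable[(v >> 12) & 0x3F];
        out += kTable[(v >> 6) & 0x3F];
        out += kTable[v & 0x3F];
    }
    // A trailing 1 or 2 bytes are padded with '='.
    const std::size_t rest = in.size() - i;
    if (rest == 1) {
        unsigned v = static_cast<unsigned char>(in[i]) << 16;
        out += kTable[(v >> 18) & 0x3F];
        out += kTable[(v >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        unsigned v = (static_cast<unsigned char>(in[i]) << 16) |
                     (static_cast<unsigned char>(in[i + 1]) << 8);
        out += kTable[(v >> 18) & 0x3F];
        out += kTable[(v >> 12) & 0x3F];
        out += kTable[(v >> 6) & 0x3F];
        out += '=';
    }
    return out;
}

// Value for the Authorization header, e.g. in a wrk script.
std::string BasicAuthHeader(const std::string& user, const std::string& password) {
    return "Basic " + Base64Encode(user + ":" + password);
}
```

For example, BasicAuthHeader("user", "pass") yields "Basic dXNlcjpwYXNz", which would replace the placeholder in the script.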

RPC getinfo requests

Post getinfo requests. This is a more expensive RPC call.

wrk.method = "POST"
wrk.body   = "{\"method\":\"getinfo\",\"params\":[],\"id\":1}\n"
wrk.headers["Content-Type"] = "application/json"
wrk.headers["Authorization"] = "Basic XXX"
$ ./wrk -t12 -c15 -d10s -s getinfo.lua http://127.0.0.1:18332/
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.69ms  572.87us  15.39ms   95.21%
    Req/Sec   626.46     81.57     1.00k    84.26%
  71245 requests in 9.99s, 37.17MB read
  Non-2xx or 3xx responses:
Requests/sec:   7128.48
Transfer/sec:      3.72MB

Although the performance goes down, no errors happen. The maximum queue depth is never reached.

RPC requests w/ invalid authentication

Post getinfo requests with invalid authentication. This will trigger a 250ms delay, and thus we can trigger worker queue-full conditions with enough concurrent connections.

wrk.method = "POST"
wrk.body   = "{\"method\":\"getinfo\",\"params\":[],\"id\":1}\n"
wrk.headers["Content-Type"] = "application/json"
wrk.headers["Authorization"] = "Basic YYY"
$ ./wrk -t12 -c15 -d10s -s getinfo_unauth.lua http://127.0.0.1:18332/
Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   751.03ms  323.38us 752.15ms   81.82%
    Req/Sec     1.05      0.65     2.00     58.33%
  156 requests in 10.00s, 19.80KB read
  Non-2xx or 3xx responses:
    401: 156
Requests/sec:     15.60
Transfer/sec:      1.98KB

Increasing the load further, we can exceed the work queue depth:

$ ./wrk -t12 -c45 -d10s -s getinfo_unauth.lua http://127.0.0.1:18332/
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   204.35ms  462.08ms   1.25s    83.72%
    Req/Sec     2.42k     1.53k    6.00k    75.64%
  277072 requests in 10.00s, 43.59MB read
  Non-2xx or 3xx responses:
    401: 156
    500: 276916
Requests/sec:  27706.71
Transfer/sec:      4.36MB

Looks good. Nothing unexpected: evhttp is clearly not the bottleneck, and the work queue works as expected. Will try this with the old asio-based HTTP server shortly.
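A back-of-the-envelope check of why only the -c45 run hit the limit, under two assumptions not stated above: the work-queue depth counts only requests waiting for a worker (not those being executed), and wrk keeps at most one request outstanding per connection. With the default 4 worker threads and a queue depth of 16:

$$
C_{\max} = N_{\text{threads}} + D_{\text{queue}} = 4 + 16 = 20
$$

$$
15 \le C_{\max} \;\Rightarrow\; \text{no queue-full errors with } \texttt{-c15}, \qquad 45 > C_{\max} \;\Rightarrow\; \text{500 responses with } \texttt{-c45}
$$

Under these assumptions, 15 connections fit within the available slots and see only 401s, while 45 connections exceed them and most requests are rejected with 500.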

@laanwj (Member, Author) commented Jan 23, 2015

Old http server

Same steps as above, repeated with the old server as of commit 944c256.

GET request to invalid URL

$  ./wrk -t12 -c15 -d10s http://127.0.0.1:18332/inv
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   502.17us   94.60us   2.65ms   74.77%
    Req/Sec     1.89k   119.76     2.33k    83.38%
  214864 requests in 9.99s, 37.50MB read
  Responses:
    404: 214864
Requests/sec:  21502.28
Transfer/sec:      3.75MB

GET request to /

$ ./wrk -t12 -c15 -d10s http://127.0.0.1:18332/
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   502.58us   92.30us   2.07ms   73.39%
    Req/Sec     1.89k   118.97     2.22k    83.36%
  214269 requests in 9.99s, 103.40MB read
  Responses:
    401: 214269
Requests/sec:  21447.93
Transfer/sec:     10.35MB

RPC getgenerate requests

$ ./wrk -t12 -c15 -d10s -s getgenerate.lua http://127.0.0.1:18332/
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    93.68us   24.84us   1.27ms   89.23%
    Req/Sec    10.06k   616.30    11.78k    64.49%
  380211 requests in 10.00s, 78.32MB read
  Socket errors: connect 0, read 0, write 0, timeout 39
  Responses:
    200: 380211
Requests/sec:  38006.24
Transfer/sec:      7.83MB

RPC getinfo requests

$ ./wrk -t12 -c15 -d10s -s getinfo.lua http://127.0.0.1:18332/
Running 10s test @ http://127.0.0.1:18332/
  12 threads and 15 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   544.15us  187.74us   4.03ms   79.81%
    Req/Sec     1.89k   188.50     2.90k    63.09%
  71480 requests in 10.00s, 42.13MB read
  Socket errors: connect 0, read 0, write 0, timeout 39
  Responses:
    200: 71480
Requests/sec:   7144.83
Transfer/sec:      4.21MB

RPC requests w/ invalid authentication

$ ./wrk -t12 -c15 -d10s -s getinfo_unauth.lua http://127.0.0.1:18332/
  12 threads and 15 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   751.75ms  161.83us 752.03ms   73.91%
    Req/Sec     0.94      0.66     2.00     56.52%
  156 requests in 10.00s, 77.09KB read
  Responses:
    401: 156
Requests/sec:     15.60
Transfer/sec:      7.71KB
$ ./wrk -t12 -c45 -d10s -s getinfo_unauth.lua http://127.0.0.1:18332/
Running 10s test @ http://127.0.0.1:18332/
  12 threads and 45 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.19s   309.47ms   2.26s    95.24%
    Req/Sec     0.90      1.56     6.00     87.30%
  156 requests in 10.01s, 77.09KB read
  Socket errors: connect 0, read 0, write 0, timeout 24
  Responses:
    401: 156
Requests/sec:     15.59
Transfer/sec:      7.70KB

Summary

Testcase                      Old (r/s) New (r/s)
================================================
GET request to invalid URL    21502     30343
GET request to /              21448     24514
RPC getgenerate requests      38006*    20463
RPC getinfo requests           7144*     7128
RPC requests w/ invalid auth     16*       16

* with timeout errors.
  • The new server wins on plain HTTP requests, even those to /, which are dispatched to the worker threads.
  • The old server is currently much faster with getgenerate requests. I am curious why. Also: how can getgenerate requests be faster than simple GET /'s? I suspect a measurement error that has to do with the timeouts (edit: this is because the old server disconnects after errors, so those don't utilize keepalive).
  • getinfo requests are, as expected, roughly the same speed. Processing overhead dominates.
  • Same for unauthenticated requests. Each request takes 250ms to handle, so the number seen is exactly as expected.
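The unauthenticated numbers follow from the worker count. If each such request occupies one of the 4 default worker threads (-rpcthreads) for about 250 ms, throughput is bounded by

$$
R_{\max} = \frac{N_{\text{threads}}}{t_{\text{delay}}} = \frac{4}{0.25\ \text{s}} = 16\ \text{requests/s}
$$

which matches the roughly 15.6 requests/s measured for both servers (156 requests in about 10 s).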

Take into account that it's not an entirely fair comparison: the new HTTP server can service a large number of connections at the same time, whereas the old server can handle at most four (or -rpcthreads) connections and starves additional connections (because of keep-alive). Multiplexing adds some overhead that is absent in a one-to-one scenario. This is also purely a local benchmark: I/O bandwidth doesn't come into it, and the benchmark tool competes with the server for the same CPU.

@Diapolo commented Jan 23, 2015

Why are we dropping SSL support for RPC?


// Synchronously look up hostname
struct evhttp_connection *evcon = evhttp_connection_base_new(base, NULL, host.c_str(), port); // XXX RAII
if (evcon == NULL)

Just to understand: why are you explicitly using == NULL here, but just !var for base above?

@laanwj (Member, Author):

@Diapolo == NULL and !evcon are equivalent in this case, so it is purely a matter of taste. The pull clearly states [PoC] as in proof-of-concept; please don't report all these minor things, only critical or high-level issues.

@laanwj (Member, Author) commented Jan 23, 2015

@Diapolo Re: this pull: because I don't feel like implementing it.

In the longer term I (and many others) would also argue it is better to drop it:

  • Makes it possible to drop OpenSSL dependency from bitcoind completely, after secp256k1 verification is used. Simplifies the code overall.
  • I've never heard of anyone using SSL with RPC. It may also be better not to know; after all, this invites making the RPC port reachable from the internet or other untrusted networks. The limited configurability of rpcssl almost guarantees this will be an insecure setup.
  • If you really need to use RPC remotely over an untrusted network, it is easy enough to set up stunnel or an SSH tunnel, or even OpenVPN; with the full power of those tools available, you may have a chance of doing so securely.

@jonasschnelli (Contributor)

I agree with @laanwj. SSL support could lead somebody to believe it's "safe". IMO it is currently a bad idea to expose bitcoind RPC to a publicly accessible network. Nevertheless, if someone wants to do this, they could still set up an SSL-enabled Apache reverse proxy in front of bitcoind's RPC.

@gmaxwell (Contributor)

Also, SSL in the RPC massively increases the attack surface we expose (if you also expose it to the outside world), and we've had to push updates previously on account of it, even though we believe it's a feature virtually no one uses. As mentioned, it can be better accomplished via stunnel (or any of several other tools).

@gavinandresen (Contributor)

I agree, it was a mistake to add SSL support to the RPC (mea culpa: I wrote the original version of that code).

@jonasschnelli (Contributor)

concept ACK.
needs rebase.

@jonasschnelli (Contributor)

@paveljanik: just added libevent mentioning for osx in #6635

@jgarzik (Contributor) commented Sep 4, 2015

post merge tested re-ACK

@jtimon jtimon mentioned this pull request Sep 4, 2015
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jan 10, 2016
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jan 10, 2016
@sipa sipa mentioned this pull request Sep 13, 2019
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021