Make HTTP server shutdown more graceful #6719

Merged
merged 3 commits into from Sep 28, 2015

Conversation

Projects
None yet
4 participants
@laanwj
Member

laanwj commented Sep 24, 2015

Shutting down the HTTP server currently breaks off all current requests. This can create a race condition with RPC stop command, where the calling process never receives confirmation.

This change removes the listening sockets on shutdown so that no new requests can come in, but no longer breaks off requests in progress.

Meant to fix #6717.

@dcousens

This comment has been minimized.

Show comment
Hide comment
@dcousens

dcousens Sep 25, 2015

Contributor

but no longer breaks off requests in progress.

Possible attack vector by denying the service the ability to shut down / restart by keeping a connection open?

Contributor

dcousens commented Sep 25, 2015

but no longer breaks off requests in progress.

Possible attack vector by denying the service the ability to shut down / restart by keeping a connection open?

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Sep 25, 2015

Member

After -rpcservertimeout current connections will get booted, unless there is activity. If they do any requests they get a server unavailable error. But yes, if your application software is truly malicious they could keep the server busy by sending e.g. one header at a time, every timeout-1 seconds.

Note that this solution is cleaner in any case, it doesn't rule out queuing a event_base_loopbreak(eventBase);, say, 30 seconds in the future to force a shutdown.

Member

laanwj commented Sep 25, 2015

After -rpcservertimeout current connections will get booted, unless there is activity. If they do any requests they get a server unavailable error. But yes, if your application software is truly malicious they could keep the server busy by sending e.g. one header at a time, every timeout-1 seconds.

Note that this solution is cleaner in any case, it doesn't rule out queuing a event_base_loopbreak(eventBase);, say, 30 seconds in the future to force a shutdown.

@jonasschnelli

This comment has been minimized.

Show comment
Hide comment
@jonasschnelli

jonasschnelli Sep 25, 2015

Member

utACK.

I agree that the attack vectors are out of scope for this PR maybe for the whole http-json-rpc-daemon (it's not hardened enough to expose to potential malicious environments).

Member

jonasschnelli commented Sep 25, 2015

utACK.

I agree that the attack vectors are out of scope for this PR maybe for the whole http-json-rpc-daemon (it's not hardened enough to expose to potential malicious environments).

@dcousens

This comment has been minimized.

Show comment
Hide comment
@dcousens

dcousens Sep 25, 2015

Contributor

utACK, just thought I'd mention it.

Contributor

dcousens commented Sep 25, 2015

utACK, just thought I'd mention it.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Sep 25, 2015

Member

@dcousens It's great that you mentioned it. It reminded me of potential issue of this: what happens if a timer is stilll running, will it block the event queue exit until it triggers?
Will test this scenario by stopping with an unlocked wallet.

Pushed another related fix:

http: Wait for worker threads to exit

Add a WaitExit() call to http's WorkQueue to make StopHTTPServer delete the work queue only when all worker threads stopped.

This fixes a problem that was reproducable by pressing Ctrl-C during AppInit2:

/usr/include/boost/thread/pthread/condition_variable_fwd.hpp:81: boost::condition_variable::~condition_variable(): Assertion `!ret' failed.
/usr/include/boost/thread/pthread/mutex.hpp:108: boost::mutex::~mutex(): Assertion `!posix::pthread_mutex_destroy(&m)' failed.

I was assuming that threadGroup->join_all(); would always have been called when entering the Shutdown(). However this is not the case in bitcoind's AppInit2-non-zero-exit case "was left out intentionally here".
(now that the http module takes care of waiting for its own worker threads, we could even make it stop using threadGroup completely, but I leave that for a future refactor)

Member

laanwj commented Sep 25, 2015

@dcousens It's great that you mentioned it. It reminded me of potential issue of this: what happens if a timer is stilll running, will it block the event queue exit until it triggers?
Will test this scenario by stopping with an unlocked wallet.

Pushed another related fix:

http: Wait for worker threads to exit

Add a WaitExit() call to http's WorkQueue to make StopHTTPServer delete the work queue only when all worker threads stopped.

This fixes a problem that was reproducable by pressing Ctrl-C during AppInit2:

/usr/include/boost/thread/pthread/condition_variable_fwd.hpp:81: boost::condition_variable::~condition_variable(): Assertion `!ret' failed.
/usr/include/boost/thread/pthread/mutex.hpp:108: boost::mutex::~mutex(): Assertion `!posix::pthread_mutex_destroy(&m)' failed.

I was assuming that threadGroup->join_all(); would always have been called when entering the Shutdown(). However this is not the case in bitcoind's AppInit2-non-zero-exit case "was left out intentionally here".
(now that the http module takes care of waiting for its own worker threads, we could even make it stop using threadGroup completely, but I leave that for a future refactor)

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Sep 25, 2015

Member

Pushed another commit to force-exit the event loop after a predefined time after interruption. This addresses @dcousens 's issue, as well as the case that other events are still lingering (such as a wallet unlock timer).

Member

laanwj commented Sep 25, 2015

Pushed another commit to force-exit the event loop after a predefined time after interruption. This addresses @dcousens 's issue, as well as the case that other events are still lingering (such as a wallet unlock timer).

@dcousens

This comment has been minimized.

Show comment
Hide comment
@dcousens

dcousens Sep 25, 2015

Contributor

utACK, well done @laanwj

Contributor

dcousens commented Sep 25, 2015

utACK, well done @laanwj

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Sep 25, 2015

Member

Concept ACK, btw. Definitely an improvement.

Member

theuni commented Sep 25, 2015

Concept ACK, btw. Definitely an improvement.

laanwj added some commits Sep 24, 2015

Make HTTP server shutdown more graceful
Shutting down the HTTP server currently breaks off all current requests.
This can create a race condition with RPC `stop` command, where the calling
process never receives confirmation.

This change removes the listening sockets on shutdown so that no new
requests can come in, but no longer breaks off requests in progress.

Meant to fix #6717.
http: Wait for worker threads to exit
Add a WaitExit() call to http's WorkQueue to make it delete the work
queue only when all worker threads stopped.

This fixes a problem that was reproducable by pressing Ctrl-C during
AppInit2:
```
/usr/include/boost/thread/pthread/condition_variable_fwd.hpp:81: boost::condition_variable::~condition_variable(): Assertion `!ret' failed.
/usr/include/boost/thread/pthread/mutex.hpp:108: boost::mutex::~mutex(): Assertion `!posix::pthread_mutex_destroy(&m)' failed.
```

I was assuming that `threadGroup->join_all();` would always have been
called when entering the Shutdown(). However this is not the case in
bitcoind's AppInit2-non-zero-exit case "was left out intentionally
here".
http: Force-exit event loop after predefined time
This makes sure that the event loop eventually terminates, even if an
event (like an open timeout, or a hanging connection) happens to be
holding it up.
@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Sep 28, 2015

Member

Processed @theuni 's comments. WIll merge as soon as Travis passes.

Member

laanwj commented Sep 28, 2015

Processed @theuni 's comments. WIll merge as soon as Travis passes.

@theuni

This comment has been minimized.

Show comment
Hide comment
@theuni

theuni Sep 28, 2015

Member

Thanks. ut ACK.

Member

theuni commented Sep 28, 2015

Thanks. ut ACK.

@laanwj laanwj merged commit ec908d5 into bitcoin:master Sep 28, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Sep 28, 2015

Merge pull request #6719
ec908d5 http: Force-exit event loop after predefined time (Wladimir J. van der Laan)
de9de2d http: Wait for worker threads to exit (Wladimir J. van der Laan)
5e0c221 Make HTTP server shutdown more graceful (Wladimir J. van der Laan)

laanwj added a commit that referenced this pull request Nov 11, 2015

http: speed up shutdown
Pardon a bit of iteration. This continues/fixes #6719.

`event_base_loopbreak` was not doing what I expected it to.
What I expected was that it sets a timeout, given that no other pending
events it would exit in N seconds. However, what it does was delay the
event loop exit with 10 seconds, even if nothing is pending.

Solve it in a different way: give the event loop thread time to exit
out of itself, and if it doesn't, send loopbreak.

This speeds up the RPC tests a lot, each exit incurred a 10 second
overhead, with this change there should be no shutdown overhead in the
common case and up to two seconds if the event loop is blocking.

As a bonus this breaks dependency on boost::thread_group, as the HTTP
server minds its own offspring.

@laanwj laanwj referenced this pull request Nov 11, 2015

Merged

http: speed up shutdown #6990

laanwj added a commit that referenced this pull request Nov 13, 2015

http: speed up shutdown
This continues/fixes #6719.

`event_base_loopbreak` was not doing what I expected it to, at least in
libevent 2.0.21.
What I expected was that it sets a timeout, given that no other pending
events it would exit in N seconds. However, what it does was delay the
event loop exit with 10 seconds, even if nothing is pending.

Solve it in a different way: give the event loop thread time to exit
out of itself, and if it doesn't, send loopbreak.

This speeds up the RPC tests a lot, each exit incurred a 10 second
overhead, with this change there should be no shutdown overhead in the
common case and up to two seconds if the event loop is blocking.

As a bonus this breaks dependency on boost::thread_group, as the HTTP
server minds its own offspring.

@jnewbery jnewbery referenced this pull request Aug 9, 2017

Merged

Improve shutdown process #11006

zkbot added a commit to zcash/zcash that referenced this pull request Oct 23, 2017

Auto merge of #2555 - jasondavies:fix-2554, r=str4d
Fix various thread assertion errors caused during shutdown.

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6719
- bitcoin/bitcoin#6990
- bitcoin/bitcoin#8421
  - Second commit only in this PR
- bitcoin/bitcoin#11006

I've cherry-picked the relevant commits, along with a note in each commit referring to the original Bitcoin commit ID (and the Zcash issue numbers where applicable).  I've tested each issue with/without these patches applied.

Closes #2214, #2334, and #2554.

@dagurval dagurval referenced this pull request in bitcoinxt/bitcoinxt Jan 27, 2018

Merged

HTTP Server cherries #315

sickpig added a commit to sickpig/BitcoinUnlimited that referenced this pull request Mar 12, 2018

Port XT #315 from dagurval/28-01-httpserver
HTTP Server cherries from Core:

bitcoin/bitcoin#6719 - Make HTTP server shutdown more graceful
bitcoin/bitcoin#6859 - http: Restrict maximum size of http + headers
bitcoin/bitcoin#6990 - http: speed up shutdown
bitcoin/bitcoin#7966 - http: Do a pending c++11 simplification handling work items
bitcoin/bitcoin#8421 - httpserver: drop boost (#8023 dependency)
bitcoin/bitcoin#11006 - Improve shutdown process

@sickpig sickpig referenced this pull request in BitcoinUnlimited/BitcoinUnlimited Mar 12, 2018

Merged

HTTP servers Core ports #1005

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment