Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: avoid fd exhaustion #12274

Closed
wants to merge 6 commits into from
Closed

http: avoid fd exhaustion #12274

wants to merge 6 commits into from

Conversation

theuni
Copy link
Member

@theuni theuni commented Jan 26, 2018

Addresses #11368. This adds a file descriptor limiter for the http server. It's possible for the open fd count to potentially exhaust the user's defined limit if not checked.

I'm having a hard time reproducing the issue now, I'm not sure what's changed in my environment. I was finally able to hit it using some pretty nasty tricks, but I'm less worried about this now for 0.16.

I do think it's worth considering taking de6cfce, 5f5099a, and (part of) 3b0ebb3 for 0.16, though, as they help significantly and are easy to review. I'm happy to break that out into a new PR if necessary.

Responses are usually added to the response queue by worker threads. As an
optimization, and in order to close connections as quickly as possible, allow
for immediate replies (bypassing the queue) when there is no need for a worker
to service the request.
No functional change, just preparation for the next commits.
Rather than potentially accepting an unbounded number of connections, give them
a ceiling of twice the size of the processing queue.

Once the ceiling is reached, new connections will not be accepted until the
count has fallen back under the limit.

Note that file descriptors are removed from the limiter in the request's
close callback.
If we're unable to service a request, close the connection rather than
potentially keeping it alive longer. This helps to be slightly more fair to new
connections.
This avoids assigning file descriptors faster than we can close them.

It is the same as using an evconnlistener with the LEV_OPT_DEFERRED_ACCEPT
flag, which is only available with libevent 2.1+.
Libevent 2.2 adds a callback allowing us to drop connections as they come in.
Use it when possible.
@fanquake fanquake added this to the 0.16.0 milestone Jan 26, 2018
@@ -288,14 +376,19 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
if (i != iend) {
std::unique_ptr<HTTPWorkItem> item(new HTTPWorkItem(std::move(hreq), path, i->handler));
assert(workQueue);
if (workQueue->Enqueue(item.get()))
if (workQueue->Enqueue(item.get())) {
// Disable reading to work around a libevent bug, fixed in 2.2.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this block be wrapped in a LIBEVENT_VERSION_NUMBER macro guard so it only occurs in the affected version range?

Seems like it would be easier for future cleanups to see where the version macro is checked, say in the instance that a minimum version of libevent is defined and code like this could be safely removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just move-only from above. There's no need to guard it with an ifdef because there's no api difference, it's purely a runtime check.


const unsigned int m_limit;
std::vector<evconnlistener*> m_listeners;
std::set<evutil_socket_t> m_sockets;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it probably matters in most configurations where the fd limit is ~1K, but making this an unordered_set would be slightly beneficial in the high fd limit cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@laanwj
Copy link
Member

laanwj commented Jan 28, 2018

This needs to be tested by those experiencing the issue: @KanoczTomas @vii

@promag
Copy link
Member

promag commented Jan 29, 2018

Concept ACK.

Why don't understand why you need dc53bec, seems like an unnecessary refactor.

5f5099a and 3b0ebb3 could be in a separate PR, are simple and make sense even without the fd exhaustion problem.

@theuni
Copy link
Member Author

theuni commented Jan 29, 2018

@promag dc53bec is just a small chunk of c56cf1a that seemed reasonable on its own.

@@ -300,6 +300,22 @@ static void connection_close_cb(evhttp_connection* conn, void *arg)
}
}

#if LIBEVENT_VERSION_NUMBER >= 0x02020001
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome that you're using the callback - one way to unambiguously detect the presence of the new feature would be to add a new autoconf test maybe using AC_LINK_IFELSE for evhttp_set_newreqcb

@vii
Copy link

vii commented Jan 30, 2018

Awesome! The defer accept tweak will defend against typical inadvertent denial of services

Note that the original issue can still be activated depending on the file descriptor limit that is set, as the budgeting in the daemon does not account for this usage or the usage in the db code - see
#11785

To test it, something like this slowhttptest program

slowhttptest -c 40000 -r 1000 -u 'http://127.0.0.1:8332/rest/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f.json'

or another tool which can simulate a slowloris (HTTP connection header trickling) attack

@laanwj laanwj removed this from the 0.16.0 milestone Jan 30, 2018
@laanwj
Copy link
Member

laanwj commented Jan 30, 2018

Sorry, I'm removing the 0.16.0 milestone here. As it introduces many new concepts and work-arounds it is hard to review last-minute, the problem is hard to trigger in the first place, and it is not a regression since 0.15. Also according to @vii it doesn't solve the issue completely. I think getting the fd issue fixed properly should be a focus for 0.16.1.

Copy link
Contributor

@eklitzke eklitzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a superficial check, I can do another pass later in more detail. I'm curious if you can reproduce the fd exhaustion issue.

g_limiter = MakeUnique<ConnectionLimiter>(std::move(listeners), workQueueDepth * 2);
evhttp_set_gencb(http, http_request_cb, g_limiter.get());

#if LIBEVENT_VERSION_NUMBER >= 0x02020001
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to see this defined separately as:

#if LIBEVENT_VERSION_NUMBER >= 0x02020001
#define HAVE_EVHTTP_SET_NEWREQCB 1
#endif

That way the check here is more obvious, particularly since you do the check elsewhere. Of course, even better than this would be to use AC_CHECK_FUNC()...

{
bool ret = false;
#ifdef TCP_DEFER_ACCEPT
static constexpr int set = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first time I've seen static constexpr used correctly in the Bitcoin code, congratulations.

@fanquake
Copy link
Member

fanquake commented Mar 21, 2018

@eklitzke I can still produce a crash with these changes on top of 2405ce1:

screenshot 2018-03-22 at 12 04 52 am

date='2015-09-16T16:18:02Z' progress=0.269372 cache=313.8MiB(2336744txo)
2018-03-21T16:04:44Z UpdateTip: new best=00000000000000000c7112c52408963acb80e1fedc71005859d6ef9a22f84d79 height=374820 version=0x00000003 log2_work=83.354829 tx=84036580 date='2015-09-16T16:20:12Z' progress=0.269373 cache=313.8MiB(2337189txo)
2018-03-21T16:04:44Z LevelDB read failure: IO error: /Users/fanquake/Library/Application Support/Bitcoin/chainstate/019338.ldb: Too many open files
2018-03-21T16:04:44Z Fatal LevelDB error: IO error: /Users/fanquake/Library/Application Support/Bitcoin/chainstate/019338.ldb: Too many open files
2018-03-21T16:04:44Z You can use -debug=leveldb to get more complete diagnostic messages
2018-03-21T16:04:44Z libevent: Error from accept() call: Too many open files
2018-03-21T16:04:44Z libevent: Error from accept() call: Too many open files
2018-03-21T16:04:44Z libevent: Error from accept() call: Too many open files
2018-03-21T16:04:44Z libevent: Error from accept() call: Too many open files

When last discussed with @theuni he was going to revamp these changes.

@eklitzke
Copy link
Contributor

@fanquake Interesting -- do you have a script to stress test it to induce that condition? Or does that happen during normal operation?

@laanwj laanwj removed this from the 0.16.1 milestone May 17, 2018
Nico205 added a commit to machinecoin-project/machinecoin-core that referenced this pull request May 17, 2018
@DrahtBot
Copy link
Contributor

The last travis run for this pull request was 175 days ago and is thus outdated. To trigger a fresh travis build, this pull request should be closed and re-opened.

@DrahtBot DrahtBot closed this Jul 20, 2018
@DrahtBot DrahtBot reopened this Jul 20, 2018
{
// Disable socket accepting if adding this connection puts us equal to the limit
if (!Interrupted() && m_sockets.insert(fd).second && m_sockets.size() == m_limit) {
LogPrint(BCLog::HTTP, "Suspending new connections");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All calls to LogPrintf() and LogPrint() should be terminated with \n

@DrahtBot
Copy link
Contributor

Needs rebase

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 3, 2018

There hasn't been much activity lately and the patch still needs rebase, so I am closing this for now. Please let me know when you want to continue working on this, so the pull request can be re-opened.

@DrahtBot DrahtBot closed this Dec 3, 2018
@BlockMechanic
Copy link
Contributor

So is this the recommended fix for #11368 ? I just bumped into it and need to resolve it

@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants