Proxied requests query other nodes in parallel #2779
Conversation
I am not really happy about introducing another thread pool to the system. A small Graylog server already has around 160 threads in several pools (last time I checked); most of them are idle, though. I would rather have a few configurable thread pools for different purposes than have every new or modified subsystem add its own thread pool. But I guess that's a larger refactoring and we shouldn't do it right now.

So for the pool sizing, I would like to avoid adding 64 threads by default. The HTTP requests in the proxied resources are currently single-threaded, so I would actually use a pretty low default like 2 or 4, which is already an improvement for smaller setups. In bigger setups where this is still a problem, the pool size needs to be adjusted.
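The commit list below mentions documenting this setting in the sample config. A hedged sketch of how such an entry might look in `graylog.conf` — the comment text is illustrative, and the value shown follows the low default suggested in this comment rather than the 64 in the diff:

```
# Maximum number of threads used to run proxied requests against other
# cluster nodes in parallel. A small value (e.g. 4) is enough for small
# setups; increase it for large clusters.
proxied_requests_max_threads = 4
```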
```diff
@@ -163,6 +163,10 @@
     @Parameter(value = "web_tls_key_password")
     private String webTlsKeyPassword;

+    @Parameter(value = "proxied_requests_max_threads", required = true, validator = PositiveIntegerValidator.class)
+    // TODO: this is a totally abitrary number. this needs a better default based on ... something.
+    private int proxiedRequestsMaxThreads = 64;
```
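To illustrate how the configured maximum could back a shared pool, here is a minimal sketch; the class and method names are hypothetical (not Graylog's actual code), and the positive-value check mirrors the `PositiveIntegerValidator` attached to the parameter above:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a single shared, bounded pool for proxied
// requests, sized from the proxied_requests_max_threads setting.
public class ProxiedRequestsPool {
    public static ExecutorService create(int proxiedRequestsMaxThreads) {
        if (proxiedRequestsMaxThreads < 1) {
            // Mirrors the PositiveIntegerValidator on the config parameter.
            throw new IllegalArgumentException(
                    "proxied_requests_max_threads must be a positive integer");
        }
        // A fixed pool caps the number of concurrent proxied HTTP requests.
        return Executors.newFixedThreadPool(proxiedRequestsMaxThreads);
    }
}
```

A fixed-size pool keeps the upper bound predictable, which is the point of the setting: the threads are mostly blocked on IO, so the cap, not CPU count, governs concurrency.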
I agree with @bernd (#2779 (comment)) that a pool size of 64 threads is a bit too much for most setups.
8 or 16 threads should be fine for most workloads.
LGTM. 👍
* Call other nodes concurrently for proxied requests.
* Injecting ExecutorService in ProxiedResource + configured max pool size
* Explaing proxied_requests_max_threads config parameter in sample config.
* Making config value consistent, changing default, explaining sizing.

(cherry picked from commit 87acd4a)
Description

This change makes the `ProxiedResource` base class perform requests to other nodes in the cluster in parallel when all nodes are supposed to be queried. This reduces the round trip time, which previously grew linearly with cluster size because the requests were executed in a single thread.

The HTTP requests are executed on a shared thread pool whose maximum size is defined by the `proxied_requests_max_threads` config setting. The default for this is 64, which is rather arbitrary. Coming up with a sane default is hard: it is not related to the number of available CPUs (most of the time the threads will be sleeping/blocked on IO, so overprovisioning relative to CPUs is desired for good performance), and the number of nodes in the cluster changes dynamically at runtime (and may not be available during startup). Any recommendations for a good default are welcome.

Motivation and Context
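The fan-out described above can be sketched as follows. This is an illustrative example, not the actual `ProxiedResource` code: the class, method, and the stand-in `fetchFromNode` helper are hypothetical, but the shape — one task per node submitted to the shared pool, then waiting for all of them — matches the behavior described:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Illustrative sketch: query every node in parallel on a shared pool,
// so total latency is bounded by the slowest node rather than by the
// sum over all nodes (the previous single-threaded behavior).
public class ParallelNodeQuery {
    public static Map<String, String> queryAll(Collection<String> nodeIds,
                                               ExecutorService pool) {
        List<Callable<Map.Entry<String, String>>> tasks = new ArrayList<>();
        for (String nodeId : nodeIds) {
            tasks.add(() -> Map.entry(nodeId, fetchFromNode(nodeId)));
        }
        Map<String, String> results = new HashMap<>();
        try {
            // invokeAll blocks until every task has completed.
            for (Future<Map.Entry<String, String>> f : pool.invokeAll(tasks)) {
                try {
                    Map.Entry<String, String> e = f.get();
                    results.put(e.getKey(), e.getValue());
                } catch (ExecutionException e) {
                    // One failing node should not discard the other responses.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // return whatever completed
        }
        return results;
    }

    // Stand-in for the per-node HTTP call performed by the real resource.
    private static String fetchFromNode(String nodeId) {
        return "response-from-" + nodeId;
    }
}
```

With a pool of size N and a cluster of M nodes, at most N requests are in flight at once; when M exceeds N, the remaining tasks queue, which is why the pool size matters for large clusters.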
For large cluster sizes, performing proxied requests sequentially could produce round trip times that exceed the defined timeout. This could leave functionality unavailable or even overload the Graylog server.
How Has This Been Tested?
Node cleanup was disabled and different ranges of dummy node table entries were created, with transport addresses pointing to a local HTTP stub returning dummy metrics responses. Then, a cluster metrics request was sent to the Graylog server and round trip times were measured.
Types of changes
Checklist: