Proxied requests query other nodes in parallel #2779
Conversation
I am not really happy about introducing another thread pool to the system. A small Graylog server already has around 160 threads in several pools (last time I checked); most of them are idle, though. I would rather have a few configurable thread pools for different purposes than have every new or modified subsystem add its own thread pool. But I guess that's a larger refactoring and we shouldn't do it right now.

So for the pool sizing, I would like to avoid adding 64 threads by default. The HTTP requests in the proxied resources are currently single-threaded, so I would actually use a pretty low default like 2 or 4, which is already an improvement for smaller setups. In bigger setups where this is still a problem, the pool size needs to be adjusted.
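The commit list below mentions documenting this setting in the sample config. A hedged sketch of how such an entry might look in `graylog.conf` — the comment text is illustrative, and the value shown follows the low default suggested in this comment rather than the 64 in the diff:

```
# Maximum number of threads used to run proxied requests against other
# cluster nodes in parallel. A small value (e.g. 4) is enough for small
# setups; increase it for large clusters.
proxied_requests_max_threads = 4
```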
```diff
@@ -163,6 +163,10 @@
     @Parameter(value = "web_tls_key_password")
     private String webTlsKeyPassword;

+    @Parameter(value = "proxied_requests_max_threads", required = true, validator = PositiveIntegerValidator.class)
+    // TODO: this is a totally abitrary number. this needs a better default based on ... something.
+    private int proxiedRequestsMaxThreads = 64;
```
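To illustrate how the configured maximum could back a shared pool, here is a minimal sketch; the class and method names are hypothetical (not Graylog's actual code), and the positive-value check mirrors the `PositiveIntegerValidator` attached to the parameter above:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a single shared, bounded pool for proxied
// requests, sized from the proxied_requests_max_threads setting.
public class ProxiedRequestsPool {
    public static ExecutorService create(int proxiedRequestsMaxThreads) {
        if (proxiedRequestsMaxThreads < 1) {
            // Mirrors the PositiveIntegerValidator on the config parameter.
            throw new IllegalArgumentException(
                    "proxied_requests_max_threads must be a positive integer");
        }
        // A fixed pool caps the number of concurrent proxied HTTP requests.
        return Executors.newFixedThreadPool(proxiedRequestsMaxThreads);
    }
}
```

A fixed-size pool keeps the upper bound predictable, which is the point of the setting: the threads are mostly blocked on IO, so the cap, not CPU count, governs concurrency.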
I agree with @bernd (#2779 (comment)) that a pool size of 64 threads is a bit too much for most setups.
8 or 16 threads should be fine for most workloads.
LGTM. 👍
* Call other nodes concurrently for proxied requests.
* Injecting ExecutorService in ProxiedResource + configured max pool size
* Explaing proxied_requests_max_threads config parameter in sample config.
* Making config value consistent, changing default, explaining sizing.

(cherry picked from commit 87acd4a)
Description

This change makes the `ProxiedResource` base class perform requests to other nodes in the cluster in parallel when all nodes are supposed to be queried. This reduces the round trip time, which previously grew linearly with cluster size because the requests were executed in a single thread.

The HTTP requests are executed on a shared thread pool whose maximum size is defined by the `proxied_requests_max_threads` config setting. The default for this is 64, which is rather arbitrary. Coming up with a sane default is hard: it is not related to the number of available CPUs (most of the time the threads will be sleeping/blocked on IO, so overprovisioning relative to CPUs is desired for good performance), and the number of nodes in the cluster changes dynamically at runtime (and may not be available during startup). Any recommendations for a good default are welcome.

Motivation and Context
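The fan-out described above can be sketched as follows. This is an illustrative example, not the actual `ProxiedResource` code: the class, method, and the stand-in `fetchFromNode` helper are hypothetical, but the shape — one task per node submitted to the shared pool, then waiting for all of them — matches the behavior described:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Illustrative sketch: query every node in parallel on a shared pool,
// so total latency is bounded by the slowest node rather than by the
// sum over all nodes (the previous single-threaded behavior).
public class ParallelNodeQuery {
    public static Map<String, String> queryAll(Collection<String> nodeIds,
                                               ExecutorService pool) {
        List<Callable<Map.Entry<String, String>>> tasks = new ArrayList<>();
        for (String nodeId : nodeIds) {
            tasks.add(() -> Map.entry(nodeId, fetchFromNode(nodeId)));
        }
        Map<String, String> results = new HashMap<>();
        try {
            // invokeAll blocks until every task has completed.
            for (Future<Map.Entry<String, String>> f : pool.invokeAll(tasks)) {
                try {
                    Map.Entry<String, String> e = f.get();
                    results.put(e.getKey(), e.getValue());
                } catch (ExecutionException e) {
                    // One failing node should not discard the other responses.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // return whatever completed
        }
        return results;
    }

    // Stand-in for the per-node HTTP call performed by the real resource.
    private static String fetchFromNode(String nodeId) {
        return "response-from-" + nodeId;
    }
}
```

With a pool of size N and a cluster of M nodes, at most N requests are in flight at once; when M exceeds N, the remaining tasks queue, which is why the pool size matters for large clusters.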
For large cluster sizes, performing proxied requests sequentially could produce round trip times that exceed the defined timeout. This could leave functionality unavailable or even overload the Graylog server.
How Has This Been Tested?
Node cleanup was disabled and different ranges of dummy node table entries were created, with transport addresses pointing to a local HTTP stub returning dummy metrics responses. Then, a cluster metrics request was sent to the Graylog server and round trip times were measured.
Types of changes
Checklist: