Internal: Management thread pool should reject requests when there are too many #7318
Comments
+1

I assume we'd add a configurable queue size and reject requests when the queue is full. I like the idea; it would help when debugging cases where management tools are overloading the server.

Just an FYI: the scaling thread pool today doesn't support rejection based on queue size. We can try to add that support, but it can be a bit tricky, since the scaling pool relies on rejection to decide whether to add a thread. I think the simplest solution may be to make it a fixed pool with a bounded queue size?

Fixed with a bounded queue size sounds like a good compromise; I'll try to take a stab at this.
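The "fixed pool with a bounded queue that rejects on overflow" compromise can be sketched with plain `java.util.concurrent` (Elasticsearch wraps this in its own `EsExecutors`/`EsAbortPolicy` classes, but the mechanics are the same). This is a minimal illustration, not the actual Elasticsearch implementation; the class name `BoundedPoolSketch` and the small pool/queue sizes are made up for the demo:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {
    public static ThreadPoolExecutor newBoundedPool(int threads, int queueSize) {
        // Fixed pool: core == max, so the pool never grows past `threads`.
        // ArrayBlockingQueue(queueSize) bounds the backlog; AbortPolicy throws
        // RejectedExecutionException once both the pool and the queue are full.
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                new ThreadPoolExecutor.AbortPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool(1, 2);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        // 1 task runs, 2 sit in the queue; further submissions are rejected.
        for (int i = 0; i < 5; i++) {
            try {
                pool.execute(() -> {
                    try { release.await(); } catch (InterruptedException ignored) {}
                });
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        System.out.println("rejected=" + rejected); // 5 submitted, 3 fit -> prints rejected=2
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

The key design point is that rejection surfaces the overload to the caller immediately, instead of letting work pile up invisibly in an unbounded queue.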
OK all I changed was this:
But just to confirm: if the backlog tries to exceed 100, then ES will throw EsRejectedExecutionException back to the client? It looks like it will, because EsExecutors.newFixed passes new EsAbortPolicy()...
@mikemccand correct, changing to fixed with a bounded queue will cause rejections to be thrown. I think that availableProcessors as the value for the thread pool size is too big, specifically for beefy machines; I would put there min(5, availableProcessors).
OK, thanks @kimchy I'll switch to min(5, availableProcessors). |
ahh, we already have |
Ahh super. |
I've reverted this change, since it causes #7916 ... |
Switch management threads to a fixed thread pool with up to 5 threads, and queue size of 100 by default, after which excess incoming requests are rejected. Closes elastic#7318 Closes elastic#7320
we should probably revisit this now that we have a different execution model for indices stats etc. (#7990). Reopening...
@jasontedor may be of interest to you? |
Superseded by #18613 |
Today, the management thread pool (used by stats and cat APIs) is bounded to 5 threads, but it still accepts further requests, which then wait indefinitely for a thread to free up.

This is dangerous because node stats can be a somewhat costly operation (in proportion to the number of shards on the node).

It also confounds debugging, because it can cause very long hangs in e.g. node stats requests via browser/curl, and it does not recover gracefully from "too many management requests" overload.

If we instead rejected the request, it would be clearer which clients are causing too much load.
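The failure mode described above can be demonstrated with a plain `ThreadPoolExecutor` backed by an unbounded queue (a sketch of the behavior, not Elasticsearch's actual management pool; the class name and the task counts are made up for the demo):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueHang {
    public static void main(String[] args) throws InterruptedException {
        // Unbounded queue: every submission is accepted, so overload shows up
        // as a silently growing backlog (and long client-side hangs) rather
        // than a rejection the caller can see and react to.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5, 5, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 1000; i++) {
            pool.execute(() -> {
                try { release.await(); } catch (InterruptedException ignored) {}
            });
        }
        // 5 tasks occupy the threads; the other 995 just sit in the queue,
        // and each corresponds to a client blocked waiting for a response.
        System.out.println("queued=" + pool.getQueue().size());
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

With a bounded queue plus an abort policy, submissions beyond the backlog limit would instead fail fast, which is exactly the behavior the issue asks for.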