Internal: Management thread pool should reject requests when there are too many #7318

Closed · mikemccand opened this issue Aug 18, 2014 · 13 comments
Labels: :Core/Infra/Core, discuss, >enhancement

@mikemccand (Contributor) commented Aug 18, 2014

Today, the management thread pool (used by stats and cat APIs) is bounded to 5 threads, but it still accepts further requests and then waits indefinitely for a thread to free up.

This is dangerous, because node stats can be a somewhat costly operation (in proportion to the number of shards on the node).

It also confounds debugging: it can cause long hangs in e.g. node stats requests via browser/curl, and it does not recover gracefully from "too many management requests" overload.

If we instead rejected the excess requests, it would be clearer which clients are causing too much load.

@nik9000 (Member) commented Aug 18, 2014

+1

@kevinkluge (Member):

I assume we'd add a configurable queue size and reject when the queue is full. I like the idea; it would help in debugging cases where management tools are putting load on the server. A hypothetical settings snippet is sketched below.
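If this lands, users could presumably tune the limit the same way as the other fixed pools, via the existing threadpool settings in elasticsearch.yml. This is a hypothetical snippet, assuming the management pool adopts the same "type"/"size"/"queue_size" keys the other fixed pools already use:

```yaml
# Hypothetical elasticsearch.yml override for the management pool,
# assuming it becomes a fixed pool with a bounded queue:
threadpool:
    management:
        type: fixed
        size: 5
        queue_size: 100
```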

@kimchy (Member) commented Aug 18, 2014

The scaling thread pool today doesn't support rejection based on queue size (just an FYI). We could try to add that support, but it can be a bit tricky, since the scaling implementation relies on rejection to decide whether or not to add a thread. I think the simplest solution is potentially to make the pool fixed, with a bounded queue size?
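To illustrate why rejection is already "spoken for" in a scaling pool, here is a minimal plain-JDK sketch of the technique (not Elasticsearch's actual EsExecutors code): the queue reports itself full whenever no worker is idle, which makes ThreadPoolExecutor spawn a new thread, and the rejection handler then force-queues tasks once the maximum is reached.

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Minimal sketch of a "scaling" executor. The key trick: offer() lies and
// returns false when no worker is idle, so ThreadPoolExecutor tries to add
// a thread; once max threads exist, the task is rejected, and the handler
// quietly puts it back on the (unbounded) queue.
public class ScalingPoolSketch {

    static class ScalingQueue<E> extends LinkedTransferQueue<E> {
        @Override
        public boolean offer(E e) {
            // Hand the task directly to an idle worker if one is waiting;
            // otherwise report "full" so the executor creates a new thread.
            return tryTransfer(e);
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                0, 5,                      // scale between 0 and 5 threads
                5, TimeUnit.MINUTES,       // keep_alive, as in the old settings
                new ScalingQueue<>(),
                (task, exec) -> {
                    // Max threads reached: swallow the rejection and enqueue
                    // anyway, i.e. the indefinite-wait behavior this issue
                    // is complaining about.
                    if (!exec.isShutdown()) {
                        exec.getQueue().add(task);
                    }
                });

        for (int i = 0; i < 20; i++) {
            final int id = i;
            executor.execute(() -> System.out.println(
                    "task " + id + " on " + Thread.currentThread().getName()));
        }
        executor.shutdown();
    }
}
```

Because the rejection signal is consumed internally as the "grow" decision, there is none left over to push back on callers, which is why bolting queue-size rejection onto the scaling pool is awkward.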

@mikemccand (Contributor, Author):

Fixed with a bounded queue size sounds like a good compromise; I'll take a stab at it.

@mikemccand mikemccand self-assigned this Aug 18, 2014
@mikemccand (Contributor, Author):

OK, all I changed was this:

```diff
diff --git a/src/main/java/org/elasticsearch/threadpool/ThreadPool.java b/src/main/java/org/elasticsearch/threadpool/ThreadPool.java
index 0152c0e..19692e6 100644
--- a/src/main/java/org/elasticsearch/threadpool/ThreadPool.java
+++ b/src/main/java/org/elasticsearch/threadpool/ThreadPool.java
@@ -119,7 +119,7 @@ public class ThreadPool extends AbstractComponent {
                 .put(Names.SEARCH, settingsBuilder().put("type", "fixed").put("size", availableProcessors * 3).put("queue_size", 1000).build())
                 .put(Names.SUGGEST, settingsBuilder().put("type", "fixed").put("size", availableProcessors).put("queue_size", 1000).build())
                 .put(Names.PERCOLATE, settingsBuilder().put("type", "fixed").put("size", availableProcessors).put("queue_size", 1000).build())
-                .put(Names.MANAGEMENT, settingsBuilder().put("type", "scaling").put("keep_alive", "5m").put("size", 5).build())
+                .put(Names.MANAGEMENT, settingsBuilder().put("type", "fixed").put("size", availableProcessors).put("queue_size", 100).build())
                 .put(Names.FLUSH, settingsBuilder().put("type", "scaling").put("keep_alive", "5m").put("size", halfProcMaxAt5).build())
                 .put(Names.MERGE, settingsBuilder().put("type", "scaling").put("keep_alive", "5m").put("size", halfProcMaxAt5).build())
                 .put(Names.REFRESH, settingsBuilder().put("type", "scaling").put("keep_alive", "5m").put("size", halfProcMaxAt10).build())
```
But just to confirm: if the backlog tries to exceed 100, ES will throw EsRejectedExecutionException back to the client? It looks like it will, because EsExecutors.newFixed passes new EsAbortPolicy()...
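For anyone reading along, a plain-JDK analogue of the new defaults shows the rejection behavior in question. This is a sketch using the JDK's built-in AbortPolicy; Elasticsearch's EsAbortPolicy throws its own EsRejectedExecutionException but behaves the same way for this purpose:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Fixed pool of 5 threads with a bounded queue of 100, mirroring the new
// MANAGEMENT defaults. AbortPolicy throws instead of blocking the caller.
public class FixedPoolRejectionSketch {
    public static void main(String[] args) {
        ThreadPoolExecutor management = new ThreadPoolExecutor(
                5, 5,                          // fixed size of 5
                0, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100), // bounded queue_size
                new ThreadPoolExecutor.AbortPolicy());

        try {
            // Submit more work than the pool plus queue can hold.
            for (int i = 0; i < 200; i++) {
                management.execute(() -> {
                    try {
                        Thread.sleep(1_000);   // simulate a slow stats call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } catch (RejectedExecutionException e) {
            // Thrown once 5 tasks are running and 100 are queued: the caller
            // gets immediate feedback instead of hanging indefinitely.
            System.out.println("rejected: " + e);
        } finally {
            management.shutdownNow();
        }
    }
}
```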

@kimchy (Member) commented Aug 18, 2014

@mikemccand correct, changing to fixed with a bounded queue will cause rejections to be thrown. I think availableProcessors is too big a value for this thread pool, though, specifically on beefy machines; I would use Math.min(5, availableProcessors). I think 5 threads should be enough; otherwise the operation is just too heavy and that's a bug, since a stats call should not be heavy (like the usable-space-on-file-length issue).

@mikemccand (Contributor, Author):

OK, thanks @kimchy, I'll switch to Math.min(5, availableProcessors).

@kimchy (Member) commented Aug 18, 2014

Ahh, we already have halfProcMaxAt5; maybe just use that one?
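For context, halfProcMaxAt5 caps a pool at half the processor count, maxing out at 5 threads. A sketch of the computation; the exact expression in ThreadPool.java may differ slightly:

```java
// Half the available processors, rounded up, but never more than 5 threads.
int availableProcessors = Runtime.getRuntime().availableProcessors();
int halfProcMaxAt5 = Math.min((availableProcessors + 1) / 2, 5);
```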

@mikemccand (Contributor, Author):

Ahh super.

mikemccand added a commit that referenced this issue Aug 18, 2014
Switch management threads to a fixed thread pool with up to 5 threads, and queue size of 100 by default, after which excess incoming requests are rejected.

Closes #7318

Closes #7320
clintongormley changed the title from "Core: should management thread pool reject requests when there are too many?" to "Internal: Management thread pool should reject requests when there are too many" on Sep 8, 2014
@mikemccand (Contributor, Author):

I've reverted this change, since it causes #7916.

mute pushed a commit referencing this issue to mute/elasticsearch on Jul 29, 2015
@bleskes (Contributor) commented Nov 4, 2015

We should probably revisit this now that we have a different execution model for indices stats, etc. (#7990). Reopening.

@clintongormley:

@jasontedor this may be of interest to you?

clintongormley added the :Core/Infra/Core label on Feb 23, 2018
@jasontedor (Member):

Superseded by #18613
