"IMMEDIATE" tasks getting queued up in pending tasks #8860
Comments
Hey @ppf2, I assume by EMERGENCY you mean URGENT. Answering based on that assumption.
Yes, this is correct. The cluster state tasks are executed by a single thread to make sure things stay consistent. While the queue is re-prioritized with every task insertion, you have to wait for the thread to finish its current task before it picks up the next highest-priority task. The assumption is that no task should take long (although you may have many tasks queued up). The question is which task was taking so long in your case. You can run hot threads to see what the cluster state update thread is doing. Another thing to try is to increase the timeout of the cluster settings update call to 1m and see if that helps to get it in (the master waits up to 30s for the nodes to respond when publishing the cluster state, so this might be it as well). PS: even if the API call times out, the task is still queued and will be executed when the thread is free. So the setting will be applied, albeit after the call returns.
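To make the non-preemptive behaviour described above concrete, here is a minimal Python sketch. It is a toy simulation, not Elasticsearch code: the `simulate` function, the task names, the durations, and the numeric priority values are all assumptions chosen for illustration. It models a single worker that always picks the highest-priority pending task but never interrupts the task it is already running.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    # Lower value = picked first. Names follow the thread; values are illustrative.
    IMMEDIATE = 0
    URGENT = 1
    NORMAL = 2


def simulate(tasks):
    """Run tasks on one non-preemptive worker.

    tasks: list of (arrival_time, name, priority, duration) in simulated seconds.
    Returns a list of (name, start_time, seconds_waited) in execution order.
    """
    arrivals = sorted(tasks, key=lambda t: t[0])
    heap = []
    seq = itertools.count()  # tie-breaker: FIFO within the same priority
    clock = 0
    schedule = []
    i = 0
    while i < len(arrivals) or heap:
        # Queue every task that has arrived by now; the heap re-prioritizes on insert.
        while i < len(arrivals) and arrivals[i][0] <= clock:
            arrival, name, priority, duration = arrivals[i]
            heapq.heappush(heap, (priority, next(seq), name, duration, arrival))
            i += 1
        if not heap:
            clock = arrivals[i][0]  # worker is idle until the next arrival
            continue
        _, _, name, duration, arrival = heapq.heappop(heap)
        schedule.append((name, clock, clock - arrival))
        clock += duration  # the running task is never suspended
    return schedule


# Hypothetical workload: one slow URGENT task, one fast URGENT task,
# and a settings update submitted as IMMEDIATE one second later.
tasks = [
    (0, "recovery-1", Priority.URGENT, 40),
    (0, "recovery-2", Priority.URGENT, 5),
    (1, "update-settings", Priority.IMMEDIATE, 1),
]
for name, start, waited in simulate(tasks):
    print(f"{name}: started at t={start}s after waiting {waited}s")
```

In this made-up workload the IMMEDIATE task jumps ahead of `recovery-2` but still waits 39 simulated seconds for `recovery-1` to finish, which would exceed a 30s call timeout even though the task stays queued and eventually runs.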
Thanks, will get hot threads, and yes, I meant urgent tasks, not emergency :D
We were having a similar issue with IMMEDIATE tasks queuing up. See issue #8804.
@miccon yeah, you have the same issue: a single cluster state update task takes way too long (which was fixed in your case in #8803, with disabling include relocations as a workaround).
This has probably been resolved by async shard store fetching. Please reopen if you still see this on recent versions.
Have a situation where the cluster had to be restarted. Upon restarting, there was a ton of recovery activity (at times, we observed >100 EMERGENCY tasks in pending_tasks). As a result, attempts to update the cluster (e.g. to increase the concurrency setting for recovery) started failing with ProcessClusterEventTimeoutException errors.
https://github.com/elasticsearch/elasticsearch/blob/1816951b6b0320e7a011436c7c7519ec2bfabc6e/src/main/java/org/elasticsearch/common/Priority.java#L45 seems to indicate that IMMEDIATE tasks are of higher priority than EMERGENCY tasks. But for some reason, these update cluster setting calls are still being queued up.
Is the cluster too busy to even go and re-prioritize its running tasks? Or is it because once an EMERGENCY task starts to run, even if an IMMEDIATE task comes in, it will still have to wait until these running EMERGENCY tasks have completed? In other words, if the IMMEDIATE task comes in at the same time as an EMERGENCY task, it will get prioritized higher, but it will not go and suspend any running EMERGENCY task to allow the IMMEDIATE task to run first, etc.?