
Periodically retry persistent task allocation #35792

Closed
droberts195 opened this issue Nov 21, 2018 · 3 comments
Assignees
droberts195

Labels
:Distributed/Task Management (Issues for anything around the Tasks API - both persistent and node level), >enhancement, :ml (Machine learning), team-discuss

Comments


droberts195 commented Nov 21, 2018

Currently the persistent tasks framework attempts to allocate unallocated persistent tasks in the following situations:

  • Persistent tasks are changed
  • A node joins or leaves the cluster
  • The routing table is changed
  • Custom metadata in the cluster state is changed
  • A new master node is elected

When an ML node fails we need to reallocate ML jobs according to their memory requirements, taking into account the memory requirements of other open ML jobs. The "a node joins or leaves the cluster" condition triggers an attempt to do this, but the master node doing the reallocation may not have all the up-to-date memory usage statistics needed to make sensible reallocation decisions. One way to solve this problem is to defer the allocation: return null in response to the call to getAssignment() that the failure immediately causes, and instead trigger an asynchronous request to gather the necessary information. The problem comes when the asynchronous request returns with the necessary information: at that point we need to try again to allocate the persistent tasks whose allocation was deferred.
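
As a rough illustration of that deferral, a task executor could look something like the sketch below. This is not the actual ML implementation: the class and method names loosely mirror the persistent tasks framework, and MemoryStats, requestMemoryStatsAsync() and selectNodeByMemory() are hypothetical helpers.

```java
// Minimal sketch of the deferral idea described above. MemoryStats,
// requestMemoryStatsAsync() and selectNodeByMemory() are hypothetical helpers,
// not existing APIs. Constructor and other required methods are omitted.
public class DeferringMlTaskExecutor extends PersistentTasksExecutor<MyTaskParams> {

    private volatile MemoryStats latestMemoryStats;   // refreshed asynchronously

    @Override
    public Assignment getAssignment(MyTaskParams params, ClusterState clusterState) {
        if (latestMemoryStats == null || latestMemoryStats.isStale()) {
            // Kick off an asynchronous refresh; do not block the cluster state thread.
            requestMemoryStatsAsync(stats -> latestMemoryStats = stats);
            // Defer the allocation: no node is chosen yet, so the task stays
            // unallocated until something triggers another assignment check -
            // which is exactly what the periodic recheck proposed below provides.
            return null;
        }
        return selectNodeByMemory(params, clusterState, latestMemoryStats);
    }
}
```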

We discussed this in the distributed area weekly meeting and decided that the simplest and safest way to achieve this would be to have persistent tasks recheck allocations periodically, say every 30 seconds.

An alternative would be to add an endpoint to allow clients to request a recheck of unallocated persistent tasks. But this would run the risk that a client that called the endpoint too often could cause an excessive amount of rechecking.

The proposed change is therefore:

  • Add a new dynamic cluster setting, cluster.persistent_tasks.allocation.recheck_interval (default 30s), to control how frequently the allocation of unallocated persistent tasks is rechecked (a rough sketch of the setting and timer follows this list).
  • Change PersistentTasksClusterService so that the loop currently in shouldReassignPersistentTasks() is factored out into a new method that is also run by the timer callback; if it returns true, a cluster state update is triggered that calls PersistentTasksClusterService.reassignTasks() to change the state.
  • PersistentTasksClusterService.shouldReassignPersistentTasks() will reset the timer, so that if a cluster state update triggers an allocation check the timer doesn't do another one shortly afterwards.
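
To make the first and third bullets concrete, here is a minimal sketch of what the setting declaration and timer handling could look like. The Setting API usage is standard Elasticsearch, but the scheduling uses a plain ScheduledExecutorService as a stand-in for whatever mechanism PersistentTasksClusterService ends up using, and recheckAssignments() / submitReassignmentUpdate() are hypothetical placeholders for the factored-out loop and the cluster state update:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.common.settings.ClusterSettings;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.unit.TimeValue;

// Sketch only: dynamic recheck-interval setting plus a periodic recheck timer.
class PeriodicRecheckSketch {

    static final Setting<TimeValue> RECHECK_INTERVAL_SETTING =
        Setting.timeSetting("cluster.persistent_tasks.allocation.recheck_interval",
            TimeValue.timeValueSeconds(30),
            Setting.Property.Dynamic, Setting.Property.NodeScope);

    // Stand-in for the real scheduling mechanism inside PersistentTasksClusterService.
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private volatile TimeValue recheckInterval = RECHECK_INTERVAL_SETTING.get(Settings.EMPTY);
    private volatile ScheduledFuture<?> nextRecheck;

    PeriodicRecheckSketch(ClusterSettings clusterSettings) {
        // Pick up dynamic updates to the setting.
        clusterSettings.addSettingsUpdateConsumer(RECHECK_INTERVAL_SETTING, interval -> recheckInterval = interval);
        rescheduleRecheck();
    }

    // Called by the timer, and also whenever a cluster-state-driven check runs,
    // so the timer does not fire again shortly after such a check.
    synchronized void rescheduleRecheck() {
        if (nextRecheck != null) {
            nextRecheck.cancel(false);
        }
        nextRecheck = scheduler.schedule(() -> {
            if (recheckAssignments()) {          // hypothetical: the factored-out loop
                submitReassignmentUpdate();      // hypothetical: triggers reassignTasks()
            }
            rescheduleRecheck();
        }, recheckInterval.millis(), TimeUnit.MILLISECONDS);
    }

    boolean recheckAssignments() { return false; }   // placeholder
    void submitReassignmentUpdate() {}               // placeholder
}
```
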
droberts195 added the >enhancement, :Distributed/Task Management, :ml and team-discuss labels Nov 21, 2018
droberts195 self-assigned this Nov 21, 2018
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@elasticmachine (Collaborator)

Pinging @elastic/ml-core


ywelsch commented Nov 26, 2018

I would like to add one more point to the list above: There should only ever be a single pending cluster state update task. This prevents queuing too many update tasks when the master is taxed with higher-priority tasks.
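
A minimal sketch of one way to enforce that invariant, assuming a hypothetical submission hook passed in as a Consumer<Runnable> whose success and failure callbacks both release the guard:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Sketch: ensure at most one reassignment cluster state update is pending at a time.
// The actual submission is abstracted as a Consumer<Runnable> that must invoke the
// supplied callback from both its success and failure paths.
class SingleReassignmentGuard {

    private final AtomicBoolean reassignmentPending = new AtomicBoolean(false);
    private final Consumer<Runnable> submitClusterStateUpdate;   // hypothetical hook

    SingleReassignmentGuard(Consumer<Runnable> submitClusterStateUpdate) {
        this.submitClusterStateUpdate = submitClusterStateUpdate;
    }

    void maybeSubmitReassignment() {
        // Only submit if no reassignment update task is already queued.
        if (reassignmentPending.compareAndSet(false, true)) {
            submitClusterStateUpdate.accept(this::onCompleted);
        }
    }

    private void onCompleted() {
        // Clear the flag once the update task has run (or failed), so the next
        // timer tick or cluster state change can submit a new one.
        reassignmentPending.set(false);
    }
}
```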

droberts195 added a commit to droberts195/elasticsearch that referenced this issue Nov 29, 2018
Previously persistent task assignment was checked in the
following situations:

- Persistent tasks are changed
- A node joins or leaves the cluster
- The routing table is changed
- Custom metadata in the cluster state is changed
- A new master node is elected

However, there could be situations when a persistent
task that could not be assigned to a node could become
assignable due to some other change, such as memory
usage on the nodes.

This change adds a timed recheck of persistent task
allocation to account for such situations.  The timer
is suspended while checks triggered by cluster state
changes are in-flight to avoid adding burden to an
already busy cluster.

Closes elastic#35792
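
Once merged, the interval can be changed at runtime like any other dynamic cluster setting. A brief usage sketch using the Java client API of the time (the 10s value and the surrounding method are illustrative only):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

// Sketch: adjust the new dynamic setting at runtime via a cluster settings update.
// `client` is assumed to be an already-constructed Elasticsearch Client.
void setRecheckInterval(Client client) {
    client.admin().cluster().prepareUpdateSettings()
        .setPersistentSettings(Settings.builder()
            .put("cluster.persistent_tasks.allocation.recheck_interval", "10s"))
        .get();
}
```
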
droberts195 added a commit to droberts195/elasticsearch that referenced this issue Nov 29, 2018
droberts195 added a commit that referenced this issue Dec 13, 2018
droberts195 added a commit that referenced this issue Dec 13, 2018