
Periodically try to reassign unassigned persistent tasks #36069

Merged
merged 14 commits into elastic:master from the periodic_reallocation_attempt branch on Dec 13, 2018

Conversation

droberts195
Contributor

Previously persistent task assignment was checked in the
following situations:

  • Persistent tasks are changed
  • A node joins or leaves the cluster
  • The routing table is changed
  • Custom metadata in the cluster state is changed
  • A new master node is elected

However, there could be situations where a persistent
task that could not be assigned to a node becomes
assignable due to some other change, such as a change
in memory usage on the nodes.

This change adds a timed recheck of persistent task
assignment to account for such situations. The timer
is suspended while checks triggered by cluster state
changes are in-flight to avoid adding burden to an
already busy cluster.

Closes #35792
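
(Illustrative sketch, not part of the PR: the scheduling pattern described above, written against plain java.util.concurrent instead of Elasticsearch's ThreadPool. All class and method names below are invented for the example.)

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Illustration of the pattern: a periodic recheck of task assignment that is
 * suspended while an event-driven check is in flight, so a busy cluster is
 * not given extra work.
 */
class PeriodicRecheckSketch {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private long recheckIntervalMillis = 30_000L;
    private ScheduledFuture<?> scheduled;

    /** Called when a cluster state change makes an immediate check worthwhile. */
    synchronized void onRelevantClusterStateChange() {
        cancelScheduledRecheck();            // suspend the timer while the triggered check runs
        runAssignmentCheck();
        scheduleRecheckIfNeeded();           // resume the timer only if work remains
    }

    private synchronized void scheduleRecheckIfNeeded() {
        if (hasUnassignedTasks() && (scheduled == null || scheduled.isDone())) {
            scheduled = scheduler.schedule(this::periodicRecheck, recheckIntervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    private synchronized void periodicRecheck() {
        scheduled = null;                    // this run has started; the handle is stale
        runAssignmentCheck();
        scheduleRecheckIfNeeded();           // keep repeating while anything is unassigned
    }

    private synchronized void cancelScheduledRecheck() {
        if (scheduled != null) {
            scheduled.cancel(false);
            scheduled = null;
        }
    }

    private void runAssignmentCheck() {
        // try to assign any unassigned persistent tasks (no-op in this sketch)
    }

    private boolean hasUnassignedTasks() {
        return false; // placeholder; the real check inspects PersistentTasksCustomMetaData
    }
}
```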

@droberts195 droberts195 added >enhancement :Distributed/Task Management Issues for anything around the Tasks API - both persistent and node level. v7.0.0 v6.6.0 labels Nov 29, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@bleskes bleskes left a comment

Thanks Robert. I left some comments. Also, can you add an IT test?

});
reassignPersistentTasks(event.state().getVersion());
} else {
periodicRechecker.schedule();
Contributor

I think you only want to schedule if you have unassigned tasks?

@Override
public ClusterState execute(ClusterState currentState) {
ClusterState newState = reassignTasks(currentState);
periodicRechecker.schedule();
Contributor

I think you only want to do this if you have unassigned tasks

Contributor

Also, this should be done when the cluster state is published. See ClusterStateUpdateTask.clusterStatePublished

Contributor Author

ClusterStateUpdateTask.clusterStatePublished is empty and final and also only gets called if the cluster state is updated by the execute method.

I think to make this reschedule after publishing or no-op I'd need to change the update class to implement AckedClusterStateTaskListener, right? Then onAllNodesAcked will get called regardless of whether execute changes the cluster state or not.

Any objections before I start on this change?

Contributor

argh, you are right. The one you need is clusterStateProcessed. Note that both of these are only called when the cluster state is committed, which is a big difference from when the execute method returns. It means that the master got other master-eligible nodes to agree on the change.

Contributor Author

It looks like overriding clusterStateProcessed is non-trivial now.

Since https://github.com/elastic/elasticsearch/pull/31241/files#diff-bc0dd060947fa9d8e3209d60f7255f1dR67 it cannot be overridden in a class that extends ClusterStateUpdateTask because it runs in the system context. I don't think running in the system context would be a problem for persistent task allocation decisions, but it creates a question of how to do this without opening up ClusterStateUpdateTask to future mistakes. I could move most of the functionality of ClusterStateUpdateTask into a new base class called SystemContextClusterStateUpdateTask, and have ClusterStateUpdateTask extend that and just override clusterStateProcessed with its empty final version. Does that sound OK?

Contributor

@droberts195 I may be missing something, but the way I read the other PR, it is exactly the intention to use clusterStateProcessed as the thread context is maintained there. See here.

Contributor Author

Sorry, I was still looking at clusterStatePublished. clusterStateProcessed makes it easy.
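
For reference, a rough sketch of how the rescheduling can hang off that callback (a fragment only, based on the submitStateUpdateTask/reassignTasks/periodicRechecker names visible elsewhere in this diff, and written before the later refinement of only rescheduling when tasks remain unassigned; the final commits may wire it slightly differently):

```java
// Rough sketch only: reschedule the periodic check from clusterStateProcessed,
// which runs once the state is committed and, unlike clusterStatePublished,
// is also invoked when execute() returns the state unchanged.
clusterService.submitStateUpdateTask("reassign persistent tasks", new ClusterStateUpdateTask() {

    @Override
    public ClusterState execute(ClusterState currentState) {
        return reassignTasks(currentState);   // may return currentState unchanged
    }

    @Override
    public void onFailure(String source, Exception e) {
        logger.warn("failed to reassign persistent tasks", e);
    }

    @Override
    public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
        periodicRechecker.schedule();         // safe point to resume the timer
    }
});
```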

logger.trace("periodic persistent task assignment check running");
ClusterState state = clusterService.state();
final PersistentTasksCustomMetaData tasks = state.getMetaData().custom(PersistentTasksCustomMetaData.TYPE);
if (tasks != null && anyTaskNeedsReassignment(tasks, state)) {
Contributor

can we unify this logic with clusterChanged?

/**
* Class to periodically try to reassign unassigned persistent tasks.
*/
private class PeriodicRechecker implements Runnable {
Contributor

I think it's time to extract a utility class that's shared with org.elasticsearch.index.IndexService.BaseAsyncTask
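
Roughly the shape of the shared utility being suggested, as an illustration only (plain java.util.concurrent stands in for Elasticsearch's ThreadPool; the method names mirror the ones exercised by the test snippets further down, everything else is a guess):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative shape of a shared "repeat this work periodically" utility.
 * Method names mirror the test snippets later in this review; everything
 * else is a stand-in for the real ThreadPool-based implementation.
 */
abstract class AsyncTaskSketch {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private long intervalMillis;
    private ScheduledFuture<?> future;
    private volatile boolean closed;

    AsyncTaskSketch(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** The periodic work itself. */
    protected abstract void runInternal();

    /** Extra conditions, over and above not being closed, for the task to keep repeating. */
    protected boolean mustReschedule() {
        return true;
    }

    synchronized void rescheduleIfNecessary() {
        if (closed == false && mustReschedule() && isScheduled() == false) {
            future = scheduler.schedule(this::runAndReschedule, intervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    synchronized boolean isScheduled() {
        return future != null && future.isDone() == false;
    }

    synchronized void setInterval(long newIntervalMillis) {
        intervalMillis = newIntervalMillis;
        if (isScheduled()) {
            cancel();                    // re-arm the pending run with the new interval
            rescheduleIfNecessary();
        }
    }

    synchronized void cancel() {
        if (future != null) {
            future.cancel(false);
            future = null;
        }
    }

    synchronized void close() {
        closed = true;
        cancel();
        scheduler.shutdown();
    }

    boolean isClosed() {
        return closed;
    }

    private void runAndReschedule() {
        synchronized (this) {
            future = null;               // this run has started; the old handle is stale
        }
        runInternal();
        rescheduleIfNecessary();         // auto-repeat until closed or no longer needed
    }
}
```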

this.clusterService = clusterService;
clusterService.addListener(this);
clusterService.getClusterSettings().addSettingsUpdateConsumer(CLUSTER_TASKS_ALLOCATION_RECHECK_INTERVAL_SETTING,
Member

Can you add the settings update consumer last, after the periodicRechecker is instantiated?

@droberts195
Contributor Author

I think I've addressed all the feedback - please could you have another look @bleskes?

Contributor

@bleskes bleskes left a comment

Looking good - I left some more comments.

}

void setRecheckInterval(TimeValue recheckInterval) {
this.recheckInterval = recheckInterval;
Contributor

this feels awkward. Why do we need to store it in two different places?

Contributor Author

Good point - we don't

@@ -241,21 +269,47 @@ public void clusterStateProcessed(String source, ClusterState oldState, ClusterS

@Override
public void clusterChanged(ClusterChangedEvent event) {
periodicRechecker.cancel();
Contributor

I find it easier to reason about this if we only cancel a scheduled check if this event is really relevant, i.e., if shouldReassignPersistentTasks returns true. This will also avoid the need to have a re-schedule if we didn't end up doing anything.

private boolean anyTaskNeedsReassignment(final PersistentTasksCustomMetaData tasks, final ClusterState state) {
for (PersistentTask<?> task : tasks.tasks()) {
if (needsReassignment(task.getAssignment(), state.nodes())) {
Assignment assignment = createAssignment(task.getTaskName(), task.getParams(), state);
Contributor

this method has weird naming (semantics?) - it's called anyTaskNeedsReassignment and here we know that needsReassignment returned true, why don't we return true immediately?

Contributor Author

How does anyTaskReassignmentRequired sound? I think in combination with renaming needsReassignment to isAssignedToValidNode it makes the code clearer.

*/
private boolean isAnyTaskUnassigned(final PersistentTasksCustomMetaData tasks, final ClusterState state) {
for (PersistentTask<?> task : tasks.tasks()) {
if (needsReassignment(task.getAssignment(), state.nodes())) {
Contributor

the method is called isAnyTaskUnassigned but the code checks that a task needsReassignment - can you double check what we need and rename accordingly?

Contributor Author

needsReassignment predates this change. I'll rename it to isAssignedToValidNode, which reflects what it actually does.


public void testManualRepeat() throws InterruptedException {

final AtomicReference<CountDownLatch> latch = new AtomicReference<>(new CountDownLatch(1));
Contributor

I wonder if you want a CyclicBarrier here.
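
For comparison, an illustrative snippet (not the test's actual code) of why a CyclicBarrier can replace the CountDownLatch-in-an-AtomicReference idiom: the barrier resets itself after each rendezvous, so one instance can be reused across repeats:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class BarrierExample {
    public static void main(String[] args) throws InterruptedException, BrokenBarrierException {
        // A CyclicBarrier resets after each rendezvous, so a test waiting for a
        // repeating task to fire N times can reuse one barrier instead of swapping
        // fresh CountDownLatch instances into an AtomicReference.
        CyclicBarrier barrier = new CyclicBarrier(2);

        Thread task = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                try {
                    barrier.await();        // rendezvous with the test thread, then repeat
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
            }
        });
        task.start();

        for (int run = 0; run < 3; run++) {
            barrier.await();                // released once per task iteration
            System.out.println("task completed run " + (run + 1));
        }
        task.join();
    }
}
```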

@@ -136,7 +136,6 @@ public void testRefreshTaskIsUpdated() throws IOException {
assertNotSame(refreshTask, indexService.getRefreshTask());
assertTrue(refreshTask.isClosed());
assertFalse(refreshTask.isScheduled());
assertFalse(indexService.getRefreshTask().mustReschedule());
Contributor

why is this changed?

Contributor Author

Because checking whether close() has been called is now outside of the control of the derived class. mustReschedule() on the derived class just checks extra conditions over and above whether the task has been closed. (mustReschedule() is protected - it's not meant to be called by external users.) If AbstractAsyncTaskTests is doing its job properly then assertTrue(refreshTask.isClosed()); on line 137 should be enough in this file to confirm that the task will never be scheduled again.

Contributor

k

@droberts195
Contributor Author

@bleskes I think I've addressed your second batch of comments now - please can you have another look?

periodicRechecker.cancel();
reassignPersistentTasks(event.state().getVersion());
} else {
scheduleRecheckIfUnassignedTasks(event.state());
Contributor

left over?

Contributor Author

Something is needed here. I added a slightly different version back in 610f25e with a comment to make it clearer why.

* Returns true if any persistent task provided is unassigned,
* i.e. is not assigned or is assigned to a non-existing node.
*/
private boolean isAnyTaskUnassigned(final PersistentTasksCustomMetaData tasks, final ClusterState state) {
Contributor

still some naming issues here - how about isAnyTasksRequiresAssignment?

@@ -330,7 +400,7 @@ static boolean persistentTasksChanged(final ClusterChangedEvent event) {
}

/** Returns true if the task is not assigned or is assigned to a non-existing node */
public static boolean needsReassignment(final Assignment assignment, final DiscoveryNodes nodes) {
public static boolean isAssignedToValidNode(final Assignment assignment, final DiscoveryNodes nodes) {
Contributor

how about requiresAssignment? ("is assigned" suggests it is, but we also consider unassigned tasks)

* Submit a cluster state update to reassign any persistent tasks that need reassigning
*/
private void reassignPersistentTasks(long currentStateVersion) {
logger.trace("checking task reassignment for cluster state {}", currentStateVersion);
Contributor

it's weird to have a parameter passed in just for a trace-level log of the version. Can you remove it and leave the logging in the event method?

if (tasks != null && anyTaskReassignmentRequired(tasks, state)) {
reassignPersistentTasks(state.getVersion());
} else {
scheduleRecheckIfUnassignedTasks(state);
Contributor

this is unneeded, no? We already checked in this if, so why check again?

Contributor

@bleskes bleskes left a comment

Looking great. I made another pass and left some nits.

});
reassignPersistentTasks();
} else if (periodicRechecker.isScheduled() == false &&
isAnyTaskUnassigned(event.state().getMetaData().custom(PersistentTasksCustomMetaData.TYPE))) {
Contributor

Any time that a task can't be allocated when it should have been, we schedule a periodic check. This means this else clause is not needed. Can you clarify why this is needed? (It also creates a race condition when one cluster change has triggered submission of a cluster state task via reassignPersistentTasks and then another cluster state change comes in and schedules a check.)

Contributor Author

The CI for commit 824ffab failed without this extra check: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/1905/console

shouldReassignPersistentTasks returns false if there are unassigned tasks but the assignment would still be the same for them.

createPersistentTask sets the initial assignment for a newly created task, which will then be the same when it goes through shouldReassignPersistentTasks, causing shouldReassignPersistentTasks to return false and bypass the scheduling of the periodic check.

A solution that avoids the race condition you pointed out is to schedule a recheck in the anonymous cluster state update class in createPersistentTask. That's in bcb1daf.


@Override
protected String getThreadPool() {
return ThreadPool.Names.GENERIC;
Contributor

This can be ThreadPool.Names.SAME, I think.


public void testAutoRepeat() throws InterruptedException {

final AtomicReference<CountDownLatch> latch1 = new AtomicReference<>(new CountDownLatch(1));
Contributor

you didn't like the CyclicBarrier?

} catch (InterruptedException e) {
fail("interrupted");
}
if (randomBoolean()) {
Contributor

can you sample this in advance (before the task is generated) so it will be reproducible?


assertFalse(task.isScheduled());
task.rescheduleIfNecessary();
barrier.await(10, TimeUnit.SECONDS); // should happen very quickly
Contributor

we typically wait indefinitely, so if things get stuck we get a suite timeout + stack dump and know where things got stuck

task.setInterval(TimeValue.timeValueMillis(1));
assertTrue(task.isScheduled());
// This should only take 2 milliseconds in ideal conditions, but allow 10 seconds in case of VM stalls
assertTrue(latch.await(10, TimeUnit.SECONDS));
Contributor

same comment


PersistentTasksClusterService persistentTasksClusterService =
internalCluster().getInstance(PersistentTasksClusterService.class, internalCluster().getMasterName());
// Speed up rechecks to a rate that is quicker than what settings would allow
persistentTasksClusterService.setRecheckInterval(TimeValue.timeValueMillis(10));
Contributor

This will still make a slow test. How about calling the setTimeInterval method in PersistentTasksClusterService directly and setting it to something much faster?

Contributor Author

I changed it to 1ms here and the test runs in 85ms on my laptop now.

Contributor

@bleskes bleskes left a comment

LGTM. Thanks @droberts195

@droberts195
Contributor Author

run gradle build tests 1

@droberts195 droberts195 merged commit 13cb0fb into elastic:master Dec 13, 2018
@droberts195 droberts195 deleted the periodic_reallocation_attempt branch December 13, 2018 09:15
droberts195 added a commit that referenced this pull request Dec 13, 2018
droberts195 added a commit that referenced this pull request Dec 14, 2018
After #36069 the
approach for reallocating ML persistent tasks after refreshing
job memory requirements can be simplified.
Labels
:Distributed/Task Management Issues for anything around the Tasks API - both persistent and node level. >enhancement v6.6.0 v7.0.0-beta1

5 participants