KAFKA-4677: Avoid unnecessary task movement across threads during rebalance #2429

dguy · 2017-01-24T17:04:05Z

Makes task assignment stickier by preferring to assign each task to a client that previously had it as an active task. If no client previously had the task active, search for a client that had it as a standby. Finally, fall back to the least loaded client.
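The preference order described above can be sketched roughly as follows. This is a minimal illustration, not the PR's StickyTaskAssignor: the `StickyClient` and `StickyPreferenceSketch` names are hypothetical, tasks are plain strings, and capacity is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a client's state (not the PR's ClientState).
final class StickyClient {
    final String name;
    final Set<String> prevActive = new HashSet<>();
    final Set<String> prevStandby = new HashSet<>();
    final List<String> assigned = new ArrayList<>();

    StickyClient(final String name) {
        this.name = name;
    }
}

final class StickyPreferenceSketch {
    // Pick a client for taskId: a previous active owner first, then a previous
    // standby owner, then whichever client currently has the fewest tasks.
    static StickyClient choose(final String taskId, final List<StickyClient> clients) {
        for (final StickyClient c : clients) {
            if (c.prevActive.contains(taskId)) {
                return c;
            }
        }
        for (final StickyClient c : clients) {
            if (c.prevStandby.contains(taskId)) {
                return c;
            }
        }
        // Assumes at least one client; ties go to the earliest client in the list.
        StickyClient leastLoaded = clients.get(0);
        for (final StickyClient c : clients) {
            if (c.assigned.size() < leastLoaded.assigned.size()) {
                leastLoaded = c;
            }
        }
        return leastLoaded;
    }
}
```

The real assignor also has to weigh client capacity against stickiness, which is what most of the review discussion below is about.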

dguy · 2017-01-24T17:37:07Z

@guozhangwang @mjsax @enothereska

asfbot · 2017-01-24T17:49:16Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1150/
Test FAILed (JDK 8 and Scala 2.11).

asfbot · 2017-01-24T18:25:38Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1152/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-24T18:27:27Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1150/
Test PASSed (JDK 7 and Scala 2.10).

asfbot · 2017-01-24T18:59:07Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1150/
Test PASSed (JDK 8 and Scala 2.12).

mjsax

Very nice PR! Code is super clean. Tests are amazing!

Still, some comments ;P

mjsax · 2017-01-24T22:29:46Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+    }
+
+    @Test
+    public void shouldNotMigrateActiveTaskToOtherProcess() throws Exception {


Just wondering if this test is good. Seems like a "RoundRobinAssigner" might compute the exact same assignment. Some more "randomness" would be good IMHO. (Not a must though.)

Thanks. Yes it needs to actually test the reverse previous assignment, too. Which indeed highlights a problem.

mjsax · 2017-01-24T22:33:36Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+
+        taskAssignor.assign(0);
+
+        assertThat(clients.get(p2).activeTasks(), hasItems(task01));


hasItem(task01) -> equalTo(Collections.singleton(task01))

mjsax · 2017-01-24T22:34:57Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+    }
+
+    @Test
+    public void shouldKeepActiveTaskStickynessWhenMoreClientThanActiveTasks() {


As above: add more "randomness"

mjsax · 2017-01-24T22:45:15Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+    @Test
+    public void shouldAssignToClientWithStandbyIfProcessWithPreviousReachedCapacity() throws Exception {
+        final ClientState<TaskId> client = createClientWithPreviousActiveTasks(p1, 1, task00, task01);
+        // give p1 assignment so already at capacity


Comments unnecessary IMHO -- test is pretty well written and speaks for itself -- test method name nails it. (really starting to enjoy your way of writing code -- need to get closer to this for my own code)

I'm happy you said that the comment is unnecessary! :-)

mjsax · 2017-01-24T22:58:38Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+    @Test
+    public void shouldAssignStandbyTasksToClientThatDontHaveSameActiveTask() throws Exception {
+        final TaskId task03 = new TaskId(0, 3);
+        createClientWithActiveTasks(p1, 1, task00);


final Integer p4 = 4;
And use below. (or add as class member together with task03 in the first place -- even if only used here, it reduces "noise" in reading the test).

But why do you need 4 clients/task here? Three would do, too, IMHO.

Strictly speaking we don't need 4, but i wanted to mix it up a bit as 3 is generally used throughout. Good to try something different.

mjsax · 2017-01-25T01:08:48Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+    }
+
+    @Test
+    public void shouldAssignActiveAndStandbyTasks() throws Exception {


Seems to be covered by shouldAssignMultipleReplicasOfStandbyTask() already

Not quite - this is testing specifically that assign(numStandbyReplicas) assigns both active and standby tasks (when numStandbyReplicas is > 0). The other method is just testing StandbyTask assignment only

mjsax · 2017-01-25T01:10:18Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+
+    @Test
+    public void shouldAssignAtLeastOneTaskToEachClientIfPossible() throws Exception {
+        // add a process with 3 threads


remove comments (as above)

mjsax · 2017-01-25T01:15:15Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/ClientState.java

@@ -17,40 +17,34 @@

 package org.apache.kafka.streams.processor.internals.assignment;

+
 import java.util.HashSet;
 import java.util.Set;

 public class ClientState<T> {


Not part of this PR: but why does ClientState have a generic type? It's only used with TaskId anyway? Or did I miss anything?

In unit tests we use Float as its type as well.

@mjsax - i had the same thought and did start down the path of changing it. Soon realised it was a battle for another day as it would've muddied the PR

@guozhangwang @dguy I did have a quick look at the tests using ClientState<Integer> -- I actually do not see any reason why those tests need to use Integer instead of TaskId. It might be a slight simplification for the test, but if we don't need the generic type in the actual code, a test does not justify to have a generic IMHO and I have a strong opinion about removing it. This would be a nice "beginner" PR so not much work for us. But I want your opinion before I create a JIRA for it.

@mjsax - yes i agree. As i mentioned above, i did start doing that as part of the PR but didn't want to add too much noise to it.

mjsax · 2017-01-25T01:29:51Z

...rc/main/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignor.java

+            return null;
+        }
+        final HashSet<ID> constrainTo = new HashSet<>(ids);
+        constrainTo.retainAll(clientsWithin);


constrainTo is never used. I guess you can remove both lines. ids should not contain any invalid IDs anyway -- otherwise it's a bug and retainAll would only mask the bug, but not fix it.

oops - the constraint is needed, but obviously there wasn't a test case to prove it. Will have to add one
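For readers unfamiliar with the idiom in the two lines under discussion, here is a small sketch of what copying a set and calling `retainAll` does. The `ConstrainSketch` class and its method are hypothetical names used only for illustration; how the PR ends up using the constrained set is not shown in this thread.

```java
import java.util.HashSet;
import java.util.Set;

final class ConstrainSketch {
    // Return the candidates that are also in clientsWithin.
    // The copy ensures neither input set is modified.
    static <ID> Set<ID> constrainTo(final Set<ID> candidates, final Set<ID> clientsWithin) {
        final Set<ID> constrained = new HashSet<>(candidates);
        constrained.retainAll(clientsWithin);
        return constrained;
    }
}
```

The copy matters: calling `retainAll` directly on the caller's set would silently shrink it.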

mjsax · 2017-01-25T01:39:53Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+                                                                            new TaskId(1, 2));
+
+        taskAssignor.assign(0);
+        assertTrue("expected client 2 to have more assigned tasks than client 1", clients.get(p2).assignedTaskCount() > clients.get(p1).assignedTaskCount());


Nit: line too long

Thinking about this, I guess we could improve StickyTaskAssignor. If I am not off, load balancing on stream basis is not optimal -- but I am also not sure if the effort to improve it is worth it...

If we extend this test to assign more tasks, let's say 12, client p2 will get 7 tasks assigned and p1 will get 5 tasks assigned (while it would be better to assign 8 tasks to p2 such that all 3 threads get 4 tasks each). The problem is that the capacity factors are not considered: p2 should get twice as many tasks assigned as p1 -- but the algorithm says only "more" -- and this "more" is determined by the diff of the capacity (i.e., in this case p2 will get at most 2 more tasks assigned than p1).

Or maybe my analysis is wrong (I did not run the code and step through it.)

Any thoughts about this @dguy @guozhangwang -- this is a different issue than the newly created JIRA (even if we might be able to subsume it with it). The issue described here would also be there if we only assign stateful standby tasks.

mjsax · 2017-01-25T06:43:31Z

One more general comment:

This RS raised the thought, that our standby task configuration is rather "all or nothing". Do you think it would make sense to allow a more fine grained configuration? I.e., allow to specify the number of standby replicas individually for each stateful operator?

Furthermore: if I understand the PR correctly, we create standby tasks even for stateless tasks -- that seems to be rather odd (even if not necessarily harmful).

guozhangwang

There is a trade-off between load balancing and task stickiness. I think in many cases we should favor the latter over the former, but it seems we are only favoring that until the client has number of assigned tasks equal to the number of threads?

For example, say we have two clients with one thread (hence one capacity) each and four tasks, and each client is assigned two tasks (1, 2), (3, 4); say a new topic is discovered and a new rebalance is triggered with no tasks added and no members changed, is it possible that we will then get (1, 3) and (2, 4) since each client runs over capacity?

guozhangwang · 2017-01-25T06:59:42Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/ClientState.java

@@ -17,40 +17,34 @@

 package org.apache.kafka.streams.processor.internals.assignment;

+
 import java.util.HashSet;
 import java.util.Set;

 public class ClientState<T> {


In unit tests we use Float as its type as well.

guozhangwang · 2017-01-25T07:05:10Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/ClientState.java

            "]";
    }
+
+    boolean reachedCapacity() {
+        return assignedTasks.size() >= capacity;


Capacity is the number of consumers, i.e. stream threads on this client; and one thread should be able to handle multiple tasks. Should we use this criterion that the assigned tasks has exceeded the number of threads to determine if the client is "full"?

We need something? What else are you suggesting? It is not like you can just say it is 4 * threads etc, as not all threads are equal. So it seems simplest just to stick with this.

@dguy This is related to the general comment I had above: do we still try to favor stickiness after each thread has assigned a task? In my general comment I asked for an example, and I'm thinking now about another slightly different one:

Say we have two clients with one thread (hence one capacity) each and four tasks, and each client is assigned two tasks (1, 2), (3, 4), so both of them are "over-capacity" now. When adding a new client with one thread, one of the tasks will be migrated to that client, say task4. The question is: with this assignor, do we enforce the assignment to be (1, 2), (3), (4), or can task2 be randomly assigned to another client after task1 has been assigned to client1?

Yes - we will enforce the assignment to be along these lines. i.e., only 1 task will move. So the assignment could be either (1, 2), (3), (4) or (1), (3, 4), (2)
I'll commit some updates soon

guozhangwang · 2017-01-25T07:07:37Z

...rc/main/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignor.java

+            for (final TaskId taskId : taskIds) {
+                final Set<ID> ids = findClientsWithoutAssignedTask(taskId);
+                if (ids.isEmpty()) {
+                    continue;


guozhangwang · 2017-01-25T07:10:35Z

...rc/main/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignor.java

+        return previous;
+    }
+
+    private ClientState<TaskId> clientWithPreviousAssignment(final TaskId taskId, final Set<ID> clientsWithin) {


nit: findClientsWithPreviousAssignedTask?

guozhangwang · 2017-01-25T07:12:50Z

...rc/main/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignor.java

+        return findLeastLoadedClientWithStandby(taskId, clientsWithin);
+    }
+
+    private ClientState<TaskId> findLeastLoadedClientWithStandby(final TaskId taskId, final Set<ID> clientsWithin) {


nit: findLeastLoadedClientWithPreviousStandByTask?

guozhangwang · 2017-01-25T07:46:52Z

...rc/main/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignor.java

+                if (ids.isEmpty()) {
+                    continue;
+                }
+                final ClientState<TaskId> client = findClient(taskId, ids);


In the old approach, we use taskPairs trying to distribute the standby tasks to better tolerate failures; for example if we have four clients, four tasks, and replica.num = 1, we would rather have

client1: active: t1, standby: t2
client2: active: t2, standby: t3
client3: active: t3, standby: t4
client4: active: t4, standby: t1

Instead of

client1: active: t1, standby: t2
client2: active: t2, standby: t1
client3: active: t3, standby: t4
client4: active: t4, standby: t3

since otherwise if we lose client1 and client2, we lose t2 only in the first case but will lose both t1 and t2 in the second case.

Thanks @guozhangwang. That makes sense i'll add it back in.
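The failure-tolerance argument above can be checked mechanically. The sketch below is only an illustration: `StandbyLayoutSketch`, `RING` and `PAIRED` are hypothetical names, and each client is modeled as a plain list of the tasks it hosts (active first, then standby). A task counts as lost when no surviving client hosts a copy of it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class StandbyLayoutSketch {
    // The spread-out layout from the comment: each standby lives on the "next" client.
    static final List<List<String>> RING = Arrays.asList(
        Arrays.asList("t1", "t2"), Arrays.asList("t2", "t3"),
        Arrays.asList("t3", "t4"), Arrays.asList("t4", "t1"));

    // The paired layout: clients 1 and 2 back each other up, as do clients 3 and 4.
    static final List<List<String>> PAIRED = Arrays.asList(
        Arrays.asList("t1", "t2"), Arrays.asList("t2", "t1"),
        Arrays.asList("t3", "t4"), Arrays.asList("t4", "t3"));

    // layout.get(i) lists the tasks hosted by client i + 1.
    // Returns the tasks with no copy left once failedClients are gone.
    static Set<String> lostTasks(final List<List<String>> layout, final Set<Integer> failedClients) {
        final Set<String> surviving = new HashSet<>();
        final Set<String> lost = new TreeSet<>();
        for (int i = 0; i < layout.size(); i++) {
            lost.addAll(layout.get(i));
            if (!failedClients.contains(i + 1)) {
                surviving.addAll(layout.get(i));
            }
        }
        lost.removeAll(surviving);
        return lost;
    }
}
```

With clients 1 and 2 failing, the first layout loses only t2, while the paired layout loses both t1 and t2.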

asfbot · 2017-01-25T12:01:19Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1184/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-25T12:02:30Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1182/
Test PASSed (JDK 8 and Scala 2.12).

asfbot · 2017-01-25T12:11:11Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1182/
Test PASSed (JDK 7 and Scala 2.10).

dguy · 2017-01-25T17:58:42Z

@guozhangwang, @mjsax - comments addressed
@guozhangwang regarding:

There is a trade-off between load balancing and task stickiness. I think in many cases we should favor the latter over the former, but it seems we are only favoring that until the client has number of assigned tasks equal to the number of threads?

For example, say we have two clients with one thread (hence one capacity) each and four tasks, and each client is assigned two tasks (1, 2), (3, 4); say a new topic is discovered and a new rebalance is triggered with no tasks added and no members changed, is it possible that we will then get (1, 3) and (2, 4) since each client runs over capacity?

I've changed it so that it will tend to favour stickiness more, so the scenario you have mentioned won't happen. In this case the previous assignments will stay the same, but 1 of the clients will get the new task.

@mjsax, regarding your general comment - as discussed offline i think this is a task for another JIRA. I raised https://issues.apache.org/jira/browse/KAFKA-4696.

asfbot · 2017-01-25T18:22:13Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1191/
Test PASSed (JDK 7 and Scala 2.10).

asfbot · 2017-01-25T18:42:49Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1195/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-25T18:43:09Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1193/
Test PASSed (JDK 8 and Scala 2.12).

mjsax · 2017-01-25T19:45:53Z

Any thought about this:

This RS raised the thought, that our standby task configuration is rather "all or nothing". Do you think it would make sense to allow a more fine grained configuration? I.e., allow to specify the number of standby replicas individually for each stateful operator?

guozhangwang · 2017-01-25T19:55:55Z

Any thought about this:

It is an interesting question. Today num.replicas is on tasks, which is abstracted away for end users; so as you mentioned, if we want finer-grained, user-customizable configuration then it needs to be per-operator, which I feel would be quite tricky for users to understand / specify.

Any thoughts about this @dguy @guozhangwang -- this is a different issue than the newly created JIRA (even if we might be able to subsume it with it). The issue described here would also be there if we only assign stateful standby tasks.

I think I agree. It seems that once all clients are over capacity then we just treat them equally when assigning (assuming no prev standby / active tasks considered). It seems better for the assignment of tasks to over-capacity clients to also depend on their total capacity. For example, if client1 has cap. 1, client2 has cap. 2, and we have a total of 6 tasks, it would be better to assign them as 2 : 4 to these two clients than 3 : 3?
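One way to get the 2 : 4 split described here is to hand out tasks one at a time to the client with the lowest tasks-per-capacity ratio. The sketch below is a hypothetical illustration of that idea only (the `CapacityWeightedSketch` name is made up, stickiness is ignored, and all capacities are assumed positive); it is not the algorithm in this PR.

```java
final class CapacityWeightedSketch {
    // Assign numTasks one at a time to the client whose tasks/capacity ratio is
    // currently lowest; ties go to the lower index. Returns the task count per client.
    // Assumes every capacity is greater than zero.
    static int[] distribute(final int numTasks, final int[] capacities) {
        final int[] counts = new int[capacities.length];
        for (int t = 0; t < numTasks; t++) {
            int best = 0;
            for (int i = 1; i < capacities.length; i++) {
                final double load = (double) counts[i] / capacities[i];
                final double bestLoad = (double) counts[best] / capacities[best];
                if (load < bestLoad) {
                    best = i;
                }
            }
            counts[best]++;
        }
        return counts;
    }
}
```

Under these assumptions, capacities of 1 and 2 yield 2 : 4 for 6 tasks and 4 : 8 for the 12-task case raised earlier in the thread.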

asfbot · 2017-01-25T20:12:30Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1193/
Test FAILed (JDK 7 and Scala 2.10).

mjsax · 2017-01-25T20:32:20Z

About fine grained configuration:
Yes, users should not need to think about tasks. It should be on operator basis. Thus, we would need to make standby tasks "smarter" and only populate some stores (as requested by the user) in the background. Would be a bigger change, but not impossible.

For example, if client1 has cap. 1, client2 has cap. 2, and we have a total of 6 tasks, it would be better to assign them as 2 : 4 to these two clients than 3 : 3?

Yes, that is exactly what I had in mind.

guozhangwang · 2017-01-25T21:49:51Z

https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1193/testReport/junit/org.apache.kafka.streams.integration/QueryableStateIntegrationTest/queryOnRebalance_0_/

Is this related to the change or maybe an existing issue that is not fixed yet?

dguy · 2017-01-26T08:53:49Z

@guozhangwang - i believe the test failure is unrelated.

dguy · 2017-01-26T09:58:40Z

@mjsax - regarding this:

Any thought about this:

This RS raised the thought, that our standby task configuration is rather "all or nothing". Do you think it would make sense to allow a more fine grained configuration? I.e., allow to specify the number of standby replicas individually for each stateful operator?

We probably need a wider discussion on this.

dguy · 2017-01-26T10:08:07Z

@guozhangwang

I think I agree. It seems that once all clients are over capacity then we just treat them equally when assigning (assuming no prev standby / active tasks considered). It seems better for the assignment of tasks to over-capacity clients to also depend on their total capacity. For example, if client1 has cap. 1, client2 has cap. 2, and we have a total of 6 tasks, it would be better to assign them as 2 : 4 to these two clients than 3 : 3?

This already happens. The test shouldAssignMoreTasksToClientWithMoreCapacity shows this.
@mjsax was referring to having more tasks. I'll expand the test and see what i can come up with

guozhangwang · 2017-01-26T22:04:36Z

This already happens. The test shouldAssignMoreTasksToClientWithMoreCapacity shows this.
@mjsax was referring to having more tasks. I'll expand the test and see what i can come up with

I think we are talking about the same thing: when all clients are over-capacity already, should num.tasks be proportional to each client's total capacity? From the tests it seems to be the case.

guozhangwang

One more comment, otherwise LGTM. Thanks @dguy for the patch!!

Let's update the created JIRA for keeping track of the broader discussion of per-operator num.replica configuration.

guozhangwang · 2017-01-26T22:08:31Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/ClientState.java

+    }
+
+    boolean hasMoreAvailableCapacityThan(final ClientState<T> other) {
+        final int otherLoad = other.assignedTaskCount() / other.capacity;


Should these two be floats rather than integers? Otherwise, e.g., 5/3 and 4/3 would be the same.
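The truncation concern is easy to demonstrate in isolation. In the sketch below, `LoadComparisonSketch` and its two methods are hypothetical helpers, not code from ClientState; they only contrast integer and floating-point division on the numbers from the comment.

```java
final class LoadComparisonSketch {
    // Integer division truncates toward zero: 5 / 3 and 4 / 3 both evaluate to 1.
    static int intLoad(final int tasks, final int capacity) {
        return tasks / capacity;
    }

    // Casting one operand to double keeps the fractional part.
    static double doubleLoad(final int tasks, final int capacity) {
        return (double) tasks / capacity;
    }
}
```

With integer loads, a client holding 5 tasks on 3 threads looks exactly as loaded as one holding 4 tasks on 3 threads, so a "has more available capacity" comparison cannot tell them apart.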

asfbot · 2017-01-26T22:43:33Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1269/
Test PASSed (JDK 7 and Scala 2.10).

dguy · 2017-01-27T08:46:01Z

Thanks @guozhangwang. I've updated the new JIRA with the per-operator num.replica info. I've also re-opened the queryOnRebalance JIRA.
Updated the code to use double rather than integer in hasMoreAvailableCapacityThan - also added checks for capacity <= 0

asfbot · 2017-01-27T10:17:28Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1285/
Test FAILed (JDK 7 and Scala 2.10).

asfbot · 2017-01-27T10:18:01Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1287/
Test FAILed (JDK 8 and Scala 2.11).

asfbot · 2017-01-27T12:36:03Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1285/
Test FAILed (JDK 8 and Scala 2.12).

guozhangwang · 2017-01-27T15:46:55Z

retest this please

asfbot · 2017-01-27T16:33:07Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1291/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-27T17:30:26Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1289/
Test PASSed (JDK 7 and Scala 2.10).

asfbot · 2017-01-27T17:33:48Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1289/
Test FAILed (JDK 8 and Scala 2.12).

mjsax · 2017-01-30T19:22:21Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/ClientState.java

    }

-    public ClientState(double capacity) {
+    ClientState(int capacity) {


add final

mjsax · 2017-01-30T19:22:30Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/ClientState.java

        this(new HashSet<T>(), new HashSet<T>(), new HashSet<T>(), new HashSet<T>(), new HashSet<T>(), capacity);
    }

-    private ClientState(Set<T> activeTasks, Set<T> standbyTasks, Set<T> assignedTasks, Set<T> prevActiveTasks, Set<T> prevAssignedTasks, double capacity) {
+    private ClientState(Set<T> activeTasks, Set<T> standbyTasks, Set<T> assignedTasks, Set<T> prevActiveTasks, Set<T> prevAssignedTasks, int capacity) {


add final to all

Unify this with ClientState(int capacity)? (i.e., remove the private constructor with the long parameter list and just initialize members directly.)

The private constructor is there for the copy method. I'm going to leave it as is.

mjsax · 2017-01-30T19:22:42Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/ClientState.java

@@ -59,28 +53,102 @@ private ClientState(Set<T> activeTasks, Set<T> standbyTasks, Set<T> assignedTask
    }

    public void assign(T taskId, boolean active) {


add final to both

well i can yes, but i don't believe i've changed this!

mjsax · 2017-01-30T19:54:14Z

...rc/main/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignor.java

+            for (final TaskId taskId : taskIds) {
+                final Set<ID> ids = findClientsWithoutAssignedTask(taskId);
+                if (ids.isEmpty()) {
+                    log.warn("Unable to assign replica for task [{}]", taskId);


Don't like that we write this message multiple times.
What about cloning taskIds before the for loop, and removing a taskId from it when it cannot be assigned?
Also update the log message like
log.warn("Could only create {} stand-by tasks for task [{}] instead of {} as requested. Not enough task available. You need to increase the number of threads to maintain the requested number of standby tasks.", i, taskId, numStandbyReplicas);
Or some similar hint about what the issue is and how to fix it.

updated so it only logs once. Also improved the log message. thanks

mjsax · 2017-01-30T21:04:18Z

...s/src/test/java/org/apache/kafka/streams/processor/internals/assignment/ClientStateTest.java

+
+public class ClientStateTest {
+
+    private final ClientState<Integer> client = new ClientState<>(1);


Should this be a new instance for each test method? I am wondering how the tests could pass?

It is a new instance for each test method. JUnit creates a new instance of the class for each test method.

I thought some annotation would be needed for that... What is the JUnit rule for this? What members get re-created for each single test method?

All non-static members. You get a new instance of the Test class for each test method.

Thanks! Good to know :)

mjsax · 2017-01-30T21:24:17Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+
+    @Test
+    public void shouldAssignTasksToClientWithPreviousStandbyTasks() throws Exception {
+        final ClientState<TaskId> client1 = createClientWithPreviousActiveTasks(p1, 1);


Can't you use createClient? This call confused me... (same below)

Yeah - i thought i'd replaced all of those

mjsax · 2017-01-30T21:27:53Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+        assertThat(clients.get(p2).standbyTasks(), not(hasItems(task01)));
+        assertThat(clients.get(p3).standbyTasks(), not(hasItems(task02)));
+        assertThat(clients.get(p4).standbyTasks(), not(hasItems(task03)));
+        assertThat(allStandbyTasks(), equalTo(Arrays.asList(task00, task01, task02, task03)));


Maybe check that each task has one standby task assigned?

It is not always possible (with the current algo) to guarantee that each task will get 1 task. It depends somewhat on the previous active tasks and the order that the tasks are iterated in. I can change the test to make sure that at least 3 tasks have standby and that no task has more than 2

In general that might be correct. But for this specific test setup, we know the preconditions: the overall logic dictates that active tasks get reassigned to their previous client -- if this did not happen, there would be something wrong. Thus, the 4 active tasks are "pinned" to the 4 clients by precondition and thus, in order to load balance the standby tasks, each client must get exactly one -- or do I miss something?

Yeah you missed something. I can make the setup of the test such that this happens, but with the current params setup we get 1 client with 2 tasks, 2 clients with 1 task, and 1 client with 0 tasks. This happens because we would otherwise end up with the standby and active task for task03 being on the same node, i.e.,
active assignment:
task00 -> p1
task01 -> p2
task02 -> p3
task03 -> p4

standby assignment
task00 -> p2
task01 -> p3
task02 -> p1
task03 -> p1 (as it can't go to p4)

Thanks for clarification. Does make sense now -- I did not consider the "task pairs" heuristic.

mjsax · 2017-01-30T21:30:43Z

...est/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignorTest.java

+    }
+
+    @Test
+    public void shouldNotAssignStandbyTaskReplicasWhenNoClientHasCapacityLeftOver() throws Exception {


It's not about capacity, is it? It's about having no client that does not already have the task assigned (as active or standby). -> shouldNotAssignStandbyTaskReplicasWhenNoClientAvailableWithoutHavingTheTaskAssigned

mjsax · 2017-01-31T01:21:08Z

...rc/main/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignor.java

+                && client.reachedCapacity()
+                && availableCapacity <= taskIds.size()
+                && hasClientsWithZeroTasks();
+    }


I don't get this condition:
I understand (1) !hasNewTasks (that is only true if we scale out and do not lose any existing client, right?) and I understand (2) client.reachedCapacity() (no reason to assign somewhere else if the client has spare capacity).

But the last two conditions puzzle me.

(3) Let's say I have 1 client with capacity 1 and overall 2 tasks, both assigned to it (ie, taskIds.size() == 2). I add a new client with capacity 2. Now availableCapacity == 3 and taskIds.size() is still 2. Thus availableCapacity <= taskIds.size() (3 <= 2) is false, and no load rebalancing would happen. However, this would result in an overload of the original client (2 tasks assigned with capacity 1), while the new client is under utilized (0 tasks assigned with capacity 2). What do I miss?

(4) Let's say I have 1 client with capacity 1 and overall 3 tasks, all assigned to it (ie, taskIds.size() == 3). I add a new client with capacity 2. Condition (3) would be met because (3 <= 3) so one task would be re-assigned from its original client to the new client. But not the second task, because after the first client got a task assigned there is no client with zero tasks left. Thus the original client would still be overloaded (2 tasks assigned with capacity 1) while the new client is under utilized (1 task assigned with capacity 2). What do I miss?

Thanks for pointing that out. It was an evolution of tests that led to it, but the third condition is not required and the 4th condition should be hasClientsWithMoreAvailableCapacity(client). I've added a test and changed it. Thanks!

mjsax · 2017-01-31T01:38:17Z

...rc/main/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignor.java

+        this.clients = clients;
+        this.taskIds = taskIds;
+        this.availableCapacity = sumCapacity(clients.values());
+        taskPairs = new TaskPairs(taskIds.size() * (taskIds.size() - 1) * 2);


/ 2, not * 2

Oops - thanks!
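The correction follows from counting unordered pairs: n tasks give n * (n - 1) ordered pairs, and each unordered pair is counted twice, hence the division by 2. The sketch below checks the formula against a brute-force count; `TaskPairCountSketch` and its methods are hypothetical names, not part of the PR's TaskPairs class.

```java
final class TaskPairCountSketch {
    // Number of distinct unordered pairs among numTasks tasks: n * (n - 1) / 2.
    static int maxPairs(final int numTasks) {
        return numTasks * (numTasks - 1) / 2;
    }

    // Brute-force count of pairs (i, j) with i < j, used to cross-check the formula.
    static int countByEnumeration(final int numTasks) {
        int pairs = 0;
        for (int i = 0; i < numTasks; i++) {
            for (int j = i + 1; j < numTasks; j++) {
                pairs++;
            }
        }
        return pairs;
    }
}
```

For four tasks this gives 6 pairs, whereas the original `* 2` expression would have yielded 24.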

asfbot · 2017-01-31T10:55:40Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1348/
Test FAILed (JDK 7 and Scala 2.10).

asfbot · 2017-01-31T10:58:58Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1352/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-31T11:18:54Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1348/
Test FAILed (JDK 8 and Scala 2.12).

dguy · 2017-01-31T11:31:54Z

comments all addressed

asfbot · 2017-01-31T11:38:06Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1350/
Test FAILed (JDK 7 and Scala 2.10).

asfbot · 2017-01-31T12:17:47Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1350/
Test PASSed (JDK 8 and Scala 2.12).

asfbot · 2017-01-31T12:19:39Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1354/
Test PASSed (JDK 8 and Scala 2.11).

mjsax · 2017-01-31T17:37:38Z

@dguy One more minor comment.

mjsax

LGTM

guozhangwang · 2017-02-01T04:17:15Z

LGTM. Merged to trunk.

guozhangwang · 2017-02-06T17:48:34Z

@mjsax Re "removing templates in ClientState": I agree. Feel free to file a newbie JIRA for it.

mjsax · 2017-02-06T19:58:58Z

@dguy @guozhangwang Done: https://issues.apache.org/jira/browse/KAFKA-4738

…alance Makes task assignment more sticky by preferring to assign tasks to clients that had previously had the task as active task. If there are no clients with the task previously active, then search for a standby. Finally falling back to the least loaded client. Author: Damian Guy <damian.guy@gmail.com> Reviewers: Matthias J. Sax, Guozhang Wang Closes apache#2429 from dguy/kafka-4677

sticky task assignor

4d7455f

dguy force-pushed the kafka-4677 branch from a7bb5f5 to 4d7455f Compare January 24, 2017 17:36

mjsax reviewed Jan 25, 2017

guozhangwang reviewed Jan 25, 2017

some feedback

fb94e71

dguy added 3 commits January 25, 2017 16:45

add back task pairs. more task stickyness

5167e35

avoid moving tasks when new assignment is just adding new tasks

01ceed5

added a couple more tests

d2a0e22

guozhangwang reviewed Jan 26, 2017

check for capapacity <= 0 and use double instead of integer

92f19bd

mjsax reviewed Jan 31, 2017

address comments

292b21c

address comments

c7b61fe

mjsax approved these changes Jan 31, 2017

asfgit closed this in 0b48ea1 Feb 1, 2017

dguy deleted the kafka-4677 branch February 17, 2017 00:58

