KAFKA-4677: [Follow Up] add optimization to StickyTaskAssignor for rolling bounce #2609
Conversation
@guozhangwang, based on my investigation into increasing the session timeout on rebalance, there is an optimization that can be made to the
Refer to this link for build results (access rights to CI server needed):
Is the Jenkins failure possibly related? I did not see this before:
One meta-comment is whether we should have some additional checks before applying this short-cut optimization. For example:
- Check that the resulting assignment is balanced.
- No new tasks were created: the sum of `prevAssignedTasks` should be equal to the new list of tasks.
- No "new" client is likely joining (we cannot tell for sure): `clientsWithoutPreviousActiveTasks` should have some tasks in `prevAssignedTasks`, meaning they are not likely new clients.
- No "old" client was likely dropped (again we cannot tell for sure, so just heuristics): the sum of `clientsWithoutPreviousActiveTasks`' `prevAssignedTasks` should cover `noPreviousActiveAssignment`; otherwise it is likely that some client has been dropped from the group.
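A minimal sketch of two of these pre-checks. The names (`prevAssignedTasks` keyed by client id, string task ids, `ShortcutChecks`) are hypothetical stand-ins for the fields discussed above, not the actual StickyTaskAssignor API:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ShortcutChecks {

    // "No new tasks": the union of every client's previously assigned tasks
    // must equal the new task set, i.e. nothing was created or deleted.
    static boolean noNewTasks(final Map<String, Set<String>> prevAssignedTasks,
                              final Set<String> newTasks) {
        final Set<String> union = new HashSet<>();
        for (final Set<String> tasks : prevAssignedTasks.values()) {
            union.addAll(tasks);
        }
        return union.equals(newTasks);
    }

    // Heuristic "no new client joined": every client currently without active
    // tasks should still report some previous assignment; a client with an
    // empty history is likely brand new.
    static boolean likelyNoNewClients(final Map<String, Set<String>> prevAssignedTasks,
                                      final Set<String> clientsWithoutPreviousActiveTasks) {
        for (final String client : clientsWithoutPreviousActiveTasks) {
            final Set<String> prev = prevAssignedTasks.get(client);
            if (prev == null || prev.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```

Both checks are cheap (one pass over the subscription metadata), so they could gate the short-cut without affecting the normal assignment path.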
Some related commit:
```diff
@@ -35,13 +35,11 @@
     private final Map<TaskId, ID> previousActiveTaskAssignment = new HashMap<>();
     private final Map<TaskId, Set<ID>> previousStandbyTaskAssignment = new HashMap<>();
     private final TaskPairs taskPairs;
     private final int availableCapacity;
```
`sumCapacity` can be removed as well.
@guozhangwang I don't see how this change would have caused that failure.
The assignment is not always balanced, so I'm not sure this is going to be very effective.
It is already checked that no new tasks are created (line 79).
Yes, we can add a check for this.
I'll add some tests and checks for this.
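On the balance point raised above, a capacity-aware balance check could look roughly like this. This is an illustrative sketch (hypothetical `BalanceCheck` helper, not the assignor's actual code), where capacity is the number of stream threads per client:

```java
import java.util.Map;

public class BalanceCheck {

    // An assignment is treated as balanced if no client owns more tasks than
    // ceil(totalTasks / totalCapacity) scaled by its own capacity.
    static boolean isBalanced(final Map<String, Integer> taskCountPerClient,
                              final Map<String, Integer> capacityPerClient,
                              final int totalTasks) {
        int totalCapacity = 0;
        for (final int c : capacityPerClient.values()) {
            totalCapacity += c;
        }
        final int perSlot = (totalTasks + totalCapacity - 1) / totalCapacity; // ceiling division
        for (final Map.Entry<String, Integer> e : taskCountPerClient.entrySet()) {
            final int capacity = capacityPerClient.get(e.getKey());
            if (e.getValue() > perSlot * capacity) {
                return false;   // this client is overloaded relative to its share
            }
        }
        return true;
    }
}
```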
@guozhangwang I've reworked it a bit. I removed the optimization itself, as I found a more general solution.
rework assignment such that it first assigns tasks to previous active clients where possible.
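The reworked approach described in this commit, handing each task back to its previous active owner first and only then placing the remainder, can be sketched roughly as follows. All types and names here are illustrative, not the actual StickyTaskAssignor implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StickyFirstPass {

    // First pass: return each task to its previous active owner if that owner
    // is still alive and has spare capacity. Second pass: spread the leftover
    // tasks over the least-loaded clients.
    static Map<String, List<String>> assign(final List<String> tasks,
                                            final Map<String, String> previousActiveOwner,
                                            final Map<String, Integer> capacity) {
        final Map<String, List<String>> assignment = new HashMap<>();
        for (final String client : capacity.keySet()) {
            assignment.put(client, new ArrayList<>());
        }
        final List<String> unassigned = new ArrayList<>();
        for (final String task : tasks) {
            final String owner = previousActiveOwner.get(task);
            if (owner != null && assignment.containsKey(owner)
                    && assignment.get(owner).size() < capacity.get(owner)) {
                assignment.get(owner).add(task);   // sticky: keep the previous owner
            } else {
                unassigned.add(task);
            }
        }
        for (final String task : unassigned) {
            String leastLoaded = null;
            for (final String client : assignment.keySet()) {
                if (leastLoaded == null
                        || assignment.get(client).size() < assignment.get(leastLoaded).size()) {
                    leastLoaded = client;
                }
            }
            assignment.get(leastLoaded).add(task);
        }
        return assignment;
    }
}
```

The point of the two passes is that during a rolling bounce every surviving client keeps its state-bearing active tasks, so only tasks whose owner actually left get moved.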
Thanks @dguy, I had one nit comment, otherwise LGTM.
Some corner cases for this general solution would be:
1) Multiple threads on the same JVM: lots of clients would claim to "have seen this task" before, so we may end up with some shuffling; but since they are likely located on the same node this is OK.
2) Some nodes have not cleaned up their state directory and hence claim to have seen the task before; in that case we may still have some shuffling, and different clients' local state stores may be either far from or close to the "changelog end offset", but this should be a rare case.
```java
private void assign(final TaskId taskId, final Set<ID> clientsWithin, final boolean active) {
    final ClientState<TaskId> client = findClient(taskId, clientsWithin);
    taskPairs.addPairs(taskId, client.assignedTasks());
    client.assign(taskId, active);
}

private void assignTaskToClient(final Set<TaskId> assigned, final TaskId taskId, final ClientState<TaskId> client) {
```
nit: maybe rename to `assignTasksToClient`, and the previous `assign` function to `allocateTaskWithClientCandidates`?
I'll rename the `assign` method as suggested, but this one is actually just `assignTaskToClient`. The `Set<TaskId>` param is just to keep track of what has been assigned.
@guozhangwang with respect to multiple threads on the same JVM: there is only one Client per JVM, i.e., if there were 4 threads then the capacity of the Client would be 4. W.r.t. point 2: yes, I agree. I think we could add a further optimization whereby we send the checkpointed offsets for each of these and try to use the client with the most recent offset.
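The further optimization floated here, choosing among candidate clients by how fresh their checkpointed changelog offset is, might look roughly like this. A hypothetical sketch only; the assignor discussed in this PR does not send these offsets:

```java
import java.util.Map;

public class FreshestClient {

    // Among clients that have previously seen the task, pick the one whose
    // checkpointed offset is closest to the changelog end offset, i.e. the
    // one with the least state left to restore.
    static String pickClient(final Map<String, Long> checkpointedOffsetByClient,
                             final long changelogEndOffset) {
        String best = null;
        long bestLag = Long.MAX_VALUE;
        for (final Map.Entry<String, Long> e : checkpointedOffsetByClient.entrySet()) {
            final long lag = changelogEndOffset - e.getValue();
            if (lag < bestLag) {
                bestLag = lag;
                best = e.getKey();
            }
        }
        return best;
    }
}
```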
Merged to trunk.
Detect when a rebalance has happened due to one or more existing nodes bouncing. Keep the assignment of previous active tasks the same, and assign only the tasks that were not previously active to the new clients.
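The detection step in this summary can be illustrated with a toy heuristic, in the spirit of the checks discussed earlier in the thread (hypothetical names, not the actual implementation): if every current client had some previous assignment and the task set is unchanged, the rebalance most likely came from existing nodes bouncing rather than from scale-out.

```java
import java.util.Set;

public class BounceDetection {

    // Heuristic: no brand-new clients (every current client appears in the
    // previous-assignment metadata) and no task churn implies the rebalance
    // was probably caused by existing nodes bouncing.
    static boolean looksLikeRollingBounce(final Set<String> currentClients,
                                          final Set<String> clientsWithPreviousAssignment,
                                          final Set<String> previousTasks,
                                          final Set<String> currentTasks) {
        return clientsWithPreviousAssignment.containsAll(currentClients)
                && previousTasks.equals(currentTasks);
    }
}
```

As the reviewers note above, this can only ever be a heuristic: the assignor cannot tell for certain whether a client is new or merely bounced.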