KAFKA-4540: Suspended tasks that are not assigned to the StreamThread need to be closed before new active and standby tasks are created #2266

Closed
wants to merge 7 commits into from

Contributor

@dguy dguy commented Dec 16, 2016

During onPartitionsAssigned, first close and remove any suspended StandbyTasks that are no longer assigned to this consumer.
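
As a rough illustration of the behaviour described above, the following is a minimal sketch and not the code from this PR. `SuspendedStandbySketch`, `Task`, and `closeNonAssigned` are hypothetical stand-ins, and task ids are plain strings rather than Kafka Streams `TaskId` objects.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are not Kafka Streams classes.
final class SuspendedStandbySketch {

    /** Minimal task that only records whether it has been closed. */
    static final class Task {
        final String id;
        boolean closed;

        Task(final String id) {
            this.id = id;
        }

        void close() {
            closed = true;
        }
    }

    /**
     * Closes and removes every suspended task whose id is not part of the
     * new assignment. Tasks left in the map remain available for reuse.
     */
    static void closeNonAssigned(final Map<String, Task> suspended, final Set<String> assignedIds) {
        final Iterator<Map.Entry<String, Task>> it = suspended.entrySet().iterator();
        while (it.hasNext()) {
            final Map.Entry<String, Task> entry = it.next();
            if (!assignedIds.contains(entry.getKey())) {
                entry.getValue().close();
                it.remove(); // Iterator.remove avoids a ConcurrentModificationException
            }
        }
    }
}
```

Running this step before any new tasks are created means a suspended task that is no longer assigned has already been closed by the time its replacement is set up.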

Contributor Author

dguy commented Dec 16, 2016

@mjsax @guozhangwang @enothereska - hopefully this is the last in the series of PRs to do with StandbyTask handling!

asfbot commented Dec 16, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/186/
Test FAILed (JDK 8 and Scala 2.11).

asfbot commented Dec 16, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/184/
Test FAILed (JDK 7 and Scala 2.10).

asfbot commented Dec 16, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/185/
Test FAILed (JDK 8 and Scala 2.12).

Member

mjsax commented Dec 16, 2016

All three builds failed with a compile error...

@@ -233,6 +233,9 @@ public void onPartitionsAssigned(Collection<TopicPartition> assignment) {
StreamThread.this.getName(), assignment);

setStateWhenNotInPendingShutdown(State.ASSIGNING_PARTITIONS);
// do this first as we may have suspended standby tasks that
// will become active
closeNonAssignedSuspendedStandbyTasks();

Member

Should we move closing non-assigned tasks into this method too? That would be a cleaner workflow: close all tasks that are not reassigned (regular and standby), then reuse suspended tasks and create new tasks.

@@ -965,34 +985,14 @@ private void removeSuspendedTasks() {
// Close task and state manager
for (final AbstractTask task : suspendedTasks.values()) {
task.close();
task.flushState();

Member

Why this?

Contributor Author

Because it is already done in suspendTasksAndState, so there is no need to flush again.

task.closeStateManager();
// flush out any extra data sent during close
producer.flush();

Member

Why this?

Contributor Author

Because it is already done in suspendTasksAndState, so it is not required here.

asfbot commented Dec 19, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/234/
Test PASSed (JDK 8 and Scala 2.12).

asfbot commented Dec 19, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/235/
Test FAILed (JDK 8 and Scala 2.11).

asfbot commented Dec 19, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/233/
Test FAILed (JDK 7 and Scala 2.10).

Contributor

@guozhangwang guozhangwang left a comment

A few comments.

// do this first as we may have suspended standby tasks that
// will become active
closeNonAssignedSuspendedTasks();
// closeNonAssignedSuspendedStandbyTasks();

Contributor

Is this intentional?

Contributor Author

Oops!

@@ -848,7 +881,7 @@ private void addStreamTasks(Collection<TopicPartition> assignment) {
}

// destroy any remaining suspended tasks
removeSuspendedTasks();
// removeSuspendedTasks();

Contributor

Ditto above.

Contributor

Also we need to remove the comments.

final Set<TaskId> currentActiveTaskIds = partitionAssignor.activeTasks().keySet();
final Iterator<Map.Entry<TaskId, StreamTask>> activeTaskIterator = suspendedActiveTasks.entrySet().iterator();
while (activeTaskIterator.hasNext()) {
closeAndRemoveIfNotAssigned(currentActiveTaskIds, activeTaskIterator.next(), activeTaskIterator);

Contributor

Could we return a boolean indicating whether this task was removed? Then in line 964 below we can instead check that this map is empty, since we have removed both the recycled tasks and the non-recycled ones (in line 863).

Contributor Author

I'm not sure how returning a boolean helps with anything. It would just indicate that a single task has been removed. If we really want to make sure we've removed all the suspended tasks we can just check if the map is empty in addStreamTasks after the for loop.
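
The alternative raised in this reply, checking the map once after the loop instead of returning a per-task boolean, could look roughly like the sketch below. It is an assumption-laden illustration: `SuspendedInvariantSketch` and `addTasks` are hypothetical names, and tasks are represented by plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration; not the StreamThread implementation.
final class SuspendedInvariantSketch {

    /**
     * Reuses a suspended task for each assigned id where one exists, creates a
     * placeholder for the rest, and then verifies that no suspended task is left.
     */
    static List<String> addTasks(final Map<String, String> suspended, final Set<String> assignedIds) {
        final List<String> active = new ArrayList<>();
        for (final String id : assignedIds) {
            final String recycled = suspended.remove(id); // null if nothing was suspended for this id
            active.add(recycled != null ? recycled : "new-" + id);
        }
        // Unassigned tasks are assumed to have been closed and removed earlier,
        // so anything still in the map here indicates a bug.
        if (!suspended.isEmpty()) {
            throw new IllegalStateException("Suspended tasks left after assignment: " + suspended.keySet());
        }
        return active;
    }
}
```

A single emptiness check covers every task at once, whereas a boolean return value only reports on the one task that was just examined.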

@@ -960,34 +990,14 @@ private void removeStandbyTasks() {
}

private void removeSuspendedTasks() {
log.info("{} Removing all suspended tasks [{}]", logPrefix, suspendedTasks.keySet());
log.info("{} Removing all suspended tasks [{}]", logPrefix, suspendedActiveTasks.keySet());

Contributor

See my comments above. In that case, should we be guaranteed that suspendedActiveTasks is always an empty map here?

Contributor Author

This method is no longer used and should have been removed.

} else {
throw e;
}
} catch (final LockException e) {

Contributor

Did we change the exceptions thrown in ProcessorStateManager? I thought we were still throwing ProcessorStateException, which embeds the LockException?

Contributor Author

No, it just throws a LockException. I saw this when I was trying to test it.
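
The distinction discussed in this thread, a lock failure thrown directly versus one wrapped as the cause of another exception, can be sketched as follows. The exception classes here are hypothetical stand-ins defined only for this example (`LockFailure` and `StateFailure`); they are not the Kafka `LockException` or `ProcessorStateException` classes, and `createOrRetry` is an invented helper.

```java
// Hypothetical stand-ins for illustration; not the Kafka Streams exception types.
final class LockExceptionSketch {

    /** Thrown when a state directory lock cannot be acquired. */
    static final class LockFailure extends RuntimeException {
        LockFailure(final String message) {
            super(message);
        }
    }

    /** Generic state failure that may wrap another exception as its cause. */
    static final class StateFailure extends RuntimeException {
        StateFailure(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the task-creation step and reports whether it succeeded or should be
     * retried. Both the direct and the wrapped lock failure lead to a retry;
     * any other state failure is rethrown.
     */
    static String createOrRetry(final Runnable createTask) {
        try {
            createTask.run();
            return "created";
        } catch (final StateFailure e) {
            if (e.getCause() instanceof LockFailure) {
                return "retry (wrapped)";
            } else {
                throw e;
            }
        } catch (final LockFailure e) {
            return "retry (direct)";
        }
    }
}
```

If only the wrapped form were handled, a lock failure thrown directly would escape the retry path, which is why the catch clause matters here.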

asfbot commented Dec 20, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/262/
Test PASSed (JDK 8 and Scala 2.11).

asfbot commented Dec 20, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/260/
Test FAILed (JDK 7 and Scala 2.10).

Contributor Author

dguy commented Dec 20, 2016

Updated this to take into account the partition assignment when finding StreamTasks to close. For a given TaskId the partition assignment can change, e.g., with regex subscriptions.
I don't believe this matters for a StandbyTask as any StateStore will only have a single source topic, so even if a regex subscription changed this wouldn't change the partitions that are consumed to update the StateStore.
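
A minimal sketch of the partition-aware check described above, under stated assumptions: `PartitionAwareCloseSketch`, `Task`, and `closeNonAssigned` are hypothetical names, task ids and partitions are plain strings, and the rule shown (close when the id is unassigned or its partition set differs) is one reading of the comment rather than the merged code.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration; not Kafka Streams classes.
final class PartitionAwareCloseSketch {

    /** Minimal suspended task that remembers the partitions it was created with. */
    static final class Task {
        final Set<String> partitions;
        boolean closed;

        Task(final Set<String> partitions) {
            this.partitions = partitions;
        }

        void close() {
            closed = true;
        }
    }

    /**
     * Closes and removes a suspended task when its id is no longer assigned, or
     * when the id is still assigned but with a different set of partitions.
     */
    static void closeNonAssigned(final Map<String, Task> suspended,
                                 final Map<String, Set<String>> assignedPartitionsByTask) {
        final Iterator<Map.Entry<String, Task>> it = suspended.entrySet().iterator();
        while (it.hasNext()) {
            final Map.Entry<String, Task> entry = it.next();
            final Set<String> newPartitions = assignedPartitionsByTask.get(entry.getKey());
            final Task task = entry.getValue();
            if (newPartitions == null || !newPartitions.equals(task.partitions)) {
                task.close();
                it.remove();
            }
        }
    }
}
```

Comparing partition sets as well as ids keeps a suspended task from being reused when a changed subscription has altered which partitions that task id covers.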

asfbot commented Dec 20, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/261/
Test PASSed (JDK 8 and Scala 2.12).

asfbot commented Dec 20, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/265/
Test PASSed (JDK 8 and Scala 2.11).

asfbot commented Dec 20, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/264/
Test PASSed (JDK 8 and Scala 2.12).

asfbot commented Dec 20, 2016

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/263/
Test PASSed (JDK 7 and Scala 2.10).

@asfgit asfgit closed this in 56c6174 Dec 20, 2016
@guozhangwang
Contributor

Merged to trunk.

@dguy dguy deleted the kafka-4540 branch January 13, 2017 08:39
soenkeliebau pushed a commit to soenkeliebau/kafka that referenced this pull request Feb 7, 2017
… need to be closed before new active and standby tasks are created

During `onPartitionsAssigned` first close, and remove, any suspended `StandbyTasks` that are no longer assigned to this consumer.

Author: Damian Guy <damian.guy@gmail.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes apache#2266 from dguy/kafka-4540