
KAFKA-10199: Consider tasks in state updater when computing offset sums #13925

Merged
merged 4 commits into from Jul 3, 2023
@@ -1141,25 +1141,30 @@ public Map<TaskId, Long> getTaskOffsetSums() {
// Not all tasks will create directories, and there may be directories for tasks we don't currently own,
// so we consider all tasks that are either owned or on disk. This includes stateless tasks, which should
// just have an empty changelogOffsets map.
for (final TaskId id : union(HashSet::new, lockedTaskDirectories, tasks.allTaskIds())) {
Member
It seems that with the state updater enabled, tasks actually only contains "running tasks". It seems appropriate to rename this variable to runningTasks (this can also happen in a follow-up PR).

I am actually also wondering whether we still need this Tasks container at all? The purpose of the Tasks container was to simplify the TaskManager, which manages both active and standby tasks. With the state updater (from my understanding), the TaskManager only manages active tasks, while standby tasks are owned by the state-updater thread. Would it still be useful for the state-updater thread to use a Tasks container, given that it also owns active tasks as long as they are restoring?

Contributor Author

> It seems that with the state updater enabled, tasks actually only contains "running tasks". It seems appropriate to rename this variable to runningTasks (can also happen in a follow-up PR).

The old code path with the state updater disabled still exists, and we can disable the state updater if we encounter a major bug after releasing. So I would postpone such renamings until the removal of the old code path.

> I am actually also wondering whether we still need this Tasks container at all?

I would keep it, because it allows us to cleanly set a specific state of the task manager in unit tests. Anyway, I would wait for the upcoming thread refactoring before making such changes.

> Would it still be useful for the state-updater thread to use a Tasks container, given that it also owns active tasks as long as they are restoring?

I do not think so, since access by the state updater would imply that the tasks registry (aka the tasks container) needs to be accessed concurrently. For this reason, we defined an invariant that a task can only be owned either by the stream thread or by the state updater, but not by both. Sharing the tasks registry between the stream thread and the state updater would break that invariant. If you meant using a separate instance of the tasks registry for the state updater, that would not be useful IMO.
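The exclusive-ownership invariant described here can be sketched in plain Java. All names below are invented for illustration and are not Kafka's actual API; the point is only that a hand-off protocol keeps a task visible to exactly one component at a time, so the owner can access it without synchronization.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not Kafka's API): each task has exactly one owner,
// and ownership changes only via an explicit atomic hand-off.
public class TaskOwnership {
    enum Owner { STREAM_THREAD, STATE_UPDATER }

    private final Map<String, Owner> ownerByTaskId = new ConcurrentHashMap<>();

    // Registers a newly created task with its initial owner.
    void register(final String taskId, final Owner owner) {
        if (ownerByTaskId.putIfAbsent(taskId, owner) != null) {
            throw new IllegalStateException("Task " + taskId + " is already owned");
        }
    }

    // Hands the task over atomically; the previous owner gives it up in the
    // same step, so the task is never visible to both components at once.
    void handOver(final String taskId, final Owner from, final Owner to) {
        if (!ownerByTaskId.replace(taskId, from, to)) {
            throw new IllegalStateException(taskId + " is not owned by " + from);
        }
    }

    Owner ownerOf(final String taskId) {
        return ownerByTaskId.get(taskId);
    }

    public static void main(final String[] args) {
        final TaskOwnership ownership = new TaskOwnership();
        ownership.register("0_0", Owner.STATE_UPDATER);  // task starts out restoring
        // restoration done: the state updater hands the task to the stream thread
        ownership.handOver("0_0", Owner.STATE_UPDATER, Owner.STREAM_THREAD);
        System.out.println(ownership.ownerOf("0_0")); // STREAM_THREAD
    }
}
```

Sharing one registry between both threads would amount to dropping the `handOver` step, which is exactly the concurrent access the invariant rules out.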

final Task task = tasks.contains(id) ? tasks.task(id) : null;
// Closed and uninitialized tasks don't have any offsets so we should read directly from the checkpoint
if (task != null && task.state() != State.CREATED && task.state() != State.CLOSED) {
final Map<TaskId, Task> tasks = allTasks();
final Set<TaskId> lockedTaskDirectoriesOfNonOwnedTasksAndClosedAndCreatedTasks =
Member

Ah, I recommended this change thinking that lockedTaskDirectories always includes all closed and created tasks -- I think it does, right? So it should be enough to assign this to lockedTaskDirectories.

Contributor Author (@cadonna, Jun 28, 2023)

I do not think there is a guarantee that lockedTaskDirectories contains every task the client owns. lockedTaskDirectories just contains the non-empty task directories found in the state directory when a rebalance starts. However, a task directory is created when a task is created, i.e., when it is in state CREATED. A task directory is not deleted when a task is closed, i.e., when it is in state CLOSED. This might be a correlation and not a thought-out invariant. At least, the original code did not rely on it, since it used union(HashSet::new, lockedTaskDirectories, tasks.allTaskIds()).
I am also somewhat reluctant to rely on such a -- IMO -- brittle invariant.
As an example, in the future we could decide to move the creation of the task directory to another part of the code -- for example, when the task is initialized -- which would mean that there is an interval in which the task is in state CREATED but does not have a task directory.
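The defensive union the original code relied on can be illustrated with plain Java sets (identifiers simplified, not Kafka's actual types): taking the union of locked task directories and owned task ids covers an owned task whose directory does not (yet or any longer) exist, instead of assuming lockedTaskDirectories contains every such task.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified sketch of the union-based id selection discussed above.
public class OffsetSumCandidates {
    static Set<String> candidateTaskIds(final Set<String> lockedTaskDirectories,
                                        final Set<String> ownedTaskIds) {
        // Consider every task that is either locked on disk or currently owned.
        final Set<String> union = new HashSet<>(lockedTaskDirectories);
        union.addAll(ownedTaskIds);
        return union;
    }

    public static void main(final String[] args) {
        final Set<String> locked = Set.of("0_0", "0_1"); // non-empty task dirs on disk
        final Set<String> owned = Set.of("0_1", "1_0");  // "1_0" has no task directory yet
        final Set<String> ids = candidateTaskIds(locked, owned);
        System.out.println(ids.contains("1_0")); // true: considered even though not locked
    }
}
```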

Member (@lucasbru, Jun 28, 2023)

Well, if there is no task directory, there is no checkpoint to process, so it's safe to do nothing in this case.

All you'd do by adding more tasks is later skip them at the checkpointFile.exists() check.

Member

What could make the simplified code break is if we decide not to release the lock before transitioning to the CLOSED state.

So yeah, being defensive here and going through all CREATED and CLOSED tasks as well, to make sure that they do not have state directories that are locked but not inside lockedTaskDirectories, sounds good to me as well.

Contributor Author

OK, I agree with you!

Contributor Author

Let's be defensive then!

+           union(HashSet::new, lockedTaskDirectories, tasks.keySet());
+       for (final Task task : tasks.values()) {
+           if (task.state() != State.CREATED && task.state() != State.CLOSED) {
                final Map<TopicPartition, Long> changelogOffsets = task.changelogOffsets();
                if (changelogOffsets.isEmpty()) {
-                   log.debug("Skipping to encode apparently stateless (or non-logged) offset sum for task {}", id);
+                   log.debug("Skipping to encode apparently stateless (or non-logged) offset sum for task {}",
+                       task.id());
                } else {
-                   taskOffsetSums.put(id, sumOfChangelogOffsets(id, changelogOffsets));
+                   taskOffsetSums.put(task.id(), sumOfChangelogOffsets(task.id(), changelogOffsets));
                }
-           } else {
-               final File checkpointFile = stateDirectory.checkpointFileFor(id);
-               try {
-                   if (checkpointFile.exists()) {
-                       taskOffsetSums.put(id, sumOfChangelogOffsets(id, new OffsetCheckpoint(checkpointFile).read()));
-                   }
-               } catch (final IOException e) {
-                   log.warn(String.format("Exception caught while trying to read checkpoint for task %s:", id), e);
+               lockedTaskDirectoriesOfNonOwnedTasksAndClosedAndCreatedTasks.remove(task.id());
            }
        }

+       for (final TaskId id : lockedTaskDirectoriesOfNonOwnedTasksAndClosedAndCreatedTasks) {
+           final File checkpointFile = stateDirectory.checkpointFileFor(id);
+           try {
+               if (checkpointFile.exists()) {
+                   taskOffsetSums.put(id, sumOfChangelogOffsets(id, new OffsetCheckpoint(checkpointFile).read()));
+               }
+           } catch (final IOException e) {
+               log.warn(String.format("Exception caught while trying to read checkpoint for task %s:", id), e);
+           }
+       }
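For context, the per-task offset sum computed here is used as a lag estimate during assignment; it is essentially the sum of the offsets of the task's changelog partitions. A hedged, stand-alone sketch of that summation (the sentinel value and all names here are illustrative, not Kafka's actual constants):

```java
import java.util.Map;

// Illustrative sketch of summing changelog offsets for one task: unknown
// offsets are skipped, and the sum is clamped on overflow so it remains a
// usable lag estimate.
public class ChangelogOffsetSum {
    static final long OFFSET_UNKNOWN = -1L; // illustrative sentinel, not Kafka's constant

    static long sumOfChangelogOffsets(final Map<String, Long> changelogOffsets) {
        long offsetSum = 0L;
        for (final long offset : changelogOffsets.values()) {
            if (offset == OFFSET_UNKNOWN) {
                continue; // unknown offsets contribute nothing to the sum
            }
            offsetSum += offset;
            if (offsetSum < 0) {
                return Long.MAX_VALUE; // overflowed: report the largest possible sum
            }
        }
        return offsetSum;
    }

    public static void main(final String[] args) {
        // one known offset (42) plus one unknown offset
        System.out.println(sumOfChangelogOffsets(
            Map.of("changelog-0", 42L, "changelog-1", OFFSET_UNKNOWN))); // 42
    }
}
```

The skip-unknown behavior is what the test shouldSkipUnknownOffsetsWhenComputingOffsetSum below exercises against the real implementation.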

@@ -1177,14 +1182,15 @@ private void tryToLockAllNonEmptyTaskDirectories() {
// current set of actually-locked tasks.
lockedTaskDirectories.clear();

final Map<TaskId, Task> allTasks = allTasks();
for (final TaskDirectory taskDir : stateDirectory.listNonEmptyTaskDirectories()) {
final File dir = taskDir.file();
final String namedTopology = taskDir.namedTopology();
try {
final TaskId id = parseTaskDirectoryName(dir.getName(), namedTopology);
if (stateDirectory.lock(id)) {
lockedTaskDirectories.add(id);
-                   if (!tasks.contains(id)) {
+                   if (!allTasks.containsKey(id)) {
Contributor Author

For this debug log, we previously only considered tasks owned by the stream thread.

log.debug("Temporarily locked unassigned task {} for the upcoming rebalance", id);
}
}
@@ -1592,6 +1592,76 @@ public void shouldComputeOffsetSumForNonRunningActiveTask() throws Exception {
computeOffsetSumAndVerify(changelogOffsets, expectedOffsetSums);
}

@Test
public void shouldComputeOffsetSumForRestoringActiveTaskWithStateUpdater() throws Exception {
final StreamTask restoringStatefulTask = statefulTask(taskId00, taskId00ChangelogPartitions)
.inState(State.RESTORING).build();
final long changelogOffset = 42L;
when(restoringStatefulTask.changelogOffsets()).thenReturn(mkMap(mkEntry(t1p0changelog, changelogOffset)));
expectLockObtainedFor(taskId00);
makeTaskFolders(taskId00.toString());
final Map<TopicPartition, Long> changelogOffsetInCheckpoint = mkMap(mkEntry(t1p0changelog, 24L));
writeCheckpointFile(taskId00, changelogOffsetInCheckpoint);
final TasksRegistry tasks = Mockito.mock(TasksRegistry.class);
final TaskManager taskManager = setUpTaskManager(ProcessingMode.AT_LEAST_ONCE, tasks, true);
when(stateUpdater.getTasks()).thenReturn(mkSet(restoringStatefulTask));
replay(stateDirectory);
taskManager.handleRebalanceStart(singleton("topic"));

assertThat(taskManager.getTaskOffsetSums(), is(mkMap(mkEntry(taskId00, changelogOffset))));
}

@Test
public void shouldComputeOffsetSumForRestoringStandbyTaskWithStateUpdater() throws Exception {
final StandbyTask restoringStandbyTask = standbyTask(taskId00, taskId00ChangelogPartitions)
.inState(State.RUNNING).build();
final long changelogOffset = 42L;
when(restoringStandbyTask.changelogOffsets()).thenReturn(mkMap(mkEntry(t1p0changelog, changelogOffset)));
expectLockObtainedFor(taskId00);
makeTaskFolders(taskId00.toString());
final Map<TopicPartition, Long> changelogOffsetInCheckpoint = mkMap(mkEntry(t1p0changelog, 24L));
writeCheckpointFile(taskId00, changelogOffsetInCheckpoint);
final TasksRegistry tasks = Mockito.mock(TasksRegistry.class);
final TaskManager taskManager = setUpTaskManager(ProcessingMode.AT_LEAST_ONCE, tasks, true);
when(stateUpdater.getTasks()).thenReturn(mkSet(restoringStandbyTask));
replay(stateDirectory);
taskManager.handleRebalanceStart(singleton("topic"));

assertThat(taskManager.getTaskOffsetSums(), is(mkMap(mkEntry(taskId00, changelogOffset))));
}

@Test
public void shouldComputeOffsetSumForRunningStatefulTaskAndRestoringTaskWithStateUpdater() {
final StreamTask runningStatefulTask = statefulTask(taskId00, taskId00ChangelogPartitions)
.inState(State.RUNNING).build();
final StreamTask restoringStatefulTask = statefulTask(taskId01, taskId01ChangelogPartitions)
.inState(State.RESTORING).build();
final StandbyTask restoringStandbyTask = standbyTask(taskId02, taskId02ChangelogPartitions)
.inState(State.RUNNING).build();
final long changelogOffsetOfRunningTask = 42L;
final long changelogOffsetOfRestoringStatefulTask = 24L;
final long changelogOffsetOfRestoringStandbyTask = 84L;
when(runningStatefulTask.changelogOffsets())
.thenReturn(mkMap(mkEntry(t1p0changelog, changelogOffsetOfRunningTask)));
when(restoringStatefulTask.changelogOffsets())
.thenReturn(mkMap(mkEntry(t1p1changelog, changelogOffsetOfRestoringStatefulTask)));
when(restoringStandbyTask.changelogOffsets())
.thenReturn(mkMap(mkEntry(t1p2changelog, changelogOffsetOfRestoringStandbyTask)));
final TasksRegistry tasks = Mockito.mock(TasksRegistry.class);
final TaskManager taskManager = setUpTaskManager(ProcessingMode.AT_LEAST_ONCE, tasks, true);
when(tasks.allTasksPerId()).thenReturn(mkMap(mkEntry(taskId00, runningStatefulTask)));
when(stateUpdater.getTasks()).thenReturn(mkSet(restoringStandbyTask, restoringStatefulTask));

assertThat(
taskManager.getTaskOffsetSums(),
is(mkMap(
mkEntry(taskId00, changelogOffsetOfRunningTask),
mkEntry(taskId01, changelogOffsetOfRestoringStatefulTask),
mkEntry(taskId02, changelogOffsetOfRestoringStandbyTask)
))
);
}

@Test
public void shouldSkipUnknownOffsetsWhenComputingOffsetSum() throws Exception {
final Map<TopicPartition, Long> changelogOffsets = mkMap(