Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-10199: Change to RUNNING if no pending task to init exist #14214

Conversation

cadonna
Copy link
Contributor

@cadonna cadonna commented Aug 15, 2023

A stream thread should only change to RUNNING if there are no active tasks in restoration in the state updater and if there are no pending tasks to recycle and to init.

Usually all pending tasks to init are added to the state updater in the same poll iteration that handles the assignment. However, if during an initialization of a task a LockException the task is re-added to the tasks to init and initialization is retried in the next poll iteration.

A LockException might occur when a state directory is still locked by another thread, when the rebalance just happened.

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@@ -53,9 +53,11 @@ public interface TasksRegistry {

void addPendingTaskToCloseClean(final TaskId taskId);

Set<Task> drainPendingTaskToInit();
Set<Task> drainPendingTasksToInit();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just renaming.


void addPendingTaskToInit(final Collection<Task> tasks);
void addPendingTasksToInit(final Collection<Task> tasks);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just renaming.

void addPendingTaskToInit(final Collection<Task> tasks);
void addPendingTasksToInit(final Collection<Task> tasks);

boolean hasPendingTasksToInit();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actual addition

Comment on lines +767 to +769
return !stateUpdater.restoresActiveTasks()
&& !tasks.hasPendingTasksToRecycle()
&& !tasks.hasPendingTasksToInit();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actual fix.

Comment on lines 849 to +852
// The state directory may still be locked by another thread, when the rebalance just happened.
// Retry in the next iteration.
log.info("Encountered lock exception. Reattempting locking the state in the next iteration.", lockException);
tasks.addPendingTaskToInit(Collections.singleton(task));
tasks.addPendingTasksToInit(Collections.singleton(task));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the code that required the fix.

Copy link
Member

@lucasbru lucasbru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, lgtm!

What I didn't see, what was the symptom of the bug? I thought the state was largely to inform the user

@cadonna
Copy link
Contributor Author

cadonna commented Aug 15, 2023

I am not sure that this is the cause of the issue that I see. But in any case, it is not correct to change the state to RUNNING if there are still active tasks that are not ready for processing or standby tasks that are not yet in the state updater.

Copy link
Contributor

@wcarlson5 wcarlson5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

A stream thread should only change to RUNNING if there are no
active tasks in restoration in the state updater and if there
are no pending tasks to recycle and to init.

Usually all pending tasks to init are added to the state updater
in the same poll iteration that handles the assignment. However,
if during an initialization of a task a LockException the task
is re-added to the tasks to init and initialization is retried
in the next poll iteration.

A LockException might occur when a state directory is still locked
by another thread, when the rebalance just happened.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants