New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RetryStrategies do not yield #10539
Comments
@npepinpe lets do that next week I have no time for it right now |
👍 I still don't see it 😅 So when we don't yield, the current job is resubmitted for this task. This then returns that the task should be resubmitted to the thread. But that should still only cause the task to be resubmitted to the work stealing group, not cause the thread to busy loop without ever trying to dequeue other tasks (I think?) 🤔 EDIT: don't get me wrong, I believe you, I just don't get why resubmitting the task to the thread (via an append) means other tasks are not dequeued... |
nvm, found it. In |
Ah I was just writing you a nice unit test :) @Test
void shouldRunOtherActor() throws InterruptedException {
// given
final var busy = new Actor() {
@Override
protected void onActorStarted() {
actor.run(this::myWork);
}
private void myWork() {
System.out.println("I'm busy I have work to do");
actor.run(this::myWork);
}
};
final CountDownLatch latch = new CountDownLatch(1);
final var yolo = new Actor() {
@Override
protected void onActorStarted() {
actor.run(this::myWork);
}
private void myWork() {
System.out.println("Yolo I would also do someee stuff");
latch.countDown();
actor.run(this::myWork);
}
};
final var scheduler = ActorScheduler.newActorScheduler().setCpuBoundActorThreadCount(1).build();
scheduler.start();
// when
scheduler.submitActor(busy);
scheduler.submitActor(yolo);
// then
assertThat(latch.await(10, TimeUnit.SECONDS)).isTrue();
} |
imo as we don't have runUntilDone anymore, we don't need this logic of looping anymore, but let's first backport it with yields to be safe, then we can remove the resubmit in snapshot and see if it's safe over time. |
10844: Yield actor thread between retries r=npepinpe a=npepinpe ## Description Ensures retry strategies are yielding their thread between retries. Note that this doesn't really apply to the `BackOffRetryStrategy`, as it's already yielding via its usage of timers. This PR also updates the `RetryStrategyTest` to junit 5, and adds test coverage by applying the existing tests to the other strategies where applicable (previously, only the `RecoverableRetryStrategy` and `AbortableRetryStrategy` were tested there). ## Related issues <!-- Which issues are closed by this PR or are related --> closes #10539 Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
10844: Yield actor thread between retries r=npepinpe a=npepinpe ## Description Ensures retry strategies are yielding their thread between retries. Note that this doesn't really apply to the `BackOffRetryStrategy`, as it's already yielding via its usage of timers. This PR also updates the `RetryStrategyTest` to junit 5, and adds test coverage by applying the existing tests to the other strategies where applicable (previously, only the `RecoverableRetryStrategy` and `AbortableRetryStrategy` were tested there). ## Related issues <!-- Which issues are closed by this PR or are related --> closes #10539 Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
10869: [Backport stable/8.0] Yield actor thread between retries r=deepthidevaki a=npepinpe # Description Backport of #10844 to `stable/8.1`. relates to #10539 # Conflicts I had to backport the `ControlledActorSchedulerExtension`, mostly, but I think that'll be useful in the future for backporting other things (maybe). There was nothing special for the actual test/PR itself. Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
Describe the bug
Previously we had implemented the ActorRetryStrategies with Actor#runUntilDone this was replaced with #9850 we realized that Actor#submit is not optimal and replaced it with Actor#run again #10289, but we missed to add an Actor#yield if we want to retry (and loop).
https://github.com/camunda/zeebe/blob/main/scheduler/src/main/java/io/camunda/zeebe/scheduler/retry/AbortableRetryStrategy.java#L47
This is necessary since otherwise we would loop endless with the same Actor, without getting any other Actor to give a chance to run. This is especially an issue if we have only one thread.
Important is this if we write to dispatcher and the LogStorageAppender has to clean the dispatcher.
Was found in investigating #10526
To Reproduce
Use the RetryStrategy you will see that it doesn't yield.
Expected behavior
Give others actors an chance to run.
Log/Stacktrace
Full Stacktrace
Environment:
The text was updated successfully, but these errors were encountered: