RetryStrategies do not yield #10539

Zelldon · 2022-09-28T13:40:00Z

Describe the bug

Previously we had implemented the ActorRetryStrategies with Actor#runUntilDone this was replaced with #9850 we realized that Actor#submit is not optimal and replaced it with Actor#run again #10289, but we missed to add an Actor#yield if we want to retry (and loop).
https://github.com/camunda/zeebe/blob/main/scheduler/src/main/java/io/camunda/zeebe/scheduler/retry/AbortableRetryStrategy.java#L47

This is necessary since otherwise we would loop endless with the same Actor, without getting any other Actor to give a chance to run. This is especially an issue if we have only one thread.

Important is this if we write to dispatcher and the LogStorageAppender has to clean the dispatcher.

Was found in investigating #10526

To Reproduce
Use the RetryStrategy you will see that it doesn't yield.

Expected behavior
Give others actors an chance to run.

Log/Stacktrace

Full Stacktrace

<STACKTRACE>

Environment:

OS:
Zeebe Version: 8.x
Configuration:

The text was updated successfully, but these errors were encountered:

megglos · 2022-09-30T09:26:38Z

@npepinpe & @Zelldon will sync on it

Zelldon · 2022-09-30T12:08:55Z

@npepinpe lets do that next week I have no time for it right now

npepinpe · 2022-09-30T12:39:26Z

👍

I still don't see it 😅 So when we don't yield, the current job is resubmitted for this task. This then returns that the task should be resubmitted to the thread. But that should still only cause the task to be resubmitted to the work stealing group, not cause the thread to busy loop without ever trying to dequeue other tasks (I think?) 🤔

EDIT: don't get me wrong, I believe you, I just don't get why resubmitting the task to the thread (via an append) means other tasks are not dequeued...

npepinpe · 2022-09-30T12:50:11Z

nvm, found it. In ActorTask we loop until resubmit is false, or shouldYield is true. That's...eh. I'm wondering if we shouldn't just get rid of this behavior now, but as with all things in the Actor scheduler...

Zelldon · 2022-09-30T12:52:29Z

Ah I was just writing you a nice unit test :)

  @Test
  void shouldRunOtherActor() throws InterruptedException {
    // given
    final var busy = new Actor() {
      @Override
      protected void onActorStarted() {
        actor.run(this::myWork);
      }

      private void myWork() {
        System.out.println("I'm busy I have work to do");
        actor.run(this::myWork);
      }


    };
    final CountDownLatch latch = new CountDownLatch(1);
    final var yolo = new Actor() {
      @Override
      protected void onActorStarted() {
        actor.run(this::myWork);
      }

      private void myWork() {
        System.out.println("Yolo I would also do someee stuff");
        latch.countDown();
        actor.run(this::myWork);
      }
    };
    final var scheduler = ActorScheduler.newActorScheduler().setCpuBoundActorThreadCount(1).build();
    scheduler.start();

    // when
    scheduler.submitActor(busy);
    scheduler.submitActor(yolo);

    // then
    assertThat(latch.await(10, TimeUnit.SECONDS)).isTrue();
  }

npepinpe · 2022-09-30T12:55:43Z

imo as we don't have runUntilDone anymore, we don't need this logic of looping anymore, but let's first backport it with yields to be safe, then we can remove the resubmit in snapshot and see if it's safe over time.

10844: Yield actor thread between retries r=npepinpe a=npepinpe ## Description Ensures retry strategies are yielding their thread between retries. Note that this doesn't really apply to the `BackOffRetryStrategy`, as it's already yielding via its usage of timers. This PR also updates the `RetryStrategyTest` to junit 5, and adds test coverage by applying the existing tests to the other strategies where applicable (previously, only the `RecoverableRetryStrategy` and `AbortableRetryStrategy` were tested there). ## Related issues  closes #10539 Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>

10866: [Backport stable/8.1] Yield actor thread between retries r=npepinpe a=backport-action # Description Backport of #10844 to `stable/8.1`. relates to #10539 Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>

10869: [Backport stable/8.0] Yield actor thread between retries r=deepthidevaki a=npepinpe # Description Backport of #10844 to `stable/8.1`. relates to #10539 # Conflicts I had to backport the `ControlledActorSchedulerExtension`, mostly, but I think that'll be useful in the future for backporting other things (maybe). There was nothing special for the actual test/PR itself. Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>

Zelldon added kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/high Marks a bug as having a noticeable impact on the user with no known workaround labels Sep 28, 2022

Zelldon mentioned this issue Sep 28, 2022

Writing TaskResult in ProcessScheduleService can block complete processing #10526

Closed

Zelldon assigned npepinpe and Zelldon Sep 30, 2022

Zelldon removed their assignment Oct 14, 2022

npepinpe mentioned this issue Oct 28, 2022

Yield actor thread between retries #10844

Merged

10 tasks

zeebe-bors-camunda bot closed this as completed in b8e2293 Oct 31, 2022

backport-action mentioned this issue Oct 31, 2022

[Backport stable/8.1] Yield actor thread between retries #10866

Merged

npepinpe mentioned this issue Oct 31, 2022

[Backport stable/8.0] Yield actor thread between retries #10869

Merged

korthout added release/8.0.8 version:8.1.3 Marks an issue as being completely or in parts released in 8.1.3 labels Nov 1, 2022

remcowesterhoud added the version:8.2.0-alpha2 Marks an issue as being completely or in parts released in 8.2.0-alpha2 label Dec 6, 2022

npepinpe added the version:8.2.0 Marks an issue as being completely or in parts released in 8.2.0 label Apr 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RetryStrategies do not yield #10539

RetryStrategies do not yield #10539

Zelldon commented Sep 28, 2022

megglos commented Sep 30, 2022

Zelldon commented Sep 30, 2022

npepinpe commented Sep 30, 2022 •

edited

npepinpe commented Sep 30, 2022

Zelldon commented Sep 30, 2022 •

edited

npepinpe commented Sep 30, 2022

RetryStrategies do not yield #10539

RetryStrategies do not yield #10539

Comments

Zelldon commented Sep 28, 2022

megglos commented Sep 30, 2022

Zelldon commented Sep 30, 2022

npepinpe commented Sep 30, 2022 • edited

npepinpe commented Sep 30, 2022

Zelldon commented Sep 30, 2022 • edited

npepinpe commented Sep 30, 2022

npepinpe commented Sep 30, 2022 •

edited

Zelldon commented Sep 30, 2022 •

edited