Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RetryStrategies do not yield #10539

Closed
Zelldon opened this issue Sep 28, 2022 · 6 comments · Fixed by #10844
Closed

RetryStrategies do not yield #10539

Zelldon opened this issue Sep 28, 2022 · 6 comments · Fixed by #10844
Assignees
Labels
kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/high Marks a bug as having a noticeable impact on the user with no known workaround version:8.1.3 Marks an issue as being completely or in parts released in 8.1.3 version:8.2.0-alpha2 Marks an issue as being completely or in parts released in 8.2.0-alpha2 version:8.2.0 Marks an issue as being completely or in parts released in 8.2.0

Comments

@Zelldon
Copy link
Member

Zelldon commented Sep 28, 2022

Describe the bug

Previously we had implemented the ActorRetryStrategies with Actor#runUntilDone this was replaced with #9850 we realized that Actor#submit is not optimal and replaced it with Actor#run again #10289, but we missed to add an Actor#yield if we want to retry (and loop).
https://github.com/camunda/zeebe/blob/main/scheduler/src/main/java/io/camunda/zeebe/scheduler/retry/AbortableRetryStrategy.java#L47

This is necessary since otherwise we would loop endless with the same Actor, without getting any other Actor to give a chance to run. This is especially an issue if we have only one thread.

Important is this if we write to dispatcher and the LogStorageAppender has to clean the dispatcher.

Was found in investigating #10526

To Reproduce
Use the RetryStrategy you will see that it doesn't yield.

Expected behavior
Give others actors an chance to run.

Log/Stacktrace

Full Stacktrace

<STACKTRACE>

Environment:

  • OS:
  • Zeebe Version: 8.x
  • Configuration:
@Zelldon Zelldon added kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/high Marks a bug as having a noticeable impact on the user with no known workaround labels Sep 28, 2022
@megglos
Copy link
Contributor

megglos commented Sep 30, 2022

@npepinpe & @Zelldon will sync on it

@Zelldon
Copy link
Member Author

Zelldon commented Sep 30, 2022

@npepinpe lets do that next week I have no time for it right now

@npepinpe
Copy link
Member

npepinpe commented Sep 30, 2022

👍

I still don't see it 😅 So when we don't yield, the current job is resubmitted for this task. This then returns that the task should be resubmitted to the thread. But that should still only cause the task to be resubmitted to the work stealing group, not cause the thread to busy loop without ever trying to dequeue other tasks (I think?) 🤔

EDIT: don't get me wrong, I believe you, I just don't get why resubmitting the task to the thread (via an append) means other tasks are not dequeued...

@npepinpe
Copy link
Member

nvm, found it. In ActorTask we loop until resubmit is false, or shouldYield is true. That's...eh. I'm wondering if we shouldn't just get rid of this behavior now, but as with all things in the Actor scheduler...

@Zelldon
Copy link
Member Author

Zelldon commented Sep 30, 2022

Ah I was just writing you a nice unit test :)

  @Test
  void shouldRunOtherActor() throws InterruptedException {
    // given
    final var busy = new Actor() {
      @Override
      protected void onActorStarted() {
        actor.run(this::myWork);
      }

      private void myWork() {
        System.out.println("I'm busy I have work to do");
        actor.run(this::myWork);
      }


    };
    final CountDownLatch latch = new CountDownLatch(1);
    final var yolo = new Actor() {
      @Override
      protected void onActorStarted() {
        actor.run(this::myWork);
      }

      private void myWork() {
        System.out.println("Yolo I would also do someee stuff");
        latch.countDown();
        actor.run(this::myWork);
      }
    };
    final var scheduler = ActorScheduler.newActorScheduler().setCpuBoundActorThreadCount(1).build();
    scheduler.start();

    // when
    scheduler.submitActor(busy);
    scheduler.submitActor(yolo);

    // then
    assertThat(latch.await(10, TimeUnit.SECONDS)).isTrue();
  }

@npepinpe
Copy link
Member

imo as we don't have runUntilDone anymore, we don't need this logic of looping anymore, but let's first backport it with yields to be safe, then we can remove the resubmit in snapshot and see if it's safe over time.

@Zelldon Zelldon removed their assignment Oct 14, 2022
zeebe-bors-camunda bot added a commit that referenced this issue Oct 31, 2022
10844: Yield actor thread between retries r=npepinpe a=npepinpe

## Description

Ensures retry strategies are yielding their thread between retries. Note that this doesn't really apply to the `BackOffRetryStrategy`, as it's already yielding via its usage of timers.

This PR also updates the `RetryStrategyTest` to junit 5, and adds test coverage by applying the existing tests to the other strategies where applicable (previously, only the `RecoverableRetryStrategy` and `AbortableRetryStrategy` were tested there).

## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #10539 



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
zeebe-bors-camunda bot added a commit that referenced this issue Oct 31, 2022
10844: Yield actor thread between retries r=npepinpe a=npepinpe

## Description

Ensures retry strategies are yielding their thread between retries. Note that this doesn't really apply to the `BackOffRetryStrategy`, as it's already yielding via its usage of timers.

This PR also updates the `RetryStrategyTest` to junit 5, and adds test coverage by applying the existing tests to the other strategies where applicable (previously, only the `RecoverableRetryStrategy` and `AbortableRetryStrategy` were tested there).

## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #10539 



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
zeebe-bors-camunda bot added a commit that referenced this issue Oct 31, 2022
10866: [Backport stable/8.1] Yield actor thread between retries r=npepinpe a=backport-action

# Description
Backport of #10844 to `stable/8.1`.

relates to #10539

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
zeebe-bors-camunda bot added a commit that referenced this issue Nov 1, 2022
10869: [Backport stable/8.0] Yield actor thread between retries r=deepthidevaki a=npepinpe

# Description
Backport of #10844 to `stable/8.1`.

relates to #10539

# Conflicts

I had to backport the `ControlledActorSchedulerExtension`, mostly, but I think that'll be useful in the future for backporting other things (maybe). There was nothing special for the actual test/PR itself.

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
@korthout korthout added release/8.0.8 version:8.1.3 Marks an issue as being completely or in parts released in 8.1.3 labels Nov 1, 2022
@remcowesterhoud remcowesterhoud added the version:8.2.0-alpha2 Marks an issue as being completely or in parts released in 8.2.0-alpha2 label Dec 6, 2022
@npepinpe npepinpe added the version:8.2.0 Marks an issue as being completely or in parts released in 8.2.0 label Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/high Marks a bug as having a noticeable impact on the user with no known workaround version:8.1.3 Marks an issue as being completely or in parts released in 8.1.3 version:8.2.0-alpha2 Marks an issue as being completely or in parts released in 8.2.0-alpha2 version:8.2.0 Marks an issue as being completely or in parts released in 8.2.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants