Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(log/appender): yield thread when experiencing backpressure #8582

Merged
1 commit merged into from
Jan 19, 2022

Conversation

romansmirnov
Copy link
Member

@romansmirnov romansmirnov commented Jan 13, 2022

Description

Yield the thread, when the log storage appender experiences backpressure when trying to append the fragments to the log storage. That way, the actual actor task (log storage appender) is resubmitted to the working queue, and the actor thread is released to execute other actor tasks.

Related issues

closes #8540

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatibility with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/0.25) to the PR, in case that fails you need to create backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criterias of the issue
  • New tests are written to ensure backwards compatibility with further versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement

@romansmirnov
Copy link
Member Author

Dear reviewer, thanks for reviewing my PR 😄 However, writing a test case for this bug fix isn't that easy. It requires controlling different components (like the append limiter, actor thread, ...). In the end, I decided to submit this PR without a test case (and actually I didn't succeed in writing a working test case), because it does not change any existing behavior apart from handling the backpressure gracefully in the log storage appender. Do you see any working approach to test it? I am happy to get your feedback.

@romansmirnov romansmirnov marked this pull request as ready for review January 18, 2022 08:52
@Zelldon
Copy link
Member

Zelldon commented Jan 19, 2022

Dear reviewer, thanks for reviewing my PR smile However, writing a test case for this bug fix isn't that easy. It requires controlling different components (like the append limiter, actor thread, ...). In the end, I decided to submit this PR without a test case (and actually I didn't succeed in writing a working test case), because it does not change any existing behavior apart from handling the backpressure gracefully in the log storage appender. Do you see any working approach to test it? I am happy to get your feedback.

Hey Roman I would like to at least see a benchmark for it. Furthermore how we found this issues? Is there anything how we could reproduce the problem?

@npepinpe
Copy link
Member

Just my two cents, but if we can't write a test, it may be an indication that the class/unit/component needs to be refactored 😄

@romansmirnov
Copy link
Member Author

As discussed with @Zelldon:

  • I already run a benchmark containing that bug fix. The bug fix doesn't have any negative (nor positive) impact in terms of performance.
  • The bug has been found when working on a customer's support case, i.e., when analyzing the provided heap dump and thread dump. Details can be found here: https://jira.camunda.com/browse/SUPPORT-11966 and LogStorageAppender Actor occupies an Actor Thread forever due to a full backpressue queue #8540 - if necessary, I can do a walkthrough of the heap dump.
  • Regarding the test case, we discussed a potential approach to how to test the scenario. However, I will give it a try next week. In the meantime, I would like to get this merged so that the customer can run their tests with a snapshot. (Unless the test case is a blocker for this bug fix.)

Copy link
Member

@Zelldon Zelldon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @romansmirnov :) 🎖️

Had some smaller comments, please take a look before merging.

I think @npepinpe is right that it might make sense to refactor all of this a bit. But maybe we can write a test as discussed @romansmirnov lets see, ping me if you want to discuss this further. :)


// Yield the thread, when no bytes are read or
// when the block could not be appended.
if (!shouldAppendBlock || !appendBlockSucceeded) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Is it necessary to yield if we not append? I think the subscription (actor condition) is only called if it was notified right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if there is nothing to append, it is still necessary to yield, because of

https://github.com/camunda-cloud/zeebe/blob/3da1a75d52e524b123f91a8ec9e480a0337bbf42/util/src/main/java/io/camunda/zeebe/util/sched/channel/ChannelConsumerCondition.java#L43-L48

In more detail, because of that condition polledCount > processedTiggersCount (BTW there is a typo in the source code 😄 ). There might be a case, where the polledCount is much greater than the processedTiggersCount but currently there is nothing to peek from the write buffer. Meaning, the condition in #poll() still returns true. If the yield would not happen, then the task would continue executing the corresponding actor job until the condition is false (i.e., the processedTiggersCount is greater or equal to polledCount).

@romansmirnov
Copy link
Member Author

bors merge

@romansmirnov
Copy link
Member Author

bors r-

@ghost
Copy link

ghost commented Jan 19, 2022

Canceled.

@romansmirnov romansmirnov force-pushed the 8540-resubmit-log-storage-appender-task branch from 3881596 to 987680a Compare January 19, 2022 12:45
@romansmirnov
Copy link
Member Author

bors r+

ghost pushed a commit that referenced this pull request Jan 19, 2022
8582: fix(log/appender): yield thread when experiencing backpressure r=romansmirnov a=romansmirnov

## Description

Yield the thread, when the log storage appender experiences backpressure when trying to append the fragments to the log storage. That way, the actual actor task (log storage appender) is resubmitted to the working queue, and the actor thread is released to execute other actor tasks.

## Related issues

closes #8540 



8605: fix(log/stream): ensure the appender future always gets completed r=romansmirnov a=romansmirnov

## Description

* Handles any kind of thrown `Throwable`s in the `LogStream` actor, so that the appender future gets completed exceptionally.
* Handles the situation when opening the appender, the `LogStream` actor is supposed to be closed. In this situation, the appender future gets completed exceptionally as well.

## Related issues

closes #7992



8615: deps(maven): bump value from 2.8.9-ea-1 to 2.9.0 r=npepinpe a=dependabot[bot]

Bumps [value](https://github.com/immutables/immutables) from 2.8.9-ea-1 to 2.9.0.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a href="https://github.com/immutables/immutables/commits">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.immutables:value&package-manager=maven&previous-version=2.8.9-ea-1&new-version=2.9.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>

Co-authored-by: Roman <roman.smirnov@camunda.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@ghost
Copy link

ghost commented Jan 19, 2022

Build failed (retrying...):

@romansmirnov
Copy link
Member Author

bors retry

@ghost
Copy link

ghost commented Jan 19, 2022

Already running a review

@ghost ghost merged commit 6ebfa79 into develop Jan 19, 2022
@ghost ghost deleted the 8540-resubmit-log-storage-appender-task branch January 19, 2022 14:17
@github-actions
Copy link
Contributor

Successfully created backport PR #8617 for stable/1.2.

@github-actions
Copy link
Contributor

Successfully created backport PR #8618 for stable/1.3.

ghost pushed a commit that referenced this pull request Jan 20, 2022
8617: [Backport stable/1.2] fix(log/appender): yield thread when experiencing backpressure r=romansmirnov a=github-actions[bot]

# Description
Backport of #8582 to `stable/1.2`.

relates to #8540

Co-authored-by: Roman <roman.smirnov@camunda.com>
ghost pushed a commit that referenced this pull request Jan 20, 2022
8618: [Backport stable/1.3] fix(log/appender): yield thread when experiencing backpressure r=romansmirnov a=github-actions[bot]

# Description
Backport of #8582 to `stable/1.3`.

relates to #8540

Co-authored-by: Roman <roman.smirnov@camunda.com>
@npepinpe npepinpe added the version:1.3.2 Marks an issue as being completely or in parts released in 1.3.2 label Jan 28, 2022
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
version:1.3.2 Marks an issue as being completely or in parts released in 1.3.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LogStorageAppender Actor occupies an Actor Thread forever due to a full backpressue queue
3 participants