Ensure retries are not interleaved even on multiple sequential calls #10289
One additional thing: we could potentially refactor the retry mechanism so that each retry operation is self-contained and isolated from the others, but I wasn't sure what that would mean in terms of semantics. To some extent, our retry operations should be sequenced between themselves, right? Perhaps we do need
Right now, the fix relies on the fact that when we're executing a task, calling
I'm wondering if, in general, instead of using the retry strategies, we might not simply build retrying into the state machines that need them. Would that make things easier to understand? 🤔
engine/src/test/java/io/camunda/zeebe/streamprocessor/ProcessingScheduleServiceImplTest.java
2aa835c to 7a1958e
@Test
public void shouldNotInterleaveRetry() {
  // given
  final AtomicReference<ActorFuture<Boolean>> firstFuture = new AtomicReference<>();
If you revert the fix and use `submit` again, you will see this test fail (the first future is incomplete), and the logs will show that the second future was completed twice.
Thanks @npepinpe 🎉
The test looks good to me. It can reproduce the root cause of the bug.
🔧 As an integration test, maybe you could also add a similar test in ProcessSchedulingServiceTest.
@@ -116,6 +118,25 @@ public void shouldAbortOnOtherException() {
    assertThat(resultFuture.getException()).isExactlyInstanceOf(RuntimeException.class);
  }

  @Test
  public void shouldNotInterleaveRetry() {
🔧 Do we want to prevent interleaving the retries, and ensure that the ordering is preserved? Or just that two concurrently submitted tasks can be completed successfully even if they have to be retried? The test verifies the second behavior, which is acceptable. But just rename the method to reflect this.
Fair. I think we also guarantee the former, and it would be interesting to verify this, as I'm unsure if it's safe to ignore the ordering based on how consumers of the API are using it.
Ah, and regarding your comment on the issue about reusing the actor retry mechanism: I wasn't sure whether this would have adverse effects. We can isolate the retry attempts by making sure they contain all their state, but then they might still be interleaved, and I'm not sure users of this interface expect that. So I went with the safe option for now. We could of course still isolate them AND also using
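To illustrate the trade-off mentioned here, a minimal sketch in plain Java of retry attempts that carry all their own state. It does not use the real actor scheduler: `IsolatedRetryDemo`, its job deque, and `completionOrder` are hypothetical names, and appending to the deque stands in for `submit`. Under those assumptions, both futures complete, but the second operation finishes before the first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class IsolatedRetryDemo {
  // Toy single-threaded job queue; a hypothetical stand-in, not the real actor.
  private final Deque<Runnable> jobs = new ArrayDeque<>();

  CompletableFuture<Boolean> runWithRetry(final BooleanSupplier operation) {
    // Each call gets its own future; nothing is stored in shared fields.
    final CompletableFuture<Boolean> future = new CompletableFuture<>();
    jobs.addFirst(() -> attempt(operation, future)); // first attempt is prepended
    return future;
  }

  private void attempt(final BooleanSupplier operation, final CompletableFuture<Boolean> future) {
    if (operation.getAsBoolean()) {
      future.complete(true);
    } else {
      jobs.addLast(() -> attempt(operation, future)); // retry is appended
    }
  }

  static List<String> completionOrder() {
    final IsolatedRetryDemo actor = new IsolatedRetryDemo();
    final List<String> order = new ArrayList<>();
    final AtomicInteger firstCalls = new AtomicInteger();
    // First caller: its operation fails once, then succeeds.
    actor.jobs.addLast(
        () -> actor.runWithRetry(() -> firstCalls.getAndIncrement() > 0)
            .thenRun(() -> order.add("first")));
    // Second caller: its operation succeeds immediately.
    actor.jobs.addLast(
        () -> actor.runWithRetry(() -> true).thenRun(() -> order.add("second")));
    while (!actor.jobs.isEmpty()) {
      actor.jobs.pollFirst().run();
    }
    return order;
  }

  public static void main(final String[] args) {
    System.out.println(completionOrder()); // [second, first]
  }
}
```

Isolation removes the shared-state corruption in this sketch, but it does not preserve submission order, which is the concern raised about consumers of the API.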
I'm sort of struggling with the integration tests, because our timers are non-deterministic. If I submit two timers one after the other, I'm not guaranteed that they will be executed in the order in which I submitted them, unless there is enough time between them. I also cannot use a latch to coordinate both timer tasks since, with the fix, the second task shouldn't run before the first one does anyway. I thought about scheduling the second task only once the first one runs (but before it writes), which works with the fix. Without the fix, I cannot guarantee that it fails, because I would have to guarantee that the first task at least starts retrying before the second task is scheduled. A latch there would help (but I can't use it with the fix, as explained above).
OK, I wrote an integration test, and good thing I did! I found a second bug: we don't reset the batch writer on retry, meaning each retry re-appends the record :x
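A rough sketch of that second bug, in plain Java. `ToyBatchWriter` and `writeWithRetry` are hypothetical stand-ins and not the real batch writer API; the only assumption is that the writer buffers appended records until a write succeeds.

```java
import java.util.ArrayList;
import java.util.List;

final class WriterResetDemo {
  /** Hypothetical stand-in for a batch writer: buffers records until a write succeeds. */
  static final class ToyBatchWriter {
    private final List<String> buffer = new ArrayList<>();
    private int failuresLeft;

    ToyBatchWriter(final int failuresLeft) {
      this.failuresLeft = failuresLeft;
    }

    void reset() {
      buffer.clear();
    }

    void append(final String record) {
      buffer.add(record);
    }

    /** Returns the written records, or null while the simulated write still fails. */
    List<String> tryWrite() {
      if (failuresLeft > 0) {
        failuresLeft--;
        return null;
      }
      final List<String> written = new ArrayList<>(buffer);
      buffer.clear();
      return written;
    }
  }

  static List<String> writeWithRetry(
      final ToyBatchWriter writer, final String record, final boolean resetOnRetry) {
    while (true) {
      if (resetOnRetry) {
        writer.reset(); // drop whatever the previous failed attempt left in the buffer
      }
      writer.append(record);
      final List<String> written = writer.tryWrite();
      if (written != null) {
        return written;
      }
    }
  }

  public static void main(final String[] args) {
    // Two failed writes before success: without reset, the record piles up.
    System.out.println(writeWithRetry(new ToyBatchWriter(2), "cmd", false).size()); // 3
    System.out.println(writeWithRetry(new ToyBatchWriter(2), "cmd", true).size()); // 1
  }
}
```

In this sketch the batch grows with every failed attempt unless it is reset first, so a long retry loop would produce an ever larger write.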
I also hit a weird, rare bug where one of the scheduled tasks just isn't called. Fun times.
By using `ActorControl#submit` in some of the retry strategies, we can create race conditions if the retry strategy is reused. Since the initial call uses `run` to prepend a retry attempt, and further retries use `submit`, it's possible for one run to retry (thus submitting the retry job to the end of the queue) and for the next call to `runWithRetry` to overwrite its state, causing issues when it comes to completing the future (as well as potential shared state between the operations).
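To make the race concrete, here is a minimal sketch in plain Java. It does not use the real actor scheduler: `InterleavedRetryDemo`, its job deque, and `simulate` are hypothetical stand-ins, with `addFirst` playing the role of `run` (prepend) and `addLast` the role of `submit` (append). It also assumes the fix amounts to prepending retries as well.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class InterleavedRetryDemo {
  // Toy single-threaded job queue; a hypothetical stand-in, not the real actor.
  private final Deque<Runnable> jobs = new ArrayDeque<>();
  private final boolean appendRetries;

  // State shared by every call, as in a reused retry strategy.
  private BooleanSupplier currentOperation;
  private CompletableFuture<Boolean> currentFuture;

  InterleavedRetryDemo(final boolean appendRetries) {
    this.appendRetries = appendRetries;
  }

  CompletableFuture<Boolean> runWithRetry(final BooleanSupplier operation) {
    currentOperation = operation;
    currentFuture = new CompletableFuture<>();
    jobs.addFirst(this::attempt); // the initial attempt is prepended ("run")
    return currentFuture;
  }

  private void attempt() {
    if (currentOperation.getAsBoolean()) {
      currentFuture.complete(true);
    } else if (appendRetries) {
      jobs.addLast(this::attempt); // retry goes to the end of the queue ("submit")
    } else {
      jobs.addFirst(this::attempt); // retry stays ahead of other jobs ("run")
    }
  }

  /** Returns {firstFutureDone, secondFutureDone} after draining the queue. */
  static boolean[] simulate(final boolean appendRetries) {
    final InterleavedRetryDemo actor = new InterleavedRetryDemo(appendRetries);
    final List<CompletableFuture<Boolean>> futures = new ArrayList<>();
    final AtomicInteger firstCalls = new AtomicInteger();
    // Two sequential callers: the first operation fails once, the second succeeds at once.
    actor.jobs.addLast(
        () -> futures.add(actor.runWithRetry(() -> firstCalls.getAndIncrement() > 0)));
    actor.jobs.addLast(() -> futures.add(actor.runWithRetry(() -> true)));
    while (!actor.jobs.isEmpty()) {
      actor.jobs.pollFirst().run();
    }
    return new boolean[] {futures.get(0).isDone(), futures.get(1).isDone()};
  }

  public static void main(final String[] args) {
    System.out.println(Arrays.toString(simulate(true))); // [false, true]
    System.out.println(Arrays.toString(simulate(false))); // [true, true]
  }
}
```

With appended retries, the second call overwrites the shared state before the first retry runs, so the first future is never completed and the second one is completed twice. With prepended retries, each operation finishes before the next caller runs.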
dce918f to 5dff90e
I'm omitting anything about the issue with the timer tasks; I will open another issue for this.
This might explain why, in the exception reported in the issue, it was trying to claim 20MB. |
If the jobs have to be ordered, then there is not much value in isolating the retry attempts. So I'm ok to leave it as it is. |
The test fails due to the timer not triggering 😄 I'm reluctant to merge it as it'll be flaky, but I'm now wondering if it's related to the other flaky test with timers 🤔
I'll omit the flaky test so we can merge, and I'll add it to the follow-up issue #10306 as an action item so it can be added back (and used for reproduction) once that's root-caused.
bors merge |
10289: Ensure retries are not interleaved even on multiple sequential calls r=npepinpe a=npepinpe
## Description
By using `ActorControl#submit` in some of the retry strategies, we can create race conditions if the retry strategy is reused. Since the initial call uses `run` to prepend a retry attempt, and further retries use `submit`, it's possible for one run to retry (thus submitting the retry job to the end of the queue) and for the next call to `runWithRetry` to overwrite its state, causing issues when it comes to completing the future (as well as potential shared state between the operations).
Additionally, this PR fixes an issue where, on retry, we were not resetting the writer, causing the same command to be written multiple times.
There is a regression test added which isn't perfect, and I'd like some suggestions on how to improve it. The integration test added to the `ProcessingScheduleServiceTest` is not amazing and likely to be flaky, as it's hard to write controlled tests with our timers. Suggestions are welcome 👍
## Related issues
closes #10240
10302: deps(go): bump github.com/google/go-cmp from 0.5.8 to 0.5.9 in /clients/go r=npepinpe a=dependabot[bot]
Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Build failed (retrying...): |
10289: Ensure retries are not interleaved even on multiple sequential calls r=npepinpe a=npepinpe
Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
Build failed: |
bors r+ |
10289: Ensure retries are not interleaved even on multiple sequential calls r=npepinpe a=npepinpe
10324: deps(maven): bump software.amazon.awssdk:bom from 2.17.269 to 2.17.271 r=npepinpe a=dependabot[bot]
10325: deps(maven): bump version.micrometer from 1.9.3 to 1.9.4 r=npepinpe a=dependabot[bot]
10326: deps(maven): bump aws-java-sdk-core from 1.12.298 to 1.12.300 r=npepinpe a=dependabot[bot] Bumps [aws-java-sdk-core](https://github.com/aws/aws-sdk-java) from 1.12.298 to 1.12.300. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md">aws-java-sdk-core's changelog</a>.</em></p> <blockquote> <h1><strong>1.12.300</strong> <strong>2022-09-09</strong></h1> <h2><strong>AWS CloudTrail</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>This release adds CloudTrail getChannel and listChannels APIs to allow customer to view the ServiceLinkedChannel configurations.</li> </ul> </li> </ul> <h2><strong>AWS Performance Insights</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>Increases the maximum values of two RDS Performance Insights APIs. The maximum value of the Limit parameter of DimensionGroup is 25. The MaxResult maximum is now 25 for the following APIs: DescribeDimensionKeys, GetResourceMetrics, ListAvailableResourceDimensions, and ListAvailableResourceMetrics.</li> </ul> </li> </ul> <h2><strong>Amazon Lex Model Building V2</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>This release is for supporting Composite Slot Type feature in AWS Lex V2. 
Composite Slot Type will help developer to logically group coherent slots and maintain their inter-relationships in runtime conversation.</li> </ul> </li> </ul> <h2><strong>Amazon Lex Runtime V2</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>This release is for supporting Composite Slot Type feature in AWS Lex V2. Composite Slot Type will help developer to logically group coherent slots and maintain their inter-relationships in runtime conversation.</li> </ul> </li> </ul> <h2><strong>Amazon Redshift</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>This release updates documentation for AQUA features and other description updates.</li> </ul> </li> </ul> <h1><strong>1.12.299</strong> <strong>2022-09-08</strong></h1> <h2><strong>AWS Elemental MediaLive</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>This change exposes API settings which allow Dolby Atmos and Dolby Vision to be used when running a channel using Elemental Media Live</li> </ul> </li> </ul> <h2><strong>AWS SDK for Java</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>Adding support for me-central-1 region</li> </ul> </li> </ul> <h2><strong>Amazon EMR Containers</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>EMR on EKS now allows running Spark SQL using the newly introduced Spark SQL Job Driver in the Start Job Run API</li> </ul> </li> </ul> <h2><strong>Amazon Elastic Compute Cloud</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>This release adds support to send VPC Flow Logs to kinesis-data-firehose as new destination type</li> </ul> </li> </ul> <h2><strong>Amazon Lookout for Metrics</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>Release dimension value filtering feature to allow customers to define dimension filters for including only a subset of their dataset to be used by LookoutMetrics.</li> </ul> </li> </ul> <h2><strong>Amazon Route 53</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>Amazon Route 53 now supports the Middle East (UAE) Region (me-central-1) for latency records, 
geoproximity records, and private DNS for Amazon VPCs in that region.</li> </ul> </li> </ul> <h2><strong>Amazon SageMaker Service</strong></h2> <ul> <li> <h3>Features</h3> <ul> <li>This release adds Mode to AutoMLJobConfig.</li> </ul> </li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/aws/aws-sdk-java/commit/874a8771641b0d825e5f2fb6cd806680f22028e6"><code>874a877</code></a> AWS SDK for Java 1.12.300</li> <li><a href="https://github.com/aws/aws-sdk-java/commit/48bc7cfbdc9b0806cee3b4a00d72924186dbe70d"><code>48bc7cf</code></a> Update GitHub version number to 1.12.300-SNAPSHOT</li> <li><a href="https://github.com/aws/aws-sdk-java/commit/b08cda01a176eed97fd6c7823c35747736f95f25"><code>b08cda0</code></a> AWS SDK for Java 1.12.299</li> <li><a href="https://github.com/aws/aws-sdk-java/commit/ecfdc1f5a9d9e9984bb13cecd3e88ca401640e91"><code>ecfdc1f</code></a> Update GitHub version number to 1.12.299-SNAPSHOT</li> <li>See full diff in <a href="https://github.com/aws/aws-sdk-java/compare/1.12.298...1.12.300">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=com.amazonaws:aws-java-sdk-core&package-manager=maven&previous-version=1.12.298&new-version=1.12.300)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`. 
[//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - ``@dependabot` rebase` will rebase this PR - ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it - ``@dependabot` merge` will merge this PR after your CI passes on it - ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it - ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging - ``@dependabot` reopen` will reopen this PR if it is closed - ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Build failed (retrying...): |
Build succeeded: |
10540: [Backport stable/8.0] Ensure retries are not interleaved even on multiple sequential calls r=npepinpe a=npepinpe

## Description

This PR backports part of the changes found in #10289, notably the ones in the scheduler about the retry strategies (skipping the engine ones which are all around new 8.1.0 code). It seems I forgot to backport that part, as we did remove `runUntilDone` in 8.0 as well, so we should fix the retry strategies.

## Related issues

backports #10289

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
10494: [Backport stable/8.0] Do not fail consistency check if log is empty r=deepthidevaki a=backport-action

# Description

Backport of #10463 to `stable/8.0`.

relates to #10451

10540: [Backport stable/8.0] Ensure retries are not interleaved even on multiple sequential calls r=deepthidevaki a=npepinpe

## Description

This PR backports part of the changes found in #10289, notably the ones in the scheduler about the retry strategies (skipping the engine ones which are all around new 8.1.0 code). It seems I forgot to backport that part, as we did remove `runUntilDone` in 8.0 as well, so we should fix the retry strategies.

## Related issues

backports #10289

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
Description
By using `ActorControl#submit` in some of the retry strategies, we can create race conditions if the retry strategy is reused. Since the initial call uses `run` to prepend a retry attempt, and further retries use `submit`, it's possible for one run to retry (thus submitting the retry job to the end of the queue) and for the next call to `runWithRetry` to overwrite its state, causing issues when it comes to completing the future (as well as potentially sharing state between the operations).

Additionally, this PR fixes an issue where, on retry, we were not resetting the writer, causing the same command to be written multiple times.
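To illustrate the interleaving described above, here is a minimal, self-contained sketch. It does not use the real Zeebe scheduler: `ToyActor`, `ToyRetryStrategy`, and `scenario` are hypothetical names, and the sketch simply assumes that `run` prepends a job to the actor's queue while `submit` appends it, as the description states. The first operation fails once and then succeeds; the second succeeds immediately.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BooleanSupplier;

class RetryInterleavingSketch {

  /** Toy single-threaded actor (hypothetical): run() prepends a job, submit() appends it. */
  static final class ToyActor {
    private final Deque<Runnable> jobs = new ArrayDeque<>();

    void run(final Runnable job) {
      jobs.addFirst(job);
    }

    void submit(final Runnable job) {
      jobs.addLast(job);
    }

    void drain() {
      while (!jobs.isEmpty()) {
        jobs.pollFirst().run();
      }
    }
  }

  /** Reusable retry strategy (hypothetical) that keeps the current call's state in fields. */
  static final class ToyRetryStrategy {
    private final ToyActor actor;
    private final boolean retryWithSubmit;
    private BooleanSupplier currentOperation;
    private CompletableFuture<Boolean> currentFuture;
    int duplicateCompletions = 0;

    ToyRetryStrategy(final ToyActor actor, final boolean retryWithSubmit) {
      this.actor = actor;
      this.retryWithSubmit = retryWithSubmit;
    }

    CompletableFuture<Boolean> runWithRetry(final BooleanSupplier operation) {
      // Each call overwrites the shared state of the strategy.
      currentOperation = operation;
      currentFuture = new CompletableFuture<>();
      actor.run(this::attempt); // the initial attempt is always prepended
      return currentFuture;
    }

    private void attempt() {
      if (currentOperation.getAsBoolean()) {
        // complete() returns false if the future was already completed
        if (!currentFuture.complete(true)) {
          duplicateCompletions++;
        }
      } else if (retryWithSubmit) {
        actor.submit(this::attempt); // retry goes to the END of the queue
      } else {
        actor.run(this::attempt); // retry is prepended, so it runs next
      }
    }
  }

  static String scenario(final boolean retryWithSubmit) {
    final ToyActor actor = new ToyActor();
    final ToyRetryStrategy strategy = new ToyRetryStrategy(actor, retryWithSubmit);
    final int[] firstAttempts = {0};
    final List<CompletableFuture<Boolean>> futures = new ArrayList<>();

    // First call: the operation fails once, then succeeds.
    actor.submit(() -> futures.add(strategy.runWithRetry(() -> ++firstAttempts[0] > 1)));
    // Second, sequential call on the same strategy: succeeds immediately.
    actor.submit(() -> futures.add(strategy.runWithRetry(() -> true)));
    actor.drain();

    return "first done=" + futures.get(0).isDone()
        + ", second done=" + futures.get(1).isDone()
        + ", duplicate completions=" + strategy.duplicateCompletions;
  }

  public static void main(final String[] args) {
    System.out.println("submit: " + scenario(true));
    System.out.println("run:    " + scenario(false));
  }
}
```

In this toy model, retrying via `submit` leaves the first future incomplete and completes the second one twice, because the queued retry only runs after the second call has overwritten the strategy's state. Retrying via `run` finishes the first operation before the second call starts, so both futures complete exactly once.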
There is a regression test added which isn't perfect, and I'd like some suggestions on how to improve it. The integration test added to the `ProcessingScheduleServiceTest` is not amazing and is likely to be flaky, as it's hard to write controlled tests with our timers. Suggestions are welcome 👍

Related issues
closes #10240
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
`backport stable/1.3`) to the PR, in case that fails you need to create backports manually.
Testing:
Documentation:
Please refer to our review guidelines.