DRAFT: ProcessSchedulingService no extra submit #10390

Zelldon · 2022-09-16T17:22:43Z

Description

@npepinpe I need your input here

I removed the additional submit, but it turns out to be needed for the tests, since Scheduling timers from another Actor is not allowed somehow (?).

The timer will be added to the timer queue which is part of the ActorThread. Tbh I see not really an issue here, even if this is another thread. When the timer is due the ActorTask will be enqueued and run from some ActorThread 🤷 Maybe I oversee something. Do you think we can remove the restriction?

Related issues

closes #10291

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

The changes are backwards compatibility with previous versions
If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR, in case that fails you need to create backports manually.

Testing:

There are unit/integration tests that verify all acceptance criterias of the issue
New tests are written to ensure backwards compatibility with further versions
The behavior is tested manually
The change has been verified by a QA run
The impact of the changes is verified by a benchmark

Documentation:

The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
New content is added to the release announcement
If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Please refer to our review guidelines.

Zelldon · 2022-09-16T17:23:19Z

scheduler/src/main/java/io/camunda/zeebe/scheduler/ActorControl.java

@@ -136,7 +136,7 @@ public ActorFuture<Void> call(final Runnable action) {
   * @return
   */
  public ScheduledTimer runDelayed(final Duration delay, final Runnable runnable) {
-    ensureCalledFromWithinActor("runDelayed(...)");
+//    ensureCalledFromWithinActor("runDelayed(...)");


The timer will be added to the timer queue which is part of the ActorThread. Tbh I see not really an issue here, even if this is another thread. When the timer is due the ActorTask will be enqueued and run from some ActorThread 🤷 Maybe I oversee something. Do you think we can remove the restriction?

@npepinpe or @saig0 maybe you have an idea?

IIRC the DeadlineTimerWheel (which the ActorTimerQueue extends) is not thread safe, so modifying it from a different actor may cause race conditions.

It's a little unclear how the check that the current thread's job is, well, us ("us", as in, this actor) - meaning I guess we check if the thread is currently owned by us, so no other concurrent access to the timer queue can be done. I guess that's the reasoning, and why it's OK then to use a non-thread safe queue?

A little like how it's OK to prepend a job to the fast lane when the same check is true (the fast lane being a non-thread safe queue).

Ah good point so it is about the DeadlineTimerWheel, why we have to use our thread. So one possibility would be to make it thread safe? 🤔

Here btw is the check https://github.com/camunda/zeebe/blob/main/scheduler/src/main/java/io/camunda/zeebe/scheduler/ActorControl.java#L363-L377 where we verify whether it is our OWN actor 🙈

Yeah, that's the one 😄 To me it's more confusing - I mean I know what it does, but I don't think it's obvious the first time you read it what it guarantees 😄

I don't know about making the actor timer queue thread safe. I guess there's no real harm other than maybe performance, but I wasn't there when this was decided and it's not documented, so maybe we'll find out there was a good reason 😄

@npepinpe another idea. What do you think of we keep the additional submit, to keep it thread-safe, but create an own actor for the PRocessingScheduleService. This might reduce the potential delays (since the actor only has to schedule and execute timers). This would also solve other issues, like blocking processing actor when having many timers or timeouts. If you agree then I would create another PR where I migrate the ProcessingScheduleService to an own actor with own

Hm, so upside is that it's not blocking the processing actor anymore (which is nice), downside is that there is some communication between both actors (in the form of the abort condition, the batch writer, etc.) which probably gets more complicated. I don't think it's too crazy for it to have its own writer tbh, and maybe even own abort condition (which is controlled when closing the whole stream processor), so maybe it's not too bad. OTOH I think it's a good idea

github-actions · 2022-09-16T17:33:25Z

Test Results

  547 files -   312   547 suites - 312 10m 34s ⏱️ - 1h 41m 26s
3 744 tests - 3 082 3 738 ✔️ - 3 077 6 💤 - 5 0 ❌ ±0
3 905 runs - 3 105 3 899 ✔️ - 3 100 6 💤 - 5 0 ❌ ±0

Results for commit 222f027. ± Comparison against base commit 9649e4d.

10428: Remove the extra submit from ProcessingScheduleServiceImpl r=Zelldon a=Zelldon ## Description And another round! 😁 After #10414 and #10390 I think I found a good way to replace the additional Actor#submit from the production code and move it to where it is necessary in tests. As described here #10414 (comment) it is not easily possible to create a separate Actor for the ScheduleService, due to concurrency issues and further work we would need to do here. ## Details In this PR I iterate over the ProcessingScheduleServiceImpl and remove the StreamProcessorContext from the constructor. The service gets all necessary dependencies injected on the constructor (except the actor), mostly as suppliers. This allows to lazy load dependencies and reduce dependencies for tests. It allowed to write some more unit-test like tests. The service gets an open method to initialize the service with the actor control and create its own writer. This allows us to no longer share the writer with the StreamProcessor, since this might lead to issues especially if we now run in between (during processing phases). Due to the later initializing of the writers we reduce resources on followers, if the service is never scheduled we don't need to create a writer. When the StreamProcessor is in the closing phase it will close the scheduled service as well.  ## Related issues  closes #10414 closes #10390 closes #10291 Co-authored-by: Christopher Zell <zelldon91@googlemail.com>

10418: Add monitor backup API endpoint r=npepinpe a=npepinpe ## Description This PR extends the backup management endpoint to add a monitoring API under `GET /{id}` (e.g. if the actuator is at `http://localhost:9600/actuator/backups`, then `GET http://localhost:9600/actuator/backups/{id}`). At the moment we use the internal types provided by the broker client as the response types, but in the future this will be replaced by the OpenAPI generated types. As such, I didn't think it was necessary to create user facing types and write some mapping code, since we will scrape it and replace it. However, I understand that we don't know when that will done, so I'm open to doing it anyway for safety. The acceptance tests were updated (and renamed to better reflect what they do), and as they overlap with the old `BackupIT` tests, those were deleted. ## Related issues closes #9902 10428: Remove the extra submit from ProcessingScheduleServiceImpl r=Zelldon a=Zelldon ## Description And another round! 😁 After #10414 and #10390 I think I found a good way to replace the additional Actor#submit from the production code and move it to where it is necessary in tests. As described here #10414 (comment) it is not easily possible to create a separate Actor for the ScheduleService, due to concurrency issues and further work we would need to do here. ## Details In this PR I iterate over the ProcessingScheduleServiceImpl and remove the StreamProcessorContext from the constructor. The service gets all necessary dependencies injected on the constructor (except the actor), mostly as suppliers. This allows to lazy load dependencies and reduce dependencies for tests. It allowed to write some more unit-test like tests. The service gets an open method to initialize the service with the actor control and create its own writer. This allows us to no longer share the writer with the StreamProcessor, since this might lead to issues especially if we now run in between (during processing phases). Due to the later initializing of the writers we reduce resources on followers, if the service is never scheduled we don't need to create a writer. When the StreamProcessor is in the closing phase it will close the scheduled service as well.  ## Related issues  closes #10414 closes #10390 closes #10291 Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com> Co-authored-by: Christopher Zell <zelldon91@googlemail.com>

Zelldon added 3 commits September 16, 2022 19:16

fix: do not use additional submit

f292a12

wip: no longer throw on different thread

38fa3b1

wip: use other actor in thread

222f027

Zelldon commented Sep 16, 2022

View reviewed changes

This was referenced Sep 19, 2022

DefaultActorClock is not thread safe #10400

Closed

Migrate ProcessingScheduleService to own actor #10414

Closed

Remove the extra submit from ProcessingScheduleServiceImpl #10428

Merged

zeebe-bors-camunda bot closed this in f8ee08a Sep 21, 2022

Zelldon deleted the zell-no-extra-submit branch March 28, 2024 15:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DRAFT: ProcessSchedulingService no extra submit #10390

DRAFT: ProcessSchedulingService no extra submit #10390

Zelldon commented Sep 16, 2022

Zelldon Sep 16, 2022

npepinpe Sep 16, 2022 •

edited

Zelldon Sep 19, 2022

npepinpe Sep 19, 2022

npepinpe Sep 19, 2022

Zelldon Sep 20, 2022

npepinpe Sep 20, 2022

github-actions bot commented Sep 16, 2022

DRAFT: ProcessSchedulingService no extra submit #10390

DRAFT: ProcessSchedulingService no extra submit #10390

Conversation

Zelldon commented Sep 16, 2022

Description

Related issues

Definition of Done

Zelldon Sep 16, 2022

Choose a reason for hiding this comment

npepinpe Sep 16, 2022 • edited

Choose a reason for hiding this comment

Zelldon Sep 19, 2022

Choose a reason for hiding this comment

npepinpe Sep 19, 2022

Choose a reason for hiding this comment

npepinpe Sep 19, 2022

Choose a reason for hiding this comment

Zelldon Sep 20, 2022

Choose a reason for hiding this comment

npepinpe Sep 20, 2022

Choose a reason for hiding this comment

github-actions bot commented Sep 16, 2022

Test Results

npepinpe Sep 16, 2022 •

edited