-
Notifications
You must be signed in to change notification settings - Fork 556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRAFT: ProcessSchedulingService no extra submit #10390
Conversation
@@ -136,7 +136,7 @@ public ActorFuture<Void> call(final Runnable action) { | |||
* @return | |||
*/ | |||
public ScheduledTimer runDelayed(final Duration delay, final Runnable runnable) { | |||
ensureCalledFromWithinActor("runDelayed(...)"); | |||
// ensureCalledFromWithinActor("runDelayed(...)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timer will be added to the timer queue which is part of the ActorThread. Tbh I see not really an issue here, even if this is another thread. When the timer is due the ActorTask will be enqueued and run from some ActorThread 🤷 Maybe I oversee something. Do you think we can remove the restriction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the DeadlineTimerWheel (which the ActorTimerQueue extends) is not thread safe, so modifying it from a different actor may cause race conditions.
It's a little unclear how the check that the current thread's job is, well, us ("us", as in, this actor) - meaning I guess we check if the thread is currently owned by us, so no other concurrent access to the timer queue can be done. I guess that's the reasoning, and why it's OK then to use a non-thread safe queue?
A little like how it's OK to prepend a job to the fast lane when the same check is true (the fast lane being a non-thread safe queue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point so it is about the DeadlineTimerWheel, why we have to use our thread. So one possibility would be to make it thread safe? 🤔
Here btw is the check https://github.com/camunda/zeebe/blob/main/scheduler/src/main/java/io/camunda/zeebe/scheduler/ActorControl.java#L363-L377 where we verify whether it is our OWN actor 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's the one 😄 To me it's more confusing - I mean I know what it does, but I don't think it's obvious the first time you read it what it guarantees 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about making the actor timer queue thread safe. I guess there's no real harm other than maybe performance, but I wasn't there when this was decided and it's not documented, so maybe we'll find out there was a good reason 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@npepinpe another idea. What do you think of we keep the additional submit, to keep it thread-safe, but create an own actor for the PRocessingScheduleService. This might reduce the potential delays (since the actor only has to schedule and execute timers). This would also solve other issues, like blocking processing actor when having many timers or timeouts. If you agree then I would create another PR where I migrate the ProcessingScheduleService to an own actor with own
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so upside is that it's not blocking the processing actor anymore (which is nice), downside is that there is some communication between both actors (in the form of the abort condition, the batch writer, etc.) which probably gets more complicated. I don't think it's too crazy for it to have its own writer tbh, and maybe even own abort condition (which is controlled when closing the whole stream processor), so maybe it's not too bad. OTOH I think it's a good idea
10428: Remove the extra submit from ProcessingScheduleServiceImpl r=Zelldon a=Zelldon ## Description And another round! 😁 After #10414 and #10390 I think I found a good way to replace the additional Actor#submit from the production code and move it to where it is necessary in tests. As described here #10414 (comment) it is not easily possible to create a separate Actor for the ScheduleService, due to concurrency issues and further work we would need to do here. ## Details In this PR I iterate over the ProcessingScheduleServiceImpl and remove the StreamProcessorContext from the constructor. The service gets all necessary dependencies injected on the constructor (except the actor), mostly as suppliers. This allows to lazy load dependencies and reduce dependencies for tests. It allowed to write some more unit-test like tests. The service gets an open method to initialize the service with the actor control and create its own writer. This allows us to no longer share the writer with the StreamProcessor, since this might lead to issues especially if we now run in between (during processing phases). Due to the later initializing of the writers we reduce resources on followers, if the service is never scheduled we don't need to create a writer. When the StreamProcessor is in the closing phase it will close the scheduled service as well. <!-- Please explain the changes you made here. --> ## Related issues <!-- Which issues are closed by this PR or are related --> closes #10414 closes #10390 closes #10291 Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
10418: Add monitor backup API endpoint r=npepinpe a=npepinpe ## Description This PR extends the backup management endpoint to add a monitoring API under `GET /{id}` (e.g. if the actuator is at `http://localhost:9600/actuator/backups`, then `GET http://localhost:9600/actuator/backups/{id}`). At the moment we use the internal types provided by the broker client as the response types, but in the future this will be replaced by the OpenAPI generated types. As such, I didn't think it was necessary to create user facing types and write some mapping code, since we will scrape it and replace it. However, I understand that we don't know when that will done, so I'm open to doing it anyway for safety. The acceptance tests were updated (and renamed to better reflect what they do), and as they overlap with the old `BackupIT` tests, those were deleted. ## Related issues closes #9902 10428: Remove the extra submit from ProcessingScheduleServiceImpl r=Zelldon a=Zelldon ## Description And another round! 😁 After #10414 and #10390 I think I found a good way to replace the additional Actor#submit from the production code and move it to where it is necessary in tests. As described here #10414 (comment) it is not easily possible to create a separate Actor for the ScheduleService, due to concurrency issues and further work we would need to do here. ## Details In this PR I iterate over the ProcessingScheduleServiceImpl and remove the StreamProcessorContext from the constructor. The service gets all necessary dependencies injected on the constructor (except the actor), mostly as suppliers. This allows to lazy load dependencies and reduce dependencies for tests. It allowed to write some more unit-test like tests. The service gets an open method to initialize the service with the actor control and create its own writer. This allows us to no longer share the writer with the StreamProcessor, since this might lead to issues especially if we now run in between (during processing phases). Due to the later initializing of the writers we reduce resources on followers, if the service is never scheduled we don't need to create a writer. When the StreamProcessor is in the closing phase it will close the scheduled service as well. <!-- Please explain the changes you made here. --> ## Related issues <!-- Which issues are closed by this PR or are related --> closes #10414 closes #10390 closes #10291 Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com> Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
Description
@npepinpe I need your input here
I removed the additional submit, but it turns out to be needed for the tests, since Scheduling timers from another Actor is not allowed somehow (?).
The timer will be added to the timer queue which is part of the ActorThread. Tbh I see not really an issue here, even if this is another thread. When the timer is due the ActorTask will be enqueued and run from some ActorThread 🤷 Maybe I oversee something. Do you think we can remove the restriction?
Related issues
closes #10291
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
backport stable/1.3
) to the PR, in case that fails you need to create backports manually.Testing:
Documentation:
Please refer to our review guidelines.