fix(amber): make region kill synchronous before scheduling next region #4557
…xiaozhen-sync-region-kill
The design looks right to me. Can we add a visual diagram to illustrate it?
I see there have been a few recent changes to the scheduler. Given its importance, I hope we can be careful with those changes. Two questions:
Thanks. It would be good to have those discussions on the bug issue before opening this PR. If you want to include this in the release, we need more test cases and more test runs. So, in general, I suggest we merge it after the release if it only blocks the control block, which is still ongoing work.
Test cases added. We can merge it after the release; it does not affect the current user experience.
aglinxinyuan left a comment
LGTM. The changes in this PR were discussed offline, and both the issue and PR description accurately reflect that discussion. Let’s hold off on merging until the release.
What changes were proposed in this PR?
This PR makes region termination synchronous with respect to region scheduling. Previously, the workflow coordinator could schedule the next region before the previous region's workers were fully terminated.
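The ordering this PR enforces can be illustrated as a chain of Scala futures, where the next region is scheduled only inside the callback of the previous region's successful kill. This is a minimal, hypothetical model, not the actual Amber code: `Region`, `Worker`, `RegionPhase`, `SyncRegionKill`, `killRegion`, `scheduleAll`, and the `gracefulStopSent` flag are illustrative names, and `endWorker` here is a local function standing in for the real control message. It also folds in the strict end check and the "graceful stop only after every worker replies" rule from the list of main changes. It assumes the Scala 2.13 or Scala 3 standard library.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical model: names echo the PR's vocabulary but are NOT the real
// Texera/Amber classes, messages, or signatures.
sealed trait RegionPhase
case object Executing extends RegionPhase
case object Terminating extends RegionPhase // non-completed: workers are still being ended
case object Completed extends RegionPhase

final case class Worker(id: String, queuedMessages: Int)

final class Region(val name: String, val workers: Seq[Worker]) {
  @volatile var phase: RegionPhase = Executing
  @volatile var gracefulStopSent: Boolean = false // stand-in for gracefulStop
}

object SyncRegionKill {
  // Strict end check: reject endWorker while any message is still unprocessed.
  def endWorker(w: Worker)(implicit ec: ExecutionContext): Future[Unit] = Future {
    if (w.queuedMessages > 0)
      throw new IllegalStateException(s"${w.id} has ${w.queuedMessages} unprocessed message(s)")
  }

  // Stay in Terminating until *every* worker has acknowledged endWorker; only
  // then send the graceful stop and mark the region Completed. If any
  // endWorker fails, the returned future fails and the region stays Terminating.
  def killRegion(region: Region)(implicit ec: ExecutionContext): Future[Unit] = {
    region.phase = Terminating
    Future.traverse(region.workers)(w => endWorker(w)).map { _ =>
      region.gracefulStopSent = true
      region.phase = Completed
    }
  }

  // Regions run one at a time: region i+1 is scheduled only inside the
  // callback of region i's successful kill, never in parallel with it.
  def scheduleAll(regions: Seq[Region], log: ArrayBuffer[String])(
      implicit ec: ExecutionContext): Future[Unit] =
    regions.foldLeft(Future.unit) { (prev, region) =>
      prev.flatMap { _ =>
        log.synchronized { log += s"start ${region.name}" }
        // (region execution itself is elided in this sketch)
        killRegion(region).map { _ =>
          log.synchronized { log += s"killed ${region.name}" }
          ()
        }
      }
    }
}
```

With two healthy regions, the log records `start r1`, `killed r1`, `start r2`, `killed r2` in that order. If a worker still has queued messages, its `endWorker` fails, the region stays in `Terminating` without a graceful stop, and the next region is never scheduled. How the real PR recovers from that failure is not modeled here.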
The main changes are:
- `RegionExecutionCoordinator`: keep the region in a non-completed phase while worker termination is still in progress.
- `WorkflowExecutionCoordinator`: wait for the previous region's termination to complete before scheduling the next region.
- `gracefulStop`: trigger it only after all workers successfully reply to `endWorker`.
- Handle the case where `endWorker` fails because a worker still has queued messages.
- Make `EndHandler` strict: `endWorker` fails whenever the worker still has any unprocessed message.
Any related issues, documentation, discussions?
Closes #4556
How was this PR tested?
Manually tested; added test cases for `RegionExecutionCoordinator`, `WorkflowExecutionCoordinator`, and `EndHandler`.
Was this PR authored or co-authored using generative AI tooling?
Generated-by: OpenAI Codex