Enable suspend-resume of orchestrators. #764

Merged · 42 commits · Aug 18, 2022

Conversation

@Julio-Arroyo (Contributor) commented Jul 13, 2022

This PR introduces a first draft of a suspend-resume feature for orchestrations.

A key design decision is to avoid introducing new storage types for messages that arrive while an orchestration is suspended. Instead, incoming messages are treated as a piece of state that gets rebuilt during replay of the history.
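
To make the replay-driven approach concrete, here is a minimal editorial sketch (not the PR's actual code; aside from ExecutionSuspendedEvent, ExecutionResumedEvent, and HistoryEvent, the type and member names are assumptions, and the events are assumed to derive from HistoryEvent):

using System.Collections.Generic;
using DurableTask.Core.History;

// Hypothetical sketch: messages that arrive while suspended are not persisted in a
// new storage type; they are deferred in memory, and this in-memory state is rebuilt
// from the history on every replay.
class SuspensionState
{
    readonly Queue<HistoryEvent> deferredEvents = new Queue<HistoryEvent>();

    public bool IsSuspended { get; private set; }

    public void OnHistoryEvent(HistoryEvent historyEvent)
    {
        switch (historyEvent)
        {
            case ExecutionSuspendedEvent _:
                this.IsSuspended = true;
                break;
            case ExecutionResumedEvent _:
                this.IsSuspended = false;
                while (this.deferredEvents.Count > 0)
                {
                    Dispatch(this.deferredEvents.Dequeue());
                }
                break;
            default:
                if (this.IsSuspended)
                {
                    // Defer the message; because this queue is rebuilt during replay,
                    // the backend does not need to store anything extra.
                    this.deferredEvents.Enqueue(historyEvent);
                }
                else
                {
                    Dispatch(historyEvent);
                }
                break;
        }
    }

    static void Dispatch(HistoryEvent historyEvent)
    {
        // Hand the event to the orchestration as usual (placeholder in this sketch).
    }
}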

@ghost commented Jul 13, 2022

CLA assistant check: all CLA requirements met.

@amdeel requested review from amdeel and cgillum on July 13, 2022, 22:33
@cgillum (Collaborator) left a comment

Initial batch of comments. I'll try to add more over the next couple of days.

Review threads (outdated, resolved):
- src/DurableTask.Core/History/ExecutionResumedEvent.cs
- src/DurableTask.Core/History/ExecutionSuspendedEvent.cs
- src/DurableTask.Core/IOrchestrationServiceClient.cs
@elokaac (Collaborator) commented Jul 14, 2022

Just curious about the design for this feature. How would sub-orchestrations be handled? Would a "suspend" on the parent fan out "suspend" events to all running sub-orchestrations? Or would any running sub-orchestration still proceed while the parent is suspended?

@cgillum (Collaborator) commented Jul 14, 2022

@aueloka the current design only suspends the target orchestration. Sub-orchestrations would continue running.

@davidmrdavid self-requested a review on July 14, 2022, 16:28
@elokaac (Collaborator) commented Jul 14, 2022

> @aueloka the current design only suspends the target orchestration. Sub-orchestrations would continue running.

I see. Okay thanks.
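
For reference, a minimal editorial usage sketch of the per-instance suspend call discussed above (SuspendTaskOrchestrationAsync matches the signature added in this PR; the resume counterpart's name and signature are assumptions):

using System.Threading.Tasks;
using DurableTask.Core;

static class SuspendResumeUsage
{
    public static async Task PauseAndLaterResumeAsync(IOrchestrationServiceClient client, string instanceId)
    {
        // Only the targeted instance is suspended; its sub-orchestrations keep running.
        await client.SuspendTaskOrchestrationAsync(instanceId, "maintenance window");

        // ... later ...
        await client.ResumeTaskOrchestrationAsync(instanceId, "maintenance complete"); // assumed counterpart API
    }
}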

@davidmrdavid (Collaborator) left a comment

Also left a first batch of comments.

@cgillum (Collaborator) left a comment

A few more comments from me, this time focused on the core of the implementation.

Review threads (outdated, resolved):
- src/DurableTask.Core/TaskOrchestrationContext.cs (6 threads)
- src/DurableTask.Core/OrchestrationStatus.cs (1 thread)
- src/DurableTask.Core/OrchestrationRuntimeState.cs (3 threads)
@@ -151,6 +151,16 @@ public async Task ForceTerminateTaskOrchestrationAsync(string instanceId, string
}
}

public Task SuspendTaskOrchestrationAsync(string instanceId, string reason)
{
    throw new NotImplementedException();
}
Review comment from a collaborator:
We should make a note to notify the ServiceFabric backend maintainers about this change.

@@ -258,6 +256,18 @@ void SetMarkerEvents(HistoryEvent historyEvent)
}

ExecutionCompletedEvent = completedEvent;
orchestrationStatus = OrchestrationStatus.Completed;
Review comment from a collaborator:

I believe this needs to be completedEvent.OrchestrationStatus, just like it was in the previous implementation of SetMarkerEvents(). Otherwise, orchestration failure won't be handled correctly. This is because ExecutionCompletedEvent handles multiple forms of completion, including failure.
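
A short sketch of the suggested correction, reusing the names from the hunk above (the surrounding method body is assumed):

ExecutionCompletedEvent = completedEvent;
// Suggested fix per the review comment: take the status carried by the completion
// event instead of hard-coding Completed, so completions that represent failures
// are classified correctly.
orchestrationStatus = completedEvent.OrchestrationStatus;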

@cgillum (Collaborator) commented Aug 12, 2022

/azp run

@azure-pipelines (bot) commented Aug 12, 2022

EDIT: Pipeline started successfully

@cgillum (Collaborator) left a comment

Changes look good to me and the CI is passing, so signing off on this! Since this is a big change that touches the core engine, let's verify that the scenario is working as expected in the extension repo before merging this change.

@davidmrdavid (Collaborator) commented Aug 15, 2022

❤️ thanks @amdeel and @cgillum for the detailed reviews. Will work with @Julio-Arroyo to make sure this works as expected in the Extension prior to the merge.

@Julio-Arroyo (Contributor, Author) commented:

/azp run

@azure-pipelines (bot) replied:

Commenter does not have sufficient privileges for PR 764 in repo Azure/durabletask

@cgillum (Collaborator) commented Aug 17, 2022

/azp run

@azure-pipelines (bot) commented Aug 17, 2022

EDIT: Pipeline started successfully

@davidmrdavid (Collaborator) commented:

/azp run

@azure-pipelines (bot) replied:

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@cgillum (Collaborator) commented Aug 17, 2022

I just took a look at the CI results, and it looks like tests never actually ran! We need to investigate what's going on, and whether it's just this PR or if other PRs have the same problem.

@cgillum (Collaborator) commented Aug 18, 2022

I think the CI may have been silently broken since this PR back in April: #707. Basically, our CI filters the list of test assemblies using folder paths. When we changed the TFM from net461 to net462 for the test projects, that changed the folder paths. As a result, the CI wouldn't find any test assemblies and simply marked the run as "passed" since nothing failed (problem is, no tests passed either).

I've fixed the problem with the CI configuration and will try re-running the CI. If we're lucky, it will pass, but it's possible that a regression may have snuck in from some other PR during the time that the tests were silently disabled.
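
As an editorial illustration of that failure mode (this is not the repository's actual CI script; the path pattern and file names are assumptions):

using System;
using System.IO;
using System.Linq;

class FindTestAssemblies
{
    static void Main()
    {
        // Hypothetical sketch: the CI selected test assemblies by a folder path that
        // still contained the old target framework moniker.
        string staleTfmSegment = Path.Combine("bin", "Release", "net461");

        var testAssemblies = Directory
            .EnumerateFiles(".", "*Tests.dll", SearchOption.AllDirectories)
            .Where(path => path.Contains(staleTfmSegment))
            .ToList();

        // After the test projects moved to net462, this list is empty: no assemblies
        // are handed to the test runner, nothing fails, and the stage reports success.
        Console.WriteLine($"Found {testAssemblies.Count} test assemblies to run.");
    }
}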

@cgillum (Collaborator) commented Aug 18, 2022

/azp run

@azure-pipelines (bot) commented Aug 18, 2022

EDIT: Pipeline started successfully

@cgillum (Collaborator) commented Aug 18, 2022

Looks like some existing tests not related to this PR are failing.

[screenshot: failing test results]

My guess is that the regression was caused by #750. I verified by debugging locally that at least one of these tests fails consistently. The regression has not shipped yet, so the problem doesn't exist out in the wild, but it's unfortunately a blocker for this release. I'll contact the contributor to see if they can fix the issue. Worst-case scenario is that we have to revert their changes to get the CI green again.

@cgillum (Collaborator) commented Aug 18, 2022

/azp run

@azure-pipelines (bot) commented Aug 18, 2022

EDIT: Pipeline started successfully
