Skip to content

Fix abortion for injected tasks#516

Merged
dmitry-lyfar merged 1 commit into
mainfrom
fix/injected-task-abort
Oct 20, 2025
Merged

Fix abortion for injected tasks#516
dmitry-lyfar merged 1 commit into
mainfrom
fix/injected-task-abort

Conversation

@dmitry-lyfar
Copy link
Copy Markdown
Collaborator

@dmitry-lyfar dmitry-lyfar commented Oct 20, 2025

Description

If tasks were injected after the change was aborted, it results in a deadlock between the
tasks that would wait for the parent to finish and the parent that would wait for the children to
undo.

Self-review quick check

  • Make decisions that cost a lot to reverse explicit in the PR description.
  • Avoid nested conditions.
  • Delete dead code and redundant comments.
  • Normalise symmetries by sticking to doing identical things identically.
// one way to handle errors
if err := f(); err != nil {
   ...
}

// one way to handle multiple returns
val, err := f()
if err != nil {
   ...
}
...
  • Check that coupled code elements, files, and directories are adjacent. For example, test data is stored as close as possible to a test.
  • Put variable declaration and initialisation together.
  • Divide large expressions into digestable and self-explanatory ones. Use multiple variables if required.
  • Put a blank line between two logically different chunks of code.
  • Follow the style guide for new error messages.

Docs

  • I have checked and added or updated relevant documentation.
  • I have checked and added or updated relevant release notes.
  • I have included the technical author in the review.

Or:

  • I confirm the PR has no implications for documentation.

@dmitry-lyfar dmitry-lyfar self-assigned this Oct 20, 2025
If tasks were injected after the change was
aborted, it results in a deadlock between the
tasks that would wait for the parent to finish and
the parent that would wait for the children to
undo.
Copy link
Copy Markdown
Contributor

@jonathan-conder jonathan-conder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the expertise to know if this is the best approach, but it makes sense to me. Will be interesting to see what the Snap team says

@github-actions
Copy link
Copy Markdown

TICS Quality Gate

✔️ Passed

workshop

All conditions passed

See the results in the TICS Viewer

The following files have been checked for this project
  • internal/overlord/handlersetup/handler_setup.go

Automatic-tests / code-coverage / TICS GitHub Action

@dmitry-lyfar dmitry-lyfar merged commit 834fe4c into main Oct 20, 2025
12 checks passed
@dmitry-lyfar dmitry-lyfar deleted the fix/injected-task-abort branch October 20, 2025 20:04
@olivercalder
Copy link
Copy Markdown
Member

I need to have a look at this in snapd. Was discussed here previously: https://chat.canonical.com/canonical/pl/g4dfeqnsqfbu9x74t7x6enbkxh

@olivercalder
Copy link
Copy Markdown
Member

The test you added does not seem to actually exercise the bug. If I back out the code changes but leave the test changes, it still passes. This is on master:

[N] oac@fw12 ~/P/w/i/o/handlersetup (main)> git diff | cat
diff --git a/internal/overlord/handlersetup/handler_setup.go b/internal/overlord/handlersetup/handler_setup.go
index 4759818b..6c92be30 100644
--- a/internal/overlord/handlersetup/handler_setup.go
+++ b/internal/overlord/handlersetup/handler_setup.go
@@ -168,19 +168,4 @@ func InjectTasks(mainTask *state.Task, extraTasks *state.TaskSet) {
 
 	// make the extra tasks wait for main task
 	extraTasks.WaitFor(mainTask)
-
-	// Update status of the injected tasks in case the main task was aborted
-	// already. Lets consider what status the main task can have at the time
-	// of the call:
-	// - Do (request processing stage, change is not in a Doing state)
-	// - Doing (Do handler is executed for the main task)
-	// - Abort (the task was aborted *before* the InjectTasks was called AND the
-	// state was locked but *after* the Do handler for the task was started)
-	// - Undoing (Undo handler is executed for the task)
-	status := mainTask.Status()
-	if status == state.AbortStatus {
-		for _, t := range extraTasks.Tasks() {
-			t.SetStatus(state.HoldStatus)
-		}
-	}
 }
[I] oac@fw12 ~/P/w/i/o/handlersetup (main)> go test
OK: 5 passed
PASS
ok  	github.com/canonical/workshop/internal/overlord/handlersetup	0.008s

@dmitry-lyfar @jonathan-conder

@olivercalder
Copy link
Copy Markdown
Member

I opened a PR against snapd to backport this change, but we should make sure the tests exercise the fix before progressing there: canonical/snapd#16235

@jonathan-conder
Copy link
Copy Markdown
Contributor

I think this change will make it exercise the fix. I'll wait for Dmitry to confirm though, I never went in depth on understanding this one.

diff --git a/internal/overlord/handlersetup/handler_setup_test.go b/internal/overlord/handlersetup/handler_setup_test.go
index 64e73f19..e4e933d3 100644
--- a/internal/overlord/handlersetup/handler_setup_test.go
+++ b/internal/overlord/handlersetup/handler_setup_test.go
@@ -158,14 +158,14 @@ func (s *CommonStateFuncs) TestInjectTasksMainAborted(c *check.C) {
        t0.SetStatus(state.DoingStatus)
        chg.AddTask(t0)
        t0.JoinLane(lane)
+       // This will abort the main task.
+       chg.Abort()
        t01 := s.state.NewTask("task1-1", "")
        t01.WaitFor(t0)
        t02 := s.state.NewTask("task1-2", "")
        t02.WaitFor(t0)
 
        ts := state.NewTaskSet(t01, t02)
-       // This will abort the main task.
-       chg.Abort()
        handlersetup.InjectTasks(t0, ts)
 
        // verify that extra tasks are on hold

The reason the test passes without the fix is this section of Change.abortTasks:

for _, halted := range t.HaltTasks() {
if !seenTasks[halted.id] {
tasks = append(tasks, halted)
}
}

@dmitry-lyfar
Copy link
Copy Markdown
Collaborator Author

@olivercal

I opened a PR against snapd to backport this change, but we should make sure the tests exercise the fix before progressing there: canonical/snapd#16235

Thanks for this! The issue is that chg.Abort() aborts all the tasks it can reach via tracking halted and waiting ones, not only the ones the belong to the change. But, anyway, in this case I should have not made t01 and t02 to wait for t0 anyway as this is the InjectTasks job.

Please see the fix here: https://github.com/canonical/workshop/pull/527/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants