Cancellation issue in V2 workflows running Input generation. #1092

Closed
sambles opened this issue Jul 25, 2024 · 2 comments · Fixed by #1093

@sambles
Contributor

sambles commented Jul 25, 2024

Issue Description

We had a report of an analysis cancellation call failing while generate inputs was running.

The cancellation call returns HTTP 200, and sub-tasks are marked as cancelled, but tasks appear to keep running in the background.

Cancellation issued - 13:12:25

INFO 2024-07-23 13:12:25,828 middleware 66 140474528149952 POST /api/v2/analyses/1813/cancel_generate_inputs/ - 200

Worker monitor - 13:29:19

[2024-07-23 13:29:19,709: INFO/ForkPoolWorker-9] set_task_status_v2[1efe6c6d-f504-44ea-917f-8bd72a92a0d9]: Task Status Update: analysis_pk: 1813, status: INPUTS_GENERATION_STARTED
@sambles sambles self-assigned this Jul 25, 2024
@sambles sambles moved this to In Progress in Oasis Dev Team Tasks Jul 25, 2024
@sambles sambles changed the title Possible cancellation issue in V2 workflows running Input generation. Cancellation issue in V2 workflows running Input generation. Jul 25, 2024
@sambles
Contributor Author

sambles commented Jul 29, 2024

This issue also looks linked to #1057.

The attached parent task id (analysis.generate_inputs_task_id), which is used to identify the celery canvas workflow, is not correct.

  1. run generate_inputs is posted.
  2. No workers are available to accept the initial task, and none of the sub-tasks have task ids yet because they are all PENDING.
  3. cancel_generate_inputs is posted.
  4. The sub-task cancellation does nothing, since there are no sub-task ids to revoke yet. The parent task revoke call also fails (because the main task id is not correct).
  5. The analysis is marked as cancelled in the DB, while on the backend nothing has been cancelled.

--- Worker becomes available ---

  6. The initial task (still on the queue) is picked up and the sub-tasks start to execute.
  7. Follow-up posts to cancel_generate_inputs do nothing, because the analysis is already tagged as cancelled.
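
To make the sequence above easier to follow, here is a toy simulation in plain Python. It does not use Celery or the platform code. The `Analysis` class, the `cancel_generate_inputs` and `worker_drain` functions, the task id strings, and the `INPUTS_GENERATION_QUEUED` / `INPUTS_GENERATION_CANCELLED` status names are all assumptions made for illustration. Only the order of events is taken from the steps above.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Analysis:
    # Hypothetical stand-in for the analysis record, not the platform's model.
    status: str = "INPUTS_GENERATION_QUEUED"
    generate_inputs_task_id: Optional[str] = None
    sub_task_ids: List[str] = field(default_factory=list)


def cancel_generate_inputs(analysis: Analysis, revoked: Set[str]) -> str:
    """Toy cancel handler: revoke the ids it knows about, then flag the analysis."""
    if analysis.status == "INPUTS_GENERATION_CANCELLED":
        return "already cancelled"  # step 7: follow-up calls are ignored
    revoked.update(analysis.sub_task_ids)  # step 4: no ids yet, nothing revoked
    if analysis.generate_inputs_task_id is not None:
        revoked.add(analysis.generate_inputs_task_id)  # step 4: stale parent id
    analysis.status = "INPUTS_GENERATION_CANCELLED"  # step 5
    return "cancelled"


def worker_drain(queue: deque, revoked: Set[str]) -> List[str]:
    """Toy worker: run every queued task whose id has not been revoked."""
    executed = []
    while queue:
        task_id = queue.popleft()
        if task_id not in revoked:
            executed.append(task_id)  # step 6: the task still runs
    return executed


def run_scenario():
    queue = deque(["real-task-id"])  # steps 1-2: task queued, no worker online
    revoked: Set[str] = set()
    analysis = Analysis(generate_inputs_task_id="stale-task-id")
    first = cancel_generate_inputs(analysis, revoked)  # steps 3-5
    executed = worker_drain(queue, revoked)  # step 6
    second = cancel_generate_inputs(analysis, revoked)  # step 7
    return first, executed, second, analysis.status
```

In this sketch `run_scenario()` reports the first cancel as successful, yet the queued task still executes because the revoked id never matches the id on the queue, and the second cancel is short-circuited by the status check.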

@sambles sambles linked a pull request Jul 30, 2024 that will close this issue
@sambles
Contributor Author

sambles commented Jul 31, 2024

Part of the issue is described here: celery/celery#8888

Issue: The challenge arises when the worker associated with the task's queue is offline. In such cases, the broadcast signal sent by .revoke() is lost. Consequently, even if tasks have been revoked, upon restarting the workers, these tasks are still executed.
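
The behaviour quoted above can be modelled without Celery. The sketch below is a simplified illustration, not the fix applied in #1093 and not Celery's internals: `Worker`, `broadcast_revoke`, `should_execute`, and the `cancelled_store` set are hypothetical names. It assumes the revoke signal only reaches workers that are online when it is sent, and shows one possible mitigation, where the task consults a persistent cancellation marker before it starts.

```python
from typing import Iterable, Optional, Set


class Worker:
    """Toy worker whose revoked set lives in memory only."""

    def __init__(self, name: str, online: bool = False) -> None:
        self.name = name
        self.online = online
        self.revoked: Set[str] = set()


def broadcast_revoke(workers: Iterable[Worker], task_id: str) -> int:
    """Deliver the revoke to online workers only; return how many were reached."""
    reached = 0
    for worker in workers:
        if worker.online:
            worker.revoked.add(task_id)
            reached += 1
    return reached


def should_execute(worker: Worker, task_id: str,
                   cancelled_store: Optional[Set[str]] = None) -> bool:
    """Decide whether the task body runs.

    cancelled_store stands in for a persistent marker (for example a DB flag)
    that survives even when the broadcast was missed. This is an assumption
    made for illustration.
    """
    if task_id in worker.revoked:
        return False
    if cancelled_store is not None and task_id in cancelled_store:
        return False
    return True


# Worker is offline when the revoke is sent, so the signal is lost.
worker = Worker("worker-1", online=False)
reached = broadcast_revoke([worker], "task-a")
worker.online = True

runs_without_marker = should_execute(worker, "task-a")
runs_with_marker = should_execute(worker, "task-a", cancelled_store={"task-a"})
```

Here `reached` is 0, `runs_without_marker` is `True` (the revoked task still runs once the worker is back), and `runs_with_marker` is `False` because the persistent marker is checked independently of the lost broadcast.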

@github-project-automation github-project-automation bot moved this from In Progress to Done in Oasis Dev Team Tasks Aug 5, 2024
@awsbuild awsbuild added this to the 2.3.7 milestone Aug 6, 2024