Unable to use retryStrategy and hooks in unison on intermediate steps/tasks #12120
Comments
@toyamagu-2021 could you take a look at this and see if you can diagnose why
I think the bug comes from the following lines (only consider argo-workflows/workflow/controller/operator.go, lines 1729 to 1746 in 877a2e7
I can fix this, but @MarcusMoe you might want to use
Alternatively, steps can serve as a workaround, if your use case allows steps rather than a DAG.
I need to investigate more carefully, but the root cause is that a DAG task does not wait for the TemplateLevelLifeCycleHook. In this case, we can use:

    hooks:
      exit: # NOTE: YOU CAN NOT CHANGE THIS STRING
        template: exit-handler
        arguments: {}
        expression: tasks["some-task"].status == "Failed"

P.S.: I noticed the following logic. I will add TemplateLevelLifeCycleHook to this. argo-workflows/workflow/controller/dag.go, line 863 in 877a2e7
@toyamagu-2021 Thank you for all your suggestions! Unfortunately in my case this would be part of a larger workflow where
@MarcusMoe

    - name: exit-handler
      steps:
        - - name: suceed
            template: celebrate
            when: 'tasks["some-task"].status == Succeeded'
          - name: failed
            template: cry
            when: 'tasks["some-task"].status == Failed'

This might work for both Succeeded and Failed (sorry if I missed anything).
Pre-requisites
:latest
What happened/what you expected to happen?
I encountered a bug where the workflow is unable to complete when hooks and retryStrategy are used together. When some-task fails or succeeds, I use an exit-handler to send a status update to GitHub. The exit-handler has a retryStrategy to ensure that the status update is sent. While the status update is happening, the finish task, which depends on some-task, continues correctly. Whether some-task fails or succeeds, a hook with the exit-handler is launched and the workflow gets stuck. The hook's retry template gets stuck in the "Running" state, even though the pod has completed its task. As a result, the whole workflow gets stuck in the "Running" state as well.

Removing the finish task from the workflow fixes the issue, so it seems to occur only when hooks are launched from intermediate tasks/steps. Removing retryStrategy also removes the issue, as the pod is then the only thing launched and it completes successfully. So far it seems to affect both DAGs and Steps.

I would like the hook with the exit-template to complete, allowing the workflow to continue executing and eventually exit with whatever state it has reached (Failed, Succeeded or Error).

Failed some-task with a failure hook:

Successful some-task with a success hook:

Version
v3.4.11
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
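A workflow of the shape described in this report might look like the sketch below. It is a hypothetical reconstruction, not the reporter's manifest: the template names main and work, the hook names failure and success, the alpine image, the echo commands, and the retry limit are all assumptions chosen for illustration. Only some-task, finish, exit-handler, and the combination of task-level hooks with a retryStrategy on the hook template come from the description above.

```yaml
# Hypothetical sketch: a DAG with an intermediate task that launches a
# hook whose template carries a retryStrategy.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hook-retry-   # assumed name
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: some-task
            template: work
            hooks:
              # Hook names are arbitrary labels; the expression decides when each fires.
              failure:
                expression: tasks["some-task"].status == "Failed"
                template: exit-handler
              success:
                expression: tasks["some-task"].status == "Succeeded"
                template: exit-handler
          # Downstream task that makes some-task an intermediate task.
          - name: finish
            template: work
            depends: some-task
    - name: work
      container:
        image: alpine:3.18    # public image, assumed
        command: [sh, -c]
        args: ["echo working"]
    - name: exit-handler
      # Per the report, removing this retryStrategy avoids the hang.
      retryStrategy:
        limit: "2"
      container:
        image: alpine:3.18
        command: [sh, -c]
        args: ["echo sending status update"]
```

With plain `depends: some-task`, finish runs only when some-task succeeds, so this sketch exercises the success-hook path; making the work container exit non-zero for some-task would exercise the failure hook instead.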
Logs from the workflow controller
Logs from your workflow's wait container