Workflows with error in onExit handler never finish #5835

Closed
vladlosev opened this issue May 5, 2021 · 2 comments · Fixed by #5837

Comments

@vladlosev (Contributor)

Summary

What happened/what you expected to happen?

When Argo fails to run a workflow's onExit handler, it never marks the workflow as finished, even though all of its nodes have completed.
Expected: the workflow is marked as errored.

Diagnostics

What Kubernetes provider are you using?
EKS 1.19, kind v0.10.0

What version of Argo Workflows are you running?
3.0.2

Paste a workflow that reproduces the bug, including status:

# kubectl get wftmpl -o yaml test-exit-handler
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: test-exit-handler
  namespace: test
spec:
  entrypoint: argosay
  onExit: exit-handler
  templates:
    - name: argosay
      container:
        name: main
        image: 'argoproj/argosay:v2'
        command:
          - /argosay
        args:
          - echo
          - hello argo
    - name: exit-handler
      templateRef:
        name: nonexistent
        template: exit-handler

# The bug only manifests when the workflow is created via workflowTemplateRef.
# kubectl get wf -o yaml test-exit-handler
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test-exit-handler-
  namespace: test
spec:
  workflowTemplateRef:
    name: test-exit-handler

Paste the logs from the workflow controller:

These entries show up in the logs every 30 minutes after the workflow is created:

# kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name) | grep ${workflow}
time="2021-05-05T19:30:40.534Z" level=debug msg="Syncing all CronWorkflows"
time="2021-05-05T19:30:44.497Z" level=info msg="Get leases 200"
time="2021-05-05T19:30:44.521Z" level=info msg="Update leases 200"
time="2021-05-05T19:30:49.497Z" level=info msg="Get leases 200"
time="2021-05-05T19:30:49.512Z" level=info msg="Update leases 200"
time="2021-05-05T19:30:49.677Z" level=info msg="Processing workflow" namespace=test workflow=test-exit-handler-s77sx
time="2021-05-05T19:30:49.683Z" level=debug msg="Evaluating node test-exit-handler-s77sx: template: *v1alpha1.WorkflowStep (argosay), boundaryID: " namespace=test workflow=test-exit-handler-s77sx
time="2021-05-05T19:30:49.683Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (argosay)"
time="2021-05-05T19:30:49.683Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (argosay)"
time="2021-05-05T19:30:49.683Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (argosay)"
time="2021-05-05T19:30:49.683Z" level=debug msg="Node test-exit-handler-s77sx already completed" namespace=test workflow=test-exit-handler-s77sx
time="2021-05-05T19:30:49.683Z" level=info msg="Running OnExit handler: exit-handler" namespace=test workflow=test-exit-handler-s77sx
time="2021-05-05T19:30:49.684Z" level=debug msg="Evaluating node test-exit-handler-s77sx.onExit: template: *v1alpha1.WorkflowStep (exit-handler), boundaryID: " namespace=test workflow=test-exit-handler-s77sx
time="2021-05-05T19:30:49.684Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (exit-handler)"
time="2021-05-05T19:30:49.684Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (exit-handler)"
time="2021-05-05T19:30:49.684Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (exit-handler)"
time="2021-05-05T19:30:49.684Z" level=error msg="Mark error node" error="template 'exit-handler' type is unknown" namespace=test nodeName=test-exit-handler-s77sx.onExit workflow=test-exit-handler-s77sx
time="2021-05-05T19:30:49.684Z" level=error msg="error in exit template execution" error="template 'exit-handler' type is unknown" namespace=test workflow=test-exit-handler-s77sx
time="2021-05-05T19:30:49.726Z" level=debug msg="Check the workflow existence"
time="2021-05-05T19:30:50.504Z" level=debug msg="Syncing all CronWorkflows"
time="2021-05-05T19:30:54.541Z" level=info msg="Get leases 200"
time="2021-05-05T19:30:54.599Z" level=info msg="Update leases 200"
time="2021-05-05T19:30:59.619Z" level=info msg="Get leases 200"
time="2021-05-05T19:30:59.661Z" level=info msg="Update leases 200"

It appears that when the onExit template fails to execute, the `operator()` function logs the error and returns early. It never marks the workflow as errored, so the workflow never reaches a finished phase.
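A minimal Python sketch of the suspected control flow, assuming a heavily simplified model of the controller. Argo's controller is written in Go, and every name here (`Workflow`, `run_exit_handler`, `operate_buggy`, `operate_fixed`) is hypothetical and not Argo's actual API. The phase strings match Argo's workflow phases (`Running`, `Succeeded`, `Error`). The buggy version returns on the exit-handler error without touching the phase, so each resync repeats the same attempt and the workflow stays `Running`. The fixed version records the error on the workflow instead.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical, simplified stand-in for a Workflow object."""
    name: str
    exit_template_resolvable: bool
    phase: str = "Running"
    message: str = ""


class TemplateError(Exception):
    """Raised when the onExit template cannot be resolved."""


def run_exit_handler(wf: Workflow) -> None:
    # Mirrors the controller log: "template 'exit-handler' type is unknown".
    if not wf.exit_template_resolvable:
        raise TemplateError("template 'exit-handler' type is unknown")


def operate_buggy(wf: Workflow) -> None:
    # Entrypoint nodes are assumed to be complete; only onExit remains.
    try:
        run_exit_handler(wf)
    except TemplateError:
        # The error is only logged; returning here leaves wf.phase unchanged,
        # so the next resync repeats the attempt forever.
        return
    wf.phase = "Succeeded"


def operate_fixed(wf: Workflow) -> None:
    try:
        run_exit_handler(wf)
    except TemplateError as err:
        # Treat the failure as terminal and record it so the workflow finishes.
        wf.phase = "Error"
        wf.message = str(err)
        return
    wf.phase = "Succeeded"


if __name__ == "__main__":
    stuck = Workflow("test-exit-handler-s77sx", exit_template_resolvable=False)
    for _ in range(3):  # simulate several resync cycles
        operate_buggy(stuck)
    print(stuck.phase)  # Running

    finished = Workflow("test-exit-handler-s77sx", exit_template_resolvable=False)
    operate_fixed(finished)
    print(finished.phase, finished.message)
```

The sketch uses the `Error` phase because that is what the report expects. The comments below discuss failing the workflow instead.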



@alexec alexec added this to the v3.0 milestone May 5, 2021
@alexec alexec self-assigned this May 5, 2021
@alexec (Contributor) commented May 5, 2021

@jesse what's your take on this? I think the workflow should be failed, but want to double-check.

alexec added a commit to alexec/argo-workflows that referenced this issue May 5, 2021
Signed-off-by: Alex Collins <alex_collins@intuit.com>
@terrytangyuan (Member)

Think you tagged the wrong Jesse. Here it is @jessesuen

My two cents: +1 to failing the workflow in this case. An analogue is Python's `finally` block: if code in it raises, the exception propagates to the caller even when the `try` body succeeded.
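A short sketch of that analogy, assuming the entrypoint maps to the `try` body and the onExit handler maps to `finally`. The function name `workflow_with_exit_handler` is illustrative only. The exception raised in `finally` replaces the successful return value, just as a failing exit handler would fail the whole workflow.

```python
def workflow_with_exit_handler() -> str:
    try:
        # Stands in for the entrypoint, which completed successfully.
        return "entrypoint succeeded"
    finally:
        # Stands in for the onExit handler: it always runs, and if it raises,
        # the exception propagates and discards the pending return value.
        raise RuntimeError("exit handler failed")


if __name__ == "__main__":
    try:
        workflow_with_exit_handler()
    except RuntimeError as err:
        print(f"caller sees: {err}")  # caller sees: exit handler failed
```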
