Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow stuck in running if it contains non-leaf-task onExit-handler #3431

Closed
4 tasks done
arbielsk opened this issue Jul 9, 2020 · 4 comments · Fixed by #3435
Closed
4 tasks done

Workflow stuck in running if it contains non-leaf-task onExit-handler #3431

arbielsk opened this issue Jul 9, 2020 · 4 comments · Fixed by #3435
Assignees
Labels

Comments

@arbielsk
Copy link

arbielsk commented Jul 9, 2020

Checklist:

  • I've included the version.
  • I've included reproduction steps.
  • I've included the workflow YAML.
  • I've included the logs.

What happened:
I created a dag where each task has an exit handler. In the example below, step-3 depends on step-2. After the fix of #3395 all exitHandlers are now being called as expected (thanks a lot to the team for this VERY quick fix). However the exitHandler of the non-leaf task never registers as finished and thus keeps the workflow in status running indefinitely.

What you expected to happen:
I expected the exitHandler to register as finished after its execution and the workflow to finish.

How to reproduce it (as minimally and precisely as possible):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: exit-handler-bug-example
spec:
  entrypoint: dag
  templates:
    - name: dag
      dag:
        tasks:
          - name: step-1
            onExit: exit-handler
            template: step-template
          - name: step-2
            onExit: exit-handler
            template: step-template
          - name: step-3
            dependencies: [step-2]
            onExit: exit-handler
            template: step-template
    - name: exit-handler
      dag:
        tasks:
          - name: exit-1
            template: sleep-10
    - name: sleep-10
      container:
        image: alpine:latest
        command:
          - sh
          - -c
        args:
          - "sleep 10 && echo exit-handler-step-{{pod.name}}"
    - name: step-template
      container:
        image: alpine:latest
        command:
          - sh
          - -c
        args:
          - "sleep 5 && echo step {{pod.name}}"

Anything else we need to know?:
It also seems like there is an issue with the execution of the exitHandler for non-leaf tasks if the exitHandler contains more than one task: #3432
These might be related?

Environment:

  • Argo version:
$ argo version
argo: v2.9.2
  BuildDate: 2020-07-08T23:55:01Z
  GitCommit: 65c2bd44e45c11e0a0b03adeef8d6800b72cd551
  GitTreeState: clean
  GitTag: v2.9.2
  GoVersion: go1.13.4
  Compiler: gc
  Platform: darwin/amd64
  • Kubernetes version :
$ kubectl version -o yaml
clientVersion:
  buildDate: "2020-01-15T08:26:26Z"
  compiler: gc
  gitCommit: e7f962ba86f4ce7033828210ca3556393c377bcc
  gitTreeState: clean
  gitVersion: v1.16.6-beta.0
  goVersion: go1.13.5
  major: "1"
  minor: 16+
  platform: darwin/amd64
serverVersion:
  buildDate: "2020-01-15T08:18:29Z"
  compiler: gc
  gitCommit: e7f962ba86f4ce7033828210ca3556393c377bcc
  gitTreeState: clean
  gitVersion: v1.16.6-beta.0
  goVersion: go1.13.5
  major: "1"
  minor: 16+
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow result:
argo --loglevel DEBUG get <workflowname>
DEBU[0000] CLI version                                   version="{v2.9.2 2020-07-08T23:55:01Z 65c2bd44e45c11e0a0b03adeef8d6800b72cd551 v2.9.2 clean go1.13.4 gc darwin/amd64}"
DEBU[0000] Client options                                opts="{{ false false}  0x2277640 0xc00084b8b0}"
Name:                exit-handler-bug-example
Namespace:           argo
ServiceAccount:      default
Status:              Running
Created:             Thu Jul 09 11:03:14 +0200 (8 minutes ago)
Started:             Thu Jul 09 11:03:14 +0200 (8 minutes ago)
Duration:            8 minutes 58 seconds

STEP                         TEMPLATE       PODNAME                              DURATION  MESSAGE
 ● exit-handler-bug-example  dag                                                             
 ├-✔ step-1                  step-template  exit-handler-bug-example-3071691002  10s         
 ├-✔ step-2                  step-template  exit-handler-bug-example-3054913383  8s          
 ├-● step-2.onExit           exit-handler                                                    
 | └-✔ exit-1                sleep-10       exit-handler-bug-example-546893857   14s         
 ├-✔ step-3                  step-template  exit-handler-bug-example-3038135764  10s         
 ├-✔ step-1.onExit           exit-handler                                                    
 | ├-✔ exit-1                sleep-10       exit-handler-bug-example-2218368392  15s         
 | ├-✔ exit-2                sleep-10       exit-handler-bug-example-2268701249  13s         
 | └-✔ exit-3                sleep-10       exit-handler-bug-example-2251923630  13s         
 └-✔ step-3.onExit           exit-handler                                                    
   ├-✔ exit-1                sleep-10       exit-handler-bug-example-1275286746  13s         
   ├-✔ exit-2                sleep-10       exit-handler-bug-example-1258509127  13s         
   └-✔ exit-3                sleep-10       exit-handler-bug-example-1241731508  13s 
  • executor logs:
NOT APPLICABLE AS NO POD FAILS kubectl logs <failedpodname> -c init
NOT APPLICABLE AS NO POD FAILS kubectl logs <failedpodname> -c wait
  • workflow-controller logs:
time="2020-07-09T09:18:58Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="DAG node exit-handler-bug-example initialized Running" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="All of node exit-handler-bug-example.step-1 dependencies [] completed" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Pod node exit-handler-bug-example-3071691002 initialized Pending" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Created pod: exit-handler-bug-example.step-1 (exit-handler-bug-example-3071691002)" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="All of node exit-handler-bug-example.step-2 dependencies [] completed" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Pod node exit-handler-bug-example-3054913383 initialized Pending" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Created pod: exit-handler-bug-example.step-2 (exit-handler-bug-example-3054913383)" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464278 workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Updating node exit-handler-bug-example-3071691002 message: ContainerCreating"
time="2020-07-09T09:18:58Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464282 workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Updating node exit-handler-bug-example-3054913383 message: ContainerCreating"
time="2020-07-09T09:18:58Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464284 workflow=exit-handler-bug-example
time="2020-07-09T09:18:58Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:18:59Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:03Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:03Z" level=info msg="Updating node exit-handler-bug-example-3071691002 status Pending -> Running"
time="2020-07-09T09:19:03Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464309 workflow=exit-handler-bug-example
time="2020-07-09T09:19:03Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:04Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:04Z" level=info msg="Updating node exit-handler-bug-example-3054913383 status Pending -> Running"
time="2020-07-09T09:19:04Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464312 workflow=exit-handler-bug-example
time="2020-07-09T09:19:04Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Setting node exit-handler-bug-example-3071691002 outputs"
time="2020-07-09T09:19:08Z" level=info msg="Updating node exit-handler-bug-example-3071691002 status Running -> Succeeded"
time="2020-07-09T09:19:08Z" level=info msg="Setting node exit-handler-bug-example-3054913383 outputs"
time="2020-07-09T09:19:08Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="DAG node exit-handler-bug-example-1359570492 initialized Running" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="All of node step-1.onExit.exit-1 dependencies [] completed" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Pod node exit-handler-bug-example-2218368392 initialized Pending" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Created pod: step-1.onExit.exit-1 (exit-handler-bug-example-2218368392)" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464328 workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Labeled pod argo/exit-handler-bug-example-3071691002 completed"
time="2020-07-09T09:19:08Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464331 workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Updating node exit-handler-bug-example-2218368392 message: ContainerCreating"
time="2020-07-09T09:19:08Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Labeled pod argo/exit-handler-bug-example-3071691002 completed"
time="2020-07-09T09:19:08Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464334 workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Updating node exit-handler-bug-example-3054913383 status Running -> Succeeded"
time="2020-07-09T09:19:08Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="DAG node exit-handler-bug-example-4086143951 initialized Running" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="All of node step-2.onExit.exit-1 dependencies [] completed" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Pod node exit-handler-bug-example-546893857 initialized Pending" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Created pod: step-2.onExit.exit-1 (exit-handler-bug-example-546893857)" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="All of node exit-handler-bug-example.step-3 dependencies [step-2] completed" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Pod node exit-handler-bug-example-3038135764 initialized Pending" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:08Z" level=info msg="Created pod: exit-handler-bug-example.step-3 (exit-handler-bug-example-3038135764)" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464341 workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Labeled pod argo/exit-handler-bug-example-3054913383 completed"
time="2020-07-09T09:19:09Z" level=info msg="Updating node exit-handler-bug-example-546893857 message: ContainerCreating"
time="2020-07-09T09:19:09Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464346 workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Updating node exit-handler-bug-example-3038135764 message: ContainerCreating"
time="2020-07-09T09:19:09Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Labeled pod argo/exit-handler-bug-example-3054913383 completed"
time="2020-07-09T09:19:09Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464347 workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:09Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:14Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:14Z" level=info msg="Updating node exit-handler-bug-example-2218368392 status Pending -> Running"
time="2020-07-09T09:19:14Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:14Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:14Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464376 workflow=exit-handler-bug-example
time="2020-07-09T09:19:14Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:14Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:14Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:16Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:16Z" level=info msg="Updating node exit-handler-bug-example-3038135764 status Pending -> Running"
time="2020-07-09T09:19:16Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:16Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:16Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464385 workflow=exit-handler-bug-example
time="2020-07-09T09:19:16Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:16Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:16Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:17Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:17Z" level=info msg="Updating node exit-handler-bug-example-546893857 status Pending -> Running"
time="2020-07-09T09:19:17Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:17Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:17Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464388 workflow=exit-handler-bug-example
time="2020-07-09T09:19:17Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:17Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:17Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:20Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:20Z" level=info msg="Setting node exit-handler-bug-example-3038135764 outputs"
time="2020-07-09T09:19:20Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:20Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:20Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464399 workflow=exit-handler-bug-example
time="2020-07-09T09:19:20Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:20Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:20Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Updating node exit-handler-bug-example-3038135764 status Running -> Succeeded"
time="2020-07-09T09:19:21Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="DAG node exit-handler-bug-example-2933386262 initialized Running" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="All of node step-3.onExit.exit-1 dependencies [] completed" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Pod node exit-handler-bug-example-1275286746 initialized Pending" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Created pod: step-3.onExit.exit-1 (exit-handler-bug-example-1275286746)" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464405 workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Labeled pod argo/exit-handler-bug-example-3038135764 completed"
time="2020-07-09T09:19:21Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464410 workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Labeled pod argo/exit-handler-bug-example-3038135764 completed"
time="2020-07-09T09:19:21Z" level=info msg="Updating node exit-handler-bug-example-1275286746 message: ContainerCreating"
time="2020-07-09T09:19:21Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464411 workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:21Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:22Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:22Z" level=info msg="Setting node exit-handler-bug-example-2218368392 outputs"
time="2020-07-09T09:19:22Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:22Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:22Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464418 workflow=exit-handler-bug-example
time="2020-07-09T09:19:22Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:22Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:22Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="Updating node exit-handler-bug-example-2218368392 status Running -> Succeeded"
time="2020-07-09T09:19:23Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="Outbound nodes of exit-handler-bug-example-1359570492 set to [exit-handler-bug-example-2218368392]" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="node exit-handler-bug-example-1359570492 phase Running -> Succeeded" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="node exit-handler-bug-example-1359570492 finished: 2020-07-09 09:19:23.399892402 +0000 UTC" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="Checking daemoned children of exit-handler-bug-example-1359570492" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464424 workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="Labeled pod argo/exit-handler-bug-example-2218368392 completed"
time="2020-07-09T09:19:23Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:23Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:25Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:25Z" level=info msg="Updating node exit-handler-bug-example-1275286746 status Pending -> Running"
time="2020-07-09T09:19:25Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:25Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:25Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464433 workflow=exit-handler-bug-example
time="2020-07-09T09:19:25Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:25Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:25Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:27Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:27Z" level=info msg="Setting node exit-handler-bug-example-546893857 outputs"
time="2020-07-09T09:19:27Z" level=info msg="Updating node exit-handler-bug-example-546893857 status Running -> Succeeded"
time="2020-07-09T09:19:27Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:27Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:27Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464441 workflow=exit-handler-bug-example
time="2020-07-09T09:19:27Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:27Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:27Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:27Z" level=info msg="Labeled pod argo/exit-handler-bug-example-546893857 completed"
time="2020-07-09T09:19:35Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:35Z" level=info msg="Setting node exit-handler-bug-example-1275286746 outputs"
time="2020-07-09T09:19:35Z" level=info msg="Updating node exit-handler-bug-example-1275286746 status Running -> Succeeded"
time="2020-07-09T09:19:35Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:35Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:35Z" level=info msg="Outbound nodes of exit-handler-bug-example-2933386262 set to [exit-handler-bug-example-1275286746]" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:35Z" level=info msg="node exit-handler-bug-example-2933386262 phase Running -> Succeeded" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:35Z" level=info msg="node exit-handler-bug-example-2933386262 finished: 2020-07-09 09:19:35.70806752 +0000 UTC" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:35Z" level=info msg="Checking daemoned children of exit-handler-bug-example-2933386262" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:35Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1464459 workflow=exit-handler-bug-example
time="2020-07-09T09:19:35Z" level=info msg="Processing workflow" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:35Z" level=info msg="Labeled pod argo/exit-handler-bug-example-1275286746 completed"
time="2020-07-09T09:19:35Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:35Z" level=info msg="Running OnExit handler: exit-handler" namespace=argo workflow=exit-handler-bug-example
time="2020-07-09T09:19:43Z" level=info msg="Alloc=17068 TotalAlloc=12674488 Sys=347424 NumGC=250 Goroutines=165"

Logs
see above


Message from the maintainers:

If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@arbielsk arbielsk changed the title Workflow stuck in running if it contains non-leaf task onExit handler Workflow stuck in running if it contains non-leaf-task onExit-handler Jul 9, 2020
@simster7 simster7 self-assigned this Jul 9, 2020
@simster7
Copy link
Member

simster7 commented Jul 9, 2020

Interestingly enough, this only happens with the sleep part of the Workflow. When I solved the last issue I had taken the sleep off (to expedite testing) and the bug didn't come up:

Name:                exit-handler-bug-example
Namespace:           argo
ServiceAccount:      default
Status:              Succeeded
Conditions:
 Completed           True
Created:             Thu Jul 09 08:49:12 -0700 (35 seconds ago)
Started:             Thu Jul 09 08:49:12 -0700 (35 seconds ago)
Finished:            Thu Jul 09 08:49:47 -0700 (now)
Duration:            35 seconds
ResourcesDuration:   26s*(100Mi memory),48s*(1 cpu)

STEP                         TEMPLATE       PODNAME                              DURATION  MESSAGE
 ✔ exit-handler-bug-example  dag
 ├-✔ step-1                  step-template  exit-handler-bug-example-3071691002  19s
 ├-✔ step-2                  step-template  exit-handler-bug-example-3054913383  14s
 ├-✔ step-2.onExit           exit-handler
 | └-✔ exit-1                sleep-10       exit-handler-bug-example-546893857   6s
 ├-✔ step-3                  step-template  exit-handler-bug-example-3038135764  9s
 ├-✔ step-1.onExit           exit-handler
 | └-✔ exit-1                sleep-10       exit-handler-bug-example-2218368392  7s
 └-✔ step-3.onExit           exit-handler
   └-✔ exit-1                sleep-10       exit-handler-bug-example-1275286746  5s

Working on finding a fix right now

@simster7
Copy link
Member

simster7 commented Jul 9, 2020

This has something to do with the unique nested dag structure

@simster7
Copy link
Member

simster7 commented Jul 9, 2020

This one is certainly a Heisenbug haha :)

@simster7
Copy link
Member

simster7 commented Jul 9, 2020

Found the issue: if step-3 finishes before step-2.onExit, then the logic gets dead locked. Working on a fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants