DAG gets stuck when using retry #3096

Closed
4 tasks done
jyotishp opened this issue May 23, 2020 · 1 comment · Fixed by #3035

@jyotishp

Checklist:

  • I've included the version.
  • I've included reproduction steps.
  • I've included the workflow YAML.
  • I've included the logs.

What happened:
The DAG gets stuck after a task with a retry-on-error strategy fails: the task's pod fails, but the DAG step stays in the Running state.

What you expected to happen:
The DAG step is supposed to fail and the workflow should stop or trigger the exit handler.

How to reproduce it (as minimally and precisely as possible):

argo submit https://gist.github.com/jyotishp/d1bdd09bd859454aed9dd721536cd54e/raw/5d56d4f4b22344bc786be75e802af8535a537619/argo-wf.yaml

Wait for the pod running task "B" to fail. The DAG step never finishes; it stays in the Running state.

Anything else we need to know?:

Environment:

  • Argo version:
$ argo version
argo: v2.8.0
  BuildDate: 2020-05-11T22:55:16Z
  GitCommit: 8f696174746ed01b9bf1941ad03da62d312df641
  GitTreeState: clean
  GitTag: v2.8.0
  GoVersion: go1.13.4
  Compiler: gc
  Platform: linux/amd64
  • Kubernetes version:
$ kubectl version -o yaml
clientVersion:
  buildDate: "2020-04-23T22:11:11Z"
  compiler: gc
  gitCommit: 52c56ce7a8272c798dbc29846288d7cd9fbae032
  gitTreeState: archive
  gitVersion: v1.18.2
  goVersion: go1.14.2
  major: "1"
  minor: "18"
  platform: linux/amd64
serverVersion:
  buildDate: "2020-04-06T16:33:17Z"
  compiler: gc
  gitCommit: 34a615f32e9a0c9e97cdb9f749adb392758349a6
  gitTreeState: clean
  gitVersion: v1.14.10-gke.36
  goVersion: go1.12.12b4
  major: "1"
  minor: 14+
  platform: linux/amd64

Other debugging information (if applicable):

  • Workflow result:
argo --loglevel DEBUG get <workflowname>
Name:                wf-2mb6x
Namespace:           default
ServiceAccount:      default
Status:              Running
Created:             Sun May 24 02:07:12 +0530 (4 minutes ago)
Started:             Sun May 24 02:07:12 +0530 (4 minutes ago)
Duration:            4 minutes 20 seconds

STEP            TEMPLATE   PODNAME              DURATION  MESSAGE
 ● wf-2mb6x     run-steps
 └---● run-dag  run-dag
     └-✖ B(0)   B          wf-2mb6x-2328328885  1m        failed with exit code 2
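
The stuck state shown above (a failed pod under parents that are still running) can also be checked mechanically. The following is a minimal sketch, not part of Argo: the helper name `find_stuck_parents` and the sample data are hypothetical, and it assumes the workflow object exposes `status.nodes` entries with `type`, `phase`, and `displayName` fields. The sample is modeled on the tree above and may omit nodes that a real workflow object contains.

```python
from typing import Any, Dict, List

# Phases after which a node no longer changes state.
TERMINAL_PHASES = {"Succeeded", "Failed", "Error", "Skipped"}


def find_stuck_parents(workflow: Dict[str, Any]) -> List[str]:
    """Return names of non-pod nodes still Running although every pod has finished."""
    nodes = list(workflow.get("status", {}).get("nodes", {}).values())
    pods = [n for n in nodes if n.get("type") == "Pod"]
    # If there are no pods yet, or any pod is still in progress,
    # the workflow is legitimately running and nothing is reported.
    if not pods or any(p.get("phase") not in TERMINAL_PHASES for p in pods):
        return []
    return sorted(
        n.get("displayName", n.get("name", "?"))
        for n in nodes
        if n.get("type") != "Pod" and n.get("phase") == "Running"
    )


# Hypothetical sample shaped after the `argo get` tree in this report.
SAMPLE = {
    "status": {
        "phase": "Running",
        "nodes": {
            "wf-2mb6x": {"displayName": "wf-2mb6x", "type": "Steps", "phase": "Running"},
            "wf-2mb6x-1": {"displayName": "run-dag", "type": "DAG", "phase": "Running"},
            "wf-2mb6x-2328328885": {"displayName": "B(0)", "type": "Pod", "phase": "Failed"},
        },
    }
}

if __name__ == "__main__":
    print(find_stuck_parents(SAMPLE))  # prints ['run-dag', 'wf-2mb6x']
```

To try it against a live workflow, the dictionary could be loaded with `json.loads` from the output of `argo get <workflowname> -o json`. The check only flags a suspicious state; a controller that has not yet processed the pod failure would briefly look the same.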

Logs


Message from the maintainers:

If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@jyotishp
Author

I think this should fix the issue: https://github.com/argoproj/argo/pull/3100/files

I'm not sure how to write the tests for this one. Should I add a new workflow YAML under test/e2e/testdata and use that file in the check? Can someone help me with that?
