Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAG tasks that use withItems are always seen as 'Succeeded' even when skipped #3378

Closed
3 of 4 tasks
markterm opened this issue Jul 2, 2020 · 7 comments
Closed
3 of 4 tasks
Assignees
Labels
solution/workaround There's a workaround, might not be great, but exists
Milestone

Comments

@markterm
Copy link
Contributor

markterm commented Jul 2, 2020

Checklist:

  • I've included the version.
  • I've included reproduction steps.
  • I've included the workflow YAML.
  • I've included the logs.

What happened:

If you have a DAG task with a 'false' when condition (so it will be skipped), and it uses withItems to pass a map into it, then it will be seen by its dependencies as 'Succeeded'.

What you expected to happen:

DAG tasks with a false when condition to always be seen as 'Skipped'.

How to reproduce it (as minimally and precisely as possible):

I would expect this workflow to skip task B, but instead it runs it.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: test
spec:
  entrypoint: dag
  templates:
  - name: dag
    dag:
      tasks:
      - name: A
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: {item}
        withItems:
        - foo: "bar"
          bar: "foo"
        when: "1 == 2"
      - name: B
        depends: "A.Succeeded"
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "B"


  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"

Anything else we need to know?:

Environment:

  • Argo version:
argo: 1befee23
  BuildDate: 2020-05-29T16:16:47Z
  GitCommit: 1befee23b39e5b1be1dd1f814661420ed4a5ead0
  GitTreeState: clean
  GitTag: 1befee23
  GoVersion: go1.14
  Compiler: gc
  Platform: darwin/amd64
argo-server: 41ea9222
  BuildDate: 2020-06-12T09:45:57Z
  GitCommit: 41ea92224128d676844a5dec6c1e5a6936fe155e
  GitTreeState: clean
  GitTag: 41ea9222
  GoVersion: go1.14
  Compiler: gc
  Platform: linux/amd64
  • Kubernetes version :
clientVersion:
  buildDate: "2019-08-05T16:54:35Z"
  compiler: gc
  gitCommit: f6278300bebbb750328ac16ee6dd3aa7d3549568
  gitTreeState: clean
  gitVersion: v1.15.2
  goVersion: go1.12.7
  major: "1"
  minor: "15"
  platform: darwin/amd64
serverVersion:
  buildDate: "2020-05-05T18:38:03Z"
  compiler: gc
  gitCommit: 8ce551a0c972a9f113e326df0537aa3f1798d93e
  gitTreeState: clean
  gitVersion: v1.15.11-gke.15
  goVersion: go1.12.17b4
  major: "1"
  minor: 15+
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow result:
DEBU[0000] CLI version                                   version="{1befee23 2020-05-29T16:16:47Z 1befee23b39e5b1be1dd1f814661420ed4a5ead0 1befee23 clean go1.14 gc darwin/amd64}"
DEBU[0000] Client options                                opts="{{argo-server.staging.svc.cluster.local:2746 false false}  0x2183620 0xc000151ea0}"
Name:                test
Namespace:           staging
ServiceAccount:      default
Status:              Succeeded
Conditions:
 Completed           True
Created:             Thu Jul 02 15:11:14 +0100 (6 minutes ago)
Started:             Thu Jul 02 15:11:14 +0100 (6 minutes ago)
Finished:            Thu Jul 02 15:11:30 +0100 (6 minutes ago)
Duration:            16 seconds
ResourcesDuration:   11s*(1 cpu),7s*(100Mi memory)

STEP                       TEMPLATE  PODNAME          DURATION  MESSAGE
 ✔ test                    dag
 ├-○ A(0:bar:foo,foo:bar)  whalesay                             when '1 == 2' evaluated false
 └-✔ B                     whalesay  test-3125793321  13s

Message from the maintainers:

If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@alexec
Copy link
Contributor

alexec commented Jul 2, 2020

@simster7 could you investigate in case this is due to DAG changes?

@markterm
Copy link
Contributor Author

markterm commented Jul 3, 2020

btw, here is the yaml of the executed workflow.

A(0:bar:foo,foo:bar) is set to Skipped but its parent TaskGroup A is set to Succeeded.

metadata:
  creationTimestamp: "2020-07-02T14:11:14Z"
  generation: 7
  labels:
    workflows.argoproj.io/completed: "true"
    workflows.argoproj.io/phase: Succeeded
  name: test
  namespace: staging
  resourceVersion: "628340878"
  selfLink: /apis/argoproj.io/v1alpha1/namespaces/staging/workflows/test
  uid: 341472b8-f12b-4b14-a4f6-6f0f669765b7
spec:
  arguments: {}
  entrypoint: dag
  templates:
  - arguments: {}
    dag:
      tasks:
      - arguments:
          parameters:
          - name: message
            value: map[item:<nil>]
        name: A
        template: whalesay
        when: 1 == 2
        withItems:
        - bar: foo
          foo: bar
      - arguments:
          parameters:
          - name: message
            value: B
        depends: A.Succeeded
        name: B
        template: whalesay
    inputs: {}
    metadata: {}
    name: dag
    outputs: {}
  - arguments: {}
    container:
      args:
      - '{{inputs.parameters.message}}'
      command:
      - cowsay
      image: docker/whalesay:latest
      name: ""
      resources: {}
    inputs:
      parameters:
      - name: message
    metadata: {}
    name: whalesay
    outputs: {}
status:
  conditions:
  - status: "True"
    type: Completed
  finishedAt: "2020-07-02T14:11:30Z"
  nodes:
    test:
      children:
      - test-3075460464
      displayName: test
      finishedAt: "2020-07-02T14:11:30Z"
      id: test
      name: test
      outboundNodes:
      - test-3125793321
      phase: Succeeded
      startedAt: "2020-07-02T14:11:14Z"
      templateName: dag
      templateScope: local/test
      type: DAG
    test-1448801901:
      boundaryID: test
      children:
      - test-3125793321
      displayName: A(0:bar:foo,foo:bar)
      finishedAt: "2020-07-02T14:11:14Z"
      id: test-1448801901
      message: when '1 == 2' evaluated false
      name: test.A(0:bar:foo,foo:bar)
      phase: Skipped
      startedAt: "2020-07-02T14:11:14Z"
      templateName: whalesay
      templateScope: local/test
      type: Skipped
    test-3075460464:
      boundaryID: test
      children:
      - test-1448801901
      displayName: A
      finishedAt: "2020-07-02T14:11:14Z"
      id: test-3075460464
      name: test.A
      phase: Succeeded
      startedAt: "2020-07-02T14:11:14Z"
      templateName: whalesay
      templateScope: local/test
      type: TaskGroup
    test-3125793321:
      boundaryID: test
      displayName: B
      finishedAt: "2020-07-02T14:11:28Z"
      hostNodeName: gke-eclipse-beta-main-pool-4722b1a3-rhbm
      id: test-3125793321
      inputs:
        parameters:
        - name: message
          value: B
      name: test.B
      outputs:
        exitCode: "0"
      phase: Succeeded
      resourcesDuration:
        cpu: 11
        memory: 7
      startedAt: "2020-07-02T14:11:15Z"
      templateName: whalesay
      templateScope: local/test
      type: Pod
  phase: Succeeded
  resourcesDuration:
    cpu: 11
    memory: 7
  startedAt: "2020-07-02T14:11:14Z"

@simster7 simster7 self-assigned this Jul 6, 2020
@alexec alexec added this to the v2.9 milestone Jul 7, 2020
@simster7
Copy link
Member

simster7 commented Jul 7, 2020

This is actually the current expected behavior. In the example you posted, task A is actually a TaskGroup, not an individual task. A TaskGroup is a node responsible for spawning child nodes when a task has withItems or similar set. The when field of the TaskGroup is not actually enforced to the task group itself, but instead it is enforced in each child node that it spawns. This makes it possible to have Workflows such as this (note that the when clause actually depends on the {{item}}):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: test
spec:
  entrypoint: dag
  templates:
  - name: dag
    dag:
      tasks:
      - name: A
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "{{item}}"
        withItems:
        - "bar"
        - "foo"
        when: "foo == {{item}}"    # note that the `when` clause actually depends on the `{{item}}`
      - name: B
        depends: "A.Succeeded"
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "B"

  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

In this workflow, only certain on the children spawned by withItems get skipped by the when condition as opposed to the TaskGroup node as a whole:

Name:                test
Namespace:           argo
ServiceAccount:      default
Status:              Succeeded
Conditions:
 Completed           True
Created:             Tue Jul 07 09:37:58 -0700 (12 seconds ago)
Started:             Tue Jul 07 09:37:58 -0700 (12 seconds ago)
Finished:            Tue Jul 07 09:38:10 -0700 (now)
Duration:            12 seconds
ResourcesDuration:   3s*(100Mi memory),6s*(1 cpu)

STEP           TEMPLATE  PODNAME          DURATION  MESSAGE
 ✔ test        dag
 ├-○ A(0:bar)  whalesay                             when 'foo == bar' evaluated false
 ├-✔ A(1:foo)  whalesay  test-4041725116  3s
 └-✔ B         whalesay  test-3125793321  5s

This being said, when you use A in you depends statement (depends: A.Succeeded) you're actually referring to the TaskGroup node. This TaskGroup node will always succeed if it its requisite children do not fail or error, regardless if one (or all) of them are skipped.

Possible things to consider to solve this:

  • enhance the syntax to include aggregator functions (such as depends: any(A.Succeeded) or depends: all(A.Succeeded))
  • enforce the when clause at the TaskGroup level (probably undesirable as this would be a breaking changes and would not longer allow the functionality I described above)
  • allow an option for a TaskGroup to be "strict": i.e., only succeed if all of its children are not Skipped

I will bring this up and discuss it with the team and report back.

@simster7
Copy link
Member

simster7 commented Jul 7, 2020

could you investigate in case this is due to DAG changes?

This is unrelated to the DAG changes.

@alexec
Copy link
Contributor

alexec commented Jul 7, 2020

label as invalid?

@simster7
Copy link
Member

simster7 commented Jul 7, 2020

Hi @mark9white, I just spoke with the team. We found a workaround for your current use case: simply rest the template into a deeper call and leave the when condition in the parent template:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: test
spec:
  entrypoint: dag
  templates:
  - name: dag
    dag:
      tasks:
      - name: A
        template: call-loop
        when: "1 == 1"
      - name: B
        depends: "A.Succeeded"
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "B"

  - name: call-loop
    steps:
      - - name: A
          template: whalesay
          arguments:
            parameters:
            - name: message
              value: "{{item}}"
          withItems:
          - foo: "bar"
            bar: "foo"

  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

With regards to being able to select for any or all I opened a separate issue to track this: #3405

@simster7 simster7 added solution/workaround There's a workaround, might not be great, but exists and removed type/bug labels Jul 7, 2020
@markterm
Copy link
Contributor Author

markterm commented Jul 8, 2020

Thanks @simster7 - this behaviour makes sense given that the item can be referred to in the when condition. #3405 would be a great long-term solution to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
solution/workaround There's a workaround, might not be great, but exists
Projects
None yet
Development

No branches or pull requests

3 participants