
Retrying a workflow that uses a DAG template and a loop does not retry the failed task/step. #7617

Closed
louisnow opened this issue Jan 23, 2022 · 3 comments · Fixed by #7652


louisnow commented Jan 23, 2022

Summary

What happened/what you expected to happen?

Retrying a workflow that uses a DAG and a withParam does not retry the failed task/step.

Screen.Recording.2022-01-23.at.11.57.48.PM.mov
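
For reference, the retry can be triggered from the UI or with the Argo CLI, for example (the workflow name is a placeholder):

# Retry the failed workflow; only the failed task(s) under the withParam loop should be re-run
argo retry <workflow-name>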

A quick fix that seems to work is to set the retry limit to 0, but it causes other issues in the collected output.

      retryStrategy:
        limit: "0"

The output collected in the final step is duplicated, with entries repeated according to the number of retries. In the recording, each output appears twice in the final array: I expect three outputs but get six because it was retried twice.

Screen.Recording.2022-01-24.at.12.03.27.AM.mov
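
One way to see the duplication is to inspect the aggregated parameter that the final whalesay-after-process task receives. A sketch, assuming jq is installed and the workflow name is filled in:

# Dump the input parameters of the final task; after a retry the aggregated JSON array
# contains each process output more than once
kubectl get wf <workflow-name> -o json \
  | jq '.status.nodes[] | select(.displayName == "whalesay-after-process") | .inputs.parameters'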

What version of Argo Workflows are you running?
3.2.4

Diagnostics

Either a workflow that reproduces the bug, or paste your whole workflow YAML, including status, something like:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: retry-barebones
spec:
  entrypoint: main
  templates:
    - name: main
      inputs: 
        parameters:
          - name: numbers  
            value: |
              ["one", "two", "three"]
      dag:
        tasks:
          - name: whalesay-before-process
            template: whalesay
            arguments:
              parameters:
                - name: message
                  value: "HELLO ALL"

          - name: process
            template: process
            withParam: "{{inputs.parameters.numbers}}"
            dependencies:
              - whalesay-before-process
            arguments:
              parameters:
                - name: message
                  value: "{{item}}"

          - name: whalesay-after-process
            template: whalesay
            dependencies:
              - process
            arguments:
              parameters:
                - name: message
                  value: "Root: {{tasks.process.outputs.parameters.output}}"
                  # On retry there are extra values in the output, duplicated multiple times based on retry count                

    - name: process
      # this fixes the behaviour
      retryStrategy:
        limit: "0"
      inputs:
        parameters:
        - name: message
      outputs:
        parameters:
          - name: output
            valueFrom:
              parameter: "{{tasks.step2.outputs.parameters.output}}"
      dag: 
        tasks:
          - name: step0
            template: whalesay
            arguments:
              parameters:
                - name: message
                  value: "{{inputs.parameters.message}}"

          - name: step1
            template: whalesay
            dependencies:
              - step0
            arguments:
              parameters:
                - name: message
                  value: "{{inputs.parameters.message}}"
                - name: always-pass
                  value: "false"

          - name: step2
            template: whalesay
            dependencies:
              - step1
            arguments:
              parameters:
                - name: message
                  value: "{{tasks.step1.outputs.parameters.output}}"

    - name: whalesay
      inputs:
        parameters:
        - name: message
        - name: always-pass
          value: "true"
      outputs:
        parameters:
          - name: output
            valueFrom:
              path: /tmp/output.txt
      script:
        command: [ python ]
        image: python:alpine
        imagePullPolicy: IfNotPresent
        source: |
          import random
          import os
          msg = '{{inputs.parameters.message}}'
          print(msg)

          if '{{inputs.parameters.always-pass}}' == 'true':
            a = -1
          else:
            a = random.randint(1,10)          

          if a > 0 and a <= 5: 
            raise Exception(msg)

          with open('/tmp/output.txt', 'w') as o:
            o.write('random' + msg + ' : ' +str(a))
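
To reproduce, the template above can be registered and submitted with the Argo CLI, then retried once it fails (a sketch; retry-barebones.yaml is assumed to contain the template above, and the bug described above shows up when the retryStrategy limit on the process template is not 0):

# Register the WorkflowTemplate and submit a Workflow from it
argo template create retry-barebones.yaml
argo submit --from workflowtemplate/retry-barebones --watch
# After the workflow fails, retry it and compare the outputs collected from the process tasks
argo retry <workflow-name>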

What Kubernetes provider are you using?

What executor are you running? Docker/K8SAPI/Kubelet/PNS/Emissary
Emissary

# Logs from the workflow controller:
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

# If the workflow's pods have not been created, you can skip the rest of the diagnostics.

# The workflow's pods that are problematic:
kubectl get pod -o yaml -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

# Logs from your workflow's wait container, something like:
kubectl logs -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.


alexec commented Jan 25, 2022

I'll look into this.


alexec commented Jan 26, 2022

e58859d


alexec commented Jan 26, 2022

[Screenshot attached: 2022-01-26 at 11 44 00]
