
Retrying a workflow that uses a DAG template and a loop does not retry the failed task/step. #7617

Closed
louisnow opened this issue Jan 23, 2022 · 3 comments · Fixed by #7652


louisnow commented Jan 23, 2022

Summary

What happened/what you expected to happen?

Retrying a workflow that uses a DAG and a withParam does not retry the failed task/step.

Screen.Recording.2022-01-23.at.11.57.48.PM.mov
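
For reference, the retry can be triggered from the UI or with the Argo CLI, for example (the workflow name is a placeholder):

# Retry the failed workflow; only the failed task(s) under the withParam loop should be re-run
argo retry <workflow-name>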

A quick fix that seems to work is to set the retry limit to 0, but it causes other issues in the collected output.

      retryStrategy:
        limit: "0"

The output collected in the final step is duplicated, with entries repeated according to the number of retries. In the recording, each output appears twice in the final array: I expect three outputs but get six because it was retried twice.

Screen.Recording.2022-01-24.at.12.03.27.AM.mov
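
One way to see the duplication is to inspect the aggregated parameter that the final whalesay-after-process task receives. A sketch, assuming jq is installed and the workflow name is filled in:

# Dump the input parameters of the final task; after a retry the aggregated JSON array
# contains each process output more than once
kubectl get wf <workflow-name> -o json \
  | jq '.status.nodes[] | select(.displayName == "whalesay-after-process") | .inputs.parameters'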

What version of Argo Workflows are you running?
3.2.4

Diagnostics

Either a workflow that reproduces the bug, or paste your whole workflow YAML, including status, something like:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: retry-barebones
spec:
  entrypoint: main
  templates:
    - name: main
      inputs: 
        parameters:
          - name: numbers  
            value: |
              ["one", "two", "three"]
      dag:
        tasks:
          - name: whalesay-before-process
            template: whalesay
            arguments:
              parameters:
                - name: message
                  value: "HELLO ALL"

          - name: process
            template: process
            withParam: "{{inputs.parameters.numbers}}"
            dependencies:
              - whalesay-before-process
            arguments:
              parameters:
                - name: message
                  value: "{{item}}"

          - name: whalesay-after-process
            template: whalesay
            dependencies:
              - process
            arguments:
              parameters:
                - name: message
                  value: "Root: {{tasks.process.outputs.parameters.output}}"
                  # On retry there are extra values in the output, duplicated multiple times based on retry count                

    - name: process
      # this fixes the behaviour
      retryStrategy:
        limit: "0"
      inputs:
        parameters:
        - name: message
      outputs:
        parameters:
          - name: output
            valueFrom:
              parameter: "{{tasks.step2.outputs.parameters.output}}"
      dag: 
        tasks:
          - name: step0
            template: whalesay
            arguments:
              parameters:
                - name: message
                  value: "{{inputs.parameters.message}}"

          - name: step1
            template: whalesay
            dependencies:
              - step0
            arguments:
              parameters:
                - name: message
                  value: "{{inputs.parameters.message}}"
                - name: always-pass
                  value: "false"

          - name: step2
            template: whalesay
            dependencies:
              - step1
            arguments:
              parameters:
                - name: message
                  value: "{{tasks.step1.outputs.parameters.output}}"

    - name: whalesay
      inputs:
        parameters:
        - name: message
        - name: always-pass
          value: "true"
      outputs:
        parameters:
          - name: output
            valueFrom:
              path: /tmp/output.txt
      script:
        command: [ python ]
        image: python:alpine
        imagePullPolicy: IfNotPresent
        source: |
          import random
          import os
          msg = '{{inputs.parameters.message}}'
          print(msg)

          if '{{inputs.parameters.always-pass}}' == 'true':
            a = -1
          else:
            a = random.randint(1,10)          

          if a > 0 and a <= 5: 
            raise Exception(msg)

          with open('/tmp/output.txt', 'w') as o:
            o.write('random' + msg + ' : ' +str(a))
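
To reproduce, the template above can be registered and submitted with the Argo CLI, then retried once it fails (a sketch; retry-barebones.yaml is assumed to contain the template above, and the bug described above shows up when the retryStrategy limit on the process template is not 0):

# Register the WorkflowTemplate and submit a Workflow from it
argo template create retry-barebones.yaml
argo submit --from workflowtemplate/retry-barebones --watch
# After the workflow fails, retry it and compare the outputs collected from the process tasks
argo retry <workflow-name>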

What Kubernetes provider are you using?

What executor are you running? Docker/K8SAPI/Kubelet/PNS/Emissary
Emissary

# Logs from the workflow controller:
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

# If the workflow's pods have not been created, you can skip the rest of the diagnostics.

# The workflow's pods that are problematic:
kubectl get pod -o yaml -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

# Logs from your workflow's wait container, something like:
kubectl logs -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.


alexec commented Jan 25, 2022

I'll look into this.


alexec commented Jan 26, 2022

e58859d


alexec commented Jan 26, 2022

[Screenshot attached: 2022-01-26 at 11 44 00]
