
inline task in DAG makes workflow fail: failed to get a template #12633

Closed
partomatl opened this issue Feb 6, 2024 · 1 comment · Fixed by #12683
Labels
area/controller Controller issues, panics P3 Low priority type/bug type/regression Regression from previous behavior (a specific type of bug)


@partomatl

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

With v3.4.6, the attached workflow works reliably.
From v3.4.7 onward, the workflow intermittently fails with the error "failed to get a template":

[screenshot]

I don't know whether this PR, merged for v3.4.7, is related: #10786

Version

v3.5.4

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: wftmplcontainer
spec:
  templates:
    - name: print
      inputs:
        parameters:
          - name: message
      container:
        image: alpine
        command: [echo, "{{inputs.parameters.message}}"]
---
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: wftmpl
spec:
  templates:
    - name: dagtmpl
      dag:
        tasks:
          - name: message
            inline:
              script:
                image: python:alpine
                command: [python]
                source: |
                  print("hello world")

          - name: echo
            arguments:
              parameters:
                - name: message
                  value: "{{tasks.message.outputs.result}}"
            dependencies: [message]
            templateRef:
              name: wftmplcontainer
              template: print
---
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: cwf
spec:
  schedule: "* * * * *"
  successfulJobsHistoryLimit: 30
  failedJobsHistoryLimit: 30
  workflowSpec:
    entrypoint: default
    templates:
      - name: default
        steps:
          - - name: first
              templateRef:
                name: wftmpl
                template: dagtmpl
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  workflowDefaults: |
    spec:
      retryStrategy:
        retryPolicy: OnTransientError
        limit: 1

Logs from the workflow controller

time="2024-02-06T16:46:00.007Z" level=info msg="Running cwf" namespace=argo workflow=cwf
time="2024-02-06T16:46:00.024Z" level=info msg="Processing workflow" Phase= ResourceVersion=3227 namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.028Z" level=info msg="Processing argo/cwf" cronWorkflow=argo/cwf
time="2024-02-06T16:46:00.029Z" level=info msg="CronWorkflow argo/cwf added" cronWorkflow=argo/cwf
time="2024-02-06T16:46:00.033Z" level=warning msg="Non-transient error: configmaps \"artifact-repositories\" not found"
time="2024-02-06T16:46:00.033Z" level=info msg="resolved artifact repository" artifactRepositoryRef=default-artifact-repository
time="2024-02-06T16:46:00.033Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=info msg="Retry node cwf-1707237960 initialized Running" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=info msg="Steps node cwf-1707237960-1248539457 initialized Running" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=info msg="StepGroup node cwf-1707237960-1146551653 initialized Running" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=info msg="Retry node cwf-1707237960-755145637 initialized Running" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=info msg="DAG node cwf-1707237960-4022235908 initialized Running" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=warning msg="was unable to obtain the node for cwf-1707237960-2455691263, taskName echo"
time="2024-02-06T16:46:00.033Z" level=warning msg="was unable to obtain the node for cwf-1707237960-2023533853, taskName message"
time="2024-02-06T16:46:00.033Z" level=warning msg="was unable to obtain the node for cwf-1707237960-2023533853, taskName message"
time="2024-02-06T16:46:00.033Z" level=info msg="All of node cwf-1707237960(0)[0].first(0).message dependencies [] completed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.033Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.034Z" level=info msg="Retry node cwf-1707237960-2023533853 initialized Running" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.034Z" level=info msg="Pod node cwf-1707237960-226052908 initialized Pending" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.041Z" level=info msg="Created pod: cwf-1707237960(0)[0].first(0).message(0) (cwf-1707237960--226052908)" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.041Z" level=warning msg="was unable to obtain the node for cwf-1707237960-2455691263, taskName echo"
time="2024-02-06T16:46:00.041Z" level=warning msg="was unable to obtain the node for cwf-1707237960-2455691263, taskName echo"
time="2024-02-06T16:46:00.041Z" level=info msg="Workflow step group node cwf-1707237960-1146551653 not yet completed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.041Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.041Z" level=info msg=reconcileAgentPod namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:00.041Z" level=info msg="Workflow to be dehydrated" Workflow Size=3054
time="2024-02-06T16:46:00.046Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=3232 workflow=cwf-1707237960
time="2024-02-06T16:46:10.044Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=3232 namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.046Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=cwf-1707237960
time="2024-02-06T16:46:10.046Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argo-workflows.readthedocs.io/en/release-3.5/workflow-rbac/" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.046Z" level=warning msg="workflow uses legacy/insecure pod patch, see https://argo-workflows.readthedocs.io/en/release-3.5/workflow-rbac/" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.046Z" level=info msg="node changed" namespace=argo new.message= new.phase=Succeeded new.progress=0/1 nodeID=cwf-1707237960-226052908 old.message= old.phase=Pending old.progress=0/1 workflow=cwf-1707237960
time="2024-02-06T16:46:10.047Z" level=warning msg="was unable to obtain the node for cwf-1707237960-2455691263, taskName echo"
time="2024-02-06T16:46:10.047Z" level=info msg="node cwf-1707237960-2023533853 phase Running -> Succeeded" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.047Z" level=info msg="node cwf-1707237960-2023533853 finished: 2024-02-06 16:46:10.0475168 +0000 UTC" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.047Z" level=warning msg="was unable to obtain the node for cwf-1707237960-2455691263, taskName echo"
time="2024-02-06T16:46:10.047Z" level=warning msg="was unable to obtain the node for cwf-1707237960-2455691263, taskName echo"
time="2024-02-06T16:46:10.047Z" level=info msg="All of node cwf-1707237960(0)[0].first(0).echo dependencies [message] completed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.047Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.048Z" level=info msg="Retry node cwf-1707237960-2455691263 initialized Running" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.048Z" level=info msg="Pod node cwf-1707237960-1543641474 initialized Pending" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.048Z" level=warning msg="Non-transient error: failed to get a template"
time="2024-02-06T16:46:10.048Z" level=error msg="Mark error node" error="failed to get a template" namespace=argo nodeName="cwf-1707237960(0)[0].first(0).echo(0)" workflow=cwf-1707237960
time="2024-02-06T16:46:10.048Z" level=info msg="node cwf-1707237960-1543641474 phase Pending -> Error" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.048Z" level=info msg="node cwf-1707237960-1543641474 message: failed to get a template" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.048Z" level=info msg="node cwf-1707237960-1543641474 finished: 2024-02-06 16:46:10.048943425 +0000 UTC" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=warning msg="Non-transient error: failed to get a template"
time="2024-02-06T16:46:10.049Z" level=info msg="Retry Policy: OnTransientError (onFailed: false, onError false)" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="Node not set to be retried after status: Error" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-2455691263 phase Running -> Error" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-2455691263 message: failed to get a template" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-2455691263 finished: 2024-02-06 16:46:10.049136425 +0000 UTC" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="Outbound nodes of cwf-1707237960-4022235908 set to [cwf-1707237960-1543641474]" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-4022235908 phase Running -> Error" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-4022235908 finished: 2024-02-06 16:46:10.049220716 +0000 UTC" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=warning msg="Non-transient error: "
time="2024-02-06T16:46:10.049Z" level=info msg="Retry Policy: OnTransientError (onFailed: false, onError false)" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="Node not set to be retried after status: Error" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-755145637 phase Running -> Error" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-755145637 finished: 2024-02-06 16:46:10.049548633 +0000 UTC" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="Step group node cwf-1707237960-1146551653 deemed failed: child 'cwf-1707237960-755145637' failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-1146551653 phase Running -> Failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-1146551653 message: child 'cwf-1707237960-755145637' failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-1146551653 finished: 2024-02-06 16:46:10.049598133 +0000 UTC" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="step group cwf-1707237960-1146551653 was unsuccessful: child 'cwf-1707237960-755145637' failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="Outbound nodes of cwf-1707237960-755145637 is [cwf-1707237960-4022235908]" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="Outbound nodes of cwf-1707237960-1248539457 is [cwf-1707237960-4022235908]" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-1248539457 phase Running -> Failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-1248539457 message: child 'cwf-1707237960-755145637' failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960-1248539457 finished: 2024-02-06 16:46:10.0496928 +0000 UTC" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=warning msg="Non-transient error: child 'cwf-1707237960-755145637' failed"
time="2024-02-06T16:46:10.049Z" level=info msg="Retry Policy: OnTransientError (onFailed: false, onError false)" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="Node not set to be retried after status: Failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960 phase Running -> Failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960 message: child 'cwf-1707237960-755145637' failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="node cwf-1707237960 finished: 2024-02-06 16:46:10.04988105 +0000 UTC" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg=reconcileAgentPod namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.049Z" level=info msg="Updated phase Running -> Failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.050Z" level=info msg="Updated message  -> child 'cwf-1707237960-755145637' failed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.050Z" level=info msg="Marking workflow completed" namespace=argo workflow=cwf-1707237960
time="2024-02-06T16:46:10.050Z" level=info msg="Workflow to be dehydrated" Workflow Size=4374
time="2024-02-06T16:46:10.055Z" level=info msg="cleaning up pod" action=deletePod key=argo/cwf-1707237960-1340600742-agent/deletePod
time="2024-02-06T16:46:10.060Z" level=info msg="Workflow update successful" namespace=argo phase=Failed resourceVersion=3270 workflow=cwf-1707237960
time="2024-02-06T16:46:10.068Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/cwf-1707237960--226052908/labelPodCompleted
time="2024-02-06T16:46:11.908Z" level=info msg="Processing argo/cwf" cronWorkflow=argo/cwf
time="2024-02-06T16:46:11.910Z" level=info msg="CronWorkflow argo/cwf added" cronWorkflow=argo/cwf

Logs from your workflow's wait container

N/A
@agilgur5 added type/regression Regression from previous behavior (a specific type of bug) P3 Low priority area/controller Controller issues, panics labels Feb 6, 2024
@shuangkun self-assigned this Feb 8, 2024
@shuangkun
Member

Reproduced the problem and found something wrong with the retry node status; will find the root cause soon.

shuangkun added a commit to shuangkun/argo-workflows that referenced this issue Feb 20, 2024
Signed-off-by: shuangkun <tsk2013uestc@163.com>
@agilgur5 changed the title "inline task inside a DAG template makes the workflow fail with error failed to get a template" to "inline task in DAG makes workflow fail: failed to get a template" Feb 21, 2024
agilgur5 pushed a commit that referenced this issue Feb 22, 2024
…12683)

Signed-off-by: shuangkun <tsk2013uestc@163.com>
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue Feb 27, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue Feb 28, 2024
…j#12633 (argoproj#12683)

Signed-off-by: shuangkun <tsk2013uestc@163.com>
Signed-off-by: Isitha Subasinghe <isubasinghe@student.unimelb.edu.au>
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue Mar 12, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue May 6, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue May 7, 2024
@agilgur5 agilgur5 added this to the v3.4.x patches milestone May 22, 2024