Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestTemplateLevelHooksWaitForTriggeredHook fail sometimes (flakey test) #12844

Closed
4 tasks done
shuangkun opened this issue Mar 26, 2024 · 1 comment · Fixed by #12855
Closed
4 tasks done

TestTemplateLevelHooksWaitForTriggeredHook fail sometimes (flakey test) #12844

shuangkun opened this issue Mar 26, 2024 · 1 comment · Fixed by #12855
Assignees
Labels

Comments

@shuangkun
Copy link
Member

shuangkun commented Mar 26, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

hook pod didn't created.

Version

latest

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: example-steps
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: job
            template: argosay
            hooks:
              running:
                expression: steps['job'].status == "Running"
                template: argosay-sleep-2seconds
              failed:
                expression: steps['job'].status == "Failed"
                template: argosay-sleep-2seconds

    - name: argosay
      container:
        image: argoproj/argosay:v2
        command: ["/bin/sh", "-c"]
        args: ["/bin/sleep 1; /argosay"]
    - name: argosay-sleep-2seconds
      container:
        image: argoproj/argosay:v2
        command: ["/bin/sh", "-c"]
        args: ["/bin/sleep 2; /argosay"]

Failure log:

✔ example-stepsjkrkr   Workflow  7s      
 └ ✔ example-stepsjkrkr Steps     7s      
 └ ✔ job                Pod       4s      
 └ ✔ [0]                StepGroup 7s      

Condition "to be Succeeded" met after 7s
Checking expectation example-stepsjkrkr
example-stepsjkrkr : Succeeded 
    hooks_test.go:506: 
        	Error Trace:	/home/runner/work/argo-workflows/argo-workflows/test/e2e/hooks_test.go:506
        	            				/home/runner/work/argo-workflows/argo-workflows/test/e2e/fixtures/then.go:69
        	            				/home/runner/work/argo-workflows/argo-workflows/test/e2e/fixtures/then.go:44
        	            				/home/runner/work/argo-workflows/argo-workflows/test/e2e/hooks_test.go:504
        	Error:      	Not equal: 
        	            	expected: "1/1"
        	            	actual  : "2/2"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,2 @@
        	            	-(v1alpha1.Progress) (len=3) "1/1"
        	            	+(v1alpha1.Progress) (len=3) "2/2"
        	            	 
        	Test:       	TestHooksSuite/TestTemplateLevelHooksWaitForTriggeredHook
Checking expectation example-stepsjkrkr
example-stepsjkrkr : Succeeded 
Did not find node
    then.go:111: Did not find expected node

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
@shuangkun shuangkun changed the title TestTemplateLevelHooksWaitForTriggeredHook failed TestTemplateLevelHooksWaitForTriggeredHook fail sometimes (flakey test) Mar 26, 2024
@shuangkun
Copy link
Member Author

shuangkun commented Mar 26, 2024

no create hook pod log.

2024-03-26T07:06:04.8770095Z controller: time="2024-03-26T07:05:56.115Z" level=warning msg="Non-transient error: pods \"lifecycle-hook-tmpl-level-h84lb-1340600742-agent\" not found"
2024-03-26T07:06:04.8770614Z controller: time="2024-03-26T07:05:56.119Z" level=info msg="Workflow already deleted 'argo/http-template-par-sqx7r'"
2024-03-26T07:06:04.8771635Z controller: time="2024-03-26T07:05:56.156Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/lifecycle-hook-tmpl-level-h84lb-argosay-617673753/labelPodCompleted
2024-03-26T07:06:04.8772309Z controller: time="2024-03-26T07:05:56.164Z" level=info msg="insignificant pod change" key=argo/lifecycle-hook-tmpl-level-h84lb-argosay-617673753
2024-03-26T07:06:04.8773000Z controller: time="2024-03-26T07:05:56.286Z" level=info msg="Processing workflow" Phase= ResourceVersion=2390 namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8773543Z controller: time="2024-03-26T07:05:56.290Z" level=info msg="resolved artifact repository" artifactRepositoryRef="argo/#"
2024-03-26T07:06:04.8774174Z controller: time="2024-03-26T07:05:56.290Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8774752Z controller: time="2024-03-26T07:05:56.290Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8775452Z controller: time="2024-03-26T07:05:56.290Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8776233Z controller: time="2024-03-26T07:05:56.290Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8776930Z controller: time="2024-03-26T07:05:56.290Z" level=info msg="Steps node example-stepsjkrkr initialized Running" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8777712Z controller: time="2024-03-26T07:05:56.290Z" level=info msg="StepGroup node example-stepsjkrkr-3408461935 initialized Running" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8778406Z controller: time="2024-03-26T07:05:56.290Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8779177Z controller: time="2024-03-26T07:05:56.290Z" level=info msg="Pod node example-stepsjkrkr-1690630010 initialized Pending" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8780024Z controller: time="2024-03-26T07:05:56.297Z" level=info msg="Created pod: example-stepsjkrkr[0].job (example-stepsjkrkr-argosay-1690630010)" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8780845Z controller: time="2024-03-26T07:05:56.298Z" level=info msg="Workflow step group node example-stepsjkrkr-3408461935 not yet completed" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8781605Z controller: time="2024-03-26T07:05:56.298Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8782152Z controller: time="2024-03-26T07:05:56.298Z" level=info msg=reconcileAgentPod namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8782613Z controller: time="2024-03-26T07:05:56.298Z" level=info msg="Workflow to be dehydrated" Workflow Size=1784
2024-03-26T07:06:04.8783369Z controller: time="2024-03-26T07:05:56.305Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=2394 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8783984Z controller: time="2024-03-26T07:05:56.305Z" level=info msg="woc.controller.wfQueue.Len() is 0" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8784176Z port-forward: Handling connection for 9000
2024-03-26T07:06:04.8785008Z controller: time="2024-03-26T07:05:56.997Z" level=info msg="cleaning up pod" action=killContainers key=argo/lifecycle-hook-tmpl-level-h84lb-argosay-617673753/killContainers
2024-03-26T07:06:04.8785728Z controller: time="2024-03-26T07:05:57.286Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=2394 namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8786363Z controller: time="2024-03-26T07:05:57.287Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8787615Z controller: time="2024-03-26T07:05:57.287Z" level=info msg="node changed" namespace=argo new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=example-stepsjkrkr-1690630010 old.message= old.phase=Pending old.progress=0/1 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8788559Z controller: time="2024-03-26T07:05:57.287Z" level=info msg="Workflow step group node example-stepsjkrkr-3408461935 not yet completed" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8789125Z controller: time="2024-03-26T07:05:57.287Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8789661Z controller: time="2024-03-26T07:05:57.287Z" level=info msg=reconcileAgentPod namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8790123Z controller: time="2024-03-26T07:05:57.287Z" level=info msg="Workflow to be dehydrated" Workflow Size=2091
2024-03-26T07:06:04.8790878Z controller: time="2024-03-26T07:05:57.293Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=2403 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8791500Z controller: time="2024-03-26T07:05:57.293Z" level=info msg="woc.controller.wfQueue.Len() is 0" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8792219Z controller: time="2024-03-26T07:05:59.338Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=2403 namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8792860Z controller: time="2024-03-26T07:05:59.339Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8793606Z controller: time="2024-03-26T07:05:59.339Z" level=info msg="task-result changed" namespace=argo nodeID=example-stepsjkrkr-1690630010 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8794310Z controller: time="2024-03-26T07:05:59.339Z" level=info msg="node unchanged" namespace=argo nodeID=example-stepsjkrkr-1690630010 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8795136Z controller: time="2024-03-26T07:05:59.340Z" level=info msg="Workflow step group node example-stepsjkrkr-3408461935 not yet completed" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8795704Z controller: time="2024-03-26T07:05:59.340Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8796240Z controller: time="2024-03-26T07:05:59.340Z" level=info msg=reconcileAgentPod namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8796807Z controller: time="2024-03-26T07:05:59.340Z" level=info msg="Workflow to be dehydrated" Workflow Size=2127
2024-03-26T07:06:04.8797559Z controller: time="2024-03-26T07:05:59.350Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=2411 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8798186Z controller: time="2024-03-26T07:05:59.350Z" level=info msg="woc.controller.wfQueue.Len() is 0" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8798369Z port-forward: Handling connection for 9000
2024-03-26T07:06:04.8799093Z controller: time="2024-03-26T07:06:01.958Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=2411 namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8799716Z controller: time="2024-03-26T07:06:01.959Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8800457Z controller: time="2024-03-26T07:06:01.960Z" level=info msg="task-result changed" namespace=argo nodeID=example-stepsjkrkr-1690630010 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8801705Z controller: time="2024-03-26T07:06:01.961Z" level=info msg="node changed" namespace=argo new.message= new.phase=Pending new.progress=0/1 nodeID=example-stepsjkrkr-1690630010 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8802530Z controller: time="2024-03-26T07:06:01.966Z" level=info msg="Workflow step group node example-stepsjkrkr-3408461935 not yet completed" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8803218Z controller: time="2024-03-26T07:06:01.966Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8803757Z controller: time="2024-03-26T07:06:01.966Z" level=info msg=reconcileAgentPod namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8804217Z controller: time="2024-03-26T07:06:01.966Z" level=info msg="Workflow to be dehydrated" Workflow Size=2229
2024-03-26T07:06:04.8804974Z controller: time="2024-03-26T07:06:01.975Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=2417 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8805592Z controller: time="2024-03-26T07:06:01.975Z" level=info msg="woc.controller.wfQueue.Len() is 0" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8805997Z port-forward: Handling connection for 9000
2024-03-26T07:06:04.8806732Z controller: time="2024-03-26T07:06:03.832Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=2417 namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8807367Z controller: time="2024-03-26T07:06:03.832Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8808103Z controller: time="2024-03-26T07:06:03.832Z" level=info msg="task-result changed" namespace=argo nodeID=example-stepsjkrkr-1690630010 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8809421Z controller: time="2024-03-26T07:06:03.832Z" level=info msg="node changed" namespace=argo new.message= new.phase=Succeeded new.progress=0/1 nodeID=example-stepsjkrkr-1690630010 old.message= old.phase=Pending old.progress=0/1 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8810175Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="Step group node example-stepsjkrkr-3408461935 successful" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8810924Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="node example-stepsjkrkr-3408461935 phase Running -> Succeeded" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8811796Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="node example-stepsjkrkr-3408461935 finished: 2024-03-26 07:06:03.83341557 +0000 UTC" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8812661Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="Outbound nodes of example-stepsjkrkr-1690630010 is [example-stepsjkrkr-1690630010]" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8813662Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="Outbound nodes of example-stepsjkrkr is [example-stepsjkrkr-1690630010]" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8814370Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="node example-stepsjkrkr phase Running -> Succeeded" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8815170Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="node example-stepsjkrkr finished: 2024-03-26 07:06:03.833455654 +0000 UTC" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8815732Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8816266Z controller: time="2024-03-26T07:06:03.833Z" level=info msg=reconcileAgentPod namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8816881Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="Updated phase Running -> Succeeded" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8817476Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="Marking workflow completed" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8817935Z controller: time="2024-03-26T07:06:03.833Z" level=info msg="Workflow to be dehydrated" Workflow Size=2413
2024-03-26T07:06:04.8818615Z controller: time="2024-03-26T07:06:03.838Z" level=info msg="cleaning up pod" action=deletePod key=argo/example-stepsjkrkr-1340600742-agent/deletePod
2024-03-26T07:06:04.8819517Z controller: time="2024-03-26T07:06:03.839Z" level=info msg="Workflow update successful" namespace=argo phase=Succeeded resourceVersion=2422 workflow=example-stepsjkrkr
2024-03-26T07:06:04.8820145Z controller: time="2024-03-26T07:06:03.839Z" level=info msg="woc.controller.wfQueue.Len() is 0" namespace=argo workflow=example-stepsjkrkr
2024-03-26T07:06:04.8820833Z controller: time="2024-03-26T07:06:03.840Z" level=info msg="Queueing Succeeded workflow argo/cwft-wf-b2r5b for delete due to max rention(10 workflows)"
2024-03-26T07:06:04.8821354Z controller: time="2024-03-26T07:06:03.840Z" level=info msg="Deleting garbage collected workflow 'argo/cwft-wf-b2r5b'"
2024-03-26T07:06:04.8822400Z controller: time="2024-03-26T07:06:03.841Z" level=warning msg="failed to clean-up pod" action=deletePod error="pods \"example-stepsjkrkr-1340600742-agent\" not found" key=argo/example-stepsjkrkr-1340600742-agent/deletePod
2024-03-26T07:06:04.8823033Z controller: time="2024-03-26T07:06:03.841Z" level=warning msg="Non-transient error: pods \"example-stepsjkrkr-1340600742-agent\" not found"
2024-03-26T07:06:04.8823507Z controller: time="2024-03-26T07:06:03.844Z" level=info msg="Workflow already deleted 'argo/cwft-wf-b2r5b'"
2024-03-26T07:06:04.8824277Z controller: time="2024-03-26T07:06:03.863Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/example-stepsjkrkr-argosay-1690630010/labelPodCompleted

@agilgur5 agilgur5 added type/tech-debt area/hooks area/build Build or GithubAction/CI issues labels Mar 26, 2024
@shuangkun shuangkun self-assigned this Mar 28, 2024
shuangkun added a commit to shuangkun/argo-workflows that referenced this issue Mar 28, 2024
Signed-off-by: shuangkun <tsk2013uestc@163.com>
shuangkun added a commit to shuangkun/argo-workflows that referenced this issue Mar 28, 2024
Signed-off-by: shuangkun <tsk2013uestc@163.com>
agilgur5 pushed a commit that referenced this issue Mar 30, 2024
…12855)

Signed-off-by: shuangkun <tsk2013uestc@163.com>
@agilgur5 agilgur5 added this to the v3.5.x patches milestone Apr 19, 2024
agilgur5 pushed a commit that referenced this issue Apr 19, 2024
…12855)

Signed-off-by: shuangkun <tsk2013uestc@163.com>
(cherry picked from commit daa5f7e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants