Workflow PodDisruptionBudget support is broken in 3.4.7 #10942

Closed
2 of 3 tasks
kulinskyvs opened this issue Apr 18, 2023 · 8 comments · Fixed by #10944
Labels: area/controller (Controller issues, panics), type/bug

kulinskyvs commented Apr 18, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

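It looks like "podDisruptionBudget" support is broken in Argo Workflows (tested with version 3.4.7): the workflow fails with a "PDB already exists" error even though no PDB exists before or after the run.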

Steps to reproduce (using the workflow definition listed below):

  1. Check that no PDBs exist before the test
$ kubectl get pdb
No resources found in usr-p-vku namespace.
  2. Create the workflow
$ kubectl apply -f pdb-issue.yaml
workflow.argoproj.io/pdb-wf-1 created
  3. Check the workflow details
$ kubectl describe workflow pdb-wf-1

Name:         pdb-wf-1
....
Status:
  ....
  Conditions:
    Status:     True
    Type:       Completed
  Finished At:  2023-04-18T17:15:48Z
  Message:      Unable to create PDB resource for workflow, pdb-wf-1 error: poddisruptionbudgets.policy "pdb-wf-1" already exists
  Phase:        Failed
  Progress:     0/0
Events:
  Type     Reason           Age   From                 Message
  ----     ------           ----  ----                 -------
  Normal   WorkflowRunning  57s   workflow-controller  Workflow Running
  Normal   WorkflowRunning  57s   workflow-controller  Workflow Running
  Warning  WorkflowFailed   57s   workflow-controller  Unable to create PDB resource for workflow, pdb-wf-1 error: poddisruptionbudgets.policy "pdb-wf-1" already exists
  4. Check that no PDBs exist after the workflow fails
$ kubectl get pdb
No resources found in usr-p-vku namespace.

Version

v3.4.7

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: pdb-wf-1
spec:
  entrypoint: main
  podDisruptionBudget:
    minAvailable: 9999
  templates:
    - name: main
      container:
        image: docker/whalesay:latest
        command: [cowsay]
        args: ["test"]

Logs from the workflow controller

time="2023-04-18T17:15:48.421Z" level=info msg="Processing workflow" namespace=usr-p-vku workflow=pdb-wf-1
time="2023-04-18T17:15:48.430Z" level=info msg="Updated phase  -> Running" namespace=usr-p-vku workflow=pdb-wf-1
time="2023-04-18T17:15:48.447Z" level=info msg="Created PDB resource for workflow." namespace=usr-p-vku workflow=pdb-wf-1
time="2023-04-18T17:15:48.447Z" level=info msg="Pod node pdb-wf-1 initialized Pending" namespace=usr-p-vku workflow=pdb-wf-1
time="2023-04-18T17:15:48.480Z" level=info msg="Created pod: pdb-wf-1 (pdb-wf-1)" namespace=usr-p-vku workflow=pdb-wf-1
time="2023-04-18T17:15:48.481Z" level=info msg="TaskSet Reconciliation" namespace=usr-p-vku workflow=pdb-wf-1
time="2023-04-18T17:15:48.481Z" level=info msg=reconcileAgentPod namespace=usr-p-vku workflow=pdb-wf-1
time="2023-04-18T17:15:48.487Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"pdb-wf-1\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=usr-p-vku workflow=pdb-wf-1
time="2023-04-18T17:15:48.488Z" level=info msg="Re-applying updates on latest version and retrying update" namespace=usr-p-vku workflow=pdb-wf-1
time="2023-04-18T17:15:48.494Z" level=info msg="Failed to re-apply update" error="must never update completed workflows" namespace=usr-p-vku workflow=pdb-wf-1

Logs from your workflow's wait container

time="2023-04-18T17:15:51.528Z" level=info msg="Starting Workflow Executor" version=v3.4.7
time="2023-04-18T17:15:51.551Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-04-18T17:15:51.552Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=usr-p-vku podName=pdb-wf-1 template="{\"name\":\"main\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"docker/whalesay:latest\",\"command\":[\"cowsay\"],\"args\":[\"test\"],\"resources\":{}}}" version="&Version{Version:v3.4.7,BuildDate:2023-04-11T16:19:29Z,GitCommit:f2292647c5a6be2f888447a1fef71445cc05b8fd,GitTag:v3.4.7,GitTreeState:clean,GoVersion:go1.19.8,Compiler:gc,Platform:linux/amd64,}"
time="2023-04-18T17:15:51.552Z" level=info msg="Starting deadline monitor"
time="2023-04-18T17:15:53.558Z" level=info msg="Main container completed" error="<nil>"
time="2023-04-18T17:15:53.558Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-04-18T17:15:53.558Z" level=info msg="No output parameters"
time="2023-04-18T17:15:53.558Z" level=info msg="No output artifacts"
time="2023-04-18T17:15:53.558Z" level=info msg="Alloc=6927 TotalAlloc=14842 Sys=29037 NumGC=5 Goroutines=7"
@kulinskyvs kulinskyvs changed the title PodDisruptionBudget support is broken in 3.4.7 Workflow PodDisruptionBudget support is broken in 3.4.7 Apr 18, 2023
@terrytangyuan
Member

I could not reproduce this. What's your Kubernetes version?

@kulinskyvs
Author

> I could not reproduce this. What's your Kubernetes version?

AWS EKS 1.22

kubectl version -o yaml

clientVersion:
  buildDate: "2022-12-08T19:58:30Z"
  compiler: gc
  gitCommit: b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d
  gitTreeState: clean
  gitVersion: v1.26.0
  goVersion: go1.19.4
  major: "1"
  minor: "26"
  platform: linux/amd64
kustomizeVersion: v4.5.7
serverVersion:
  buildDate: "2023-01-24T09:34:06Z"
  compiler: gc
  gitCommit: 47b89ea2caa1f7958bc6539d6865820c86b4bf60
  gitTreeState: clean
  gitVersion: v1.22.17-eks-48e63af
  goVersion: go1.16.15
  major: "1"
  minor: 22+
  platform: linux/amd64

@kulinskyvs
Author

kulinskyvs commented Apr 18, 2023

An update: the problem does not reproduce with workflow templates.
The workflow below runs fine.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: pdb-wft
spec:
  podDisruptionBudget:
    maxUnavailable: 0
  templates:
    - name: main
      container:
        image: docker/whalesay:latest
        command: [cowsay]
        args: ["test"]
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: pdb-wf
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: call
            templateRef:
              name: pdb-wft
              template: main

@terrytangyuan
Member

terrytangyuan commented Apr 18, 2023

I could not reproduce it. I even tried creating a PDB with the same name before submitting the workflow, and everything seems to work fine.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-wf-1
spec:
  minAvailable: 2
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: pdb-wf-1
spec:
  entrypoint: main
  podDisruptionBudget:
    minAvailable: 9999
  templates:
    - name: main
      container:
        image: docker/whalesay:latest
        command: [cowsay]
        args: ["test"]

@kulinskyvs
Author

@terrytangyuan ,

Thank you for your effort.
I've noticed that the Argo Workflows controller logs contain a warning message with an actual error:

time="2023-04-18T17:15:48.487Z" level=warning msg="Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"pdb-wf-1\": the object has been modified; please apply your changes to the latest version and try again Conflict" namespace=usr-p-vku workflow=pdb-wf-1

Not sure whether this is related.
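
For readers unfamiliar with the warning quoted above: it is the API server's optimistic-concurrency conflict, returned when an update is based on an older version of the object than the one currently stored. The usual response, which the controller log describes as "Re-applying updates on latest version and retrying update", is to re-read the object and apply the change again. The following is a minimal Go sketch of that pattern only. The `store`, `object`, and `updateWithRetry` names are hypothetical in-memory stand-ins, not Argo Workflows or client-go code; in real client-go code, `retry.RetryOnConflict` serves a similar purpose.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's "object has been modified" error.
var errConflict = errors.New("conflict: the object has been modified")

// object is a hypothetical stored resource with a version counter.
type object struct {
	resourceVersion int
	phase           string
}

// store is a hypothetical in-memory stand-in for the API server.
type store struct {
	current object
}

// get returns a copy of the latest stored object.
func (s *store) get() object { return s.current }

// update succeeds only if the caller's copy is based on the latest version.
func (s *store) update(o object) error {
	if o.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	o.resourceVersion++
	s.current = o
	return nil
}

// updateWithRetry applies mutate and, on conflict, re-reads the latest
// object and applies mutate again, up to the given number of attempts.
func updateWithRetry(s *store, stale object, mutate func(*object), attempts int) error {
	obj := stale
	for i := 0; i < attempts; i++ {
		mutate(&obj)
		err := s.update(obj)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err
		}
		obj = s.get() // start again from the latest version
	}
	return errConflict
}

func main() {
	s := &store{current: object{resourceVersion: 1, phase: "Pending"}}
	stale := s.get()

	// Another writer modifies the object after we read it.
	_ = s.update(object{resourceVersion: 1, phase: "Pending"})

	err := updateWithRetry(s, stale, func(o *object) { o.phase = "Running" }, 3)
	fmt.Println(err, s.get().phase, s.get().resourceVersion)
}
```

The first attempt in `main` fails because `stale` carries version 1 while the store is already at version 2; the second attempt starts from the fresh copy and succeeds. Whether this conflict is what triggered the PDB error in this issue is not established by the thread.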

@terrytangyuan
Member

I submitted a potential fix in #10944. Would you like to try it out? It should be available in an image tag dev-fix-pdb as soon as the release builds finish https://github.com/argoproj/argo-workflows/actions/runs/4736276262/jobs/8407593700?pr=10944.

@kulinskyvs
Author

> I submitted a potential fix in #10944. Would you like to try it out? It should be available in an image tag dev-fix-pdb as soon as the release builds finish https://github.com/argoproj/argo-workflows/actions/runs/4736276262/jobs/8407593700?pr=10944.

Awesome!
I can confirm that the fix resolves the problem; tested with the dev-fix-pdb workflow controller image tag.
Thank you for the quick response!

@terrytangyuan
Member

Great to hear!
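
The thread does not describe what #10944 changed. One reading of the logs above is that the controller created the PDB ("Created PDB resource for workflow."), then processed the same workflow again and treated the second create's "already exists" response as a fatal error. A common way to make such a step safe to repeat is to treat "already exists" as success. The Go sketch below illustrates that idea only, under those assumptions. `pdbStore`, `ensurePDB`, and `errAlreadyExists` are hypothetical stand-ins, not the actual fix; in real Kubernetes client code the analogous check is `apierrors.IsAlreadyExists` from `k8s.io/apimachinery/pkg/api/errors`.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the API server's AlreadyExists error.
var errAlreadyExists = errors.New("already exists")

// pdbStore is a hypothetical in-memory stand-in for the PodDisruptionBudget API.
type pdbStore struct {
	names map[string]bool
}

// create fails with errAlreadyExists when the name is taken, like a real create call.
func (s *pdbStore) create(name string) error {
	if s.names[name] {
		return fmt.Errorf("poddisruptionbudgets.policy %q: %w", name, errAlreadyExists)
	}
	s.names[name] = true
	return nil
}

// ensurePDB creates the PDB and treats "already exists" as success, so that
// running the same reconciliation step twice does not fail the workflow.
func ensurePDB(s *pdbStore, name string) (created bool, err error) {
	err = s.create(name)
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, errAlreadyExists):
		return false, nil
	default:
		return false, err
	}
}

func main() {
	s := &pdbStore{names: map[string]bool{}}
	for pass := 1; pass <= 2; pass++ {
		created, err := ensurePDB(s, "pdb-wf-1")
		fmt.Printf("pass %d: created=%v err=%v\n", pass, created, err)
	}
}
```

In this sketch the second pass reports `created=false` with a nil error instead of failing, which is the behavior the reporter would have needed for the workflow in this issue to keep running.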

terrytangyuan added a commit that referenced this issue Apr 19, 2023
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@terrytangyuan terrytangyuan self-assigned this Apr 19, 2023
terrytangyuan added a commit that referenced this issue May 25, 2023
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
JPZ13 pushed a commit to pipekit/argo-workflows that referenced this issue Jul 4, 2023
@agilgur5 agilgur5 added the area/controller Controller issues, panics label Apr 22, 2024
dpadhiar pushed a commit to dpadhiar/argo-workflows that referenced this issue May 9, 2024
argoproj#10944)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Dillen Padhiar <dillen_padhiar@intuit.com>
3 participants