Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional PodDisruptionBudget to the Helm chart #383

Merged
merged 3 commits into from
Feb 28, 2024

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented Feb 28, 2024

The platform-engineer who deploys approver-policy using the Helm chart wants to avoid disruption of the approver-policy service when draining a node, for example.

[PodDisruptionBudget] ensures that a voluntary disruption, such as the draining of a Node, cannot proceed until at least one other replica has been successfully scheduled and started on another Node. The Helm chart has parameters to enable and configure a PodDisruptionBudget for each of the long-running cert-manager components.
https://cert-manager.io/docs/installation/best-practice/#poddisruptionbudget

They can use the following values.yaml to achieve this:

podDisruptionBudget:
   enabled: true

Testing

Without enabling PDB

$ helm upgrade --install -n venafi approver-policy _bin/scratch/image/cert-manager-approver-policy-v0.13.0-alpha.0-11-g6f7f6646bd238c-dirty.tgz
Release "approver-policy" does not exist. Installing it now.
NAME: approver-policy
LAST DEPLOYED: Wed Feb 28 10:09:08 2024
NAMESPACE: venafi
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
⚠️  WARNING: Consider increasing the Helm value `replicaCount` to 2 if you require high availability.
⚠️  WARNING: Consider setting the Helm value `podDisruptionBudget.enabled` to true if you require high availability.

CHART NAME: cert-manager-approver-policy
CHART VERSION: v0.13.0-alpha.0-11-g6f7f6646bd238c-dirty
APP VERSION: v0.13.0-alpha.0-11-g6f7f6646bd238c-dirty

cert-manager-approver-policy is a cert-manager project.

If you're a new user, we recommend that you read the [cert-manager Approval Policy documentation] to learn more.

[cert-manager Approval Policy documentation]: https://cert-manager.io/docs/policy/approval/

With PodDisruptionBudget enabled

$ helm upgrade --install -n venafi approver-policy _bin/scratch/image/cert-manager-approver-policy-v0.13.0-alpha.0-12-g075790a76f86b9-dirty.tgz  --set replicaCount=2 --set podDisruptionBudget.enabled=true --set image.tag=v0.12.1
Release "approver-policy" has been upgraded. Happy Helming!
NAME: approver-policy
LAST DEPLOYED: Wed Feb 28 10:22:50 2024
NAMESPACE: venafi
STATUS: deployed
REVISION: 4
TEST SUITE: None
NOTES:
CHART NAME: cert-manager-approver-policy
CHART VERSION: v0.13.0-alpha.0-12-g075790a76f86b9-dirty
APP VERSION: v0.13.0-alpha.0-12-g075790a76f86b9-dirty

cert-manager-approver-policy is a cert-manager project.

If you're a new user, we recommend that you read the [cert-manager Approval Policy documentation] to learn more.

[cert-manager Approval Policy documentation]: https://cert-manager.io/docs/policy/approval/

Signed-off-by: Richard Wall <richard.wall@venafi.com>
Signed-off-by: Richard Wall <richard.wall@venafi.com>
Signed-off-by: Richard Wall <richard.wall@venafi.com>
@jetstack-bot jetstack-bot added dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 28, 2024
{{- if .maxUnavailable -}}
maxUnavailable: {{ .maxUnavailable }}
{{- end -}}
{{- end -}}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example output of the validation in this new helper.

$ helm upgrade --install -n venafi approver-policy _bin/scratch/image/cert-manager-approver-policy-v0.13.0-alpha.0-12-g075790a76f86b9-dirty.tgz --set podDisruptionBudget.enabled=true --set podDisruptionBudget.minAvailable=1 --set podDisruptionBudget.maxUnavailable=1 --set image.tag=v0.12.1
Error: UPGRADE FAILED: execution error at (cert-manager-approver-policy/templates/poddisruptionbudget.yaml:10:6): Cannot set both .minAvailable and .maxUnavailable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto generated by make generate-helm-schema

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto generated by make generate-helm-docs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this NOTES file to give some warning to the user. Inspired by something @inteon added to venafi-enhanced-issuer.

@wallrj
Copy link
Member Author

wallrj commented Feb 28, 2024

/retest

Possible flakey test

approver-policy-controllers: [It] Ready if reconcilers return not-ready should set not-ready. After enqueue event but for wrong name, should not change returning not-ready expand_less 0s
{Failed after 0.046s.
expected the condition to maintain not-ready
Expected
: true
to be false failed [FAILED] Failed after 0.046s.
expected the condition to maintain not-ready
Expected
: true
to be false
In [It] at: /home/prow/go/src/github.com/cert-manager/approver-policy/pkg/internal/controllers/test/ready.go:363 @ 02/28/24 10:50:14.189
}
https://prow.build-infra.jetstack.net/view/gs/jetstack-logs/pr-logs/pull/cert-manager_approver-policy/383/pull-cert-manager-approver-policy-test/1762790588257669120

@wallrj
Copy link
Member Author

wallrj commented Feb 28, 2024

/hold while I test that this actually works on a multi-node cluster.

@jetstack-bot jetstack-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 28, 2024
Copy link
Member

@SgtCoDFish SgtCoDFish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

My expectation for this is that it would copy https://github.com/cert-manager/cert-manager/blob/master/deploy/charts/cert-manager/templates/poddisruptionbudget.yaml but this looks better - thanks!

@jetstack-bot jetstack-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 28, 2024
@jetstack-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SgtCoDFish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wallrj
Copy link
Member Author

wallrj commented Feb 28, 2024

Demonstration of PodDisruptionBudget preventing disruption of the approver-policy service

  • Create three-node cluster
# kind.config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
kind create cluster --config kind.config.yaml
  • Label and taint first worker only
kubectl label node kind-worker node-restriction.kubernetes.io/reserved-for=platform
kubectl taint node kind-worker node-restriction.kubernetes.io/reserved-for=platform:NoExecute
  • Deploy approver-policy
# values.yaml

image:
  tag: v0.12.1

replicaCount: 2

podDisruptionBudget:
  enabled: true

app:
  webhook:

    affinity:
      nodeAffinity:
       requiredDuringSchedulingIgnoredDuringExecution:
         nodeSelectorTerms:
         - matchExpressions:
           - key: node-restriction.kubernetes.io/reserved-for
             operator: In
             values:
             - platform

    tolerations:
    - key: node-restriction.kubernetes.io/reserved-for
      operator: Equal
      value: platform
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.3/cert-manager.crds.yaml

helm upgrade approver-policy _bin/scratch/image/cert-manager-approver-policy-v0.13.0-alpha.0-12-g075790a76f86b9-dirty.tgz --install --create-namespace --namespace venafi --values values.yaml
  • Both Pods are assigned to first worker node
$ kubectl get pods -n venafi -o wide
NAME                                            READY   STATUS    RESTARTS      AGE     IP           NODE          NOMINATED NODE   READINESS GATES
cert-manager-approver-policy-78bdd57987-ps2jh   1/1     Running   4 (78s ago)   3m25s   10.244.2.3   kind-worker   <none>           <none>
cert-manager-approver-policy-78bdd57987-zkqn4   1/1     Running   4 (99s ago)   3m39s   10.244.2.2   kind-worker   <none>           <none>
  • Label and taint the second worker
kubectl label node kind-worker2 node-restriction.kubernetes.io/reserved-for=platform
kubectl taint node kind-worker2 node-restriction.kubernetes.io/reserved-for=platform:NoExecute

(Does not automatically trigger the re-scheduling of approver-policy Pod 2)

  • Drain worker node 1
$ kubectl drain kind-worker --ignore-daemonsets --delete-emptydir-data
node/kind-worker already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/kindnet-pw4rl, kube-system/kube-proxy-lf94r
evicting pod venafi/cert-manager-approver-policy-78bdd57987-zkqn4
evicting pod venafi/cert-manager-approver-policy-78bdd57987-ps2jh
error when evicting pods/"cert-manager-approver-policy-78bdd57987-zkqn4" -n "venafi" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/cert-manager-approver-policy-78bdd57987-ps2jh evicted
evicting pod venafi/cert-manager-approver-policy-78bdd57987-zkqn4
error when evicting pods/"cert-manager-approver-policy-78bdd57987-zkqn4" -n "venafi" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod venafi/cert-manager-approver-policy-78bdd57987-zkqn4
error when evicting pods/"cert-manager-approver-policy-78bdd57987-zkqn4" -n "venafi" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod venafi/cert-manager-approver-policy-78bdd57987-zkqn4
pod/cert-manager-approver-policy-78bdd57987-zkqn4 evicted
node/kind-worker drained
  • approver-policy Pods were moved to worker 2 one at a time
$ kubectl get pods -n venafi -o wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
cert-manager-approver-policy-78bdd57987-c9xdw   1/1     Running   0          12m   10.244.1.2   kind-worker2   <none>           <none>
cert-manager-approver-policy-78bdd57987-v62wp   1/1     Running   0          12m   10.244.1.3   kind-worker2   <none>           <none>

@wallrj
Copy link
Member Author

wallrj commented Feb 28, 2024

/unhold

@jetstack-bot jetstack-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 28, 2024
@jetstack-bot jetstack-bot merged commit bdd3a70 into cert-manager:main Feb 28, 2024
4 of 5 checks passed
@wallrj wallrj deleted the add-pdb branch February 28, 2024 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants