
Allow setting priorityClassName on ServiceLB daemonset. #10033

Closed
josephshanak opened this issue Apr 27, 2024 · 6 comments
@josephshanak

Is your feature request related to a problem? Please describe.
I would like to set priorityClassName on all of the pods in my cluster so I can control the order in which they preempted. The pods created by ServiceLB daemonsets do not have a priorityClassName so they receive the default priority of 0, which is lower than other priority classes I have defined. This means these pods will likely be preempted when the cluster is over-committed.

Describe the solution you'd like
I would like the ability to set a priorityClassName on the pods created by ServiceLB / k3s:

Template: core.PodTemplateSpec{
	ObjectMeta: meta.ObjectMeta{
		Labels: labels.Set{
			"app":             name,
			svcNameLabel:      svc.Name,
			svcNamespaceLabel: svc.Namespace,
		},
	},
	Spec: core.PodSpec{
		ServiceAccountName:           "svclb",
		AutomountServiceAccountToken: utilsptr.To(false),
		SecurityContext: &core.PodSecurityContext{
			Sysctls: sysctls,
		},
		Tolerations: []core.Toleration{
			{
				Key:      util.MasterRoleLabelKey,
				Operator: "Exists",
				Effect:   "NoSchedule",
			},
			{
				Key:      util.ControlPlaneRoleLabelKey,
				Operator: "Exists",
				Effect:   "NoSchedule",
			},
			{
				Key:      "CriticalAddonsOnly",
				Operator: "Exists",
			},
		},
	},
},

Perhaps via a commandline option --servicelb-priority-class=my-priority-class.

Describe alternatives you've considered

  1. I could use a PriorityClass with globalDefault: true to define a global default. However, this means all pods without an explicit priorityClassName will be scheduled with the same priority, which is not ideal because a forgotten priorityClassName could then go unnoticed.

  2. I could create priority classes with negative values (per https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass this should be fine), and use the global default for ServiceLB pods only (however, this is not ideal for the same reason above).

  3. k3s could create the pods with system-cluster-critical or system-node-critical priority classes.

  4. I could disable ServiceLB with --disable=servicelb and install another load balancer provider like MetalLB, which seems to support priorityClassName (helm option for priorityClassName metallb/metallb#995).
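Alternative 2 could be sketched as the following PriorityClass (a minimal sketch; the name and value are illustrative assumptions, not taken from k3s):

```yaml
# Hypothetical negative-value global default, per the Kubernetes
# pod-priority-preemption docs linked above. Pods that omit
# priorityClassName would receive this class instead of priority 0.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-low
value: -1000
globalDefault: true
description: "Fallback priority for pods that do not set a priorityClassName."
```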

@ChristianCiach

You could probably also use a mutating admission controller like Kyverno to modify the pod-spec based on custom rules. See: https://kyverno.io/docs/writing-policies/mutate/

This is surely not an attractive option, but it's a possibility nonetheless.
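A sketch of that Kyverno approach might look like the policy below (the policy name and the svclb-* pod-name pattern are assumptions; check the Kyverno mutate documentation for the exact schema):

```yaml
# Hypothetical Kyverno ClusterPolicy that patches a priorityClassName
# onto svclb pods in kube-system as they are admitted.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: svclb-priority-class
spec:
  rules:
    - name: add-priority-class
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - kube-system
              names:
                - "svclb-*"
      mutate:
        patchStrategicMerge:
          spec:
            priorityClassName: high-priority
```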

@brandond
Contributor

Seems reasonable. See the linked PR.

@josephshanak
Author

PR looks good to me! And an annotation seems much more flexible!
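The annotation-driven approach can be sketched roughly as follows (a minimal sketch: the helper name is hypothetical; the annotation key and the system-node-critical fallback come from the validation steps in this thread, not from reading the PR itself):

```go
package main

import "fmt"

// Annotation key exercised in the validation steps in this thread.
const priorityClassNameAnnotation = "svccontroller.k3s.cattle.io/priorityclassname"

// podPriorityClassName is a hypothetical helper: it returns the priority
// class for an svclb pod, preferring the Service annotation and falling
// back to a default when the annotation is absent or empty.
func podPriorityClassName(svcAnnotations map[string]string, fallback string) string {
	if pc := svcAnnotations[priorityClassNameAnnotation]; pc != "" {
		return pc
	}
	return fallback
}

func main() {
	annotated := map[string]string{priorityClassNameAnnotation: "high-priority"}
	fmt.Println(podPriorityClassName(annotated, "system-node-critical")) // high-priority
	fmt.Println(podPriorityClassName(nil, "system-node-critical"))       // system-node-critical
}
```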

@brandond
Contributor

brandond commented May 6, 2024

> The pods created by ServiceLB daemonsets do not have a priorityClassName so they receive the default priority of 0, which is lower than other priority classes I have defined.

I will note that the svclb pods have no requests or reservations and consume basically no resources, since they just go to sleep after adding iptables rules.

root@k3s-server-1:~# kubectl top pod -n kube-system
NAME                                      CPU(cores)   MEMORY(bytes)
coredns-6799fbcd5-zxktb                   2m           13Mi
local-path-provisioner-6c86858495-dpfb6   1m           6Mi
metrics-server-54fd9b65b-9xqxs            5m           21Mi
svclb-traefik-49baafe9-xwvrd              0m           0Mi
traefik-7d5f6474df-hfhwd                  1m           26Mi

> This means these pods will likely be preempted when the cluster is over-committed.

Are you actually seeing the svclb pods get preempted, or is this a theoretical problem?

@josephshanak
Author

> > The pods created by ServiceLB daemonsets do not have a priorityClassName so they receive the default priority of 0, which is lower than other priority classes I have defined.
>
> I will note that the svclb pods have no requests or reservations and consume basically no resources, since they just go to sleep after adding iptables rules.
>
> root@k3s-server-1:~# kubectl top pod -n kube-system
> NAME                                      CPU(cores)   MEMORY(bytes)
> coredns-6799fbcd5-zxktb                   2m           13Mi
> local-path-provisioner-6c86858495-dpfb6   1m           6Mi
> metrics-server-54fd9b65b-9xqxs            5m           21Mi
> svclb-traefik-49baafe9-xwvrd              0m           0Mi
> traefik-7d5f6474df-hfhwd                  1m           26Mi
>
> > This means these pods will likely be preempted when the cluster is over-committed.
>
> Are you actually seeing the svclb pods get preempted, or is this a theoretical problem?

This is theoretical. I have not experienced this. I came upon this while attempting to assign priority classes to all pods.

@mdrahman-suse

mdrahman-suse commented Jun 4, 2024

Validated on master (v1.30) branch with commit 1268779

Environment and Config

Ubuntu 22.04, Single server
  • config.yaml
write-kubeconfig-mode: 644
cluster-init: true
token: summerheat
node-name: server1
  • pc.yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for XYZ service pods only."

Testing Steps:

  • Copy config in path /etc/rancher/k3s/
  • Install k3s
  • Ensure cluster is up and running
  • Check for priority class
$ kubectl get priorityClass
NAME                      VALUE        GLOBAL-DEFAULT   AGE
system-cluster-critical   2000000000   false            35m
system-node-critical      2000001000   false            35m
  • Create and apply a new priorityClass (pc.yaml) (OR Use an existing priorityClass)
$ kubectl apply -f pc.yaml
priorityclass.scheduling.k8s.io/high-priority created

$ kubectl get priorityClass
NAME                      VALUE        GLOBAL-DEFAULT   AGE
high-priority             1000000      false            7s
system-cluster-critical   2000000000   false            56m
system-node-critical      2000001000   false            56m
  • Check the priority class on the svclb pod and ensure the default priority and priorityClassName are applied
$ kubectl get pods -n kube-system svclb-traefik-v1 -o yaml | grep priority
  priority: 2000001000
  priorityClassName: system-node-critical
  • Annotate the priority class on the service
kubectl annotate svc -n kube-system traefik svccontroller.k3s.cattle.io/priorityclassname=high-priority
  • Check the priority class on the svclb pod again and ensure the default priority and priorityClassName are overridden and the svclb pod is restarted
$ kubectl get pods -n kube-system svclb-traefik-v2 -o yaml | grep priority
  priority: 1000000
  priorityClassName: high-priority

Replication:

$ k3s -v
k3s version v1.30.1+k3s1 (80978b5b)
go version go1.22.2
  • Pods
$ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-576bfc4dc7-2sw8q                  1/1     Running     0          25m
kube-system   helm-install-traefik-594pt                0/1     Completed   1          25m
kube-system   helm-install-traefik-crd-pgjm7            0/1     Completed   0          25m
kube-system   local-path-provisioner-75bb9ff978-8zjj9   1/1     Running     0          25m
kube-system   metrics-server-557ff575fb-44mlq           1/1     Running     0          25m
kube-system   svclb-traefik-091b054d-8ct4c              2/2     Running     0          25m
kube-system   traefik-5fb479b77-ghsmz                   1/1     Running     0          25m
  • No priorityClassName on svclb
$ kubectl get pods -n kube-system svclb-traefik-091b054d-8ct4c -o yaml | grep priority
  priority: 0
  • Applied and annotated new priority class on service
$ kubectl annotate svc -n kube-system traefik svccontroller.k3s.cattle.io/priorityclassname=high-priority
service/traefik annotated

$ kubectl get svc -n kube-system traefik -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: traefik
    meta.helm.sh/release-namespace: kube-system
    svccontroller.k3s.cattle.io/priorityclassname: high-priority
  • No change in svclb pod
$ kubectl get pods -n kube-system svclb-traefik-091b054d-8ct4c -o yaml | grep priority
  priority: 0

Validation:

$ k3s -v
k3s version v1.30.1+k3s-1268779e (1268779e)
go version go1.22.2
  • Pods before change
$ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-576bfc4dc7-snzxc                  1/1     Running     0          56m
kube-system   helm-install-traefik-crd-26scq            0/1     Completed   0          56m
kube-system   helm-install-traefik-lhh8g                0/1     Completed   1          56m
kube-system   local-path-provisioner-86f46b7bf7-xvsbb   1/1     Running     0          56m
kube-system   metrics-server-557ff575fb-jqpd8           1/1     Running     0          56m
kube-system   svclb-traefik-9d42b1d1-ckxrz              2/2     Running     0          59s
kube-system   traefik-5fb479b77-hsmng                   1/1     Running     0          55m
  • Default priority and priorityClassName is observed
$ kubectl get pods -n kube-system svclb-traefik-9d42b1d1-ckxrz -o yaml | grep priority
  priority: 2000001000
  priorityClassName: system-node-critical
  • Applied and annotated new priority class on service
$ kubectl annotate svc -n kube-system traefik svccontroller.k3s.cattle.io/priorityclassname=high-priority
service/traefik annotated

$ kubectl get svc -n kube-system traefik -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: traefik
    meta.helm.sh/release-namespace: kube-system
    svccontroller.k3s.cattle.io/priorityclassname: high-priority
  • Pods after annotations applied, observed svclb pod are restarted
$ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-576bfc4dc7-snzxc                  1/1     Running     0          60m
kube-system   helm-install-traefik-crd-26scq            0/1     Completed   0          60m
kube-system   helm-install-traefik-lhh8g                0/1     Completed   1          60m
kube-system   local-path-provisioner-86f46b7bf7-xvsbb   1/1     Running     0          60m
kube-system   metrics-server-557ff575fb-jqpd8           1/1     Running     0          60m
kube-system   svclb-traefik-9d42b1d1-hxl74              2/2     Running     0          25s
kube-system   traefik-5fb479b77-hsmng                   1/1     Running     0          60m
  • Changes are applied as expected on the pod with new priority and priorityClassName
$ kubectl get pods -n kube-system svclb-traefik-9d42b1d1-hxl74 -o yaml | grep priority
  priority: 1000000
  priorityClassName: high-priority

Additional testing and observation

  • When svccontroller.k3s.cattle.io/priorityclassname= is set to an empty value, the property is removed from the svclb pod and its priority falls back to 0
  • When svccontroller.k3s.cattle.io/priorityclassname=blah references a nonexistent priority class, the svclb pod fails to start and the following errors appear in the logs
Jun 04 21:14:47 server1 k3s[1675]: E0604 21:14:47.580870    1675 daemon_controller.go:1030] pods "svclb-traefik-9d42b1d1-" is forbidden: no PriorityClass with name blah was found
Jun 04 21:14:47 server1 k3s[1675]: E0604 21:14:47.608220    1675 daemon_controller.go:324] kube-system/svclb-traefik-9d42b1d1 failed with : pods "svclb-traefik-9d42b1d1-" is forbidden: no PriorityClass with name blah was found
