Describe the bug
Trident appears to ignore node taints: its daemonset deploys pods on all nodes rather than only on the nodes without a NoSchedule taint such as:
spec:
  taints:
  - effect: NoSchedule
    key: juju.is/kubernetes-control-plane
    value: "true"
The issue appeared when upgrading Charmed Kubernetes from v1.23 to v1.24. The puzzling bit is the daemonset definition has the following node selector:
Node-Selector: kubernetes.io/arch=amd64,kubernetes.io/os=linux
Regardless, I would have expected the taint to apply, as it did in my test: trying to schedule an ordinary pod onto these nodes failed, as expected, because the pod did not carry a matching toleration.
Environment
- Trident version: 21.04.1 and 22.07.0
- Trident installation flags used: ./tridentctl -n trident install
- Container runtime: containerd 1.5.5-0ubuntu3~18.04.2 via apt install
- Kubernetes version: v1.24.3
- Kubernetes orchestrator: Charmed Kubernetes
- Kubernetes enabled feature gates: N/A
- OS: Ubuntu 18.04.1
- NetApp backend types: ontap-nas
- Other:
To Reproduce
Node has the following key elements:
apiVersion: v1
kind: Node
metadata:
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
spec:
  taints:
  - effect: NoSchedule
    key: juju.is/kubernetes-control-plane
    value: "true"
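The label comparison behind the node selector can be sketched as follows. This is an illustrative Python model of the documented nodeSelector semantics (every listed key/value pair must equal a node label), not Kubernetes or Trident code; the function name `node_selector_matches` is made up for the example, and the labels are copied from the Node excerpt, which may be abridged.

```python
def node_selector_matches(node_labels, node_selector):
    """Return True when every selector entry equals the node's label value.

    nodeSelector is a plain AND of exact label matches; node taints are
    not consulted at this stage.
    """
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Labels copied from the Node excerpt in this report.
node_labels = {
    "beta.kubernetes.io/arch": "amd64",
    "beta.kubernetes.io/os": "linux",
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/os": "linux",
}

# Selector reported by `kubectl describe` in the bug description.
selector = {"kubernetes.io/arch": "amd64", "kubernetes.io/os": "linux"}

# The tainted node satisfies the arch/os selector.
print(node_selector_matches(node_labels, selector))

# A selector with an extra label requirement excludes any node that lacks
# that label. The Node excerpt shows no juju-application label, but the
# excerpt only lists "key elements", so this is an assumption.
stricter = dict(selector, **{"juju-application": "kubernetes-worker"})
print(node_selector_matches(node_labels, stricter))
```

Under these assumptions the arch/os selector alone cannot keep pods off the tainted nodes, since those nodes carry both labels.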
The trident daemonset looks like this:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "2"
  creationTimestamp: "2022-08-04T18:57:36Z"
  generation: 2
  labels:
    app: node.csi.trident.netapp.io
    kubectl.kubernetes.io/default-container: trident-main
  name: trident-csi
  namespace: trident
  resourceVersion: "171442562"
  uid: 9a20849e-24ae-407a-a9bc-889113ecfdd9
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: node.csi.trident.netapp.io
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: node.csi.trident.netapp.io
    spec:
      containers:
      - args:
        - --no_persistence
        - --rest=false
        - --csi_node_name=$(KUBE_NODE_NAME)
        - --csi_endpoint=$(CSI_ENDPOINT)
        - --csi_role=node
        - --log_format=text
        - --http_request_timeout=1m30s
        - --https_rest
        - --https_port=17546
        command:
        - /trident_orchestrator
        env:
        - name: KUBE_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CSI_ENDPOINT
          value: unix://plugin/csi.sock
        - name: PATH
          value: /netapp:/bin
        image: netapp/trident:22.07.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /liveness
            port: 17546
            scheme: HTTPS
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: trident-main
        readinessProbe:
          failureThreshold: 5
          httpGet:
            path: /readiness
            port: 17546
            scheme: HTTPS
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          allowPrivilegeEscalation: true
          privileged: true
        startupProbe:
          failureThreshold: 5
          httpGet:
            path: /liveness
            port: 17546
            scheme: HTTPS
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /plugin
          name: plugin-dir
        - mountPath: /var/lib/kubelet/plugins
          mountPropagation: Bidirectional
          name: plugins-mount-dir
        - mountPath: /var/lib/kubelet/pods
          mountPropagation: Bidirectional
          name: pods-mount-dir
        - mountPath: /dev
          name: dev-dir
        - mountPath: /sys
          name: sys-dir
        - mountPath: /host
          mountPropagation: Bidirectional
          name: host-dir
        - mountPath: /var/lib/trident/tracking
          mountPropagation: Bidirectional
          name: trident-tracking-dir
        - mountPath: /certs
          name: certs
          readOnly: true
      - args:
        - --v=2
        - --csi-address=$(ADDRESS)
        - --kubelet-registration-path=$(REGISTRATION_PATH)
        env:
        - name: ADDRESS
          value: /plugin/csi.sock
        - name: REGISTRATION_PATH
          value: /var/lib/kubelet/plugins/csi.trident.netapp.io/csi.sock
        - name: KUBE_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1
        imagePullPolicy: IfNotPresent
        name: driver-registrar
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /plugin
          name: plugin-dir
        - mountPath: /registration
          name: registration-dir
      dnsPolicy: ClusterFirstWithHostNet
      hostIPC: true
      hostNetwork: true
      hostPID: true
      nodeSelector:
        juju-application: kubernetes-worker
        kubernetes.io/arch: amd64
        kubernetes.io/os: linux
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: trident-csi
      serviceAccountName: trident-csi
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoExecute
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      volumes:
      - hostPath:
          path: /var/lib/kubelet/plugins/csi.trident.netapp.io/
          type: DirectoryOrCreate
        name: plugin-dir
      - hostPath:
          path: /var/lib/kubelet/plugins_registry/
          type: Directory
        name: registration-dir
      - hostPath:
          path: /var/lib/kubelet/plugins
          type: DirectoryOrCreate
        name: plugins-mount-dir
      - hostPath:
          path: /var/lib/kubelet/pods
          type: DirectoryOrCreate
        name: pods-mount-dir
      - hostPath:
          path: /dev
          type: Directory
        name: dev-dir
      - hostPath:
          path: /sys
          type: Directory
        name: sys-dir
      - hostPath:
          path: /
          type: Directory
        name: host-dir
      - hostPath:
          path: /var/lib/trident/tracking
          type: DirectoryOrCreate
        name: trident-tracking-dir
      - name: certs
        projected:
          defaultMode: 420
          sources:
          - secret:
              name: trident-csi
          - secret:
              name: trident-encryption-keys
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 4
  desiredNumberScheduled: 4
  numberAvailable: 4
  numberMisscheduled: 0
  numberReady: 4
  observedGeneration: 2
  updatedNumberScheduled: 4
Expected behavior
My expectation is that the daemonset would respect the node taints and not schedule pods on those nodes. To prove to myself the taints work, I tried to schedule a pod using the following yaml and as expected it failed:
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-juju-dns
spec:
  containers:
  - name: ubuntu
    image: docker.io/ubuntu
    imagePullPolicy: IfNotPresent
    command:
    - "/bin/sh"
    args:
    - "-c"
    - "sleep 100000"
  nodeSelector:
    kubernetes.io/hostname: juju-afc56e-21-lxd-2
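As a cross-check, and not as a diagnosis, the documented taint/toleration matching rules can be sketched in Python and applied to the two manifests in this report. The helper names `tolerates` and `untolerated` are made up for the example. The rules encoded here are that an empty effect on a toleration matches any effect, an empty key with operator Exists matches any key, and operator Equal (the default) also compares values.

```python
def tolerates(toleration, taint):
    """Return True when a single toleration matches a single taint."""
    # An unset effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    # An unset key matches every taint key (used together with Exists).
    if toleration.get("key") and toleration["key"] != taint.get("key"):
        return False
    operator = toleration.get("operator") or "Equal"  # Equal is the default
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def untolerated(taints, tolerations):
    """Return the taints that no toleration in the list matches."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


# Taint copied from the Node excerpt in this report.
node_taints = [
    {"effect": "NoSchedule", "key": "juju.is/kubernetes-control-plane", "value": "true"},
]

# Tolerations copied from the trident-csi daemonset manifest in this report.
daemonset_tolerations = [
    {"effect": "NoExecute", "operator": "Exists"},
    {"effect": "NoSchedule", "operator": "Exists"},
]

# The ubuntu-juju-dns test pod declares no tolerations at all.
test_pod_tolerations = []

# Empty list: the key-less NoSchedule/Exists toleration matches the taint.
print(untolerated(node_taints, daemonset_tolerations))

# The taint is returned: nothing in the test pod tolerates it.
print(untolerated(node_taints, test_pod_tolerations))
```

If this model is right, the difference between the two results comes from the daemonset's own `tolerations` block rather than from the taint being ignored, which would be consistent with the test pod failing to schedule while the daemonset pods run on the tainted nodes.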
Additional context
As I upgraded from v1.23 to v1.24 with an existing Trident installation in place, it's possible the upgrade process is a factor, but in my case that would be impossible to reproduce.
Otherwise, I want to say I've had a flawless experience with NetApp Trident - keep up the good work 😄!
Thanks!