feat: added nodeSelector based on k8s node labels for kube-prometheus
#44
Conversation
Facing issues:
➜ base git:(add-node-selector-kepler-and-kube-prometheus-stack) k get no
NAME STATUS ROLES AGE VERSION
k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-5qz1i Ready <none> 31m v1.28.2+k3s1
k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-7u8kj Ready <none> 31m v1.28.2+k3s1
k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-19mw8 Ready <none> 31m v1.28.2+k3s1

The nodes have these specific labels:
➜ base git:(add-node-selector-kepler-and-kube-prometheus-stack) kubectl get nodes -o custom-columns=NAME:.metadata.name,cncf-project:.metadata.labels.cncf-project,cncf-project-sub:.metadata.labels."cncf-project-sub"
NAME cncf-project cncf-project-sub
k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-5qz1i wg-green-reviews internal
k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-7u8kj <none> <none>
k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-19mw8 <none> <none>

Referring to the file contents in #44 (comment) and also Artifact Hub (https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack), here is the pod status:
➜ base git:(add-node-selector-kepler-and-kube-prometheus-stack) kubectl get po -n monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
monitoring-kube-prometheus-stack-prometheus-node-exporter-h8hlp 0/1 Pending 0 14m <none> <none> <none> <none>
monitoring-kube-prometheus-stack-prometheus-node-exporter-5d68b 0/1 Pending 0 14m <none> <none> <none> <none>
monitoring-kube-prometheus-stack-prometheus-node-exporter-kdgcx 0/1 Pending 0 14m <none> <none> <none> <none>
monitoring-kube-prometheus-stack-grafana-657764b55c-l4f9c 0/3 ContainerCreating 0 14m <none> k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-7u8kj <none> <none>
monitoring-kube-prometheus-operator-d786f8cd6-xq4jt 1/1 Running 0 14m 10.42.1.8 k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-5qz1i <none> <none>
monitoring-kube-prometheus-stack-kube-state-metrics-5c4665mr7ks 1/1 Running 0 14m 10.42.1.9 k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-5qz1i <none> <none>
alertmanager-monitoring-kube-prometheus-alertmanager-0 2/2 Running 0 14m 10.42.1.10 k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-5qz1i <none> <none>
prometheus-monitoring-kube-prometheus-prometheus-0 2/2 Running 0 14m 10.42.1.11 k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-5qz1i <none> <none>

That is where the issue is. Here is the describe output for grafana:
➜ base git:(add-node-selector-kepler-and-kube-prometheus-stack) k describe po -nmonitoring monitoring-kube-prometheus-stack-grafana-657764b55c-l4f9c
Name: monitoring-kube-prometheus-stack-grafana-657764b55c-l4f9c
Namespace: monitoring
Priority: 0
Service Account: monitoring-kube-prometheus-stack-grafana
Node: k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-7u8kj/192.168.1.3
Start Time: Tue, 06 Feb 2024 23:48:46 +0530
Labels: app.kubernetes.io/instance=monitoring-kube-prometheus-stack
app.kubernetes.io/name=grafana
pod-template-hash=657764b55c
Annotations: checksum/config: 0e9cbd0ea8e24e32f7dfca5bab17a2ba05652642f0a09a4882833ae88e4cc4a3
checksum/sc-dashboard-provider-config: 593c0a8778b83f11fe80ccb21dfb20bc46705e2be3178df1dc4c89d164c8cd9c
checksum/secret: 032056e9c62bbe9d1daa41ee49cd3d9524c076f51ca4c65adadf4ef08ef28712
kubectl.kubernetes.io/default-container: grafana
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/monitoring-kube-prometheus-stack-grafana-657764b55c
Containers:
grafana-sc-dashboard:
Container ID:
Image: quay.io/kiwigrid/k8s-sidecar:1.25.2
Image ID:
Port: <none>
Host Port: <none>
SeccompProfile: RuntimeDefault
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
METHOD: WATCH
LABEL: grafana_dashboard
LABEL_VALUE: 1
FOLDER: /tmp/dashboards
RESOURCE: both
NAMESPACE: ALL
REQ_USERNAME: <set to the key 'admin-user' in secret 'monitoring-kube-prometheus-stack-grafana'> Optional: false
REQ_PASSWORD: <set to the key 'admin-password' in secret 'monitoring-kube-prometheus-stack-grafana'> Optional: false
REQ_URL: http://localhost:3000/api/admin/provisioning/dashboards/reload
REQ_METHOD: POST
Mounts:
/tmp/dashboards from sc-dashboard-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lfpwl (ro)
grafana-sc-datasources:
Container ID:
Image: quay.io/kiwigrid/k8s-sidecar:1.25.2
Image ID:
Port: <none>
Host Port: <none>
SeccompProfile: RuntimeDefault
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
METHOD: WATCH
LABEL: grafana_datasource
LABEL_VALUE: 1
FOLDER: /etc/grafana/provisioning/datasources
RESOURCE: both
REQ_USERNAME: <set to the key 'admin-user' in secret 'monitoring-kube-prometheus-stack-grafana'> Optional: false
REQ_PASSWORD: <set to the key 'admin-password' in secret 'monitoring-kube-prometheus-stack-grafana'> Optional: false
REQ_URL: http://localhost:3000/api/admin/provisioning/datasources/reload
REQ_METHOD: POST
Mounts:
/etc/grafana/provisioning/datasources from sc-datasources-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lfpwl (ro)
grafana:
Container ID:
Image: docker.io/grafana/grafana:10.2.3
Image ID:
Ports: 3000/TCP, 9094/TCP, 9094/UDP
Host Ports: 0/TCP, 0/TCP, 0/UDP
SeccompProfile: RuntimeDefault
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:3000/api/health delay=60s timeout=30s period=10s #success=1 #failure=10
Readiness: http-get http://:3000/api/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_IP: (v1:status.podIP)
GF_SECURITY_ADMIN_USER: <set to the key 'admin-user' in secret 'monitoring-kube-prometheus-stack-grafana'> Optional: false
GF_SECURITY_ADMIN_PASSWORD: <set to the key 'admin-password' in secret 'monitoring-kube-prometheus-stack-grafana'> Optional: false
GF_PATHS_DATA: /var/lib/grafana/
GF_PATHS_LOGS: /var/log/grafana
GF_PATHS_PLUGINS: /var/lib/grafana/plugins
GF_PATHS_PROVISIONING: /etc/grafana/provisioning
Mounts:
/etc/grafana/grafana.ini from config (rw,path="grafana.ini")
/etc/grafana/provisioning/dashboards/sc-dashboardproviders.yaml from sc-dashboard-provider (rw,path="provider.yaml")
/etc/grafana/provisioning/datasources from sc-datasources-volume (rw)
/tmp/dashboards from sc-dashboard-volume (rw)
/var/lib/grafana from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lfpwl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: monitoring-kube-prometheus-stack-grafana
Optional: false
storage:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
sc-dashboard-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
sc-dashboard-provider:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: monitoring-kube-prometheus-stack-grafana-config-dashboards
Optional: false
sc-datasources-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-lfpwl:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16m default-scheduler Successfully assigned monitoring/monitoring-kube-prometheus-stack-grafana-657764b55c-l4f9c to k3s-test-green-reviews-7bff-8cd1e6-node-pool-6f11-7u8kj
Warning FailedMount 4m6s (x14 over 16m) kubelet MountVolume.SetUp failed for volume "config" : configmap "monitoring-kube-prometheus-stack-grafana" not found
Warning FailedMount 1s (x16 over 16m) kubelet MountVolume.SetUp failed for volume "sc-dashboard-provider" : configmap "monitoring-kube-prometheus-stack-grafana-config-dashboards" not found

And for the daemonset:
➜ base git:(add-node-selector-kepler-and-kube-prometheus-stack) k get daemonset -n monitoring
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
monitoring-kube-prometheus-stack-prometheus-node-exporter 3 3 0 3 0 kubernetes.io/os=linux 17m

And here is the status of one of its pods:
➜ base git:(add-node-selector-kepler-and-kube-prometheus-stack) k describe po -n monitoring monitoring-kube-prometheus-stack-prometheus-node-exporter-5d68b
Name: monitoring-kube-prometheus-stack-prometheus-node-exporter-5d68b
Namespace: monitoring
Priority: 0
Service Account: monitoring-kube-prometheus-stack-prometheus-node-exporter
Node: <none>
Labels: app.kubernetes.io/component=metrics
app.kubernetes.io/instance=monitoring-kube-prometheus-stack
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=prometheus-node-exporter
app.kubernetes.io/part-of=prometheus-node-exporter
app.kubernetes.io/version=1.7.0
controller-revision-hash=6fd4c99b97
helm.sh/chart=prometheus-node-exporter-4.25.0
jobLabel=node-exporter
pod-template-generation=1
release=monitoring-kube-prometheus-stack
Annotations: cluster-autoscaler.kubernetes.io/safe-to-evict: true
Status: Pending
IP:
IPs: <none>
Controlled By: DaemonSet/monitoring-kube-prometheus-stack-prometheus-node-exporter
Containers:
node-exporter:
Image: quay.io/prometheus/node-exporter:v1.7.0
Port: 9100/TCP
Host Port: 9100/TCP
Args:
--path.procfs=/host/proc
--path.sysfs=/host/sys
--path.rootfs=/host/root
--path.udev.data=/host/root/run/udev/data
--web.listen-address=[$(HOST_IP)]:9100
--collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
--collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
Liveness: http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
HOST_IP: 0.0.0.0
Mounts:
/host/proc from proc (ro)
/host/root from root (ro)
/host/sys from sys (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
proc:
Type: HostPath (bare host directory volume)
Path: /proc
HostPathType:
sys:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType:
root:
Type: HostPath (bare host directory volume)
Path: /
HostPathType:
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoSchedule op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 17m default-scheduler 0/3 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
Warning FailedScheduling 7m46s (x2 over 12m) default-scheduler 0/3 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
Also, don't worry if any cluster-specific secret is present; it's a temporary cluster.
Kepler got provisioned to the correct node.
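Side note on the intent here: the goal of the change is that each kube-prometheus-stack workload carries a node selector matching the labels on the first node above, i.e. roughly this block in the rendered pod specs (an illustrative sketch, not the chart's literal output):

  nodeSelector:
    cncf-project: wg-green-reviews
    cncf-project-sub: internal

Notably, the grafana describe output above shows Node-Selectors: <none>, so the selector from the values was not applied to that pod in this attempt.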
@dipankardas011 Thanks for the PR!
I'm not sure why the grafana and node-exporter pods are not starting. The errors don't look related to the node selectors.
I'll deploy your changes to the cluster to debug it. I can do that later this week.
@dipankardas011 This was mostly working but there were some problems with the values.
I debugged it on my home lab. See https://github.com/rossf7/green-k8s-lab/blob/b0d4cb7ba19b7782ea3e0f0b65416a5f5f6a89c5/clusters/kube-prometheus-stack.yaml
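The key part is setting nodeSelector on each component of the kube-prometheus-stack chart. A minimal sketch of the values, based on the chart's standard value paths (the linked file above is the authoritative version; the node-exporter daemonset is left untouched here):

  # kube-prometheus-stack values (sketch)
  grafana:
    nodeSelector:
      cncf-project: wg-green-reviews
      cncf-project-sub: internal
  prometheusOperator:
    nodeSelector:
      cncf-project: wg-green-reviews
      cncf-project-sub: internal
  prometheus:
    prometheusSpec:
      nodeSelector:
        cncf-project: wg-green-reviews
        cncf-project-sub: internal
  alertmanager:
    alertmanagerSpec:
      nodeSelector:
        cncf-project: wg-green-reviews
        cncf-project-sub: internal
  kube-state-metrics:
    nodeSelector:
      cncf-project: wg-green-reviews
      cncf-project-sub: internal

With those in place the rendered workloads pick up the selector, as the checks below show: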
k -n monitoring get statefulset alertmanager-monitoring-kube-prometheus-alertmanager -o yaml | yq e '.spec.template.spec.nodeSelector' -
cncf-project: wg-green-reviews
cncf-project-sub: internal
k -n monitoring get statefulset prometheus-monitoring-kube-prometheus-prometheus -o yaml | yq e '.spec.template.spec.nodeSelector' -
cncf-project: wg-green-reviews
cncf-project-sub: internal
k -n monitoring get deploy monitoring-kube-prometheus-stack-grafana -o yaml | yq e '.spec.template.spec.nodeSelector' -
cncf-project: wg-green-reviews
cncf-project-sub: internal
k -n monitoring get deploy monitoring-kube-prometheus-operator -o yaml | yq e '.spec.template.spec.nodeSelector' -
cncf-project: wg-green-reviews
cncf-project-sub: internal
k -n monitoring get deploy monitoring-kube-prometheus-stack-kube-state-metrics -o yaml | yq e '.spec.template.spec.nodeSelector' -
cncf-project: wg-green-reviews
cncf-project-sub: internal
@dipankardas011 Could you also remove kepler from the PR title and description? We can't merge this until we have an extra node with these labels. That's the next thing I'm going to work on.
(PR title changed from “… kube-prometheus, kepler” to “… kube-prometheus”.)
So your manifest file worked?
So node-exporter uses kube-state-metrics?
On my system the node-exporter is not running and is stuck in Pending.
@dipankardas011 It uses port 9100; is that available in your cluster? On the Equinix cluster it's running OK. The restarts are because of a Falco load test which is stressing the box 😅
Once the 1Password team is ready, we can get you read-only access to the cluster, as I know working on the automation without cluster access can be challenging.
The Helm chart creates a separate daemonset for node-exporter.
Yes, but I'll re-test on the cluster once the extra worker node is created.
Okay, if it works on Equinix then that's fine.
Signed-off-by: Dipankar Das <65275144+dipankardas011@users.noreply.github.com>
@dipankardas011 The separate worker nodes are now created.
Please downgrade the Flux API version and then this is good to go.
We will upgrade Flux separately. See #54
k get no
NAME STATUS ROLES AGE VERSION
green-reviews-control-plane Ready control-plane,master 21d v1.29.0+k3s1
green-reviews-worker-falco-a Ready <none> 19m v1.29.0+k3s1
green-reviews-worker-internal-1 Ready <none> 16m v1.29.0+k3s1
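On the Flux API version: the change is just the apiVersion field on the HelmRelease manifests. A sketch, assuming the PR currently uses v2beta2 and the cluster's Flux version only serves v2beta1 (both assumptions; adjust to whatever the cluster supports):

  # HelmRelease header only; the rest of the spec stays the same
  apiVersion: helm.toolkit.fluxcd.io/v2beta1   # downgraded from v2beta2
  kind: HelmRelease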
What does it do?
It pins the Kepler and kube-prometheus Helm deployments to specific nodes using a nodeSelector, i.e. k8s node labels.
Fixes