Closed
Labels
documentation: Issue/PR focused on fixing/editing/adding documentation bits
Description
IMHO there is a bug in the documentation for setting up Prometheus (kubernetes/kube-prometheus.rst).
The following rule should collect all Kubernetes endpoints and use them to scrape metrics:
additionalScrapeConfigs:
- job_name: gpu-metrics
  scrape_interval: 1s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - gpu-operator
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node

However, not all endpoints in the gpu-operator namespace provide Prometheus metrics. In particular, node-feature-discovery-master has only one gRPC endpoint on port 8080, which cannot be scraped. I have changed this rule as follows to fix the problem:
additionalScrapeConfigs:
- job_name: gpu-metrics
  scrape_interval: 1s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - gpu-operator
  relabel_configs:
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: drop
    regex: .*-node-feature-discovery-master
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node
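
To illustrate what the added drop rule does, here is a rough Python sketch that mimics how the regex is matched against the `__meta_kubernetes_endpoints_name` label. It is not Prometheus code: it only assumes that Prometheus anchors relabel regexes at both ends (which `re.fullmatch` reproduces), and the endpoint names in `candidates` are made up for illustration, so the actual names in a given cluster may differ.

```python
import re

# Regex taken from the drop rule above. Prometheus anchors relabel
# regexes at both ends; re.fullmatch gives the same behavior here.
DROP_REGEX = re.compile(r".*-node-feature-discovery-master")


def keep_target(labels: dict) -> bool:
    """Return False when the drop rule would discard this target."""
    value = labels.get("__meta_kubernetes_endpoints_name", "")
    return DROP_REGEX.fullmatch(value) is None


# Hypothetical endpoint names, for illustration only.
candidates = [
    "gpu-operator-node-feature-discovery-master",
    "gpu-operator-node-feature-discovery-worker",
    "nvidia-dcgm-exporter",
]

kept = [
    name
    for name in candidates
    if keep_target({"__meta_kubernetes_endpoints_name": name})
]
print(kept)
```

Only the endpoint whose name ends in `-node-feature-discovery-master` is dropped; the other targets continue to the second relabel rule unchanged.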