Mistake in the documentation of the Prometheus setup

IMHO there is a bug in the documentation for setting up Prometheus, in kubernetes/kube-prometheus.rst.

The following rule is meant to discover all Kubernetes endpoints in the gpu-operator namespace and scrape metrics from them:

```yaml
additionalScrapeConfigs:
- job_name: gpu-metrics
  scrape_interval: 1s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - gpu-operator
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node
```

However, not all endpoints in the gpu-operator namespace provide Prometheus metrics. In particular, node-feature-discovery-master exposes only a single gRPC endpoint on port 8080, which Prometheus cannot scrape. I changed the rule as follows, dropping that endpoint during relabeling, to fix the problem:

```yaml
additionalScrapeConfigs:
- job_name: gpu-metrics
  scrape_interval: 1s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - gpu-operator
  relabel_configs:
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: drop
    regex: .*-node-feature-discovery-master
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node
```
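
An allowlist might be a more robust alternative to the drop rule, since any other non-metrics endpoint added to the namespace later would be excluded automatically. The sketch below is untested and assumes the metrics exporter's Endpoints object is named `nvidia-dcgm-exporter`. That name is not taken from the documentation and should be checked with `kubectl get endpoints -n gpu-operator` first:

```yaml
additionalScrapeConfigs:
- job_name: gpu-metrics
  scrape_interval: 1s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - gpu-operator
  relabel_configs:
  # Assumption: "nvidia-dcgm-exporter" is the Endpoints object that serves
  # /metrics; replace it with the name found in your cluster.
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: keep
    regex: nvidia-dcgm-exporter
  # Unchanged from the original rule: copy the node name into a label.
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node
```

Prometheus anchors relabel regexes at both ends, so `keep` here matches only that exact name. It is also why the drop rule above needs the leading `.*` to match a release-prefixed node-feature-discovery-master name.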
