Component(s)
No response
What happened?
Description
After upgrading from OpenTelemetry Collector 0.127.0 to 0.128.0, the Prometheus receiver fails to create scrape pools for various ServiceMonitors and PodMonitors with the error "invalid metric name validation scheme". This appears to be a breaking change in metric name validation that affects commonly used Prometheus exporters including Kyverno, Loki, kube-state-metrics, and others.
Steps to Reproduce
- Deploy OpenTelemetry Collector 0.128.0 with Prometheus receiver and Target Allocator enabled
- Configure Target Allocator to discover ServiceMonitors and PodMonitors from various namespaces
- Deploy common Kubernetes monitoring components (Kyverno, Loki, kube-state-metrics, etc.)
- Observe collector logs
Expected Result
Prometheus receiver should successfully create scrape pools and collect metrics from all discovered targets, as it did in version 0.127.0.
Actual Result
Prometheus receiver fails to create scrape pools with error:
error creating new scrape pool {"err": "invalid metric name validation scheme: invalid metric name validation scheme, ", "scrape_pool": "serviceMonitor/kdp-core-kyverno/kyverno-background-controller/0"}
This affects multiple scrape targets including:
- serviceMonitor/kdp-core-kyverno/*
- podMonitor/loki/*
- serviceMonitor/kube-state-metrics/*
- podMonitor/opentelemetry-collectors/*
The same configuration works without issues in version 0.127.0.
Collector version
0.128.0
Environment information
Environment
OS: Kubernetes (Linux containers)
OpenTelemetry Collector: 0.128.0 (official container image)
Target Allocator: 0.127.0
OpenTelemetry Operator: 0.127.0
Deployment: Kubernetes StatefulSet via OpenTelemetry Operator
OpenTelemetry Collector configuration
---
# Source: opentelemetry-collectors/templates/otel-metrics-collector/otel-metrics-collector.yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-metrics
  labels:
    eb/envtype: dev
    eb/owner: MyTeam
    eb/service: opentelemetry
spec:
  mode: statefulset
  replicas: 3
  ports:
    - appProtocol: grpc
      name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
    - appProtocol: http
      name: otlp-http
      port: 4318
      protocol: TCP
      targetPort: 4318
  targetAllocator:
    enabled: true
    replicas: 1
    resources:
      limits:
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi
    prometheusCR:
      enabled: true
      podMonitorSelector: {}
      serviceMonitorSelector: {}
  resources:
    limits:
      memory: 3Gi
    requests:
      cpu: 750m
      memory: 2Gi
  config:
    receivers:
      prometheus:
        config:
          global: {}
          scrape_configs: []
    exporters:
      otlphttp/metrics:
        endpoint: http://otel-metrics-gateway.example.com:4318
    processors:
      batch/metrics:
        send_batch_max_size: 1024
        send_batch_size: 512
        timeout: 2s
      memory_limiter:
        check_interval: 1s
        limit_percentage: 80
        spike_limit_percentage: 15
    service:
      pipelines:
        metrics:
          receivers:
            - prometheus
          processors:
            - batch/metrics
            - memory_limiter
          exporters:
            - otlphttp/metrics
      telemetry:
        metrics:
          level: detailed
Log output
2025-06-18T00:26:32.298Z info service@v0.128.0/service.go:199 Setting up own telemetry... {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}}
2025-06-18T00:26:32.298Z info memorylimiter@v0.128.0/memorylimiter.go:149 Using percentage memory limiter {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.kind": "processor", "total_memory_mib": 3072, "limit_percentage": 80, "spike_limit_percentage": 15}
2025-06-18T00:26:32.298Z info prometheusreceiver@v0.128.0/metrics_receiver.go:157 Starting discovery manager {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics"}
2025-06-18T00:26:32.299Z info targetallocator/manager.go:69 Starting target allocator discovery {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics"}
2025-06-18T00:26:37.299Z error error creating new scrape pool {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics", "err": "invalid metric name validation scheme: invalid metric name validation scheme, ", "scrape_pool": "serviceMonitor/example-namespace/example-service/0"}
github.com/prometheus/prometheus/scrape.(*Manager).reload
github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:188
github.com/prometheus/prometheus/scrape.(*Manager).reloader
github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:161
2025-06-18T00:26:37.299Z error error creating new scrape pool {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics", "err": "invalid metric name validation scheme: invalid metric name validation scheme, ", "scrape_pool": "podMonitor/monitoring-namespace/metrics-exporter/0"}
github.com/prometheus/prometheus/scrape.(*Manager).reload
github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:188
github.com/prometheus/prometheus/scrape.(*Manager).reloader
github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:161
Additional context
Version Matrix
- OpenTelemetry Collector: 0.128.0 (fails)
- OpenTelemetry Collector: 0.127.0 (works)
- Target Allocator: 0.127.0
- OpenTelemetry Operator: 0.127.0
Affected Exporters/Components
This issue affects multiple common Kubernetes monitoring components:
- Kyverno (policy engine)
- Loki (log aggregation)
- kube-state-metrics
- CloudWatch exporter
- Custom application metrics
Prometheus Version Context
Prometheus version used by collector: v0.304.1 (from stack trace)
Workaround Status
No known configuration workaround found. Attempted configurations:
- Adding metric_name_validation_scheme: "utf8" to the global config (failed; see the sketch after this list)
- Various Prometheus receiver flags (failed)
- Transform processors (ineffective; validation occurs before processing)
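The attempted global-config workaround looked roughly like the following sketch. metric_name_validation_scheme is the upstream Prometheus global setting; its placement under the receiver's global block here is illustrative, and it did not resolve the error:

receivers:
  prometheus:
    config:
      global:
        # attempted workaround: accept UTF-8 metric names
        metric_name_validation_scheme: "utf8"
      scrape_configs: []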
Impact Assessment
This is a blocking issue for upgrading to 0.128.0 in environments with:
- ServiceMonitor/PodMonitor auto-discovery enabled
- Common Kubernetes monitoring stack components deployed
It prevents adoption of the 0.128.0 security fixes and features.
Regression Confirmation
Confirmed regression: exact same configuration works in 0.127.0 and fails in 0.128.0
Activity
github-actions commented on Jun 18, 2025
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
bacherfl commented on Jun 18, 2025
(Triage): Removing needs-triage as the issue contains a detailed description and explains how to reproduce the problem. Adding waiting-for-code-owners.
krajorama commented on Jun 18, 2025
This sounds like something that should have been caught by a test. Working on adding a failing test.
krajorama commented on Jun 18, 2025
Managed to reproduce the error in a unit test. Working on making the test nicer and exposing the error.
In the test the process doesn't fail; it just logs the error and times out.
ywwg commented on Jun 18, 2025
When we were writing the config code, we went back and forth about where to set defaults. We ended up deciding that it was best to support blank / default values only in the text config loading code, rather than hiding the default behavior in the implementation. So the values cannot be empty once the structs have been created. I only found tests that created the config structs directly, perhaps there is some other code path that creates those structs?
krajorama commented on Jun 18, 2025
I wrote a test to check that the target allocator works end to end. I'm waiting on CI for it to properly fail.
Then I can apply a fix in https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/targetallocator/manager.go#L153 to check whether the metric name validation and escaping schemes are provided and, if not, set some sensible defaults: legacy and underscores.
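In scrape-config terms, the proposed defaults would roughly amount to the following sketch (field names assume the upstream Prometheus scrape-config options; the actual fix applies the defaults to the config structs built from the Target Allocator response):

scrape_configs:
  - job_name: serviceMonitor/kdp-core-kyverno/kyverno-background-controller/0
    # defaults applied when the Target Allocator response leaves these unset
    metric_name_validation_scheme: legacy
    metric_name_escaping_scheme: underscores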
krajorama commented on Jun 18, 2025
Got the expected error in the test run: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/15735801823/job/44348128262?pr=40803
fix(prometheusreceiver): invalid metric name validation error in scra…
build(deps): Upgrade otelcollector to v0.131.0 (#1257)