
[receiver/prometheus] Invalid metric name validation scheme error in 0.128.0 #40788

Description

Opened by stephen-herd-eb

Component(s)

No response

What happened?

Description

After upgrading from OpenTelemetry Collector 0.127.0 to 0.128.0, the Prometheus receiver fails to create scrape pools for various ServiceMonitors and PodMonitors with the error "invalid metric name validation scheme". This appears to be a breaking change in metric name validation that affects commonly used Prometheus exporters including Kyverno, Loki, kube-state-metrics, and others.

Steps to Reproduce

  1. Deploy OpenTelemetry Collector 0.128.0 with Prometheus receiver and Target Allocator enabled
  2. Configure Target Allocator to discover ServiceMonitors and PodMonitors from various namespaces
  3. Deploy common Kubernetes monitoring components (Kyverno, Loki, kube-state-metrics, etc.)
  4. Observe collector logs

Expected Result

Prometheus receiver should successfully create scrape pools and collect metrics from all discovered targets, as it did in version 0.127.0.

Actual Result

Prometheus receiver fails to create scrape pools with error:

    error creating new scrape pool {"err": "invalid metric name validation scheme: invalid metric name validation scheme, ", "scrape_pool": "serviceMonitor/kdp-core-kyverno/kyverno-background-controller/0"}

This affects multiple scrape targets including:

  • serviceMonitor/kdp-core-kyverno/*
  • podMonitor/loki/*
  • serviceMonitor/kube-state-metrics/*
  • podMonitor/opentelemetry-collectors/*

The same configuration works without issues in version 0.127.0.

Collector version

0.128.0

Environment information

Environment

OS: Kubernetes (Linux containers)
OpenTelemetry Collector: 0.128.0 (official container image)
Target Allocator: 0.127.0
OpenTelemetry Operator: 0.127.0
Deployment: Kubernetes StatefulSet via OpenTelemetry Operator

OpenTelemetry Collector configuration

---
# Source: opentelemetry-collectors/templates/otel-metrics-collector/otel-metrics-collector.yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-metrics
  labels:
    eb/envtype: dev
    eb/owner: MyTeam
    eb/service: opentelemetry
spec:
  mode: statefulset
  replicas: 3
  ports:
    - appProtocol: grpc
      name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
    - appProtocol: http
      name: otlp-http
      port: 4318
      protocol: TCP
      targetPort: 4318
  targetAllocator:
    enabled: true
    replicas: 1
    resources:
      limits:
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi
    prometheusCR:
      enabled: true
      podMonitorSelector: {}
      serviceMonitorSelector: {}
  resources:
    limits:
      memory: 3Gi
    requests:
      cpu: 750m
      memory: 2Gi
  config:
    receivers:
      prometheus:
        config:
          global: {}
          scrape_configs: []

    exporters:
      otlphttp/metrics:
        endpoint: http://otel-metrics-gateway.example.com:4318        

    processors:
      batch/metrics:
        send_batch_max_size: 1024
        send_batch_size: 512
        timeout: 2s
      memory_limiter:
        check_interval: 1s
        limit_percentage: 80
        spike_limit_percentage: 15

    service:
      pipelines:
        metrics:
          receivers: 
            - prometheus
          processors:
            - batch/metrics
            - memory_limiter
          exporters: 
            - otlphttp/metrics
      telemetry:
        metrics:
          level: detailed

Log output

2025-06-18T00:26:32.298Z    info    service@v0.128.0/service.go:199    Setting up own telemetry...    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}}
2025-06-18T00:26:32.298Z    info    memorylimiter@v0.128.0/memorylimiter.go:149    Using percentage memory limiter    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.kind": "processor", "total_memory_mib": 3072, "limit_percentage": 80, "spike_limit_percentage": 15}
2025-06-18T00:26:32.298Z    info    prometheusreceiver@v0.128.0/metrics_receiver.go:157    Starting discovery manager    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics"}
2025-06-18T00:26:32.299Z    info    targetallocator/manager.go:69    Starting target allocator discovery    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics"}
2025-06-18T00:26:37.299Z    error    error creating new scrape pool    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics", "err": "invalid metric name validation scheme: invalid metric name validation scheme, ", "scrape_pool": "serviceMonitor/example-namespace/example-service/0"}
github.com/prometheus/prometheus/scrape.(*Manager).reload
    github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:188
github.com/prometheus/prometheus/scrape.(*Manager).reloader
    github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:161
2025-06-18T00:26:37.299Z    error    error creating new scrape pool    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics", "err": "invalid metric name validation scheme: invalid metric name validation scheme, ", "scrape_pool": "podMonitor/monitoring-namespace/metrics-exporter/0"}
github.com/prometheus/prometheus/scrape.(*Manager).reload
    github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:188
github.com/prometheus/prometheus/scrape.(*Manager).reloader
    github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:161

Additional context

Version Matrix

  • OpenTelemetry Collector: 0.128.0 (fails)
  • OpenTelemetry Collector: 0.127.0 (works)
  • Target Allocator: 0.127.0
  • OpenTelemetry Operator: 0.127.0

Affected Exporters/Components

This issue affects multiple common Kubernetes monitoring components:

  • Kyverno (policy engine)
  • Loki (log aggregation)
  • kube-state-metrics
  • CloudWatch exporter
  • Custom application metrics

Prometheus Version Context

Prometheus library version used by the collector: v0.304.1 (from the stack trace).

Workaround Status

No known configuration workaround found. Attempted configurations:

  • Adding metric_name_validation_scheme: "utf8" to the global config (failed; roughly as shown in the snippet after this list)
  • Various Prometheus receiver flags (failed)
  • Transform processors (ineffective - validation occurs before processing)
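
For reference, the global-config attempt was roughly the following (only the receiver section is shown; the exact placement is reconstructed from the bullet above). With this in place, 0.128.0 still failed to create the scrape pools:

    receivers:
      prometheus:
        config:
          global:
            metric_name_validation_scheme: "utf8"
          scrape_configs: []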

Impact Assessment

This is a blocking issue for upgrading to 0.128.0 in environments with:

  • ServiceMonitor/PodMonitor auto-discovery enabled
  • common Kubernetes monitoring stack components

It prevents adoption of the 0.128.0 security fixes and features.

Regression Confirmation

Confirmed regression: the exact same configuration works in 0.127.0 and fails in 0.128.0.

Activity

github-actions (Contributor) commented on Jun 18, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

bacherfl (Contributor) commented on Jun 18, 2025

(Triage): Removing needs-triage as the issue contains a detailed description and explains how to reproduce the problem. Adding waiting-for-code-owners

krajorama (Member) commented on Jun 18, 2025

This sounds like something that should have been caught by a test. Working on adding a failing test.

krajorama (Member) commented on Jun 18, 2025

Managed to reproduce the error in a unit test. Working on making the test nicer and exposing the error.
In the test the process doesn't fail; it just logs the error and times out.

ywwg (Contributor) commented on Jun 18, 2025

When we were writing the config code, we went back and forth about where to set defaults. We ended up deciding that it was best to support blank/default values only in the text config loading code, rather than hiding the default behavior in the implementation. So the values cannot be empty once the structs have been created. I only found tests that created the config structs directly; perhaps there is some other code path that creates those structs?
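
To illustrate the pattern ywwg describes with a hypothetical mini-example (a stand-in struct, not the real Prometheus config types): a default applied in the YAML unmarshal hook is present when the config is loaded from text, but a struct built directly in code keeps the zero value.

    package main

    import (
        "fmt"

        yaml "gopkg.in/yaml.v2"
    )

    // scrapeSettings is a stand-in struct, not the real Prometheus config type.
    type scrapeSettings struct {
        MetricNameValidationScheme string `yaml:"metric_name_validation_scheme"`
    }

    // UnmarshalYAML applies the default only on the text-loading path.
    func (s *scrapeSettings) UnmarshalYAML(unmarshal func(interface{}) error) error {
        type plain scrapeSettings
        s.MetricNameValidationScheme = "utf8" // default set while parsing text config
        return unmarshal((*plain)(s))
    }

    func main() {
        var fromText scrapeSettings
        _ = yaml.Unmarshal([]byte("{}"), &fromText)
        fmt.Println("loaded from text:", fromText.MetricNameValidationScheme) // prints "utf8"

        direct := scrapeSettings{} // constructed directly, bypassing the loader
        fmt.Println("built directly:", direct.MetricNameValidationScheme) // prints "" (unset)
    }

The empty value in the second case is what the scrape manager rejects as an invalid validation scheme.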

krajorama (Member) commented on Jun 18, 2025

I wrote a test to check that the targetallocator works end to end. I'm waiting on CI for it to fail properly.

Then I can apply a fix in https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/targetallocator/manager.go#L153 to check whether values are provided for

metric_name_validation_scheme
metric_name_escaping_scheme

and, if not, set sensible defaults: legacy and underscores.
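
A minimal sketch of that check, assuming plain strings for the two options; the helper name and bare string constants are illustrative, not the actual Prometheus scrape-config API:

    package main

    import "fmt"

    // applyValidationDefaults mirrors the proposed check for
    // targetallocator/manager.go: if a scrape config delivered by the Target
    // Allocator omits these options, fall back to the pre-0.128.0 behavior.
    func applyValidationDefaults(validationScheme, escapingScheme string) (string, string) {
        if validationScheme == "" {
            validationScheme = "legacy"
        }
        if escapingScheme == "" {
            escapingScheme = "underscores"
        }
        return validationScheme, escapingScheme
    }

    func main() {
        v, e := applyValidationDefaults("", "")
        fmt.Println(v, e) // prints: legacy underscores
    }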

krajorama (Member) commented on Jun 18, 2025

Got the expected error in the test run: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/15735801823/job/44348128262?pr=40803

github.com/prometheus/prometheus/scrape.(*Manager).reload
	/home/runner/go/pkg/mod/github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:188
github.com/prometheus/prometheus/scrape.(*Manager).reloader
	/home/runner/go/pkg/mod/github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:161
    metrics_receiver_target_allocator_test.go:115: 
        	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/prometheusreceiver/metrics_receiver_target_allocator_test.go:115
        	Error:      	Should be zero, but was 1
        	Test:       	TestTargetAllocatorConfigLoad
        	Messages:   	There are log messages over the WARN level, see logs

12 remaining items

A commit referencing this issue was added on Aug 5, 2025: 9a1d97f