
Update Internal Collector Telemetry Docs #7035


Open · wants to merge 2 commits into main

Conversation

avillela
Contributor

@avillela avillela commented Jun 3, 2025

This PR contains updates to the documentation on Internal Collector Telemetry to help clarify some of the approaches for exporting internal Collector metrics. It also includes an explanation of why self-ingesting telemetry is not advisable.

@avillela avillela requested a review from a team as a code owner June 3, 2025 17:33
@opentelemetrybot opentelemetrybot requested review from a team and bogdandrutu and removed request for a team June 3, 2025 17:34
There are three ways to export internal Collector metrics.

1. Self-ingesting, exporting internal metrics via the
[Prometheus exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusexporter).


AFAIK this is not using the Prometheus exporter; rather, it uses https://github.com/open-telemetry/opentelemetry-go/tree/main/exporters/prometheus

Contributor Author


@jmichalek132 thanks for the clarification. How does the Go Prometheus exporter differ from the one in the Collector? I was under the impression that the Collector's was based on the Go one.


@dashpole can answer that nicely.

Contributor


There are two reasons:

First, the exporters implement different interfaces: the Go SDK exporter implements a Reader, while the Collector's implements exporter.Metrics.

Second, the Collector's exporter is designed to aggregate metrics from multiple resources/targets together, similar to how the Prometheus server's /federate endpoint works. The Go SDK exporter is designed to handle metrics from only a single resource, more like prometheus/client_golang.

Contributor Author


@dashpole so when the exporter is configured to use prometheus to export internal metrics, it's using the Go Prometheus exporter behind the scenes?

Contributor


Yes, that is correct. You can link to go.opentelemetry.io/otel/exporters/prometheus

exporter:
  prometheus:
    host: '0.0.0.0'
    port: 8888
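
For context, a rough sketch of where this fragment sits in the Collector's own configuration, assuming the current service::telemetry schema:

    service:
      telemetry:
        metrics:
          readers:
            # Pull-based reader: behind the scenes, the Go SDK Prometheus
            # exporter serves internal metrics on the configured host and port
            - pull:
                exporter:
                  prometheus:
                    host: '0.0.0.0'
                    port: 8888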


Contributor Author


@jmichalek132 happy to do that. Can you elaborate on what is meant by the changelog line "Users who do not customize the Prometheus reader should not be impacted."? Is the "Prometheus reader" the same as the "Prometheus receiver"?

Do the Prometheus receiver (receivers::prometheus) and/or Prometheus exporter (exporters::prometheus) have to be configured when the internal metrics exporter is prometheus?

- [Traces](#configure-internal-traces)

{{% alert title="Who monitors the monitor?" color="info" %}} Internal Collector
metrics can be exported directly to a backend for analysis, or to the Collector


I agree that a collector shouldn't self-monitor its own telemetry, but I would be wary of suggesting the telemetry should be sent directly to a backend.

For example, imagine a common agent and gateway pattern on Kubernetes. We could have hundreds or thousands of node agents, batching and shipping application telemetry to a gateway layer that might consist of only a few instances.

If someone tried to make connections from thousands of node agents to a vendor's backend for otelcol telemetry, there could be a lot of scaling issues. The internal telemetry would also never have a chance to be enriched with Kubernetes metadata.

I've been thinking of using a dedicated internal telemetry gateway of otelcol instances for this purpose, so every otelcol instance, regardless of whether it's a node agent, a gateway, or a load-balancing exporter layer, would send to the same collector instances dedicated to otelcol telemetry. I'm not sure what to call this pattern, but maybe we can suggest it here?
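
For illustration, a minimal sketch of what that dedicated internal-telemetry gateway pattern might look like from each Collector's side, assuming the current service::telemetry schema (the gateway endpoint is a hypothetical placeholder):

    service:
      telemetry:
        metrics:
          readers:
            - periodic:
                exporter:
                  otlp:
                    protocol: http/protobuf
                    # Hypothetical dedicated internal-telemetry gateway
                    endpoint: https://otelcol-internal-gateway:4318
        logs:
          processors:
            - batch:
                exporter:
                  otlp:
                    protocol: http/protobuf
                    # Internal logs go to the same dedicated gateway
                    endpoint: https://otelcol-internal-gateway:4318

The gateway instances would receive this through an ordinary otlp receiver, enrich it (for example, with Kubernetes metadata), and forward it to the backend, so no Collector ever ingests its own telemetry.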

Contributor Author

@avillela avillela Jun 4, 2025


@kallangerard I like that idea! I'll make the revisions.


On the same note, we just use Prometheus to scrape otel collector metrics directly; might be worth calling that out as an option.
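
For readers following along, a minimal Prometheus scrape_configs sketch of that approach (the target host is a hypothetical placeholder; 8888 is the Collector's default internal metrics port):

    scrape_configs:
      - job_name: 'otelcol-internal'
        scrape_interval: 30s
        static_configs:
          # Hypothetical host:port where the Collector exposes its internal metrics
          - targets: ['otel-collector:8888']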

Contributor Author


@jmichalek132 sorry, I'm confused. Isn't that what setting the exporter to prometheus does in the first config?

Contributor


For reference, we used to have a similar warning on this page, but it was removed when the self-monitoring section was removed. I'll leave it up to @codeboten if we want to add the warning again.

data), its internal telemetry won't be sent to its intended destination. This
makes it difficult to detect problems with the Collector itself. Likewise, if
the Collector generates additional telemetry related to the above issue, such as
error logs, and those logs are sent into the same collector, it can create a


Do we need an example of excluding otelcol's own logs from log tailing? I know Splunk's otelcol helm chart had some examples of this. I believe it was using custom exclude annotations, if I recall correctly.

Contributor Author


@kallangerard can you point me in the direction of this documentation?


@kallangerard kallangerard Jun 6, 2025


I just had a look and I was wrong: Splunk used path filtering in the filelog receiver to exclude their self-logs, while OTLP logs are filtered out via an exclude annotation on the pod. They seem to be excluding the filelog capability entirely in their latest examples, though.

See this old version https://github.com/signalfx/splunk-otel-collector-chart/blob/0fa56adc9c55728094da367fd71a51655b1da40a/examples/only-logs-with-extra-file-logs/rendered_manifests/configmap-agent.yaml#L133-L136

I think if someone is using the filelog receiver, they're likely to have already come across this issue and are handling it in their own way.

For internal telemetry logs sent to an OTLP endpoint, I don't think there's any way to do it safely by sending them to the Collector's own OTLP receiver endpoint. I'm not 100% sure of a safe alternative, though. I believe there's some internal rate limiting in the otelcol's internal logs, but I haven't tested it with a self-exporting broken logging pipeline. I've been scraping otelcol logs with Datadog for a while and haven't seen any runaway log volumes, but I guess that's not self-consumption. 🤷😅

Outside of this PR I'll try and write up some examples of a dedicated internal telemetry collector.
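
As a rough illustration of the path-filtering approach mentioned above, a hedged filelog receiver sketch (the glob patterns are assumptions for a typical Kubernetes node layout and a Collector pod named otel-collector; adjust to your own naming):

    receivers:
      filelog:
        include:
          # Tail all pod logs on the node (assumed path layout)
          - /var/log/pods/*/*/*.log
        exclude:
          # Skip the Collector's own pod logs to avoid a feedback loop
          # (hypothetical <namespace>_<pod-name>_<uid> directory pattern)
          - /var/log/pods/*_otel-collector*_*/*/*.log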

- pull:
    exporter:
      prometheus:
        host: '0.0.0.0'

@kallangerard kallangerard Jun 4, 2025


Is 0.0.0.0 right here?

  • I'm not sure if this will work for IPv6-only stacks.
  • The default behaviour for endpoints and such has been changed from 0.0.0.0 to localhost. Should we just use localhost if we intend to expose this for self-scraping, or be more explicit for public interfaces?

Probably not something we need to tackle here, but I'd love a chime-in from anyone with better container/Linux networking knowledge than me.
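
For comparison, a minimal sketch of the same pull reader bound to the loopback interface only, assuming the current service::telemetry schema (appropriate when only a same-host scraper or sidecar needs access):

    service:
      telemetry:
        metrics:
          readers:
            - pull:
                exporter:
                  prometheus:
                    # Loopback only; switch to 0.0.0.0 or an explicit interface
                    # address if an external scraper must reach this endpoint
                    host: 'localhost'
                    port: 8888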

Contributor Author

@avillela avillela Jun 4, 2025


@kallangerard this was in the original docs, so I can't speak for it (and haven't used it). Hope someone else will chime in with more info. 😁

@tiffany76
Contributor

@open-telemetry/collector-approvers, PTAL. Thanks!
