
Update Internal Collector Telemetry Docs #7035


Open · wants to merge 2 commits into main

Conversation

avillela
Contributor

@avillela avillela commented Jun 3, 2025

This PR contains updates to the documentation on Internal Collector Telemetry to help clarify some of the approaches for exporting internal Collector metrics. It also includes an explanation of why self-ingesting telemetry is not advisable.

@avillela avillela requested a review from a team as a code owner June 3, 2025 17:33
@opentelemetrybot opentelemetrybot requested review from a team and bogdandrutu and removed request for a team June 3, 2025 17:34
There are three ways to export internal Collector metrics.

1. Self-ingesting, exporting internal metrics via the
[Prometheus exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusexporter).


AFAIK this is not using the Prometheus exporter; rather, it uses https://github.com/open-telemetry/opentelemetry-go/tree/main/exporters/prometheus

Contributor Author


@jmichalek132 thanks for the clarification. How does the Go Prometheus exporter differ from the one in the Collector? I was under the impression that the Collector's was based on the Go one.


@dashpole can answer that nicely.

Contributor


There are two reasons:

First, the exporters implement different interfaces: the Go SDK exporter implements a Reader, while the Collector's implements exporter.Metrics.

Second, the Collector's exporter is designed to aggregate metrics from multiple resources/targets together, similar to how the Prometheus server's /federate endpoint works. The Go SDK exporter is designed to handle metrics from only a single resource, more like prometheus/client_golang.

Contributor Author


@dashpole so when the exporter is configured to use prometheus to export internal metrics, it's using the Go Prometheus exporter behind the scenes?

Contributor


Yes, that is correct. You can link to go.opentelemetry.io/otel/exporters/prometheus

exporter:
  prometheus:
    host: '0.0.0.0'
    port: 8888
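
For context, a rough sketch of where this fragment sits in the Collector's own configuration, assuming the current service::telemetry schema:

    service:
      telemetry:
        metrics:
          readers:
            # Pull-based reader: behind the scenes, the Go SDK Prometheus
            # exporter serves internal metrics on the configured host and port
            - pull:
                exporter:
                  prometheus:
                    host: '0.0.0.0'
                    port: 8888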


Contributor Author


@jmichalek132 happy to do that. Can you elaborate on what is meant by the changelog line "Users who do not customize the Prometheus reader should not be impacted."? Is the "Prometheus reader" the same as the "Prometheus receiver"?

Do the Prometheus receiver (receivers::prometheus) and/or Prometheus exporter (exporters::prometheus) have to be configured when the internal metrics exporter is prometheus?

- [Traces](#configure-internal-traces)

{{% alert title="Who monitors the monitor?" color="info" %}} Internal Collector
metrics can be exported directly to a backend for analysis, or to the Collector


I agree that a collector shouldn't self-monitor its own telemetry, but I would be wary of suggesting the telemetry should be sent directly to a backend.

For example, imagine a common agent and gateway pattern on Kubernetes. We could have hundreds or thousands of node agents, batching and shipping application telemetry to a gateway layer that might consist of only a few instances.

If someone tried to make connections from thousands of node agents to a vendor's backend for otelcol telemetry, there could be a lot of scaling issues. The internal telemetry would also never have a chance to be enriched with Kubernetes metadata.

I've been thinking of using a dedicated internal telemetry gateway of otelcol instances for this purpose, so every otelcol instance, regardless of whether it's a node agent, a gateway, or a load-balancing exporter layer, would send to the same collector instances dedicated to otelcol telemetry. I'm not sure what to call this pattern, but maybe we can suggest it here?
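
For illustration, a minimal sketch of what that dedicated internal-telemetry gateway pattern might look like from each Collector's side, assuming the current service::telemetry schema (the gateway endpoint is a hypothetical placeholder):

    service:
      telemetry:
        metrics:
          readers:
            - periodic:
                exporter:
                  otlp:
                    protocol: http/protobuf
                    # Hypothetical dedicated internal-telemetry gateway
                    endpoint: https://otelcol-internal-gateway:4318
        logs:
          processors:
            - batch:
                exporter:
                  otlp:
                    protocol: http/protobuf
                    # Internal logs go to the same dedicated gateway
                    endpoint: https://otelcol-internal-gateway:4318

The gateway instances would receive this through an ordinary otlp receiver, enrich it (for example, with Kubernetes metadata), and forward it to the backend, so no Collector ever ingests its own telemetry.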

Contributor Author

@avillela avillela Jun 4, 2025


@kallangerard I like that idea! I'll make the revisions.


On the same note, we just use Prometheus to scrape otel collector metrics directly; might be worth calling that out as an option.
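
For readers following along, a minimal Prometheus scrape_configs sketch of that approach (the target host is a hypothetical placeholder; 8888 is the Collector's default internal metrics port):

    scrape_configs:
      - job_name: 'otelcol-internal'
        scrape_interval: 30s
        static_configs:
          # Hypothetical host:port where the Collector exposes its internal metrics
          - targets: ['otel-collector:8888']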

Contributor Author


@jmichalek132 sorry, I'm confused. Isn't that what setting the exporter to prometheus does in the first config?

Contributor


For reference, we used to have a similar warning on this page, but it was removed when the self-monitoring section was removed. I'll leave it up to @codeboten if we want to add the warning again.

data), its internal telemetry won't be sent to its intended destination. This
makes it difficult to detect problems with the Collector itself. Likewise, if
the Collector generates additional telemetry related to the above issue, such as
error logs, and those logs are sent into the same collector, it can create a


Do we need an example of excluding otelcol's own logs from log tailing? I know Splunk's otelcol helm chart had some examples of this. I believe it was using custom exclude annotations, if I recall correctly.

Contributor Author


@kallangerard can you point me in the direction of this documentation?


@kallangerard kallangerard Jun 6, 2025


I just had a look and I was wrong: Splunk used path filtering in the filelog receiver to exclude their self-logs, while OTLP logs are filtered out via an exclude annotation on the pod. They seem to be excluding the filelog capability entirely in their latest examples, though.

See this old version https://github.com/signalfx/splunk-otel-collector-chart/blob/0fa56adc9c55728094da367fd71a51655b1da40a/examples/only-logs-with-extra-file-logs/rendered_manifests/configmap-agent.yaml#L133-L136

I think if someone is using the filelog receiver, they're likely to have already come across this issue and are handling it in their own way.

For internal telemetry logs sent to an OTLP endpoint, I don't think there's any way to do it safely by sending them to the Collector's own OTLP receiver endpoint. I'm not 100% sure of a safe alternative, though. I believe there's some internal rate limiting in the otelcol's internal logs, but I haven't tested it with a self-exporting broken logging pipeline. I've been scraping otelcol logs with Datadog for a while and haven't seen any runaway log volumes, but I guess that's not self-consumption. 🤷😅

Outside of this PR I'll try and write up some examples of a dedicated internal telemetry collector.
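
As a rough illustration of the path-filtering approach mentioned above, a hedged filelog receiver sketch (the glob patterns are assumptions for a typical Kubernetes node layout and a Collector pod named otel-collector; adjust to your own naming):

    receivers:
      filelog:
        include:
          # Tail all pod logs on the node (assumed path layout)
          - /var/log/pods/*/*/*.log
        exclude:
          # Skip the Collector's own pod logs to avoid a feedback loop
          # (hypothetical <namespace>_<pod-name>_<uid> directory pattern)
          - /var/log/pods/*_otel-collector*_*/*/*.log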

- pull:
    exporter:
      prometheus:
        host: '0.0.0.0'

@kallangerard kallangerard Jun 4, 2025


Is 0.0.0.0 right here?

  • I'm not sure if this will work for IPv6-only stacks.
  • The default behaviour for endpoints and such has been changed from 0.0.0.0 to localhost. Should we just use localhost if we intend to expose this for self-scraping, or be more explicit for public interfaces?

Probably not something we need to tackle here, but I'd love a chime-in from anyone with better container/Linux networking knowledge than me.
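
For comparison, a minimal sketch of the same pull reader bound to the loopback interface only, assuming the current service::telemetry schema (appropriate when only a same-host scraper or sidecar needs access):

    service:
      telemetry:
        metrics:
          readers:
            - pull:
                exporter:
                  prometheus:
                    # Loopback only; switch to 0.0.0.0 or an explicit interface
                    # address if an external scraper must reach this endpoint
                    host: 'localhost'
                    port: 8888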

Contributor Author

@avillela avillela Jun 4, 2025


@kallangerard this was in the original docs, so I can't speak for it (and haven't used it). Hope someone else will chime in with more info. 😁

@tiffany76
Contributor

@open-telemetry/collector-approvers, PTAL. Thanks!
