Update Internal Collector Telemetry Docs #7035
Conversation
There are three ways to export internal Collector metrics.

1. Self-ingesting, exporting internal metrics via the
   [Prometheus exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusexporter).
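For context, self-ingestion usually means scraping the Collector's own internal metrics endpoint with the Prometheus receiver and sending the result through a normal metrics pipeline. A minimal sketch, assuming the default internal metrics port of 8888 and a hypothetical OTLP destination:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otelcol
          scrape_interval: 10s
          static_configs:
            # The Collector's own internal metrics endpoint.
            - targets: ['127.0.0.1:8888']

exporters:
  otlp:
    # Hypothetical backend endpoint.
    endpoint: backend.example.com:4317

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```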
AFAIK this is not using the Prometheus exporter, but rather https://github.com/open-telemetry/opentelemetry-go/tree/main/exporters/prometheus
@jmichalek132 thanks for the clarification. How does the Go Prometheus exporter differ from the one in the Collector? I was under the impression that the Collector's exporter was based on the Go one.
@dashpole can answer that nicely.
There are two reasons:
First, the exporters implement different interfaces: the Go SDK exporter is a Reader, while the Collector's exporter implements exporter.Metrics.
Second, the Collector's exporter is designed to aggregate metrics from multiple resources/targets together, similar to how the Prometheus server's /federate endpoint works. The Go SDK exporter is designed to handle metrics from only a single resource, more like prometheus client_golang.
@dashpole so when the exporter is configured to use prometheus to export internal metrics, it's using the Go Prometheus exporter behind the scenes then?
Yes, that is correct. You can link to go.opentelemetry.io/otel/exporters/prometheus
```yaml
exporter:
  prometheus:
    host: '0.0.0.0'
    port: 8888
```
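For context, a fuller sketch of where this snippet might sit in a Collector configuration, assuming the `service::telemetry::metrics::readers` layout used by recent Collector versions:

```yaml
service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: '0.0.0.0'
                port: 8888
```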
It might also be nice to provide an example of how to get the original metric names back:
https://github.com/open-telemetry/opentelemetry-collector/blob/e1f670844604a5b119d8560bc079ceca4c92bf72/CHANGELOG.md?plain=1#L347
@jmichalek132 happy to do that. Can you elaborate on what is meant by the changelog line "Users who do not customize the Prometheus reader should not be impacted"? Is the "Prometheus reader" the same as the "Prometheus receiver"? Do the Prometheus receiver (`receivers::prometheus`) and/or Prometheus exporter (`exporters::prometheus`) have to be configured when the internal metrics exporter is `prometheus`?
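For illustration, the kind of example being requested might look like the following sketch, assuming the internal telemetry Prometheus reader accepts the `without_units`, `without_type_suffix`, and `without_scope_info` options from the declarative configuration schema:

```yaml
service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: '0.0.0.0'
                port: 8888
                # Assumed options for disabling name translation so the
                # original metric names are kept (no unit or _total
                # suffixes, no otel_scope_info metric).
                without_units: true
                without_type_suffix: true
                without_scope_info: true
```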
- [Traces](#configure-internal-traces)

{{% alert title="Who monitors the monitor?" color="info" %}} Internal Collector
metrics can be exported directly to a backend for analysis, or to the Collector
I agree that a collector shouldn't self-monitor its own telemetry, but I would be wary of suggesting the telemetry should be sent directly to a backend.

For example, imagine a common agent-and-gateway pattern on Kubernetes. We could have hundreds or thousands of node agents, batching and shipping application telemetry to a gateway layer that might be made up of only a few instances.

If someone tried to make connections from thousands of node agents to a vendor's backend for otelcol telemetry, there could be a lot of scaling issues. The internal telemetry would also never have a chance to be enriched with Kubernetes metadata.

I've been thinking of using a dedicated internal telemetry gateway of otelcol instances for this purpose, so every otelcol instance, regardless of whether it's a node agent, a gateway, or a load-balancing exporter layer, would send to the same collector instances dedicated to otelcol telemetry. I'm not sure what to call this pattern, but maybe we can suggest it here?
@kallangerard I like that idea! I'll make the revisions.
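For illustration, a sketch of what sending a Collector's internal metrics to a dedicated internal-telemetry gateway might look like, assuming the declarative `periodic` reader with an `otlp` exporter is supported for internal telemetry, and using a hypothetical gateway endpoint:

```yaml
service:
  telemetry:
    metrics:
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                # Hypothetical gateway dedicated to otelcol telemetry.
                endpoint: https://otelcol-telemetry-gateway.example.com:4318
```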
On the same note, we just use Prometheus to scrape otel collector metrics directly; it might be worth calling that out as an option.
@jmichalek132 sorry...I'm confused. Isn't that what setting the `exporter` to `prometheus` does in the first config?
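For reference, setting the internal metrics exporter to `prometheus` exposes a pull endpoint that an external Prometheus server can then scrape with a regular job. A minimal sketch, with a hypothetical target address:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      # Hypothetical host exposing the Collector's internal metrics.
      - targets: ['otel-collector.example.com:8888']
```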
For reference, we used to have a similar warning on this page, but it was removed when the self-monitoring section was removed. I'll leave it up to @codeboten if we want to add the warning again.
data), its internal telemetry won't be sent to its intended destination. This
makes it difficult to detect problems with the Collector itself. Likewise, if
the Collector generates additional telemetry related to the above issue, such
as error logs, and those logs are sent into the same Collector, it can create a
Do we need an example of excluding otelcol's own logs from log tailing? I know Splunk's otelcol Helm chart had some examples of this. I believe it was using custom exclude annotations, if I recall correctly.
@kallangerard can you point me in the direction of this documentation?
I just had a look, and I was wrong: Splunk used path filtering in the filelog receiver to exclude their self-logs, while OTLP logs are filtered out via an exclude annotation on the pod. They seem to be excluding filelog capability entirely in their latest examples, though.
I think if someone is using the filelog receiver, they're likely to have already come across this issue and are handling it in their own way.
For internal telemetry logs sent to an OTLP endpoint, I don't think there's any way to do it safely by sending to the Collector's own OTLP receiver endpoint. I'm not 100% sure on a safe alternative, though. I believe there's some internal rate limiting of otelcol's internal logs, but I haven't tested it with a self-exporting broken logging pipeline. I've been scraping otelcol logs with Datadog for a while and haven't seen any runaway log volumes, but I guess that's not self-consumption. 🤷😅
Outside of this PR I'll try and write up some examples of a dedicated internal telemetry collector.
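For illustration, the kind of path-based exclusion described above might look like this sketch, which assumes hypothetical Kubernetes log paths and uses the filelog receiver's `include`/`exclude` settings:

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      # Hypothetical path for the Collector's own pod logs, excluded to
      # avoid a self-ingestion feedback loop.
      - /var/log/pods/*/otel-collector/*.log
```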
```yaml
- pull:
    exporter:
      prometheus:
        host: '0.0.0.0'
```
Is `0.0.0.0` right here?

- I'm not sure if this will work for IPv6-only stacks.
- The default behaviour for endpoints and such has been changed from `0.0.0.0` to `localhost`. Should we just be using `localhost` if we are intending to expose for self-scraping (see the sketch below), or be more explicit for public interfaces?

Probably not something we need to tackle here, but I'd love a chime-in from anyone with better container/Linux networking knowledge than me.
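For illustration, the `localhost` variant being discussed might look like the following sketch, assuming the metrics endpoint only needs to be reachable from the same host or pod:

```yaml
service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                # Bind only to the loopback interface for self-scraping.
                host: 'localhost'
                port: 8888
```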
@kallangerard this was in the original docs, so I can't speak for it (and haven't used it). Hope someone else will chime in with more info. 😁
@open-telemetry/collector-approvers, PTAL. Thanks!
This PR contains updates to the documentation on internal Collector telemetry to help clarify some of the approaches for exporting internal Collector metrics. It also includes an explanation of why self-ingesting telemetry is not advisable.