Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[collectd 6] Add an option to configure resource attributes. #4199

Merged
merged 5 commits into from
Dec 28, 2023

Conversation

octo
Copy link
Member

@octo octo commented Dec 19, 2023

ChangeLog: daemon: The new "Resource" option allows users to configure resource attributes manually if desired.

@octo octo requested review from a team as code owners December 19, 2023 13:06
@collectd-bot collectd-bot added this to the 6.0 milestone Dec 19, 2023
@octo octo added this to In progress in collectd 6 via automation Dec 19, 2023
Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look fine. But I think documentation needs some improvements:

  • Name is too generic, so some short explanation how this "Resource" differs from all the other "Resource" items already mentioned in the document, would be good
  • Text is on the abstract side and IMHO would need some example of how it changes collectd output, so users unfamiliar with OpenTelemetry know what it's about

src/collectd.conf.pod Outdated Show resolved Hide resolved
@eero-t
Copy link
Contributor

eero-t commented Dec 20, 2023

There should be also None option for disabling resource attribute labels completely, because resource label names use characters that are not valid for Prometheus metrics, see #4204 and: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

@eero-t
Copy link
Contributor

eero-t commented Dec 21, 2023

There should be also None option for disabling resource attribute labels completely, because resource label names use characters that are not valid for Prometheus metrics, see #4204 and: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

That issue is now fixed, but need for disabling the resource labels has gone nowhere, because they just duplicate information that Prometheus already adds to the metrics.

EDIT: there may also be other write plugins / things reading them, that resource labels could have broken. None option would be a good workaround / test for such things too.

@octo
Copy link
Member Author

octo commented Dec 21, 2023

My thinking was that the following config should give you the same effect as a None resource:

<Resource Generic>
  Attribute "service.name" ""
</Resource>

Can you tell me more how the resource attributes and the Prometheus export clash? It's important to me to get this right.

@eero-t
Copy link
Contributor

eero-t commented Dec 22, 2023

My thinking was that the following config should give you the same effect as a None resource:

True, and it's a useful feature, but I'm assuming that you're going to be adding more attributes in the future?

None option would be a more user-friendly way of making sure that all extra labels are disabled.

Can you tell me more how the resource attributes and the Prometheus export clash? It's important to me to get this right.

Problem is that extra labels (on every metric) can noticeably increase Prometheus resource usage in larger clusters, so it's important that one can easily disable anything that is not needed, especially when Prometheus already provides corresponding information in (more relevant kubernetes namespace) labels.

EDIT: other telemetry systems ingesting labels are likely to be impacted too, not just Prometheus.

@octo
Copy link
Member Author

octo commented Dec 23, 2023

My thinking was that the following config should give you the same effect as a None resource:

True, and it's a useful feature, but I'm assuming that you're going to be adding more attributes in the future?

My design idea is this:

  • Metrics that already have resource attributes, e.g. because they were received from somewhere else via gRPC or something similar, are simply passed through. No resource attributes are added, removed, or modified.
  • Metrics that are passed to plugin_dispatch_metric_family without resource attributes are considered to be "local", i.e. as originating from this collectd instance. The resource attributes associated with this instance of collectd are attached to the metric family.
  • Strike a good balance between auto-discovery and user control. Users likely don't want to manually configure the host name. If you're managing a large fleet, having a different config file for each host is quite annoying.
    • Provide presets that users can chose by passing an appropriate argument to the Preset block. At the moment, Generic and Host are the only available presets. Other presets, e.g. for GCE instances, K8s pods, etc. might follow.
    • Allow users to amend or overwrite the defaults, using the Attribute option inside the Resource block.

None option would be a more user-friendly way of making sure that all extra labels are disabled.

I think you're coming from a position where these labels cause problems with the write_prometheus plugin. Once these are properly integrated (I'm working on this in #4213) I think that the existing Generic preset provides more utility for you than a None preset, because they allow you to set the job and instance labels appropriately.

Problem is that extra labels (on every metric) can noticeably increase Prometheus resource usage in larger clusters, so it's important that one can easily disable anything that is not needed, especially when Prometheus already provides corresponding information in (more relevant kubernetes namespace) labels.

Gotcha, thanks for the info! IMHO you have two options already:

  1. Use the Generic preset and unset the one resource attribute it installs if required, as outlined above.
  2. Use metric relabeling in Prometheus to discard unwanted labels.

I'm sure collectd will gain additional features in the future, but we have to start somewhere, and this seems like quite a good position to be in for a first step.

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, seems reasonable, especially considering #4213.

Mostly to avoid an uninformed compiler warning ;)
Since `configfile` now refers to `resource`, this needs to be linked into
tests.
To achieve this, track whether or not the struct has been initialized
separately, so that users can remove all the attributes if they wish.
@octo octo merged commit af8b505 into collectd:collectd-6.0 Dec 28, 2023
23 checks passed
collectd 6 automation moved this from In progress to Done Dec 28, 2023
@octo octo deleted the 6/resource branch December 28, 2023 08:14
@octo octo added the Feature label Jan 12, 2024
@octo octo added the core label Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
collectd 6
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants