memory plugin: Metrics do not sum up to a stable physical memory size. #4237
Comments
I just recently ran into this issue as part of #4224 and ended up removing the "available" and "slab" metrics for collectd 6. The idea is that all the I think we should:
Summing "random" memory values reported by the OS to get a "total" is not correct. Some of them can overlap (slab, shared), and several relevant types of memory can be missed. For example, on Linux, system RAM used by the DRM subsystem for GPU allocations can be a fairly large amount in some situations; depending on driver and kernel version, it may be accounted under other reported memory types or not accounted for at all, except maybe in debugfs. The OS reports the memory total separately, and that value is what should be reported as the total and used for ratio calculations. (IMHO it's a bit odd that swap usage comes from a separate plugin, as logically that's also about memory usage.)
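As a rough illustration of the point about ratios, here is a minimal Python sketch that takes the denominator from the OS-reported `MemTotal` field instead of summing other fields. The sample text and the helper names `parse_meminfo` and `ratio_of_total` are made up for this example; the numbers are not real measurements, and this is not how the collectd plugin is implemented.

```python
# Illustrative /proc/meminfo-style text; the values are invented for this sketch.
SAMPLE_MEMINFO = """\
MemTotal:       16303428 kB
MemFree:         1234567 kB
MemAvailable:    9876543 kB
Buffers:          345678 kB
Cached:          6543210 kB
Slab:             765432 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def ratio_of_total(values, field):
    """Ratio of one field to the OS-reported total, not to a sum of other fields."""
    return values[field] / values["MemTotal"]


info = parse_meminfo(SAMPLE_MEMINFO)
for field in ("MemFree", "MemAvailable", "Slab"):
    print(f"{field}: {ratio_of_total(info, field):.1%} of MemTotal")
```

With this approach, overlapping fields such as "available" or "slab" can each be expressed relative to the total without being added to one another.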
By (correctly) using the OS-reported value for the total memory amount, the slab and available values could be brought back. (They are somewhat useful, and removing them is bound to disrupt somebody's workflow: https://m.xkcd.com/1172/)
Btw, "available" is defined in the OTel spec: https://opentelemetry.io/docs/specs/semconv/system/system-metrics/#metric-systemlinuxmemoryavailable
Hi, I'm trying to resolve this issue by following the two approaches @octo suggested. I have a few thoughts, though I'm not sure I understand the whole thing correctly :-)
We could put it into a separate plugin, as is done for "swap". But if we add new metrics later and follow the "swap" precedent, we may need many plugins to keep things consistent: for "HugePages_*" we would need a "hugepage" plugin, and for "Vmalloc*" a "vmalloc" plugin. That might be redundant, since all of these metrics belong to "memory" usage, as @eero-t also says.
To clarify, my suggestion was to put them behind a flag and use a different plugin instance. The flag is negotiable. Aggregating all metrics with the same plugin, plugin instance, and type must yield something useful, hence the different plugin instance values.
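The aggregation rule described here can be sketched in a few lines of Python. This is only an illustration: the tuple layout, the sample values, and the plugin instance name "extra" are assumptions made for this example, not identifiers taken from collectd.

```python
from collections import defaultdict

# Hypothetical samples: (plugin, plugin_instance, type, type_instance, bytes).
# The plugin instance "extra" is an invented name used only for illustration.
SAMPLES = [
    ("memory", "", "memory", "used", 6 * 2**30),
    ("memory", "", "memory", "free", 2 * 2**30),
    ("memory", "extra", "memory", "available", 5 * 2**30),
]


def aggregate(samples):
    """Sum values that share the same (plugin, plugin_instance, type)."""
    totals = defaultdict(int)
    for plugin, plugin_instance, type_, _type_instance, value in samples:
        totals[(plugin, plugin_instance, type_)] += value
    return dict(totals)


totals = aggregate(SAMPLES)
for key, value in sorted(totals.items()):
    print(key, value // 2**30, "GiB")
```

Because the overlapping metric carries a different plugin instance, it lands in its own group and does not inflate the sum of the default group.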
I'm not sure I fully understand what you're saying. I think all code dealing with Linux's /proc/meminfo should remain in the "memory" plugin. My note was about setting the "plugin instance", which is one of the fields in collectd 5 metric identifiers. To follow the provided example, the structure would look like this:
That said, the hugepages and vmem plugins do exist.
I'm not a huge fan of the That said, the macro uses a local variable named For relative metrics, we need to answer: relative to what? I'd argue that
Note that this is a separate metric ( I don't want to banish the "memory available" metric from collectd. But we cannot throw it in with the other memory metrics because these metrics must not be aggregated together.
Expected behavior
The sum of the memory/memory-* metrics should be the total amount of physical memory, i.e. the value should not fluctuate.
Actual behavior
The "available" and "slab" metrics count memory that is already accounted for by other metrics. The "shared" memory metric is missing.
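The expected behavior amounts to an invariant: the memory states should partition the total. A minimal Python sketch of such a check follows. The function name `sum_excess` and all numbers are illustrative assumptions, not measurements or collectd code.

```python
def sum_excess(states, total):
    """Return sum(states) - total; zero means the states exactly partition total."""
    return sum(states.values()) - total


# Illustrative values in arbitrary units, chosen so the first set adds up exactly.
TOTAL = 16_000
partition = {"used": 6_000, "free": 2_000, "buffered": 1_000, "cached": 7_000}

# Adding a metric that re-counts memory already covered by other states
# pushes the sum above the total.
overlapping = dict(partition, available=8_500)

print(sum_excess(partition, TOTAL))    # 0
print(sum_excess(overlapping, TOTAL))  # 8500
```

A non-zero result indicates either an overlapping metric (positive excess) or a missing one (negative excess), which matches the two symptoms listed under actual behavior.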
Steps to reproduce
See graphs in #3916. Some more information is available in #4224.