memory plugin: Metrics to not sum up to a stable physical memory size. #4237

octo · 2024-01-17T12:28:12Z

Version of collectd: since Report MemAvailable when present in meminfo #3916
Operating system / distribution: Linux
Kernel version (if applicable): dunno

Expected behavior

The sum of the memory/memory-* metrics should be total amount of physical memory, i.e. the value should not fluctuate.

Actual behavior

The "available" and "slab" metrics count memory that is already accounted for by other metrics. The "shared" memory metric is missing.

Steps to reproduce

See graphs in #3916. Some more information is available in #4224.

The text was updated successfully, but these errors were encountered:

octo · 2024-01-17T12:30:00Z

I just recently ran into this issue as part of #4224 and ended up removing the "available" and "slab" metrics for collectd 6.

The idea is that all the memory/memory-* metrics can be grouped in one graph, and these two metrics were violating that assumption. After reading the kernel sources I ended up adding the "shared" metric, because it also originates in the struct sysinfo in the kernel.

I think we should:

Put both, "available" and "slab" behind a flag, if they're not already.
Use a separate "plugin instance" when reporting these metrics, e.g. "memory-available/bytes".

eero-t · 2024-01-17T12:38:44Z

Grouping "random" memory values reported by OS to get "total" is not correct. Some of them can overlap (slab, shared), and you can miss multiple relevant types of memory (for example on Linux, system RAM used by DRM subsystem for GPU allocations, which could in some situations be fairly huge amount, can depending on driver and kernel version be accounted in other reported memory types, or not have any accounted at all, except maybe in debugfs).

OS reports memory total separately, and that is what should be reported as total, and used for ratio calculations.

(IMHO it's a bit odd that swap usage comes from separate plugin, as logically that's also about memory usage.)

eero-t · 2024-01-24T10:14:13Z

By (correctly) using OS reported value for total memory amount, slap & available values could be brought back.

(They are somewhat useful and removing them is bound to disrupt somebody's workflow: https://m.xkcd.com/1172/)

eero-t · 2024-01-31T13:31:44Z

By (correctly) using OS reported value for total memory amount, slap & available values could be brought back.

Btw. Available is defined in OTEL spec: https://opentelemetry.io/docs/specs/semconv/system/system-metrics/#metric-systemlinuxmemoryavailable

zzzyhtheonly · 2024-02-05T10:27:35Z

Hi,
I'm thinking of adding "Dirty" and "Writeback" (personally use, also willing to send a PR then) in Linux for “memory” plugin, going through I find the same issue in #3916

I'm trying to resolve this issue by following the two ways @octo says, just come up with some thoughts, not sure understand the whole thing correctly :-)

Put both, "available" and "slab" behind a flag, if they're not already.

Maybe we could put all metrics that do not contribute to "total" into another submit function, i.e., we do not use MEMORY_SUBMIT for submitting metrics like "Available", instead we write a new submit function to submit those metrics one by one.
The disadvantage is that ValuesPercentage is not working for those metrics.
I prefer this way since it might make it easier to move on to adding new useful metrics :)
Another way is to make the sum to calculate percentages fixed as "MemTotal".
The disadvantage is it would be hard to add new metrics since we should always figure out if it has the "Cached" and "Shm" issue found in Report MemAvailable when present in meminfo #3916

Use a separate "plugin instance" when reporting these metrics, e.g. "memory-available/bytes".

We could put it into a separate plugin like what "swap" does. But if we want to add new metrics later, to follow what "swap" does, we may need lots of plugins to make things reasonable, e.g, if we want "HugePages_*", we need a "hugepage" plugin, if we want "Vmalloc*", we need a "vmalloc" plugin. It might be redundant since all of the metrics belong to "memory" usage. Like @eero-t also says.

octo · 2024-02-05T17:28:10Z

I'm trying to resolve this issue by following the two ways @octo says,

To clarify, my suggestion was to put them behind a flag and use different plugin instance. The flag is negotiable. Aggregating all metrics with the same plugin, plugin instance, and type must yield something useful, hence the different plugin instance values.

Use a separate "plugin instance" when reporting these metrics, e.g. "memory-available/bytes".

We could put it into a separate plugin like what "swap" does. But if we want to add new metrics later, to follow what "swap" does, we may need lots of plugins to make things reasonable, e.g, if we want "HugePages_", we need a "hugepage" plugin, if we want "Vmalloc", we need a "vmalloc" plugin. It might be redundant since all of the metrics belong to "memory" usage. Like @eero-t also says.

I'm not sure I fully understand what you're saying. I think all code dealing with Linux' /proc/meminfo should remain in the "memory" plugin. My note was about setting the "plugin instance", which is one of the field in collectd 5 metric identifiers.

To follow the provided example, the structure would look like this:

memory/ (plugin instance is empty)
- memory-{free,cached,buffered,shmem,used}
memory-hugepages/ (plugin instance is "hugepages")
- memory-{free,reserved,surplus,free}
memory-vmalloc/ (plugin instance is "vmalloc")
- memory-{used,chunk,free}

That said, the hugepages and vmem plugins do exist.

Maybe we could put all metrics that do not contribute to "total" into another submit function, i.e., we do not use MEMORY_SUBMIT for submitting metrics like "Available", instead we write a new submit function to submit those metrics one by one.
The disadvantage is that ValuesPercentage is not working for those metrics.

I'm not a huge van of the MEMORY_SUBMIT macro – it's way too clever and used to call a very clever function (here: a function with variable arguments). Read: if you want to change the way the memory plugin reports its metrics, be my guest!

That said, the macro uses a local variable named vl as a "template", so setting vl.plugin_instance before using MEMORY_SUBMIT should do what we need.

For relative metrics, we need to answer: relative to what? I'd argue that Committed_AS should be relative to CommitLimit, HugePages_Rsvd should be relative to HugePages_Total, and VmallocUsed should be relative to VmallocTotal. None of them should be relative to MemTotal.

octo · 2024-02-05T17:32:03Z

Btw. Available is defined in OTEL spec: https://opentelemetry.io/docs/specs/semconv/system/system-metrics/#metric-systemlinuxmemoryavailable

Note that this is a separate metric (system.linux.memory.available vs. system.memory.usage). That is becasue aggregations over all attributes of a given metric should be meaningful. The collectd 5 equivalent would be "aggregating metrics with the same plugin, plugin instance, and type should be meaningful".

I don't want to banish the "memory available" metric from collectd. But we cannot throw it in with the other memory metrics because these metrics must not be aggregated together.

octo added Bug A genuine bug linux only labels Jan 17, 2024

octo mentioned this issue Jan 17, 2024

Report MemAvailable when present in meminfo #3916

Merged

eero-t mentioned this issue Jan 26, 2024

Any release plans? 5.12.1 or 5.13.0 or 6.0.0 ? #4186

Open

zzzyhtheonly mentioned this issue Feb 8, 2024

memory.c: workaround to let ValuesPercentage works again without "available" #4279

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

memory plugin: Metrics to not sum up to a stable physical memory size. #4237

memory plugin: Metrics to not sum up to a stable physical memory size. #4237

octo commented Jan 17, 2024

octo commented Jan 17, 2024 •

edited

eero-t commented Jan 17, 2024 •

edited

eero-t commented Jan 24, 2024

eero-t commented Jan 31, 2024

zzzyhtheonly commented Feb 5, 2024 •

edited

octo commented Feb 5, 2024

octo commented Feb 5, 2024

memory plugin: Metrics to not sum up to a stable physical memory size. #4237

memory plugin: Metrics to not sum up to a stable physical memory size. #4237

Comments

octo commented Jan 17, 2024

Expected behavior

Actual behavior

Steps to reproduce

octo commented Jan 17, 2024 • edited

eero-t commented Jan 17, 2024 • edited

eero-t commented Jan 24, 2024

eero-t commented Jan 31, 2024

zzzyhtheonly commented Feb 5, 2024 • edited

octo commented Feb 5, 2024

octo commented Feb 5, 2024

octo commented Jan 17, 2024 •

edited

eero-t commented Jan 17, 2024 •

edited

zzzyhtheonly commented Feb 5, 2024 •

edited