
The memory statistics in 4.2.0 beta1 are wrong #7591

Closed
mind04 opened this issue Mar 16, 2019 · 10 comments · Fixed by #10161

Comments

@mind04
Contributor

mind04 commented Mar 16, 2019

  • Program: Authoritative
  • Issue type: Bug

Short description

There is a huge jump in memory usage between alpha1 and beta1 in Metronome.

[image: Metronome graph showing the memory-usage jump between alpha1 and beta1]

Total memory usage on the system is lower than the reported PowerDNS usage.

[root@ns2 etc]# free -m
              total        used        free      shared  buff/cache   available
Mem:            991         195         605           6         190         630
Swap:           819           0         819

Environment

  • Operating system: CentOS7
  • Software version: 4.2.0 beta1
  • Software source: self compiled
@mnordhoff
Contributor

For me, on Ubuntu 16.04 with repo.powerdns.com master builds, the Metronome number equals the sum of the VmData and VmStk values from /proc/PID/status.
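
For illustration, here is a minimal sketch (C++, not the actual Metronome/PowerDNS code) of reading those two fields and summing them; the VmData and VmStk field names come from /proc/PID/status, everything else is made up for the example:

// Sketch: sum VmData and VmStk from /proc/self/status (values are reported in kB).
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main() {
  std::ifstream status("/proc/self/status");
  std::string line;
  long total_kb = 0;
  while (std::getline(status, line)) {
    // Lines look like "VmData:    12345 kB"
    if (line.rfind("VmData:", 0) == 0 || line.rfind("VmStk:", 0) == 0) {
      std::istringstream fields(line);
      std::string key, unit;
      long value_kb = 0;
      fields >> key >> value_kb >> unit;
      total_kb += value_kb;
    }
  }
  std::cout << "VmData + VmStk = " << total_kb << " kB\n";
  return 0;
}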

@ahupowerdns
Contributor

This is related to #7502: the precise memory measurements were too slow. We are very much open to suggestions on how to get a better number that is not too slow to calculate. Note that many kernels are not slow to gather this number, but some are, and on those it caused 750-millisecond slowdowns.
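
If it helps anyone reproduce the cost locally, here is a rough, illustrative timing harness (not PowerDNS code) that just reads and discards /proc/self/smaps, the per-mapping walk that the precise measurement has to do; on the affected kernels this loop is what becomes slow:

// Rough sketch: time how long a full read of /proc/self/smaps takes.
#include <chrono>
#include <fstream>
#include <iostream>
#include <string>

int main() {
  const auto start = std::chrono::steady_clock::now();
  std::ifstream smaps("/proc/self/smaps");
  std::string line;
  long lines = 0;
  while (std::getline(smaps, line)) {
    ++lines;  // just walk the file, as a precise per-mapping measurement would
  }
  const auto elapsed = std::chrono::steady_clock::now() - start;
  std::cout << "read " << lines << " lines in "
            << std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()
            << " us\n";
  return 0;
}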

@omoerbeek
Member

The new counting includes read-only pages and clean read/write pages. Those pages are assigned to the process, but they can be unassigned without cost, so you can argue for counting them either way. statm includes them.

@phonedph1
Contributor

Wouldn't using the resident size from statm, instead of data, stay consistent with the previous version?

On dnsdist 1.4.0-alpha1 the special/non-special numbers are now quite different.

$ dnsdist -e 'dumpStats()' | grep memory
empty-queries                     0    real-memory-usage         2290548736
latency-slow                      0    special-memory-usage        45023232
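
For reference, the resident size mentioned above is the second column of /proc/PID/statm, counted in pages; a minimal sketch (illustrative only, not the dnsdist implementation) of turning it into bytes:

// Sketch: read the resident-set size from /proc/self/statm and convert pages to bytes.
#include <fstream>
#include <iostream>
#include <unistd.h>

int main() {
  std::ifstream statm("/proc/self/statm");
  long size_pages = 0, resident_pages = 0;
  statm >> size_pages >> resident_pages;         // columns: size, resident, ...
  const long page_size = sysconf(_SC_PAGESIZE);  // usually 4096
  std::cout << "resident: " << resident_pages * page_size << " bytes\n";
  return 0;
}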

@omoerbeek
Member

The question of "how much memory does a program use" on a demand-paged virtual memory system with features like shared pages is essentially unanswerable with a single one-dimensional value.

"Resident" does not count pages in swap. "data" includes virtual pages allocated but not touched.
I used https://gist.github.com/omoerbeek/72aa95f7036407722a012be294d442a4
to compare the numbers in various scenarios.

The new measurement shows a different number, yes, but not wrong.
You could regard the name "real-memory-usage" as misleading, though.
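
To make the difference concrete, here is a small illustrative program (assuming Linux with glibc malloc; not project code) that allocates a large buffer without touching it and prints the data and resident columns of /proc/self/statm along the way: data grows at allocation time, resident only once the pages are actually touched.

// Sketch: show "data" vs "resident" from /proc/self/statm around a large allocation.
#include <cstring>
#include <fstream>
#include <iostream>

static void printStatm(const char* label) {
  std::ifstream statm("/proc/self/statm");
  long size = 0, resident = 0, shared = 0, text = 0, lib = 0, data = 0;
  statm >> size >> resident >> shared >> text >> lib >> data;
  std::cout << label << ": data=" << data << " pages, resident=" << resident << " pages\n";
}

int main() {
  printStatm("at start");
  const std::size_t len = 256 * 1024 * 1024;  // 256 MiB
  char* buf = new char[len];                  // allocated but untouched: counted in data
  printStatm("after allocation");
  std::memset(buf, 1, len);                   // touched: now also counted in resident
  printStatm("after touching");
  delete[] buf;
  return 0;
}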

@omoerbeek
Member

We should export all 5 memory stats available from /proc/PID/statm as metrics.

@omoerbeek
Member

The new measurement shows a different number, yes, but not wrong.

Well, it turns out it is wrong some of the time. See #10161

@omoerbeek
Member

We should export all 5 memory stats available from /proc/PID/statm as metrics.

No, that is not a good thing. A few of the numbers there are either useless (always zero) or wrong. See the statm table at
https://www.kernel.org/doc/html/latest/filesystems/proc.html
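
For reference, proc(5) documents seven columns in /proc/PID/statm; a small illustrative snippet that dumps them all (lib and dt are the always-zero ones, unused since Linux 2.6, which is part of why exporting everything would not be useful):

// Sketch: dump all seven /proc/self/statm columns (all values are in pages).
#include <fstream>
#include <iostream>

int main() {
  std::ifstream statm("/proc/self/statm");
  long size = 0, resident = 0, shared = 0, text = 0, lib = 0, data = 0, dt = 0;
  statm >> size >> resident >> shared >> text >> lib >> data >> dt;
  std::cout << "size="    << size    << " resident=" << resident
            << " shared=" << shared  << " text="     << text
            << " lib="    << lib     << " data="     << data
            << " dt="     << dt      << "\n";  // lib and dt: always 0 since Linux 2.6
  return 0;
}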

@aldem

aldem commented Jun 6, 2021

Sorry for reopening, but I believe this change (removal of drs from stats) is not the right thing to do.

First, the referenced kernel documentation is outdated: the description of the statm table mentions ancient pre-3.0 kernels, so it was likely never updated since, and even there "broken" only clarifies that the field also includes text (code), not just data+stack. In any case, there is no evidence that "drs" is broken or misbehaving (i.e. shows wrong data) in recent kernels, or even in 3.10 as used by CentOS 7 (which is history by now anyway). In my experience it has always been correct: it shows the requested (allocated) data+stack size.

Second, RSS by itself is meaningless: it shows only the amount of RAM that is resident. In particular, this means that if memory is aggressively swapped out and stays in swap (for instance, leaked pages that are never referenced after initial use), RSS will never expose the problem.

VmData (aka drs), on the other hand, shows requested (allocated) memory usage, and while it also counts pages that have not (yet) been touched, this is exactly what is needed in most cases: if memory is requested, it will likely be used eventually, so we know the "aim" of the application. There are exceptions of course (apps using sparse memory), but I doubt that pdns is one of them.

Taking all this into account, we can revisit the original problem: if pdns requested more RAM than is available on the system, that would only lead to issues a bit later, once it actually started to use that memory, resulting in heavy swapping or even OOM. So the indication was quite useful.

With all this in mind, I believe both drs and rss should be available in the stats.

@omoerbeek
Member

Some comments:

  • As for the correctness of drs, we have seen reports of it being clearly wrong.
  • Depending on the malloc (de)allocation strategy (in particular whether memory is released back to the OS), even if drs were correct it can include many pages that the application has already released but that are still marked as in use from the kernel's point of view, since malloc hangs on to them.
  • I agree that RSS is meaningless on its own. Memory usage always has to be evaluated in context, and that context is often not even visible from within the application itself.

I agree that the memory stats should be revisited and extended. smaps_rollup might contain more relevant info, without the costs associated with digging through smaps.
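
For example, a minimal sketch (assuming a kernel recent enough to have /proc/PID/smaps_rollup, i.e. 4.14 or later; an illustration, not a proposed implementation) that pulls a few of the aggregated fields:

// Sketch: print the aggregated Rss, Pss and Swap lines from /proc/self/smaps_rollup.
#include <fstream>
#include <iostream>
#include <string>

int main() {
  std::ifstream rollup("/proc/self/smaps_rollup");
  std::string line;
  while (std::getline(rollup, line)) {
    if (line.rfind("Rss:", 0) == 0 || line.rfind("Pss:", 0) == 0 ||
        line.rfind("Swap:", 0) == 0) {
      std::cout << line << "\n";  // values are reported in kB
    }
  }
  return 0;
}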
