$mem returns incorrect value for used RAM (compared to htop) #130

Closed
OmegaPhil opened this issue Aug 13, 2015 · 20 comments

Comments

@OmegaPhil

Debian Testing recently upgraded conky to v1.10.0-1, and I have noticed that $mem reports ~3GB less used memory than htop. Trusting the latter, RAM usage is currently 11365/32225MB, yet $mem reports 8.06GB used?

@ghost

ghost commented Aug 14, 2015

Do you have by any chance no_buffers = true enabled in your config?

A better explanation than I can provide: https://unix.stackexchange.com/questions/65835/htop-reporting-much-higher-memory-usage-than-free-or-top (although it is for openSUSE and I don't know whether or not Debian has the "patched" version)

@OmegaPhil
Author

When no_buffers is set to no, $mem behaves the same as $memwithbuffers.

Interesting, top and free both say that ~7991MB is used, versus conky's 8.19GB and htop's 11491MB, so that suggests that htop is the one getting it wrong...

@OmegaPhil
Author

I have reported a Debian bug for htop (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=795776), will see what the verdict is.

@OmegaPhil
Author

What do you make of this - htop programmer says his calculation is simple and therefore correct: hishamhm/htop#242

@marcpayne
Contributor

On Arch, with htop 1.0.3, free 3.3.11, and conky from git master, here is what I've found. Each program has a different formula for computing the used memory:

| Program | My Used Memory (GiB) | Formula (KiB) |
|---------|----------------------|---------------|
| htop | 2.54 | MemTotal - MemFree - Buffers - Cached |
| free | 2.37 | MemTotal - MemFree - Buffers - Cached - SReclaimable - SUnreclaim [1] |
| conky | 2.46 | MemTotal - MemFree - Buffers - Cached - SReclaimable + Shmem |

Each value comes from /proc/meminfo. I inferred the formula for free by accounting for the discrepancy between free and htop, not by looking at the source code (so I might be incorrect... but my calculations show I'm spot on).
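As a rough way to reproduce this comparison, here is a minimal Python sketch that applies the three formulas from the table to /proc/meminfo-style text. The sample numbers are invented for illustration, and the helper names (`parse_meminfo`, `used_htop`, `used_free`, `used_conky`) are hypothetical, not taken from any of the three programs.

```python
# Invented sample values in KiB, for illustration only (not real measurements).
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         2000000 kB
Buffers:          200000 kB
Cached:          3000000 kB
Shmem:            150000 kB
Slab:             180000 kB
SReclaimable:     100000 kB
SUnreclaim:        80000 kB
"""


def parse_meminfo(text):
    """Return {field name: value in KiB} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:  # skip blank or malformed lines
            fields[name.strip()] = int(parts[0])
    return fields


def used_htop(m):
    # htop row of the table above
    return m["MemTotal"] - m["MemFree"] - m["Buffers"] - m["Cached"]


def used_free(m):
    # free row: additionally subtracts the whole slab
    return used_htop(m) - m["SReclaimable"] - m["SUnreclaim"]


def used_conky(m):
    # conky row: subtracts only reclaimable slab and adds Shmem back
    return used_htop(m) - m["SReclaimable"] + m["Shmem"]


m = parse_meminfo(SAMPLE_MEMINFO)
for label, formula in (("htop", used_htop), ("free", used_free), ("conky", used_conky)):
    print(f"{label:5s} {formula(m) / 1024 / 1024:.2f} GiB")
```

On a Linux machine, passing `open("/proc/meminfo").read()` to `parse_meminfo` instead of the sample string should give the live numbers for comparison.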

The following snippet from conky's linux.cc (around line 209) gives some rationale for its algorithm:

info.mem = info.memwithbuffers = info.memmax - info.memfree;
info.memeasyfree = info.memfree;
info.swap = info.swapmax - info.swapfree;

/* Reclaimable memory: does not include shared memory, which is part of cached but unreclaimable.
   Includes the reclaimable part of the Slab cache though.
   Note: when shared memory is swapped out, shmem decreases and swapfree decreases - we want this.
*/
info.bufmem = (info.cached - shmem) + info.buffers + sreclaimable;

/* Now (info.mem - info.bufmem) is the *really used* (aka unreclaimable) memory.
   When this value reaches the size of the physical RAM, and swap is full or non-present, OOM happens.
   Therefore this is the value users want to monitor, regarding their RAM.
*/

Hopefully this clarifies what is happening. I can't comment on which formula is "correct" or most accurate because I don't know.

[1] free from procps-ng uses MemTotal - MemFree - Buffers - Cached - Slab, but it turns out that Slab is just SReclaimable + SUnreclaim. If the used memory comes out negative, free uses MemTotal - MemFree. See the source.
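The fallback described in footnote [1] could be sketched like this in Python. This is only an illustration of the rule as stated in the footnote, not a port of the procps-ng source; the function name `free_used_kib` and the example numbers are made up.

```python
def free_used_kib(mem_total, mem_free, buffers, cached, slab):
    """Used memory (KiB) following the rule described in footnote [1]."""
    used = mem_total - mem_free - buffers - cached - slab
    if used < 0:
        # Fallback from the footnote: ignore buffers, cache and slab entirely.
        used = mem_total - mem_free
    return used


# Normal case: the subtraction stays positive.
print(free_used_kib(8000000, 2000000, 200000, 3000000, 180000))
# Fallback case: buffers + cached + slab exceed what is not free.
print(free_used_kib(1000, 100, 500, 400, 50))
```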

@OmegaPhil
Author

Thanks for this - looks like it needs a research project from me to see how I want to go forward. Regardless, for key tools (that are trusted to report reality) like free and htop to disagree so wildly is incompetent and unacceptable, but I'm not in a position to be able to push for a solution atm.

@OmegaPhil
Author

Right, spent some time with this - so basically for used memory, htop does not subtract slab, conky subtracts the slab but without unreclaimable (which makes sense), and free subtracts the whole slab (which includes unreclaimable and therefore doesn't make sense).

So with this, conky is right and the others are wrong - according to kernel Documentation/filesystems/proc.txt:

Slab: in-kernel data structures cache
SReclaimable: Part of Slab, that might be reclaimed, such as caches
SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure

That is the official definition of slab (procps sysinfo.c:meminfo also comments that SReclaimable is 'dentry and inode structures').

I'll ask why free excludes SUnreclaim from its used memory count.

@OmegaPhil
Author

Debian bug reported at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799716 (looks like the maintainer is also upstream)

@marcpayne
Contributor

@OmegaPhil This is very interesting, and I think your analysis is correct. I look forward to hearing what the free devs say.

Also, I corrected my comment above. I mistakenly wrote "free" in place of "used" in a couple spots. That would have been confusing for anyone watching this discussion.

@OmegaPhil
Author

Just an update - I'm back on this task. free was finally fixed at the end of October so that it no longer subtracts SUnreclaim as if it were cache, so now it agrees with conky. I'll see if htop can be fixed.

@OmegaPhil
Author

The situation is bad with htop: it's been a known issue since at least April 2014 and even has a pull request for a partial fix, but it has mainly been ignored... I have offered to help there, but I'm not sure that's going to change things in the short term.

Sorry for doubting conky! Congratulations for having a tool more accurate than the dedicated official ones for this purpose!

@OmegaPhil
Author

Hmm, might be premature to announce that accuracy... so, conky includes Shmem in used memory - is this not double counting?

The number of SHMEM pages comes from mm/page_alloc.c:si_meminfo:

val->sharedram = global_page_state(NR_SHMEM);

global_page_state is defined in include/linux/vmstat.h; all it's doing is reading ordinal NR_SHMEM from an array.

Looking up what NR_SHMEM is, include/linux/mmzone.h defines in the zone_stat_item enum:

/* shmem pages (included tmpfs/GEM pages) */

Looking into it as a filesystem led me to this page - apparently the filesystem is kernel-managed and hidden from userspace, so I can't really go further there.

Looking into what a process' RSS actually is eventually led me to include/linux/mm.h:get_mm_rss:

return get_mm_counter(mm, MM_FILEPAGES) + get_mm_counter(mm, MM_ANONPAGES);

Googling on MM_FILEPAGES and MM_ANONPAGES led me to this interesting patch:

Currently looking at /proc/<pid>/status or statm, there is no way to
distinguish shmem pages from pages mapped to a regular file (shmem
pages are mapped to /dev/zero), even though their implication in
actual memory use is quite different.

This patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for
shmem pages instead of MM_FILEPAGES.

So... that suggests that Shmem is already included in the normal process RSS value. So tools such as free obviously already cope with the 'normal' memory processes use, yet they don't take Shmem from meminfo into account - so why is conky using it here?

Edit: conky's src/linux.cc says:

Reclaimable memory: does not include shared memory, which is part of cached but unreclaimable

Does that make sense with what I've found? How can it be part of the cache but be in a process' RSS?

@hishamhm

@OmegaPhil, was that "Add shmem resident memory accounting" patch accepted?

As far as I understand it, since shmem includes tmpfs, it should be the case that:

  • it does not belong to a single process's memory (since tmpfs is mounted as a filesystem, and not tied to one process)
  • is not reclaimable as cache (since the memory in use with files will remain for that purpose until the files are deleted or the tmpfs is unmounted)

(Side note: I do have vague memories that tmpfs does not really lock physical RAM for its entire size, so it might be the case that "free space" in a tmpfs partition is reclaimable, but don't quote me on that.)

Anyway, from my understanding it shouldn't be counted as part of a process' space.

@OmegaPhil
Author

Well, something is going on - the patches are currently being discussed, mainly here and here

E.g.:

> + VmShm                         size of resident shmem memory

Needs to say that includes mappings of tmpfs, and needs to say that
it's a subset of VmRSS.  Better placed immediately after VmRSS...

Resident Set Size is considered the true allocated memory for a program?

@hishamhm

Resident Set Size is considered the true allocated memory for a program?

There is no simple answer for "this program is taking this much memory", because of shared pages. So it's always a confusing metric for users, that's why the default top UI uses VIRT, RES and SHR, and that's why I kept the UI. SHR is a subset of RES, but SHR memory may or may not be really shared.
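For anyone wanting to look at these three numbers directly, here is a small Python sketch that parses /proc/<pid>/statm-style text. It assumes the first three statm fields (size, resident, shared, counted in pages) correspond to what top shows as VIRT, RES and SHR, and it assumes a 4096-byte page; `parse_statm`, `ProcMem` and the sample line are hypothetical.

```python
from collections import namedtuple

# Hypothetical container for the three per-process figures, in KiB.
ProcMem = namedtuple("ProcMem", "virt_kib res_kib shr_kib")


def parse_statm(text, page_size=4096):
    """Convert the first three statm fields (in pages) to KiB.

    page_size is an assumption; on a live system it can be read with
    os.sysconf("SC_PAGE_SIZE").
    """
    size, resident, shared = (int(x) for x in text.split()[:3])
    kib_per_page = page_size // 1024
    return ProcMem(size * kib_per_page, resident * kib_per_page, shared * kib_per_page)


# Invented sample line in statm layout: size resident shared text lib data dt
sample = "50000 12000 3000 100 0 20000 0"
mem = parse_statm(sample)
print(mem)
print("SHR is within RES:", mem.shr_kib <= mem.res_kib)
```

On Linux, `parse_statm(open("/proc/self/statm").read())` should report the figures for the Python process itself.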

As OSes get more advanced with various forms of caching and sharing, things are only getting more complicated, and from the links you said, it's pretty much in flux (and when even Andrew Morton posts "So now this little reader is all confused", imagine how I feel trying to follow that :) )

@OmegaPhil
Author

tldr; Cached is wrong, conky is correct, Shmem needs to be removed from Cached.

I went back to looking into Shmem to get a better feeling for shared pages etc. The best resource I've found so far is the old link I gave before, covering kernel v2.4 and 2.6 and hosted on the kernel site: Understanding The Linux Virtual Memory Manager, specifically the Shared Memory Virtual Filesystem chapter.

The shared memory we are interested in is non-file-backed mmap'd pages and pages created via shmget - this 'shared virtual memory filesystem' has been created specially to store these pages and allow their manipulation through a normal filesystem API, even though the relevant data has nothing to do with a real file. Since files aren't involved, the pages can't be viewed as a cache that can be evicted and just read in again. It turns out that the 'shm' filesystem the book refers to is actually tmpfs nowadays - reading the tmpfs documentation, it clearly says it has superseded shmfs, and that one of its jobs is to be the hidden kernel filesystem for 'shared anonymous mappings and SYSV shared memory'. It also says:

'Since tmpfs lives completely in the page cache and on swap, all tmpfs
pages currently in memory will show up as cached. It will not show up
as shared or something like that'

This is unacceptable as the whole point of Cached is that it's evictable data, i.e. 'not really used memory' - so this demonstrates there is a real problem, and meminfo's Shmem does need to be removed from Cached.

So this backs up what conky is doing, introduced in this commit and backed up by the author's blog post, where he found that shitloads of data in a tmpfs and very large shared memory usage by postgres were causing OOM issues even when lots of free memory was reported.
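To put numbers on the argument above, here is a Python sketch with an invented machine: 16 GiB of RAM and 6 GiB of files sitting in a tmpfs. All values are hypothetical. `conky_used` follows the `info.bufmem` comment quoted earlier in this thread, while `naive_used` treats all of Cached as evictable.

```python
KIB_PER_GIB = 1024 * 1024

# Hypothetical meminfo values in KiB; Cached includes the 6 GiB of tmpfs pages.
meminfo = {
    "MemTotal": 16 * KIB_PER_GIB,
    "MemFree": 1 * KIB_PER_GIB,
    "Buffers": 1 * KIB_PER_GIB,
    "Cached": 9 * KIB_PER_GIB,
    "Shmem": 6 * KIB_PER_GIB,
    "SReclaimable": 1 * KIB_PER_GIB,
}


def naive_used(m):
    """Treat every Cached page as reclaimable, tmpfs pages included."""
    return m["MemTotal"] - m["MemFree"] - m["Buffers"] - m["Cached"]


def conky_used(m):
    """Follow the bufmem comment: shmem is part of Cached but not reclaimable."""
    mem = m["MemTotal"] - m["MemFree"]
    bufmem = (m["Cached"] - m["Shmem"]) + m["Buffers"] + m["SReclaimable"]
    return mem - bufmem


print("naive used:", naive_used(meminfo) / KIB_PER_GIB, "GiB")
print("conky used:", conky_used(meminfo) / KIB_PER_GIB, "GiB")
```

With these made-up values the two results differ by Shmem minus SReclaimable, which is the kind of gap that could hide an approaching OOM if only the naive figure is watched.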

@OmegaPhil
Author

Just a heads up, but the shmem issue has come back on the procps mailing list:

http://www.freelists.org/post/procps/OmegaPhilxxxxxxxxxxxxx-Bug799716-free-considers-cached-to-include-SUnreclaim,7

Jaromir is saying that only Available has any use for determining free memory, although the associated calculations look pretty dodgy.

avagin pushed a commit to avagin/procps-task-diag that referenced this issue Mar 30, 2016
The previous commit added all of slab into the cache value. The
thing was is cached in this context is something you can get
back and reclaim if under memory pressure.

The problem was slab parameter includes both reclaimable and
unreclaimable values which doesn't make sense in this context.
This commit make cached only use the reclaimable component.

References:
 http://www.freelists.org/post/procps/OmegaPhilxxxxxxxxxxxxx-Bug799716-free-considers-cached-to-include-SUnreclaim
 brndnmtthws/conky#130
 https://bugs.debian.org/799716

Commits:
 6cb75ef
@OmegaPhil
Author

@lasers
Contributor

lasers commented Aug 4, 2018


Hi everybody. Housekeeper here. Are we good here to close this issue? [${mem} / ${memwithbuffers} / ${memmax} :: SWAP ${swap}/${swapmax}] seems identical to htop. Let me know. Thanks.

@OmegaPhil
Author

Yep, is fine here with 1.10.8-1 - thanks for the ping, closing.

ugiwgh pushed a commit to ugiwgh/procps-ng that referenced this issue Sep 4, 2018
The previous commit added all of slab into the cache value. The
thing was is cached in this context is something you can get
back and reclaim if under memory pressure.

The problem was slab parameter includes both reclaimable and
unreclaimable values which doesn't make sense in this context.
This commit make cached only use the reclaimable component.

References:
 http://www.freelists.org/post/procps/OmegaPhilxxxxxxxxxxxxx-Bug799716-free-considers-cached-to-include-SUnreclaim
 brndnmtthws/conky#130
 https://bugs.debian.org/799716

Commits:
 05d751c
 6cb75ef