memcg: flush rstat for a single tree in memory.stat
Commit 11192d9 ("memcg: flush stats only if updated") made memory.stat flush the whole cgroup tree, which seems to be too expensive for use cases like cadvisor, where we iterate through all cgroups and read stats for each one. Since commit 2d146aa ("mm: memcontrol: switch to rstat"), memory accounting uses rstat, just like cpu accounting. In cpu.stat only one tree is flushed to read the cpu usage data. This means that for:

* cpu.stat: flush one tree (both cpu and memory)
* memory.stat: flush all cgroups (both cpu and memory)

When cadvisor iterates through every cgroup, this means a lot of flushing, which quickly adds up. Additionally, reading memory.stat and then cpu.stat for every cgroup is more expensive than reading memory.stat for all of them and then reading cpu.stat for all of them:

* https://gist.github.com/bobrik/5ba58fb75a48620a1965026ad30a0a13

completed:  9.55s [manual / cpu-stat + mem-stat]
completed:  1.52s [manual / mem-stat]
completed:  0.04s [manual / cpu-stat]

The exact reason for this behavior is unclear; it is probably a pathological case for our large servers (128 hardware threads) with deeper than usual cgroup trees.

Having different flushing behavior for reading memory.stat and cpu.stat doesn't seem right. Let's use the same mechanism for both. This speeds things up:

completed:  0.20s [manual / cpu-stat + mem-stat]
completed:  0.12s [manual / mem-stat]
completed:  0.07s [manual / cpu-stat]

When reading cpu.stat + memory.stat and measuring with bpftrace:

$ sudo bpftrace -e 'kfunc:cgroup_rstat_flush_locked { if (comm == "derp") { @cgroup_rstat_flush_locked++; } } kfunc:mem_cgroup_css_rstat_flush { if (comm == "derp") { @mem_cgroup_css_rstat_flush++; } }'

Before:

@cgroup_rstat_flush_locked: 1957
@mem_cgroup_css_rstat_flush: 1913631

After:

@cgroup_rstat_flush_locked: 2000
@mem_cgroup_css_rstat_flush: 3548

The counts are not precise due to data races in bpftrace, but it's clear that we are doing a lot less work with this patch, which translates into a much lower runtime.

Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
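For context, the idea can be sketched as below. This is a schematic illustration, not the actual diff (which is not shown on this page): the mem_cgroup_flush_stats() helper and the surrounding threshold/locking logic in mm/memcontrol.c are simplified here, and the real patch may differ in detail.

    /*
     * Schematic sketch only, not the actual patch. Before: reading
     * memory.stat flushes rstat for every cgroup by flushing the
     * whole hierarchy from the root.
     */
    static void mem_cgroup_flush_stats(void)
    {
            /* simplified: the real helper also checks an update threshold */
            cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
    }

    /*
     * After: flush only the rstat subtree of the cgroup being read,
     * mirroring how cpu.stat flushes a single tree. Readers such as
     * memory_stat_show() would pass the memcg they are reading.
     */
    static void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
    {
            cgroup_rstat_flush(memcg->css.cgroup);
    }

A subtree flush should be sufficient for this reader because a cgroup's memory.stat aggregates only its own and its descendants' counters, all of which live in the flushed subtree.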