Skip to content

Commit

Permalink
memcg: flush rstat for a single tree in memory.stat
Browse files Browse the repository at this point in the history
Commit 11192d9 ("memcg: flush stats only if updated") made memory.stat
to flush the whole cgroup tree, which seems to be too expensive for use
cases like cadvisor, where we iterate through all cgroups and get stats
for each one.

Since commit 2d146aa ("mm: memcontrol: switch to rstat") memory
accounting switched to rstat, similar to cpu accounting. In cpu.stat only
one tree is flushed to read the cpu usage data. This means that for:

* cpu.stat: flush one tree (both cpu and memory)
* memory.stat: flush all cgroups (both cpu and memory)

When cadvisor iterates through every cgroups this means a lot of flushing,
which quickly adds up. Additionally, reading memory.stat and then cpu.stat
for every cgroup is more expensive than reading memory.stat for all of them
and then reading cpu.stat for all of them:

* https://gist.github.com/bobrik/5ba58fb75a48620a1965026ad30a0a13

    completed:  9.55s [manual / cpu-stat + mem-stat]
    completed:  1.52s [manual / mem-stat]
    completed:  0.04s [manual / cpu-stat]

The exact reason for this behavior is unclear and it's probably
a pathological case for our large servers (128 hardware threads)
with deeper than usual cgroup trees.

Having different flushing for reading memory.stat and cpu.stat doesn't
seem right. Let's use the same mechanism for both. This speeds things up:

    completed:  0.20s [manual / cpu-stat + mem-stat]
    completed:  0.12s [manual / mem-stat]
    completed:  0.07s [manual / cpu-stat]

When reading cpu.stat + memory.stat and measuring with bpftrace:

    $ sudo bpftrace -e 'kfunc:cgroup_rstat_flush_locked { if (comm == "derp") { @cgroup_rstat_flush_locked++; } } kfunc:mem_cgroup_css_rstat_flush { if (comm == "derp") { @mem_cgroup_css_rstat_flush++; }}'

Before:

    @cgroup_rstat_flush_locked: 1957
    @mem_cgroup_css_rstat_flush: 1913631

After:

    @cgroup_rstat_flush_locked: 2000
    @mem_cgroup_css_rstat_flush: 3548

The counts are not precise due to data races in bpftrace, but it's clear
that we're doing a lot less work with this patch, which translates in
a much lower runtime.

Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
  • Loading branch information
bobrik committed Aug 11, 2023
1 parent 830b3c6 commit 50b6278
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion mm/memcontrol.c
Original file line number Diff line number Diff line change
Expand Up @@ -1556,7 +1556,7 @@ static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize)
*
* Current memory state:
*/
mem_cgroup_flush_stats();
cgroup_rstat_flush(memcg->css.cgroup);

for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
u64 size;
Expand Down

0 comments on commit 50b6278

Please sign in to comment.