This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

docs: benchmark stage1 systemd resource usage #1788

Closed
jonboulle opened this issue Nov 24, 2015 · 28 comments

Comments

@jonboulle
Contributor

Inside the two default stage1 backends (the standard systemd-nspawn-based one and the LKVM one), rkt uses systemd to manage the process lifecycle and provide some other features like logging (via systemd-journald). This means there is some resource usage overhead associated with each pod, beyond just the resource consumption of the constituent apps. This is by design and a known consequence of rkt's daemonless execution model.

For the most part, this overhead should be minimal - on the order of a few megabytes of memory and negligible CPU. But it's important to have a solid understanding of what exactly this overhead is, and to track it over time (e.g. to ensure we don't have any regressions which dramatically increase this overhead), particularly when looking to integrate rkt with cluster manager systems like Kubernetes.

@jonboulle
Contributor Author

/cc @yifan-gu @dchen1107

@vishh

vishh commented Dec 2, 2015

cc @vishh

@cgonyeo
Member

cgonyeo commented Dec 8, 2015

I wrote a small Go program that runs rkt and monitors the CPU and memory usage of rkt and its children via /proc.

For lack of a better place, I put it in a new repo: https://github.com/dgonyeo/rkt-monitor. If others like it perhaps a subdirectory in the rkt repo would be a better home for it.

Right now some of the logic is kinda hacky (panics everywhere), but it could be cleaned up fairly easily.

It has access to a lot of information, but right now only prints a small subset of it. That could be easily changed, if more detailed reports would be useful.

I imagine some CI could be set up to build/run this against rkt with a handful of ACIs, and flag any PRs that result in systemd, systemd-journal, or rkt using an abnormally high amount of resources.

Sample output:

systemd-journal(6741): seconds alive: 9  avg CPU: 0%  avg Mem: 29 kB  peak Mem: 45 kB
worker(6746): seconds alive: 9  avg CPU: 23%  avg Mem: 238 kB  peak Mem: 360 kB
rkt(6688): seconds alive: 10  avg CPU: 0%  avg Mem: 26 kB  peak Mem: 27 kB
systemd(6740): seconds alive: 9  avg CPU: 0%  avg Mem: 35 kB  peak Mem: 35 kB
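For readers curious what such a monitor does under the hood: the per-process numbers come from parsing /proc/<pid>/stat. A minimal sketch of that parsing (not the actual rkt-monitor code; field offsets per stat(5), shown here against a canned stat line):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ProcStat holds the /proc/<pid>/stat fields that matter for this kind of
// monitoring: CPU time in clock ticks and resident set size in pages.
type ProcStat struct {
	Comm  string
	Utime uint64 // user-mode CPU time, in clock ticks (field 14 of stat(5))
	Stime uint64 // kernel-mode CPU time, in clock ticks (field 15)
	RSS   int64  // resident set size, in pages (field 24)
}

// parseStat parses one /proc/<pid>/stat line. The comm field is wrapped in
// parentheses and may itself contain spaces, so it is delimited by the
// first '(' and the last ')'.
func parseStat(line string) (ProcStat, error) {
	open := strings.IndexByte(line, '(')
	end := strings.LastIndexByte(line, ')')
	if open < 0 || end < open {
		return ProcStat{}, fmt.Errorf("malformed stat line")
	}
	// After the ')' the remaining fields start at field 3 (state), so
	// field N of stat(5) sits at index N-3 here.
	fields := strings.Fields(line[end+1:])
	if len(fields) < 22 {
		return ProcStat{}, fmt.Errorf("too few fields in stat line")
	}
	utime, err := strconv.ParseUint(fields[11], 10, 64)
	if err != nil {
		return ProcStat{}, err
	}
	stime, err := strconv.ParseUint(fields[12], 10, 64)
	if err != nil {
		return ProcStat{}, err
	}
	rss, err := strconv.ParseInt(fields[21], 10, 64)
	if err != nil {
		return ProcStat{}, err
	}
	return ProcStat{Comm: line[open+1 : end], Utime: utime, Stime: stime, RSS: rss}, nil
}

func main() {
	// A canned stat line; on a live system this would come from
	// reading /proc/<pid>/stat for each child process.
	line := "6741 (systemd-journal) S 1 6741 6741 0 -1 4194624 500 0 0 0 120 35 0 0 20 0 1 0 12345 10485760 182"
	st, err := parseStat(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: utime=%d stime=%d rss=%d pages\n", st.Comm, st.Utime, st.Stime, st.RSS)
	// → systemd-journal: utime=120 stime=35 rss=182 pages
}
```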

@alban
Member

alban commented Dec 8, 2015

systemd-cgtop can also provide information. I have a fairly old systemd on my Fedora 22 host, so I am running systemd-cgtop from a debootstrap tree in order to use the latest version:

$ sudo systemd-nspawn -D /home/alban/distro-trees/debian/ \
        --share-system \
        --bind /run --bind /var/run \
        systemd-cgtop --depth=4 --order=memory --iterations=1 --batch

Control Group                                          Procs   %CPU   Memory  Input/s Output/s
/                                                        194      -     5.9G        -        -
/machine.slice                                             8      -    19.7M        -        -
/machine.slice/ma...7f\x2d84cd\x2df562edf4c99d.scope       4      -     7.1M        -        -
/machine.slice/ma...7f\x2d9dc4\x2d3c82ca318a9d.scope       4      -     4.8M        -        -
/machine.slice/ma...df562edf4c99d.scope/system.slice       3      -     2.1M        -        -
/machine.slice/ma....scope/system.slice/etcd.service       1      -     1.4M        -        -
/machine.slice/ma...scope/system.slice/redis.service       1      -   768.0K        -        -
/machine.slice/ma...d3c82ca318a9d.scope/system.slice       3      -   108.0K        -        -
/machine.slice/ma...ope/system.slice/busybox.service       2      -   108.0K        -        -
/machine.slice/ma...x2d3c82ca318a9d.scope/init.scope       1      -        -        -        -
/machine.slice/ma...m.slice/systemd-journald.service       1      -        -        -        -
/machine.slice/ma...x2df562edf4c99d.scope/init.scope       1      -        -        -        -
/machine.slice/ma...m.slice/systemd-journald.service       1      -        -        -        -

In this example, I am running etcd and redis in the first pod and busybox in the second pod.

The first pod takes 7.1M of memory in total, but the applications etcd and redis only take 1.4M and 768.0K respectively. That's roughly 4.9M of overhead for this pod.

When several pods are running with the same stage1, they are all running the same version of systemd, systemd-journald, etc. It would be nice to check that these are loaded into memory only once, thanks to shared libraries and overlay fs.

When checking /proc/$SYSTEMD_PID/maps for each pod, I see that the same inodes are used for e.g. /lib64/libselinux.so.1 across pods when overlay fs is enabled, and different inodes when it is disabled. Is there a way to check that the memory pages are actually shared?
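One way to answer the shared-pages question is the Pss field in /proc/<pid>/smaps: Pss charges each shared page proportionally to the number of processes mapping it, so a mapping whose Pss is far below its Rss is genuinely shared. A sketch of summing those fields (sumSmapsField is a hypothetical helper, shown here against a canned smaps excerpt):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sumSmapsField sums one field (e.g. "Rss", "Pss", "Shared_Clean") across
// all mappings in the text of a /proc/<pid>/smaps file, returning kilobytes.
func sumSmapsField(smaps, field string) int {
	total := 0
	prefix := field + ":"
	for _, line := range strings.Split(smaps, "\n") {
		if strings.HasPrefix(line, prefix) {
			f := strings.Fields(line)
			if len(f) >= 2 {
				kb, _ := strconv.Atoi(f[1])
				total += kb
			}
		}
	}
	return total
}

func main() {
	// Canned excerpt of an smaps file; on a live system this would come
	// from reading /proc/<pid>/smaps for each pod's systemd.
	sample := `7f2a00000000-7f2a00022000 r-xp 00000000 08:01 393232 /lib64/libselinux.so.1
Rss:                 148 kB
Pss:                  24 kB
Shared_Clean:        148 kB
7f2a00223000-7f2a00224000 rw-p 00022000 08:01 393232 /lib64/libselinux.so.1
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
`
	rss := sumSmapsField(sample, "Rss")
	pss := sumSmapsField(sample, "Pss")
	fmt.Printf("Rss=%d kB Pss=%d kB\n", rss, pss)
	// → Rss=152 kB Pss=28 kB  (Pss well below Rss: the text pages are shared)
}
```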

The host flavor copies the systemd, systemd-journald, etc. binaries into stage1 on each rkt run instead of relying on bind mounts or overlay fs. This is something that could be fixed with bind mounts.

@jellonek
Contributor

jellonek commented Dec 8, 2015

Informational - in the next two weeks I will be working on adding Clear Containers bits into the kvm flavor (a shared base image with the systemd/sshd/init/enter/gc binaries, and then DAX), which should greatly reduce memory overhead in this flavor.

@alban
Member

alban commented Dec 14, 2015

About systemd-cgtop: it prints more info when the accounting options are enabled in /etc/systemd/system.conf (+reboot):

DefaultMemoryAccounting=yes

systemd/systemd#2163
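For completeness, the full set of global accounting defaults discussed in this thread would look something like this in /etc/systemd/system.conf (option names as documented in systemd-system.conf(5); a reboot is needed, as noted above):

```ini
# /etc/systemd/system.conf — enable resource accounting for all units
# by default, instead of per-unit CPUAccounting=/MemoryAccounting= settings.
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes
```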

@jonboulle
Contributor Author

@dgonyeo bump

@cgonyeo
Member

cgonyeo commented Dec 16, 2015

I'm looking at how to use systemd-cgtop now, and the man page for it says that in order to get resource monitoring for a unit, to "add the CPUAccounting=1, MemoryAccounting=1 and BlockIOAccounting=1 settings in the unit files in question". I was planning on launching rkt via systemd-run, and I see that it has a flag, -p, for setting a unit property. This "takes an assignment in the same format as systemctl(1)'s set-property command", which another man page provides the following example for: systemctl set-property foobar.service CPUShares=777. In this example, does anyone know what foobar.service should be in my systemd-run call?

I did try to omit foobar.service, but the following resulted in systemd-cgtop not showing me resource usage:

sudo systemd-run -p CPUAccounting=1 -p MemoryAccounting=1 -p BlockIOAccounting=1 rkt run --insecure-options=image /home/derek/git/rkt-monitor/mem-stresser.aci

@alban
Member

alban commented Dec 16, 2015

@dgonyeo I think it is easier to enable the accounting options in /etc/systemd/system.conf on the host, and then it is no longer necessary to add it per unit.

@cgonyeo
Member

cgonyeo commented Dec 16, 2015

That does have the caveat that you'll only be able to use this tool on properly configured systems. That's easier for me though, so unless someone knows how to properly configure systemd-run for this I'll just go with that.

@cgonyeo
Member

cgonyeo commented Dec 18, 2015

I've got systemd-cgtop reporting memory/cpu usage now, but some things don't seem quite right to me:

The labels for the columns on the right are: Procs, %CPU, Memory, Input/s, Output/s

/system.slice/run-r558090defc7c407aba72f27d057a8156.service                                             4   94.0    84.4M        -        -
/system.slice/run-r558090defc7c407aba72f27d057a8156.service/system.slice                                2   22.2  1020.0K        -        -
/system.slice/run-r558090defc7c407aba72f27d057a8156.service/system.slice/rkt-log-stresser.service       1   22.2  1020.0K        -        -
/system.slice/run-r558090defc7c407aba72f27d057a8156.service/system.slice/systemd-journald.service       1      -        -        -        -

This is with the log-stresser ACI, that just spits out timestamps in an infinite loop to generate log messages.

On my machine systemd-journald always shows "-" for memory, I can't see systemd's usage in here, and the outer service is taking up way more memory than the pod's slices are. Should I just move forward with recording all the numbers systemd-cgtop gives me, or is there anything off here?

@alban alban modified the milestones: v0.15.0, v0.14.0 Dec 18, 2015
@cgonyeo
Member

cgonyeo commented Dec 22, 2015

Well, until I figure out the systemd-cgtop thing (sorry, I've been slow on that), I just poked the gopsutil implementation again. Since I switched to using gopsutil instead of reading /proc myself, I'm a little more confident in these numbers.

I was also unsure whether to display memory usage based on the VMS or the RSS of a given process, but based on this SO post I changed it to RSS.

CPU reporting is also currently broken. I know how to fix it, but it's not worth the time if I'm ditching this implementation soon anyway.

derek@rokot ~/git/rkt-monitor> sudo ./rkt-monitor log-stresser.aci
worker(7005): seconds alive: 9  avg CPU: 0.000000%  avg Mem: 5 mB  peak Mem: 6 mB
rkt(6978): seconds alive: 10  avg CPU: 0.000000%  avg Mem: 2 mB  peak Mem: 2 mB
systemd(7000): seconds alive: 9  avg CPU: 0.000000%  avg Mem: 4 mB  peak Mem: 4 mB
systemd-journal(7001): seconds alive: 9  avg CPU: 0.000000%  avg Mem: 7 mB  peak Mem: 7 mB
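The VMS-vs-RSS distinction above maps directly onto /proc/<pid>/statm: the first field is total virtual size and the second is resident pages, both in pages. A minimal sketch converting those to bytes (canned input; a 4 KiB page size is assumed for the example):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vmsRSSFromStatm parses the first two fields of /proc/<pid>/statm — total
// program size (VMS) and resident set size (RSS), both in pages — and
// converts them to bytes with the given page size.
func vmsRSSFromStatm(statm string, pageSize int64) (vms, rss int64, err error) {
	f := strings.Fields(statm)
	if len(f) < 2 {
		return 0, 0, fmt.Errorf("malformed statm: %q", statm)
	}
	sizePages, err := strconv.ParseInt(f[0], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	resPages, err := strconv.ParseInt(f[1], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	return sizePages * pageSize, resPages * pageSize, nil
}

func main() {
	// Canned statm content; live data would come from /proc/<pid>/statm,
	// and the real page size from os.Getpagesize().
	vms, rss, err := vmsRSSFromStatm("2048 512 300 10 0 200 0", 4096)
	if err != nil {
		panic(err)
	}
	fmt.Printf("VMS=%d bytes RSS=%d bytes\n", vms, rss)
	// → VMS=8388608 bytes RSS=2097152 bytes
}
```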

@jonboulle
Contributor Author

CPU usage should be an easy fix, right? IMHO let's get that going so we have a basic sample set of the whole picture.

@alban alban modified the milestones: v0.16.0, v0.15.0 Jan 7, 2016
@alban alban modified the milestones: v1.0.0, v0.16.0 Jan 20, 2016
@jonboulle
Contributor Author

ping @dgonyeo

@cgonyeo
Member

cgonyeo commented Jan 25, 2016

Ok, I finally had time to revisit this. I think the numbers are accurate, and I got the CPU reporting fixed:

derek@rokot ~/go/src/github.com/dgonyeo/rkt-monitor> go build && sudo ./rkt-monitor mem-stresser.aci
worker(14111): seconds alive: 9  avg CPU: 96.989163%  avg Mem: 35 mB  peak Mem: 35 mB
rkt(14086): seconds alive: 10  avg CPU: 2.990006%  avg Mem: 364 kB  peak Mem: 364 kB
systemd(14106): seconds alive: 9  avg CPU: 0.000000%  avg Mem: 4 mB  peak Mem: 4 mB
systemd-journal(14107): seconds alive: 9  avg CPU: 0.000000%  avg Mem: 6 mB  peak Mem: 6 mB

What else do you guys want to see? Maybe 95th percentile CPU and memory usage? More configuration over what it can print, or how rkt is called?

I'd like to add this to rkt's benchmark tests once we're happy with the state of this, ideally ASAP.

@vishh

vishh commented Jan 25, 2016

Can you describe what the workload does? Ideally, we will need the base usage, and also usage under stress (heavy logging, or continual restarts, etc.).


@iaguis iaguis modified the milestones: v1.1.0, v1.0.0 Feb 2, 2016
@cgonyeo
Member

cgonyeo commented Feb 10, 2016

I was imagining something along the lines of a make benchmark, so it'd be easy to see performance metrics on any given setup.

@alban alban modified the milestones: v1.2.0, v1.1.0 Feb 24, 2016
@iaguis iaguis modified the milestones: v1+, v1.2.0 Mar 18, 2016
@jonboulle jonboulle modified the milestones: v1.3.0, v1+ Mar 20, 2016
@jonboulle
Contributor Author

@dgonyeo can you pick this up for the next release?

@cgonyeo
Member

cgonyeo commented Mar 21, 2016

I'll be working on this this week. I'm going to switch over to using cadvisor (which, until a PR is merged, will be a specific branch in @sjpotter's fork).

Scenarios I'm going to get benchmarks for:

  • An ACI that logs as much as possible
  • A pod with an absurd number of apps in it
  • Many pods at once

And I imagine it'll run these benchmarks against each available stage1 by default. If anyone has additional scenarios I should be looking at, please speak up.

@jonboulle
Contributor Author

If the rkt-monitor stuff is already working can we just land that as-is and consider moving to cadvisor later after things have settled over there?

@cgonyeo
Member

cgonyeo commented Mar 24, 2016

Made a PR for it: #2324

@alban alban modified the milestones: v1.4.0, v1.3.0 Mar 31, 2016
@iaguis iaguis modified the milestones: v1.5.0, v1.4.0 Apr 14, 2016
@jonboulle
Contributor Author

@dgonyeo before closing this out - could you produce some example stats please? e.g. against the last release and against master?

@cgonyeo
Member

cgonyeo commented Apr 26, 2016

Certainly.

derek@proton ~> cat /proc/cpuinfo | grep "model name"
model name  : Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
model name  : Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
model name  : Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
model name  : Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
derek@proton ~> uname -a                             
Linux proton 4.4.6 #1-NixOS SMP Wed Mar 16 15:43:17 UTC 2016 x86_64 GNU/Linux

v1.4.0

log-stresser.aci

derek@proton ~/go/src/github.com/coreos/rkt/tests/rkt-monitor> sudo ./rkt-monitor log-stresser.aci
rkt(18493): seconds alive: 10  avg CPU: 28.314541%  avg Mem: 2 mB  peak Mem: 2 mB
systemd(18515): seconds alive: 9  avg CPU: 0.000000%  avg Mem: 4 mB  peak Mem: 4 mB
systemd-journal(18517): seconds alive: 9  avg CPU: 88.397098%  avg Mem: 7 mB  peak Mem: 7 mB
worker(18521): seconds alive: 9  avg CPU: 7.330367%  avg Mem: 5 mB  peak Mem: 6 mB

mem-stresser.aci

derek@proton ~/go/src/github.com/coreos/rkt/tests/rkt-monitor> sudo ./rkt-monitor mem-stresser.aci 
worker(18634): seconds alive: 9  avg CPU: 98.550401%  avg Mem: 318 mB  peak Mem: 555 mB
rkt(18599): seconds alive: 10  avg CPU: 3.583814%  avg Mem: 2 mB  peak Mem: 2 mB
systemd(18628): seconds alive: 9  avg CPU: 0.000000%  avg Mem: 4 mB  peak Mem: 4 mB
systemd-journal(18630): seconds alive: 9  avg CPU: 0.000000%  avg Mem: 6 mB  peak Mem: 6 mB

cpu-stresser.aci

derek@proton ~/go/src/github.com/coreos/rkt/tests/rkt-monitor> sudo ./rkt-monitor cpu-stresser.aci 
rkt(18706): seconds alive: 10  avg CPU: 3.587050%  avg Mem: 2 mB  peak Mem: 2 mB
systemd(18736): seconds alive: 9  avg CPU: 0.000000%  avg Mem: 4 mB  peak Mem: 4 mB
systemd-journal(18740): seconds alive: 9  avg CPU: 0.000000%  avg Mem: 6 mB  peak Mem: 6 mB
worker(18744): seconds alive: 9  avg CPU: 88.937493%  avg Mem: 808 kB  peak Mem: 808 kB

too-many-apps.podmanifest

derek@proton ~/go/src/github.com/coreos/rkt/tests/rkt-monitor> sudo ./rkt-monitor too-many-apps.podmanifest -d 30s
# Identical (aside from PID) worker-binary lines removed
worker-binary(24029): seconds alive: 18  avg CPU: 0.000000%  avg Mem: 840 kB  peak Mem: 840 kB
worker-binary(23691): seconds alive: 19  avg CPU: 0.000000%  avg Mem: 840 kB  peak Mem: 840 kB
rkt(23316): seconds alive: 23  avg CPU: 9.484321%  avg Mem: 5 mB  peak Mem: 20 mB
rkt(23378): seconds alive: 1  avg CPU: 0.000000%  avg Mem: 15 mB  peak Mem: 15 mB
systemd(23408): seconds alive: 19  avg CPU: 1.614304%  avg Mem: 16 mB  peak Mem: 16 mB
systemd-journal(23411): seconds alive: 19  avg CPU: 0.000000%  avg Mem: 6 mB  peak Mem: 6 mB

Master (98da3e5)

log-stresser.aci

derek@proton ~/go/src/github.com/coreos/rkt/tests/rkt-monitor> sudo ./rkt-monitor log-stresser.aci 
rkt(31364): seconds alive: 10  avg CPU: 36.442197%  avg Mem: 4 mB  peak Mem: 19 mB
systemd(31392): seconds alive: 8  avg CPU: 0.000000%  avg Mem: 4 mB  peak Mem: 4 mB
systemd-journal(31394): seconds alive: 8  avg CPU: 87.003957%  avg Mem: 7 mB  peak Mem: 7 mB
worker(31398): seconds alive: 8  avg CPU: 7.628103%  avg Mem: 5 mB  peak Mem: 6 mB

mem-stresser.aci

derek@proton ~/go/src/github.com/coreos/rkt/tests/rkt-monitor> sudo ./rkt-monitor mem-stresser.aci 
systemd(31624): seconds alive: 8  avg CPU: 0.000000%  avg Mem: 4 mB  peak Mem: 4 mB
systemd-journal(31626): seconds alive: 8  avg CPU: 0.000000%  avg Mem: 6 mB  peak Mem: 6 mB
worker(31630): seconds alive: 8  avg CPU: 96.679234%  avg Mem: 238 mB  peak Mem: 426 mB
rkt(31595): seconds alive: 10  avg CPU: 17.035268%  avg Mem: 4 mB  peak Mem: 19 mB

cpu-stresser.aci

derek@proton ~/go/src/github.com/coreos/rkt/tests/rkt-monitor> sudo ./rkt-monitor cpu-stresser.aci 
systemd(31771): seconds alive: 8  avg CPU: 0.000000%  avg Mem: 4 mB  peak Mem: 4 mB
systemd-journal(31773): seconds alive: 8  avg CPU: 0.000000%  avg Mem: 6 mB  peak Mem: 6 mB
worker(31777): seconds alive: 8  avg CPU: 87.497889%  avg Mem: 808 kB  peak Mem: 808 kB
rkt(31745): seconds alive: 10  avg CPU: 16.936711%  avg Mem: 4 mB  peak Mem: 19 mB

too-many-apps.podmanifest

derek@proton ~/go/src/github.com/coreos/rkt/tests/rkt-monitor> sudo ./rkt-monitor too-many-apps.podmanifest -d 30s
# Identical (aside from PID) prepare-app and worker-binary lines removed
rkt(31853): seconds alive: 22  avg CPU: 14.676260%  avg Mem: 5 mB  peak Mem: 20 mB
systemd(31886): seconds alive: 18  avg CPU: 4.109397%  avg Mem: 15 mB  peak Mem: 16 mB
systemd-journal(31889): seconds alive: 18  avg CPU: 0.000000%  avg Mem: 6 mB  peak Mem: 6 mB
prepare-app(31971): seconds alive: 1  avg CPU: 0.000000%  avg Mem: 0 B  peak Mem: 0 B
worker-binary(32147): seconds alive: 17  avg CPU: 0.000000%  avg Mem: 840 kB  peak Mem: 840 kB

@cgonyeo
Member

cgonyeo commented Apr 26, 2016

Want those in the docs anywhere, or is this issue good enough?

@jonboulle
Contributor Author

Somewhere, please. Open to suggestions for what it looks like exactly. Some prior art for example: https://github.com/coreos/etcd/tree/master/Documentation/benchmarks


@vishh

vishh commented Jun 1, 2016

cc @dchen1107


7 participants