container: cache the seccomp generated bpf #1035

giuseppe · 2022-10-18T22:42:48Z

store the generated seccomp bpf by the checksum of its input data.

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

giuseppe · 2022-10-18T22:43:48Z

@alexlarsson @rhatdan FYI

lgtm-com · 2022-10-18T22:50:02Z

This pull request introduces 2 alerts when merging 2794f74 into dcd40da - view on LGTM.com

new alerts:

2 for FIXME comment

lgtm-com · 2022-10-18T23:29:59Z

This pull request introduces 2 alerts when merging 63ba3b6 into dcd40da - view on LGTM.com

new alerts:

2 for FIXME comment

src/libcrun/container.c

alexlarsson · 2022-10-19T08:50:15Z

Overall looks good to me, although I think the cache eviction policy can use some work

lgtm-com · 2022-10-19T14:04:48Z

This pull request introduces 2 alerts when merging 2bc31d8 into 022297a - view on LGTM.com

new alerts:

2 for FIXME comment

lgtm-com · 2022-10-19T14:24:03Z

This pull request introduces 2 alerts when merging 13d1ae7 into 022297a - view on LGTM.com

new alerts:

2 for FIXME comment

mtrmac · 2022-10-19T17:08:07Z

src/libcrun/sha256.c

+#include <config.h>
+
+/* Specification.  */
+#if HAVE_OPENSSL_SHA256


(A drive-by comment without actually reading most of the PR.)

Could this link to a mainstream crypto library instead? I’m not at all saying this code is wrong, but carrying an extra copy is going to be an ongoing cost WRT crypto reviews and the like.

Also, using a library could benefit from platform-specific hardware acceleration, e.g. specialized instructions. (OTOH initializing the library might be a bit slower, sure.)

Is static compilation an option?

That would turn every crypto library security update into a crun security update.

(I’d expect the overhead to be in FIPS status checks and self-integrity checking more than just library size.)

I was not sure whether to add more code or use libgcrypt, so I am glad you commented here!

I've just pushed a new version where libgcrypt is linked dynamically

I think this is only being used to create a key for the cache; from looking at the code I see no intention of using it for any other task.

I think using a library with static linking will just increase the object size.

It might well be the case that this, just using it for a hash key from trusted (?) data, is not even considered cryptography. I didn't analyze the code or the security assumptions in detail, against any set of requirements.

But even if that were true, it’s just easier to not have the code around and not to be asked to justify this.

Also, once the code is included, it might start being used for other purposes.

@giuseppe @mtrmac just a small doubt, and I could be wrong: isn't this going to load the entire library at the beginning of the program? Plus it adds a dependency.

“Loading” a dynamically-linked library is just a mmap; that has the performance cost of a few syscalls, regardless of library size. And if that library is used in any other process, that might not hit the disk at all — while a statically-linked implementation increases the size of crun itself, which could require a larger disk read if crun is not in memory already.

There is some cost in dynamic linking, which primarily depends on the number of variable references (not code references) in the library.

A library then might run some initialization code; that might be non-trivial, but that’s also usually because it’s valuable.

If we are doing all of this for performance, it’s certainly worth measuring to avoid surprises.

Without measurement, it’s really anyone’s guess — library cost vs. the cost of using an inefficient portable implementation instead of things like https://en.wikipedia.org/wiki/Intel_SHA_extensions .

OTOH the packaging / review overhead of having to discuss this copy&pasted code is known, and forever.

@mtrmac ah thanks for the detailed explanation. SGTM.

My take in the previous comment was to port only small parts of sha256 into crun's src, maybe like https://github.com/redis/redis/blob/unstable/src/sha256.c (based on the assumption that it is not being used for any security purpose).

But I agree that using a library adds a lot of other benefits (it has built-in hardware optimizations) with no maintenance, and all of this easily outweighs maintaining custom code.

Plus I am not sure how to benchmark the memory usage of dynamically loading libgcrypt, since most likely it's already loaded by another process on my OS, which is very likely true for most real use cases as well. (@giuseppe any hints on how I can benchmark this?)

Thanks again. :)

re static linking:
compilers are generally smart enough to erase dead code, so static linking does not mean the binary grows by the size of all the objects in the library.

re benchmarks:

compare the sizes of statically linked with dynamically linked one

compare performance of the two with hot and cold caches

for memory usage, you can use valgrind

at least on Fedora, we are already linking to libgcrypt through libsystemd, so it won't really affect us.

$ ldd /lib64/libsystemd.so.0
    linux-vdso.so.1 (0x00007ffd1cf64000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f035d5b3000)
    libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f035d500000)
    liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f035d4dc000)
    libcap.so.2 => /lib64/libcap.so.2 (0x00007f035d4d2000)
    libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007f035d395000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f035d375000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f035d000000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f035d6df000)
    libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007f035d34d000)

$ ldd /usr/bin/crun
    linux-vdso.so.1 (0x00007ffd2439f000)
    libcriu.so.2 => /lib64/libcriu.so.2 (0x00007f18a8ec8000)
    libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007f18a8deb000)
    libseccomp.so.2 => /lib64/libseccomp.so.2 (0x00007f18a8dcb000)
    libcap.so.2 => /lib64/libcap.so.2 (0x00007f18a8dc1000)
    libyajl.so.2 => /lib64/libyajl.so.2 (0x00007f18a8db5000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f18a8d95000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f18a8a00000)
    libprotobuf-c.so.1 => /lib64/libprotobuf-c.so.1 (0x00007f18a8d87000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f18a8d5c000)
    libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f18a8ca9000)
    liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f18a8c85000)
    libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007f18a88c3000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f18a8f67000)
    libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007f18a8c5d000)

rhatdan · 2022-10-19T18:42:42Z

Where is this storing the seccomp cache? Not in temporary storage?
Can we pre-populate the storage so that on first boot the compiled version exists?

giuseppe · 2022-10-19T19:52:53Z

Where is this storing the seccomp cache? Not in temporary storage?

it is stored in the rundir, so /run/crun/.seccomp-cache for root, /run/user/%UID/crun/.seccomp-cache for unprivileged users.
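For readers following along, the path selection described here could look roughly like the sketch below. The helper name `seccomp_cache_dir_for_uid` is hypothetical and not taken from crun; it hard-codes the two locations quoted above and ignores anything else (such as environment overrides) that the real rundir lookup may consider.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper, not crun's actual code: build the cache path
   quoted in the comment above for a given UID.  Returns 0 on success,
   -1 if the buffer is too small.  */
static int
seccomp_cache_dir_for_uid (uid_t uid, char *out, size_t out_len)
{
  int n;

  assert (out != NULL);

  if (uid == 0)
    n = snprintf (out, out_len, "/run/crun/.seccomp-cache");
  else
    n = snprintf (out, out_len, "/run/user/%lu/crun/.seccomp-cache",
                  (unsigned long) uid);

  /* snprintf reports the length it wanted to write; treat truncation
     as an error instead of returning a partial path.  */
  return (n < 0 || (size_t) n >= out_len) ? -1 : 0;
}
```

A caller would typically pass `geteuid ()` as the first argument.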

Can we pre-populate the storage so that on first boot the compiled version exists?

The main issue is that crun doesn't have any persistent storage.

In addition, there are a few variables that can cause it to change (crun version, libseccomp version, flags used), so getting it right to prepopulate the storage requires some work.

rhatdan · 2022-10-19T20:15:06Z

But we still pay the price on the first crun run.

alexlarsson · 2022-10-20T06:04:41Z

What if we store some precomputed seccomp profiles in /usr/lib/crun/... somewhere and have an early systemd unit that just copies them to /run ?

alexlarsson · 2022-10-20T06:10:19Z

We can even use the C option in a systemd tmpfiles.d snippet:
https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html#

alexlarsson · 2022-10-20T06:23:15Z

src/libcrun/seccomp.c

+            continue;
+
+          /* The cache file is already used, it is pointless to free it.  */
+          if (st.st_nlink > 2)


Shouldn't this be nlink >= 2?

obviously :/ I'll fix it

alexlarsson · 2022-10-20T06:28:09Z

src/libcrun/seccomp.c

+
+/* Heuristic to avoid the cache directory grows indefinitely.  The inode size for
+   the directory is proportional to the number of entries it contains.  */
+#define MAX_CACHE_DIR_INODE_SIZE 1024


I did an experiment, and on a tmpfs the size increased by 20 each time a file was added. That makes this about 50 files.
I have no idea how common the use of different seccomp rules is though, so I have no idea what would make a good cache size.

I couldn't find a better (and still cheap) condition to trigger the cleanup.

I thought about using $RUNDIR/crun/.seccomp-cache/$DIGEST/data instead of $RUNDIR/crun/.seccomp-cache/$DIGEST so that we could see immediately how many entries there are in .seccomp-cache looking at the st_nlink, but it is a bit more difficult to handle since all the atomic operations would require rename/renameat tricks.

It's probably fine, I was mostly worried about whether this is a good default size or not.

another option could be to check how much bigger than $RUNDIR it is. Since the $RUNDIR size depends on the number of active containers, something like threshold = max(MAX_CACHE_DIR_INODE_SIZE, $RUNDIR.st_size * 3)

tweaked to:

#define MIN_CACHE_DIR_INODE_SIZE 1024

static inline off_t
get_cache_dir_inode_max_size (off_t size_rundir)
{
  return size_rundir * 3 + MIN_CACHE_DIR_INODE_SIZE;
}

so we account for the number of containers as well.
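As a worked illustration of the tweaked heuristic: with the 20-bytes-per-entry figure measured on tmpfs earlier in this thread, an empty rundir gives a limit of 1024 (roughly 50 entries), and a rundir of size 200 raises it to 1624. The sketch below repeats the formula from the comment and adds a hypothetical `cache_needs_cleanup` wrapper; how the result is actually consumed in crun is not shown in this thread, so treat the wrapper as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

#define MIN_CACHE_DIR_INODE_SIZE 1024

/* Formula as posted in the review comment above.  */
static inline off_t
get_cache_dir_inode_max_size (off_t size_rundir)
{
  return size_rundir * 3 + MIN_CACHE_DIR_INODE_SIZE;
}

/* Hypothetical wrapper: compare the st_size of the cache directory
   against the limit derived from the st_size of the rundir.  */
static bool
cache_needs_cleanup (off_t size_cache_dir, off_t size_rundir)
{
  assert (size_cache_dir >= 0 && size_rundir >= 0);
  return size_cache_dir > get_cache_dir_inode_max_size (size_rundir);
}
```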

alexlarsson · 2022-10-20T06:30:34Z

src/libcrun/seccomp.c

+  (void) err;
+  out[0] = 0;
+#endif
+  return 0;


I haven't fully analyzed this, but not returning an error here, or in the early NULL checks for container->container_def->linux->seccomp, scares me. It feels like we may accidentally rely on the output digest even though it is not written to.

the only caller of this function is find_in_cache that will return early when the checksum is an empty string:

ret = calculate_seccomp_checksum (ctx->container, ctx->options, ctx->checksum, err);
if (UNLIKELY (ret < 0))
  return ret;

if (is_empty_string (ctx->checksum))
  return 0;

I'll make it clearer and use a different argument instead of relying on the checksum return value.
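One way the "different argument" idea could look is sketched below: the function reports through a separate `bool` whether a checksum was produced, so callers no longer infer it from an empty string. Everything here is hypothetical (the name `checksum_or_skip`, the signature, and the placeholder that copies the input instead of hashing it); it only illustrates the calling convention, not crun's final code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the checksum routine.  Returns 0 on
   success and -1 on error; *computed says whether OUT holds a usable
   value.  A missing profile is not an error, it just means there is
   nothing to look up in the cache.  */
static int
checksum_or_skip (const char *profile, char *out, size_t out_len,
                  bool *computed)
{
  assert (out != NULL && out_len > 0 && computed != NULL);

  out[0] = '\0';
  *computed = false;

  if (profile == NULL)
    return 0;

  if (strlen (profile) >= out_len)
    return -1;

  /* Placeholder only: a real implementation would write a digest of
     the profile here rather than the profile itself.  */
  strcpy (out, profile);
  *computed = true;
  return 0;
}
```

The caller then checks `computed` explicitly before touching the cache, which makes the "no seccomp section" path visible at the call site.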

giuseppe · 2022-10-20T07:56:50Z

What if we store some precomputed seccomp profiles in /usr/lib/crun/... somewhere and have an early systemd unit that just copies them to /run ?

we could teach crun to look up into this directory and copy the files if needed. We will then need a way to generate the bpf filters. It could be a different command, e.g. crun make-cache

giuseppe · 2022-10-20T08:28:24Z

pushed a new version.

Now the cache is maintained under $RUNDIR/.cache/seccomp so an external tool can just copy the .cache directory.

To simplify the task, crun --version shows the rundir:

$ ./crun --version | grep ^rundir:
rundir: /run/user/1000/crun

$ sudo ./crun --version | grep ^rundir:
rundir: /run/crun

giuseppe · 2022-10-21T14:21:07Z

@rhatdan @flouthoc @alexlarsson PTAL

flouthoc

Implementation LGTM, Do we need to document cache behavior, cache directory and eviction policy in crun's man page ?

@alexlarsson @rhatdan PTAL

@giuseppe I think this needs rebase as well

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

giuseppe · 2022-10-21T15:02:04Z

rebased.

Do we need to document cache behavior, cache directory and eviction policy in crun's man page ?

I think these are implementation details that should not be part of the user documentation

alexlarsson · 2022-10-24T07:54:52Z

src/libcrun/seccomp.c

+  PROCESS_STRING (seccomp->default_action);
+  for (i = 0; i < seccomp->flags_len; i++)
+    PROCESS_STRING (seccomp->flags[i]);
+  for (i = 0; i < seccomp->architectures_len; i++)


I wonder if this is enough, or if we also want to hash the current architecture? Say seccomp->architectures is ["i386"]. Will this give the same result when run on an i386 arch and an x86_64 arch?

agreed, better be safe. Would hashing the result of uname(2) be enough?

pushed a new version where we also hash uname(2)
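A minimal sketch of the idea, under several assumptions: FNV-1a stands in for SHA-256 so the example stays self-contained, only the `machine` field of `struct utsname` is mixed in (the thread does not say which fields the pushed version hashes), and `hash_profile_with_machine` is a hypothetical name. Passing an explicit `machine` string is there only to make the behaviour easy to check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/utsname.h>

/* FNV-1a, used here only as a small stand-in for SHA-256.  The
   terminating NUL is hashed too, so ("ab", "c") and ("a", "bc")
   produce different results.  */
static uint64_t
fnv1a_update (uint64_t h, const char *s)
{
  size_t i, len;

  assert (s != NULL);
  len = strlen (s) + 1;
  for (i = 0; i < len; i++)
    {
      h ^= (unsigned char) s[i];
      h *= 0x100000001b3ULL;
    }
  return h;
}

/* Hypothetical: hash the configured architectures together with the
   machine the process runs on.  If MACHINE is NULL it is taken from
   uname(2).  Returns 0 on success, -1 if uname fails.  */
static int
hash_profile_with_machine (const char *const *archs, size_t n_archs,
                           const char *machine, uint64_t *out)
{
  struct utsname u;
  uint64_t h = 0xcbf29ce484222325ULL;
  size_t i;

  assert (out != NULL);

  if (machine == NULL)
    {
      if (uname (&u) < 0)
        return -1;
      machine = u.machine;
    }

  for (i = 0; i < n_archs; i++)
    h = fnv1a_update (h, archs[i]);
  h = fnv1a_update (h, machine);

  *out = h;
  return 0;
}
```

With this, the same `["i386"]` list yields different keys on hosts reporting different machine strings, so an entry generated on one is never reused on the other.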

alexlarsson · 2022-10-24T07:56:54Z

Other than the arch comment, this LGTM

store the generated seccomp bpf by the checksum of input data in the OCI configuration file.

The cache is maintained under $RUNDIR/.cache/seccomp. Each file is named after its sha256 checksum, so that we can avoid the overhead of generating the bpf from the json configuration by calculating its checksum and seeing if it is already cached.

If a file has the sticky bit set, then it is not evicted from the cache.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

rhatdan · 2022-10-24T11:29:14Z

LGTM

giuseppe force-pushed the seccomp-cache branch from 2794f74 to 63ba3b6 Compare October 18, 2022 23:23

flouthoc reviewed Oct 19, 2022

alexlarsson reviewed Oct 19, 2022

giuseppe force-pushed the seccomp-cache branch 2 times, most recently from 5618c34 to 2bc31d8 Compare October 19, 2022 13:57

giuseppe force-pushed the seccomp-cache branch from 2bc31d8 to 13d1ae7 Compare October 19, 2022 14:17

mtrmac reviewed Oct 19, 2022

giuseppe force-pushed the seccomp-cache branch from 13d1ae7 to 928682b Compare October 19, 2022 18:32

giuseppe changed the title ~~[WIP] container: cache the seccomp generated bpf~~ container: cache the seccomp generated bpf Oct 19, 2022

giuseppe force-pushed the seccomp-cache branch 2 times, most recently from 52db957 to c7ca2fe Compare October 19, 2022 18:38

giuseppe force-pushed the seccomp-cache branch from c7ca2fe to 97192f0 Compare October 19, 2022 19:52

alexlarsson reviewed Oct 20, 2022

giuseppe force-pushed the seccomp-cache branch from 97192f0 to 9f44a45 Compare October 20, 2022 08:27

giuseppe marked this pull request as ready for review October 20, 2022 09:46

giuseppe force-pushed the seccomp-cache branch 2 times, most recently from 95c7cf4 to 4ec0398 Compare October 20, 2022 12:30

flouthoc approved these changes Oct 21, 2022

giuseppe added 8 commits October 21, 2022 17:00

crun: display rundir in --version output

0cffffe

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

exec: set context

81d3b16

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

configure.ac: add check for libgcrypt

dd310aa

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

container: compute seccomp options earlier

6861b2a

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

seccomp: add functions to calculate checksum

a2de8fb

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

container: move open_seccomp_output to seccomp

7a66ccc

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

seccomp: move copy bpf to seccomp

8cfcc8f

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

seccomp: use relative paths to open bpf

74d097b

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

giuseppe force-pushed the seccomp-cache branch from 4ec0398 to 22f0352 Compare October 21, 2022 15:01

alexlarsson reviewed Oct 24, 2022

giuseppe force-pushed the seccomp-cache branch from 22f0352 to 3ebaba3 Compare October 24, 2022 10:43

rhatdan merged commit 5023776 into containers:main Oct 24, 2022
