New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
container: cache the seccomp generated bpf #1035
Conversation
@alexlarsson @rhatdan FYI |
This pull request introduces 2 alerts when merging 2794f74 into dcd40da - view on LGTM.com new alerts:
|
2794f74
to
63ba3b6
Compare
This pull request introduces 2 alerts when merging 63ba3b6 into dcd40da - view on LGTM.com new alerts:
|
Overall looks good to me, although I think the cache eviction policy can use some work |
5618c34
to
2bc31d8
Compare
This pull request introduces 2 alerts when merging 2bc31d8 into 022297a - view on LGTM.com new alerts:
|
2bc31d8
to
13d1ae7
Compare
This pull request introduces 2 alerts when merging 13d1ae7 into 022297a - view on LGTM.com new alerts:
|
src/libcrun/sha256.c
Outdated
#include <config.h> | ||
|
||
/* Specification. */ | ||
#if HAVE_OPENSSL_SHA256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(A drive-by comment without actually reading most of the PR.)
Could this link to a mainstream crypto library instead? I’m not at all saying this code is wrong, but carrying an extra copy is going to be an ongoing cost WRT crypto reviews and the like.
Also, using a library could benefit from platform-specific hardware acceleration, e.g. specialized instructions. (OTOH initializing the library might be a bit slower, sure.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is static compilation an option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would turn every crypto library security update into a crun security update.
(I’d expect the overhead to be in FIPS status checks and self-integrity checking more than just library size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not sure whether adding more code or using libgcrypt, so I am glad you commented here!
I've just pushed a new version where libgcrypt is linked dynamically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is only being used to create key
for the cache, from looking at the code I see no intention of using it for any other task.
I think using library with static linking will just increase the object size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might well be the case that this, just using it for a hash key from trusted (?) data, is not even considered cryptography. I didn't analyze the code or the security assumptions in detail, against any set of requirements.
But even if that were true, it’s just easier to not have the code around and not to be asked to justify this.
Also once the code is included, it might get started to be used for other purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@giuseppe @mtrmac just a small doubt and I could be wrong , isn't this going to load entire library at the beginning of the program ? plus it just adds to dependency.
- “Loading” a dynamically-linked library is just a
mmap
; that has the performance cost of a few syscalls, regardless of library size. And if that library is used in any other process, that might not hit the disk at all — while a statically-linked implementation increases the size of crun itself, which could require a larger disk read if crun is not in memory already. - There is some cost in dynamically linking, that primarily depends on the number of variable references (not code references) in the library
- A library then might run some initialization code; that might be non-trivial, but that’s also usually because it’s valuable.
If we are doing all of this for performance, it’s certainly worth measuring to avoid surprises.
Without measurement, it’s really anyone’s guess — library cost vs. the cost of using an inefficient portable implementation instead of things like https://en.wikipedia.org/wiki/Intel_SHA_extensions .
OTOH the packaging / review overhead of having to discuss this copy&pasted code is known, and forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mtrmac ah thanks for the detailed explanation. SGTM.
My take in previous comment was to port only small parts of sha256
into crun
's src maybe like: https://github.com/redis/redis/blob/unstable/src/sha256.c ( based on assumption that it is not being used for any security purpose )
But I agree that using a library adds lot of other benefits + (it has inbuilt hardware optimizations) with no maintenance and all these things easily outweighs maintaining custom code.
Plus I am not sure how to benchmark memory usage of dynamically loading libgcrypt
since most likely its already loaded by other process on my os, which is very likely true for most of the real use-case as well. ( @giuseppe any hints how can i benchmark this ? )
Thanks again. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re static liking:
compilers are generally smart enough to erase dead code. So static linking does not equate to the an increase of the size of all objects of the library.
re benchmarks:
- compare the sizes of statically linked with dynamically linked one
- compare performance of the two with hot and cold caches
- for memory usage, you can use valgrind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at least on Fedora, we are already linking to libgcrypt through libsystemd, so it won't really affect us.
$ ldd /lib64/libsystemd.so.0
linux-vdso.so.1 (0x00007ffd1cf64000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f035d5b3000)
libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f035d500000)
liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f035d4dc000)
libcap.so.2 => /lib64/libcap.so.2 (0x00007f035d4d2000)
libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007f035d395000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f035d375000)
libc.so.6 => /lib64/libc.so.6 (0x00007f035d000000)
/lib64/ld-linux-x86-64.so.2 (0x00007f035d6df000)
libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007f035d34d000)
$ ldd /usr/bin/crun
linux-vdso.so.1 (0x00007ffd2439f000)
libcriu.so.2 => /lib64/libcriu.so.2 (0x00007f18a8ec8000)
libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007f18a8deb000)
libseccomp.so.2 => /lib64/libseccomp.so.2 (0x00007f18a8dcb000)
libcap.so.2 => /lib64/libcap.so.2 (0x00007f18a8dc1000)
libyajl.so.2 => /lib64/libyajl.so.2 (0x00007f18a8db5000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f18a8d95000)
libc.so.6 => /lib64/libc.so.6 (0x00007f18a8a00000)
libprotobuf-c.so.1 => /lib64/libprotobuf-c.so.1 (0x00007f18a8d87000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f18a8d5c000)
libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f18a8ca9000)
liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f18a8c85000)
libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007f18a88c3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f18a8f67000)
libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007f18a8c5d000)
13d1ae7
to
928682b
Compare
52db957
to
c7ca2fe
Compare
Where is this storing the seccomp cache? Not in temporary storage? |
c7ca2fe
to
97192f0
Compare
it is stored in the rundir, so
The main issue is that crun doesn't have any persistent storage. In addition, there are few variables that can cause it to change (crun version, libseccomp version, flags used), so getting it right to prepopulate the storage requires some work. |
But we still pay the price on the first crun run. |
What if we store some precomputed seccomp profiles in /usr/lib/crun/... somewhere and have an early systemd unit that just copies them to /run ? |
We can even use the |
src/libcrun/seccomp.c
Outdated
continue; | ||
|
||
/* The cache file is already used, it is pointless to free it. */ | ||
if (st.st_nlink > 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be nlink >= 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obviously :/ I'll fix it
src/libcrun/seccomp.c
Outdated
|
||
/* Heuristic to avoid the cache directory grows indefinitely. The inode size for | ||
the directory is proportional to the number of entries it contains. */ | ||
#define MAX_CACHE_DIR_INODE_SIZE 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did an experiment, and on a tmpfs the size increased by 20 each time a file was added. That makes this 50 files.
I have no idea how common the use of different seccomp rules are though, so i have no idea what would make a good cache size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find a better (and still cheap) condition to trigger the cleanup.
I thought about using $RUNDIR/crun/.seccomp-cache/$DIGEST/data
instead of $RUNDIR/crun/.seccomp-cache/$DIGEST
so that we could see immediately how many entries there are in .seccomp-cache looking at the st_nlink, but it is a bit more difficult to handle since all the atomic operations would require rename/renameat tricks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its probably fine, I was mostly worried about whether this is a good default size or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another option could be to check how much bigger that $RUNDIR
it is. Since the $RUNDIR
size depends on the number of active containers, something like threshold = max(MAX_CACHE_DIR_INODE_SIZE, $RUNDIR.st_size * 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tweaked to:
#define MIN_CACHE_DIR_INODE_SIZE 1024
static inline off_t
get_cache_dir_inode_max_size (off_t size_rundir)
{
return size_rundir * 3 + MIN_CACHE_DIR_INODE_SIZE;
}
so we account for the number of containers as well.
(void) err; | ||
out[0] = 0; | ||
#endif | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't fully analyzed this, but this not returning an error here, or in the early NULL checks for container->container_def->linux->seccomp scares me. It feels like we may accidentally rely on the output digest even though it is not written to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only caller of this function is find_in_cache
that will return early when the checksum is an empty string:
ret = calculate_seccomp_checksum (ctx->container, ctx->options, ctx->checksum, err);
if (UNLIKELY (ret < 0))
return ret;
if (is_empty_string (ctx->checksum))
return 0;
I'll make it clearer and use a different argument instead of relying on the checksum return value.
we could teach crun to look up into this directory and copy the files if needed. We will then need a way to generate the bpf filters. It could be a different command, e.g. |
97192f0
to
9f44a45
Compare
pushed a new version. Now the cache is maintained under To simplify the task,
|
95c7cf4
to
4ec0398
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation LGTM, Do we need to document cache behavior, cache directory and eviction policy in crun's man page ?
@alexlarsson @rhatdan PTAL
@giuseppe I think this needs rebase as well
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
4ec0398
to
22f0352
Compare
rebased.
I think these are implementation details that should not be part of the user documentation |
PROCESS_STRING (seccomp->default_action); | ||
for (i = 0; i < seccomp->flags_len; i++) | ||
PROCESS_STRING (seccomp->flags[i]); | ||
for (i = 0; i < seccomp->architectures_len; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is enough, or if we also want to hash the current architecture? Say we seccomp->archtetures is ["i386"]. Will this give the same result when run on a i386 arch and a x86_64 arch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, better be safe. Would hashing the result of uname(2)
be enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed a new version where we also hash uname(2)
Other than the arch comment, this LGTM |
store the generated seccomp bpf by the checksum of input data in the OCI configuration file. The cache is maintained under $RUNDIR/.cache/seccomp. Each file is named after its sha256 checksum, so that we can avoid the overhead of generating the bpf from the json configuration by calculating its checksum and seeing if it is already cached. If a file has the sticky bit set, then it is not evicted from the cache. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
22f0352
to
3ebaba3
Compare
LGTM |
store the generated seccomp bpf by the checksum of its input data.
Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com