Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

container: cache the seccomp generated bpf #1035

Merged
merged 9 commits into from Oct 24, 2022

Conversation

giuseppe
Copy link
Member

store the generated seccomp bpf by the checksum of its input data.

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

@giuseppe
Copy link
Member Author

@alexlarsson @rhatdan FYI

@lgtm-com
Copy link

lgtm-com bot commented Oct 18, 2022

This pull request introduces 2 alerts when merging 2794f74 into dcd40da - view on LGTM.com

new alerts:

  • 2 for FIXME comment

@lgtm-com
Copy link

lgtm-com bot commented Oct 18, 2022

This pull request introduces 2 alerts when merging 63ba3b6 into dcd40da - view on LGTM.com

new alerts:

  • 2 for FIXME comment

src/libcrun/container.c Outdated Show resolved Hide resolved
@alexlarsson
Copy link

Overall looks good to me, although I think the cache eviction policy can use some work

@giuseppe giuseppe force-pushed the seccomp-cache branch 2 times, most recently from 5618c34 to 2bc31d8 Compare October 19, 2022 13:57
@lgtm-com
Copy link

lgtm-com bot commented Oct 19, 2022

This pull request introduces 2 alerts when merging 2bc31d8 into 022297a - view on LGTM.com

new alerts:

  • 2 for FIXME comment

@lgtm-com
Copy link

lgtm-com bot commented Oct 19, 2022

This pull request introduces 2 alerts when merging 13d1ae7 into 022297a - view on LGTM.com

new alerts:

  • 2 for FIXME comment

#include <config.h>

/* Specification. */
#if HAVE_OPENSSL_SHA256
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(A drive-by comment without actually reading most of the PR.)

Could this link to a mainstream crypto library instead? I’m not at all saying this code is wrong, but carrying an extra copy is going to be an ongoing cost WRT crypto reviews and the like.

Also, using a library could benefit from platform-specific hardware acceleration, e.g. specialized instructions. (OTOH initializing the library might be a bit slower, sure.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is static compilation an option?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would turn every crypto library security update into a crun security update.

(I’d expect the overhead to be in FIPS status checks and self-integrity checking more than just library size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure whether adding more code or using libgcrypt, so I am glad you commented here!

I've just pushed a new version where libgcrypt is linked dynamically

Copy link
Collaborator

@flouthoc flouthoc Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is only being used to create key for the cache, from looking at the code I see no intention of using it for any other task.

I think using library with static linking will just increase the object size.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might well be the case that this, just using it for a hash key from trusted (?) data, is not even considered cryptography. I didn't analyze the code or the security assumptions in detail, against any set of requirements.

But even if that were true, it’s just easier to not have the code around and not to be asked to justify this.

Also once the code is included, it might get started to be used for other purposes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@giuseppe @mtrmac just a small doubt and I could be wrong , isn't this going to load entire library at the beginning of the program ? plus it just adds to dependency.

  • “Loading” a dynamically-linked library is just a mmap; that has the performance cost of a few syscalls, regardless of library size. And if that library is used in any other process, that might not hit the disk at all — while a statically-linked implementation increases the size of crun itself, which could require a larger disk read if crun is not in memory already.
  • There is some cost in dynamically linking, that primarily depends on the number of variable references (not code references) in the library
  • A library then might run some initialization code; that might be non-trivial, but that’s also usually because it’s valuable.

If we are doing all of this for performance, it’s certainly worth measuring to avoid surprises.

Without measurement, it’s really anyone’s guess — library cost vs. the cost of using an inefficient portable implementation instead of things like https://en.wikipedia.org/wiki/Intel_SHA_extensions .

OTOH the packaging / review overhead of having to discuss this copy&pasted code is known, and forever.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtrmac ah thanks for the detailed explanation. SGTM.

My take in previous comment was to port only small parts of sha256 into crun's src maybe like: https://github.com/redis/redis/blob/unstable/src/sha256.c ( based on assumption that it is not being used for any security purpose )

But I agree that using a library adds lot of other benefits + (it has inbuilt hardware optimizations) with no maintenance and all these things easily outweighs maintaining custom code.

Plus I am not sure how to benchmark memory usage of dynamically loading libgcrypt since most likely its already loaded by other process on my os, which is very likely true for most of the real use-case as well. ( @giuseppe any hints how can i benchmark this ? )

Thanks again. :)

Copy link
Member

@vrothberg vrothberg Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re static liking:
compilers are generally smart enough to erase dead code. So static linking does not equate to the an increase of the size of all objects of the library.

re benchmarks:

  • compare the sizes of statically linked with dynamically linked one
  • compare performance of the two with hot and cold caches
  • for memory usage, you can use valgrind

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least on Fedora, we are already linking to libgcrypt through libsystemd, so it won't really affect us.

$ ldd /lib64/libsystemd.so.0
	linux-vdso.so.1 (0x00007ffd1cf64000)
	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f035d5b3000)
	libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f035d500000)
	liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f035d4dc000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f035d4d2000)
	libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007f035d395000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f035d375000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f035d000000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f035d6df000)
	libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007f035d34d000)

$ ldd /usr/bin/crun 
	linux-vdso.so.1 (0x00007ffd2439f000)
	libcriu.so.2 => /lib64/libcriu.so.2 (0x00007f18a8ec8000)
	libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007f18a8deb000)
	libseccomp.so.2 => /lib64/libseccomp.so.2 (0x00007f18a8dcb000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f18a8dc1000)
	libyajl.so.2 => /lib64/libyajl.so.2 (0x00007f18a8db5000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f18a8d95000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f18a8a00000)
	libprotobuf-c.so.1 => /lib64/libprotobuf-c.so.1 (0x00007f18a8d87000)
	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f18a8d5c000)
	libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f18a8ca9000)
	liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f18a8c85000)
	libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007f18a88c3000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f18a8f67000)
	libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007f18a8c5d000)

@giuseppe giuseppe changed the title [WIP] container: cache the seccomp generated bpf container: cache the seccomp generated bpf Oct 19, 2022
@giuseppe giuseppe force-pushed the seccomp-cache branch 2 times, most recently from 52db957 to c7ca2fe Compare October 19, 2022 18:38
@rhatdan
Copy link
Member

rhatdan commented Oct 19, 2022

Where is this storing the seccomp cache? Not in temporary storage?
Can we pre-populate the storage so that on first boot the compiled version exists?

@giuseppe
Copy link
Member Author

Where is this storing the seccomp cache? Not in temporary storage?

it is stored in the rundir, so /run/crun/.seccomp-cache for root, /run/user/%UID/crun/.seccomp-cache for unprivileged users.

Can we pre-populate the storage so that on first boot the compiled version exists?

The main issue is that crun doesn't have any persistent storage.

In addition, there are few variables that can cause it to change (crun version, libseccomp version, flags used), so getting it right to prepopulate the storage requires some work.

@rhatdan
Copy link
Member

rhatdan commented Oct 19, 2022

But we still pay the price on the first crun run.

@alexlarsson
Copy link

What if we store some precomputed seccomp profiles in /usr/lib/crun/... somewhere and have an early systemd unit that just copies them to /run ?

@alexlarsson
Copy link

We can even use the C option in a systemd tmpfile.d snippet:
https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html#

continue;

/* The cache file is already used, it is pointless to free it. */
if (st.st_nlink > 2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be nlink >= 2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously :/ I'll fix it


/* Heuristic to avoid the cache directory grows indefinitely. The inode size for
the directory is proportional to the number of entries it contains. */
#define MAX_CACHE_DIR_INODE_SIZE 1024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an experiment, and on a tmpfs the size increased by 20 each time a file was added. That makes this 50 files.
I have no idea how common the use of different seccomp rules are though, so i have no idea what would make a good cache size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a better (and still cheap) condition to trigger the cleanup.

I thought about using $RUNDIR/crun/.seccomp-cache/$DIGEST/data instead of $RUNDIR/crun/.seccomp-cache/$DIGEST so that we could see immediately how many entries there are in .seccomp-cache looking at the st_nlink, but it is a bit more difficult to handle since all the atomic operations would require rename/renameat tricks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its probably fine, I was mostly worried about whether this is a good default size or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another option could be to check how much bigger that $RUNDIR it is. Since the $RUNDIR size depends on the number of active containers, something like threshold = max(MAX_CACHE_DIR_INODE_SIZE, $RUNDIR.st_size * 3)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tweaked to:

#define MIN_CACHE_DIR_INODE_SIZE 1024

static inline off_t
get_cache_dir_inode_max_size (off_t size_rundir)
{
  return size_rundir * 3 + MIN_CACHE_DIR_INODE_SIZE;
}

so we account for the number of containers as well.

(void) err;
out[0] = 0;
#endif
return 0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully analyzed this, but this not returning an error here, or in the early NULL checks for container->container_def->linux->seccomp scares me. It feels like we may accidentally rely on the output digest even though it is not written to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only caller of this function is find_in_cache that will return early when the checksum is an empty string:

  ret = calculate_seccomp_checksum (ctx->container, ctx->options, ctx->checksum, err);
  if (UNLIKELY (ret < 0))
    return ret;

  if (is_empty_string (ctx->checksum))
    return 0;

I'll make it clearer and use a different argument instead of relying on the checksum return value.

@giuseppe
Copy link
Member Author

What if we store some precomputed seccomp profiles in /usr/lib/crun/... somewhere and have an early systemd unit that just copies them to /run ?

we could teach crun to look up into this directory and copy the files if needed. We will then need a way to generate the bpf filters. It could be a different command, e.g. crun make-cache

@giuseppe
Copy link
Member Author

pushed a new version.

Now the cache is maintained under $RUNDIR/.cache/seccomp so an external tool can just copy the .cache directory.

To simplify the task, crun --version shows the rundir:

$ ./crun --version | grep ^rundir:
rundir: /run/user/1000/crun

$ sudo ./crun --version | grep ^rundir:
rundir: /run/crun

@giuseppe giuseppe marked this pull request as ready for review October 20, 2022 09:46
@giuseppe giuseppe force-pushed the seccomp-cache branch 2 times, most recently from 95c7cf4 to 4ec0398 Compare October 20, 2022 12:30
@giuseppe
Copy link
Member Author

@rhatdan @flouthoc @alexlarsson PTAL

Copy link
Collaborator

@flouthoc flouthoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation LGTM, Do we need to document cache behavior, cache directory and eviction policy in crun's man page ?

@alexlarsson @rhatdan PTAL

@giuseppe I think this needs rebase as well

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Copy link
Member Author

rebased.

Do we need to document cache behavior, cache directory and eviction policy in crun's man page ?

I think these are implementation details that should not be part of the user documentation

PROCESS_STRING (seccomp->default_action);
for (i = 0; i < seccomp->flags_len; i++)
PROCESS_STRING (seccomp->flags[i]);
for (i = 0; i < seccomp->architectures_len; i++)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is enough, or if we also want to hash the current architecture? Say we seccomp->archtetures is ["i386"]. Will this give the same result when run on a i386 arch and a x86_64 arch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, better be safe. Would hashing the result of uname(2) be enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed a new version where we also hash uname(2)

@alexlarsson
Copy link

Other than the arch comment, this LGTM

store the generated seccomp bpf by the checksum of input data in the
OCI configuration file.

The cache is maintained under $RUNDIR/.cache/seccomp.  Each file is
named after its sha256 checksum, so that we can avoid the overhead of
generating the bpf from the json configuration by calculating its
checksum and seeing if it is already cached.

If a file has the sticky bit set, then it is not evicted from the cache.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@rhatdan
Copy link
Member

rhatdan commented Oct 24, 2022

LGTM

@rhatdan rhatdan merged commit 5023776 into containers:main Oct 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants