-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
patch: Rebase to upstream 1.1.12 #2
Conversation
Add checking of downloaded tarball checksum. In case it doesn't match the hardcoded value, the error is like this: libseccomp-2.5.4.tar.gz: FAILED sha256sum: WARNING: 1 computed checksum did NOT match In case the checksum for a particular version is not specified in the script, the error will look like this: ./script/seccomp.sh: line 29: SECCOMP_SHA256[${ver}]: unbound variable In case the the hardcoded value in the file is of wrong format/length, we'll get: sha256sum: 'standard input': no properly formatted SHA256 checksum lines found In any of these cases, the script aborts (due to set -e). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> (cherry picked from commit 95f1e2e18872de54a17d64b2d808255463ee3d93) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Kir Kolyshkin (2): script/seccomp.sh: check tarball sha256 Dockerfile,scripts/release: bump libseccomp to v2.5.4 LGTMs: AkihiroSuda cyphar Closes opencontainers#3481
Remove upper bound in integer sanity check to not restrict the number of socket-activated sockets passed in. Closes opencontainers#3488 Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com> (cherry picked from commit 03a210d) Signed-off-by: Erik Sjölund <erik.sjolund@gmail.com>
…3489 [1.1] libcontainer: relax getenv_int sanity check
systemd emits very loud warnings when the path specified doesn't exist (which can be the case for some of our default rules). We don't need the ruleset we give systemd to be completely accurate (we discard some kinds of wildcard rules anyway) so we can safely skip adding these. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
…s-nonexistent-files [1.1] cgroups: systemd: skip adding device paths that don't exist
Perform some basic checks for CHANGELOG.md. In particular, check for - missing periods; - extra spaces at EOL; - non-ASCII characters. Fix the issues found. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
* Changelog for v1.1.3. * Fixed 1.1.2 release date. * Fixed the order of footnotes. Note that backport (rather than original) PRs are listed as references, since this makes it easier to cross-check against the git log. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Kir Kolyshkin (3): VERSION: back to development VERSION: release 1.1.3 ci: add basic checks for CHANGELOG.md LGTMs: thaJeztah cyphar Closes opencontainers#3490
Due to a bug in commit 9c44407, when the user and mount namespaces are used, and the bind mount is followed by the cgroup mount in the spec, the cgroup is mounted using the bind mount's mount fd. This can be reproduced with podman 4.1 (when configured to use runc): $ podman run --uidmap 0:100:10000 quay.io/libpod/testimage:20210610 mount Error: /home/kir/git/runc/runc: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount /proc/self/fd/11:/sys/fs/cgroup/systemd (via /proc/self/fd/12), flags: 0x20502f: operation not permitted: OCI permission denied or manually with the spec mounts containing something like this: { "destination": "/etc/resolv.conf", "type": "bind", "source": "/userdata/resolv.conf", "options": [ "bind" ] }, { "destination": "/sys/fs/cgroup", "type": "cgroup", "source": "cgroup", "options": [ "rprivate", "nosuid", "noexec", "nodev", "relatime", "ro" ] } The issue was not found earlier since it requires using userns, and even then mount fd is ignored by mountToRootfs, except for bind mounts, and all the bind mounts have mountfd set, except for the case of cgroup v1's /sys/fs/cgroup which is internally transformed into a bunch of bind mounts. This is a minimal fix for the issue, suitable for backporting. A test case is added which reproduces the issue without the fix applied. Fixes: 9c44407 ("Open bind mount sources from the host userns") Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> (cherry picked from commit d370e3c) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Kir Kolyshkin (1): libct: fix mounting via wrong proc fd LGTMs: AkihiroSuda cyphar Closes opencontainers#3511
The following failure was observed in CI (on centos-stream-8 in integration-cgroup suite): not ok 42 runc delete (from function `fail' in file tests/integration/helpers.bash, line 338, in test file tests/integration/delete.bats, line 30) `[ "$output" = "" ] || fail "cgroup not cleaned up correctly: $output"' failed .... cgroup not cleaned up correctly: /sys/fs/cgroup/pids/system.slice/tmp-bats\x2drun\x2d68012-runc.IPOypI-state-testbusyboxdelete-runc.zriC8C.mount /sys/fs/cgroup/cpu,cpuacct/system.slice/tmp-bats\x2drun\x2d68012-runc.IPOypI-state-testbusyboxdelete-runc.zriC8C.mount ... Apparently, this is a cgroup systemd creates for a mount unit which appears then runc does internal /proc/self/exe bind-mount. The test case should not take it into account. The second problem with this test is it does not check that cgroup actually exists when the container is running (so checking that it was removed after makes less sense). For example, in rootless mode the cgroup might not have been created. Fix the find arguments to look for a specific cgroup name, and add a check that these arguments are correct (i.e. the cgroup is found when the container is running). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> (cherry picked from commit 728571c) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
A couple of test cases in delete.bats check that a particular cgroup exists (or doesn't exist) using find. This is now resulting in errors like these: find: ‘/sys/fs/cgroup/blkio/azsec’: Permission denied find: ‘/sys/fs/cgroup/blkio/azsec_clamav’: Permission denied find: ‘/sys/fs/cgroup/cpu,cpuacct/azsec’: Permission denied find: ‘/sys/fs/cgroup/cpu,cpuacct/azsec_clamav’: Permission denied find: ‘/sys/fs/cgroup/memory/azsec’: Permission denied find: ‘/sys/fs/cgroup/memory/azsec_clamav’: Permission denied leading to test case failures. Apparently, GHA runs something else on a test box, so we get this. To fix, ignore non-zero exit code from find, and redirect its stderr to /dev/null. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is a partial backport of commit 6e1d476 from main branch. Instead of specifying path to criu binary, use whatever is found in $PATH. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Older criu builds fail to work properly on CentOS Stream 9 due to changes in glibc's rseq. Skip criu tests if an older criu version is found. Fixes: opencontainers#3532 Cherry picked from commit 4fd4af5. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Kir Kolyshkin (4): CI: workaround CentOS Stream 9 criu issue tests/int: don't use --criu [1.1] ci: fix delete.bats for GHA tests/int: runc delete: fix flake, enable for rootless LGTMs: AkihiroSuda cyphar Closes opencontainers#3538
Signed-off-by: guodong <guodong9211@gmail.com>
guodong (1): [1.1] libct/nsenter: switch to sane_kill() LGTMs: AkihiroSuda cyphar Closes opencontainers#3536
When starting a new container, and the very last step of executing of a user process fails (last lines of (*linuxStandardInit).Init), it is too late to print a proper error since both the log pipe and the init pipe are closed. This is partially mitigated by using exec.LookPath() which is supposed to say whether we will be able to execute or not. Alas, it fails to do so when the binary to be executed resides on a filesystem mounted with noexec flag. A workaround would be to use access(2) with X_OK flag. Alas, it is not working when runc itself is a setuid (or setgid) binary. In this case, faccessat2(2) with AT_EACCESS can be used, but it is only available since Linux v5.8. So, use faccessat2(2) with AT_EACCESS if available. If not, fall back to access(2) for non-setuid runc, and do nothing for setuid runc (as there is nothing we can do). Note that this check if in addition to whatever exec.LookPath does. Fixes opencontainers#3520 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> (cherry picked from commit 957d97b) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
[1.1] Fix error from runc run on noexec fs
When golang 1.19 is used to build unit tests on 386, it fails like this: sudo -E PATH="$PATH" -- make GOARCH=386 CGO_ENABLED=1 localunittest <...> go test -timeout 3m -tags "seccomp" -v ./... <...> # github.com/opencontainers/runc/libcontainer/capabilities.test runtime/cgo(.text): unknown symbol __stack_chk_fail_local in pcrel runtime/cgo(.text): unknown symbol __stack_chk_fail_local in pcrel runtime/cgo(.text): unknown symbol __stack_chk_fail_local in pcrel runtime/cgo(.text): unknown symbol __stack_chk_fail_local in pcrel runtime/cgo(.text): unknown symbol __stack_chk_fail_local in pcrel runtime/cgo(.text): relocation target __stack_chk_fail_local not defined runtime/cgo(.text): relocation target __stack_chk_fail_local not defined The fix is to add CGO_CFLAGS=-fno-stack-protector. See also: - docker-library/golang#426 - https://go.dev/issue/52919 - https://go.dev/issue/54313 - https://go-review.googlesource.com/c/go/+/421935 Cherry picked from commit 589a9d5. Conflict in .github/workflows/test.yml due to missing commit dafcacb. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
[1.1] ci/gha: fix cross-386 job vs go 1.19
Recently released codespell 2.2 adds some more false positives, such as: ./Makefile:78: ro ==> to, row, rob, rod, roe, rot ./Makefile:88: ro ==> to, row, rob, rod, roe, rot ./notify_socket.go:51: ro ==> to, row, rob, rod, roe, rot ./LICENSE:128: complies ==> compiles ./go.sum:59: BU ==> BY ./types/features/features.go:17: ro ==> to, row, rob, rod, roe, rot ./libcontainer/rootfs_linux.go:52: ro ==> to, row, rob, rod, roe, rot ./libcontainer/rootfs_linux.go:166: ro ==> to, row, rob, rod, roe, rot .... ./tests/integration/cgroup_delegation.bats:38: inh ==> in ... To fix: - exclude go.sum; - add ro and complies to the list of ignored words; - s/inh/inherit in cgroup_delegation.bats. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> (cherry picked from commit df9e32b) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
…l-2.2 [1.1] ci: fix for codespell 2.2
A regression reported for runc v1.1.3 says that "runc exec -t" fails after doing "systemctl daemon-reload": > exec failed: unable to start container process: open /dev/pts/0: operation not permitted: unknown Apparently, with commit 7219387 we are no longer adding "DeviceAllow=char-pts rwm" rule (because os.Stat("char-pts") returns ENOENT). The bug can only be seen after "systemctl daemon-reload" because runc also applies the same rules manually (by writing to devices.allow for cgroup v1), and apparently reloading systemd leads to re-applying the rules that systemd has (thus removing the char-pts access). The fix is to do os.Stat only for "/dev" paths. Also, emit a warning that the path was skipped. Since the original idea was to emit less warnings, demote the level to debug. Note this also fixes the issue of not adding "m" permission for block-* and char-* devices. A test case is added, which reliably fails before the fix on both cgroup v1 and v2. This is a backport of commit 58b1374 to release-1.1 branch. Fixes: opencontainers#3551 Fixes: 7219387 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
[1.1] Fix failed exec after systemctl daemon-reload (regression in 1.1.3)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Akihiro Suda (2): VERSION: back to development Release 1.1.4 LGTMs: kolyshkin thaJeztah cyphar Closes opencontainers#3564
(This is a cherry-pick of 6fa8d06.) Given we've had several bugs in this behaviour that have now been fixed, add an integration test that makes sure that you can start a container that joins all of the namespaces of a second container. The only namespace we do not join is the mount namespace, because joining a namespace that has been pivot_root'd leads to a bunch of errors. In principle, removing everything from config.json that requires a mount _should_ work, but the root.path configuration is mandatory and we cannot just ignore setting up the rootfs in the namespace joining scenario (if the user has configured a different rootfs, we need to use it or error out, and there's no reasonable way of checking if if the rootfs paths are the same that doesn't result in spaghetti logic). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
(This is a cherry-pick of ebcef3e.) It turns out that the error added in commit 09822c3 ("configs: disallow ambiguous userns and timens configurations") causes issues with containerd and CRIO because they pass both userns mappings and a userns path. These configurations are broken, but to avoid the regression in this one case, output a warning to tell the user that the configuration is incorrect but we will continue to use it if and only if the configured mappings are identical to the mappings of the provided namespace. Fixes: 09822c3 ("configs: disallow ambiguous userns and timens configurations") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
(This is a cherry-pick of 482e563.) Using ints for all of our mapping structures means that a 32-bit binary errors out when trying to parse /proc/self/*id_map: failed to cache mappings for userns: failed to parse uid_map of userns /proc/1/ns/user: parsing id map failed: invalid format in line " 0 0 4294967295": integer overflow on token 4294967295 This issue was unearthed by commit 1912d59 ("*: actually support joining a userns with a new container") but the underlying issue has been present since the docker/libcontainer days. In theory, switching to uint32 (to match the spec) instead of int64 would also work, but keeping everything signed seems much less error-prone. It's also important to note that a mapping might be too large for an int on 32-bit, so we detect this during the mapping. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
[1.1] *: fix several issues with userns path handling
Signed-off-by: lfbzhm <lifubang@acmcoder.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: lfbzhm <lifubang@acmcoder.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
lfbzhm (2): VERSION: back to development VERSION: release 1.1.11 LGTMs: AkihiroSuda cyphar
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Aleksa Sarai (2): keyring: update AkihiroSuda key expiry keyring: update cyphar@cyphar.com key expiry LGTMs: AkihiroSuda lifubang
(This is a cherry-pick of 937ca10.) Signed-off-by: hang.jiang <hang.jiang@daocloud.io> Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
If a file descriptor of a directory in the host's mount namespace is leaked to runc init, a malicious config.json could use /proc/self/fd/... as a working directory to allow for host filesystem access after the container runs. This can also be exploited by a container process if it knows that an administrator will use "runc exec --cwd" and the target --cwd (the attacker can change that cwd to be a symlink pointing to /proc/self/fd/... and wait for the process to exec and then snoop on /proc/$pid/cwd to get access to the host). The former issue can lead to a critical vulnerability in Docker and Kubernetes, while the latter is a container breakout. We can (ab)use the fact that getcwd(2) on Linux detects this exact case, and getcwd(3) and Go's Getwd() return an error as a result. Thus, if we just do os.Getwd() after chdir we can easily detect this case and error out. In runc 1.1, a /sys/fs/cgroup handle happens to be leaked to "runc init", making this exploitable. On runc main it just so happens that the leaked /sys/fs/cgroup gets clobbered and thus this is only consistently exploitable for runc 1.1. Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Co-developed-by: lifubang <lifubang@acmcoder.com> Signed-off-by: lifubang <lifubang@acmcoder.com> [refactored the implementation and added more comments] Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
(This is a partial backport of a minor change included in commit dac4171.) This mirrors the logic in standard_init_linux.go, and also ensures that we do not call exec.LookPath in the final execve step. While this is okay for regular binaries, it seems exec.LookPath calls os.Getenv which tries to emit a log entry to the test harness when running in "go test" mode. In a future patch (in order to fix CVE-2024-21626), we will close all of the file descriptors immediately before execve, which would mean the file descriptor for test harness logging would be closed at execve time. So, moving exec.LookPath earlier is necessary. Ref: dac4171 ("runc-dmz: reduce memfd binary cloning cost with small C binary") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
If we leak a file descriptor referencing the host filesystem, an attacker could use a /proc/self/fd magic-link as the source for execve to execute a host binary in the container. This would allow the binary itself (or a process inside the container in the 'runc exec' case) to write to a host binary, leading to a container escape. The simple solution is to make sure we close all file descriptors immediately before the execve(2) step. Doing this earlier can lead to very serious issues in Go (as file descriptors can be reused, any (*os.File) reference could start silently operating on a different file) so we have to do it as late as possible. Unfortunately, there are some Go runtime file descriptors that we must not close (otherwise the Go scheduler panics randomly). The only way of being sure which file descriptors cannot be closed is to sneakily go:linkname the runtime internal "internal/poll.IsPollDescriptor" function. This is almost certainly not recommended but there isn't any other way to be absolutely sure, while also closing any other possible files. In addition, we can keep the logrus forwarding logfd open because you cannot execve a pipe and the contents of the pipe are so restricted (JSON-encoded in a format we pick) that it seems unlikely you could even construct shellcode. Closing the logfd causes issues if there is an error returned from execve. In mainline runc, runc-dmz protects us against this attack because the intermediate execve(2) closes all of the O_CLOEXEC internal runc file descriptors and thus runc-dmz cannot access them to attack the host. Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
We auto-close this file descriptor in the final exec step, but it's probably a good idea to not possibly leak the file descriptor to "runc init" (we've had issues like this in the past) especially since it is a directory handle from the host mount namespace. In practice, on runc 1.1 this does leak to "runc init" but on main the handle has a low enough file descriptor that it gets clobbered by the ForkExec of "runc init". OPEN_TREE_CLONE would let us protect this handle even further, but the performance impact of creating an anonymous mount namespace is probably not worth it. Also, switch to using an *os.File for the handle so if it goes out of scope during setup (i.e. an error occurs during setup) it will get cleaned up by the GC. Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Given the core issue in GHSA-xr7r-f8xq-vfvv was that we were unknowingly leaking file descriptors to "runc init", it seems prudent to make sure we proactively prevent this in the future. The solution is to simply mark all non-stdio file descriptors as O_CLOEXEC before we spawn "runc init". For libcontainer library users, this could result in unrelated files being marked as O_CLOEXEC -- however (for the same reason we are doing this for runc), for security reasons those files should've been marked as O_CLOEXEC anyway. Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
We close the logfd before execve so there's no need to special case it. In addition, it turns out that (*os.File).Fd() doesn't handle the case where the file was closed and so it seems suspect to use that kind of check. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This is a security fix for CVE-2024-21626. See the advisory[1] for more details. Aleksa Sarai (6): init: don't special-case logrus fds libcontainer: mark all non-stdio fds O_CLOEXEC before spawning init cgroup: plug leaks of /sys/fs/cgroup handle init: close internal fds before execve setns init: do explicit lookup of execve argument early init: verify after chdir that cwd is inside the container Hang Jiang (1): Fix File to Close [1]: GHSA-xr7r-f8xq-vfvv Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 LGTMs: cyphar AkihiroSuda kolyshkin lifubang
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Leandro Motta Barros <leandro@balena.io> Change-type: patch
Signed-off-by: Leandro Motta Barros <leandro@balena.io> Change-type: patch
panic(err) | ||
} | ||
|
||
logrus.SetLevel(logrus.Level(level)) |
Check failure
Code scanning / CodeQL
Incorrect conversion between integer types High
strconv.Atoi
return nil, errors.New("invalid range: " + r) | ||
} | ||
for i := start; i <= end; i++ { | ||
bits.SetBit(bits, int(i), 1) |
Check failure
Code scanning / CodeQL
Incorrect conversion between integer types High
strconv.ParseUint
if err != nil { | ||
return nil, err | ||
} | ||
bits.SetBit(bits, int(val), 1) |
Check failure
Code scanning / CodeQL
Incorrect conversion between integer types High
strconv.ParseUint
this is not the way to go |
Pull request was closed
to mitigate CVE-2024-21626
Internal Balena Context