runc incompatibility: crun hangs when sendmsg is in the SCMP_ACT_NOTIFY list #1002

Closed
AkihiroSuda opened this issue Sep 3, 2022 · 3 comments · Fixed by #1004

Comments


AkihiroSuda commented Sep 3, 2022

runc (v1.1.4) accepts the following .linux.seccomp configuration (with sendmsg in the SCMP_ACT_NOTIFY list), but crun (v1.5; v0.19 was also tested) hangs.

    "seccomp": {
      "defaultAction": "SCMP_ACT_ALLOW",
      "listenerPath": "/tmp/foo.sock",
      "syscalls": [
        {
          "names": [
            "sendmsg"
          ],
          "action": "SCMP_ACT_NOTIFY"
        }
      ]
    }

Reproduction steps:

    $ runc spec --rootless
    $ vi config.json   # add the seccomp config above
    $ ~/gopath/src/github.com/opencontainers/runc/contrib/cmd/seccompagent/seccompagent -socketfile=/tmp/foo.sock

In another terminal:

    $ runc run test-sendmsg
    / # exit
    $ crun run test-sendmsg
    (hangs)

Host: Ubuntu 22.04, kernel 5.15 (x86_64)


This is a blocker for running https://github.com/rootless-containers/bypass4netns with crun.

AkihiroSuda (author) commented Sep 3, 2022

Possible solution (similar to runc's implementation):

  • Create a pipe, and fork the process before calling libcrun/seccomp.c:syscall_seccomp().
  • The parent process calls syscall_seccomp(), gets the seccomp FD number as the return value, and then writes the FD number to the pipe, assuming write(2) is not on the SCMP_ACT_NOTIFY list. (If write(2) is on the list, maybe invoke some invalid syscall and raise SIGSYS to notify the child? Or maybe SIGSEGV?)
  • The child process reads the FD number from the pipe and calls pidfd_getfd() to get the actual seccomp FD. It then sends the FD to the listenerPath using send_fd_to_socket_with_payload().
  • The parent process waits for the child to exit, assuming wait4(2) (or a similar wait syscall) is not on the SCMP_ACT_NOTIFY list. (What if wait4 is on the list?)

The drawbacks are complexity and a dependency on kernel >= 5.6 (the first release with pidfd_getfd()).
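The last step is the hand-off of the listener fd to the agent. Per the OCI runtime spec, the runtime connects to the AF_UNIX SOCK_STREAM socket at listenerPath and sends the container process state with the seccomp fd attached as SCM_RIGHTS ancillary data, which takes a sendmsg(2) call. If sendmsg is already under SCMP_ACT_NOTIFY when that call is made, the kernel waits for a supervisor to answer, but the runtime still holds the only copy of the listener fd, so nothing ever answers. Below is a minimal sketch of that fd hand-off, assuming Linux; send_fd_with_payload() and recv_fd_with_payload() are hypothetical names (not crun's send_fd_to_socket_with_payload()), and in the usage a socketpair and a pipe stand in for the listener socket and the seccomp fd.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper (not crun's send_fd_to_socket_with_payload()):
 * send `payload` over a connected AF_UNIX socket with `fd` attached. */
int send_fd_with_payload(int sock, int fd, const void *payload, size_t len)
{
    struct iovec iov = { .iov_base = (void *) payload, .iov_len = len };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* guarantees correct alignment of buf */
    } ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS; /* kernel installs a duplicate of fd in the receiver */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    /* If sendmsg is already under SCMP_ACT_NOTIFY here, this call waits for
     * a supervisor that never received the listener fd: the reported hang. */
    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Agent side: read the payload and return the received fd, or -1. */
int recv_fd_with_payload(int sock, void *payload, size_t len, ssize_t *nread)
{
    struct iovec iov = { .iov_base = payload, .iov_len = len };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    *nread = recvmsg(sock, &msg, 0);
    if (*nread < 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

The received fd is a new descriptor number in the agent that refers to the same open file, which is why the agent can then service notifications on it.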

giuseppe added a commit to giuseppe/crun that referenced this issue Sep 5, 2022
Use a helper process to send the listener fd to the child process. This
way the sendmsg syscall happens from an environment that is not
seccomp filtered.

The helper process shares the fd table with the main process, so there
is no need to send the fd around, and no syscalls are invoked between
the seccomp filter setup and the sendmsg, so all syscalls can
use SCMP_ACT_NOTIFY without any risk of deadlocking the runtime itself.

Closes: containers#1002

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe (member) commented Sep 5, 2022

Thanks for the report. I've opened a PR: #1004

giuseppe (member) commented Sep 5, 2022

It uses a helper process that shares the fd table, so it doesn't need pidfd_getfd(), and the two processes communicate through shared memory. The main process invokes no syscalls between the seccomp filter setup and the sendmsg, so you can notify every syscall.
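A minimal sketch of the helper-process idea, assuming Linux and glibc's clone() wrapper. CLONE_FILES makes the helper share the main process's fd table, so an fd created after clone() is usable by number in the helper; the two processes coordinate through a MAP_SHARED page, so the main process only spins on memory instead of making syscalls. No seccomp filter is installed here and a pipe stands in for the listener fd; struct shared_state, helper() and run_demo() are hypothetical names, not crun's code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical layout of the page shared by the main process and helper. */
struct shared_state {
    atomic_int fd;   /* -1 until the main process publishes an fd number */
    atomic_int done; /* 0 = pending, 1 = helper succeeded, -1 = failed */
};

static int helper(void *arg)
{
    struct shared_state *st = arg;
    static const char msg[] = "hello from helper";
    int fd;

    /* The helper is not seccomp-filtered, so it may make syscalls freely. */
    while ((fd = atomic_load(&st->fd)) < 0)
        sched_yield();

    /* With CLONE_FILES the fd number is valid here even though the fd was
     * created after clone(). A runtime would sendmsg() it to listenerPath;
     * this sketch just writes through it. */
    ssize_t n = write(fd, msg, sizeof msg);
    atomic_store(&st->done, n == (ssize_t) sizeof msg ? 1 : -1);
    return 0;
}

/* Returns 0 and fills `out` with what the helper wrote, or -1 on error. */
int run_demo(char *out, size_t outlen)
{
    int ret = -1, fds[2] = { -1, -1 };
    pid_t pid = -1;
    char *stack = NULL;
    struct shared_state *st = mmap(NULL, sizeof *st, PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;
    atomic_init(&st->fd, -1);
    atomic_init(&st->done, 0);

    stack = malloc(STACK_SIZE);
    if (stack == NULL)
        goto out;

    /* Share the fd table (CLONE_FILES), but not the address space. */
    pid = clone(helper, stack + STACK_SIZE, CLONE_FILES | SIGCHLD, st);
    if (pid < 0)
        goto out;

    /* Stand-in for the seccomp listener fd, created after the helper exists. */
    if (pipe(fds) < 0) {
        kill(pid, SIGKILL);
        goto out;
    }

    /* A runtime would install the seccomp filter here; from this point the
     * main process makes no syscalls until the helper reports back. */
    atomic_store(&st->fd, fds[1]);
    while (atomic_load(&st->done) == 0)
        ; /* busy-wait on shared memory instead of blocking in a syscall */

    if (atomic_load(&st->done) == 1 && read(fds[0], out, outlen) > 0)
        ret = 0;

out:
    if (pid > 0)
        waitpid(pid, NULL, 0);
    if (fds[0] >= 0)
        close(fds[0]);
    if (fds[1] >= 0)
        close(fds[1]);
    free(stack);
    munmap(st, sizeof *st);
    return ret;
}
```

In this sketch the main process spins rather than blocking because any blocking primitive (read, futex, wait4) would itself be a syscall, and with SCMP_ACT_NOTIFY allowed on every syscall, none of them is guaranteed to be safe at that point.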
