[23.0 backport] apparmor: Allow confined runc to kill containers #47831

Merged (1 commit) on May 16, 2024

@vvoland (Contributor) commented on May 14, 2024

/usr/sbin/runc is confined by the "runc" profile[1], introduced in AppArmor
v4.0.0. This breaks stopping of containers, because the profile assigned
to containers (docker-default) doesn't accept signals from the "runc"
peer. AppArmor >= v4.0.0 currently ships in Ubuntu Mantic (23.10) and
later.

In the case of Docker, this regression is hidden by the fact that
dockerd itself sends SIGKILL to the running container after runc fails
to stop it. It is still a regression, because graceful shutdowns of
containers via "docker stop" are no longer possible: SIGTERM from runc
is never delivered to them. This can be seen in the logs when dockerd
runs with debug logging enabled, and also by tracing signals with the
killsnoop utility from bcc[2] (in the bpfcc-tools package in Debian/Ubuntu):

  Test commands:

    root@cloudimg:~# docker run -d --name test redis
    ba04c137827df8468358c274bc719bf7fc291b1ed9acf4aaa128ccc52816fe46
    root@cloudimg:~# docker stop test

  Relevant syslog messages (long lines wrapped):

    Apr 23 20:45:26 cloudimg kernel: audit:
      type=1400 audit(1713905126.444:253): apparmor="DENIED"
      operation="signal" class="signal" profile="docker-default" pid=9289
      comm="runc" requested_mask="receive" denied_mask="receive"
      signal=kill peer="runc"
    Apr 23 20:45:36 cloudimg dockerd[9030]:
      time="2024-04-23T20:45:36.447016467Z"
      level=warning msg="Container failed to exit within 10s of kill - trying direct SIGKILL"
      container=ba04c137827df8468358c274bc719bf7fc291b1ed9acf4aaa128ccc52816fe46
      error="context deadline exceeded"

  Killsnoop output after "docker stop ..." (a RESULT of -13 is -EACCES,
  i.e. the kill() call was denied):

    root@cloudimg:~# killsnoop-bpfcc
    TIME      PID      COMM             SIG  TPID     RESULT
    20:51:00  9631     runc             3    9581     -13
    20:51:02  9637     runc             9    9581     -13
    20:51:12  9030     dockerd          9    9581     0
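To spot the same denial on other hosts, the kernel audit records can be filtered mechanically. The Python snippet below is a minimal sketch, not part of Docker or AppArmor tooling: it pulls key=value fields out of an audit record with a simple regex and keeps AppArmor signal denials whose peer is an OCI runtime profile. It assumes each record sits on a single line (the excerpt above is wrapped only for readability); the function names are illustrative.

```python
import re
from typing import Dict, Iterable, List

# key=value pairs in an audit record; values are either "double-quoted" or bare tokens.
_FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit_fields(record: str) -> Dict[str, str]:
    """Return the key=value fields of one audit record, with surrounding quotes removed."""
    return {key: value.strip('"') for key, value in _FIELD_RE.findall(record)}


def runtime_signal_denials(lines: Iterable[str], runtimes=("runc", "crun")) -> List[Dict[str, str]]:
    """Keep AppArmor signal denials whose sending peer is one of the given runtime profiles."""
    hits = []
    for line in lines:
        fields = parse_audit_fields(line)
        if (fields.get("apparmor") == "DENIED"
                and fields.get("operation") == "signal"
                and fields.get("peer") in runtimes):
            hits.append(fields)
    return hits
```

For example, the lines of `journalctl -k` output could be passed to `runtime_signal_denials`. Lines without an `apparmor=` field, such as the dockerd warning, are simply skipped.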

This change extends the docker-default profile with rules that allow
receiving signals from processes confined by either the runc or the
crun profile (crun[4] is an alternative OCI runtime that is also
confined in AppArmor >= v4.0.0, see [1]). It is backward compatible
because the peer value is an AppArmor regular expression (AARE), so the
referenced profile doesn't have to exist for this profile to compile and
load successfully.
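The receive-side check described above can be modeled in a few lines of Python. This is a simplified sketch, not AppArmor's implementation. Peers are matched with shell-style globs via `fnmatch`, whereas real AARE also supports constructs such as `{a,b}` alternation. The `unconfined` and `docker-default` entries in the pre-fix rule set are assumptions about the existing profile. The model shows that a receive rule is only a pattern checked against the sender's label, so naming `crun` is harmless on systems where no such profile is loaded.

```python
import fnmatch
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SignalReceiveRule:
    """Simplified stand-in for an AppArmor `signal (receive) peer=...` rule."""
    peer: str  # glob pattern matched against the sending process's label


def may_receive(rules: Sequence[SignalReceiveRule], sender_label: str) -> bool:
    """Allow delivery only if some rule's peer pattern matches the sender's label."""
    return any(fnmatch.fnmatchcase(sender_label, rule.peer) for rule in rules)


# Assumed pre-fix rules: accept signals from unconfined processes and from
# processes confined by the container profile itself.
BEFORE = (SignalReceiveRule("unconfined"), SignalReceiveRule("docker-default"))

# Post-fix: add the runtime peers. The patterns are never resolved against the
# set of loaded profiles, so "crun" is harmless where no crun profile exists.
AFTER = BEFORE + (SignalReceiveRule("runc"), SignalReceiveRule("crun"))

for sender in ("unconfined", "runc", "crun"):
    print(f"{sender}: before={may_receive(BEFORE, sender)} after={may_receive(AFTER, sender)}")
```

In this model, senders labeled `runc` or `crun` are refused before the change and accepted after it. Unconfined senders, such as runc installed at /usr/bin/runc, are accepted in both cases.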

Note that the runc profile attaches to /usr/sbin/runc, which is where
the runc package in Debian/Ubuntu installs the binary. When the docker-ce
package is installed from the upstream repository[3], runc comes with
the containerd.io package and is installed at /usr/bin/runc. It
therefore still runs unconfined and has no issues sending signals to
containers.
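To see which case applies on a given host, a small Python sketch can check which of the two runc locations mentioned above exist, and can read AppArmor labels from procfs. It assumes a Linux host with AppArmor as the active LSM. It tries `/proc/<pid>/attr/apparmor/current` first and falls back to `/proc/<pid>/attr/current`; under a different LSM the fallback may hold a non-AppArmor context. The exact-path comparison against the attachment ignores symlinks and is a simplification. The helper names are hypothetical.

```python
import os
from typing import Dict, Optional, Tuple

# Attachment path of the "runc" profile from AppArmor >= v4.0.0, as described above.
RUNC_PROFILE_ATTACHMENT = "/usr/sbin/runc"
# Debian/Ubuntu "runc" package location vs. the containerd.io package location.
CANDIDATE_PATHS = ("/usr/sbin/runc", "/usr/bin/runc")


def runc_binaries() -> Dict[str, bool]:
    """Map each runc binary present on this host to whether it is on the attachment path."""
    return {p: p == RUNC_PROFILE_ATTACHMENT for p in CANDIDATE_PATHS if os.path.exists(p)}


def parse_label(raw: str) -> Tuple[str, Optional[str]]:
    """Split a label such as "runc (enforce)" into ("runc", "enforce")."""
    label = raw.replace("\x00", "").strip()
    if label.endswith(")") and " (" in label:
        name, mode = label[:-1].rsplit(" (", 1)
        return name, mode
    return label, None  # e.g. "unconfined" carries no mode suffix


def apparmor_label(pid: int) -> Optional[Tuple[str, Optional[str]]]:
    """Return the parsed AppArmor label of `pid`, or None if it cannot be read."""
    for path in (f"/proc/{pid}/attr/apparmor/current", f"/proc/{pid}/attr/current"):
        try:
            with open(path) as fh:
                return parse_label(fh.read())
        except OSError:
            continue  # file missing, or the kernel refuses the read
    return None
```

For instance, the main process of the "test" container (its PID is reported by `docker inspect --format '{{.State.Pid}}' test`) would be expected to carry the `docker-default` label in `enforce` mode.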

[1] https://gitlab.com/apparmor/apparmor/-/commit/2594d936
[2] https://github.com/iovisor/bcc/blob/master/tools/killsnoop.py
[3] https://download.docker.com/linux/ubuntu
[4] https://github.com/containers/crun

Signed-off-by: Tomáš Virtus <nechtom@gmail.com>

- Description for the changelog

apparmor: Allow confined runc to kill containers

(cherry picked from commit 5ebe2c0)
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>

@thaJeztah (Member) commented: LGTM

@thaJeztah merged commit b049257 into moby:23.0 on May 16, 2024 (90 checks passed).