seccomp may cause "Failed to mount API filesystems, freezing." #25290

Closed
ghost opened this issue Aug 1, 2016 · 11 comments

Comments

@ghost

ghost commented Aug 1, 2016

Output of docker version:

RHEL Host

Client:
Version:      1.12.0
API version:  1.24
Go version:   go1.6.3
Git commit:   8eab29e
Built:
OS/Arch:      linux/amd64

Server:
Version:      1.12.0
API version:  1.24
Go version:   go1.6.3
Git commit:   8eab29e
Built:
OS/Arch:      linux/amd64

Arch Linux Host

Client:
Version:      1.11.2
API version:  1.23
Go version:   go1.6.2
Git commit:   b9f10c9
Built:        Tue Jun 21 00:43:14 2016
OS/Arch:      linux/amd64

Server:
Version:      1.11.2
API version:  1.23
Go version:   go1.6.2
Git commit:   b9f10c9
Built:        Tue Jun 21 00:43:14 2016
OS/Arch:      linux/amd64

Output of docker info:

Containers: x
Running: x
Paused: 0
Stopped: x
Images: 1276
Server Version: 1.12.0
Storage Driver: devicemapper
Pool Name: docker-253:4-2097153-pool
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem: ext4
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 32.1 GB
Data Space Total: 107.4 GB
Data Space Available: 14.32 GB
Metadata Space Used: 104 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.043 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.107-XXXX (XXX)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge overlay null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-XXX
Operating System: Red Hat Enterprise Linux Server 7.x
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51 GiB
Name: XXX
ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:5000
127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Image: feduxorg/centos
  • Image contains systemd 208 for starting processes
  • Image OS: CentOS 7.x
  • Host OS: RHEL 7.x, but also Arch Linux

Steps to reproduce the issue:

  1. docker pull feduxorg/centos
  2. docker run -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup feduxorg/centos
   [!!!!!!] Failed to mount API filesystems, freezing.

Describe the results you received:

systemd stops booting the system and shows the following error message instead:

[!!!!!!] Failed to mount API filesystems, freezing.

Describe the results you expected:

I expect to see the normal systemd startup sequence.

systemd 208 running in system mode. (+PAM -LIBWRAP -AUDIT +SELINUX -IMA +SYSVINIT -LIBCRYPTSETUP -GCRYPT -ACL -XZ)
Detected virtualization 'docker'.

Welcome to CentOS Linux 7 (Core)!
[...]

Additional information you deem important (e.g. issue happens only occasionally):

The problem seems to be related to seccomp in some way. Maybe the issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=1322508 has not been fixed upstream (yet)?

If I run docker without any extra security options, it fails:

# Fails
docker run -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup feduxorg/centos

If I set the seccomp option explicitly or run the container as privileged, it works:

# Works
docker run --security-opt seccomp:unconfined -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup  feduxorg/centos

# Works
docker run --privileged=true -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup feduxorg/centos
@thaJeztah
Member

systemd requires certain privileges to run, which are blocked by default in a container for security. To run systemd, you need to add the SYS_ADMIN capability and disable seccomp (or use a custom profile). See the discussion on #22285 for more information.

This is not a bug, but by design: processes inside the container should not automatically get these privileges, as they make the container less secure; if a process (such as systemd) requires these privileges, you need to pass them at runtime.

I'm closing this issue, but feel free to continue the discussion.
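To check from inside a running container whether the capability and seccomp settings described above actually took effect, you can read the process status file. The sketch below is a minimal example, assuming a Linux kernel that reports the `CapEff` and `Seccomp` fields in `/proc/<pid>/status` and that it is run as root inside the container (as systemd would be). `CAP_SYS_ADMIN` is bit 21 of the capability mask; `Seccomp` is 0 (disabled), 1 (strict) or 2 (filter). The function name `check_systemd_privileges` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report whether CAP_SYS_ADMIN is in the effective
# capability set and which seccomp mode is active. By default it reads
# /proc/self/status, i.e. the status of the awk process, which inherits the
# container's capability bounding set and seccomp filter.
check_systemd_privileges() {
  status_file=${1:-/proc/self/status}
  capeff=$(awk '/^CapEff:/ {print $2}' "$status_file")
  seccomp=$(awk '/^Seccomp:/ {print $2}' "$status_file")
  if [ -z "$capeff" ]; then
    echo "no CapEff field in $status_file" >&2
    return 1
  fi

  # CAP_SYS_ADMIN is bit 21 of the hexadecimal capability mask.
  if [ $(( (0x$capeff >> 21) & 1 )) -eq 1 ]; then
    echo "SYS_ADMIN: yes"
  else
    echo "SYS_ADMIN: no"
  fi

  # Seccomp field: 0 = disabled, 1 = strict, 2 = filter (e.g. Docker's profile).
  case "$seccomp" in
    0) echo "seccomp: disabled" ;;
    1) echo "seccomp: strict" ;;
    2) echo "seccomp: filter" ;;
    *) echo "seccomp: unknown" ;;
  esac
}
```

Called without an argument inside the container, it inspects the live process. For reference, a mask of `00000000a80425fb` corresponds to Docker's default capability set (no SYS_ADMIN); `--cap-add SYS_ADMIN` adds bit 21.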

@ghost
Author

ghost commented Aug 1, 2016

@thaJeztah Thanks for sharing. Reading that other issue makes it easier for me to understand the reasons here.

What would have been helpful are doc pages explaining how to use systemd as PID 1 in Docker images correctly, and when it makes sense to use such a setup. Is there any repo for this?

I didn't find any documentation using the search engine on the docker site.

@justincormack
Contributor

As of 1.12, you only need to add SYS_ADMIN; this will take care of the seccomp adjustments. For previous releases, you need to disable seccomp too, or use a custom profile.


@thaJeztah
Member

Thanks for adding that, @justincormack

> What would have been helpful are doc pages explaining how to use systemd as PID 1 in docker images correctly and when it makes sense to use such a setup. Is there any repo for this?

We currently don't have documentation about running systemd in a container; we do have some information about running supervisord here, so if someone would contribute documentation for that, I think we could add it to the documentation.

> and when it makes sense to use such a setup

In most cases, running a process manager really isn't needed; remember, a container is meant to run a process, not to be a virtual machine. As this discussion shows, allowing systemd to run requires lowering the security of the container as a whole, so it's worth considering whether you need systemd for your use case. Opinions on this vary, so let's just say "it depends" 😄

@ghost
Author

ghost commented Aug 1, 2016

> Opinions on this vary, so let's just say "it depends"

@thaJeztah Yeah, you're absolutely right. That's why I suggested just adding a section about

  • reasons for not using
  • reasons for using

systemd in a Docker container. Maybe just add some use cases (as given in the other issue) and leave it to users to decide whether it makes sense for their particular use case.

> if someone would contribute documentation for that, I think we could add it to the documentation

Is the following something you would accept? If yes, I'm going to create a PR for it and give it a bit more time to mature.

Control and configure services in a Docker Container

There are some use cases where using a supervising PID 1 inside a container makes sense, in particular if you need a zombie reaper. In general, we suggest avoiding a process supervisor, as it makes your setup more complex, but there might be use cases where it makes sense to use "systemd" as PID 1.

Reasons NOT to use "systemd" as PID 1

  • Lowers security of a container

    "systemd" requires some security-relevant Linux capabilities (SYS_ADMIN) which are not required by every dockerized application. More capabilities = more power to the container = more problems if an attacker takes over your container.

  • Another tool

    Using more and more tools makes an infrastructure more complex, and errors become harder to find.

  • Separation of infrastructure

    If you put all applications together in one container, a bug or an attack might affect all other applications within the container. Remember: a container is not a virtual machine.

Reasons to use "systemd" as PID 1

  • Same tool for the same job

    As your staff might be trained to use "systemd" for controlling services at the operating-system level, you might want it to control the services in your containers as well.

  • Multiple applications required for your service

    Given a web application which runs some tasks in the background, you might consider those job runners (e.g. crond, a background worker) part of your application.

How to use "systemd" as PID 1

Create the image

FROM centos
MAINTAINER test@example.com

# Make systemd aware that it is running in a container
ENV container docker

# Reduce the amount of space used by the journal
# (plain -i: "-ir" would make sed keep a backup file with the suffix "r")
RUN sed -i "s/#SystemMaxUse=.*/SystemMaxUse=50M/" /etc/systemd/journald.conf

# Make sure dbus is not killed
RUN yum install -y dbus \
  && sed -i -e "s/OOMScoreAdjust/# OOMScoreAdjust/" /usr/lib/systemd/system/dbus.service

# Add the service units and enable them
COPY my-app.service my-worker.service /etc/systemd/system/
RUN systemctl enable my-app my-worker

# Declare the volumes systemd needs when running as init
VOLUME ["/sys/fs/cgroup", "/run", "/tmp"]

# Run systemd
CMD ["/usr/sbin/init"]

# systemd shuts down cleanly on SIGRTMIN+3, so use it as the stop signal
STOPSIGNAL SIGRTMIN+3
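The Dockerfile above expects two unit files, my-app.service and my-worker.service, in the build context but does not show them. A minimal sketch of generating them before running docker build, assuming hypothetical binaries /usr/local/bin/my-app and /usr/local/bin/my-worker inside the image; the `write_unit` helper is likewise hypothetical. The [Install] section is what gives `systemctl enable` a target to link the unit into.

```sh
#!/bin/sh
# Hypothetical helper: write a minimal systemd unit file for one service.
# The ExecStart commands used below are placeholders, not real binaries.
write_unit() {
  name=$1        # unit name without the .service suffix
  exec_start=$2  # command systemd should run for this service
  cat > "$name.service" <<EOF
[Unit]
Description=$name (example service)
After=network.target

[Service]
ExecStart=$exec_start
Restart=on-failure

[Install]
# Lets "systemctl enable" link the unit into multi-user.target
WantedBy=multi-user.target
EOF
}

# Create both units in the build context (the current directory).
write_unit my-app /usr/local/bin/my-app
write_unit my-worker /usr/local/bin/my-worker
```

Restart=on-failure is one reasonable choice for long-running workers; a one-shot job would use a different Type= and restart policy.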

Now build the image.

docker build -t my-centos .

Run the image

With Docker > 1.12, pass the --cap-add SYS_ADMIN flag to docker run.

docker run --cap-add SYS_ADMIN -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup:ro my-centos

@ghost
Author

ghost commented Aug 2, 2016

@justincormack I cannot confirm this for 1.12.0:

Fails:

docker run --cap-add SYS_ADMIN -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup:ro my-centos

Succeeds:

docker run --security-opt seccomp:unconfined -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup:ro my-centos

@jamshid
Contributor

jamshid commented Aug 28, 2016

@justincormack, I see the same behavior as @dg-ratiodata: seccomp:unconfined is still needed with Docker 1.12.
CentOS/sig-cloud-instance-images#54 (comment)

Somewhere I've seen that -v /sys/fs/cgroup:/sys/fs/cgroup:ro is only needed with Docker hosts that are themselves running systemd. Otherwise /sys/fs/cgroup can just be a regular volume, not mapped to the host file system.
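Assuming that claim holds, the bind mount can be made conditional on the host. A minimal sketch, assuming the common convention that a host booted with systemd has the directory /run/systemd/system (the same test systemd's sd_booted() performs); the function name `cgroup_mount_opts` and the fallback of relying on the Dockerfile's VOLUME declaration are illustrative assumptions, not something confirmed in this thread.

```sh
#!/bin/sh
# Hypothetical helper: print the cgroup mount option for docker run.
# A host counts as systemd-booted when /run/systemd/system exists.
cgroup_mount_opts() {
  marker=${1:-/run/systemd/system}
  if [ -d "$marker" ]; then
    # Host runs systemd: share the host's cgroup hierarchy read-only.
    printf '%s\n' "-v /sys/fs/cgroup:/sys/fs/cgroup:ro"
  fi
  # Otherwise print nothing and rely on VOLUME ["/sys/fs/cgroup", ...]
  # in the Dockerfile, which gives the container an anonymous volume.
}
```

Usage would be something like docker run --cap-add SYS_ADMIN -t --rm --name server-1 $(cgroup_mount_opts) my-centos, with the command substitution deliberately unquoted so the option splits into separate words.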

Btw I wish I understood why this succeeds:

$ docker run  -ti --cap-add SYS_ADMIN --security-opt seccomp:unconfined -p 80:80 local/c7-systemd-httpd bash -c "/usr/sbin/init"

but running init interactively fails:

$ docker run  -ti --cap-add SYS_ADMIN --security-opt seccomp:unconfined -p 80:80 local/c7-systemd-httpd bash
[root@e0ae35935b8e /]# /usr/sbin/init 
Couldn't find an alternative telinit implementation to spawn.
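A likely explanation, based on systemd's SysV compatibility behavior (not verified against this exact image): /usr/sbin/init only acts as the init system when its PID is 1. Started with any other PID, it hands over to systemctl in telinit mode, and since systemd is not running yet, systemctl tries to exec an alternative telinit implementation, which fails with exactly this message. With bash -c "/usr/sbin/init", bash typically execs a single simple command in place, so init inherits PID 1 from bash; in the interactive shell, bash keeps PID 1 and init becomes a child with a different PID. Running exec /usr/sbin/init from that shell should therefore behave like the working case, though that is untested here. The sketch below (function names hypothetical) only demonstrates the exec-versus-fork PID difference, not systemd itself.

```sh
#!/bin/sh
# exec replaces the current shell, so the new program keeps the same PID.
pid_with_exec() {
  sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# Without exec, the inner shell is started as a child with a new PID.
# The trailing "true" keeps it from being the last command, so the outer
# shell cannot turn it into an implicit exec.
pid_without_exec() {
  sh -c 'echo "$$"; sh -c "echo \$\$"; true'
}
```

Each function prints two PIDs: identical for pid_with_exec, different for pid_without_exec. Inside a container whose CMD is the outer shell, the first case is the one where the exec'd program ends up as PID 1.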

@jamshid
Contributor

jamshid commented Aug 29, 2016

Note: I'm using Docker for Mac, which is still on Docker 1.12.0. The need for seccomp:unconfined is supposed to be fixed in 1.12.1 by #25567.
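If that fix holds, whether seccomp:unconfined is still needed depends on the daemon version. A minimal sketch that compares a version string against 1.12.1, assuming a sort implementation with version ordering (-V, as in GNU coreutils); the function name `needs_unconfined_seccomp` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given Docker version is older
# than 1.12.1, where --cap-add SYS_ADMIN alone may not be enough for systemd
# and seccomp still has to be disabled (or a custom profile used).
needs_unconfined_seccomp() {
  version=$1
  fixed=1.12.1
  [ "$version" = "$fixed" ] && return 1
  # sort -V orders version strings; if $version sorts first, it is older.
  oldest=$(printf '%s\n%s\n' "$version" "$fixed" | sort -V | head -n 1)
  [ "$oldest" = "$version" ]
}
```

The server version can be obtained with docker version --format '{{.Server.Version}}'; add --security-opt seccomp:unconfined to the docker run flags only when needs_unconfined_seccomp succeeds for that value.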

@cleverlzc

cleverlzc commented May 17, 2018

I am also facing this problem.
Luckily, I have now solved it by adding the --privileged option when executing docker run xxx.

@thaJeztah
Member

@cleverlzc I'd discourage using --privileged; use --cap-add SYS_ADMIN instead. --privileged basically disables all the security that containers provide. Make sure you're running an up-to-date version of Docker, though.

@cleverlzc

Okay, thanks very much for your attention and advice, @thaJeztah.
