seccomp may cause "Failed to mount API filesystems, freezing." #25290

ghost · 2016-08-01T12:58:40Z

Output of docker version:

RHEL Host

Client:
Version:      1.12.0
API version:  1.24
Go version:   go1.6.3
Git commit:   8eab29e
Built:
OS/Arch:      linux/amd64

Server:
Version:      1.12.0
API version:  1.24
Go version:   go1.6.3
Git commit:   8eab29e
Built:
OS/Arch:      linux/amd64

Arch Linux Host

Client:
Version:      1.11.2
API version:  1.23
Go version:   go1.6.2
Git commit:   b9f10c9
Built:        Tue Jun 21 00:43:14 2016
OS/Arch:      linux/amd64

Server:
Version:      1.11.2
API version:  1.23
Go version:   go1.6.2
Git commit:   b9f10c9
Built:        Tue Jun 21 00:43:14 2016
OS/Arch:      linux/amd64

Output of docker info:

Containers: x
Running: x
Paused: 0
Stopped: x
Images: 1276
Server Version: 1.12.0
Storage Driver: devicemapper
Pool Name: docker-253:4-2097153-pool
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem: ext4
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 32.1 GB
Data Space Total: 107.4 GB
Data Space Available: 14.32 GB
Metadata Space Used: 104 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.043 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.107-XXXX (XXX)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: host bridge overlay null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-XXX
Operating System: Red Hat Enterprise Linux Server 7.x
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51 GiB
Name: XXX
ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:5000
127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

Image: feduxorg/centos
Image contains systemd 208 for starting processes
Image OS: CentOS 7.x
Host OS: RHEL 7.x, but also Arch Linux

Steps to reproduce the issue:

docker pull feduxorg/centos
docker run -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup feduxorg/centos

   [!!!!!!] Failed to mount API filesystems, freezing.

Describe the results you received:

systemd aborts the boot sequence and shows the following error message instead.

[!!!!!!] Failed to mount API filesystems, freezing.

Describe the results you expected:

I expect to see the normal systemd startup sequence.

systemd 208 running in system mode. (+PAM -LIBWRAP -AUDIT +SELINUX -IMA +SYSVINIT -LIBCRYPTSETUP -GCRYPT -ACL -XZ)
Detected virtualization 'docker'.

Welcome to CentOS Linux 7 (Core)!
[...]

Additional information you deem important (e.g. issue happens only occasionally):

The problem seems to be somehow related to seccomp. Maybe the issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=1322508 is not fixed (yet) (upstream)?

If I run docker with no options, it fails:

# Fails
docker run -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup feduxorg/centos

If I set the seccomp option explicitly or run the container as a privileged process, it works.

# Works
docker run --security-opt seccomp:unconfined -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup  feduxorg/centos

# Works
docker run --privileged=true -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup feduxorg/centos

thaJeztah · 2016-08-01T13:10:36Z

systemd requires certain privileges to run, which are blocked by default in a container for security. To run systemd, you need to add the SYS_ADMIN capability and disable seccomp (or use a custom profile). See the discussion on #22285 for more information.

This is not a bug, but by design: processes inside the container should not automatically get these privileges, as they make the container less secure; if a process (such as systemd) requires these privileges, you need to pass them at runtime.
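As a rough illustration of "passing them at runtime", the sketch below prints (rather than executes) a docker run command that combines both options named above. The helper name `systemd_run_cmd` is hypothetical; the image and the cgroup mount come from the original report.

```shell
#!/bin/sh
# Sketch only: print the docker run command for a systemd container so it
# can be reviewed before it is executed. systemd_run_cmd is a hypothetical
# helper name, not part of Docker.
systemd_run_cmd() {
  image=$1
  # --cap-add SYS_ADMIN: grants the capability systemd needs to mount filesystems
  # --security-opt seccomp:unconfined: turns off the default seccomp profile
  printf 'docker run --cap-add SYS_ADMIN --security-opt seccomp:unconfined -t --rm -v /sys/fs/cgroup:/sys/fs/cgroup %s\n' "$image"
}

systemd_run_cmd feduxorg/centos
```

Printing the command first keeps the extra privileges visible and explicit, which is the point of not granting them by default.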

I'm closing this issue, but feel free to continue the discussion

ghost · 2016-08-01T13:24:21Z

@thaJeztah Thanks for sharing. Reading that other issue makes it easier for me to understand the reasons here.

What would have been helpful are doc pages explaining how to use systemd as PID 1 in docker images correctly and when it makes sense to use such a setup. Is there any repo for this?

I didn't find any documentation using the search engine on the docker site.

justincormack · 2016-08-01T13:25:42Z

As of 1.12 you only need to add SYS_ADMIN; this will take care of the seccomp adjustments. For previous releases you also need to disable seccomp or use a custom profile.

thaJeztah · 2016-08-01T13:37:58Z

Thanks for adding that, @justincormack

What would have been helpful are doc pages explaining how to use systemd as PID 1 in docker images correctly and when it makes sense to use such a setup. Is there any repo for this?

We currently don't have documentation about running systemd in a container; we do have some information about running supervisord here, so if someone would contribute documentation for that, I think that we could add it to the documentation.

and when it makes sense to use such a setup

In most cases, running a process manager really isn't needed; remember, a container is meant to run a process, it is not a virtual machine. As can be seen from this discussion, allowing systemd to run requires lowering the security of the container as a whole, so it's worth considering whether you need systemd for your use case. Opinions on this vary, so let's just say "it depends" 😄

ghost · 2016-08-01T14:26:14Z

Opinions on this vary, so let's just say "it depends"

@thaJeztah Yeah, you're absolutely right. That's why I suggested just adding a section about

reasons for not using
reasons for using

systemd in a docker container. Maybe just add some use cases, as given in the other issue, and leave it to the user to decide whether it makes sense for his/her particular use case.

if someone would contribute documentation for that, I think that we could add it to the documentation

Is the following something you would accept? If yes, I'm going to create a PR for that and give it a bit more time to mature.

Control and configure services in a Docker Container

There are some use cases where using a supervising PID 1 inside a container makes sense, in particular if you need a zombie reaper. In general we suggest avoiding a process supervisor, as it makes your setup more complex, but there might be use cases where it makes sense to use "systemd" as PID 1.

Reasons NOT to use "systemd" as PID 1

Lowers security of a container

"systemd" requires some security-relevant Linux capabilities, such as SYS_ADMIN, which most dockerized applications do not need. More capabilities = more power to the container = more problems if an attacker takes over your container.

Another tool

Using more and more tools makes an infrastructure more complex. It's harder to find errors.

Separation of infrastructure

If you put all applications together in one container, a bug or an attack might affect all other applications within the container. Remember, a container is not a virtual machine.

Reasons to use "systemd" as PID 1

Same tool for the same job

As your staff might be trained to use "systemd" for controlling services at the operating system level, you might want it to control the services in your containers as well.

Multiple applications required for your service

If you have a web application which runs some tasks in the background, you might consider those job runners part of your application: crond, background worker.

How to use "systemd" as PID 1

Create the image

FROM centos
MAINTAINER test@example.com

# Make systemd aware of being run in container
ENV container docker

# Reduce the amount of space wasted by journals
RUN sed -i -r "s/#SystemMaxUse=.*/SystemMaxUse=50M/" /etc/systemd/journald.conf

# Make sure dbus is not killed
RUN yum install -y dbus \
  && sed -i -e "s/OOMScoreAdjust/# OOMScoreAdjust/" /usr/lib/systemd/system/dbus.service

# Add services units and enable them
ADD my-app.service /etc/systemd/system/
ADD my-worker.service /etc/systemd/system/
RUN systemctl enable my-app
RUN systemctl enable my-worker

# Declare the volumes systemd needs when it runs as init daemon
VOLUME ["/sys/fs/cgroup", "/run", "/tmp"]

# Run systemd
CMD ["/usr/sbin/init"]

# To shut down systemd in your container correctly we need to use "SIGRTMIN+3"
STOPSIGNAL SIGRTMIN+3

Now build the image.

docker build -t my-centos .
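The Dockerfile above adds my-app.service without showing it. The sketch below writes a minimal unit file of that name into a build context. Everything inside the unit is an assumption made for illustration: the binary path /usr/local/bin/my-app is hypothetical, and `WantedBy=multi-user.target` assumes the image boots into that target.

```shell
#!/bin/sh
# Sketch only: write a hypothetical my-app.service into the build context
# so that "ADD my-app.service /etc/systemd/system/" has something to copy.
# write_example_unit is an illustrative helper name.
write_example_unit() {
  dir=${1:-.}
  cat > "$dir/my-app.service" <<'EOF'
[Unit]
Description=My application (hypothetical example)

[Service]
ExecStart=/usr/local/bin/my-app
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (run in the directory that holds the Dockerfile):
#   write_example_unit .
#   docker build -t my-centos .
```

The `[Install]` section is what `systemctl enable my-app` acts on during the build; without it, enabling the unit would have no effect.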

Run the image

With "Docker" 1.12 or later, pass the --cap-add SYS_ADMIN flag to docker run.

docker run --cap-add SYS_ADMIN -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup:ro my-centos

ghost · 2016-08-02T08:33:02Z

@justincormack I cannot confirm this for 1.12.0:

Fails:

docker run --cap-add SYS_ADMIN -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup:ro my-centos

Succeeds:

docker run --security-opt seccomp:unconfined -t --rm --name server-1 -v /sys/fs/cgroup:/sys/fs/cgroup:ro my-centos

jamshid · 2016-08-28T05:59:08Z

@justincormack, I see the same behavior as @dg-ratiodata. seccomp:unconfined is still needed with docker 1.12.
CentOS/sig-cloud-instance-images#54 (comment)

Somewhere I've seen that -v /sys/fs/cgroup:/sys/fs/cgroup:ro is only needed with docker servers that are themselves running systemd. Otherwise /sys/fs/cgroup can just be a regular volume, not mapped to the server file system.
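A minimal sketch of that rule of thumb, which is unconfirmed in this thread. It assumes the check runs on the machine hosting the docker daemon and uses the existence of /run/systemd/system, the same test sd_booted(3) documents, to decide whether the host runs systemd. The helper name `cgroup_volume_flag` is hypothetical.

```shell
#!/bin/sh
# Sketch only: choose the /sys/fs/cgroup volume flag depending on whether
# the docker host itself runs systemd. The optional argument overrides the
# marker directory, which makes the function easy to try out.
cgroup_volume_flag() {
  marker=${1:-/run/systemd/system}
  if [ -d "$marker" ]; then
    # Host runs systemd: bind-mount the host cgroup tree read-only.
    printf '%s\n' "-v /sys/fs/cgroup:/sys/fs/cgroup:ro"
  else
    # Otherwise a regular (anonymous) volume is assumed to be enough.
    printf '%s\n' "-v /sys/fs/cgroup"
  fi
}

cgroup_volume_flag
```

With a remote daemon or Docker for Mac the client and the daemon are on different machines, so the result of a local check would not apply.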

Btw I wish I understood why this succeeds:

$ docker run  -ti --cap-add SYS_ADMIN --security-opt seccomp:unconfined -p 80:80 local/c7-systemd-httpd bash -c "/usr/sbin/init"

but running init interactively fails:

$ docker run  -ti --cap-add SYS_ADMIN --security-opt seccomp:unconfined -p 80:80 local/c7-systemd-httpd bash
[root@e0ae35935b8e /]# /usr/sbin/init 
Couldn't find an alternative telinit implementation to spawn.

jamshid · 2016-08-29T17:33:28Z

Note I'm using Docker for Mac which is still docker 1.12.0. The need for seccomp:unconfined is supposed to be fixed in 1.12.1 by #25567.
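A minimal sketch of that version cutoff, assuming the fix did land in 1.12.1 as expected above (not verified here). The function names are hypothetical; the flags are the ones used earlier in this thread.

```shell
#!/bin/sh
# Sketch only: pick docker run flags for a systemd container based on the
# server version. Assumption from this thread: releases before 1.12.1 also
# need seccomp:unconfined, later ones only --cap-add SYS_ADMIN.
needs_seccomp_unconfined() {
  ver=${1%%-*}   # drop any suffix such as "-rc1"
  # Split MAJOR.MINOR.PATCH; IFS is changed for this read only.
  IFS=. read -r major minor patch <<EOF
$ver
EOF
  major=${major:-0}
  minor=${minor:-0}
  patch=${patch:-0}
  [ "$major" -lt 1 ] && return 0
  [ "$major" -gt 1 ] && return 1
  [ "$minor" -lt 12 ] && return 0
  [ "$minor" -gt 12 ] && return 1
  [ "$patch" -lt 1 ]
}

systemd_run_flags() {
  if needs_seccomp_unconfined "$1"; then
    printf '%s\n' "--cap-add SYS_ADMIN --security-opt seccomp:unconfined"
  else
    printf '%s\n' "--cap-add SYS_ADMIN"
  fi
}

systemd_run_flags 1.12.0
systemd_run_flags 1.12.1

# Usage against a real daemon:
#   systemd_run_flags "$(docker version --format '{{.Server.Version}}')"
```

The version is compared field by field rather than as a string, so that 1.9.x sorts before 1.12.x.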

cleverlzc · 2018-05-17T03:07:50Z

I am also facing this problem.
Luckily, I have now solved it by adding the --privileged option when executing docker run xxx.

thaJeztah · 2018-05-17T09:15:17Z

@cleverlzc I'd discourage using --privileged; use --cap-add SYS_ADMIN instead. --privileged basically disables all security that containers provide. Make sure you're running an up-to-date version of docker though.
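One way to keep that advice from being forgotten is a small guard in wrapper scripts. The sketch below is a hypothetical helper, not a Docker feature: it scans the arguments intended for docker run and warns when --privileged is among them.

```shell
#!/bin/sh
# Sketch only: flag --privileged in a list of docker run arguments.
# lint_run_args is an illustrative name; it never calls docker itself.
lint_run_args() {
  for arg in "$@"; do
    case $arg in
      --privileged|--privileged=true)
        echo "warning: --privileged disables container isolation; prefer --cap-add SYS_ADMIN"
        return 1
        ;;
    esac
  done
  echo "ok"
}

lint_run_args --privileged=true -t --rm feduxorg/centos || true
lint_run_args --cap-add SYS_ADMIN -t --rm feduxorg/centos
```

The non-zero return status lets a calling script refuse to continue, while the message points at the narrower alternative.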

cleverlzc · 2018-05-18T01:11:54Z

Okay, thanks very much for your attention and advice. @thaJeztah

GordonTheTurtle added the version/1.12 label Aug 1, 2016

thaJeztah closed this as completed Aug 1, 2016

thaJeztah added area/systemd area/security/seccomp labels Aug 1, 2016

felipevolpone mentioned this issue Sep 20, 2017

Failed to determine whether /sys is a mount point: Operation not permitted freeipa/freeipa-container#160

Closed

chesty mentioned this issue Dec 11, 2017

Services do not come up after restarting container (after succesfull installation) freeipa/freeipa-container#135

Closed
