Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing: Persistent Selinux Policy Changes Result In Docker Not Starting #700

Closed
fifofonix opened this issue Dec 16, 2020 · 11 comments
Closed

Comments

@fifofonix
Copy link

Latest 'testing' stream release is resulting in docker service not starting.

I haven't investigated in much detail at all yet but this is unusual, and could be affecting others, so reporting this issue early.

Logging into a newly auto-updated node I see the following services not started:

Last login: Wed Dec 16 16:46:24 2020 from 172.24.126.209
[systemd]
Failed Units: 7
  docker.service
  docker-tls-tcp.socket
  docker.socket

Output of journalctl -b -u docker:

-- Logs begin at Mon 2020-11-30 01:31:34 UTC, end at Wed 2020-12-16 17:04:21 UTC. --
Dec 16 16:45:57 d-pdm-c2-3 systemd[1]: Starting Docker Application Container Engine...
Dec 16 16:45:59 d-pdm-c2-3 dockerd[934]: time="2020-12-16T16:45:59.676785620Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
Dec 16 16:45:59 d-pdm-c2-3 dockerd[934]: time="2020-12-16T16:45:59.677886135Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
Dec 16 16:45:59 d-pdm-c2-3 dockerd[934]: time="2020-12-16T16:45:59.929469562Z" level=warning msg="runtime v1 is deprecated since containerd v1.4, consider using runtime v2"
Dec 16 16:46:00 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:00.503230017Z" level=error msg="failed to enable plugin" error="OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: rootfs_linux.go:59: mounting \"/etc/resolv.conf\" to rootfs at \"/etc/resolv.conf\" caused: stat /etc/resolv.conf: no such file or directory: unknown" id=cf478c703549a114c9d88765a2c39905e08ec4418c17fafca60f4df580c108a1
Dec 16 16:46:01 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:01.729407072Z" level=warning msg="Your kernel does not support cgroup rt period"
Dec 16 16:46:01 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:01.730359229Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Dec 16 16:46:01 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:01.730574497Z" level=warning msg="Your kernel does not support cgroup blkio weight"
Dec 16 16:46:01 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:01.730585187Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Dec 16 16:46:03 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:03.160085927Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint b7eb9c712d8e08bc9e23e5c736c79b513dde3ee21b2955b12d98e9de1940ceff 0c02630136241c938f97d060bd6ba19f4207f68ec4f4807220d4c2601421cd89], retrying...."
Dec 16 16:46:03 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:03.667931920Z" level=warning msg="failed to deactivate service binding for container dev-pdm-****.0dzf61tv3fll0iqp0vj1z3tpw.65vm31fh0crhhkixmmdebjiip" error="No such container: dev-pdm-****_svc.0dzf61tv3fll0iqp0vj1z3tpw.65vm31fh0crhhkixmmdebjiip" module=node/agent/worker/taskmanager node.id=0dzf61tv3fll0iqp0vj1z3tpw service.id=olnxiyyqdej0ogktr7rzqicek task.id=65vm31fh0crhhkixmmdebjiip
Dec 16 16:46:03 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:03.670062686Z" level=warning msg="failed to deactivate service binding for container dev-pdm-****" error="No such container: dev-pdm-****.1.mr6m18hl8tx0j5bmrsmjxb7th" module=node/agent/worker/taskmanager node.id=0dzf61tv3fll0iqp0vj1z3tpw service.id=2yfc9hjsu39welospul0h7j5e task.id=mr6m18hl8tx0j5bmrsmjxb7th
Dec 16 16:46:03 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:03.672397758Z" level=warning msg="failed to deactivate service binding for container dev-pdm-****.0dzf61tv3fll0iqp0vj1z3tpw.yzeqlocemcuz1fritikcpxml8" error="No such container: dev-pdm-****.0dzf61tv3fll0iqp0vj1z3tpw.yzeqlocemcuz1fritikcpxml8" module=node/agent/worker/taskmanager node.id=0dzf61tv3fll0iqp0vj1z3tpw service.id=9o2qp0dwlm4slnxvrrdfd7d3x task.id=yzeqlocemcuz1fritikcpxml8
Dec 16 16:46:03 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:03.764228639Z" level=error msg="fatal task error" error="node is missing network attachments, ip addresses may be exhausted" module=node/agent/worker/taskmanager node.id=0dzf61tv3fll0iqp0vj1z3tpw service.id=9o2qp0dwlm4slnxvrrdfd7d3x task.id=ca3ibg3nspmc58m8e4gwbjhur
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: time="2020-12-16T16:46:04.018753031Z" level=error msg="Container not cleaned up from containerd from previous run" container=8a44e9b65563075465faa84b74b3f8498fe750b3b7d94757cfbd55b0e9a46759 error="id already in use"
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: panic: runtime error: invalid memory address or nil pointer dereference
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x55bdff82444b]
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: goroutine 323 [running]:
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: github.com/docker/docker/pkg/plugins.(*Client).callWithRetry(0x0, 0x55be00c57986, 0x16, 0x55be01492ee0, 0xc000e1a3c0, 0xc000c7a001, 0x0, 0x0, 0x0, 0xc00057f201, ...)
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]:         /builddir/build/BUILD/docker-ce-4484c46d9d1a2d10b8fc662923ad586daeedb04f/components/engine/_build/src/github.com/docker/docker/pkg/plugins/client.go:160 +0x1eb
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: github.com/docker/docker/pkg/plugins.(*Client).CallWithOptions(0x0, 0x55be00c57986, 0x16, 0x0, 0x0, 0x55be00fc6bc0, 0xc000fe2e40, 0x0, 0x0, 0x0, ...)
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]:         /builddir/build/BUILD/docker-ce-4484c46d9d1a2d10b8fc662923ad586daeedb04f/components/engine/_build/src/github.com/docker/docker/pkg/plugins/client.go:113 +0x125
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: github.com/docker/docker/pkg/plugins.(*Client).Call(0x0, 0x55be00c57986, 0x16, 0x0, 0x0, 0x55be00fc6bc0, 0xc000fe2e40, 0xc00071bce8, 0x55bdff33aeba)
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]:         /builddir/build/BUILD/docker-ce-4484c46d9d1a2d10b8fc662923ad586daeedb04f/components/engine/_build/src/github.com/docker/docker/pkg/plugins/client.go:102 +0x8d
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: github.com/docker/docker/daemon/logger.(*logPluginProxy).Capabilities(0xc000c8a050, 0xc0008aed80, 0x2, 0xc000c7a000)
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]:         /builddir/build/BUILD/docker-ce-4484c46d9d1a2d10b8fc662923ad586daeedb04f/components/engine/_build/src/github.com/docker/docker/daemon/logger/proxy.go:81 +0x82
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: github.com/docker/docker/daemon/logger.makePluginCreator.func1(0xc0008c1440, 0xc0008be380, 0x40, 0xc0000ae1e0, 0x59, 0xc0008ba6e0, 0x12, 0x55be0235b548, 0x0, 0x0, ...)
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]:         /builddir/build/BUILD/docker-ce-4484c46d9d1a2d10b8fc662923ad586daeedb04f/components/engine/_build/src/github.com/docker/docker/daemon/logger/plugin.go:93 +0x38e
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: github.com/docker/docker/container.(*Container).StartLogger(0xc000564480, 0xc000c2c180, 0x55be014508c0, 0xc00071c1a8, 0x55bdff33aeba)
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]:         /builddir/build/BUILD/docker-ce-4484c46d9d1a2d10b8fc662923ad586daeedb04f/components/engine/_build/src/github.com/docker/docker/container/container.go:403 +0x278
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: github.com/docker/docker/container.(*Container).startLogging(0xc000564480, 0x6, 0xc000c2c120)
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]:         /builddir/build/BUILD/docker-ce-4484c46d9d1a2d10b8fc662923ad586daeedb04f/components/engine/_build/src/github.com/docker/docker/container/container.go:614 +0x52
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: github.com/docker/docker/container.(*Container).InitializeStdio(0xc000564480, 0xc0008ac000, 0x7fe4505c7d98, 0xc000c2c060, 0x7fe4505c7d98, 0xc000c2c120)
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]:         /builddir/build/BUILD/docker-ce-4484c46d9d1a2d10b8fc662923ad586daeedb04f/components/engine/_build/src/github.com/docker/docker/container/container.go:649 +0x31
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: github.com/docker/docker/libcontainerd/remote.(*client).createIO(0xc000aad2c0, 0xc000fe4c80, 0xc000120500, 0x40, 0x55be00c3463a, 0x4, 0xc000e67aa0, 0xc000c8ada0, 0x0, 0x0, ...)
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]:         /builddir/build/BUILD/docker-ce-4484c46d9d1a2d10b8fc662923ad586daeedb04f/components/engine/_build/src/github.com/docker/docker/libcontainerd/remote/client.go:624 +0x97
Dec 16 16:46:04 d-pdm-c2-3 dockerd[780]: github.com/docker/docker/libcontainerd/remote.(*client).Start.func2(0xc000120500, 0x40, 0xc000c8aec0, 0x55be014edb00, 0xc0001aa020, 0xc000120500)

Output of rpm-ostree status:

State: idle
Deployments:
● ostree://fedora:fedora/x86_64/coreos/testing
                   Version: 33.20201201.2.1 (2020-12-15T23:14:36Z)
                    Commit: 14870c1a241e01036c5abc9eb69ca23aceca2cf9efc2423057a88d8edfb5c6a7
              GPGSignature: Valid signature by 963A2BEB02009608FE67EA4249FD77499570FF31

  ostree://fedora:fedora/x86_64/coreos/testing
                   Version: 33.20201201.2.0 (2020-12-02T13:20:32Z)
                    Commit: 60a778505110e5bc4a4ed2b7925e9c0853147c39d25dcd2eaebf981c9e0a4a39
              GPGSignature: Valid signature by 963A2BEB02009608FE67EA4249FD77499570FF31
@fifofonix
Copy link
Author

So indeed per the error log there is no /etc/resolv.conf post upgrade and DNS resolution is impacted accordingly.

Output of ifconfig appears normal, ie. network interfaces have acquired IP addresses as expected, and when resolvectl status is executed main external IP address acquired via DHCP shows appropriate DNS servers and domains - albeit in a different format to that on the prior version of FCOS.

dustymabe added a commit to dustymabe/fedora-coreos-streams that referenced this issue Dec 16, 2020
dustymabe added a commit to coreos/fedora-coreos-streams that referenced this issue Dec 16, 2020
@dustymabe
Copy link
Member

thanks @fifofonix - we're pausing the rollout for now while we investigate

@fifofonix
Copy link
Author

No problem. Let me know whether there is anything you want me to do my end if it might help you guys. I put a hold my end on updates anyway pending resolution.

@dustymabe
Copy link
Member

Want to hop in #fedora-coreos on freenode?

@fifofonix
Copy link
Author

Apologies, but I am embarrassed to say, especially given my age, that I have never used IRC before so there would I am sure be a learning curve.

@dustymabe
Copy link
Member

No worries. I'm trying to go through and reproduce but I'm not able to immediately do it by just starting docker and running containers.

  • Do you mind sharing your entire log from the last boot? sudo journalctl -b0.
  • Does /etc/resolv.conf exist at all? It should be a symlink: /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
  • If /etc/resolv.conf does exist what are the contents?

If you can provide a simple reproducer (i.e. a few docker command to run on 33.20201201.2.0 and then upgrade to see the problem) that would be golden!

@dustymabe
Copy link
Member

dustymabe commented Dec 16, 2020

Apologies, but I am embarrassed to say, especially given my age, that I have never used IRC before so there would I am sure be a learning curve.

I just thought of https://webchat.freenode.net/ - It makes it pretty darn simple though #fedora-coreos requires you to be registered. You can try dustymabe in #fedora-devel.

@fifofonix
Copy link
Author

fifofonix commented Dec 16, 2020

$ sudo find / -name resolv.conf | grep -v containers | grep -v overlay
find: File system loop detected; ‘/sys/kernel/debug/energy_model’ is part of the same file system loop as ‘/sys/kernel/debug’.
find: File system loop detected; ‘/sysroot/ostree/deploy/fedora-coreos/deploy/14870c1a241e01036c5abc9eb69ca23aceca2cf9efc2423057a88d8edfb5c6a7.0’ is part of the same file system loop as ‘/’.
/run/NetworkManager/resolv.conf
/run/systemd/resolve/resolv.conf
/sysroot/ostree/deploy/fedora-coreos/var/lib/docker/plugins/cf478c703549a114c9d88765a2c39905e08ec4418c17fafca60f4df580c108a1/rootfs/etc/resolv.conf
/sysroot/ostree/deploy/fedora-coreos/var/lib/docker/network/files/resolv.conf
/sysroot/ostree/deploy/fedora-coreos/var/lib/docker/network/files/lb_y3m20epx198o2rtvace5oh5o0/resolv.conf
/sysroot/ostree/deploy/fedora-coreos/var/lib/docker/network/files/lb_zmwxcy0nc7pfxh7n7pwmtdi5o/resolv.conf
/sysroot/ostree/deploy/fedora-coreos/var/lib/docker/network/files/lb_8pty2vt3q1zlig3a5m2b281ps/resolv.conf
/sysroot/ostree/deploy/fedora-coreos/var/lib/docker/network/files/lb_u3cig9dteibblsnnqoj3tkwwg/resolv.conf
/sysroot/ostree/deploy/fedora-coreos/deploy/60a778505110e5bc4a4ed2b7925e9c0853147c39d25dcd2eaebf981c9e0a4a39.0/usr/etc/resolv.conf
/sysroot/ostree/deploy/fedora-coreos/deploy/60a778505110e5bc4a4ed2b7925e9c0853147c39d25dcd2eaebf981c9e0a4a39.0/usr/lib/systemd/resolv.conf
/sysroot/ostree/deploy/fedora-coreos/deploy/60a778505110e5bc4a4ed2b7925e9c0853147c39d25dcd2eaebf981c9e0a4a39.0/etc/resolv.conf
/usr/lib/systemd/resolv.conf
/var/lib/docker/plugins/cf478c703549a114c9d88765a2c39905e08ec4418c17fafca60f4df580c108a1/rootfs/etc/resolv.conf
/var/lib/docker/network/files/resolv.conf
/var/lib/docker/network/files/lb_y3m20epx198o2rtvace5oh5o0/resolv.conf
/var/lib/docker/network/files/lb_zmwxcy0nc7pfxh7n7pwmtdi5o/resolv.conf
/var/lib/docker/network/files/lb_8pty2vt3q1zlig3a5m2b281ps/resolv.conf
/var/lib/docker/network/files/lb_u3cig9dteibblsnnqoj3tkwwg/resolv.conf

@dustymabe
Copy link
Member

short update:

We unfortnately have a known issue where if you make persistent modifications to the SELinux policy (the policy is stored in a binary file in /etc/) it means that future updates to the policy won't get applied (see coreos/rpm-ostree#27). We need to work on fixing that.

We shipped an SELinux policy update in 33.20201201.2.1 and @fifofonix didn't get the update that allowed for the resolv.conf symlink to get created (see #679) so he was seeing an SELinux denial and the /etc/resolv.conf symlink chain was broken, which led to docker failing.

@fifofonix fifofonix changed the title Testing: Potential Upgrade Issue Testing: Persistent Selinux Policy Changes Result In Docker Not Starting Dec 16, 2020
@dustymabe
Copy link
Member

dustymabe commented Dec 17, 2020

Adding a bit more information... This particular problem was ultimately due to resolv.conf being a broken symlink because of an SELinux denial:

systemd-resolved[755]: Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied
audit[755]: AVC avc:  denied  { create } for  pid=755 comm="systemd-resolve" name=".#stub-resolv.confc418434d59d7d93a" scontext=system_u:system_r:systemd_resolved_t:s0 tcontext=system_u:object_r:systemd_resolved_var_run_t:s0 tclass=lnk_file permissive=0

However there is a broader class of problems with local SELinux policy modifications which I'll try to address in #701 and also with documentation.

@dustymabe
Copy link
Member

Since we have #701 to cover the remaining work left to do and new documentation about the issue I'm going to close this one out.

Thanks @fifofonix for helping us identify the issue so we could point people in the right direction!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants