i386 unstable and testing fail. #97

Closed
rdebath opened this issue Jul 14, 2020 · 12 comments

rdebath commented Jul 14, 2020

On an amd64 host run this...

# docker run --platform=i386 -it --rm  debian:sid
root@488201dc21cf:/# apt update && apt upgrade
Get:1 http://deb.debian.org/debian sid InRelease [146 kB]
Get:2 http://deb.debian.org/debian sid/main i386 Packages [8227 kB]
Fetched 8373 kB in 7s (1239 kB/s)
...
Checking init scripts...
Nothing to restart.
sleep: cannot read realtime clock: Operation not permitted
dpkg: error processing package libc6:i386 (--configure):
 installed libc6:i386 package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 libc6:i386
E: Sub-process /usr/bin/dpkg returned an error code (1)
root@488201dc21cf:/# 

Currently buster and bullseye are still working fine.

Contributor

tianon commented Jul 14, 2020

Looks like the postinst script for one of the packages being upgraded is trying to do something the default security policy for the container forbids.

This is why https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#apt-get specifically recommends against using apt-get upgrade in containers (the packages that come pre-installed in the image are much more likely to try to do privileged things during upgrade than most others, given their low-level nature).

Author

rdebath commented Jul 14, 2020

That's all nice and easy; problem is the "privileged" instructions are sleep and touch.
For example ...

$ sudo debootstrap  '--arch=i386' '--variant=minbase' '--components=main,contrib,non-free' unstable chroot
I: Retrieving InRelease
I: Checking Release signature
...
I: Unpacking the base system...
I: Base system installed successfully.
$ sudo docker build -t testimage -f -<Dockerfile.min chroot
Sending build context to Docker daemon  201.9MB
Step 1/3 : FROM scratch
 --->
Step 2/3 : COPY / /
 ---> 720086fe6dad
Step 3/3 : CMD ["bash"]
 ---> Running in 99eb75e3b237
Removing intermediate container 99eb75e3b237
 ---> 5146b0da94ad
Successfully built 5146b0da94ad
Successfully tagged testimage:latest
$ docker run -it --rm testimage
root@c54966f75dbd:/# sleep 2
sleep: cannot read realtime clock: Operation not permitted
root@c54966f75dbd:/# touch xyzzy
touch: setting times of 'xyzzy': Operation not permitted
root@c54966f75dbd:/# 

rdebath changed the title from "i386 unstable fails to update" to "i386 unstable fails." on Jul 14, 2020

Contributor

tianon commented Jul 14, 2020

Doh, my bad, sorry! That's a hairy one. Did libc6 on i386 get updated to use some newer syscalls that are blocked by the default seccomp profile or something? (Does it work if you do --security-opt seccomp:unconfined?)

Contributor

tianon commented Jul 14, 2020

Confirmed it does work with --security-opt seccomp:unconfined, so this is probably a bug in Docker itself needing newer i386 syscalls in the default seccomp profile, I think?
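
For reference, the workaround confirmed above can be written out as follows. This is a minimal sketch: the helper name `unconfined_cmd` is hypothetical, the image and `--platform` value are taken from the original report, and the function only prints the command instead of running it. Disabling seccomp removes a layer of protection, so treat it as a diagnostic step rather than a fix.

```shell
# Hypothetical helper: print the docker command from the original report
# with the default seccomp profile disabled (diagnostic use only).
unconfined_cmd() {
    image="${1:-debian:sid}"   # defaults to the image from the report
    printf 'docker run --platform=i386 -it --rm --security-opt seccomp:unconfined %s\n' "$image"
}

# Print the command; copy it into a terminal to actually run it.
unconfined_cmd
```

If `sleep 2` and `touch xyzzy` succeed inside a container started this way, the seccomp filter on the host is the likely culprit.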

Contributor

tianon commented Jul 14, 2020

Did an strace from my host and found the following:

[pid 24079] syscall_0x197(0, 0, 0xffce724c, 0xffce725c, 0xffce724c, 0xffce725c) = -1 EPERM (Operation not permitted)

Which is not great, so I redid the same strace from within the affected container (using sid's strace), and got the following better output:

[pid   223] clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0xffb954ac) = -1 EPERM (Operation not permitted)
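
The two traces describe the same call: the host's older strace has no name for syscall `0x197`, which is 407 in decimal, the i386 number for `clock_nanosleep_time64`. A small sketch of the conversion follows; the function name `syscall_hex_to_dec` is hypothetical, and the header path in the comment is an assumption that varies by distribution.

```shell
# Hypothetical helper: convert a hex syscall number, as printed by strace
# for syscalls it cannot name, into decimal.
syscall_hex_to_dec() {
    printf '%d\n' "$1"
}

syscall_hex_to_dec 0x197   # prints 407

# To map the number back to a name on a host with kernel headers installed
# (path is an assumption and differs between distributions):
#   grep -w 407 /usr/include/x86_64-linux-gnu/asm/unistd_32.h
```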

cc @thaJeztah @justincormack

Contributor

tianon commented Jul 14, 2020

Ah, sorry, I should've done my research on moby/moby first; it appears it's "fixed" in moby/moby#40739 and backported to 19.03 in moby/moby#40769, but not quite in a release yet.

Author

rdebath commented Jul 15, 2020

Thanks for that; I didn't know about --security-opt seccomp:unconfined.
I also hadn't noticed that the GitHub open issues count would change to have a 'k' in it; talk about "volatile".

So I'll be waiting for a 19.03.XX. Hmm, 3.5 months so far.

@thaJeztah

The change from moby/moby#40739 should be in the Docker 19.03.9 (and up) release, but I'm also seeing the issue on Docker 19.03.12 when running Docker on kernel 4.19 (Docker Desktop). If I run the same on an Ubuntu 20.04 machine (kernel 5.4.0-29-generic, libseccomp2 version 2.4.3-1ubuntu1), it works.

I suspect the problem here is the libseccomp version on the host; these syscalls were added in kernel 5.x, and if the host is running an older kernel (and an older version of libseccomp), then libseccomp doesn't know about them, and it defaults to blocking unknown syscalls (see moby/moby#40734).

Author

rdebath commented Jul 15, 2020

It looks like the package libseccomp2=2.4.3-1+b1 from Debian testing is okay too. I don't expect the kernel version to matter (I'm using 4.19).

But the Buster backport libseccomp2=2.4.1-2~bpo10+1 is not good enough.

Of course, IMO, Docker should be using a profile where unknown system calls return ENOSYS.

@thaJeztah

It's possible to provide your own profile to use instead of the default.

IIRC, the reason for blocking unknown syscalls by default is to prevent situations where a new, potentially harmful, syscall is added and the profile does not yet take that syscall into account (to either block it or to explicitly allow it).

Author

rdebath commented Jul 15, 2020

@thaJeztah I didn't say that unknown system calls should be allowed. I agree they should be blocked.
It's just that Docker isn't the normal unchanging binary that the standard security profiles are designed for. There are in fact three states that Docker wants:

  • Syscall is okay
  • Syscall is a problem and must be blocked.
  • Syscall will always return ENOSYS (AFAIK) and so should do that.

If the kernel knows differently about the last one, some sort of logging might be needed, but allowing it is not an option.

Thinking about it, I don't know that such a profile is possible, but I would expect it to be: newer versions of syscalls tend to have more/better functionality, so it's very reasonable that the old one is fine while the new one is unsafe and should be blocked quietly so the C library will always fall back.

But this is getting off topic!

Contributor

tianon commented Nov 24, 2020

To bring this back around to a close: this is an issue with the host, where Docker and/or libseccomp is not quite new enough. Specifically, the host needs:

  • Docker version 19.03.9 or newer
  • libseccomp version 2.4.2 or newer

The latter is probably newer than what most host distributions' stable channels have available, although in many cases it is available via alternative channels (such as Debian's backports).
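
A rough way to check a host against the two minimums above is sketched below. The helper name `version_ge` is hypothetical, and it relies on `sort -V` (GNU coreutils), which is an assumption about the host. The commented-out usage lines additionally assume a running Docker daemon and a Debian-based host with the `libseccomp2` package.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2. Uses GNU sort's version ordering, so 19.03.12 > 19.03.9.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Self-contained examples using versions mentioned in this thread:
version_ge 19.03.12 19.03.9 && echo "Docker 19.03.12: new enough"
version_ge 2.4.1 2.4.2 || echo "libseccomp 2.4.1: too old"

# On a real host (assumptions: docker is running, Debian-style packaging):
#   version_ge "$(docker version --format '{{.Server.Version}}')" 19.03.9
#   v=$(dpkg-query -W -f='${Version}' libseccomp2)
#   version_ge "${v%%-*}" 2.4.2   # strip the Debian revision suffix first
```

On Debian-based hosts, `dpkg --compare-versions` is the more precise tool for package versions; the `sort -V` approach is only meant for the upstream version part.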

Unfortunately, this is not something we can fix in the image. 😞
