LXD 5.0.3/5.19 fail to start nested LXC container with lxc.rootfs.options = ro #12473

Closed
6 tasks
morphis opened this issue Oct 27, 2023 · 11 comments

@morphis
Contributor

morphis commented Oct 27, 2023

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04
  • The output of "lxc info" or if that fails:
    • Kernel version: 6.2.0-35-generic #35~22.04.1-Ubuntu
    • LXC version: 5.0.3
    • LXD version: git-7f8a581 (from 5.0/edge)
    • Storage backend in use: zfs

Issue description

Starting with the current 5.0/edge and with 5.19, our nested Android container in Anbox Cloud fails to start. The problem cannot be reproduced with LXD 5.0.2 on the same kernel and host OS, which makes it highly likely that a regression was introduced in either LXC or LXD.

The log of the nested LXC container:

lxc 20231026175936.914 TRACE    mount_utils - ../subprojects/lxc/src/lxc/mount_utils.c:can_use_bind_mounts:607 - Kernel supports bind mounts in the new mount api
lxc 20231026175936.914 ERROR    mount_utils - ../subprojects/lxc/src/lxc/mount_utils.c:__fd_bind_mount:382 - Operation not permitted - Failed to change mount attributes
lxc 20231026175936.914 ERROR    dir - ../subprojects/lxc/src/lxc/storage/dir.c:dir_mount:195 - Operation not permitted - Failed to mount "/var/lib/anbox/rootfs" onto "/var/lib/anbox/lxc/rootfs"
lxc 20231026175936.914 ERROR    conf - ../subprojects/lxc/src/lxc/conf.c:lxc_mount_rootfs:1433 - Failed to mount rootfs "/var/lib/anbox/rootfs" onto "/var/lib/anbox/lxc/rootfs" with options "ro"
lxc 20231026175936.914 ERROR    conf - ../subprojects/lxc/src/lxc/conf.c:lxc_setup_rootfs_prepare_root:3993 - Failed to setup rootfs for

Attaching strace to the process that uses liblxc to start the container, I can see that the mount_setattr syscall fails with EPERM:

1432 mount_setattr(20, "", AT_EMPTY_PATH|AT_RECURSIVE, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=MOUNT_ATTR_NOATIME|MOUNT_ATTR_STRICTATIME|0x40, propagation=0 /* MS_??? */, userns_fd=0}, 32) = -1 EPERM (Operation not permitted)

Doing the same in a container running on LXD 5.0.2, I see the same syscall succeed:

2348 mount_setattr(20, "", AT_EMPTY_PATH|AT_RECURSIVE, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=MOUNT_ATTR_NOATIME|MOUNT_ATTR_STRICTATIME|0x40, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0
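
The arguments in the two traces are easier to read next to the mount attribute masks. The sketch below is only illustrative: the `SK_`-prefixed constants are local copies of the `MOUNT_ATTR_*` values from the Linux UAPI header `linux/mount.h`, the struct mirrors the layout of `struct mount_attr`, and the helper names are hypothetical. It shows that the `attr_clr` value printed by strace (`MOUNT_ATTR_NOATIME|MOUNT_ATTR_STRICTATIME|0x40`) is the complete atime mask, so the request amounts to "set read-only and reset the atime mode to the default (relatime)".

```c
#include <stdint.h>

/* Local copies of the MOUNT_ATTR_* values from <linux/mount.h>, assumed
 * here so that the sketch builds without kernel headers. */
#define SK_MOUNT_ATTR_RDONLY      0x00000001u
#define SK_MOUNT_ATTR__ATIME      0x00000070u /* mask covering the atime mode */
#define SK_MOUNT_ATTR_RELATIME    0x00000000u
#define SK_MOUNT_ATTR_NOATIME     0x00000010u
#define SK_MOUNT_ATTR_STRICTATIME 0x00000020u

/* Mirrors the layout of struct mount_attr: four 64-bit fields. */
struct sk_mount_attr {
	uint64_t attr_set;
	uint64_t attr_clr;
	uint64_t propagation;
	uint64_t userns_fd;
};

/* Build the request seen in the strace output for lxc.rootfs.options = ro. */
static struct sk_mount_attr sk_ro_request(void)
{
	struct sk_mount_attr attr = {0};

	attr.attr_set = SK_MOUNT_ATTR_RDONLY;
	/* NOATIME | STRICTATIME | 0x40, exactly as decoded by strace */
	attr.attr_clr = SK_MOUNT_ATTR_NOATIME | SK_MOUNT_ATTR_STRICTATIME | 0x40u;
	return attr;
}

/* Atime mode the mount would carry after the request is applied:
 * clear the bits in attr_clr, then set the bits in attr_set. */
static uint64_t sk_resulting_atime(uint64_t current,
				   const struct sk_mount_attr *attr)
{
	uint64_t next = (current & ~attr->attr_clr) | attr->attr_set;

	return next & SK_MOUNT_ATTR__ATIME;
}
```

The trailing `32` in the traced call matches the size of this struct, and applying the request to a `noatime` mount yields `SK_MOUNT_ATTR_RELATIME`.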

The configuration of the nested container looks like this:

lxc.container.conf

```
lxc.mount.auto = proc:rw sys:rw cgroup:mixed:force
lxc.hook.mount = /usr/lib/x86_64-linux-gnu/anbox/lxc/mount-hook
lxc.autodev = 1
lxc.pty.max = 1024
lxc.tty.max = 0
lxc.uts.name = localhost
lxc.signal.halt = SIGPWR
lxc.hook.version = 1
lxc.hook.post-stop = /usr/bin/anbox call-hook --id=0 --hook=stop
lxc.hook.start-host = /usr/bin/anbox call-hook --id=0 --hook=start-host
lxc.rootfs.path = /var/lib/anbox/rootfs
lxc.rootfs.options = ro
lxc.environment = PATH=/product/bin:/apex/com.android.runtime/bin:/apex/com.android.art/bin:/system_ext/bin:/system/bin:/system/xbin:/odm/bin:/vendor/bin:/vendor/xbin
lxc.init.cmd = /vendor/anbox/bin/anbox-init --api-level=31 --max-uid=4000000 --enable-cgroup-emulation
lxc.log.level = 0
lxc.log.file = /var/lib/anbox/logs/container.log
lxc.console.logfile = /var/lib/anbox/logs/console.log
lxc.console.rotate = 1
lxc.prlimit.nproc = 6679
lxc.prlimit.nofile = 32768
lxc.prlimit.nice = 40:40
lxc.proc.oom_score_adj = -900
lxc.net.0.type = veth
lxc.net.0.flags = up
lxc.net.0.name = eth0
lxc.net.0.link = anbox0
lxc.net.1.type = veth
lxc.net.1.flags = down
lxc.net.1.name = vw0
lxc.net.1.link = anbox0
lxc.apparmor.profile = anbox_container
lxc.seccomp.allow_nesting = 1
lxc.seccomp.profile = /usr/lib/x86_64-linux-gnu/anbox/seccomp/container.sc
lxc.seccomp.notify.proxy = unix:/run/anbox-container-seccomp.socket
lxc.seccomp.notify.cookie = 0
lxc.idmap = u 0 100000 2500
lxc.idmap = g 0 100000 2500
lxc.idmap = u 2500 1000 1
lxc.idmap = g 2500 1000 1
lxc.idmap = u 2501 102501 3997499
lxc.idmap = g 2501 102501 3997499
lxc.mount.entry = /var/lib/anbox/cache /var/lib/anbox/rootfs/cache none bind,create=dir 0 0
lxc.mount.entry = /var/lib/anbox/data /var/lib/anbox/rootfs/data none bind,create=dir,rbind,nosuid,nodev 0 0
lxc.mount.entry = /run/user/1000/anbox/sockets/qemu_pipe /var/lib/anbox/rootfs/dev/qemu_pipe none bind,create=file 0 0
lxc.mount.entry = /run/user/1000/anbox/sockets/anbox_bridge /var/lib/anbox/rootfs/dev/anbox_bridge none bind,create=file 0 0
lxc.mount.entry = /run/user/1000/anbox/sockets/anbox_audio /var/lib/anbox/rootfs/dev/anbox_audio none bind,create=file 0 0
lxc.mount.entry = /run/user/1000/anbox/input /var/lib/anbox/rootfs/dev/input none bind,create=dir 0 0
lxc.mount.entry = /run/user/1000/anbox/sockets/compositor /var/lib/anbox/rootfs/dev/anbox_compositor none bind,create=file 0 0
lxc.mount.entry = /run/user/1000/anbox/anbox.xml /var/lib/anbox/rootfs/vendor/etc/permissions/anbox.xml none bind,create=file 0 0
lxc.mount.entry = /dev/fuse /var/lib/anbox/rootfs/dev/fuse none bind,create=file 0 0
lxc.mount.entry = /dev/net/tun /var/lib/anbox/rootfs/dev/tun none bind,create=file 0 0
lxc.mount.entry = /dev/binderfs/binder0 /var/lib/anbox/rootfs/dev/binder none bind,create=file 0 0
lxc.mount.entry = /dev/binderfs/binder1 /var/lib/anbox/rootfs/dev/vndbinder none bind,create=file 0 0
lxc.mount.entry = /dev/binderfs/binder2 /var/lib/anbox/rootfs/dev/hwbinder none bind,create=file 0 0
lxc.mount.entry = /usr/lib/x86_64-linux-gnu/anbox/android-bin /var/lib/anbox/rootfs/vendor/anbox/bin none bind,create=dir 0 0
lxc.mount.entry = /dev/dri-android/renderD128 /var/lib/anbox/rootfs/dev/dri/renderD128 none bind,create=file 0 0
lxc.mount.entry = /dev/android_sync /var/lib/anbox/rootfs/dev/sw_sync none bind,create=file 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/cpuinfo /var/lib/anbox/rootfs/proc/cpuinfo none bind,create=file 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/diskstats /var/lib/anbox/rootfs/proc/diskstats none bind,create=file 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/loadavg /var/lib/anbox/rootfs/proc/loadavg none bind,create=file 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/meminfo /var/lib/anbox/rootfs/proc/meminfo none bind,create=file 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/stat /var/lib/anbox/rootfs/proc/stat none bind,create=file 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/swaps /var/lib/anbox/rootfs/proc/swaps none bind,create=file 0 0
lxc.mount.entry = /var/lib/lxcfs/proc/uptime /var/lib/anbox/rootfs/proc/uptime none bind,create=file 0 0
lxc.mount.entry = /var/lib/anbox/state/default.prop /var/lib/anbox/rootfs/vendor/build.prop none bind,ro 0 0
lxc.mount.entry = /var/lib/anbox/state/power_supply /var/lib/anbox/rootfs//sys/class/power_supply none bind,ro 0 0
lxc.mount.entry = /var/lib/anbox/state/system_cpu_available /var/lib/anbox/rootfs//sys/devices/system/cpu/online none bind,ro 0 0
lxc.mount.entry = /var/lib/anbox/state/system_cpu_available /var/lib/anbox/rootfs//sys/devices/system/cpu/present none bind,ro 0 0
lxc.mount.entry = /var/lib/anbox/state/system_cpu_available /var/lib/anbox/rootfs//sys/devices/system/cpu/possible none bind,ro 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_cmdline /var/lib/anbox/rootfs/proc/cmdline none bind,ro 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_vm_mmap_rnd_bits /var/lib/anbox/rootfs/proc/sys/vm/mmap_rnd_bits none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_vm_mmap_rnd_compat_bits /var/lib/anbox/rootfs/proc/sys/vm/mmap_rnd_compat_bits none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_kernel_kptr_restrict /var/lib/anbox/rootfs/proc/sys/kernel/kptr_restrict none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_kernel_sched_schedstats /var/lib/anbox/rootfs/proc/sys/kernel/sched_schedstats none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_kernel_hung_task_timeout_secs /var/lib/anbox/rootfs/proc/sys/kernel/hung_task_timeout_secs none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_kernel_panic_on_oops /var/lib/anbox/rootfs/proc/sys/kernel/panic_on_oops none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_kernel_sched_child_runs_first /var/lib/anbox/rootfs/proc/sys/kernel/sched_child_runs_first none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_kernel_randomize_va_space /var/lib/anbox/rootfs/proc/sys/kernel/randomize_va_space none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_vm_mmap_min_addr /var/lib/anbox/rootfs/proc/sys/vm/mmap_min_addr none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_kernel_sched_rt_runtime_us /var/lib/anbox/rootfs/proc/sys/kernel/sched_rt_runtime_us none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_kernel_sched_rt_period_us /var/lib/anbox/rootfs/proc/sys/kernel/sched_rt_period_us none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/proc_sys_kernel_sysrq /var/lib/anbox/rootfs/proc/sys/kernel/sysrq none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/sys_power_wakeup_count /var/lib/anbox/rootfs/sys/power/wakeup_count none bind,rw 0 0
lxc.mount.entry = /var/lib/anbox/state/sys_power_state /var/lib/anbox/rootfs/sys/power/state none bind,rw 0 0
```

The rootfs is set up as a set of bind mounts:

├─/var/lib/anbox/rootfs                data/work/lxd2/containers/spread-111-anbox-generic[/rootfs/var/lib/anbox/android-system]
│                                                                                                  zfs         rw,relatime,xattr,posixacl
│ ├─/var/lib/anbox/rootfs/vendor       data/work/lxd2/containers/spread-111-anbox-generic[/rootfs/var/lib/anbox/android-vendor]
│ │                                                                                                zfs         rw,relatime,xattr,posixacl
│ ├─/var/lib/anbox/rootfs/data         data/work/lxd2/containers/spread-111-anbox-generic[/rootfs/var/lib/anbox/data]
│ │                                                                                                zfs         rw,relatime,xattr,posixacl
│ └─/var/lib/anbox/rootfs/cache        data/work/lxd2/containers/spread-111-anbox-generic[/rootfs/var/lib/anbox/cache]
│                                                                                                  zfs         rw,relatime,xattr,posixacl
├─/var/lib/anbox/android-system/vendor data/work/lxd2/containers/spread-111-anbox-generic[/rootfs/var/lib/anbox/android-vendor]
│                                                                                                  zfs         rw,relatime,xattr,posixacl
├─/var/lib/anbox/data                  data/work/lxd2/containers/spread-111-anbox-generic[/rootfs/var/lib/anbox/data]
│                                                                                                  zfs         rw,relatime,xattr,posixacl
├─/var/lib/anbox/android-system/data   data/work/lxd2/containers/spread-111-anbox-generic[/rootfs/var/lib/anbox/data]
│                                                                                                  zfs         rw,relatime,xattr,posixacl
├─/var/lib/anbox/android-system/cache  data/work/lxd2/containers/spread-111-anbox-generic[/rootfs/var/lib/anbox/cache]

Steps to reproduce

I was not yet able to reproduce the problem outside of Anbox Cloud. Please contact me to receive access to a system for debugging.

Information to attach

  • Any relevant kernel output (dmesg)
  • Container log (lxc info NAME --show-log)
  • Container configuration (lxc config show NAME --expanded)
  • Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
  • Output of the client with --debug
  • Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)
@mihalicyn mihalicyn self-assigned this Oct 27, 2023
@tomponline tomponline added this to the lxd-5.0.3 milestone Oct 27, 2023
@tomponline tomponline added the Bug Confirmed to be a bug label Oct 27, 2023
@mihalicyn
Member

Thanks for such a detailed report and for providing access to the VM with the reproducer, Simon!

I did some tracing yesterday and found that the mount_setattr syscall fails the can_change_locked_flags check. This means that, for some reason, some flags on the mount became locked. I'm still working out what changed to cause this.

@mihalicyn
Member

The suspicious change is 6a88b8a.

Simon and I ran an experiment and changed the rootfs mount options from "ro" to "ro,noatime" to be consistent with:

local/containers/ams-cku0huodp8e108su1ee0 on /var/lib/anbox/rootfs type zfs (rw,noatime,xattr,posixacl)
local/containers/ams-cku0huodp8e108su1ee0 on /var/lib/anbox/rootfs/vendor type zfs (rw,noatime,xattr,posixacl)
local/containers/ams-cku0huodp8e108su1ee0 on /var/lib/anbox/android-system/vendor type zfs (rw,noatime,xattr,posixacl)
local/containers/ams-cku0huodp8e108su1ee0 on /var/lib/anbox/data type zfs (rw,noatime,xattr,posixacl)
local/containers/ams-cku0huodp8e108su1ee0 on /var/lib/anbox/rootfs/data type zfs (rw,noatime,xattr,posixacl)
local/containers/ams-cku0huodp8e108su1ee0 on /var/lib/anbox/android-system/data type zfs (rw,noatime,xattr,posixacl)
local/containers/ams-cku0huodp8e108su1ee0 on /var/lib/anbox/rootfs/cache type zfs (rw,noatime,xattr,posixacl)
local/containers/ams-cku0huodp8e108su1ee0 on /var/lib/anbox/android-system/cache type zfs (rw,noatime,xattr,posixacl)

As this mount list shows, ZFS was mounted with the "noatime" mount option. During container setup, LXC then tries to use the /var/lib/anbox/rootfs mount as the rootfs inside the container. But lxc.rootfs.options = ro is specified, which means that ro should be the only mount option on the final mount, which in turn means that the noatime flag has to be changed to atime. And here the can_change_locked_flags function comes into play:

static bool can_change_locked_flags(struct mount *mnt, unsigned int mnt_flags)
{
	unsigned int fl = mnt->mnt.mnt_flags;

//...

	if ((fl & MNT_LOCK_ATIME) &&
	    ((fl & MNT_ATIME_MASK) != (mnt_flags & MNT_ATIME_MASK))) // <<<< BOOM
		return false;

	return true;
}

The atime mount attribute was locked because the ZFS mount was originally mounted in one user namespace and was then moved to another one. Such a cross-userns mount copy locks specific mount attributes. After that, a mount that was mounted with noatime cannot be changed to atime. Likewise, a mount that was mounted as ro cannot be changed to rw, BUT a mount that was mounted as rw CAN be changed to ro!
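
The locking rules described above can be modeled in userspace. This is a simplified sketch, not kernel code: the `SK_` bit values are illustrative stand-ins rather than the kernel's numeric `MNT_*` values, `sk_copy_across_userns` is a hypothetical helper that encodes the assumption that a cross-userns copy always locks the atime mode and locks read-only only when the mount is already read-only, and the read-only check stands in for the part of the function elided in the snippet above.

```c
#include <stdbool.h>

/* Illustrative bit assignments; NOT the kernel's numeric values. */
#define SK_MNT_READONLY      (1u << 0)
#define SK_MNT_NOATIME       (1u << 1)
#define SK_MNT_NODIRATIME    (1u << 2)
#define SK_MNT_RELATIME      (1u << 3)
#define SK_MNT_ATIME_MASK    (SK_MNT_NOATIME | SK_MNT_NODIRATIME | SK_MNT_RELATIME)
#define SK_MNT_LOCK_ATIME    (1u << 8)
#define SK_MNT_LOCK_READONLY (1u << 9)

/* Assumed model of a cross-userns mount copy: the atime mode is always
 * locked, read-only is locked only if the mount was already read-only. */
static unsigned int sk_copy_across_userns(unsigned int flags)
{
	flags |= SK_MNT_LOCK_ATIME;
	if (flags & SK_MNT_READONLY)
		flags |= SK_MNT_LOCK_READONLY;
	return flags;
}

/* Reduced version of the check: only read-only and atime are handled. */
static bool sk_can_change_locked_flags(unsigned int cur, unsigned int wanted)
{
	/* A locked read-only mount must stay read-only. */
	if ((cur & SK_MNT_LOCK_READONLY) && !(wanted & SK_MNT_READONLY))
		return false;

	/* A locked atime mode must be requested unchanged. */
	if ((cur & SK_MNT_LOCK_ATIME) &&
	    ((cur & SK_MNT_ATIME_MASK) != (wanted & SK_MNT_ATIME_MASK)))
		return false;

	return true;
}
```

In this model a copied `rw,noatime` mount rejects a request for `ro` with the default atime mode, accepts `ro,noatime`, and a copied `ro` mount rejects a switch back to `rw`.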

The explanation is that since the fix in 6a88b8a, ZFS is mounted with the noatime flag, which was not the case before. This behavior change exposed the security restriction.

mihalicyn added a commit to mihalicyn/lxd that referenced this issue Nov 1, 2023
…ption"

This reverts commit a56e5c5.

Related to canonical#12473

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
@tomponline
Member

tomponline commented Nov 2, 2023

@mihalicyn can we close this now that #12492 is merged for stable-5.0, so that we do not introduce new behavior in the LTS? Or is this still a bug for the main branch?

@mihalicyn
Member

For the main branch, @morphis will fix the Anbox side so that it works properly in any configuration. I think that before closing this we need to build a new 5.0-stable/edge snap so that Simon can verify that the issue is fixed.

@tomponline
Member

It should be in 5.0/edge by now to be tested.

@tomponline
Member

tomponline commented Nov 2, 2023

Ah, except the doc tests are blocking the build on the stable-5.0 branch: https://github.com/canonical/lxd/actions/runs/6731001457

Run make doc-lint
doc/.sphinx/.markdownlint/doc-lint.sh
Failed!
.tmp/doc/api-extensions.md:2091: MD012 Multiple consecutive blank lines
.tmp/doc/howto/instances_routed_nic_vm.md:55: MD047 File should end with a single newline character
.tmp/doc/howto/network_ipam.md:39: MD012 Multiple consecutive blank lines
make: *** [Makefile:151: doc-lint] Error 1

Further documentation is available for these failures:
 - MD012: https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md#md012---multiple-consecutive-blank-lines
 - MD002: https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md#md002---first-header-should-be-a-top-level-header
 - MD034: https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md#md034---bare-url-used
 - MD047: https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md#md047---file-should-end-with-a-single-newline-character
 - MD004: https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md#md004---unordered-list-style
 - MD005: https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md#md005---inconsistent-indentation-for-list-items-at-the-same-level
 - MD032: https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md#md032---lists-should-be-surrounded-by-blank-lines

@ru-fu is there a PR from the main branch that we need to backport to fix this? If not, would you mind seeing what needs to be done to get this unblocked? Thanks

@ru-fu
Contributor

ru-fu commented Nov 3, 2023

> @ru-fu is there a PR from the main branch that we need to backport to fix this? If not, would you mind seeing what needs to be done to get this unblocked? Thanks

That should be #12378
Let me know if that doesn't fix all of it.

@tomponline
Member

Thanks! Could you please open a PR against the stable-5.0 branch?

@ru-fu
Contributor

ru-fu commented Nov 3, 2023

Sure: #12497

@tomponline
Member

@mihalicyn @morphis @simondeziel please let me know whether this is fixed in 5.0/edge. Thanks.

@morphis
Contributor Author

morphis commented Jan 5, 2024

@tomponline Yes, this is fixed in 5.0/edge. We also landed detection logic in Anbox Cloud to make things work for 5.x.
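
For readers wondering what such detection could look like, here is a minimal sketch. It is not Anbox Cloud's actual code: the function names are hypothetical, and the approach simply mirrors the "ro,noatime" experiment from earlier in this thread by carrying the `noatime` option of the source mount over into the value used for `lxc.rootfs.options`. Other atime modes are deliberately not handled.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the comma-separated option list contains `name` as a
 * whole token, 0 otherwise. */
static int sk_has_option(const char *opts, const char *name)
{
	size_t len = strlen(name);
	const char *p = opts;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t tok = end ? (size_t)(end - p) : strlen(p);

		if (tok == len && strncmp(p, name, len) == 0)
			return 1;
		if (!end)
			break;
		p = end + 1;
	}
	return 0;
}

/* Write the rootfs options into buf: "ro", plus "noatime" when the
 * source mount (options as listed by mount) already uses it.
 * Returns 0 on success, -1 if buf is too small. */
static int sk_rootfs_options(const char *src_opts, char *buf, size_t size)
{
	const char *opts = sk_has_option(src_opts, "noatime") ? "ro,noatime" : "ro";
	int n = snprintf(buf, size, "%s", opts);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Given the option string `rw,noatime,xattr,posixacl` from the mount list above, the helper produces `ro,noatime`; for `rw,relatime,xattr,posixacl` it produces plain `ro`.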
