Use mount_setattr() on newer kernels, instead of walking mount hierarchy the hard way #754

@smcv

bubblewrap currently uses the traditional mount(2) API for everything. This is not a great interface for container-like use cases, and attempts to improve bubblewrap's performance while staying on that API end up being a lot of code (#629, #630).

In particular, bubblewrap wants to be able to mount an entire directory hierarchy with options. For example, bwrap ... --ro-bind /foo /foo recursively mounts /oldroot/foo onto /newroot/foo with nosuid, nodev, ro (where /oldroot is the host's root directory, and /newroot is a temporary new root directory that we will later pivot into to run the container). But this is not actually possible in a single operation at the kernel level when using the old mount(2) API. Instead, you have to do it in two steps:

  1. The equivalent of mount -o rbind /oldroot/foo /newroot/foo
  2. Iterate over the mount table looking for each mount point below /newroot/foo (including /newroot/foo itself), and remount them all with the equivalent of nosuid,nodev,ro

where the second step has O(n) complexity in the number of entries in the mount table.
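
The two steps above can be sketched roughly as follows. This is a simplified illustration, not bubblewrap's actual code: the function names `path_is_at_or_below` and `ro_bind_the_hard_way` are hypothetical, mount points containing octal escapes such as `\040` are not unescaped, and flags already present on each mount (which a real implementation has to preserve) are ignored.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: true if path is root itself or lies below it. */
static int
path_is_at_or_below (const char *root, const char *path)
{
  size_t len;

  assert (root != NULL && path != NULL);
  len = strlen (root);

  if (strncmp (root, path, len) != 0)
    return 0;

  return path[len] == '\0' || path[len] == '/'
         || (len > 0 && root[len - 1] == '/');
}

/* Hypothetical sketch of "--ro-bind src dest" with the old mount(2) API.
 * Returns 0 on success, -1 on error. */
static int
ro_bind_the_hard_way (const char *src, const char *dest)
{
  char line[4096];
  FILE *fp;
  int ret = 0;

  /* Step 1: recursive bind mount, equivalent to mount -o rbind */
  if (mount (src, dest, NULL, MS_BIND | MS_REC, NULL) != 0)
    return -1;

  /* Step 2: walk the whole mount table, O(n) in its length */
  fp = fopen ("/proc/self/mountinfo", "r");
  if (fp == NULL)
    return -1;

  while (fgets (line, sizeof line, fp) != NULL)
    {
      char mountpoint[4096];

      /* The fifth field of each mountinfo line is the mount point */
      if (sscanf (line, "%*d %*d %*s %*s %4095s", mountpoint) != 1)
        continue;

      if (!path_is_at_or_below (dest, mountpoint))
        continue;

      /* Simplification: existing flags on this mount are not preserved */
      if (mount ("none", mountpoint, NULL,
                 MS_REMOUNT | MS_BIND | MS_RDONLY | MS_NOSUID | MS_NODEV,
                 NULL) != 0)
        {
          ret = -1;
          break;
        }
    }

  fclose (fp);
  return ret;
}
```

Because step 2 re-reads the mount table for every bind-like argument, and the table grows with each one, this shape also illustrates where the overall cost comes from.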

Similarly, bwrap --bind mounts with the equivalent of nosuid,nodev, and even bwrap --dev-bind mounts with nosuid.

Remounting a hierarchy of mount points nosuid, nodev and/or ro in this way is very inefficient: it ends up as O(n²) in the number of --bind-like arguments (#384).

Remounting a hierarchy of mount points nosuid, nodev and/or ro is also not robust: it can fail if a race condition is hit (#650). #535 will mitigate this, but will not fully fix it.

If someone can propose a PR extending bubblewrap to use mount_setattr() to replace the way we change the attributes of a directory hierarchy, I think that would probably be a good improvement. If the kernel is too old, it'll fail with something like ENOSYS or EINVAL and we can fall back to how it's currently done. There's a sketch of this in #650 (comment), but I haven't tested it or verified its correctness.
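
To illustrate the idea (this is not the sketch from #650, and it is equally untested against bubblewrap itself), a minimal version of "try mount_setattr(), otherwise tell the caller to fall back" might look like the following. The names `sketch_mount_attr`, `SKETCH_MOUNT_ATTR_*` and `try_set_attr_recursive` are hypothetical; the struct layout and constant values are assumed to match the kernel's uapi `linux/mount.h`. The raw `syscall()` is used because the glibc wrapper only exists in newer glibc. Only ENOSYS is treated as "kernel too old" here, since EINVAL can also mean that the path is not a mount point; a real implementation would need to decide how to handle that ambiguity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif

/* Assumed to mirror MOUNT_ATTR_* in the kernel's linux/mount.h;
 * renamed so they cannot clash with newer glibc headers. */
#define SKETCH_MOUNT_ATTR_RDONLY 0x00000001
#define SKETCH_MOUNT_ATTR_NOSUID 0x00000002
#define SKETCH_MOUNT_ATTR_NODEV  0x00000004

/* Assumed to mirror the kernel's struct mount_attr (32 bytes, version 0) */
struct sketch_mount_attr
{
  uint64_t attr_set;
  uint64_t attr_clr;
  uint64_t propagation;
  uint64_t userns_fd;
};

/* Hypothetical helper. Returns:
 *   0 if the attributes were applied to path and everything below it,
 *   1 if the kernel has no mount_setattr(), so the caller should fall
 *     back to walking the mount table,
 *  -1 on any other error (errno is set). */
static int
try_set_attr_recursive (const char *path, uint64_t attr_set)
{
  struct sketch_mount_attr attr = { .attr_set = attr_set };

  assert (path != NULL);
  assert (sizeof attr == 32);

#ifdef SYS_mount_setattr
  if (syscall (SYS_mount_setattr, AT_FDCWD, path, AT_RECURSIVE,
               &attr, sizeof attr) == 0)
    return 0;

  if (errno == ENOSYS)
    return 1;

  return -1;
#else
  /* Built against headers that predate the syscall */
  (void) attr;
  errno = ENOSYS;
  return 1;
#endif
}
```

A caller implementing --ro-bind would pass `SKETCH_MOUNT_ATTR_RDONLY | SKETCH_MOUNT_ATTR_NOSUID | SKETCH_MOUNT_ATTR_NODEV` after the recursive bind mount, and only run the existing remount loop when the helper returns 1.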

A possible follow-up would be to increase the minimum kernel version to one that implements mount_setattr() (Linux 5.12 if I'm reading man pages correctly), but that would mean abandoning the ability to run bubblewrap (and therefore Flatpak, the Steam Runtime, etc.) on older distros like RHEL 8 and Ubuntu 20.04, which is something we have historically tried hard not to do. If that is done, then I think it should be a separate PR and a separate issue.

Another possible follow-up would be to use the rest of the "new mount API" family, for example implementing --bind-fd by using open_tree(2), mount_setattr(2), move_mount(2). Again, that should be a separate issue.
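
As a rough, untested sketch of what that sequence could look like for a read-only bind from a file descriptor: clone the subtree with open_tree(2), change its attributes while it is still detached, then attach it with the move_mount(2) system call. The function `sketch_bind_fd_ro` and the `SKETCH_*` names are hypothetical, the constant values and struct layout are assumed to match the kernel's uapi `linux/mount.h`, and there is no fallback for older kernels here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

/* Assumed to mirror the kernel's linux/mount.h */
#define SKETCH_OPEN_TREE_CLONE         1
#define SKETCH_OPEN_TREE_CLOEXEC       O_CLOEXEC
#define SKETCH_MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#define SKETCH_MOUNT_ATTR_RDONLY       0x00000001
#define SKETCH_MOUNT_ATTR_NOSUID       0x00000002
#define SKETCH_MOUNT_ATTR_NODEV        0x00000004

struct sketch_mount_attr
{
  uint64_t attr_set;
  uint64_t attr_clr;
  uint64_t propagation;
  uint64_t userns_fd;
};

/* Hypothetical: bind the tree referred to by src_fd onto dest,
 * recursively, as ro,nosuid,nodev. Returns 0 or -1 with errno set. */
static int
sketch_bind_fd_ro (int src_fd, const char *dest)
{
#if defined(SYS_open_tree) && defined(SYS_mount_setattr) && defined(SYS_move_mount)
  struct sketch_mount_attr attr = {
    .attr_set = SKETCH_MOUNT_ATTR_RDONLY | SKETCH_MOUNT_ATTR_NOSUID
                | SKETCH_MOUNT_ATTR_NODEV,
  };
  int saved_errno;
  int tree_fd;

  assert (dest != NULL);

  /* Detached recursive copy of the mount tree at src_fd */
  tree_fd = (int) syscall (SYS_open_tree, src_fd, "",
                           AT_EMPTY_PATH | AT_RECURSIVE
                           | SKETCH_OPEN_TREE_CLONE
                           | SKETCH_OPEN_TREE_CLOEXEC);
  if (tree_fd < 0)
    return -1;

  /* Set attributes on the whole detached tree, then attach it at dest */
  if (syscall (SYS_mount_setattr, tree_fd, "",
               AT_EMPTY_PATH | AT_RECURSIVE, &attr, sizeof attr) != 0
      || syscall (SYS_move_mount, tree_fd, "", AT_FDCWD, dest,
                  SKETCH_MOVE_MOUNT_F_EMPTY_PATH) != 0)
    {
      saved_errno = errno;
      close (tree_fd);
      errno = saved_errno;
      return -1;
    }

  close (tree_fd);
  return 0;
#else
  (void) src_fd;
  assert (dest != NULL);
  errno = ENOSYS;
  return -1;
#endif
}
```

One attraction of this shape is that the tree never becomes visible at the destination with the wrong attributes, because they are applied before it is attached.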

[Tracked as steamrt/tasks#1006 elsewhere]
