Volume mounting ownership inconsistent between Docker & Podman #22571

BarDweller · 2024-05-01T20:44:10Z

Issue Description

Volume ownership in Podman differs from Docker's and behaves in unexpected ways, which makes it hard to design container images that work on both platforms.

The issue is explained with a test rig at https://github.com/BarDweller/mountperms

Steps to reproduce the issue

The issue is explained with a test rig at https://github.com/BarDweller/mountperms

Describe the results you received

Mounted volume ownership can ignore ownership of mountpoint directories in the container image, and does not behave consistently with Docker when mounting volumes to non-existing mountpoints.

Describe the results you expected

Mounted volume ownership should be predictable, and ideally match Docker's.

podman info output

$ podman info
host:
  arch: amd64
  buildahVersion: 1.32.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc39.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 97.46
    systemPercent: 0.57
    userPercent: 1.97
  cpus: 2
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: workstation
    version: "39"
  eventLogger: journald
  freeLocks: 2031
  hostname: PODMANVM
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.7.5-200.fc39.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 4063780864
  memTotal: 10923556864
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc39.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-1.fc39.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.3-1.fc39.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.3
      commit: 1961d211ba98f532ea52d2e80f4c20359f241a98
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240220.g1e6f92b-1.fc39.x86_64
    version: |
      pasta 0^20240220.g1e6f92b-1.fc39.x86_64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc39.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 1941434368
  swapTotal: 1989144576
  uptime: 578h 8m 19.00s (Approximately 24.08 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  127.0.0.1:5000:
    Blocked: false
    Insecure: true
    Location: 127.0.0.1:5000
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: 127.0.0.1:5000
    PullFromMirror: ""
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/ajo1/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 1
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/ajo1/.local/share/containers/storage
  graphRootAllocated: 19769851904
  graphRootUsed: 19056824320
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 25
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/ajo1/.local/share/containers/storage/volumes
version:
  APIVersion: 4.7.0
  Built: 1695838680
  BuiltTime: Wed Sep 27 14:18:00 2023
  GitCommit: ""
  GoVersion: go1.21.1
  Os: linux
  OsArch: linux/amd64
  Version: 4.7.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Luap99 · 2024-05-02T12:44:01Z

Can you test with a newer podman?

mheon · 2024-05-02T13:55:00Z

Behavior looks basically identical with newer code; we haven't substantially changed our logic for volume init in a while.

I'll take this one, but I need to do some research on exactly what Docker does before I start coding - do they unconditionally chown/chmod a volume on mount into a container, or is it only the first time it is mounted into a specific container?

BarDweller · 2024-05-02T21:24:20Z

From the tests I've run so far:

The ownership/permissions apply only to the mountpoint, not to the volume content. E.g., if an image has a directory /mountpoint owned by usera:groupa, then any container based on that image with a volume mounted at /mountpoint will have /mountpoint owned by usera:groupa within the container. This holds regardless of the user ID the container runs as, and regardless of whether the volume was previously mounted into containers based on other images that gave it different ownership.

If the image does not contain the directory /mountpoint, then with Docker, mounting a fresh volume at /mountpoint results in /mountpoint being owned by root within the container. If the volume has been mounted previously into another container, then the ownership of /mountpoint is taken from the last time it was mounted. (E.g., mount a fresh volume at /ownedbya in a container based on an image where /ownedbya is owned by usera:groupa, then mount that same volume at /doesnotexist in a new container whose image does not contain that directory, and you find /doesnotexist has ownership usera:groupa.)

If a volume is mounted into successive containers, with Docker, the ownership of the mountpoint in each successive container follows those rules. I.e., the ownership always obeys the intent of the image of each container. With Podman, the volume currently keeps the ownership from the first time it was ever mounted.

I haven't done any testing wrt behavior when a volume is mounted concurrently into multiple containers with conflicting ownership/permissions. I would strongly expect each container to have its own mountpoint with the ownership/permissions dictated by its image. In the case where the image doesn't have the mountpoint, I'd expect that to be undefined ;)

Content in the volume is never affected by any of this: if you create files in a volume as root/usera/userb, the files retain their ownership/permissions regardless of which mountpoint the volume is mounted at. There might be some fun here with setgid on mountpoints in images, and with how that behaves if two containers request conflicting ownerships for the same volume.
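The rules observed above can be condensed into a toy model. This is only a sketch of the observations in this comment, not of either engine's code: `Volume`, `docker_mount` and `podman_mount_before_fix` are hypothetical names, ownership is reduced to a `user:group` string, volume contents are ignored, and the Podman branch for a mountpoint that is missing from the image is an assumption, since that case is not spelled out here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ROOT = "root:root"


@dataclass
class Volume:
    # Ownership of the volume's root directory; None means "never mounted".
    owner: Optional[str] = None


def docker_mount(volume: Volume, image_dirs: Dict[str, str], mountpoint: str) -> str:
    """Return the mountpoint owner seen in the container, per the Docker observations."""
    image_owner = image_dirs.get(mountpoint)
    if image_owner is not None:
        # The image defines the directory: its ownership wins on every mount.
        volume.owner = image_owner
    elif volume.owner is None:
        # Fresh volume and no such directory in the image: owned by root.
        volume.owner = ROOT
    # Otherwise the ownership from the previous mount carries over.
    return volume.owner


def podman_mount_before_fix(volume: Volume, image_dirs: Dict[str, str], mountpoint: str) -> str:
    """Return the mountpoint owner when ownership is fixed at the first mount."""
    if volume.owner is None:
        # ASSUMPTION: root is used when the image lacks the directory.
        volume.owner = image_dirs.get(mountpoint, ROOT)
    return volume.owner


image_a = {"/data": "usera:groupa"}
image_b = {"/data": "userb:groupb"}

docker_vol = Volume()
print(docker_mount(docker_vol, image_a, "/data"))          # usera:groupa
print(docker_mount(docker_vol, image_b, "/data"))          # userb:groupb
print(docker_mount(docker_vol, image_b, "/doesnotexist"))  # userb:groupb

podman_vol = Volume()
print(podman_mount_before_fix(podman_vol, image_a, "/data"))  # usera:groupa
print(podman_mount_before_fix(podman_vol, image_b, "/data"))  # usera:groupa
```

The last two lines show the mismatch being reported: the second image asks for userb:groupb but still sees the ownership from the first mount.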

This is done to better match Docker's behavior. Our current behavior is to only chown a volume once, the first time it is mounted into a container; that doesn't match what Docker does. Docker chowns every time a container mounts a volume, to ensure it is always accessible to the last container to mount it.

Fortunately, this is a relatively easy fix. We have a bool in volume state as to whether a volume needs to be chowned, which we set to false when the volume was first chowned; so just stop doing that.

I was really hoping to eliminate the NeedsChown bool entirely, but things are a bit of a mess. There are some cases where we do want to completely inhibit chowning, like when the user explicitly requests that a volume be a specific UID and GID. Unfortunately, those are both ints, not *int, so we can't tell whether they were actually set, and the solution we were using was to force the NeedsChown bool to false - so we really need to keep it around. And even if we could change volume UID/GID to pointers, we're setting them in places we really shouldn't be - for example, container anonymous volumes set UID/GID to current container UID/GID, despite the fact that the volume will immediately chown itself from that to match the directory it's mounting over?

In short, the current code's a mess and I don't see a way to fix it without breaking changes.

Fixes containers#22571

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
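The int versus *int problem mentioned above can be illustrated with a small analog. Podman itself is written in Go; this Python sketch only mirrors the idea, with `None` standing in for a nil pointer. The class and field names (`VolumeIntIDs`, `VolumeOptionalIDs`, `needs_chown`, `chown_allowed`) are hypothetical and are not taken from the Podman source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeIntIDs:
    # Plain ints default to 0, which is also root's UID/GID.
    uid: int = 0
    gid: int = 0
    # A separate flag is needed to remember "the user chose the owner".
    needs_chown: bool = True


@dataclass
class VolumeOptionalIDs:
    # None plays the role of a nil *int: "never set by the user".
    uid: Optional[int] = None
    gid: Optional[int] = None

    def chown_allowed(self) -> bool:
        # Chowning is only inhibited when the user picked an owner explicitly.
        return self.uid is None and self.gid is None


default_vol = VolumeIntIDs()
root_vol = VolumeIntIDs(uid=0, gid=0, needs_chown=False)

# The IDs alone cannot tell "unset" apart from "explicitly root".
same_ids = (default_vol.uid, default_vol.gid) == (root_vol.uid, root_vol.gid)
print(same_ids)  # True

# With optional values the distinction survives without an extra flag.
print(VolumeOptionalIDs().chown_allowed())              # True
print(VolumeOptionalIDs(uid=0, gid=0).chown_allowed())  # False
```

With plain integers, only the extra boolean separates the two volumes, which is the reason given above for keeping NeedsChown.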

mheon · 2024-05-16T13:59:20Z

@BarDweller Any chance you can test #22727?

When an empty volume is mounted into a container, Docker will chown that volume appropriately for use in the container. Podman does this as well, but there are differences in the details. In Podman, a chown is presently a one-and-done deal; in Docker, it will continue so long as the volume remains empty. Mount into a dozen containers, but never add content, the chown occurs every time. The chown is also linked to copy-up; it will always occur when a copy-up occurred, despite the volume now not being empty.

This PR changes our logic to (mostly) match Docker's. For some reason, the chowning also stops if the volume is chowned to root at any point. This feels like a Docker bug, but as they say, bug for bug compatible.

In retrospect, using bools for NeedsChown and NeedsCopyUp was a mistake. Docker isn't actually tracking this stuff; they're just doing a copy-up and permissions change unconditionally as long as the volume is empty. They also have the two linked as one operation, seemingly, despite happening at very different times during container init. Replicating that in our stateful system is nontrivial, hence the need for the new CopiedUp field. Basically, we never want to chown a volume with contents in it, except if that data is a result of a copy-up that resulted from mounting into the current container. Tracking who did the copy-up is the easiest way to do this.

Fixes containers#22571

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
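The rule in the last paragraph of that description can be written as a small predicate. This is a sketch under assumptions, not the code from #22727: the type of the CopiedUp field is not given in this thread, so it is modeled here as the ID of the container that performed the copy-up, and `VolumeState`, `is_empty`, `copied_up_by` and `should_chown` are hypothetical names. The quirk where chowning stops once the volume has been chowned to root is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeState:
    # True while nothing has been written into the volume.
    is_empty: bool = True
    # ASSUMPTION: ID of the container whose mount triggered the copy-up, if any.
    copied_up_by: Optional[str] = None


def should_chown(state: VolumeState, container_id: str) -> bool:
    """Decide whether mounting into container_id should chown the volume."""
    if state.is_empty:
        # An empty volume is chowned on every mount.
        return True
    # A volume with content is only chowned when that content came from a
    # copy-up performed for this same container.
    return state.copied_up_by == container_id


print(should_chown(VolumeState(), "c1"))                                    # True
print(should_chown(VolumeState(is_empty=False, copied_up_by="c1"), "c1"))   # True
print(should_chown(VolumeState(is_empty=False, copied_up_by="c0"), "c1"))   # False
print(should_chown(VolumeState(is_empty=False), "c1"))                      # False
```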

BarDweller added the kind/bug label May 1, 2024

mheon self-assigned this May 2, 2024

mheon mentioned this issue May 16, 2024

Always chown volumes when mounting into a container #22727

Merged

openshift-merge-bot bot closed this as completed in #22727 May 23, 2024
