Podman failed to destroy BTRFS snapshot on container delete #3963

Closed
SwitchedToGitlab opened this issue Sep 6, 2019 · 29 comments

Comments

@SwitchedToGitlab SwitchedToGitlab commented Sep 6, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

bug

Description

Rootless Podman fails to delete BTRFS subvolumes when building an image or deleting a container. This causes a cascade of errors: podman ps -a shows the container as removed, but re-running the same podman run command fails with a name re-use error.

Using SUDO this works just fine.

I assume this is a configuration issue on my part, because without privilege elevation via sudo I cannot delete the specified BTRFS subvolume by hand using btrfs su delete <path to subvolume> either.

Thanks in advance for your help.

Steps to reproduce the issue:

  1. Build a rootless image using the BTRFS driver.

  2. Get error message listed below.

Describe the results you received:

ERRO[4128] error deleting build container "de8886e87ab7c7e667426e84a096695f4d434fe8ed42149fb157e7b9a398b906": Failed to destroy btrfs snapshot /home/sbrady/.local/share/containers/storage/btrfs/subvolumes for 561da8272542ab2a71977655a51b5d20c20627fd3a917165d1fe89b0370f4f93: operation not permitted 
Error: Failed to destroy btrfs snapshot /home/sbrady/.local/share/containers/storage/btrfs/subvolumes for 561da8272542ab2a71977655a51b5d20c20627fd3a917165d1fe89b0370f4f93: operation not permitted

Describe the results you expected:

BTRFS subvolumes to be deleted on container deletion.

Additional information you deem important (e.g. issue happens only occasionally):

Consistent regardless of the image.

Output of podman version:

podman version 1.5.1

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.12.9
  podman version: 1.5.1
host:
  BuildahVersion: 1.10.1
  Conmon:
    package: conmon-2.0.0-2.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.0, commit: unknown'
  Distribution:
    distribution: '"opensuse-tumbleweed"'
    version: "20190904"
  MemFree: 1173884928
  MemTotal: 8254943232
  OCIRuntime:
    package: runc-1.0.0~rc8-1.4.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8
      spec: 1.0.1-dev
  SwapFree: 2141966336
  SwapTotal: 2147483648
  arch: amd64
  cpus: 4
  eventlogger: file
  hostname: rocinante
  kernel: 5.2.11-1-default
  os: linux
  rootless: true
  uptime: 2h 31m 17.3s (Approximately 0.08 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
store:
  ConfigFile: /home/sbrady/.config/containers/storage.conf
  ContainerStore:
    number: 18
  GraphDriverName: btrfs
  GraphOptions: null
  GraphRoot: /home/sbrady/.local/share/containers/storage
  GraphStatus:
    Build Version: 'Btrfs v5.2.1 '
    Library Version: "102"
  ImageStore:
    number: 16
  RunRoot: /var/run/user/1000/containers
  VolumePath: /home/sbrady/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.5.1-1.1.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):
Bare metal install on Intel i7 and spinning rust HDD.

Pastes of Storage.conf and libpod.conf

➜ la ~/.config/containers
total 40K
-rw-r--r-- 1 sbrady users 4.4K Sep  6 14:23 libpod.conf
-rw-r--r-- 1 sbrady users  205 Aug  9 13:30 mounts.conf
drwxr-xr-x 1 sbrady users   14 Aug  9 13:30 oci/
-rw-r--r-- 1 sbrady users  256 Aug  9 13:30 policy.json
-rw-r--r-- 1 sbrady users 1.1K Aug  9 13:30 registries.conf
drwxr-xr-x 1 sbrady users    0 Aug  9 13:30 registries.d/
-rw-r--r-- 1 sbrady users  12K Aug  9 13:30 seccomp.json
-rw-r--r-- 1 sbrady users 5.0K Sep  6 13:21 storage.conf
/home/sbrady/.local/share/containers
├── cache
│   └── blob-info-cache-v1.boltdb
└── storage
    ├── btrfs
    │   └── subvolumes
    ├── btrfs-containers
    │   ├── 1e96b0289a1d5ac651f5c521d325dd2911f73b2a6dffbcbfb2e982310ffbaf05
    │   ├── 30e2a4ea384aa36e4a9d5313a89a47efca248f89a6e134f71f0aa952b536b51c
    │   ├── 30f3976b8ab272bd229a770b0f0e9807ad8b00798178a6732909da3899308935
    │   ├── 3886c1a160034a4c7cae0c59b1a3ee93e0371b837bae4072336b2df16cd2a4cc
    │   ├── 3dd0a9edede5dd4fa3a5333665fcb69f45235e5456bb781863f39e41fe0047b7
    │   ├── 4f8e7486fbc706acaa2675f3cb8afa32a5e2b742810fea560c740ff7ddadfb86
    │   ├── 6e0b92b86a79fb1e8cc6d6c68f7c8e82d9eb3ef96b758c5d3b0a850d5ecdd30a
    │   ├── 7051dd05a2a3c0712b92a3d5277b17beedb221c29c7859f7279e778300cc0239
    │   ├── 812bfe158ec304e077e6d2e05eb7d3f9f01631e3bc7c6fc49c01d205117bba8f
    │   ├── 9e8c3f9cb6b6346f808a89e417d3b365f737cd849ae4e8c41b388f6740640be3
    │   ├── b10003959017ff33e909df830fb7acbd87c081bfcf53f997aba6c7afe5040ded
    │   ├── c20462222981918a27e00267ba8ba0118c1064b9d2728b529c1ff0c9d75cb238
    │   ├── c8bf5970068ed878adadacc10a8e04fd3471f16fa5f95750f501bc9c50cf596b
    │   ├── ca09086e22391e95198dd7aa5abc34f168cd64132488e74042c3aaa7860f162c
    │   ├── containers.json
    │   ├── containers.lock
    │   ├── d2878a44fb5ceed65a97f2a3c98f50cc7ec1f4ba02b672d8708278f9c9d1c2b9
    │   ├── de8886e87ab7c7e667426e84a096695f4d434fe8ed42149fb157e7b9a398b906
    │   ├── e0780ebf9f35bb28407986c191257acdb529d9aa198ca2e5132f06390ac3bf0c
    │   ├── e7315b2231cfaefb7991b403ffb18d60d26e25eb1950a090155b3c3be776ea19
    │   └── fea6040e4ffbb6fcf85570aee5475fb2a03853a540b8aa89be2d945c6af64290
    ├── btrfs-images
    │   ├── 0868e92e943cba2ce2ed3b5705d9dcd4adeec9da9088ab69b3c44af199072b3c
    │   ├── 0ed5811d6d9c68658a20eb354b1917bbf5af162c773eb4edf6168fae00ff09f6
    │   ├── 172b73ede26844c52a903bb4e905f18ecbdc1227a6e32b86eb66b60ce999224f
    │   ├── 18ffcb379eccf2d03f71066d45bf3d9c6078c8dc2de843eb41361f22aa8e8430
    │   ├── 23b52ed766eb03c4151be9e41a0eb2fdce003c910982665b7b92a468bca1c3b3
    │   ├── 337ab92e6b8b755823c3363b11948f13441bcb2811988617793642dd1c5c0ac9
    │   ├── 5e09dd17175ecdff0b478333a6d5f444ea2314e5f9114a5443d7fc9fec86b834
    │   ├── 69c54a0cfa733e6fdb478b5612127feec51153ce066cff4a32f1fdce84bb8af6
    │   ├── 6d038d18f5017765cfcbb2262ae3933429e5be0c64f5d70130de8c788791e1d7
    │   ├── a8c54eebc7056cd3dfccee64c28c12d7653fbe2b1fe61159f2c08fcfb15110b0
    │   ├── b14710b9d573f363bbbad56f0ff69e79b5f229b83daaecaf25d9856a84308df7
    │   ├── b151cdb91db489ee8ab7ba84839dd420164692b3031020c8ded00436421715b6
    │   ├── cac4ae0c405ea55bed5402512d66abadac6ea01c31e650ffd86af31560ad98bc
    │   ├── ced8a8fe165881fdef10a647838751d98e0a3d7aba06316f44bdf28f86e23d25
    │   ├── e6da4025fb017e4e79e7339c5953f4c1aa247c11fe7862920d53ef8879341cfa
    │   ├── fce289e99eb9bca977dae136fbe2a82b6b7d4c372474c9235adc1741675f587e
    │   ├── images.json
    │   └── images.lock
    ├── btrfs-layers
    │   ├── 06b85792928655f9c05298c2104b509396a9ae72fe22f739bb004a2ff87f99d0.tar-split.gz
    │   ├── 30fa407a2912badb23e133597472bec5d233e439a497d96751f5dcfb8617894e.tar-split.gz
    │   ├── 3f194c10a561e3b694e9602b088a4e516eb19ee468992bb5cc6845c717b06b49.tar-split.gz
    │   ├── 43316bbb040759a859d32501f20dd865db10b6ee62659fd09aead1a920097982.tar-split.gz
    │   ├── 46e841fc16afde233e84ec806cb528e3d0ae0c8ead03f074aa7699f67b8f1b4c.tar-split.gz
    │   ├── 51899d997ab4c1f790759deeeaaebb03d3742c34236e5a85499eb302cea6fb7b.tar-split.gz
    │   ├── 8297fa3a5e5f1eb097422189d1f7d9046dfcc658008fe395d8b27e93a5954691.tar-split.gz
    │   ├── 866d3ed87bbbc6beb540cc52cb79f8efaec0f3a8b7cf554f65c5df8fce524ec0.tar-split.gz
    │   ├── 9dae2a8870ddfae25701156bc993448175a4cf5bbc0f537e8563e286a5a37385.tar-split.gz
    │   ├── a17440e364015841337f4b80d2a42cf79936cba443ab941525785a7e35f0c0c6.tar-split.gz
    │   ├── a2f24cc38b696cc363f1270684da6d5ae40c5b086f6f6978462e7c7f551e4341.tar-split.gz
    │   ├── af0b15c8625bb1938f1d7b17081031f649fd14e6b233688eea3c5483994a66a3.tar-split.gz
    │   ├── b6b761c5afcb8b69e34a26df1ce16be5d16b3088c838060dcc2528cc3d4cdd5f.tar-split.gz
    │   ├── d548e5ae588bc66a2250ed8312145394cb9704ecdf50153fe608b93ec3c15a0f.tar-split.gz
    │   ├── layers.json
    │   └── layers.lock
    ├── cache
    │   └── blob-info-cache-v1.boltdb
    ├── libpod
    │   └── bolt_state.db
    ├── mounts
    ├── storage.lock
    ├── tmp
    └── volumes
        ├── brave-storage-volume
        └── test
Collaborator

@mheon mheon commented Sep 6, 2019

Sounds like c/storage cleanup - @nalind Agree?

@mheon mheon added the kind/bug label Sep 6, 2019
Collaborator

@mheon mheon commented Sep 6, 2019

Actually, wait - how are we using the btrfs driver with rootless?

@giuseppe PTAL

@cmurf cmurf commented Sep 8, 2019

It's a bit confusing at the moment. For historical reasons, btrfs subvolume create is permitted unprivileged, but btrfs subvolume delete requires root privilege unless the file system was mounted with option user_subvol_rm_allowed.

However, recently, rmdir(2) will delete a btrfs subvolume, without privilege escalation required, so long as the user otherwise has privilege for the subvolume and contents. I haven't benchmarked a fully populated set of subvolumes to compare btrfs subvolume delete vs rm -rf - there could be a difference if the latter causes recursive rm of files/dirs and then rmdir on the subvolume, whereas btrfs sub del calls BTRFS_IOC_SNAP_DESTROY ioctl which generally exits quickly, and cleanup happens in the background.

Member

@rhatdan rhatdan commented Sep 8, 2019

Ok, I don't truly understand what is going on here. But are we doing something wrong in the storage driver that we can fix to make rootless podman with btrfs work? Or should we simply block rootless podman with BTRFS because it will never work correctly and force users to use fuse-overlay?

@cmurf cmurf commented Sep 8, 2019

On kernel 4.18+, use rm -rf.
That will work as if the subvolume is a directory. Just like on any other file system, it will check every file and directory for the proper permissions, and unlinkat() each one in turn.

Optimization 1: If root, use btrfs subvolume delete.
Optimization 2: If rootless, check if the btrfs volume is mounted with option user_subvol_rm_allowed and if so then use btrfs subvolume delete

btrfs subvolume delete avoids all the traversal and recursive unlinkat(), so it's way faster.

For kernels 4.17 and older, you could check for the user_subvol_rm_allowed mount option and permit rootless podman only when it is set; otherwise disallow it.
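As a rough sketch of the decision rules in this comment: the function name and the three return labels below are made up for illustration, and this is Python rather than the Go used in containers/storage, so treat it as a way to read the logic, not as the driver's code.

```python
from typing import Iterable, Tuple


def choose_delete_strategy(
    kernel: Tuple[int, int], is_root: bool, mount_options: Iterable[str]
) -> str:
    """Return "ioctl", "rm-rf", or "unsupported" (hypothetical labels)."""
    options = set(mount_options)
    # Optimizations 1 and 2: the fast subvolume delete is available to root,
    # or to any user when the filesystem is mounted with user_subvol_rm_allowed.
    if is_root or "user_subvol_rm_allowed" in options:
        return "ioctl"
    # Kernel 4.18+ lets an unprivileged user remove a subvolume like a directory.
    if kernel >= (4, 18):
        return "rm-rf"
    # Older kernel, rootless, and no mount option: deletion cannot succeed.
    return "unsupported"
```

For the reporter's setup (kernel 5.2, rootless, no mount option) this would pick the rm -rf path.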

Collaborator

@mheon mheon commented Sep 8, 2019

We've never actually tested the btrfs driver on rootless before, and it was never written with rootless support in mind. However, it seems like everything except cleanup is already working. Given that we seem to have a way forward there (at least on newer kernels), this sounds like a reasonable fix.

@cmurf cmurf commented Sep 8, 2019

FWIW, user_subvol_rm_allowed exists since 2010, circa kernel 2.6.38. When enabled, there is only a check of subvolume owner/perm, not contents. If the user has the proper privilege for just the subvolume, btrfs subvolume delete will exit 0, everything inside instantly goes bye bye.

I'm not sure what use case would exist where a user owns the subvolume but doesn't own the contents, but...

Member

@giuseppe giuseppe commented Sep 9, 2019

I've never tried the btrfs driver and I am surprised it works with rootless.

@cmurf would it be enough if we attempt to rm -rf if the volume deletion fails with EPERM?

Would you like to work on a patch for that?

@cmurf cmurf commented Sep 9, 2019

@giuseppe actually it's a good idea to try the optimized case first, since it's way faster, and then use rm -rf as the fallback. It'll take me longer to find the file to work on than it'll take an actual competent person to just patch it.
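A minimal sketch of the "fast path first, rm -rf as fallback" idea, assuming the two delete operations are passed in as callables. The name delete_with_fallback and its parameters are hypothetical; the real change would live in the Go btrfs driver.

```python
import errno
from typing import Callable


def delete_with_fallback(
    path: str,
    fast_delete: Callable[[str], None],
    slow_delete: Callable[[str], None],
) -> str:
    """Try the fast delete; fall back to the slow one only on EPERM."""
    try:
        fast_delete(path)
        return "fast"
    except OSError as exc:
        # Only "operation not permitted" triggers the fallback; other errors
        # (missing path, I/O error) are re-raised unchanged.
        if exc.errno != errno.EPERM:
            raise
    slow_delete(path)
    return "slow"
```

In practice fast_delete might shell out to btrfs subvolume delete and slow_delete might be shutil.rmtree, but that wiring is an assumption and is left out so the control flow stays easy to check.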

Author

@SwitchedToGitlab SwitchedToGitlab commented Sep 12, 2019

For users experiencing this issue, what storage driver should those running Podman rootless use before BTRFS is supported?

Collaborator

@mheon mheon commented Sep 12, 2019

We'd recommend fuse-overlayfs

Author

@SwitchedToGitlab SwitchedToGitlab commented Sep 12, 2019

I'm good with closing this as "unsupported" if you'd like to track this as a pending feature elsewhere, but I'll leave that up to the dev team. Thanks for your quick responses.

Author

@SwitchedToGitlab SwitchedToGitlab commented Sep 12, 2019

Apologies if I'm stating the obvious, but as Podman is creating the BTRFS subvolume it could easily tag the subvolume as USER_SUBVOL_RM_ALLOWED on creation, right? I'm not a Go programmer but I can read it, and it appears that support for this option could be added consistently with the other BTRFS options used on subvolume creation in func parseoptions in c/storage/drivers/btrfs/btrfs.go. This would allow the non-root user to delete BTRFS subvolumes on supported kernels. This should cover most cases, and a verbose error message could cover the edge cases with a simple addition to the code.

Collaborator

@mheon mheon commented Sep 12, 2019

I think that's definitely possible. The biggest issue right now is probably finding someone to work on it - most people are on fuse-overlayfs, so fixing the btrfs driver is lower priority on the storage side.

@cmurf cmurf commented Sep 12, 2019

user_subvol_rm_allowed is a Btrfs mount option, not a subvolume property. It's also not a per mount point option - once you use it, it applies to all other mounts for this file system. e.g.

[chris@fmac ~]$ mount | grep btrfs
/dev/sda6 on / type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,subvolid=496,subvol=/root)
/dev/sda6 on /boot type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,subvolid=581,subvol=/boot)
/dev/sda6 on /home type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,subvolid=468,subvol=/home)
[chris@fmac ~]$ sudo mount -o remount,user_subvol_rm_allowed /home
[chris@fmac ~]$ mount | grep btrfs
/dev/sda6 on / type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=496,subvol=/root)
/dev/sda6 on /boot type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=581,subvol=/boot)
/dev/sda6 on /home type btrfs (rw,noatime,seclabel,ssd,space_cache=v2,user_subvol_rm_allowed,subvolid=468,subvol=/home)
[chris@fmac ~]$ 

With this mount option set, the user needs privilege only for the subvolume they want to delete; its contents aren't checked for privileges. That lack of checking content ownership is why user_subvol_rm_allowed isn't the default behavior, but you can certainly add it to your fstab and make it a persistent mount option.
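A driver could detect this option before choosing how to delete. The sketch below assumes /proc/mounts-style text (device, mount point, type, options per line), which differs from the mount output pasted above; the function name is hypothetical, and escaped characters in mount points are not handled.

```python
from typing import Optional


def btrfs_allows_user_subvol_rm(mounts_text: str, path: str) -> Optional[bool]:
    """Check the mount covering `path` in /proc/mounts-style text.

    Returns None when the most specific mount covering the path is not btrfs.
    """
    best_len = -1
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        prefix = mount_point.rstrip("/") + "/"
        covers = (path + "/").startswith(prefix)
        # Keep the longest matching mount point, i.e. the most specific mount.
        if covers and len(mount_point) > best_len:
            best_len = len(mount_point)
            if fs_type == "btrfs":
                result = "user_subvol_rm_allowed" in options.split(",")
            else:
                result = None
    return result
```

A caller would typically pass the contents of /proc/self/mounts and the storage GraphRoot as the path.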

Author

@SwitchedToGitlab SwitchedToGitlab commented Sep 12, 2019

@cmurf Yep, makes total sense now... I just found all this out while troubleshooting. I can confirm Podman works as expected with BTRFS as a storage driver when running in rootless mode.

For troubleshooters, add the user_subvol_rm_allowed flag to /etc/fstab if you'd like to use podman rootless with a BTRFS file system. Be aware of potential security implications to this option, and do your homework before enabling.

The root line of my /etc/fstab looks like this:

UUID=466f96bf-0e8f-44cb-a42b-c46cedca5803  /                       btrfs  defaults,user_subvol_rm_allowed  0  0

@mheon Something to consider though: the overlay option is not supported for people running BTRFS, so people with BTRFS as their root filesystem effectively cannot run Podman rootless without issues deleting containers. Just an FYI on that.

I was wrong here and had a different configuration issue. overlay is the default and preferred option here.

@cmurf cmurf commented Sep 12, 2019

I think the second part needs an issue of its own. I know btrfs and overlayfs can co-exist in production environments with tens of thousands of snapshots. I'm not sure what the nature of this lack of support could be about. There are nuances that can be workload specific where one works better than the other, and even where overlayfs copy-up operation can be made more efficient using cloning (Btrfs has had reflinks since forever, and XFS enables them in the most recent xfsprogs at mkfs time).

@cmurf cmurf commented Sep 12, 2019

Hmm, on Fedora 31, by default it appears podman is using fuse-overlayfs on Btrfs.

store:
  ConfigFile: /home/chris/.config/containers/storage.conf
  ContainerStore:
    number: 2
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mount_program=/usr/bin/fuse-overlayfs
  GraphRoot: /home/chris/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
Member

@rhatdan rhatdan commented Sep 13, 2019

@SwitchedToGitlab We would welcome a PR on containers/storage to better support BTRFS for rootless containers.

Member

@rhatdan rhatdan commented Sep 13, 2019

Or at least add some of this information on how to set up btrfs for use in rootless containers to some of the troubleshooting and rootless .md files on GitHub.

@t-msn t-msn commented Sep 18, 2019

Hello.

I noticed this issue on the btrfs ML, and if nobody is working on it, I'm willing to help (though I'm not familiar with podman and it may take some time).

Member

@rhatdan rhatdan commented Sep 18, 2019

That would be great. Changes might be required in github.com/containers/storage, though, since most of the BTRFS code is in there.

@t-msn t-msn commented Sep 19, 2019

Thanks, let me try.

Author

@SwitchedToGitlab SwitchedToGitlab commented Sep 20, 2019

Honestly, using overlay is greatly preferred. Not sure if this is something that really needs a fix in code. It works great with BTRFS once you enable user_subvol_rm_allowed on the root filesystem. I'll see if I can get some time to update the rootless tutorial on using overlay as the storage driver, ensuring fuse-overlayfs is installed, and the advantages of doing so. I'm also planning on adding that, for BTRFS support to work, you need to set the user_subvol_rm_allowed flag in /etc/fstab.

That said, @t-msn I think we run into trouble at or about line 308 in containers/storage. The elegant solution IMHO would be to trap for errors on this unix.Syscall then re-try deleting the subvolume using the Golang OS standard library if it fails. It's also my understanding that you will need to walk the subvolume, deleting any subvolumes as you go (if that makes sense). Not a Go programmer, but that's my understanding.

@cmurf cmurf commented Sep 20, 2019

> It works great with BTRFS once you enable user_subvol_rm_allowed on the root filesystem.

Right. The needed fix is when that mount option is not set, the subvolume remove fails with an error, in which case the fallback should be to 'rm -rf' the subvolume. It's slower than subvolume delete, but at least it won't fail, unless the user doesn't actually own what they're deleting.

Member

@rhatdan rhatdan commented Sep 20, 2019

Are either of you guys at All Systems Go this weekend?

@t-msn t-msn commented Sep 20, 2019

> Honestly using overlay is greatly preferred. Not sure if this is something that really needs a fix in code.

Well, I think the fix is trivial and won't hurt anyone (please see below).

> It's also my understanding that you will need to walk the subvolume, deleting any subvolumes as you go

We cannot call the IOC_SNAP_DESTROY ioctl on a subvolume that contains other subvolumes. This is the reason subvolDelete() performs a path walk to remove subvolumes bottom-up, but that is not necessary for "rm -r".

I checked the code and noticed that system.EnsureRemoveAll() is called after subvolDelete(), and it uses os.RemoveAll() to ensure the target path is removed.

Therefore, the simplest solution would be to just ignore the error from subvolDelete() and fall back to system.EnsureRemoveAll(): t-msn/storage@41c2a90

I followed the tutorials (https://github.com/containers/libpod/blob/master/docs/tutorials/rootless_tutorial.md and https://github.com/containers/libpod/blob/master/docs/tutorials/podman_tutorial.md), and with the above fix I can do "podman rm".

(BTW, subvol quota operation needs privilege.)
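To illustrate why the ioctl path needs a bottom-up walk: nested subvolumes have to be destroyed before the subvolume that contains them. The sketch below only shows the ordering, given a list of subvolume paths that has already been collected. The function name is hypothetical and this is not how subvolDelete() is written.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def bottom_up(subvolume_paths: Iterable[str]) -> List[str]:
    """Order paths so every nested subvolume precedes the one containing it."""
    # A child path always has more components than its parent, so sorting by
    # depth (deepest first) guarantees children are visited before parents.
    return sorted(
        subvolume_paths,
        key=lambda p: (-len(PurePosixPath(p).parts), p),
    )
```

With the rm -r style fallback no such ordering is needed, since removal proceeds file by file anyway.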

@github-actions github-actions bot commented Oct 31, 2019

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

@vwbusguy vwbusguy commented Jan 10, 2020

@t-msn - Would you be willing to submit your change (t-msn/storage@41c2a90) as a Pull Request?
