RUN --mount-type=cache,sharing=locked hangs in specific cases #4342

Closed
marwatk opened this issue Oct 14, 2022 · 7 comments · Fixed by #4349

Comments

@marwatk
Contributor

marwatk commented Oct 14, 2022

Description

When specific actions (rm or chmod) are performed on a cache volume mounted with locking enabled, subsequent RUN instructions hang. (See the Dockerfile below.)

We discovered the bug because we chmod cache folders in some builds to allow writes when we drop root. It looks like buildah writes its lock file inside the volume, so the chmod somehow disrupts the lock.

In other builds we use an env var to wipe the cache folder so we don't have to prune everything for a single build (though we still haven't figured out how to prune cache volumes; neither rmi --prune nor rm -a seems to remove them). You can replicate this with the WIPE_CACHE build arg.

Interestingly, the second volume is required to trigger the bug in the chmod case, even though it is not locked and never touched. The second volume is not required for the WIPE_CACHE case to hang.

Without the second mount, or with --build-arg WIPE_CACHE=1, a warning is issued that the lock file is missing; with both volumes supplied there is no warning. The hang occurs either way.

I cannot reproduce the issue with buildkit.

Steps to reproduce the issue:
Steps 1 and 2 may not be required, but they help avoid hitting cached layers. These steps reproduce the issue as both root and non-root users.

  1. buildah rm -a
  2. buildah rmi --prune
  3. buildah build -f - < ./Dockerfile

Dockerfile:

FROM quay.io/centos/centos:7

ARG WIPE_CACHE

RUN --mount=type=cache,target=/cache1,sharing=locked \
    --mount=type=cache,target=/cache2 \
    set -ex; \
    ls -l /cache1; \
    if [[ -v WIPE_CACHE ]]; then \
      >&2 echo "Wiping cache"; \
      find /cache1 -mindepth 1 -delete; \
    fi; \
    echo "foo" > /cache1/foo.txt; \
    ls -l /cache1; \
    chmod --recursive g=u /cache1; \
    : ;

RUN --mount=type=cache,target=/cache1,sharing=locked \
    >&2 echo "Never get here"

Describe the results you received:

Build hangs indefinitely at:

STEP 4/4: RUN --mount=type=cache,target=/cache1,sharing=locked     >&2 echo "Never get here"

Describe the results you expected:

Build completes

Output of rpm -q buildah or apt list buildah:

$ rpm -q buildah
buildah-1.26.2-1.module+el8.6.0+997+05c9d812.x86_64

Output of buildah version:

$ buildah version
Version:         1.26.2
Go Version:      go1.17.12
Image Spec:      1.0.2-dev
Runtime Spec:    1.0.2-dev
CNI Spec:        1.0.0
libcni Version:  v1.1.0
image Version:   5.21.1
Git Commit:
Built:           Tue Aug  2 03:54:09 2022
OS/Arch:         linux/amd64
BuildPlatform:   linux/amd64

Output of cat /etc/*release:

$ cat /etc/*release
Rocky Linux release 8.6 (Green Obsidian)
NAME="Rocky Linux"
VERSION="8.6 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.6 (Green Obsidian)"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky Linux"
ROCKY_SUPPORT_PRODUCT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
Rocky Linux release 8.6 (Green Obsidian)
Rocky Linux release 8.6 (Green Obsidian)
Rocky Linux release 8.6 (Green Obsidian)

Output of uname -a:

$ uname -a
Linux pd-ci 4.18.0-372.19.1.el8_6.x86_64 #1 SMP Tue Aug 2 16:19:42 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Output of cat /etc/containers/storage.conf:

$ cat /etc/containers/storage.conf
# This file is the configuration file for all tools
# that use the containers/storage library. The storage.conf file
# overrides all other storage.conf files. Container engines using the
# container/storage library do not inherit fields from other storage.conf
# files.
#
#  Note: The storage.conf file overrides other storage.conf files based on this precedence:
#      /usr/containers/storage.conf
#      /etc/containers/storage.conf
#      $HOME/.config/containers/storage.conf
#      $XDG_CONFIG_HOME/containers/storage.conf (If XDG_CONFIG_HOME is set)
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
[storage]

# Default Storage Driver, Must be set for proper operation.
driver = "overlay"

# Temporary storage location
runroot = "/run/containers/storage"

# Primary Read/Write location of container storage
# When changing the graphroot location on an SELINUX system, you must
# ensure  the labeling matches the default locations labels with the
# following commands:
# semanage fcontext -a -e /var/lib/containers/storage /NEWSTORAGEPATH
# restorecon -R -v /NEWSTORAGEPATH
graphroot = "/var/lib/containers/storage"


# Storage path for rootless users
#
# rootless_storage_path = "$HOME/.local/share/containers/storage"

[storage.options]
# Storage options to be passed to underlying storage drivers

# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
]

# Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of
# a container, to the UIDs/GIDs as they should appear outside of the container,
# and the length of the range of UIDs/GIDs.  Additional mapped sets can be
# listed and will be heeded by libraries, but there are limits to the number of
# mappings which the kernel will allow when you later attempt to run a
# container.
#
# remap-uids = 0:1668442479:65536
# remap-gids = 0:1668442479:65536

# Remap-User/Group is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid or /etc/subgid file.  Mappings are set up starting
# with an in-container ID of 0 and then a host-level ID taken from the lowest
# range that matches the specified name, and using the length of that range.
# Additional ranges are then assigned, using the ranges which specify the
# lowest host-level IDs first, to the lowest not-yet-mapped in-container ID,
# until all of the entries have been used for maps.
#
# remap-user = "containers"
# remap-group = "containers"

# Root-auto-userns-user is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid and /etc/subgid file.  These ranges will be partitioned
# to containers configured to create automatically a user namespace.  Containers
# configured to automatically create a user namespace can still overlap with containers
# having an explicit mapping set.
# This setting is ignored when running as rootless.
# root-auto-userns-user = "storage"
#
# Auto-userns-min-size is the minimum size for a user namespace created automatically.
# auto-userns-min-size=1024
#
# Auto-userns-max-size is the maximum size for a user namespace created automatically.
# auto-userns-max-size=65536

[storage.options.overlay]
# ignore_chown_errors can be set to allow a non privileged user running with
# a single UID within a user namespace to run containers. The user can pull
# and use any image even those with multiple uids.  Note multiple UIDs will be
# squashed down to the default uid in the container.  These images will have no
# separation between the users in the container. Only supported for the overlay
# and vfs drivers.
#ignore_chown_errors = "false"

# Inodes is used to set a maximum inodes of the container image.
# inodes = ""

# Path to a helper program to use for mounting the file system instead of mounting it
# directly.
#mount_program = "/usr/bin/fuse-overlayfs"

# mountopt specifies comma separated list of extra mount options
mountopt = "nodev,metacopy=on"

# Set to skip a PRIVATE bind mount on the storage home directory.
# skip_mount_home = "false"

# Size is used to set a maximum size of the container image.
# size = ""

# ForceMask specifies the permissions mask that is used for new files and
# directories.
#
# The values "shared" and "private" are accepted.
# Octal permission masks are also accepted.
#
#  "": No value specified.
#     All files/directories, get set with the permissions identified within the
#     image.
#  "private": it is equivalent to 0700.
#     All files/directories get set with 0700 permissions.  The owner has rwx
#     access to the files. No other users on the system can access the files.
#     This setting could be used with networked based homedirs.
#  "shared": it is equivalent to 0755.
#     The owner has rwx access to the files and everyone else can read, access
#     and execute them. This setting is useful for sharing containers storage
#     with other users.  For instance have a storage owned by root but shared
#     to rootless users as an additional store.
#     NOTE:  All files within the image are made readable and executable by any
#     user on the system. Even /etc/shadow within your image is now readable by
#     any user.
#
#   OCTAL: Users can experiment with other OCTAL Permissions.
#
#  Note: The force_mask Flag is an experimental feature, it could change in the
#  future.  When "force_mask" is set the original permission mask is stored in
#  the "user.containers.override_stat" xattr and the "mount_program" option must
#  be specified. Mount programs like "/usr/bin/fuse-overlayfs" present the
#  extended attribute permissions to processes within containers rather than the
#  "force_mask"  permissions.
#
# force_mask = ""

[storage.options.thinpool]
# Storage Options for thinpool

# autoextend_percent determines the amount by which pool needs to be
# grown. This is specified in terms of % of pool size. So a value of 20 means
# that when threshold is hit, pool will be grown by 20% of existing
# pool size.
# autoextend_percent = "20"

# autoextend_threshold determines the pool extension threshold in terms
# of percentage of pool size. For example, if threshold is 60, that means when
# pool is 60% full, threshold has been hit.
# autoextend_threshold = "80"

# basesize specifies the size to use when creating the base device, which
# limits the size of images and containers.
# basesize = "10G"

# blocksize specifies a custom blocksize to use for the thin pool.
# blocksize="64k"

# directlvm_device specifies a custom block storage device to use for the
# thin pool. Required if you setup devicemapper.
# directlvm_device = ""

# directlvm_device_force wipes device even if device already has a filesystem.
# directlvm_device_force = "True"

# fs specifies the filesystem type to use for the base device.
# fs="xfs"

# log_level sets the log level of devicemapper.
# 0: LogLevelSuppress 0 (Default)
# 2: LogLevelFatal
# 3: LogLevelErr
# 4: LogLevelWarn
# 5: LogLevelNotice
# 6: LogLevelInfo
# 7: LogLevelDebug
# log_level = "7"

# min_free_space specifies the minimum free space percent in a thin pool required
# for new device creation to succeed. Valid values are from 0% - 99%.
# Value 0% disables
# min_free_space = "10%"

# mkfsarg specifies extra mkfs arguments to be used when creating the base
# device.
# mkfsarg = ""

# metadata_size is used to set the `pvcreate --metadatasize` options when
# creating thin devices. Default is 128k
# metadata_size = ""

# Size is used to set a maximum size of the container image.
# size = ""

# use_deferred_removal marks devicemapper block device for deferred removal.
# If the thinpool is in use when the driver attempts to remove it, the driver
# tells the kernel to remove it as soon as possible. Note this does not free
# up the disk space, use deferred deletion to fully remove the thinpool.
# use_deferred_removal = "True"

# use_deferred_deletion marks thinpool device for deferred deletion.
# If the device is busy when the driver attempts to delete it, the driver
# will attempt to delete device every 30 seconds until successful.
# If the program using the driver exits, the driver will continue attempting
# to cleanup the next time the driver is used. Deferred deletion permanently
# deletes the device and all data stored in device will be lost.
# use_deferred_deletion = "True"

# xfs_nospace_max_retries specifies the maximum number of retries XFS should
# attempt to complete IO when ENOSPC (no space) error is returned by
# underlying storage device.
# xfs_nospace_max_retries = "0"
@flouthoc
Collaborator

I'll check this, thanks for reporting.

@flouthoc flouthoc self-assigned this Oct 17, 2022
flouthoc added a commit to flouthoc/buildah that referenced this issue Oct 18, 2022
`--mount=type=cache` must not add internal lockfiles to the cache directory
created by users; instead, store them in a separate central directory with
a path like `/base/buildah-cache/buildah-lockfiles`.

There are use-cases where users wipe the cache between builds, which would
remove the lockfiles unexpectedly, and it is also not okay to mix buildah's
internal constructs with the user's cache content.

Helps in: containers#4342

Signed-off-by: Aditya R <arajan@redhat.com>
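
A minimal sketch of the idea in that commit, with an invented helper name and an invented hashing scheme (the commit only specifies the central `/base/buildah-cache/buildah-lockfiles` location, not this exact code):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"path/filepath"
)

// lockfilePath derives a per-cache lockfile path under a central
// "buildah-cache/buildah-lockfiles" directory, so wiping or chmod-ing
// the cache contents can never touch the lock itself. Hypothetical.
func lockfilePath(base, cacheID string) string {
	// Hash the cache identifier so arbitrary targets map to safe filenames.
	sum := sha256.Sum256([]byte(cacheID))
	return filepath.Join(base, "buildah-cache", "buildah-lockfiles",
		fmt.Sprintf("%x", sum[:8]))
}

func main() {
	// Before the fix, the lock lived inside the cache directory itself,
	// so `find /cache1 -mindepth 1 -delete` or a recursive chmod hit it.
	fmt.Println(lockfilePath("/var/lib/containers/storage", "cache1"))
}
```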
flouthoc added a commit to flouthoc/buildah that referenced this issue Oct 18, 2022
A single `RUN` can contain multiple `--mount` flags, so append into
`lockedTargets` so that we collect the `lockfiles` from all of the
`--mount` instructions.

Helps in: containers#4342

Signed-off-by: Aditya R <arajan@redhat.com>
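
The gist of that change as a runnable sketch; only the `lockedTargets` name comes from the commit message, while the `Mount` type and `lockTargetsFor` helper are invented for illustration:

```go
package main

import "fmt"

// Mount is a stand-in for a parsed --mount flag; hypothetical.
type Mount struct {
	Target string
	Locked bool
}

// lockTargetsFor stands in for whatever resolves a --mount flag to the
// lockfile(s) it holds; hypothetical.
func lockTargetsFor(m Mount) []string {
	if !m.Locked {
		return nil
	}
	return []string{m.Target + ".lock"}
}

func main() {
	mounts := []Mount{
		{Target: "/cache1", Locked: true},
		{Target: "/cache2", Locked: true},
	}

	var lockedTargets []string
	for _, m := range mounts {
		// Buggy version: `lockedTargets = lockTargetsFor(m)` kept only
		// the last mount's lockfiles, so earlier locks were never
		// released and the next RUN hung. Appending collects them all.
		lockedTargets = append(lockedTargets, lockTargetsFor(m)...)
	}
	fmt.Println(lockedTargets) // [/cache1.lock /cache2.lock]
}
```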
flouthoc added a commit to flouthoc/buildah that referenced this issue Oct 18, 2022
Use-cases like the Containerfile shown below clean the cache between
builds. The previous commits ensured that buildah lockfiles are no longer
part of the user's cache content, so the following use-case must pass:

```
FROM quay.io/centos/centos:7

ARG WIPE_CACHE

RUN --mount=type=cache,target=/cache1,sharing=locked \
    --mount=type=cache,target=/cache2 \
    set -ex; \
    ls -l /cache1; \
    if [[ -v WIPE_CACHE ]]; then \
      >&2 echo "Wiping cache"; \
      find /cache1 -mindepth 1 -delete; \
    fi; \
    echo "foo" > /cache1/foo.txt; \
    ls -l /cache1; \
    chmod --recursive g=u /cache1; \
    : ;

RUN --mount=type=cache,target=/cache1,sharing=locked \
    >&2 echo "Never get here"
```

Closes: containers#4342

Signed-off-by: Aditya R <arajan@redhat.com>
@flouthoc
Collaborator

@marwatk Thanks, the PR above (#4349) should close this.

@marwatk
Contributor Author

marwatk commented Oct 18, 2022

Wow, that was an insanely fast turnaround. I was still setting up a build environment for it!

Thanks!

@flouthoc
Collaborator

> Wow, that was an insanely fast turnaround. I was still setting up a build environment for it!
>
> Thanks!

@marwatk Thanks, the PR is merged; feel free to use the main branch directly or apply the patch. The fix will officially ship in the next release of buildah.
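
For anyone who wants the fix before it ships, a typical from-source build of the main branch (assuming the build dependencies from buildah's install docs are already present):

```
$ git clone https://github.com/containers/buildah
$ cd buildah
$ make
$ sudo make install
```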

@Clockwork-Muse

Additional(?) simpler case:

FROM quay.io/centos/centos:7

RUN \
    --mount=type=cache,id=442870093b67597071f8e55,target=/cache1,sharing=locked \
    --mount=type=cache,id=442870093b67597071f8e55,target=/cache2,sharing=locked \
    >&2 echo "Never get here"

@flouthoc - I'm not the best at Go, and I haven't spent any real time in the codebase, but I think this might not be covered by the existing fix?

@marwatk
Contributor Author

marwatk commented Dec 29, 2022

You want to mount the same files twice and lock them both times? What's the use case for that?

If you just want the same files to show up under two different paths (though I still have trouble imagining why) and ensure other builds can't use them, you can just lock the first one; no need to lock both.
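
For example, an untested sketch of what I mean, reusing your ids:

```
RUN --mount=type=cache,id=442870093b67597071f8e55,target=/cache1,sharing=locked \
    --mount=type=cache,id=442870093b67597071f8e55,target=/cache2 \
    >&2 echo "Both targets see the same cache; only one lock is taken"
```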

What is your expected behavior by locking both?

@Clockwork-Muse

Yes, you're right that the files do show up twice (I think I was expecting them to be separated by destination, for some reason). So no, it's not likely to be observed in real scenarios.

That said, a hang in this case is still unfriendly.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 29, 2023