sandbox directory cannot be removed if original container has non-writeable directories #4517

Closed
afortiorama opened this issue Sep 24, 2019 · 14 comments
Labels: Bug, Regression, Release 3.4

@afortiorama

Version of Singularity:

singularity-3.4.0-1.2.el7.x86_64

Expected behavior

singularity -s build --sandbox /home/aforti/docker_centos_7/image docker://centos:7

As in 3.2.1, or even 2.6.1, I expect singularity to build the sandbox with the permissions of the caller.

Actual behavior

If I leave the cache enabled, the command fails with:

2019-09-24 22:37:58,127 | ERROR    | Container execution failed with errors. Error code: 255
2019-09-24 22:37:58,127 | ERROR    | FATAL:   While performing build: conveyor failed to get: Error initializing source oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: Error initializing destination oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: mkdir : no such file or directory

If I disable the cache, it returns some warnings and then builds the sandbox with root permissions, whether user namespaces are enabled or not.

warnings

2019-09-24 22:43:54,556 | WARNING  | 2019/09/24 22:43:49  info unpack layer: sha256:d8d02d45731499028db01b6fa35475f91d230628b4e25fab8e3c015594dc3261
2019-09-24 22:43:54,556 | WARNING  | 2019/09/24 22:43:49  warn rootless{usr/bin/ping} ignoring (usually) harmless EPERM on setxattr "security.capability"
2019-09-24 22:43:54,556 | WARNING  | 2019/09/24 22:43:52  warn rootless{usr/sbin/arping} ignoring (usually) harmless EPERM on setxattr "security.capability"
2019-09-24 22:43:54,556 | WARNING  | 2019/09/24 22:43:52  warn rootless{usr/sbin/clockdiff} ignoring (usually) harmless EPERM on setxattr "security.capability"

The resulting permissions do not allow the sandbox to be deleted:

-bash-4.2$ rm -rf docker_centos_7/
rm: cannot remove ‘docker_centos_7/image/root/.cshrc’: Permission denied
rm: cannot remove ‘docker_centos_7/image/root/anaconda-ks.cfg’: Permission denied
rm: cannot remove ‘docker_centos_7/image/root/.bashrc’: Permission denied
rm: cannot remove ‘docker_centos_7/image/root/.bash_logout’: Permission denied
rm: cannot remove ‘docker_centos_7/image/root/.bash_profile’: Permission denied
rm: cannot remove ‘docker_centos_7/image/root/.tcshrc’: Permission denied
[......]

Steps to reproduce this behavior

with the cache

mkdir docker_centos_7
export SINGULARITY_TMPDIR=docker_centos_7
export SINGULARITY_CACHEDIR=$SINGULAIRTY_TMPDIR/cache
singularity -s build --sandbox /home/aforti/docker_centos_7/image docker://centos:7

without the cache

export SINGULARITY_DISABLE_CACHE=1
mkdir docker_centos_7
export SINGULARITY_TMPDIR=docker_centos_7
singularity -s build --sandbox /home/aforti/docker_centos_7/image docker://centos:7

What OS/distro are you running

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

How did you install Singularity

rpm from EPEL repository

@DrDaveD
Collaborator

DrDaveD commented Sep 24, 2019

I believe the removal failure could be caused by the unpack failure; if this unpacker is like another docker unpacker I looked at lately, it fixes up permissions after successful completion. To work around the removal failure you can run: find path_to_image -type d ! -perm -200 | xargs chmod u+w
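A slightly more defensive form of the find command above, as a sketch only. The function name add_owner_write is made up for illustration and is not part of Singularity. Using -exec ... + instead of a pipe to xargs copes with whitespace in path names, and it does not run chmod with no arguments when nothing matches (which GNU xargs does unless -r is given). It only handles directories that are missing the owner write bit; directories the owner cannot read or search would need more work.

```shell
#!/bin/sh
# add_owner_write is a hypothetical helper name, not a Singularity command.
# It gives the owner write permission on every directory under "$1" that
# currently lacks it, so that the entries inside can be removed afterwards.
add_owner_write() {
    find "$1" -type d ! -perm -200 -exec chmod u+w {} +
}

# Example: add_owner_write path_to_image && rm -rf path_to_image
```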

I see different error messages when building both centos:7 and centos:6, with version 3.4.0-1.2 and 3.4.1-1.1. This is what I see when building with privileged singularity:

FATAL:   While performing build: sandbox assemble failed: exit status 1: mv: cannot open '/tmp/sbuild-811365620/fs/etc/gshadow' for reading: Permission denied
mv: cannot open '/tmp/sbuild-811365620/fs/etc/shadow' for reading: Permission denied
mv: cannot open '/tmp/sbuild-811365620/fs/etc/shadow-' for reading: Permission denied
mv: cannot open '/tmp/sbuild-811365620/fs/etc/gshadow-' for reading: Permission denied

and this is what I see with the -u option:

INFO:    Starting build...
INFO:    Building into existing container: /cloud/login/dwd/scratch/centos7-sandbox
FATAL:   While performing build: failed to retrieve path for /cloud/login/dwd/scratch/centos7-sandbox: lstat /cloud/login/dwd/scratch/centos7-sandbox: no such file or directory

I don't understand how nobody caught this before now since this is a pretty common task. I know I have been building sandboxes in the last month. @cclerget @dctrud can you take a look?

@afortiorama
Author

afortiorama commented Sep 24, 2019

Thanks @DrDaveD, but I don't want to use workarounds; they make things progressively more fragile. I already had to rename the default I/O directory because of this issue in 3.2.1, https://github.com/sylabs/singularity/issues/4498, and now I have to work around this as well. Note that building the sandbox is itself an attempt to fix sylabs/singularity#2588, which is also a problem for others (https://github.com/sylabs/singularity/issues/3886), and neither of those ever really got a reply.

@siscia

siscia commented Sep 25, 2019

This problem is impacting us (unpacked.cern.ch) as well.

We are not able to delete images that are created from a docker container, which is quite problematic.
Also we don't have root access.

@dtrudg
Contributor

dtrudg commented Sep 25, 2019

I can confirm with the docker://centos:7 latest container pulled onto a Debian buster machine. I can remove the failed sandbox with...

chmod -R +rw test_sandbox/
rm -rf test_sandbox/

I'm going to guess something is going on here with the umoci based code for unpacking which was brought in to fix some different issues. @ikaneshiro - can you advise on this at all?

dtrudg changed the title from "several problems with singularity build sandbox in 3.4.0" to "Failure creating centos7 sandbox in Singularity 3.4.0 & cannot delete sandbox" on Sep 25, 2019
dtrudg assigned jmstover, dtrudg and ikaneshiro and unassigned mem, gvallee and cclerget on Sep 25, 2019
dtrudg added the Urgent label on Sep 25, 2019
dtrudg added this to the 3.4.2 milestone on Sep 25, 2019

@afortiorama
Author

afortiorama commented Sep 25, 2019

@dctrud as written in the ticket, if the cache is not disabled it doesn't even create the image; it exits with:

While performing build: conveyor failed to get: Error initializing source oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: Error initializing destination oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: mkdir : no such file or directory

Likewise, using -u fails, with a different error:

singularity build -u --sandbox /home/aforti/github/panda-wnscript/src/runcontainer/docker_centos_7/image docker://centos:7
INFO:    Starting build...
INFO:    Building into existing container: /home/aforti/github/panda-wnscript/src/runcontainer/docker_centos_7/image
FATAL:   While performing build: failed to retrieve path for /home/aforti/github/panda-wnscript/src/runcontainer/docker_centos_7/image: lstat /home/aforti/github/panda-wnscript/src/runcontainer/docker_centos_7/image: no such file or directory

So the new title doesn't reflect the whole story and may lead you to fix only one of what look like three problems, unless you think they all have the same root cause.

thanks

@dtrudg
Contributor

dtrudg commented Sep 25, 2019

Hi @afortiorama - the caching issue is a separate thing, which neither @DrDaveD nor we have been able to replicate yet. It's noted, but we're concentrating on the sandbox permission problem first, which we can replicate with caching enabled. I'll split the caching thing into a new issue in a bit.

dtrudg added the Regression label on Sep 25, 2019

@dtrudg
Contributor

dtrudg commented Sep 25, 2019

PR #4522 should return the non-root OCI/docker origin sandboxes to the previous state from <3.4.0. Any 👀 on it much appreciated.

@dtrudg
Contributor

dtrudg commented Sep 25, 2019

@DrDaveD

I don't understand how nobody caught this before now since this is a pretty common task. I know I have been building sandboxes in the last month

I swore I tested this... and looking back I had... but in VMs where I was building the sandbox on the same device as /tmp. That means there is no inter-device move of the sandbox (which is a cp + delete), so the permissions error doesn't occur.
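To tell in advance which of the two cases applies, one rough check is to compare the device numbers of /tmp and the destination directory. This is a sketch under assumptions: same_device is a hypothetical helper, it relies on GNU coreutils stat (the -c %d format), and equal device numbers are only a hint, since a rename can still fail across separate mounts of the same filesystem.

```shell
#!/bin/sh
# same_device is a hypothetical helper, not part of Singularity.
# It assumes GNU coreutils stat, where "-c %d" prints the device number
# of the filesystem that holds a path.
# Returns 0 if both paths report the same device, 1 if they differ,
# and 2 if either path cannot be examined.
same_device() {
    a=$(stat -c %d -- "$1") || return 2
    b=$(stat -c %d -- "$2") || return 2
    [ "$a" = "$b" ]
}

# Example (the second path is whatever directory will hold the sandbox):
#   if same_device /tmp "$HOME"; then echo "mv is a rename"; fi
```

When the function returns 1, the move from /tmp to the destination is the copy-plus-delete case in which the permission errors were seen.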

The regression test being added to the PR looks at the actual permissions on files in the container, so that its result does not depend on whether the move is inter-device or intra-device.

@mem
Contributor

mem commented Sep 26, 2019

To try to separate this into the different problems being reported, this part:

2019-09-24 22:37:58,127 | ERROR    | FATAL:   While performing build: conveyor failed to get: Error initializing source oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: Error initializing destination oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: mkdir : no such file or directory

is because of this typo:

export SINGULARITY_TMPDIR=docker_centos_7
export SINGULARITY_CACHEDIR=$SINGULAIRTY_TMPDIR/cache

Note that it says "SINGULAIRTY_TMPDIR" instead of "SINGULARITY_TMPDIR".

Because of that, the misspelled variable expands to an empty string, the cache directory is set to /cache, and that's what causes the error:

$ SINGULARITY_CACHEDIR=/cache SINGULARITY_TMPDIR=issue-4517 ./builddir/singularity build --sandbox $PWD/issue-4517/image docker://centos:7
FATAL:   Unable to create build: could not create temp dir in "issue-4517": stat issue-4517: no such file or directory

$ mkdir issue-4517

$ SINGULARITY_CACHEDIR=/cache SINGULARITY_TMPDIR=issue-4517 ./builddir/singularity build --sandbox $PWD/issue-4517/image docker://centos:7
INFO:    Starting build...
FATAL:   While performing build: conveyor failed to get: Error initializing source oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: Error initializing destination oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: mkdir : no such file or directory

Note that this is the original error as reported.
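The effect of the misspelling, and one way to catch it, can be sketched in plain POSIX shell. Nothing here is specific to Singularity; the :? form of parameter expansion (or running the script with set -u) makes the shell stop on an unset variable instead of silently substituting an empty string.

```shell
#!/bin/sh
# The misspelled name is never assigned, so it expands to an empty string.
unset SINGULAIRTY_TMPDIR
SINGULARITY_TMPDIR=docker_centos_7
echo "${SINGULAIRTY_TMPDIR}/cache"    # prints: /cache

# ${name:?message} makes the shell fail instead of expanding an unset or
# empty variable. The subshell keeps that failure from ending this script.
( : "${SINGULAIRTY_TMPDIR:?is not set}" ) 2>/dev/null || echo "typo caught"

# With the correct spelling the expansion succeeds.
SINGULARITY_CACHEDIR="${SINGULARITY_TMPDIR:?is not set}/cache"
echo "$SINGULARITY_CACHEDIR"          # prints: docker_centos_7/cache
```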

@mem
Contributor

mem commented Sep 26, 2019

Then there's a second different issue:

$ SINGULARITY_CACHEDIR=$PWD/issue-4517/cache SINGULARITY_TMPDIR=$PWD/issue-4517 ./builddir/singularity build --sandbox $PWD/issue-4517/image docker://centos:7
INFO:    Starting build...
FATAL:   While performing build: conveyor failed to get: Error initializing source oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: Error initializing destination oci::307835c385f656ec2e2fec602cf093224173c51119bbebd602c53c3653a3d6eb: mkdir : no such file or directory

$ mkdir issue-4517/cache

$ SINGULARITY_CACHEDIR=$PWD/issue-4517/cache SINGULARITY_TMPDIR=$PWD/issue-4517 ./builddir/singularity build --sandbox $PWD/issue-4517/image docker://centos:7
INFO:    Starting build...
Getting image source signatures
Copying blob sha256:d8d02d45731499028db01b6fa35475f91d230628b4e25fab8e3c015594dc3261
 71.92 MiB / 71.92 MiB [===================================================] 12s
Copying config sha256:acab94af64effb1f7481666a37788e7a59465e723f0b0fe0a0f458f3f4856638
 1.05 KiB / 1.05 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
2019/09/25 18:13:48  info unpack layer: sha256:d8d02d45731499028db01b6fa35475f91d230628b4e25fab8e3c015594dc3261
2019/09/25 18:13:49  warn rootless{usr/bin/ping} ignoring (usually) harmless EPERM on setxattr "security.capability"
2019/09/25 18:13:50  warn rootless{usr/sbin/arping} ignoring (usually) harmless EPERM on setxattr "security.capability"
2019/09/25 18:13:50  warn rootless{usr/sbin/clockdiff} ignoring (usually) harmless EPERM on setxattr "security.capability"
INFO:    Creating sandbox directory...
INFO:    Build complete: /home/mem/devel/sylabs/singularity/src/github.com/sylabs/singularity/issue-4517/image

It seems that a cache directory that does not exist yet will cause this error, even when its parent is writeable and the directory could have been created.
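Until that is addressed, a wrapper can create both directories before the build starts. The following is only a sketch: prepare_build_dirs is a hypothetical helper name, and the choice to place the cache under the temporary directory simply mirrors the commands above.

```shell
#!/bin/sh
# prepare_build_dirs is a hypothetical helper, not a Singularity command.
# It creates the temporary and cache directories up front and exports
# absolute paths for them, so both exist before the build starts.
prepare_build_dirs() {
    base=$1
    mkdir -p "$base/cache" || return 1
    # Resolve to an absolute path so a later "cd" does not change its meaning.
    base=$(cd "$base" && pwd) || return 1
    export SINGULARITY_TMPDIR="$base"
    export SINGULARITY_CACHEDIR="$base/cache"
}

# Example, mirroring the commands above:
#   prepare_build_dirs issue-4517
#   singularity build --sandbox "$SINGULARITY_TMPDIR/image" docker://centos:7
```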

@mem
Contributor

mem commented Sep 26, 2019

The third issue is that once the cache is set to something that does exist, as in the last command in the previous comment, the resulting sandbox cannot be deleted:

$ rm -rf issue-4517/image/
rm: cannot remove 'issue-4517/image/root/.bash_logout': Permission denied
rm: cannot remove 'issue-4517/image/root/anaconda-ks.cfg': Permission denied
rm: cannot remove 'issue-4517/image/root/.tcshrc': Permission denied
rm: cannot remove 'issue-4517/image/root/.bashrc': Permission denied
rm: cannot remove 'issue-4517/image/root/.cshrc': Permission denied
rm: cannot remove 'issue-4517/image/root/.bash_profile': Permission denied
...

Examining these files:

$ ls -l issue-4517/image/root/.bash_logout
-rw-r--r-- 1 mem mem 18 Dec 28  2013 issue-4517/image/root/.bash_logout

$ ls -ld issue-4517/image/root/
dr-xr-x--- 2 mem mem 4096 Jul 31 19:10 issue-4517/image/root/

The file itself is OK, but the directory containing it does not have write permission, and removing a file requires write permission on its parent directory.

This is fixed by:

$ chmod -R +w issue-4517/image

$ rm -rf issue-4517/image
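The same situation can be reproduced without Singularity in a scratch directory, which may help when testing a cleanup script. This is an illustrative sketch: the paths are created with mktemp, and it uses u+w rather than the plain +w shown above so that the result does not depend on the umask.

```shell
#!/bin/sh
# Recreate the layout seen above in a scratch directory: a writable file
# inside a directory whose owner has no write permission (mode 0550).
scratch=$(mktemp -d)
mkdir -p "$scratch/image/root"
touch "$scratch/image/root/.bash_logout"
chmod 550 "$scratch/image/root"

# As a non-root user this fails with "Permission denied", because unlinking
# a file needs write permission on its parent directory, not on the file.
rm -rf "$scratch/image" 2>/dev/null || echo "first rm failed, as in the report"

# Restore owner write permission everywhere; the removal then succeeds.
# The -d guard covers the case where the first rm already succeeded (root).
if [ -d "$scratch/image" ]; then
    chmod -R u+w "$scratch/image"
fi
rm -rf "$scratch/image"
rmdir "$scratch"
```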

mem changed the title from "Failure creating centos7 sandbox in Singularity 3.4.0 & cannot delete sandbox" to "sandbox directory cannot be removed if original container has non-writeable directories" on Sep 26, 2019

@dtrudg
Contributor

dtrudg commented Sep 26, 2019

@mem - not quite... your third issue isn't quite the same as reported in the thread above - there's a build failure that can happen from the same cause.

This is becoming very confusing to track, so I'm going to close this issue, and open multiple new ones which are more granular right now.

@mem
Contributor

mem commented Sep 26, 2019

@dctrud I think the difference is I'm looking at master, not 3.4.

I believe a change was added in master that's effectively fixing the build failure. This problem seems specific to 3.4 right now.

@Naeemkh

Naeemkh commented Feb 15, 2021

Same problem with singularity version 3.0.3-1.
