Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error mount-binding path from time to time #1313

Closed
thiell opened this issue Feb 17, 2018 · 15 comments
Closed

Error mount-binding path from time to time #1313

thiell opened this issue Feb 17, 2018 · 15 comments

Comments

@thiell
Copy link

thiell commented Feb 17, 2018

Version of Singularity: 2.4

We're facing some weird bind mount errors on CentOS 7 and Singularity 2.4. The path and filesystem does exist, but singularity is reporting ENOENT when bind mounting it (it only happens from time to time):

ERROR : There was an error binding the path /oak: No such file or directory

Expected behavior

Bind mount works

Actual behavior

Bind mount fails from time to time.

Steps to reproduce behavior

We tried to reproduce the error without luck. Only our users are reporting the problem. I'm opening this ticket in the hope somebody else is seeing this.

Also, I quickly checked the code, and I noticed parts like that:

    singularity_message(VERBOSE, "Binding file '%s' to '%s'\n", source, dest);
    if ( singularity_mount(source, dest, NULL, MS_BIND|MS_NOSUID|MS_NODEV|MS_REC, NULL) < 0 ) {
        singularity_priv_drop();
        singularity_message(ERROR, "There was an error binding %s to %s: %s\n", source, dest, strerror(errno));
        ABORT(255);
    }

Because errno is not saved, and singularity_priv_drop() may call other glibc functions, the error string extracted from errno and passed to singularity_message could be wrong, masking the real issue.

@cclerget
Copy link
Collaborator

@thiell Is there an autofs setup involved with bind directory ?

@thiell
Copy link
Author

thiell commented Feb 20, 2018

@cclerget yes, we're using autofs on /home directories, but not on the lustre filesystems like /scratch and /oak which are both permanently mounted. The mount-bind of this last filesystem, oak, is always the issue there, but we don't understand why at this point.

Also we have mount hostfs = yes in singularity.conf, instead of listing filesystem explicitly, maybe that's not a good idea after all, hmm...

@cclerget
Copy link
Collaborator

Also we have mount hostfs = yes in singularity.conf, instead of listing filesystem explicitly, maybe that's not a good idea after all, hmm...

Maybe you can try by disabling hostfs and use bind path = /oak. The main difference is that hostfs rely on /proc/mounts content, maybe something wrong there ...

Between did you have a full debug log when bind failed ?

Because errno is not saved, and singularity_priv_drop() may call other glibc functions, the error string extracted from errno and passed to singularity_message could be wrong, masking the real issue.

Nice catch ! Thank you. I will check code base to fix that

@thiell
Copy link
Author

thiell commented Feb 22, 2018

Ah thanks for the suggestion @cclerget, I checked and unfortunately we can't do that as we're also mounting some group NFS home space using autofs, so this has to be somehow dynamic. I can't reproduce the issue myself, users are reporting this issue only from time to time, on nodes that have /oak mounted on the host...

@cclerget
Copy link
Collaborator

Sorry @thiell, that will be difficult to help without some debug lines. After looking into singularity_priv_drop, there is no function which could set errno to ENOENT, so we can assume errno was set by mount. Why that happen sometime ? a lustre issue ? mount namespace issue ?

You can try to use autofs bug path configuration directive like this :
autofs bug path = /oak

to force Singularity to open /oak before entering in mount namespace, if Singularity can't open /oak user will be notified with the following warning message: Failed to open directory '/oak', if this warning appears with mount error, there is more probability that be a filesystem issue

@thiell
Copy link
Author

thiell commented Mar 27, 2018

Hi @cclerget,
I enabled autofs bug path = /oak a while back after seeing your message and users are still reporting the issue without the additional warning message so far. Also, we are pretty sure /oak is mounted on all nodes as we do have a health check for that during production.

So I guess the error reported by Singularity is wrong...

I believe all users that have reported this don't have access to this mount point /oak, maybe that's the issue, so I'll investigate.

@vsoch
Copy link
Collaborator

vsoch commented Mar 28, 2018

Hey @cclerget ! We have some debug output for you! We are still having this issue on Sherlock so your wisdom is greatly appreciated. @thiell can verify - but it should be the case that /oak is mounted on all nodes. @thiell is this the case for the login node as well, without any extra groups / actions by the user?

Here is an output in debug mode:
(epi) [eilon@sh-101-72 login! /home/groups/pritch/eilon/singularity]$ singularity pull --name pvacseq shub://researchapps/sherlock:pvacseq@84011d3a460036dcb42ee9b64fd3e7da6f0bbc95
(epi) [eilon@sh-101-72 login! /home/groups/pritch/eilon/singularity]$ singularity -d run pvacseq shell pvacseq
Enabling debugging
Ending argument loop
Singularity version: 2.4.5-dist
Exec'ing: /share/software/user/open/singularity/2.4.5/libexec/singularity/cli/run.exec
Evaluating args: 'pvacseq shell pvacseq'
VERBOSE [U=0,P=60068]      message_init()                            Set messagelevel to: 5
DEBUG   [U=0,P=60068]      fd_cleanup()                              Cleanup file descriptor table
VERBOSE [U=0,P=60068]      singularity_config_parse()                Initialize configuration file: /share/software/user/open/singularity/2.4.5/etc/singularity/singularity.conf
DEBUG   [U=0,P=60068]      singularity_config_parse()                Starting parse of configuration file /share/software/user/open/singularity/2.4.5/etc/singularity/singularity.conf
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key allow setuid = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key max loop devices = '256'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key allow pid ns = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key config passwd = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key config group = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key config resolv_conf = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key mount proc = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key mount sys = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key mount dev = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key mount devpts = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key mount home = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key mount tmp = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key mount hostfs = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key bind path = '/etc/localtime'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key bind path = '/etc/hosts'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key user bind control = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key enable overlay = 'try'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key mount slave = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key sessiondir max size = '16'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key allow container squashfs = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key allow container extfs = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key allow container dir = 'yes'
VERBOSE [U=0,P=60068]      singularity_config_parse()                Got config key autofs bug path = '/oak'
DEBUG   [U=0,P=60068]      singularity_config_parse()                Finished parsing configuration file '/share/software/user/open/singularity/2.4.5/etc/singularity/singularity.conf'
VERBOSE [U=0,P=60068]      singularity_registry_init()               Initializing Singularity Registry
VERBOSE [U=0,P=60068]      singularity_registry_set()                Adding value to registry: 'LIBEXECDIR' = '/share/software/user/open/singularity/2.4.5/libexec'
DEBUG   [U=0,P=60068]      singularity_registry_set()                Returning singularity_registry_set(libexecdir, /share/software/user/open/singularity/2.4.5/libexec) = 0
VERBOSE [U=0,P=60068]      singularity_registry_set()                Adding value to registry: 'COMMAND' = 'run'
DEBUG   [U=0,P=60068]      singularity_registry_set()                Returning singularity_registry_set(COMMAND, run) = 0
VERBOSE [U=0,P=60068]      singularity_registry_set()                Adding value to registry: 'MESSAGELEVEL' = '5'
DEBUG   [U=0,P=60068]      singularity_registry_set()                Returning singularity_registry_set(MESSAGELEVEL, 5) = 0
VERBOSE [U=0,P=60068]      singularity_registry_set()                Adding value to registry: 'VERSION' = '2.4.5-dist'
DEBUG   [U=0,P=60068]      singularity_registry_set()                Returning singularity_registry_set(version, 2.4.5-dist) = 0
VERBOSE [U=0,P=60068]      singularity_registry_set()                Adding value to registry: 'LOCALSTATEDIR' = '/share/software/user/open/singularity/2.4.5/var'
DEBUG   [U=0,P=60068]      singularity_registry_set()                Returning singularity_registry_set(localstatedir, /share/software/user/open/singularity/2.4.5/var) = 0
VERBOSE [U=0,P=60068]      singularity_registry_set()                Adding value to registry: 'SYSCONFDIR' = '/share/software/user/open/singularity/2.4.5/etc'
DEBUG   [U=0,P=60068]      singularity_registry_set()                Returning singularity_registry_set(sysconfdir, /share/software/user/open/singularity/2.4.5/etc) = 0
VERBOSE [U=0,P=60068]      singularity_registry_set()                Adding value to registry: 'BINDIR' = '/share/software/user/open/singularity/2.4.5/bin'
DEBUG   [U=0,P=60068]      singularity_registry_set()                Returning singularity_registry_set(bindir, /share/software/user/open/singularity/2.4.5/bin) = 0
VERBOSE [U=0,P=60068]      singularity_registry_set()                Adding value to registry: 'IMAGE' = 'pvacseq'
DEBUG   [U=0,P=60068]      singularity_registry_set()                Returning singularity_registry_set(IMAGE, pvacseq) = 0
DEBUG   [U=0,P=60068]      singularity_registry_get()                Returning NULL on 'HOME'
DEBUG   [U=0,P=60068]      singularity_registry_get()                Returning NULL on 'TARGET_UID'
DEBUG   [U=0,P=60068]      singularity_registry_get()                Returning NULL on 'TARGET_GID'
DEBUG   [U=0,P=60068]      singularity_priv_init()                   Initializing user info
DEBUG   [U=0,P=60068]      singularity_priv_init()                   Set the calling user's username to: eilon
DEBUG   [U=0,P=60068]      singularity_priv_init()                   Marking uinfo structure as ready
DEBUG   [U=0,P=60068]      singularity_priv_init()                   Obtaining home directory
VERBOSE [U=0,P=60068]      singularity_priv_init()                   Set home (via getpwuid()) to: /home/users/eilon
VERBOSE [U=0,P=60068]      singularity_suid_init()                   Running SUID program workflow
VERBOSE [U=0,P=60068]      singularity_suid_init()                   Checking program has appropriate permissions
VERBOSE [U=0,P=60068]      singularity_suid_init()                   Checking configuration file is properly owned by root
VERBOSE [U=0,P=60068]      singularity_suid_init()                   Checking if singularity.conf allows us to run as suid
DEBUG   [U=0,P=60068]      singularity_config_get_bool_char_impl()   Called singularity_config_get_bool(allow setuid, yes)
DEBUG   [U=0,P=60068]      singularity_config_get_value_impl()       Returning configuration value allow setuid='yes'
DEBUG   [U=0,P=60068]      singularity_config_get_bool_char_impl()   Return singularity_config_get_bool(allow setuid, yes) = 1
DEBUG   [U=0,P=60068]      singularity_registry_get()                Returning NULL on 'NOSUID'
VERBOSE [U=0,P=60068]      singularity_priv_userns()                 Invoking the user namespace
DEBUG   [U=0,P=60068]      singularity_config_get_bool_char_impl()   Called singularity_config_get_bool(allow user ns, yes)
DEBUG   [U=0,P=60068]      singularity_config_get_value_impl()       No configuration entry found for 'allow user ns'; returning default value 'yes'
DEBUG   [U=0,P=60068]      singularity_config_get_bool_char_impl()   Return singularity_config_get_bool(allow user ns, yes) = 1
VERBOSE [U=0,P=60068]      singularity_priv_userns()                 Not virtualizing USER namespace: running as SUID
DEBUG   [U=0,P=60068]      singularity_priv_userns()                 Returning singularity_priv_init(void)
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Dropping privileges to UID=251417, GID=60207 (3 supplementary GIDs)
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Restoring supplementary groups
DEBUG   [U=251417,P=60068] singularity_priv_drop()                   Confirming we have correct UID/GID
VERBOSE [U=251417,P=60068] singularity_runtime_autofs()              Autofs bug path requested
VERBOSE [U=251417,P=60068] singularity_runtime_autofs()              Autofs bug fix for directory /oak
WARNING [U=251417,P=60068] singularity_runtime_autofs()              Autofs bug path /oak is not a directory
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'DAEMON_START'
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'DAEMON_JOIN'
DEBUG   [U=251417,P=60068] singularity_daemon_init()                 Not joining a daemon, daemon join not set
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'WRITABLE'
VERBOSE [U=251417,P=60068] main()                                    Instantiating read only container image object
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning value from registry: 'IMAGE' = 'pvacseq'
DEBUG   [U=251417,P=60068] singularity_image_init()                  Calling image_init for each file system module
DEBUG   [U=251417,P=60068] singularity_image_dir_init()              Opening file descriptor to directory: /home/groups/pritch/eilon/singularity/pvacseq
DEBUG   [U=251417,P=60068] singularity_image_dir_init()              This is not a directory based image
DEBUG   [U=251417,P=60068] singularity_image_squashfs_init()         Checking if writable image requested
DEBUG   [U=251417,P=60068] singularity_image_squashfs_init()         Opening file descriptor to image: /home/groups/pritch/eilon/singularity/pvacseq
VERBOSE [U=251417,P=60068] singularity_image_squashfs_init()         Checking that file pointer is a Singularity image
DEBUG   [U=251417,P=60068] singularity_image_squashfs_init()         Checking for magic in the top of the file
VERBOSE [U=251417,P=60068] singularity_image_squashfs_init()         File is a valid SquashFS image
DEBUG   [U=251417,P=60068] singularity_image_init()                  got image_init type for squashfs
DEBUG   [U=251417,P=60068] singularity_config_get_bool_char_impl()   Called singularity_config_get_bool(allow container squashfs, yes)
DEBUG   [U=251417,P=60068] singularity_config_get_value_impl()       Returning configuration value allow container squashfs='yes'
DEBUG   [U=251417,P=60068] singularity_config_get_bool_char_impl()   Return singularity_config_get_bool(allow container squashfs, yes) = 1
DEBUG   [U=251417,P=60068] singularity_config_get_value_impl()       No configuration entry found for 'limit container paths'; returning default value 'NULL'
DEBUG   [U=251417,P=60068] singularity_config_get_value_impl()       No configuration entry found for 'limit container owners'; returning default value 'NULL'
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'DAEMON_JOIN'
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'CLEANUPDIR'
VERBOSE [U=251417,P=60068] singularity_registry_set()                Adding value to registry: 'CLEANUPD_FD' = '-1'
DEBUG   [U=251417,P=60068] singularity_registry_set()                Returning singularity_registry_set(CLEANUPD_FD, -1) = 0
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'DAEMON_JOIN'
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'NOSESSIONCLEANUP'
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'NOCLEANUP'
DEBUG   [U=251417,P=60068] singularity_cleanupd()                    Not running a cleanup thread, no 'SINGULARITY_CLEANUPDIR' defined
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'DAEMON_JOIN'
DEBUG   [U=251417,P=60068] singularity_runtime_ns()                  Calling: _singularity_runtime_ns_ipc()
DEBUG   [U=251417,P=60068] singularity_config_get_bool_char_impl()   Called singularity_config_get_bool(allow ipc ns, yes)
DEBUG   [U=251417,P=60068] singularity_config_get_value_impl()       No configuration entry found for 'allow ipc ns'; returning default value 'yes'
DEBUG   [U=251417,P=60068] singularity_config_get_bool_char_impl()   Return singularity_config_get_bool(allow ipc ns, yes) = 1
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'UNSHARE_IPC'
VERBOSE [U=251417,P=60068] singularity_runtime_ns_ipc()              Not virtualizing IPC namespace on user request
DEBUG   [U=251417,P=60068] singularity_runtime_ns()                  Calling: _singularity_runtime_ns_pid()
DEBUG   [U=251417,P=60068] singularity_config_get_bool_char_impl()   Called singularity_config_get_bool(allow pid ns, yes)
DEBUG   [U=251417,P=60068] singularity_config_get_value_impl()       Returning configuration value allow pid ns='yes'
DEBUG   [U=251417,P=60068] singularity_config_get_bool_char_impl()   Return singularity_config_get_bool(allow pid ns, yes) = 1
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'UNSHARE_PID'
VERBOSE [U=251417,P=60068] singularity_runtime_ns_pid()              Not virtualizing PID namespace on user request
DEBUG   [U=251417,P=60068] singularity_runtime_ns()                  Calling: _singularity_runtime_ns_net()
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'UNSHARE_NET'
VERBOSE [U=251417,P=60068] singularity_runtime_ns_net()              Not virtualizing network namespace on user request
DEBUG   [U=251417,P=60068] singularity_runtime_ns()                  Calling: _singularity_runtime_ns_mnt()
DEBUG   [U=251417,P=60068] singularity_config_get_bool_char_impl()   Called singularity_config_get_bool(mount slave, yes)
DEBUG   [U=251417,P=60068] singularity_config_get_value_impl()       Returning configuration value mount slave='yes'
DEBUG   [U=251417,P=60068] singularity_config_get_bool_char_impl()   Return singularity_config_get_bool(mount slave, yes) = 1
DEBUG   [U=251417,P=60068] singularity_priv_escalate()               Temporarily escalating privileges (U=251417)
DEBUG   [U=0,P=60068]      singularity_priv_escalate()               Clearing supplementary GIDs.
DEBUG   [U=0,P=60068]      singularity_runtime_ns_mnt()              Virtualizing FS namespace
DEBUG   [U=0,P=60068]      singularity_runtime_ns_mnt()              Virtualizing mount namespace
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Dropping privileges to UID=251417, GID=60207 (3 supplementary GIDs)
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Restoring supplementary groups
DEBUG   [U=251417,P=60068] singularity_priv_drop()                   Confirming we have correct UID/GID
DEBUG   [U=251417,P=60068] singularity_runtime_ns_mnt()              Making mounts slave
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'DAEMON_JOIN'
DEBUG   [U=251417,P=60068] singularity_sessiondir()                  Setting sessiondir
VERBOSE [U=251417,P=60068] singularity_sessiondir()                  Using session directory: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/session
DEBUG   [U=251417,P=60068] singularity_sessiondir()                  Checking for session directory: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/session
DEBUG   [U=251417,P=60068] singularity_sessiondir()                  Obtaining the default sessiondir size
DEBUG   [U=251417,P=60068] singularity_config_get_value_impl()       Returning configuration value sessiondir max size='16'
DEBUG   [U=251417,P=60068] singularity_sessiondir()                  Converted sessiondir size to: 16
DEBUG   [U=251417,P=60068] singularity_sessiondir()                  Creating the sessiondir size mount option length
DEBUG   [U=251417,P=60068] singularity_sessiondir()                  Got size length of: 9
DEBUG   [U=251417,P=60068] singularity_sessiondir()                  Creating the sessiondir size mount option string
DEBUG   [U=251417,P=60068] singularity_sessiondir()                  Checking to make sure the string was allocated correctly
DEBUG   [U=251417,P=60068] singularity_sessiondir()                  Mounting sessiondir tmpfs: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/session
VERBOSE [U=251417,P=60068] singularity_registry_set()                Adding value to registry: 'SESSIONDIR' = '/share/software/user/open/singularity/2.4.5/var/singularity/mnt/session'
DEBUG   [U=251417,P=60068] singularity_registry_set()                Returning singularity_registry_set(SESSIONDIR, /share/software/user/open/singularity/2.4.5/var/singularity/mnt/session) = 0
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'DAEMON_JOIN'
DEBUG   [U=251417,P=60068] singularity_image_mount()                 Figuring out which mount module to use...
DEBUG   [U=251417,P=60068] singularity_image_mount()                 Calling squashfs_mount
DEBUG   [U=251417,P=60068] singularity_config_get_value_impl()       Returning configuration value max loop devices='256'
DEBUG   [U=251417,P=60068] singularity_image_bind()                  Entered singularity_image_bind()
DEBUG   [U=251417,P=60068] singularity_image_bind()                  Converting max_loop_devs_string to int: '256'
DEBUG   [U=251417,P=60068] singularity_image_bind()                  Converted max_loop_devs_string to int: '256' -> 256
DEBUG   [U=251417,P=60068] singularity_image_bind()                  Checking if this image has been properly opened
DEBUG   [U=251417,P=60068] singularity_priv_escalate()               Temporarily escalating privileges (U=251417)
DEBUG   [U=0,P=60068]      singularity_priv_escalate()               Clearing supplementary GIDs.
DEBUG   [U=0,P=60068]      singularity_image_bind()                  Finding next available loop device...
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Dropping privileges to UID=251417, GID=60207 (3 supplementary GIDs)
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Restoring supplementary groups
DEBUG   [U=251417,P=60068] singularity_priv_drop()                   Confirming we have correct UID/GID
VERBOSE [U=251417,P=60068] singularity_image_bind()                  Found available loop device: /dev/loop0
DEBUG   [U=251417,P=60068] singularity_image_bind()                  Setting LO_FLAGS_AUTOCLEAR
DEBUG   [U=251417,P=60068] singularity_image_bind()                  Using image offset: 31
DEBUG   [U=251417,P=60068] singularity_priv_escalate()               Temporarily escalating privileges (U=251417)
DEBUG   [U=0,P=60068]      singularity_priv_escalate()               Clearing supplementary GIDs.
DEBUG   [U=0,P=60068]      singularity_image_bind()                  Setting loop device flags
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Dropping privileges to UID=251417, GID=60207 (3 supplementary GIDs)
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Restoring supplementary groups
DEBUG   [U=251417,P=60068] singularity_priv_drop()                   Confirming we have correct UID/GID
VERBOSE [U=251417,P=60068] singularity_image_bind()                  Using loop device: /dev/loop0
VERBOSE [U=251417,P=60068] singularity_image_squashfs_mount()        Mounting squashfs image: /dev/loop0 -> /share/software/user/open/singularity/2.4.5/var/singularity/mnt/container
DEBUG   [U=251417,P=60068] action_ready()                            Checking if container is valid at: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/container
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'DAEMON_JOIN'
DEBUG   [U=251417,P=60068] singularity_priv_escalate()               Temporarily escalating privileges (U=251417)
DEBUG   [U=0,P=60068]      singularity_priv_escalate()               Clearing supplementary GIDs.
DEBUG   [U=0,P=60068]      singularity_runtime_overlayfs()           Creating overlay_final directory: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/final
DEBUG   [U=0,P=60068]      s_mkpath()                                Directory exists, returning '0': /share/software/user/open/singularity/2.4.5/var/singularity/mnt/final
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Dropping privileges to UID=251417, GID=60207 (3 supplementary GIDs)
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Restoring supplementary groups
DEBUG   [U=251417,P=60068] singularity_priv_drop()                   Confirming we have correct UID/GID
DEBUG   [U=251417,P=60068] singularity_runtime_overlayfs()           Checking if overlayfs should be used
DEBUG   [U=251417,P=60068] singularity_config_get_value_impl()       Returning configuration value enable overlay='try'
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'DISABLE_OVERLAYFS'
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'WRITABLE'
VERBOSE [U=251417,P=60068] singularity_runtime_overlayfs()           Trying OverlayFS as requested by configuration
DEBUG   [U=251417,P=60068] singularity_runtime_overlayfs()           Setting up overlay mount options
DEBUG   [U=251417,P=60068] singularity_runtime_overlayfs()           Checking for existance of overlay directory: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/overlay
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'OVERLAYIMAGE'
DEBUG   [U=251417,P=60068] singularity_runtime_overlayfs()           Mounting overlay tmpfs: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/overlay
DEBUG   [U=251417,P=60068] singularity_priv_escalate()               Temporarily escalating privileges (U=251417)
DEBUG   [U=0,P=60068]      singularity_priv_escalate()               Clearing supplementary GIDs.
DEBUG   [U=0,P=60068]      singularity_runtime_overlayfs()           Creating upper overlay directory: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/overlay/upper
DEBUG   [U=0,P=60068]      s_mkpath()                                Creating directory: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/overlay/upper
DEBUG   [U=0,P=60068]      singularity_runtime_overlayfs()           Creating overlay work directory: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/overlay/work
DEBUG   [U=0,P=60068]      s_mkpath()                                Creating directory: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/overlay/work
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Dropping privileges to UID=251417, GID=60207 (3 supplementary GIDs)
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Restoring supplementary groups
DEBUG   [U=251417,P=60068] singularity_priv_drop()                   Confirming we have correct UID/GID
VERBOSE [U=251417,P=60068] singularity_runtime_overlayfs()           Mounting overlay with options: lowerdir=/share/software/user/open/singularity/2.4.5/var/singularity/mnt/container,upperdir=/share/software/user/open/singularity/2.4.5/var/singularity/mnt/overlay/upper,workdir=/share/software/user/open/singularity/2.4.5/var/singularity/mnt/overlay/work
VERBOSE [U=251417,P=60068] singularity_registry_set()                Adding value to registry: 'OVERLAYFS_ENABLED' = '1'
DEBUG   [U=251417,P=60068] singularity_registry_set()                Returning singularity_registry_set(OVERLAYFS_ENABLED, 1) = 0
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning NULL on 'DAEMON_JOIN'
VERBOSE [U=251417,P=60068] singularity_runtime_mounts()              Running all mount components
DEBUG   [U=251417,P=60068] singularity_config_get_bool_char_impl()   Called singularity_config_get_bool(mount hostfs, no)
DEBUG   [U=251417,P=60068] singularity_config_get_value_impl()       Returning configuration value mount hostfs='yes'
DEBUG   [U=251417,P=60068] singularity_config_get_bool_char_impl()   Return singularity_config_get_bool(mount hostfs, no) = 1
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Checking to see if /proc/mounts exists
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Opening /proc/mounts
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Getting line by line
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping root (/): rootfs,/,rootfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping root (/): /dev/sda1,/,xfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /dev based file system: devtmpfs,/dev,devtmpfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /dev based file system: tmpfs,/dev/shm,tmpfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /dev based file system: tmpfs,/dev/shm,tmpfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /dev based file system: devpts,/dev/pts,devpts
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /dev based file system: hugetlbfs,/dev/hugepages,hugetlbfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /dev based file system: mqueue,/dev/mqueue,mqueue
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /proc based file system: proc,/proc,proc
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /proc based file system: systemd-1,/proc/sys/fs/binfmt_misc,autofs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /proc based file system: binfmt_misc,/proc/sys/fs/binfmt_misc,binfmt_misc
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /proc based file system: nfsd,/proc/fs/nfsd,nfsd
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: sysfs,/sys,sysfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: securityfs,/sys/kernel/security,securityfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: tmpfs,/sys/fs/cgroup,tmpfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/systemd,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/blkio,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/cpu,cpuacct,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/perf_event,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/cpuset,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/memory,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/devices,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/pids,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/freezer,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/hugetlb,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: cgroup,/sys/fs/cgroup/net_cls,net_prio,cgroup
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: pstore,/sys/fs/pstore,pstore
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: configfs,/sys/kernel/config,configfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /sys based file system: debugfs,/sys/kernel/debug,debugfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /run based file system: tmpfs,/run,tmpfs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Checking if host file system is already mounted: /lscratch
DEBUG   [U=251417,P=60068] check_mounted()                           Opening /proc/mounts
DEBUG   [U=251417,P=60068] check_mounted()                           Iterating through /proc/mounts
DEBUG   [U=251417,P=60068] singularity_registry_get()                Returning value from registry: 'OVERLAYFS_ENABLED' = '1'
DEBUG   [U=251417,P=60068] singularity_priv_escalate()               Temporarily escalating privileges (U=251417)
DEBUG   [U=0,P=60068]      singularity_priv_escalate()               Clearing supplementary GIDs.
DEBUG   [U=0,P=60068]      s_mkpath()                                Creating directory: /share/software/user/open/singularity/2.4.5/var/singularity/mnt/final/lscratch
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Dropping privileges to UID=251417, GID=60207 (3 supplementary GIDs)
DEBUG   [U=0,P=60068]      singularity_priv_drop()                   Restoring supplementary groups
DEBUG   [U=251417,P=60068] singularity_priv_drop()                   Confirming we have correct UID/GID
VERBOSE [U=251417,P=60068] singularity_runtime_mount_hostfs()        Binding '/lscratch'(xfs) to '/share/software/user/open/singularity/2.4.5/var/singularity/mnt/final//lscratch'
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Skipping /var based file system: sunrpc,/var/lib/nfs/rpc_pipefs,rpc_pipefs
DEBUG   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Checking if host file system is already mounted: /oak
DEBUG   [U=251417,P=60068] check_mounted()                           Opening /proc/mounts
VERBOSE [U=251417,P=60068] singularity_runtime_mount_hostfs()        Binding '/oak'(lustre) to '/share/software/user/open/singularity/2.4.5/var/singularity/mnt/final//oak'
ERROR   [U=251417,P=60068] singularity_runtime_mount_hostfs()        There was an error binding the path /oak: No such file or directory
ABORT   [U=251417,P=60068] singularity_runtime_mount_hostfs()        Retval = 255

@thiell
Copy link
Author

thiell commented Mar 28, 2018

@vsoch indeed oak is permanently mounted on all nodes including login nodes on this cluster. I checked this one, and /oak was mounted on March 16 (and the node hasn't rebooted since):

Mar 16 09:58:49 sh-101-72.int kernel: Lustre: Mounted oak-client

and then I also see:

Mar 27 16:44:47 sh-101-72.int Singularity[59330]: action-suid (U=251417,P=59330)> Autofs bug path /oak is not a directory
Mar 27 16:44:47 sh-101-72.int Singularity[59330]: action-suid (U=251417,P=59330)> There was an error binding the path /oak: No such file or directory

Right now:

[root@sh-101-72 ~]# ls -ld /oak
drwxr-xr-x 4 root nobody 4096 Sep 28 22:12 /oak

@thiell
Copy link
Author

thiell commented Mar 28, 2018

Also, I would like to highlight that in the debug output above, Singularity does see /oak as a lustre mount point:

VERBOSE [U=251417,P=60068] singularity_runtime_mount_hostfs()        Binding '/oak'(lustre) to '/share/software/user/open/singularity/2.4.5/var/singularity/mnt/final//oak'

@cclerget
Copy link
Collaborator

@thiell @vsoch the following lines make me really think the problem is on the Lustre side :

Mar 27 16:44:47 sh-101-72.int Singularity[59330]: action-suid (U=251417,P=59330)> Autofs bug path /oak is not a directory
Mar 27 16:44:47 sh-101-72.int Singularity[59330]: action-suid (U=251417,P=59330)> There was an error binding the path /oak: No such file or directory

Autofs bug path /oak is not a directory means Singularity can't stat on /oak directory, that's why I think the problem could be with Lustre, I don't know Lustre at all, an assumption : maybe it appears from time to time when metadata service is overloaded ?

@thiell
Copy link
Author

thiell commented Mar 29, 2018

@cclerget thx but what is the errno from stat()? it just can't be ENOENT, so that would be helpful to know. /oak is a production filesystem that is closely monitored, has been working well lately and concurrently used by dozen of users without issue. Only singularity is having issue with it. Could this be a race when several singularity instances are starting at the same time?

@thiell
Copy link
Author

thiell commented Apr 4, 2018

We were just tracking one of these errors Autofs bug path /oak is not a directory followed by the usual singularity failure, and /oak was perfectly functional. This is clearly our main issue right now with Singularity on Sherlock, as it is impacting more and more users. I wonder if that could be related to overlayfs? We're running CentOS 7.4

[62089.622356] TECH PREVIEW: Overlay filesystem may not be fully supported.

Would it be easy to do a change that would allow our users to ignore a set of host mount points?

@cclerget
Copy link
Collaborator

cclerget commented Apr 5, 2018

Sorry for the delay @thiell, I just commit some changes on this branch https://github.com/cclerget/singularity/tree/autofs-debug to display string error. Can you give it a try to look what happens exactly ?

@thiell
Copy link
Author

thiell commented Apr 5, 2018

Hi @cclerget,
Since my last comment we have made some great progress regarding this issue. It seems indeed due to the fact that stat() fails from time to time on the base mount point /oak, for users not having any account in Oak. We're using a new lustre feature (nodemap) and the UID/GID mapping is complex, so for some users, Lustre was returning EPERM for stat() on /oak. However, it is a root directory (in /) that we think is subject to the (shared) Linux inode cache. If a user having access to Oak previously did the equivalent of ls -l / on a compute node, thus populating the inode cache, this somehow fixed the "issue" for users not having access to /oak. We have applied a workaround on the Lustre server part so that unknown UIDs are now known and resolved as dummy user (without any specific access level). This seems to have stopped confusing Singularity so far.

Despite the fact that this Lustre thing might be confusing, we still think that Singularity has a bug and isn't returning the proper errno (should have been EPERM and not ENOENT). So I'm asking, is it OK to always make the assumption that stat() will succeed on a mount point? Would it be useful to at least allow the Singularity administrator to list optional mount points so that stat() could fail?

Thanks a lot for your patch too. I'm not sure how we can try your autofs-debug patch right now, as we don't want to disrupt the production anymore (the use of Singularity has been painful enough for our users lately due to this issue), but I'll see what I can do and keep you posted if I find something.

@cclerget
Copy link
Collaborator

cclerget commented Apr 6, 2018

Hey @thiell , glad you found a solution ! You're right, Singularity don't return proper errno, that was fixed in release candidate for mount errors. The patch submitted was only to see exact stat() errno, since you got proper errno, you don't need it.

About your question, if stat() fails mount() will fail too, that's not possible. Since /oak is not managed by autofs, you don't need to use autofs bug path directive and stat(). But you says, when user do ls / before using /oak, the problem seems fixed, maybe you could try to put a ls / > /dev/null in /usr/local/etc/singularity/init (depends of your setup), this file is located in same folder than singularity.conf, since this file is sourced for all users before executing Singularity binary

@dtrudg dtrudg closed this as completed Feb 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants