Cephfs FUSE "[Errno 19] No such device" in read_mod_and_etag #7185

Closed
1 of 2 tasks
brainstorm opened this issue Apr 19, 2018 · 3 comments
Labels
locked [bot] locked due to inactivity stale::closed [bot] closed after being marked as stale stale [bot] marked as stale due to inactivity

Comments

brainstorm commented Apr 19, 2018

I'm submitting a...

  • bug report
  • feature request

Current Behavior

In a FUSE-mounted Cephfs share on the Spartan HPC cluster, we are getting [Errno 19] No such device when trying to install/update conda.

Writing to locations outside the FUSE mount works fine. This issue could perhaps be mitigated by setting MAP_PRIVATE on the mmap call made by the read_mod_and_etag method.
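
As a rough illustration of that idea, here is a minimal Python 3 sketch of a header scan through a private mapping. In CPython on POSIX, `access=ACCESS_COPY` requests `MAP_PRIVATE`, while the `ACCESS_READ` mode seen in the traceback requests `MAP_SHARED`. The function name `read_headers_private` and the pattern `HEADER_RE` are hypothetical and not taken from conda, and this has not been verified on a CephFS FUSE mount.

```python
import mmap
import re

# Hypothetical pattern for illustration only; not necessarily what conda uses.
HEADER_RE = re.compile(rb'"(_etag|_mod)":\s?"(.*?)"')


def read_headers_private(path):
    """Scan a JSON cache file for header keys through a private mapping."""
    with open(path, "rb") as f:
        # ACCESS_COPY maps the file copy-on-write (MAP_PRIVATE on POSIX),
        # so no shared mapping is requested from the filesystem.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
            # findall copies the matched groups into bytes objects, so no
            # reference into the mapping is left behind when it is closed.
            pairs = HEADER_RE.findall(m)
    return {key.decode(): value.decode() for key, value in pairs}
```

Nothing is written through the mapping here, so the copy-on-write pages are never actually duplicated.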

Issue #4592 might also benefit from this insight.

/cc @chapmanb @ohofmann @pdiakumis

Steps to Reproduce

Given the FUSE-mounted Cephfs system above, here's what happens:

spartan /data/cephfs/punim0010/local $ python bcbio_nextgen_install.py /data/cephfs/punim0010/local/development/bcbio --tooldir=/data/cephfs/punim0010/local/development/ --genomes GRCh37 --aligners bwa
Checking required dependencies
Installing isolated base python installation
--2018-04-19 09:31:37--  https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.16.18.10, 104.16.19.10, 2400:cb00:2048:1::6810:120a, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.16.18.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39893266 (38M) [application/x-sh]
Saving to: ‘Miniconda2-latest-Linux-x86_64.sh’
     0K ........ ........ ........ ........ ........ ........  7%  105M 0s
  3072K ........ ........ ........ ........ ........ ........ 15%  132M 0s
  6144K ........ ........ ........ ........ ........ ........ 23%  117M 0s
  9216K ........ ........ ........ ........ ........ ........ 31%  114M 0s
 12288K ........ ........ ........ ........ ........ ........ 39%  117M 0s
 15360K ........ ........ ........ ........ ........ ........ 47%  122M 0s
 18432K ........ ........ ........ ........ ........ ........ 55%  133M 0s
 21504K ........ ........ ........ ........ ........ ........ 63%  123M 0s
 24576K ........ ........ ........ ........ ........ ........ 70%  142M 0s
 27648K ........ ........ ........ ........ ........ ........ 78%  143M 0s
 30720K ........ ........ ........ ........ ........ ........ 86%  102M 0s
 33792K ........ ........ ........ ........ ........ ........ 94%  110M 0s
 36864K ........ ........ ........ ........                  100%  130M=0.3s
2018-04-19 09:31:37 (121 MB/s) - ‘Miniconda2-latest-Linux-x86_64.sh’ saved [39893266/39893266]
PREFIX=/data/cephfs/punim0010/local/development/bcbio/anaconda
installing: python-2.7.14-h1571d57_29 ...
Python 2.7.14 :: Anaconda, Inc.
installing: ca-certificates-2017.08.26-h1d4fec5_0 ...
installing: conda-env-2.6.0-h36134e3_1 ...
installing: libgcc-ng-7.2.0-h7cc24e2_2 ...
installing: libstdcxx-ng-7.2.0-h7a57d05_2 ...
installing: libffi-3.2.1-hd88cf55_4 ...
installing: ncurses-6.0-h9df7e31_2 ...
installing: openssl-1.0.2n-hb7f436b_0 ...
installing: tk-8.6.7-hc745277_3 ...
installing: yaml-0.1.7-had09818_2 ...
installing: zlib-1.2.11-ha838bed_2 ...
installing: libedit-3.1-heed3624_0 ...
installing: readline-7.0-ha6073c6_4 ...
installing: sqlite-3.22.0-h1bed415_0 ...
installing: asn1crypto-0.24.0-py27_0 ...
installing: certifi-2018.1.18-py27_0 ...
installing: chardet-3.0.4-py27hfa10054_1 ...
installing: enum34-1.1.6-py27h99a27e9_1 ...
installing: futures-3.2.0-py27h7b459c0_0 ...
installing: idna-2.6-py27h5722d68_1 ...
installing: ipaddress-1.0.19-py27_0 ...
installing: pycosat-0.6.3-py27ha4109ae_0 ...
installing: pycparser-2.18-py27hefa08c5_1 ...
installing: pysocks-1.6.7-py27he2db6d2_1 ...
installing: ruamel_yaml-0.15.35-py27h14c3975_1 ...
installing: six-1.11.0-py27h5f960f1_1 ...
installing: cffi-1.11.4-py27h9745a5d_0 ...
installing: setuptools-38.4.0-py27_0 ...
installing: cryptography-2.1.4-py27h6697b16_0 ...
installing: wheel-0.30.0-py27h2bc6bb2_1 ...
installing: pip-9.0.1-py27ha730c48_4 ...
installing: pyopenssl-17.5.0-py27hcee3be0_0 ...
installing: urllib3-1.22-py27ha55213b_0 ...
installing: requests-2.18.4-py27hc5b0589_1 ...
installing: conda-4.4.10-py27_0 ...
installation finished.
Installing bcbio-nextgen
--2018-04-19 09:35:01--  https://raw.githubusercontent.com/bcbio/bcbio-nextgen/master/requirements-conda.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.80.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.80.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20 [text/plain]
Saving to: ‘requirements-conda.txt’
100%[==================================================================================================================================================================================================>] 20          --.-K/s   in 0s
2018-04-19 09:35:01 (3.77 MB/s) - ‘requirements-conda.txt’ saved [20/20]
Solving environment: failed
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
    Traceback (most recent call last):
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/exceptions.py", line 789, in __call__
        return func(*args, **kwargs)
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/cli/main.py", line 78, in _main
        exit_code = do_call(args, p)
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/cli/conda_argparse.py", line 77, in do_call
        exit_code = getattr(module, func_name)(args, parser)
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/cli/main_install.py", line 11, in execute
        install(args, parser, 'install')
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/cli/install.py", line 236, in install
        force_reinstall=context.force,
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/core/solve.py", line 504, in solve_for_transaction
        force_remove, force_reinstall)
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/core/solve.py", line 437, in solve_for_diff
        final_precs = self.solve_final_state(deps_modifier, prune, ignore_pinned, force_remove)
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/core/solve.py", line 178, in solve_final_state
        index, r = self._prepare(prepared_specs)
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/core/solve.py", line 560, in _prepare
        self.subdirs, prepared_specs)
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/core/index.py", line 213, in get_reduced_index
        new_records = query_all(spec)
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/core/index.py", line 186, in query_all
        return tuple(concat(future.result() for future in as_completed(futures)))
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/core/repodata.py", line 90, in query
        self.load()
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/core/repodata.py", line 134, in load
        _internal_state = self._load()
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/core/repodata.py", line 171, in _load
        mod_etag_headers = read_mod_and_etag(self.cache_path_json)
      File "/data/cephfs/punim0010/local/development/bcbio/anaconda/lib/python2.7/site-packages/conda/core/repodata.py", line 354, in read_mod_and_etag
        with closing(mmap(f.fileno(), 0, access=ACCESS_READ)) as m:
    error: [Errno 19] No such device
`$ /data/cephfs/punim0010/local/development/bcbio/anaconda/bin/conda install --yes conda>=4.4`
  environment variables:
                 CIO_TEST=<not set>
               CONDA_ROOT=/data/cephfs/punim0010/local/development/bcbio/anaconda
          LD_LIBRARY_PATH=/usr/local/slurm/latest/lib
  LMOD_DEFAULT_MODULEPATH=/usr/local/easybuild/modules/all:/etc/modulefiles:/usr/share/modulefil
                          es:/usr/share/Modules/modulefiles:/usr/share/modulefiles/Linux:/usr/sh
                          are/modulefiles/Core:/usr/share/lmod/lmod/modulefiles/Core:/usr/local/
                          Modules/modulefiles
                  MANPATH=/usr/local/slurm/latest/share/man:/usr/share/lmod/lmod/share/man::
               MODULEPATH=~/.modules:/usr/local/easybuild/modules/all:/etc/modulefiles:/usr/shar
                          e/modulefiles:/usr/share/Modules/modulefiles:/usr/share/modulefiles/Li
                          nux:/usr/share/modulefiles/Core:/usr/share/lmod/lmod/modulefiles/Core:
                          /usr/local/Modules/modulefiles
                     PATH=/usr/local/showq/0.15/bin:/usr/local/slurm/latest/bin:/usr/lib64/qt-3.
                          3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/b
                          in:/opt/puppetlabs/bin
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>
     active environment : None
       user config file : /home/pdiakumis/.condarc
 populated config files :
          conda version : 4.4.10
    conda-build version : not installed
         python version : 2.7.14.final.0
       base environment : /data/cephfs/punim0010/local/development/bcbio/anaconda  (writable)
           channel URLs : https://repo.continuum.io/pkgs/main/linux-64
                          https://repo.continuum.io/pkgs/main/noarch
                          https://repo.continuum.io/pkgs/free/linux-64
                          https://repo.continuum.io/pkgs/free/noarch
                          https://repo.continuum.io/pkgs/r/linux-64
                          https://repo.continuum.io/pkgs/r/noarch
                          https://repo.continuum.io/pkgs/pro/linux-64
                          https://repo.continuum.io/pkgs/pro/noarch
          package cache : /data/cephfs/punim0010/local/development/bcbio/anaconda/pkgs
                          /home/pdiakumis/.conda/pkgs
       envs directories : /data/cephfs/punim0010/local/development/bcbio/anaconda/envs
                          /home/pdiakumis/.conda/envs
               platform : linux-64
             user-agent : conda/4.4.10 requests/2.18.4 CPython/2.7.14 Linux/4.4.123-1.el7.elrepo.x86_64 rhel/7.5 glibc/2.17
                UID:GID : 10658:10000
             netrc file : None
           offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
Upload successful.
Traceback (most recent call last):
  File "bcbio_nextgen_install.py", line 260, in <module>
    main(parser.parse_args(), sys.argv[1:])
  File "bcbio_nextgen_install.py", line 38, in main
    bcbio = install_conda_pkgs(anaconda, args)
  File "bcbio_nextgen_install.py", line 76, in install_conda_pkgs
    subprocess.check_call([anaconda["conda"], "install", "--yes", "conda>=4.4"])
  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/data/cephfs/punim0010/local/development/bcbio/anaconda/bin/conda', 'install', '--yes', 'conda>=4.4']' returned non-zero exit status 1
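
The failing call in the traceback can probably be isolated from conda with a small probe. The sketch below (Python 3, where the error surfaces as `OSError` rather than `mmap.error`) creates a scratch file in a given directory and attempts the same `ACCESS_READ` mapping. The helper name `supports_shared_mmap` is hypothetical.

```python
import errno
import mmap
import os
import tempfile


def supports_shared_mmap(directory):
    """Return True if a read-only shared mapping works for files in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"probe")  # mmap refuses to map an empty file
        try:
            # Same access mode as the failing call in the traceback.
            mmap.mmap(fd, 0, access=mmap.ACCESS_READ).close()
            return True
        except OSError as exc:
            if exc.errno == errno.ENODEV:
                return False  # "[Errno 19] No such device"
            raise
    finally:
        os.close(fd)
        os.unlink(path)
```

Running it once against a directory on the CephFS mount and once against a local directory should show whether the error comes from the filesystem rather than from conda.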

Expected Behavior

Conda installs and updates should succeed.

brainstorm referenced this issue in ceph/ceph Apr 19, 2018
Signed-off-by: Sage Weil <sage@redhat.com>

brainstorm commented Apr 19, 2018

So... sorry @liewegas but we just got some clarity from our HPC CephFS sysadmins:

Yeah, it was a known issue that came up when we made a change to disable the local filesystem caching on the nodes. We did this as some users were reporting that if they created a directory/file on one node in a multi node job, it would take a while for that file to be visible on another node. We're currently investigating whether we should revert this change.

I will close this issue in the coming days, unless the countermeasure outlined above (MAP_PRIVATE on mmap) is still a safe, bulletproof way to handle this at the application level. I'm not sure it is, since many applications would have to adopt it, possibly with a performance impact.
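
One application-level alternative that avoids depending on mmap support at all is to fall back to a plain `read()` when the mapping is refused. The Python 3 sketch below assumes that only a bounded prefix of the file is needed, which is an assumption for illustration and not something confirmed in this thread. The name `read_head` and the `limit` default are hypothetical.

```python
import errno
import mmap


def read_head(path, limit=4096):
    """Return the first `limit` bytes, preferring mmap, falling back to read()."""
    with open(path, "rb") as f:
        try:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                return m[:limit]  # slicing copies the data into bytes
        except ValueError:
            # mmap refuses to map an empty file
            return b""
        except OSError as exc:
            if exc.errno != errno.ENODEV:
                raise
        # The filesystem refused the mapping (ENODEV): use ordinary I/O.
        f.seek(0)
        return f.read(limit)
```

Because the fallback reads at most `limit` bytes, its cost stays small even for large files, which may ease the performance concern for this particular access pattern.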

@github-actions

Hi there, thank you for your contribution to Conda!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed automatically if no further activity occurs.

If you would like this issue to remain open please:

  1. Verify that you can still reproduce the issue in the latest version of Conda

  2. Comment that the issue is still reproducible and include:

  • What version of Conda you reproduced the issue on
  • What OS and version you reproduced the issue on
  • What steps you followed to reproduce the issue
  3. It would also be helpful to have the output of the following commands available:
  • conda info
  • conda config --show-sources
  • conda list --show-channel-urls

NOTE: If this issue was closed prematurely, please leave a comment and we will gladly reopen the issue.

In case this issue was originally about a project that is covered by the Anaconda issue tracker (e.g. Anaconda, Miniconda, packages built by Anaconda, Inc. like Anaconda Navigator etc), please reopen the issue there again.

Thanks!

@github-actions github-actions bot added the stale [bot] marked as stale due to inactivity label Nov 21, 2021
@github-actions

Hi again!

This issue has been closed since it has not had recent activity.
Please don't hesitate to leave a comment if that was done prematurely.

Thank you for your contribution.

@github-actions github-actions bot added the stale::closed [bot] closed after being marked as stale label Feb 19, 2022
@github-actions github-actions bot added the locked [bot] locked due to inactivity label Feb 19, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 19, 2023