H5Fcreate(): unable to create file #1165

Closed
hyjin86 opened this issue May 28, 2019 · 4 comments

Comments

@hyjin86

hyjin86 commented May 28, 2019

Hi,

I ran YANK in serial, but it crashed as soon as it created the file named "complex.nc", with the following error message:

HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140080270632768:
  #000: H5F.c line 444 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1364 in H5F__create(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1615 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 1640 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #004: H5FDsec2.c line 939 in H5FD_sec2_lock(): file locking disabled on this file system (use HDF5_USE_FILE_LOCKING environment variable to override), errno = 38, error message = 'Function not implemented'
    major: File accessibilty
    minor: Bad file ID accessed
Traceback (most recent call last):
  File "/home01/a788a01/anaconda3/bin/yank", line 11, in <module>
    load_entry_point('yank==0.24.0', 'console_scripts', 'yank')()
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/cli.py", line 73, in main
    dispatched = getattr(commands, command).dispatch(command_args)
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/commands/script.py", line 148, in dispatch
    yaml_builder.run_experiments(write_status=write_status)
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/experiment.py", line 797, in run_experiments
    completed[exp_index] = self._run_experiment(exp, write_status=write_status)
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/experiment.py", line 3156, in _run_experiment
    built_experiment.run(n_iterations=switch_experiment_interval)
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/experiment.py", line 451, in run
    alchemical_phase = phase.initialize_alchemical_phase()
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/experiment.py", line 317, in initialize_alchemical_phase
    alchemical_phase = self.create_alchemical_phase()
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/experiment.py", line 303, in create_alchemical_phase
    **create_kwargs)
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/yank.py", line 1089, in create
    storage=storage, unsampled_thermodynamic_states=expanded_cutoff_states, metadata=metadata)
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/multistate/multistatesampler.py", line 541, in create
    self._initialize_reporter()
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/mpi.py", line 271, in _wrapper
    return run_single_node(rank, task, *args, **kwargs)
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/mpi.py", line 220, in run_single_node
    result = task(*args, **kwargs)
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/multistate/multistatesampler.py", line 1076, in _initialize_reporter
    self._reporter.open(mode='w')
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/multistate/multistatereporter.py", line 270, in open
    mode, version=netcdf_format)
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/multistate/multistatereporter.py", line 366, in _open_dataset_robustly
    raise e
  File "/home01/a788a01/anaconda3/lib/python3.6/site-packages/yank/multistate/multistatereporter.py", line 359, in _open_dataset_robustly
    return netcdf.Dataset(*args, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2291, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1855, in netCDF4._netCDF4._ensure_nc_success
PermissionError: [Errno 13] Permission denied: b'fep/experiments/complex.nc'

Can anyone help me fix this?
Thanks!

Hyunjin.

@andrrizzi
Contributor

Hi Hyunjin. Unfortunately, the only thing that worked for me was restarting. Sometimes it took ~10 attempts for netcdf to get unstuck. If your calculation crashed at a very bad time, however, it is possible that the dataset got corrupted and netcdf won't be able to recover the data. If you do find an alternative way to work around the new locking problems of netcdf4 and hdf5, please do let us know!

@hyjin86
Author

hyjin86 commented May 29, 2019

Hi Andrea,

I fixed it ^^. I found the following page, which explains well how to deal with this problem.
https://support.nesi.org.nz/hc/en-gb/articles/360000902955-NetCDF-HDF5-file-locking
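
For anyone landing here, a minimal sketch of the workaround, assuming the fix is the one the HDF5 error message itself points to (the HDF5_USE_FILE_LOCKING environment variable). The helper name `disable_hdf5_file_locking` is made up for illustration and is not part of YANK or netCDF4. As far as I know, HDF5 expects the uppercase value FALSE, and it is safest to set it before the HDF5 library is first loaded, i.e. before importing netCDF4:

```python
import os


def disable_hdf5_file_locking(env=os.environ):
    """Set HDF5_USE_FILE_LOCKING=FALSE and return the previous value (or None).

    Hypothetical helper for illustration only. Call it before importing
    netCDF4 or h5py, on the assumption that HDF5 may read the variable
    when the library is initialized.
    """
    previous = env.get("HDF5_USE_FILE_LOCKING")
    env["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return previous


if __name__ == "__main__":
    disable_hdf5_file_locking()
    # Import netCDF4 only after this point, e.g.:
    # import netCDF4
    print(os.environ["HDF5_USE_FILE_LOCKING"])
```

The shell equivalent is to run `export HDF5_USE_FILE_LOCKING=FALSE` before launching yank. Note that this turns off a safety mechanism, so it only makes sense when a single process writes to the file.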

Best,
Hyunjin.

@andrrizzi
Contributor

That's fantastic, Hyunjin. Thanks for finding the link! Currently, we try to open the file about 5 times before giving up. We could try exporting HDF5_USE_FILE_LOCKING=False on the very last attempt to make this automatic.
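
Roughly, the idea could look like the sketch below. This is not the actual YANK code: `open_with_retries`, its parameters, and the injectable `env` mapping are hypothetical names used only to illustrate "retry, then disable locking on the last attempt". The uppercase value FALSE is used here on the assumption that this is the spelling HDF5 recognizes.

```python
import os
import time


def open_with_retries(opener, n_attempts=5, sleep_seconds=0.0, env=os.environ):
    """Call opener() up to n_attempts times.

    Before the final attempt, HDF5_USE_FILE_LOCKING is set to FALSE in `env`
    as a last resort. Hypothetical sketch, not the YANK implementation.
    """
    if n_attempts < 1:
        raise ValueError("n_attempts must be at least 1")
    for attempt in range(1, n_attempts + 1):
        if attempt == n_attempts:
            # Last resort: ask HDF5 to skip file locking.
            env["HDF5_USE_FILE_LOCKING"] = "FALSE"
        try:
            return opener()
        except OSError:
            # PermissionError (as in the traceback above) is an OSError.
            if attempt == n_attempts:
                raise
            time.sleep(sleep_seconds)
```

With netCDF4 this might be called as `open_with_retries(lambda: netCDF4.Dataset('complex.nc', 'w'))`. One caveat: if HDF5 only reads the variable when the library is initialized, setting it this late in a running process may have no effect, so this would need to be checked against the HDF5 version in use.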

@andrrizzi
Contributor

This was implemented in #1168.
