Save error when using lazy data and dask distributed scheduler #3157
Hi @jvegasbsc, thanks for contacting us about this issue, and apologies that it has taken us a little while to get back to you.

The problem here is that the save is trying to run in parallel. This means that different parts of the NetCDF file to be written are being handled by different processes. Eventually the results from these processes need to be pulled together, as only a single process can write the NetCDF file, and it's at this point that the error occurs. Dask pulls the work together by communicating intermediate results between workers, and it does this by pickling each intermediate result before transmitting it from one worker to the next. NetCDF objects can't be pickled, which is what causes the error in your traceback to be raised.

The solution is to not parallelise the save step, but to run it in serial instead. This can be done with a context manager:

    print('Saving...')
    with dask.config.set(scheduler='synchronous'):
        iris.save(cube, '/home/Earth/jvegas/temp.nc')

Note that you may need to edit the name of the scheduler based on the specific version of dask you're using, as scheduler names have tended to change between dask versions.
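To illustrate the pickling failure described above, here is a minimal sketch that uses only the Python standard library. `FakeDataset` and `can_pickle` are hypothetical names made up for this illustration; `FakeDataset` merely stands in for an object that owns an open file handle and is not the class netCDF4 or Iris actually uses. The assumption is that a NetCDF object fails to pickle for a broadly similar reason: it wraps a live resource that cannot be serialised.

```python
import os
import pickle
import tempfile


class FakeDataset:
    """Hypothetical stand-in for an open NetCDF dataset: it owns a live file handle."""

    def __init__(self, path):
        self.handle = open(path, "w")

    def close(self):
        self.handle.close()


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False if pickling raises."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


with tempfile.TemporaryDirectory() as tmp:
    dataset = FakeDataset(os.path.join(tmp, "temp.nc"))
    plain_result = [1.0, 2.0, 3.0]  # ordinary data, safe to send between workers

    plain_ok = can_pickle(plain_result)
    dataset_ok = can_pickle(dataset)  # fails because of the open file handle
    dataset.close()

print(plain_ok, dataset_ok)
```

The plain list pickles, while the object holding the file handle does not. A scheduler that moves results between processes has to serialise them like this, whereas a scheduler that runs everything in one process does not need to.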
Thanks. It works now.
Hi @jvegasbsc - great to hear it's working now, and glad that we were able to help you get to the bottom of this!
Hi
I am getting an error when using Iris with the dask distributed or processes scheduler (but not with the threaded one). Look at the following code:
Regardless of which NetCDF file is used for testing, it fails with the following traceback:
Iris was installed with conda, and the versions are:
dask 0.19.0 py_0 conda-forge
dask-core 0.19.0 py_0 conda-forge
distributed 1.23.0 py36_0 conda-forge
iris 2.1.0 py36_3 conda-forge
I also tested with other versions of dask and got the same error.