I updated dask/xarray via conda yesterday (to distributed-1.21.4) and started having issues with workers not starting on an HPC system.
Some research eventually led me to commit 31c470e, which I tried.
It seems to solve some issues, but not all:
- workers are now starting
- client restarts still appear to have problems:
client = Client(scheduler_file=os.path.expanduser('/home1/scratch/aponte/dask/scheduler.json'))
client.restart()
leads to:
distributed.client - ERROR - Restart timed out after 3.000000 seconds
I can still work, but I run into problems when trying to upload files (I realize this may be a separate issue):
client.upload_file('/home1/datahome/aponte/iwave_sst/hw/utils.py')
leads to:
distributed.utils - ERROR - [Errno 2] No such file or directory: '/home1/scratch/aponte/dask/worker-oft8kj37/utils.py'
Traceback (most recent call last):
File "/home1/datahome/aponte/.miniconda3/envs/pangeo/lib/python3.6/site-packages/distributed/utils.py", line 238, in f
result[0] = yield make_coro()
File "/home1/datahome/aponte/.miniconda3/envs/pangeo/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/home1/datahome/aponte/.miniconda3/envs/pangeo/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
The directory /home1/scratch/aponte/dask/worker-oft8kj37/ indeed does not exist.
When I search for oft8kj37 in the log files, I get the following.
The last hit may be relevant:
iwave_sst/datarmor% grep oft8kj37 dask-*
dask-scheduler.o958818:distributed.core - ERROR - [Errno 2] No such file or directory: '/home1/scratch/aponte/dask/worker-oft8kj37/utils.py'
dask-scheduler.o958818:FileNotFoundError: [Errno 2] No such file or directory: '/home1/scratch/aponte/dask/worker-oft8kj37/utils.py'
dask-worker.o958819:distributed.worker - INFO - Local Directory: /home1/scratch/aponte/dask/worker-oft8kj37
dask-worker.o958819:distributed.core - ERROR - [Errno 2] No such file or directory: '/home1/scratch/aponte/dask/worker-oft8kj37/utils.py'
dask-worker.o958819:FileNotFoundError: [Errno 2] No such file or directory: '/home1/scratch/aponte/dask/worker-oft8kj37/utils.py'
dask-worker.o958819:distributed.core - ERROR - [Errno 2] No such file or directory: '/home1/scratch/aponte/dask/worker-oft8kj37/utils.py'
dask-worker.o958819:FileNotFoundError: [Errno 2] No such file or directory: '/home1/scratch/aponte/dask/worker-oft8kj37/utils.py'
dask-worker.o958822:distributed.diskutils - WARNING - Found stale lock file and directory '/home1/scratch/aponte/dask/worker-oft8kj37', purging
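
To narrow this down, one could ask every worker whether its local directory is still present on disk. The following is only a minimal sketch: the helper names `local_dir_status` and `check_workers` are hypothetical, and the attribute holding the directory is an assumption (`local_dir` in older distributed releases, `local_directory` in newer ones). It relies on `Client.run` passing the worker instance to a function that has a `dask_worker` parameter.

```python
import os


def local_dir_status(dask_worker):
    """Report a worker's local directory and whether it exists on disk.

    Assumption: the directory is stored on the worker as `local_directory`
    (newer distributed releases) or `local_dir` (older ones).
    """
    path = getattr(dask_worker, "local_directory", None) or getattr(
        dask_worker, "local_dir", None
    )
    return {"path": path, "exists": bool(path) and os.path.isdir(path)}


def check_workers(client):
    """Run the check on all workers; returns {worker_address: status}."""
    # Client.run fills in the `dask_worker` argument with the Worker
    # instance on each worker process.
    return client.run(local_dir_status)
```

With the `client` created above, `check_workers(client)` would return one entry per worker. A worker whose directory was purged, as the last log line suggests, should then show up with `exists` set to `False`.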