Make job submission asynchronous #470
For what it's worth, this should be pretty trivial to make async using `run_in_executor`. Or, to not have to change anything else:

```python
async def _submit_job(self, script_filename):
    return await asyncio.get_running_loop().run_in_executor(None, self._call, ...)
```
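A minimal, self-contained sketch of this pattern (independent of dask-jobqueue; `blocking_submit` and `submit_job` here are stand-ins for the real `_call` and `_submit_job`, and the 0.2-second sleep just simulates a slow scheduler):

```python
import asyncio
import time

def blocking_submit(cmd):
    # Stand-in for a slow, synchronous scheduler call such as condor_submit.
    time.sleep(0.2)
    return f"submitted: {cmd}"

async def submit_job(cmd):
    # Run the blocking call in the default thread-pool executor so the
    # event loop stays responsive while the scheduler is busy.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_submit, cmd)

async def main():
    # The two submissions overlap instead of running back-to-back,
    # and other coroutines can keep running in the meantime.
    results = await asyncio.gather(submit_job("job_a.sh"), submit_job("job_b.sh"))
    print(results)

asyncio.run(main())
```

The key point is that `run_in_executor` takes the callable and its arguments separately, so the blocking function is invoked inside the worker thread rather than on the event loop.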
For the terminology: dask-labextension runs in the JupyterLab UI process (the notebook server), not in the Kernel (a separate process where the code, e.g. the notebook cells, gets executed).
An easy solution is to stop using dask-labextension for starting Dask clusters for the time being 😄. I understand this can be seen as a regression for users... For my part, I've never used dask-labextension to launch Dask clusters on our job scheduling system; I always do it inside a notebook cell (so in the Kernel). I only use the extension to watch my computations. I also think this might be a Condor issue (job submission should be almost immediate in job queueing systems), or that it could perhaps be handled in dask-labextension. Anyway, if you find a simple way to make things asynchronous here, that would be welcome too!
Related to #567
I am experiencing a similar issue as described here. In fact, my workers actually exit because the main process hangs for so long, all because it's busy waiting on the blocking submit call. A change along these lines would fix it:

```python
async def _submit_job(self, script_filename):
    return await asyncio.get_running_loop().run_in_executor(
        None, self._call, shlex.split(self.submit_command) + [script_filename]
    )
```

Would love to see this changed!
Well, I know almost nothing about asyncio. I think we should make dask-jobqueue more compatible with it, but I'm also not sure whether small changes like this are enough, or are they? cc @jacobtomlinson.
@jrueb it would be great to see a PR with this change.
Okay, I will look into it and make a PR once I have a satisfying solution. It will also be interesting to see why the last PR for this was never finished.
I have noticed that execution of commands (e.g., `condor_submit` for the Condor backend) appears to be synchronous. In fact, there's a small note about this in the code itself: https://github.com/dask/dask-jobqueue/blob/master/dask_jobqueue/core.py#L305
We've started to notice this particularly at very busy batch schedulers. For example, when dask-labextension (https://github.com/dask/dask-labextension) is used in a Jupyter notebook, it spawns the Dask scheduler inside the Jupyter hub process (I think I got this terminology right?) and not in the notebook kernel. Because it's in the hub itself, if dask-jobqueue is non-responsive then the entire UI freezes (since no I/O can happen while the event loop is blocked). This triggers user complaints of "Jupyter stops working when we use Dask".
The impact of the blocking behavior can easily be seen by replacing the submit executable with a shell script that does a `sleep 20` before invoking the real submit executable.
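The wrapper described above might look like the following sketch (the path `/tmp/slow_submit.sh` is illustrative, and `condor_submit` is just one example of a real submit executable):

```shell
# Create a wrapper that delays for 20 seconds before delegating
# all arguments to the real submit executable.
cat > /tmp/slow_submit.sh <<'EOF'
#!/bin/sh
sleep 20
exec condor_submit "$@"
EOF
chmod +x /tmp/slow_submit.sh
```

Pointing the cluster's submit command at this wrapper should make the UI freeze visible for the full 20 seconds, confirming that submission blocks the event loop.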