Local Python env Cluster #5013

Open
DPeterK wants to merge 14 commits into dask:main from DPeterK:venv_cluster

Conversation

@DPeterK (Contributor) commented Jul 1, 2021

Add a new distributed cluster manager that provides a Scheduler and Workers running on localhost, but launched with a different, user-specified Python executable.

Rationale
This functionality is particularly desirable when using the dask-labextension with JupyterLab, as dask-labextension launches a LocalCluster using the same Python executable that runs the JupyterLab instance itself. That is not always what you want, notably with cloud-hosted JupyterLab services (such as SageMaker Studio or AzureML), where you have no control over the Python executable running the JupyterLab instance.
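Since the manager accepts an arbitrary user-specified Python executable, it can be worth sanity-checking that path before starting a cluster with it. A minimal stdlib sketch (the helper name `probe_python` is hypothetical, not part of this PR):

```python
import subprocess
import sys

def probe_python(python_path: str) -> str:
    """Return the version reported by the interpreter at python_path.

    A cheap sanity check before handing a user-specified executable to a
    cluster manager: it confirms the path is actually a working Python.
    """
    result = subprocess.run(
        [python_path, "-c", "import sys; print(sys.version.split()[0])"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

# Smoke test against the current interpreter.
print(probe_python(sys.executable))
```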

Note: no tests yet! I'd appreciate some input on how I might test this...

cc @jacobtomlinson @jrbourbeau

@jacobtomlinson (Member) left a comment


This is really awesome. We've discussed this offline a few times and it's really nice to see a PR.

(Three outdated review threads on distributed/deploy/local_env.py)
@jacobtomlinson (Member)

This is looking good @DPeterK. Just needs some tests now. I would look at the SSHCluster and LocalCluster tests for some ideas.

I expect you will need a fixture which figures out the current conda env's name and uses that. Testing multiple envs may be tricky from a setup point of view.
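A minimal sketch of such a fixture's core logic, using only the stdlib (the helper name `current_env_name` is illustrative, not from the PR): it prefers the env var conda sets and falls back to the basename of `sys.prefix`, which also works for plain virtualenvs.

```python
import os
import sys

def current_env_name() -> str:
    """Best-effort name of the active Python environment.

    Prefers the CONDA_DEFAULT_ENV variable that conda activation sets;
    falls back to the last path component of sys.prefix.
    """
    return os.environ.get("CONDA_DEFAULT_ENV") or os.path.basename(sys.prefix)

print(current_env_name())
```

In a pytest suite this would typically be wrapped in a fixture so every test runs against whatever environment CI happens to provide.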

@jacobtomlinson (Member)

@jrbourbeau usually has good pointers on testing.

@GPUtester (Collaborator)

Can one of the admins verify this patch?

@DPeterK (Contributor, Author) commented Aug 2, 2021

I've added a module with tests for this cluster manager, to at least show my intended approach to testing it, as the tests aren't passing currently. There seem to be two problems:

  1. bad integration with the IO loop (?)
  2. the cluster isn't closing cleanly.

Here's the traceback in each case...

1. bad integration with the IO loop:

Traceback (most recent call last):
  File "/.../distributed/distributed/deploy/tests/test_local_env.py", line 82, in test_set_env
    result = await client.submit(f)
  File "/.../distributed/distributed/client.py", line 427, in __await__
    return self.result().__await__()
AttributeError: 'int' object has no attribute '__await__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/.../distributed/distributed/utils.py", line 671, in log_errors
    yield
  File "/.../distributed/distributed/client.py", line 1358, in _close
    await asyncio.wait_for(
  File "/.../miniconda3/envs/dask-distributed/lib/python3.8/asyncio/tasks.py", line 475, in wait_for
    fut = ensure_future(fut, loop=loop)
  File "/.../miniconda3/envs/dask-distributed/lib/python3.8/asyncio/tasks.py", line 678, in ensure_future
    raise ValueError('The future belongs to a different loop than '
ValueError: The future belongs to a different loop than the one specified as the loop argument
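That ValueError is asyncio's guard against awaiting a Future from a loop other than the one it was created on. A minimal stdlib reproduction of the same failure mode, unrelated to distributed itself (the exception class varies by Python version: ValueError on 3.11 and earlier, as in the traceback above, RuntimeError on 3.12+):

```python
import asyncio

# A Future created on one event loop cannot be awaited from another.
loop_a = asyncio.new_event_loop()
foreign_future = loop_a.create_future()

async def await_on_another_loop():
    # asyncio.run() below drives this coroutine on a fresh loop,
    # not loop_a, so awaiting foreign_future must fail.
    await asyncio.wait_for(foreign_future, timeout=1)

try:
    asyncio.run(await_on_another_loop())
except (ValueError, RuntimeError) as exc:
    # The message mentions the loop mismatch, as in the traceback above.
    print(type(exc).__name__)
finally:
    loop_a.close()
```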

2. the cluster isn't closing cleanly:

Traceback (most recent call last):
  File "/.../distributed/distributed/client.py", line 1209, in __del__
    self.close()
  File "/.../distributed/distributed/client.py", line 1449, in close
    sync(self.loop, self._close, fast=True, callback_timeout=timeout)
  File "/.../distributed/distributed/utils.py", line 348, in sync
    raise TimeoutError("timed out after %s s." % (callback_timeout,))
asyncio.exceptions.TimeoutError: timed out after 20 s.

I'll dig into these errors, but I thought it was worth pushing the tests and noting the errors in case they are known / common to others...

I'll also fix the conflict with upstream.


@pytest.mark.asyncio
async def test_basic():
async with LocalEnvCluster(
Member


Whenever you instantiate `LocalEnvCluster` with `async with` you need to also set the `asynchronous=True` kwarg.
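The pattern behind this comment can be illustrated with a toy stand-in (this `DemoCluster` is NOT the real distributed API, just a sketch of the convention): one class serves both blocking and async callers, and the `asynchronous` kwarg tells it which mode the caller intends.

```python
import asyncio

class DemoCluster:
    """Toy stand-in mirroring the Dask convention the review comment
    describes: `async with` usage requires asynchronous=True."""

    def __init__(self, asynchronous: bool = False):
        self.asynchronous = asynchronous
        self.started = False

    async def _start(self):
        self.started = True
        return self

    async def __aenter__(self):
        if not self.asynchronous:
            # Mirrors why the reviewed test failed: entering with
            # `async with` requires opting in to async mode.
            raise RuntimeError("pass asynchronous=True for 'async with'")
        return await self._start()

    async def __aexit__(self, *exc_info):
        self.started = False

async def main():
    async with DemoCluster(asynchronous=True) as cluster:
        assert cluster.started

asyncio.run(main())
```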

@panas2567

Hello, is there a reason why this functionality hasn't been finalized?

@jacobtomlinson (Member)

I think it was just a lack of review capacity. Sorry this has sat here for so long @DPeterK! @panas2567 do you have any interest in trying to push these changes over the line?

@panas2567

Thanks @jacobtomlinson for such a quick response. The case I'm solving now is that I'm setting up a JupyterHub environment in AzureML for my team, so I have to tackle the dependency differences between the default jupyter_env conda env (the scheduler and workers' env) and the notebook's kernel (the client's env).

It would directly solve this challenge if we were able to select a custom environment for the cluster created from the Dask extension/widget in JupyterHub.

@jacobtomlinson (Member)

I think this was also the same use case that @DPeterK had.

@panas2567

I see, I'm happy to help if there is anything still to be implemented.

@jacobtomlinson (Member)

@panas2567 I think the next steps here would be to resolve the merge conflicts and then test this out and see if it solves your problem. Then report back here.

@marberi

marberi commented Mar 3, 2026

+1 for this feature
