
Add SpecificationCluster #2675

Merged
merged 58 commits into dask:master from mrocklin:spec-cluster on May 22, 2019

Conversation

@mrocklin (Member) commented May 9, 2019

This is intended to be a base for LocalCluster (and others) that want to
specify more heterogeneous information about workers.

Additionally, this PR does the following:

  1. Starts the use of Python 3-only code in the main codebase
  2. Cleans up a number of our intermittent testing failures (we had nannies that survived test cleanup and kept sending random messages to ports, which broke other tests)
  3. Adds a couple of new small failures; notably, silent shutdown is no longer entirely silent (working on it)

Docstring

Cluster that requires a full specification of workers

This attempts to handle much of the logistics of cleanly setting up and
tearing down a scheduler and workers, without handling any of the logic
around user inputs. It should form the base of other cluster creation
functions.

Examples

>>> spec = {
...     'my-worker': {"cls": Worker, "options": {"ncores": 1}},
...     'my-nanny': {"cls": Nanny, "options": {"ncores": 2}},
... }
>>> cluster = SpecCluster(workers=spec)
mrocklin added a commit: Add SpecificationCluster

@mrocklin (Member, Author) commented May 9, 2019

I'm going to try this out with some heterogeneous GPU machines. This feels like a nice base on which to rewrite and clean up LocalCluster though, a prospect about which I am excited :)

@dhirschfeld (Contributor) commented May 16, 2019

Is the spec intended to be per-worker - e.g.:

spec = {
    'worker1': {"cls": Worker, "options": {"ncores": 1}},
    'nanny1': {"cls": Nanny, "options": {"ncores": 2}},
    'worker2': {"cls": Worker, "options": {"ncores": 1}},
    'nanny2': {"cls": Nanny, "options": {"ncores": 2}},
    'worker3': {"cls": Worker, "options": {"ncores": 1}},
    'nanny3': {"cls": Nanny, "options": {"ncores": 2}},
    ...
}
@dhirschfeld (Contributor) commented May 16, 2019

I'm just wondering if now is a good time to introduce the concept of "worker pools":
#2208 (comment)

E.g., you would pass pools in addition to workers and have the workers dict reference the specs defined in pools:

>>> pool_specs = {
...     'default': {
...         'worker': {"cls": Worker, "options": {"ncores": 1}},
...         'nanny': {"cls": Nanny, "options": {"ncores": 2}},
...     },
...     'no-nanny': {
...         'worker': {"cls": Worker, "options": {"ncores": 1}},
...     },
... }

>>> worker_specs = {'worker1': 'default', 'worker2': 'no-nanny'}
>>> cluster = SpecCluster(workers=worker_specs, pools=pool_specs)
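One way this could be layered on top of the flat spec (a sketch only; pools is not an argument SpecCluster accepts in this PR, and expand_pools is a hypothetical helper) is to expand the pool references into per-worker entries before constructing the cluster:

def expand_pools(worker_specs, pool_specs):
    # Expand {worker-name: pool-name} references into the flat
    # per-worker spec format that SpecCluster already understands.
    workers = {}
    for worker_name, pool_name in worker_specs.items():
        for role, spec in pool_specs[pool_name].items():  # e.g. 'worker', 'nanny'
            workers["%s-%s" % (worker_name, role)] = spec
    return workers

cluster = SpecCluster(workers=expand_pools(worker_specs, pool_specs))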

mrocklin added some commits May 16, 2019

Cleanup the handling of nannies
Previously nannies could leak out in various ways
@mrocklin (Member, Author) commented May 16, 2019

I'm just wondering if now is a good time to introduce the concept of "worker pools":

This is related to that issue, but is lower level. I think it would enable other people to add things like pools more easily. If this is something you'd like to explore, I encourage you to do so now; I agree this would be a good time to explore it and help guide the design.

Review thread on distributed/deploy/local.py (outdated):
    # If people call this frequently, we only want to run it once
    return self._correct_state_waiting
else:
    task = asyncio.Task(self._correct_state_internal())

@jcrist (Member) commented May 21, 2019

You shouldn't create Tasks manually, but instead use asyncio.ensure_future.
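A minimal sketch of that suggestion, applied to the excerpt above:

# Let asyncio schedule the coroutine rather than constructing the Task by hand.
task = asyncio.ensure_future(self._correct_state_internal())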

Review thread on distributed/deploy/spec.py:
d = self.worker_spec[name]
cls, opts = d["cls"], d.get("options", {})
if "name" not in opts:
    opts = toolz.merge({"name": name}, opts, {"loop": self.loop})

@jcrist (Member) commented May 21, 2019

Did you mean to include the loop in here?

@mrocklin (Member, Author) commented May 21, 2019

Yes, ideally we want the worker to use the IOLoop used by the cluster object.

@jcrist (Member) commented May 21, 2019

I mean that loop is only added if name is not in opts. Wouldn't you always want to pass it?

@mrocklin (Member, Author) commented May 21, 2019

Ah, indeed. Looking at this again it looks like we do this in an async def function anyway, so IOLoop.current() should be valid regardless. I'll remove the reference to loop entirely, which should be helpful in reducing the contract too.
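A rough sketch of what that change could look like against the excerpt above (not the final diff from this PR):

d = self.worker_spec[name]
cls, opts = d["cls"], d.get("options", {})
if "name" not in opts:
    # Always give the worker its spec name; no explicit loop is passed,
    # since IOLoop.current() is already correct inside an async def function.
    opts = toolz.merge({"name": name}, opts)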

if workers:
    await asyncio.wait(workers)
    for w in workers:
        w._cluster = weakref.ref(self)

@jcrist (Member) commented May 21, 2019

What is the cluster weakref for?

@mrocklin (Member, Author) commented May 21, 2019

There are a lot of weakrefs around now. They're useful when tracking down leaking references to things.
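For context, a small self-contained illustration of the technique: a weak reference does not keep its target alive, so after cleanup it should resolve to None, and anything else points to a leak.

import gc
import weakref

class Cluster:
    pass

cluster = Cluster()
ref = weakref.ref(cluster)   # does not keep the cluster alive

del cluster
gc.collect()

# If some other object still held a strong reference, ref() would return it here.
assert ref() is None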

for w in workers:
    w._cluster = weakref.ref(self)
    if self.status == "running":
        await w

@jcrist (Member) commented May 21, 2019

The non-running workers are never awaited; what happens to them? They're still added to the workers dict below.

@mrocklin (Member, Author) commented May 21, 2019

This is again a tornado/asyncio difference. I've removed the running check and made things optimal, I think, for both async def and gen.coroutine-style functions.


async def _close(self):
    while self.status == "closing":
        await asyncio.sleep(0.1)

@jcrist (Member) commented May 21, 2019

Instead of polling, could we have a future for the closing operation (created by the first call to _close) and just wait on that?
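A minimal sketch of that suggestion (hypothetical attribute and method names; the PR keeps the polling loop for now):

def close(self):
    # The first caller creates the closing task; later callers await the
    # same future instead of polling self.status.
    if self._close_task is None:
        self._close_task = asyncio.ensure_future(self._close_internal())
    return self._close_task

async def _close_internal(self):
    ...  # shut down the workers, then the scheduler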

@mrocklin (Member, Author) commented May 21, 2019

Good thought. I'm inclined to wait on this for now though, if that's OK.


def _correct_state(self):
    if self._correct_state_waiting:
        # If people call this frequently, we only want to run it once

@jcrist (Member) commented May 21, 2019

I think this drops scale requests while a current scale request is processing:

  • Call scale
  • Spec updated
  • Correct-state task starts; the task is stored as _correct_state_waiting
  • scale returns
  • Call scale again
  • Spec updated
  • Since the previous call is still in progress, state is not corrected and no new workers are started/stopped; the spec and tasks are now out of sync. Also, since there are multiple await calls in _correct_state_internal, the worker_spec can be different at different points in that function, leading to potential bugs.

One naive solution would be to have a background task that loops forever, waiting on an event:

while self.running:
    await self._spec_updated.wait()
    # update workers to match spec
    # After updating, only clear the event if things are up to date
    # If things aren't up to date, then we loop again
    if self.spec_matches_current_state():
        self._spec_updated.clear()

Then _correct_state would look like:

def _correct_state(self):
    # set the event, it's only ever cleared in the loop
    # We force synchronization here to prevent scheduling tons
    # of tasks all setting the event, this blocks until it's set.
    return self.sync(self._mark_state_updated)

async def _mark_state_updated(self):
    self._spec_updated.set()

There are likely other ways to handle this. In dask-gateway I have a task per worker/scheduler. As the spec updates, unfinished tasks are cancelled or new ones are fired. If a previous scale call is still in progress for a cluster, scale will block until that call has finished. Note that this only blocks while we update our internal task state (cancelling/firing new tasks), not until those tasks have completed.
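A rough sketch of that task-per-worker approach, with hypothetical names (not dask-gateway's actual code):

def reconcile(self):
    # Cancel tasks for workers no longer in the spec and fire new tasks
    # for workers that were just added; do not wait for them to finish.
    desired = set(self.worker_spec)
    running = set(self.worker_tasks)
    for name in running - desired:
        self.worker_tasks.pop(name).cancel()
    for name in desired - running:
        self.worker_tasks[name] = asyncio.ensure_future(self._start_worker(name))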

@mrocklin (Member, Author) commented May 21, 2019

Since the previous call is still in progress, state is not corrected and no new workers are started/stopped; the spec and tasks are now out of sync. Also, since there are multiple await calls in _correct_state_internal, the worker_spec can be different at different points in that function, leading to potential bugs.

So, the _correct_state_waiting attribute isn't the currently running task; it's the currently enqueued one. Once _correct_state starts running, it immediately clears this attribute. After someone calls scale, there is a clean, not-yet-run _correct_state_waiting future that will run soon.
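A sketch of that enqueue pattern, pieced together from the excerpts in this review and using asyncio.ensure_future as suggested earlier (not the exact code in the PR):

def _correct_state(self):
    if self._correct_state_waiting:
        # If people call this frequently, we only want to run it once
        return self._correct_state_waiting
    else:
        task = asyncio.ensure_future(self._correct_state_internal())
        self._correct_state_waiting = task
        return task

async def _correct_state_internal(self):
    # Clear the "enqueued" slot immediately, so a later scale() call
    # enqueues a fresh correction that sees the updated worker_spec.
    self._correct_state_waiting = None
    ...  # start/stop workers so they match self.worker_spec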

@jcrist (Member) commented May 22, 2019

Since _correct_state_internal waits on the created workers, this does mean that there's no way to cancel pending workers. This is fine for LocalCluster, but would be problematic if used as a base class for other cluster managers. The following would request and start 100 workers before scaling back down afaict:

cluster.scale(100)
cluster.scale(2)

@mrocklin (Member, Author) commented May 22, 2019

I think that this depends on what you mean by "waits on".

One approach is that, for a cluster manager to reach a correct state, it only has to have successfully submitted a request to the resource manager and received an acknowledgement that the resource manager is handling it. We're not guaranteeing full deployment, merely that we've done our part of the job. I would expect this to almost always be fairly fast.

Separately, there is now a Client.wait_for_workers(n=10) method that might be used for full client <-> scheduler checks.
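For example (assuming a client connected to a cluster built from this class; the exact keyword name may differ from the n= shown above, so the argument is passed positionally here):

from dask.distributed import Client

client = Client(cluster)      # connect a client to the cluster's scheduler
client.wait_for_workers(10)   # block until 10 workers have registered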


async def _start(self):
    while self.status == "starting":
        await asyncio.sleep(0.01)

@jcrist (Member) commented May 21, 2019

Same here as with closing: we could wait on the start task instead of polling.


def __enter__(self):
    self.sync(self._correct_state)
    self.sync(self._wait_for_workers)

@jcrist (Member) commented May 21, 2019

Does this mean that __enter__ will only complete once the initial n workers have started? What happens if we request 2 workers, one starts, and one fails?

@mrocklin (Member, Author) commented May 21, 2019

Yes, this might hang. I'm not sure we ever had a test in our test suite for this case. I'll add something.

@mrocklin (Member, Author) commented May 21, 2019

Added a test in 5e94069

mrocklin added some commits May 21, 2019

@mrocklin force-pushed the mrocklin:spec-cluster branch from d157410 to 5e94069 on May 21, 2019

@mrocklin (Member, Author) commented May 21, 2019

Thanks for the review, @jcrist! If you have a chance to pass through things tomorrow, I would appreciate it.

@mrocklin (Member, Author) commented May 22, 2019

I plan to merge this later today if there are no further comments. Tests here are pretty decent, although I'll need to overhaul adaptive; I'd like to do that in a separate PR though.

@mrocklin (Member, Author) commented May 22, 2019

OK. Merging this in. I intend to be active in this area for a while, so if there are still issues please feel free to raise them. I plan to do the following:

  1. Fix up adaptive so that it moves logic into the scheduler, and makes tests here pass
  2. Try out SpecCluster with Dask-Kubernetes. I imagine that this will force some changes here.

@mrocklin merged commit 6e0c0a6 into dask:master on May 22, 2019

1 of 2 checks passed:

  • continuous-integration/appveyor/pr: AppVeyor build failed
  • continuous-integration/travis-ci/pr: Travis CI build passed

@mrocklin deleted the mrocklin:spec-cluster branch on May 22, 2019

@jrbourbeau referenced this pull request on May 23, 2019

lesteve added a commit to lesteve/distributed that referenced this pull request May 29, 2019

Add back LocalCluster.__repr__.
LocalCluster.__repr__ was removed in dask#2675.


mrocklin added a commit that referenced this pull request May 29, 2019

Add back LocalCluster.__repr__. (#2732)
LocalCluster.__repr__ was removed in #2675.

calebho added a commit to calebho/distributed that referenced this pull request May 29, 2019

Add SpecificationCluster (dask#2675)
This is intended to be a base for LocalCluster (and others) that want to
specify more heterogeneous information about workers.

This forces the use of Python 3 and introduces more asyncio and async def handling.

This cleans up a number of intermittent testing failures and improves our testing harness hygiene.

calebho added a commit to calebho/distributed that referenced this pull request May 29, 2019

Add back LocalCluster.__repr__. (dask#2732)
LocalCluster.__repr__ was removed in dask#2675.