Calling Client() twice confuses new users #2186

@mrocklin

Most Dask notebooks start off with the following three lines:

from dask.distributed import Client
client = Client()
client

Then users click on the dashboard link, and do some work.

Eventually the user runs all the cells in their notebook, rerunning the Client() call. This is a problem because it creates a new LocalCluster (Client with no arguments creates a LocalCluster). Now we have two clusters running, which causes two problems:

  1. The old cluster still takes up modest resources. This isn't that bad, but it is slightly inefficient.
  2. The new cluster serves the diagnostic dashboard on a random port, so the dashboard the user already has open appears to be non-responsive.
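
To illustrate the second problem, here is a minimal sketch of the port fallback using only the standard library. It is not Dask's actual server code: `bind_preferred` is a hypothetical helper, and the assumption is that the new dashboard lands on a random port because the old cluster still holds the preferred one.

```python
import socket


def bind_preferred(preferred_port, host="127.0.0.1"):
    """Hypothetical helper: listen on preferred_port, else on a random free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, preferred_port))
    except OSError:
        # The preferred port is taken (e.g. by the first server), so start
        # over with a fresh socket and let the OS pick any free port.
        sock.close()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))
    sock.listen()
    return sock


# The first "dashboard" claims a port (0 lets the OS choose one for this demo).
first = bind_preferred(0)
first_port = first.getsockname()[1]

# The second "dashboard" asks for the same port, cannot get it, and ends up
# somewhere else, so a browser tab pointed at first_port never sees it.
second = bind_preferred(first_port)
second_port = second.getsockname()[1]

first.close()
second.close()
```

A browser tab opened against the first port keeps talking to the old server, which is why the dashboard looks stale after the rerun.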

There are a few potential solutions to this problem:

  1. We could change the policy of Client() with no arguments from "Create a LocalCluster" to "Use a pre-existing cluster if present, otherwise make one". However, this becomes complex if the keyword arguments passed down to LocalCluster change, like changing Client(processes=False) to Client(processes=True). In this case we might consider closing the old LocalCluster before starting the new one so that we can claim the correct port.
  2. We might consider raising an error if we end up choosing a random bokeh port when the user didn't explicitly request one.
  3. We might consider raising a hard error when Client() is called twice, encouraging the user to call client.cluster.close().
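
The first option could be sketched roughly as follows. This is a toy model, not Dask code: `FakeCluster` and `get_or_create_cluster` are hypothetical names, and the registry is a plain module-level variable standing in for whatever bookkeeping the library would actually use.

```python
class FakeCluster:
    """Hypothetical stand-in for a local cluster; not part of Dask."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.closed = False

    def close(self):
        self.closed = True


# Most recently created cluster, if any (assumed bookkeeping for this sketch).
_current = None


def get_or_create_cluster(**kwargs):
    """Reuse the live cluster when kwargs match; otherwise close it and make a new one."""
    global _current
    if _current is not None and not _current.closed:
        if _current.kwargs == kwargs:
            # Same configuration as before: rerunning the cell is a no-op.
            return _current
        # Configuration changed: close the old cluster first so the new one
        # can claim the resources (such as the dashboard port) it was holding.
        _current.close()
    _current = FakeCluster(**kwargs)
    return _current


a = get_or_create_cluster()
b = get_or_create_cluster()                 # rerun of the same cell
c = get_or_create_cluster(processes=False)  # changed keyword arguments
```

Here `a` and `b` are the same object, while creating `c` closes `a` first. The third option would replace the reuse branch with a raised exception instead.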
