## The Dask Distributed Scheduler

Dask ships several schedulers, but when you are starting out you only need to know about the most powerful and feature-complete one: the **distributed scheduler**. It offers more features and richer diagnostics than the others, and it also handles the basics well.

The distributed scheduler can run on a cluster as well as locally. Deploying a remote Dask cluster involves additional setup, which is described in the Dask [setup documentation](https://docs.dask.org/en/latest/setup.html). Alternatively, you can use [Coiled](https://docs.coiled.io/user_guide/index.html#what-is-coiled), a cluster-as-a-service offering that provisions hosted Dask clusters on demand and that you can try for free.

For now, we will set up the scheduler locally. To do so, we create a `Client` object, which lets us interact with the "cluster" (local threads or processes on your machine).

In [None]:
from distributed import Client

client = Client(n_workers=8)  # shorthand for creating a local cluster with 8 workers; omit n_workers to let Dask pick defaults from your machine's cores
client
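
Calling `Client(...)` without a scheduler address builds a local cluster behind the scenes. The sketch below spells that step out with `LocalCluster`. The worker and thread counts are illustrative assumptions, `processes=False` is chosen only so the snippet also runs as a plain script, and `dashboard_address=":0"` asks for any free port.

```python
from dask.distributed import Client, LocalCluster

# Assumed sizes for illustration only; tune them to your machine.
cluster = LocalCluster(
    n_workers=2,
    threads_per_worker=2,
    processes=False,         # thread-based workers inside this process
    dashboard_address=":0",  # let the OS pick a free dashboard port
)
client = Client(cluster)
client.wait_for_workers(2)   # block until both workers have registered

n_workers = len(client.scheduler_info()["workers"])
print("workers:", n_workers)
print("dashboard:", client.dashboard_link)

client.close()
cluster.close()
```

Creating the cluster explicitly is mostly useful when you want control over worker layout; for quick exploration the one-line `Client(...)` form is usually enough.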

## The Dask Dashboard

When we create a distributed scheduler `Client`, it registers itself as the default Dask scheduler. From then on, all `.compute()` calls use the distributed scheduler unless otherwise specified.
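
As a minimal sketch of this behaviour, the snippet below runs the same lazy computation twice: once through the registered client and once with an explicit `scheduler=` override. The `add` function and the small in-process client settings are assumptions made for illustration.

```python
import dask
from dask.distributed import Client


@dask.delayed
def add(a, b):
    # Hypothetical toy task used only for this illustration.
    return a + b


total = add(1, 2)  # lazy: nothing has run yet

# Assumption: a small in-process client so the snippet runs as a plain script.
client = Client(
    processes=False,
    n_workers=1,
    threads_per_worker=2,
    dashboard_address=":0",
)

via_client = total.compute()                           # uses the distributed scheduler
via_override = total.compute(scheduler="synchronous")  # explicitly bypasses the client

print(via_client, via_override)
client.close()
```

The override is handy for debugging, because the synchronous scheduler runs everything in the calling thread.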

The distributed scheduler has many features, which you can learn more about in the Dask distributed documentation, but a nice one to explore is the diagnostic dashboard. We will look at the dashboard as we perform computations; for a brief overview of its main components, see the Dask documentation on diagnosing performance.

If you click the dashboard link in the output of the cell above and then rerun the computation from before, you will see activity on the dashboard.

In [None]:
%%time
ddf.groupby("DayOfWeek")["DepDelay"].mean().compute()

In [None]:
client.close()

## Task Graphs

Let's look at the task graph for our Dask DataFrame to get a sense for where these partitions are coming from:

In [None]:
ddf.visualize()

Each partition in our Dask DataFrame is the result of calling pandas' `read_csv` on an input CSV file in our dataset.

We can view the start of the data with `ddf.head()`.

In [None]:
ddf.head(10)

`.head(10)` triggers a computation and returns the first 10 rows of the DataFrame.

<img src="https://raw.githubusercontent.com/dask/dask/main/docs/source/images/dask-overview.svg" 
     width="100%"
     alt="Dask overview" />