# 3.1. A Cheyenne Example

This example uses one of the Research Data Archive (RDA) datasets available on Cheyenne's GLADE storage system.

In [None]:
import dask
import time

## How to Set Up Your Dask Distributed Cluster on Cheyenne

The easiest way to start a cluster on Cheyenne is to use the `Client` object from `dask.distributed` together with a `PBSCluster` object from the `dask-jobqueue` package.

In [None]:
from dask.distributed import Client
from dask_jobqueue import PBSCluster

# Describe one PBS job: a full 36-core node split into 6 worker processes for 1 hour
cluster = PBSCluster(
    queue='regular', project='NIOW0001',
    processes=6, cores=36,
    memory='64GB', walltime='01:00:00',
)
client = Client(cluster)

### PBSCluster

The `PBSCluster` object defines what a single worker job looks like on a PBS-based job queuing system, like the one on NCAR's Cheyenne. Each worker job is submitted to the PBS queuing system separately, as its own job, and can host several Dask worker processes. So, the arguments of the `PBSCluster` object define what is requested for each worker job via the PBS queuing system:

  - `queue`: The job queue the worker job is submitted to.
  - `project`: The project ID code for your allocation on Cheyenne.
  - `processes`: The number of separate Python processes (Dask workers) launched by a single worker job.
  - `cores`: The total number of cores given to a single worker job.
  - `memory`: The total memory available to a single worker job.
  - `walltime`: The walltime to give to the worker job.
  
The setup above initializes a cluster with 0 workers. No jobs have been submitted to the PBS queue yet. However, once we submit jobs to the PBS queue (with the `scale` command), each job will run 6 processes (Dask workers) with 6 cores per process (36 cores total, or a full Cheyenne node) for up to 1 hour.
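
The per-process breakdown implied by these arguments can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that `dask-jobqueue` divides a job's `cores` and `memory` evenly among its `processes`; the helper name `per_process_resources` is hypothetical and not part of any library.

```python
def per_process_resources(cores, memory_gb, processes):
    """Split one job's cores and memory evenly among its worker processes.

    Assumption: resources are divided evenly, with any leftover cores unused.
    """
    cores_each = cores // processes
    memory_each_gb = memory_gb / processes
    return cores_each, memory_each_gb


# Values from the PBSCluster call above: cores=36, memory='64GB', processes=6
cores_each, memory_each_gb = per_process_resources(36, 64, 6)
print(f"{cores_each} cores and about {memory_each_gb:.1f} GB per worker process")
```

With the settings used here, each Dask worker ends up with 6 cores and a little under 11 GB of memory.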

In [None]:
cluster

### Dashboard

In the above output (of the `cluster` object), we see the number of workers and the total number of cores in the cluster. Currently, we see 0 workers and 0 cores.

We also see a pointer to the Dashboard.  Unfortunately, because we are accessing this cluster via an SSH tunnel, we cannot click this link and be taken to the Dashboard directly.  Instead, we have to note the Dashboard port number (probably 8787) and create *another* SSH tunnel to redirect that port to our browser.

In a terminal, do that now. It should look exactly like the other SSH tunnel command, but with the port numbers changed to match the port of the above link. Then, open another browser tab pointing to *localhost:port*.
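
A port-forwarding tunnel generally has the form `ssh -N -L port:localhost:port user@host`. As a sketch, the snippet below assembles that command string for a given Dashboard port. The username `jdoe` and the hostname `cheyenne.ucar.edu` are placeholders chosen for illustration, and `dashboard_tunnel_command` is a hypothetical helper; use the same user and host as in your existing tunnel command.

```python
def dashboard_tunnel_command(port, user, host):
    """Build an SSH command forwarding a local port to the same port on `host`.

    Assumption: the Dask scheduler (and its Dashboard) runs on `host` itself.
    """
    # -N: open the tunnel without running a remote command
    # -L: forward local `port` to localhost:`port` as seen from `host`
    return f"ssh -N -L {port}:localhost:{port} {user}@{host}"


# Placeholder user and host; 8787 is the Dashboard port mentioned in the text
command = dashboard_tunnel_command(8787, "jdoe", "cheyenne.ucar.edu")
print(command)
```

Run the printed command in a separate terminal and leave it open for as long as you need the Dashboard.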

### Scaling up your cluster

Now that we have the cluster, and we have the SSH tunnel open to the dashboard, we can request some workers using the `scale` command of the cluster.

In [None]:
cluster.scale(4)

If you watch the `PBSCluster` box above, you will eventually see the number of workers rise to 24 (and the number of cores rise to 144). By submitting 4 PBS jobs to the queue, we eventually launch 24 Dask workers, each with 6 cores.
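
These totals follow directly from the job layout. Below is a minimal sketch of the arithmetic, assuming `scale(4)` submits 4 jobs as described here; the helper `expected_totals` is hypothetical. Depending on the installed `dask-jobqueue` version, the positional argument to `scale` may instead be interpreted as a number of workers, in which case `cluster.scale(jobs=4)` requests 4 jobs explicitly.

```python
def expected_totals(jobs, processes_per_job, cores_per_job):
    """Return (workers, cores) once all requested jobs are running."""
    workers = jobs * processes_per_job
    cores = jobs * cores_per_job
    return workers, cores


# 4 jobs, each with processes=6 and cores=36 as configured above
workers, cores = expected_totals(4, 6, 36)
print(f"{workers} workers, {cores} cores")
```

In a live session, `client.wait_for_workers(24)` blocks until that many workers have connected, which is useful when the jobs spend time waiting in the PBS queue.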

## Example: 