# Dask jobqueue example

## What is Dask jobqueue? (<https://jobqueue.dask.org/>)

* deploys Dask workers as batch jobs on typical HPC job queueing systems such as SLURM, PBS, SGE and LSF

## Monte-Carlo estimate with multiple Dask batch job workers

We define a Dask jobqueue cluster on SLURM whose Dask workers each have 8 cores and 48 GB of memory. Every batch job runs exactly one such worker.

```python
import dask, dask.distributed
import dask_jobqueue
```

```python
cluster = dask_jobqueue.SLURMCluster(

    # Dask worker size
    cores=8, memory='48GB',
    processes=1,  # Dask workers per job

    # SLURM job script things
    queue='cluster', walltime='00:15:00',

    # Dask worker network and temporary storage
    interface='ib0', local_directory='$TMPDIR'
)

client = dask.distributed.Client(cluster)
cluster.scale(jobs=1)
```

```python
client
```

| Client | Cluster |
| --- | --- |
| Scheduler: `tcp://172.18.4.11:37357` | Workers: 0 |
| Dashboard: `http://172.18.4.11:8787/status` | Cores: 0 |
| | Memory: 0 B |


### Let's scale up the cluster

```python
cluster.scale(jobs=8)
```

```python
client
```

| Client | Cluster |
| --- | --- |
| Scheduler: `tcp://172.18.4.11:37357` | Workers: 8 |
| Dashboard: `http://172.18.4.11:8787/status` | Cores: 64 |
| | Memory: 384.00 GB |
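The totals in the client summary follow directly from the per-job worker size passed to `SLURMCluster`. A minimal sketch of that arithmetic, using a hypothetical helper `cluster_totals` (not part of Dask jobqueue) and assuming every requested job is already running:

```python
def cluster_totals(jobs, cores=8, memory_gb=48, processes=1):
    """Hypothetical helper: resources provided by `jobs` running batch jobs.

    Mirrors SLURMCluster(cores=8, memory='48GB', processes=1): each job
    gets `cores` cores and `memory_gb` GB, shared by `processes` workers.
    """
    workers = jobs * processes
    total_cores = jobs * cores
    total_memory_gb = jobs * memory_gb
    return workers, total_cores, total_memory_gb


print(cluster_totals(8))   # 8 jobs
print(cluster_totals(32))  # 32 jobs, once all of them have started
```

With `processes=1`, scaling by jobs and scaling by workers are the same thing; with more processes per job, each job would contribute several smaller workers.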


### From here on, everything works the same as with a LocalCluster

```python
import numpy, dask.array
from IPython.display import display  # implicitly available inside Jupyter

def calculate_pi(size_in_bytes, number_of_chunks):

    """Calculate pi using a Monte Carlo method."""

    # float64 values take 8 bytes, and each position is an (x, y) pair
    array_shape = (size_in_bytes // 8 // 2, 2)
    # the number of chunks sets the number of tasks per array operation
    chunk_size = (array_shape[0] // number_of_chunks, 2)

    # 2D random positions array using dask.array
    xy = dask.array.random.uniform(
        low=0.0, high=1.0, size=array_shape,
        chunks=chunk_size)

    # boolean: does the position lie inside the quarter circle?
    xy_inside_circle = (xy ** 2).sum(axis=1) < 1

    # fraction inside is about pi / 4 (still a lazy Dask expression)
    pi = 4 * xy_inside_circle.sum() / xy_inside_circle.size

    # start Dask calculation
    pi = pi.compute()

    print(f"\nfrom {xy.nbytes / 1e9} GB randomly chosen positions")
    print(f"   pi estimate: {pi}")
    print(f"   pi error: {abs(pi - numpy.pi)}\n")
    display(xy)

    return pi
```
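To check the estimator itself without a cluster, here is a serial pure-Python sketch of the same idea: the fraction of uniformly drawn points in the unit square that fall inside the quarter circle approaches π/4. The function name `estimate_pi` is hypothetical and not used elsewhere in this notebook; it only suits small sample sizes.

```python
import math
import random


def estimate_pi(n_points, seed=0):
    """Estimate pi from n_points uniform samples in the unit square.

    Same estimator as calculate_pi, but serial and without Dask.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1:  # point lies inside the quarter circle
            inside += 1
    # the quarter circle covers pi/4 of the unit square
    return 4 * inside / n_points


pi_est = estimate_pi(200_000)
print(pi_est, abs(pi_est - math.pi))
```

Dask applies exactly this computation chunk by chunk, so the result only differs in scale and in the random numbers drawn.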

### Let's calculate again...

```python
%time pi = calculate_pi(size_in_bytes=10_000_000_000, number_of_chunks=100) # 10 GB
```

```
from 10.0 GB randomly chosen positions
   pi estimate: 3.1415877376
   pi error: 4.915989793019548e-06
```

| | Array | Chunk |
| --- | --- | --- |
| Bytes | 10.00 GB | 100.00 MB |
| Shape | (625000000, 2) | (6250000, 2) |
| Count | 100 Tasks | 100 Chunks |
| Type | float64 | numpy.ndarray |

```
CPU times: user 296 ms, sys: 27.1 ms, total: 323 ms
Wall time: 905 ms
```
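The Bytes and Shape rows of the array summary follow from the sizing arithmetic at the top of `calculate_pi`. A small sketch reproducing it, with a hypothetical helper `array_layout` that assumes 8-byte float64 values and one (x, y) pair per position:

```python
def array_layout(size_in_bytes, number_of_chunks, itemsize=8):
    """Hypothetical helper: array and chunk shapes built by calculate_pi."""
    n_points = size_in_bytes // itemsize // 2      # one (x, y) pair per point
    array_shape = (n_points, 2)
    chunk_shape = (n_points // number_of_chunks, 2)
    chunk_bytes = chunk_shape[0] * 2 * itemsize    # bytes held by one chunk
    return array_shape, chunk_shape, chunk_bytes


print(array_layout(10_000_000_000, 100))
# ((625000000, 2), (6250000, 2), 100000000)  -> 10 GB array, 100 MB chunks
```

Fewer, larger chunks mean fewer tasks for the scheduler to manage, but each task needs more worker memory; the chunk size is the knob that trades one against the other.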


```python
%time pi = calculate_pi(size_in_bytes=100_000_000_000, number_of_chunks=250) # 100 GB
```

```
from 100.0 GB randomly chosen positions
   pi estimate: 3.14158325824
   pi error: 9.395349793273056e-06
```

| | Array | Chunk |
| --- | --- | --- |
| Bytes | 100.00 GB | 400.00 MB |
| Shape | (6250000000, 2) | (25000000, 2) |
| Count | 250 Tasks | 250 Chunks |
| Type | float64 | numpy.ndarray |

```
CPU times: user 1.32 s, sys: 74.8 ms, total: 1.39 s
Wall time: 5.54 s
```


```python
# %time pi = calculate_pi(size_in_bytes=1_000_000_000_000, number_of_chunks=2_000) # 1 TB
```

### And we can scale up the cluster whenever needed

```python
cluster.scale(jobs=32)
```

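```python
client
```

| Client | Cluster |
| --- | --- |
| Scheduler: `tcp://172.18.4.11:37357` | Workers: 8 |
| Dashboard: `http://172.18.4.11:8787/status` | Cores: 64 |
| | Memory: 384.00 GB |

The additional workers only connect once their SLURM jobs leave the queue and start, so right after `cluster.scale(jobs=32)` the client may still report just the 8 workers that were already running.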


### Let's calculate again...

```python
%time pi = calculate_pi(size_in_bytes=1_000_000_000_000, number_of_chunks=1_000) # 1 TB
```

```
from 1000.0 GB randomly chosen positions
   pi estimate: 3.141592672512
   pi error: 1.8922206912463935e-08
```

| | Array | Chunk |
| --- | --- | --- |
| Bytes | 1000.00 GB | 1000.00 MB |
| Shape | (62500000000, 2) | (62500000, 2) |
| Count | 1000 Tasks | 1000 Chunks |
| Type | float64 | numpy.ndarray |

```
CPU times: user 8.15 s, sys: 267 ms, total: 8.41 s
Wall time: 19.4 s
```
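Dividing the data volume by the wall time gives a rough feel for how throughput grows with problem size and cluster size. A small sketch using the timings reported above; the "rate" counts random positions generated and reduced on the workers, not data moved over the network, and the output does not show how many of the 32 requested jobs were actually running during the 1 TB run.

```python
def generation_rate(size_gb, wall_seconds):
    """GB of random positions generated and reduced per wall-clock second."""
    return size_gb / wall_seconds


# (data size in GB, wall time in s) as reported by %time in this notebook
runs = {
    "10 GB, 8 jobs": (10, 0.905),
    "100 GB, 8 jobs": (100, 5.54),
    "1 TB, after requesting 32 jobs": (1000, 19.4),
}

for label, (size_gb, wall_s) in runs.items():
    print(f"{label}: {generation_rate(size_gb, wall_s):.1f} GB/s")
```

Small problems are dominated by fixed scheduling overhead, which is why the rate keeps rising as the problem grows.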


### Let's scale adaptively again

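With `cluster.adapt()`, Dask jobqueue scales the number of workers up and down with the amount of pending work, within the given `minimum` and `maximum`. You can also specify a `target_duration`, the time in which the computation should ideally finish; a shorter target makes the cluster scale up more aggressively.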
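As a rough mental model (a simplification, not the exact Dask implementation): the scheduler estimates the total compute time of the pending work, asks for about that much divided by `target_duration` workers, never more workers than there are ready tasks, and the result is clamped to `minimum` and `maximum`. The function `adaptive_target` and the per-task cost of 8 CPU seconds below are hypothetical.

```python
import math


def adaptive_target(cpu_seconds, target_duration, minimum, maximum, ready_tasks):
    """Simplified sketch of the worker count requested by adaptive scaling.

    Assumption: only the CPU-based rule is modelled (expected work divided
    by target duration, capped by the number of ready tasks); memory
    pressure and other scheduler details are ignored.
    """
    wanted = math.ceil(cpu_seconds / target_duration)
    wanted = min(wanted, ready_tasks)          # no more workers than tasks
    return max(minimum, min(maximum, wanted))  # clamp to adapt() bounds


# hypothetical workloads, 8 CPU seconds per task, target_duration="160s"
print(adaptive_target(1000 * 8.0, 160, minimum=2, maximum=32, ready_tasks=1000))  # -> 32
print(adaptive_target(100 * 8.0, 160, minimum=2, maximum=32, ready_tasks=100))    # -> 5
print(adaptive_target(0.0, 160, minimum=2, maximum=32, ready_tasks=0))            # -> 2
```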

```python
from time import sleep
```

```python
ca = cluster.adapt(
    minimum=2, maximum=32,
    target_duration="160s",  # measured in CPU time per worker
                             # -> 20 seconds at 8 cores / worker
    wait_count=5  # scale down less aggressively
);

sleep(10)  # allow for scale-down
```
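`wait_count` delays scale-down: a worker is only retired after it has been suggested for removal at several consecutive adaptive checks, and any check where it is needed resets the count. A simplified sketch of that rule with a hypothetical function `removal_step`; assuming the adaptive check runs about once per second (the `distributed.adaptive.interval` default, as far as we know), `wait_count=5` means roughly five seconds of idleness, which the `sleep(10)` above comfortably covers.

```python
def removal_step(idle_history, wait_count=5):
    """Return the check number at which an idle worker would be retired.

    Simplified sketch: removal needs `wait_count` consecutive checks that
    suggest removing the worker; one busy check resets the streak.
    Returns None if the worker is never retired.
    """
    streak = 0
    for step, idle in enumerate(idle_history, start=1):
        streak = streak + 1 if idle else 0
        if streak >= wait_count:
            return step
    return None


print(removal_step([True] * 10))                                       # -> 5
print(removal_step([True, True, False, True, True, True, True, True]))  # -> 8
print(removal_step([True] * 4))                                        # -> None
```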

```python
%time pi = calculate_pi(size_in_bytes=10_000_000_000, number_of_chunks=100) # 10 GB
```

```
from 10.0 GB randomly chosen positions
   pi estimate: 3.1416929536
   pi error: 0.00010030001020666646
```

| | Array | Chunk |
| --- | --- | --- |
| Bytes | 10.00 GB | 100.00 MB |
| Shape | (625000000, 2) | (6250000, 2) |
| Count | 100 Tasks | 100 Chunks |
| Type | float64 | numpy.ndarray |

```
CPU times: user 897 ms, sys: 92.3 ms, total: 990 ms
Wall time: 2.06 s
```


```python
%time pi = calculate_pi(size_in_bytes=1_000_000_000_000, number_of_chunks=1_000) # 1 TB
```

```
from 1000.0 GB randomly chosen positions
   pi estimate: 3.14160074656
   pi error: 8.092970206874384e-06
```

| | Array | Chunk |
| --- | --- | --- |
| Bytes | 1000.00 GB | 1000.00 MB |
| Shape | (62500000000, 2) | (62500000, 2) |
| Count | 1000 Tasks | 1000 Chunks |
| Type | float64 | numpy.ndarray |

```
CPU times: user 21.1 s, sys: 806 ms, total: 21.9 s
Wall time: 52.6 s
```
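The printed errors vary a lot between runs of the same size (at 10 GB: about 4.9e-6 and 1.0e-4; at 1 TB: about 1.9e-8 and 8.1e-6). That is expected: each position lands inside the quarter circle with probability p = π/4, so the number of hits is binomial and the estimate 4 · hits / n has a standard deviation of 4·√(p(1 − p)/n). A short sketch of that formula (the function name `expected_pi_error` is hypothetical) gives roughly 6.6e-5 for 10 GB and 6.6e-6 for 1 TB, so 100 times more data only buys about 10 times more accuracy.

```python
import math


def expected_pi_error(n_points):
    """Standard deviation of the Monte Carlo estimate 4 * hits / n_points.

    Each point is inside the quarter circle with probability p = pi / 4,
    so hits is binomial and the estimate has std 4 * sqrt(p * (1 - p) / n).
    """
    p = math.pi / 4
    return 4 * math.sqrt(p * (1 - p) / n_points)


for label, size_in_bytes in [("10 GB", 10**10), ("1 TB", 10**12)]:
    n_points = size_in_bytes // 16  # one (x, y) float64 pair = 16 bytes
    print(f"{label}: expected pi error ~ {expected_pi_error(n_points):.1e}")
```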


```python
%time pi = calculate_pi(size_in_bytes=3_000_000_000_000, number_of_chunks=3_000) # 3 TB
```

KeyboardInterrupt: 

distributed.core - ERROR - Exception while handling op heartbeat_worker
Traceback (most recent call last):
  File "/gxfs_home/geomar/smomw122/miniconda3/envs/dask_jobqueue_workshop/lib/python3.8/site-packages/distributed/core.py", line 493, in handle_comm
    result = handler(comm, **msg)
  File "/gxfs_home/geomar/smomw122/miniconda3/envs/dask_jobqueue_workshop/lib/python3.8/site-packages/distributed/scheduler.py", line 2196, in heartbeat_worker
    ws._executing = {
  File "/gxfs_home/geomar/smomw122/miniconda3/envs/dask_jobqueue_workshop/lib/python3.8/site-packages/distributed/scheduler.py", line 2197, in <dictcomp>
    self.tasks[key]: duration for key, duration in executing.items()
KeyError: "('sum-aggregate-uniform-sum-bc7d966be595b6966d4096ce842034fb', 2333)"
distributed.core - ERROR - Exception while handling op heartbeat_worker
Traceback (most recent call last):
  File "/gxfs_home/geomar/smomw122/miniconda3/envs/dask_jobqueue_workshop/lib/python3.8/site-packages/distributed/cor