# Dask jobqueue example

## What is Dask jobqueue? (<https://jobqueue.dask.org/>)

* deploys Dask workers on typical HPC job queueing systems

## Monte-Carlo estimate with multiple Dask batch job workers

We define a Dask jobqueue cluster with Dask workers that each have 4 CPUs and 24 GB of memory.

In [1]:
from time import sleep

import dask, dask.distributed
import dask_jobqueue

In [2]:
cluster = dask_jobqueue.SLURMCluster(

    # Dask worker size
    cores=4, memory='24GB',
    processes=1, # Dask workers per job
    
    # SLURM job script settings
    queue='cluster', walltime='00:15:00',
    
    # Dask worker network and temporary storage
    interface='ib0', local_directory='$TMPDIR'
)

client = dask.distributed.Client(cluster)
cluster.scale(jobs=1)

In [4]:
client

| Client | Cluster |
|---|---|
| Scheduler: tcp://172.18.4.100:35197 | Workers: 1 |
| Dashboard: http://172.18.4.100:8787/status | Cores: 4 |
| | Memory: 24.00 GB |


### Let's scale up the cluster

In [5]:
cluster.scale(jobs=8)

In [7]:
client

| Client | Cluster |
|---|---|
| Scheduler: tcp://172.18.4.100:35197 | Workers: 8 |
| Dashboard: http://172.18.4.100:8787/status | Cores: 32 |
| | Memory: 192.00 GB |


### From here everything is the same as with LocalCluster

In [8]:
import numpy, dask.array

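def calculate_pi(size_in_bytes, number_of_chunks):
    """Calculate pi using a Monte Carlo method."""

    # float64 values take 8 bytes each; every row holds one (x, y) pair
    array_shape = (int(size_in_bytes / 8 / 2), 2)
    # split the rows evenly across the requested number of chunks
    chunk_size = (int(array_shape[0] / number_of_chunks), 2)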
    
    # 2D random positions array using dask.array
    xy = dask.array.random.uniform(
        low=0.0, high=1.0, size=array_shape,
        # specify chunk size, i.e. task number
        chunks=chunk_size )
  
    xy_inside_circle = (xy ** 2).sum(axis=1) < 1 # boolean

    pi = 4 * xy_inside_circle.sum() / xy_inside_circle.size
    
    # start Dask calculation
    pi = pi.compute()

    print(f"\nfrom {xy.nbytes / 1e9} GB randomly chosen positions")
    print(f"   pi estimate: {pi}")
    print(f"   pi error: {abs(pi - numpy.pi)}\n")
    display(xy)
    
    return pi
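
The estimator inside `calculate_pi` can be checked at small scale without a cluster. The following is a minimal sketch using only the standard library; the function name `estimate_pi_serial` and the fixed seed are illustrative assumptions, not part of the original notebook. It draws points uniformly in the unit square and counts the fraction that falls inside the quarter circle of radius 1, which approaches pi / 4.

```python
import math
import random


def estimate_pi_serial(number_of_points, seed=0):
    """Estimate pi from the fraction of random points inside the unit quarter circle."""
    rng = random.Random(seed)  # fixed seed only to make this sketch reproducible
    inside = 0
    for _ in range(number_of_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            inside += 1
    # area(quarter circle) / area(unit square) = pi / 4
    return 4 * inside / number_of_points


pi_small = estimate_pi_serial(200_000)
print(f"pi estimate: {pi_small}, error: {abs(pi_small - math.pi)}")
```

The Dask version performs the same comparison and sum, but on a chunked array so that each chunk can be processed by a different worker.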

### Let's calculate again...

In [9]:
%time pi = calculate_pi(size_in_bytes=10_000_000_000, number_of_chunks=100) # 10 GB


from 10.0 GB randomly chosen positions
   pi estimate: 3.1416051136
   pi error: 1.2460010206716277e-05



|  | Array | Chunk |
|---|---|---|
| Bytes | 10.00 GB | 100.00 MB |
| Shape | (625000000, 2) | (6250000, 2) |
| Count | 100 Tasks | 100 Chunks |
| Type | float64 | numpy.ndarray |


CPU times: user 439 ms, sys: 30.1 ms, total: 470 ms
Wall time: 1.66 s
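
The array and chunk sizes reported above follow directly from the arithmetic in `calculate_pi`. As a small sketch, the helper below (the name `chunk_layout` is hypothetical and not part of the notebook) repeats that arithmetic for the 10 GB case: 8 bytes per `float64`, 2 columns per row, and rows split evenly across the chunks.

```python
def chunk_layout(size_in_bytes, number_of_chunks, bytes_per_float=8, columns=2):
    """Return (rows, rows_per_chunk, bytes_per_chunk) for the (rows, 2) float64 array."""
    rows = int(size_in_bytes / bytes_per_float / columns)
    rows_per_chunk = int(rows / number_of_chunks)
    bytes_per_chunk = rows_per_chunk * columns * bytes_per_float
    return rows, rows_per_chunk, bytes_per_chunk


rows, rows_per_chunk, bytes_per_chunk = chunk_layout(10_000_000_000, 100)
print(f"rows: {rows}, rows per chunk: {rows_per_chunk}, "
      f"chunk size: {bytes_per_chunk / 1e6} MB")
```

Chunks of a few hundred MB, as used throughout this notebook, fit comfortably into the 24 GB of memory available to each worker.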


In [10]:
%time pi = calculate_pi(size_in_bytes=100_000_000_000, number_of_chunks=250) # 100 GB


from 100.0 GB randomly chosen positions
   pi estimate: 3.14160629632
   pi error: 1.3642730206875342e-05



|  | Array | Chunk |
|---|---|---|
| Bytes | 100.00 GB | 400.00 MB |
| Shape | (6250000000, 2) | (25000000, 2) |
| Count | 250 Tasks | 250 Chunks |
| Type | float64 | numpy.ndarray |


CPU times: user 1.88 s, sys: 69.7 ms, total: 1.95 s
Wall time: 9.39 s


### And we can scale up the cluster whenever needed

In [12]:
cluster.scale(jobs=16)

In [14]:
client

| Client | Cluster |
|---|---|
| Scheduler: tcp://172.18.4.100:35197 | Workers: 8 |
| Dashboard: http://172.18.4.100:8787/status | Cores: 32 |
| | Memory: 192.00 GB |


### Let's calculate again...

In [15]:
%time pi = calculate_pi(size_in_bytes=100_000_000_000, number_of_chunks=250) # 100 GB


from 100.0 GB randomly chosen positions
   pi estimate: 3.14156043008
   pi error: 3.2223509792927985e-05



|  | Array | Chunk |
|---|---|---|
| Bytes | 100.00 GB | 400.00 MB |
| Shape | (6250000000, 2) | (25000000, 2) |
| Count | 250 Tasks | 250 Chunks |
| Type | float64 | numpy.ndarray |


CPU times: user 1.76 s, sys: 64.5 ms, total: 1.82 s
Wall time: 5.09 s


### Let's scale adaptively

Dask jobqueue can scale the total number of workers up and down automatically, based on the amount of outstanding work. You can also specify a target duration within which that work should finish.

More on Dask's adaptivity [can be found in the docs](https://docs.dask.org/en/latest/setup/adaptive.html).

In [17]:
ca = cluster.adapt(
    minimum=2, maximum=32,
    target_duration="80s",  # measured in CPU time per worker
                             # -> 20 seconds at 4 cores / worker
    wait_count=5  # scale down less aggressively
);

sleep(10)  # Allow for scale-down
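
As a rough mental model, and explicitly a simplification rather than Dask's actual implementation, the adaptive target can be thought of as the estimated outstanding work divided by the target duration, clamped to the configured minimum and maximum. The function `estimate_worker_target` and its `outstanding_cpu_seconds` argument are hypothetical names used only for illustration.

```python
import math


def estimate_worker_target(outstanding_cpu_seconds, target_duration_seconds,
                           minimum, maximum):
    """Hypothetical helper: workers needed to finish the work within the target duration."""
    wanted = math.ceil(outstanding_cpu_seconds / target_duration_seconds)
    # never go below the minimum or above the maximum number of workers
    return max(minimum, min(maximum, wanted))


# Values taken from the cluster.adapt() call in this notebook:
# 80 s target duration, between 2 and 32 workers.
for work in (0, 400, 10_000):
    target = estimate_worker_target(work, 80, minimum=2, maximum=32)
    print(f"{work} CPU-seconds outstanding -> {target} workers")
```

In this sketch an idle cluster stays at the minimum of 2 workers, and a very large backlog is capped at the maximum of 32. The real scheduler also takes other signals into account, so treat this only as an aid for choosing `target_duration`.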

In [18]:
%time pi = calculate_pi(size_in_bytes=100_000_000_000, number_of_chunks=250) # 100 GB


from 100.0 GB randomly chosen positions
   pi estimate: 3.14159560832
   pi error: 2.9547302067278736e-06



|  | Array | Chunk |
|---|---|---|
| Bytes | 100.00 GB | 400.00 MB |
| Shape | (6250000000, 2) | (25000000, 2) |
| Count | 250 Tasks | 250 Chunks |
| Type | float64 | numpy.ndarray |


CPU times: user 7.07 s, sys: 196 ms, total: 7.27 s
Wall time: 23 s


In [22]:
%time pi = calculate_pi(size_in_bytes=1_000_000_000_000, number_of_chunks=2_000) # 1 TB


from 1000.0 GB randomly chosen positions
   pi estimate: 3.141594421952
   pi error: 1.7683622068886962e-06



|  | Array | Chunk |
|---|---|---|
| Bytes | 1000.00 GB | 500.00 MB |
| Shape | (62500000000, 2) | (31250000, 2) |
| Count | 2000 Tasks | 2000 Chunks |
| Type | float64 | numpy.ndarray |


CPU times: user 27.3 s, sys: 692 ms, total: 28 s
Wall time: 30.8 s


In [20]:
sleep(10)

In [21]:
%time pi = calculate_pi(size_in_bytes=3_000_000_000_000, number_of_chunks=5_000) # 3 TB


from 3000.0 GB randomly chosen positions
   pi estimate: 3.141591755136
   pi error: 8.984537931411296e-07



|  | Array | Chunk |
|---|---|---|
| Bytes | 3.00 TB | 600.00 MB |
| Shape | (187500000000, 2) | (37500000, 2) |
| Count | 5000 Tasks | 5000 Chunks |
| Type | float64 | numpy.ndarray |


CPU times: user 1min 15s, sys: 1.78 s, total: 1min 16s
Wall time: 1min 36s
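
The errors reported in this notebook shrink as the number of sampled points grows, which is what a Monte Carlo estimate should do. As a sketch (the helper name `expected_pi_standard_error` is an illustrative assumption), each point lands inside the quarter circle with probability p = pi / 4, so the estimator 4 * k / N has a standard error of 4 * sqrt(p * (1 - p) / N).

```python
import math


def expected_pi_standard_error(number_of_points):
    """One-standard-deviation error of the Monte Carlo pi estimate for N points."""
    p = math.pi / 4  # probability that a point falls inside the quarter circle
    return 4 * math.sqrt(p * (1 - p) / number_of_points)


# Each point is one (x, y) pair of float64 values, i.e. 16 bytes.
sizes = {
    "10 GB": 10_000_000_000,
    "100 GB": 100_000_000_000,
    "1 TB": 1_000_000_000_000,
    "3 TB": 3_000_000_000_000,
}
for label, size_in_bytes in sizes.items():
    number_of_points = size_in_bytes // 16
    print(f"{label}: expected standard error "
          f"{expected_pi_standard_error(number_of_points):.2e}")
```

Because the standard error scales with 1 / sqrt(N), each additional digit of accuracy costs a hundred times more samples. That is why the jump from 10 GB to 3 TB improves the estimate by only about one order of magnitude.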
