In [1]:
import numpy as np
from dask import delayed
from dask.distributed import Client
import time

### Initiate a Dask Client

In [2]:
client = Client(processes=False, n_workers=4)



### Display the client object to get the dashboard address
- Once you have the address (e.g., http://127.0.0.1:8787/status), click the Dask icon in the left-hand sidebar of the JupyterLab interface.
- Paste the address into the field at the top and press Enter.
- Select the panels you would like to open. A few good ones to start with:
    - Graph
    - Memory Use
    - Processing Tasks
    - Profile
    - Profile Server
    - Progress
    - Task Stream
    - Workers

A default layout can also be specified by going to the File menu and choosing "Launch Dask Dashboard Layout". See more here: https://github.com/dask/dask-labextension?tab=readme-ov-file#configuring-a-default-layout
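
The dashboard address can also be read from the client in code instead of from its displayed summary. The following is a minimal sketch, assuming a small throwaway in-process cluster; the `n_workers=1`, `threads_per_worker=1`, and `dashboard_address=":0"` settings are illustrative choices and not part of this notebook's setup.

```python
from dask.distributed import Client

# Illustrative throwaway cluster (assumption: not the notebook's own client).
# dashboard_address=":0" asks for a random free port so it does not clash
# with a dashboard already running on the default port.
client = Client(
    processes=False,
    n_workers=1,
    threads_per_worker=1,
    dashboard_address=":0",
)

# The same address that appears in the client's summary
link = client.dashboard_link
print(link)

client.close()
```

The printed address is the one to paste into the Dask extension's address field.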

In [3]:
client

Client
    Connection method: Cluster object
    Cluster type: distributed.LocalCluster
    Dashboard: http://129.112.108.61:8787/status

LocalCluster
    Dashboard: http://129.112.108.61:8787/status
    Workers: 4
    Total threads: 16
    Total memory: 48.00 GiB
    Status: running
    Using processes: False

Scheduler
    Comm: inproc://129.112.108.61/36739/1
    Workers: 0
    Dashboard: http://129.112.108.61:8787/status
    Total threads: 0
    Started: 1 minute ago
    Total memory: 0 B

Worker
    Comm: inproc://129.112.108.61/36739/4
    Total threads: 4
    Dashboard: http://129.112.108.61:52236/status
    Memory: 12.00 GiB
    Nanny: None
    Local directory: /var/folders/8v/nddcvmn12wl6rr_v0cwvvjvr0000gr/T/dask-scratch-space/worker-xos9v2v_

Worker
    Comm: inproc://129.112.108.61/36739/6
    Total threads: 4
    Dashboard: http://129.112.108.61:52237/status
    Memory: 12.00 GiB
    Nanny: None
    Local directory: /var/folders/8v/nddcvmn12wl6rr_v0cwvvjvr0000gr/T/dask-scratch-space/worker-xqjfszrm

Worker
    Comm: inproc://129.112.108.61/36739/8
    Total threads: 4
    Dashboard: http://129.112.108.61:52238/status
    Memory: 12.00 GiB
    Nanny: None
    Local directory: /var/folders/8v/nddcvmn12wl6rr_v0cwvvjvr0000gr/T/dask-scratch-space/worker-00etg6u5

Worker
    Comm: inproc://129.112.108.61/36739/10
    Total threads: 4
    Dashboard: http://129.112.108.61:52239/status
    Memory: 12.00 GiB
    Nanny: None
    Local directory: /var/folders/8v/nddcvmn12wl6rr_v0cwvvjvr0000gr/T/dask-scratch-space/worker-22s43e13


### Standard Python Execution

In [4]:
%%timeit

# Start the clock (perf_counter is a monotonic timer intended for measuring durations)
start_time = time.perf_counter()

# Two large arrays (e.g. parts of a dataset)
A, B = np.random.random((10000, 10000)), np.random.random((10000, 10000))

# Sum each array.
sumA, sumB = np.sum(A), np.sum(B)

# Add the two sums.
total = sumA + sumB

execution_duration = time.perf_counter() - start_time

print(f"The result {total} took {execution_duration} seconds to run")

The result 99991444.53529353 took 0.4961991310119629 seconds to run
The result 100001217.07773495 took 0.4423820972442627 seconds to run
The result 100003998.22906311 took 0.4575977325439453 seconds to run
The result 99994985.10943103 took 0.44934606552124023 seconds to run
The result 99999219.26763572 took 0.4676659107208252 seconds to run
The result 99999586.51242806 took 0.45572900772094727 seconds to run
The result 99999377.05793613 took 0.45090723037719727 seconds to run
The result 99996932.02769473 took 0.45115208625793457 seconds to run
454 ms ± 7.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Dask Delayed Execution

In [5]:
%%timeit

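# Start the clock (perf_counter is a monotonic timer intended for measuring durations)
start_time = time.perf_counter()

# Two large arrays (e.g. parts of a dataset), created locally with NumPy
A, B = np.random.random((10000, 10000)), np.random.random((10000, 10000))

# Create delayed tasks for summing each array.
# Note: passing large in-memory arrays to delayed embeds them in the task graph,
# which is what triggers the "This may cause some slowdown" warning in the output.
sumA, sumB = delayed(np.sum)(A), delayed(np.sum)(B)

# Create another task to add the two sums
total = delayed(lambda x, y: x + y)(sumA, sumB)

result = total.compute()  # triggers execution of the task graph on the workers

execution_duration = time.perf_counter() - start_time
print(f"The result {result} took {execution_duration} seconds to run")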

This may cause some slowdown.
Consider loading the data with Dask directly
 or using futures or delayed objects to embed the data into the graph without repetition.
See also https://docs.dask.org/en/stable/best-practices.html#load-data-with-dask for more information.


The result 99998356.61858316 took 0.45119500160217285 seconds to run
The result 100001839.79809268 took 0.44001102447509766 seconds to run
The result 100005141.50930907 took 0.4363100528717041 seconds to run
The result 100005936.97197047 took 0.4694211483001709 seconds to run
The result 99991019.08537178 took 0.4346439838409424 seconds to run
The result 99996749.82789762 took 0.46250295639038086 seconds to run
The result 100001095.82207704 took 3250.425938129425 seconds to run


The result 100009271.94983488 took 649.9306468963623 seconds to run
The slowest run took 7475.45 times longer than the fastest. This could mean that an intermediate result is being cached.
9min 17s ± 18min 41s per loop (mean ± std. dev. of 7 runs, 1 loop each)
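
The warning in the output above suggests keeping large data out of the task graph. One way to do that, sketched here under stated assumptions, is to create each array inside a delayed task so that only a small function call travels through the graph. The helper `make_array`, the seeds, and the reduced size `n = 2000` are hypothetical choices for illustration, not part of the original notebook.

```python
import numpy as np
from dask import delayed


def make_array(n, seed):
    """Hypothetical helper: build an n x n array of uniform random values."""
    rng = np.random.default_rng(seed)
    return rng.random((n, n))


n = 2000  # smaller than the notebook's 10000 so the sketch runs quickly

# The arrays are produced by tasks, so the graph holds only the calls
# (function, n, seed), not the array data itself.
A = delayed(make_array)(n, 0)
B = delayed(make_array)(n, 1)

# Sum each array, then add the two sums, all as delayed tasks
sumA, sumB = delayed(np.sum)(A), delayed(np.sum)(B)
total = delayed(lambda x, y: x + y)(sumA, sumB)

result = total.compute()
print(f"The result is {result}")
```

With an active `Client`, `compute()` sends this graph to that client's workers; without one, Dask falls back to its local threaded scheduler. Whether this removes the slowdown seen in the timings above would need to be measured on the same machine.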


In [6]:
client.shutdown()