# Dask arrays

Let's explore Dask arrays!

In [3]:
import numpy as np
import dask.array as da

Let's create a random dask array, apply an operation to it (here, the mean), and visualize the resulting task graph:

In [None]:
x = da.random.random((2000, 2000), chunks=(500, 500))
x
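The `chunks=(500, 500)` argument splits the 2000 × 2000 array into a 4 × 4 grid of 500 × 500 NumPy blocks. The layout can be checked with the standard `dask.array.Array` attributes `chunks`, `numblocks` and `chunksize`:

```python
import dask.array as da

x = da.random.random((2000, 2000), chunks=(500, 500))

# Chunk sizes along each axis: four blocks of 500 rows and four of 500 columns
print(x.chunks)      # ((500, 500, 500, 500), (500, 500, 500, 500))
# Number of blocks along each axis
print(x.numblocks)   # (4, 4)
# Bytes per chunk: 500 * 500 float64 values at 8 bytes each
chunk_bytes = x.chunksize[0] * x.chunksize[1] * x.dtype.itemsize
print(chunk_bytes)   # 2000000
```

Each of the 16 blocks is an independent piece of work, which is what lets dask process the array in parallel and without holding all of it in memory.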

In [None]:
x = da.random.random((8000, 8000))  # no chunks given: dask chooses them
x
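Without a `chunks` argument, dask picks the chunk shape itself. The default (`chunks="auto"`) aims for a configurable target chunk size in bytes, so the exact shape can vary between dask versions and configurations. The sketch below therefore only prints the choice and checks properties that always hold:

```python
import dask.array as da

x = da.random.random((8000, 8000))  # chunks left to dask ("auto")

print(x.chunksize)   # block shape chosen by dask; depends on configuration
print(x.numblocks)   # resulting grid of blocks

# Whatever the choice, the chunks along each axis add up to the full shape
assert sum(x.chunks[0]) == 8000 and sum(x.chunks[1]) == 8000
# Total logical size: 8000 * 8000 float64 values at 8 bytes each
print(x.nbytes)      # 512000000
```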

In [None]:
y = x.mean()

In [None]:
# Draw the task graph (requires the graphviz package)
y.visualize(optimize_graph=True, color='order',
            cmap='autumn', node_attr={'penwidth': '2'})
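`x.mean()` never needs the whole array at once: each block contributes a partial sum and an element count, and these partial results are then combined, which is the tree of tasks drawn in the graph above. A minimal NumPy sketch of the principle (an illustration, not dask's actual implementation; `blockwise_mean` is a hypothetical helper):

```python
import numpy as np

def blockwise_mean(a, block):
    """Mean of a 2-D array computed from per-block partial results (hypothetical helper)."""
    partial_sums, counts = [], []
    for i in range(0, a.shape[0], block):
        for j in range(0, a.shape[1], block):
            chunk = a[i:i + block, j:j + block]
            partial_sums.append(chunk.sum())   # per-block task: local sum
            counts.append(chunk.size)          # per-block task: local count
    # Combine step: aggregate the partial results into the global mean
    return sum(partial_sums) / sum(counts)

rng = np.random.default_rng(0)
a = rng.random((2000, 2000))
print(np.isclose(blockwise_mean(a, 500), a.mean()))  # True
```

Carrying counts alongside sums keeps the result exact even when the blocks have different sizes (for example with `block=300`, which does not divide 2000).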

Now let's increase the size of the array and actually compute the result with `.compute()`.

In [None]:
%%time
N = 20000
x = da.random.random((N, N))
y = x.mean()
y.compute()

Let's compare with the equivalent NumPy computation:

In [None]:
%%time
N = 20000
rng = np.random.default_rng()
x = rng.random((N, N))
x.mean()
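The two cells above also differ in memory use. NumPy materializes the full N × N array before averaging it, while dask generates and reduces it block by block, so typically only a limited number of blocks are in memory at once. A back-of-the-envelope estimate of the NumPy footprint for `N = 20000` with 8-byte `float64` values:

```python
N = 20000
itemsize = 8  # bytes per float64 value

full_array_bytes = N * N * itemsize
print(f"{full_array_bytes / 1e9:.1f} GB")  # 3.2 GB held at once by the NumPy version
```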

<mark>**Question**</mark>: Try the two cells above with `N = 200`. Which one is faster, the NumPy version or the dask one?
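To explore the question under controlled conditions, you can run both libraries on the same data so that only the execution model differs. The sketch below wraps a NumPy array with `da.from_array` (the 50 × 50 chunk size is an arbitrary choice) and times both reductions with `time.perf_counter`. Keep in mind that dask adds scheduling work for every task in its graph, and that cost weighs more as the arrays get smaller.

```python
import time
import numpy as np
import dask.array as da

N = 200
rng = np.random.default_rng(0)
a = rng.random((N, N))
x = da.from_array(a, chunks=(50, 50))  # same data, split into a 4 x 4 grid of blocks

t0 = time.perf_counter()
np_mean = a.mean()
t1 = time.perf_counter()
dask_mean = x.mean().compute()
t2 = time.perf_counter()

print(f"numpy: {t1 - t0:.2e} s, dask: {t2 - t1:.2e} s")
print(np.isclose(np_mean, dask_mean))  # True: same data, same result
```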

***

Now let's consider the matrix product `x @ x`. <mark>**Question**</mark>: Could you explain the results of the timings?

In [None]:
%%time
N = 6000
x = rng.random((N, N))
y = x @ x
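For a sense of scale: a dense product of two $N \times N$ matrices computes $N^2$ entries, each a sum of $N$ products, so it takes roughly $2N^3$ floating-point operations (one multiplication and one addition per term). For $N = 6000$:

$$
2N^3 = 2 \times 6000^3 = 2 \times 2.16 \times 10^{11} = 4.32 \times 10^{11} \ \text{flop}
$$

Getting through that many operations in the time measured above is only possible because NumPy's `@` usually dispatches to an optimized, often multithreaded, BLAS library instead of running Python-level loops.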

In [10]:
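def mymatmul(x):
    # Naive matrix product x @ x written as three nested Python loops:
    # y[i, j] = sum over k of x[i, k] * x[k, j]
    n = x.shape[0]
    y = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                y[i, j] += x[i, k] * x[k, j]
    return y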
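`mymatmul` performs the same $N^3$ multiply-adds as `x @ x`, but one interpreted Python operation at a time. A quick check on a small NumPy matrix (`N = 50` is an arbitrary size that keeps the loop version fast enough) that both give the same result, together with their timings:

```python
import time
import numpy as np

def mymatmul(x):
    # Naive triple loop: y[i, j] = sum_k x[i, k] * x[k, j]
    n = x.shape[0]
    y = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                y[i, j] += x[i, k] * x[k, j]
    return y

rng = np.random.default_rng(0)
a = rng.random((50, 50))

t0 = time.perf_counter()
y_loop = mymatmul(a)
t1 = time.perf_counter()
y_blas = a @ a
t2 = time.perf_counter()

print(f"loops: {t1 - t0:.2e} s, @: {t2 - t1:.2e} s")
print(np.allclose(y_loop, y_blas))  # True
```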

In [16]:
N = 16
x = da.random.random((N, N), chunks=(8,8))
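With `chunks=(8, 8)` the 16 × 16 matrix is stored as a 2 × 2 grid of 8 × 8 blocks. A matrix product can be computed block by block, $C_{IJ} = \sum_K A_{IK} B_{KJ}$, where each term is an ordinary product of two small blocks; this is the kind of decomposition a chunked matrix product relies on. A NumPy sketch of the idea (`blocked_matmul` is a hypothetical helper that assumes the matrix size is a multiple of the block size):

```python
import numpy as np

def blocked_matmul(a, b, bs):
    """Multiply square matrices block by block (hypothetical helper; n must be a multiple of bs)."""
    n = a.shape[0]
    nb = n // bs                      # number of blocks along each axis
    c = np.zeros((n, n))
    for I in range(nb):
        for J in range(nb):
            for K in range(nb):
                # Each term is one small dense product: A[I, K] @ B[K, J]
                c[I*bs:(I+1)*bs, J*bs:(J+1)*bs] += (
                    a[I*bs:(I+1)*bs, K*bs:(K+1)*bs] @ b[K*bs:(K+1)*bs, J*bs:(J+1)*bs]
                )
    return c

rng = np.random.default_rng(0)
a = rng.random((16, 16))
print(np.allclose(blocked_matmul(a, a, 8), a @ a))  # True
```

With a 2 × 2 block grid this amounts to 2 × 2 × 2 = 8 block products plus the additions that combine them.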

In [17]:
x

| | Array | Chunk |
|---|---|---|
| Bytes | 2.00 kiB | 512 B |
| Shape | (16, 16) | (8, 8) |
| Dask graph | 4 chunks in 1 graph layer | 4 chunks in 1 graph layer |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |

In [18]:
y = mymatmul(x)  # y is a NumPy array: each assignment into it computes a dask value eagerly
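Calling `mymatmul` on a dask array does not build one large task graph. Inside the loop, `x[i, k] * x[k, j]` is a lazy 0-d dask array, but `y` is a plain NumPy array created with `np.zeros`, and storing a value into it forces the dask expression to be converted to a concrete number, which triggers a computation for every update. The result is therefore a `numpy.ndarray`. A small sketch of that behaviour:

```python
import numpy as np
import dask.array as da

x = da.ones((4, 4), chunks=(2, 2))

lazy = x[0, 1] * x[1, 0]           # 0-d dask array: nothing computed yet
print(type(lazy).__name__)         # Array

y = np.zeros((2, 2))
y[0, 0] += lazy                    # storing into NumPy forces the dask value to be computed
print(type(y).__name__, y[0, 0])   # ndarray 1.0
```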

In [20]:
y

array([[3.45971351, 3.93183097, 2.44841311, 4.39783654, 3.74905762,
        3.11367075, 2.84798073, 3.15039181, 3.1171389 , 4.93954122,
        3.60262474, 3.56341458, 2.14011444, 3.74810855, 3.36252708,
        4.49506714],
       [3.14250508, 3.39345113, 2.57050982, 4.26655621, 4.09997134,
        2.6342515 , 2.77440157, 2.7274344 , 3.37698007, 5.00995995,
        3.43839319, 4.00521531, 1.93747367, 3.50665681, 3.27457406,
        4.63534217],
       [3.37605491, 3.47356991, 2.88158408, 4.62242446, 4.18020866,
        2.65070218, 3.1511815 , 3.50596416, 3.08662932, 5.57213092,
        4.2384829 , 4.19495988, 2.09884349, 3.57752573, 3.21388126,
        4.52715291],
       [3.78910085, 3.97090268, 3.03276707, 4.63099994, 4.44579019,
        3.82650793, 3.4306939 , 3.06496823, 3.23206483, 5.28174107,
        3.95143913, 4.23689313, 3.0322534 , 4.47778262, 3.18703199,
        4.70253787],
       [3.30694077, 3.31069922, 2.63005192, 4.33302996, 4.07412219,
        3.37104488, 3.36029803, 

In [19]:
y = x @ x  # dask matrix product: y stays a lazy dask array with a task graph
y.visualize(rankdir='LR')

In [None]:
%%time
y.compute().shape
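In contrast with `mymatmul`, the `@` operator on a dask array returns another lazy dask array whose graph combines products of individual blocks. A small check, using `da.from_array` on known NumPy data so that the result can be compared with NumPy's own product:

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)
a = rng.random((16, 16))
x = da.from_array(a, chunks=(8, 8))   # 2 x 2 grid of 8 x 8 blocks

y = x @ x                              # lazy: builds a task graph, computes nothing yet
print(type(y).__name__, y.chunks)      # Array ((8, 8), (8, 8))

result = y.compute()                   # runs the graph and returns a NumPy array
print(type(result).__name__, np.allclose(result, a @ a))  # ndarray True
```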