# Dask array

In [1]:
from dask.distributed import Client

client = Client(n_workers=3, threads_per_worker=2, memory_limit='4G')
display(client)

2023-03-08 16:10:17,812 - distributed.diskutils - INFO - Found stale lock file and directory '/home/jana/delavnice/Dask/natebook/dask-worker-space/worker-uvaehki3', purging
2023-03-08 16:10:17,826 - distributed.diskutils - INFO - Found stale lock file and directory '/home/jana/delavnice/Dask/natebook/dask-worker-space/worker-rzpphpp2', purging
2023-03-08 16:10:17,835 - distributed.diskutils - INFO - Found stale lock file and directory '/home/jana/delavnice/Dask/natebook/dask-worker-space/worker-d8cgy4v_', purging


Client summary (output of display(client)):

- Connection method: Cluster object
- Cluster type: distributed.LocalCluster
- Dashboard: http://127.0.0.1:8787/status
- Workers: 3
- Total threads: 6
- Total memory: 11.18 GiB
- Status: running
- Using processes: True
- Scheduler: Comm tcp://127.0.0.1:41611, started just now

| Worker comm | Dashboard | Nanny | Total threads | Memory | Local directory |
|---|---|---|---|---|---|
| tcp://127.0.0.1:46759 | http://127.0.0.1:38925/status | tcp://127.0.0.1:42419 | 2 | 3.73 GiB | /home/jana/delavnice/Dask/natebook/dask-worker-space/worker-akf4yhz6 |
| tcp://127.0.0.1:36655 | http://127.0.0.1:37769/status | tcp://127.0.0.1:45059 | 2 | 3.73 GiB | /home/jana/delavnice/Dask/natebook/dask-worker-space/worker-xa5mfpa0 |
| tcp://127.0.0.1:41625 | http://127.0.0.1:40611/status | tcp://127.0.0.1:42839 | 2 | 3.73 GiB | /home/jana/delavnice/Dask/natebook/dask-worker-space/worker-z3a3w9ya |


In [20]:
client.shutdown()

2023-03-09 13:46:58,811 - distributed.client - ERROR - Failed to reconnect to scheduler after 30.00 seconds, closing client


Dask Array implements a subset of the NumPy ndarray interface using blocked algorithms, cutting up the large array into many small arrays. This lets us compute on arrays larger than memory using all of our cores.

A dask array consists of multiple NumPy arrays arranged in a grid.

In [2]:
import numpy as np

In [3]:
x = np.random.rand(100000, 1000)

We want to calculate the sum of all the elements.

In [4]:
x.sum()

49997900.363303356

We can split the computation in chunks and calculate the sum of each chunk and then calculate the final sum.

In [5]:
sums = []
for i in range(0, 100000, 1000):
    chunk = x[i: i + 1000]  # pull out numpy array
    sums.append(chunk.sum())

total = sum(sums)
print(total)

49997900.363303505


This is what Dask array does. It splits the array into smaller ones, performs the calculation on each one separately and then combines the partial results.
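The manual loop above can be written with Dask directly. The following is a minimal sketch, not part of the original notebook: it assumes a smaller array (10,000 × 1,000) so that it runs quickly, and the names x_np and x_da are chosen only for illustration.

```python
import numpy as np
import dask.array as da

# Smaller array than in the notebook so the sketch runs quickly (assumption).
rng = np.random.default_rng(0)
x_np = rng.random((10_000, 1_000))

# Wrap the NumPy array in a Dask array made of 1000-row blocks,
# mirroring the manual loop over x[i: i + 1000].
x_da = da.from_array(x_np, chunks=(1_000, 1_000))

# The blocks form a grid; here it has 10 rows and 1 column.
print(x_da.numblocks)

# Each block is an ordinary NumPy array once it is computed.
first_block = x_da.blocks[0, 0].compute()
print(type(first_block), first_block.shape)

# Building the sum is lazy; the work happens in .compute().
total = x_da.sum().compute()
print(np.isclose(total, x_np.sum()))
```

The result agrees with the NumPy sum up to floating-point rounding, because the partial sums are added in a different order.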

Let's try to create a really large array with dask array.

In [6]:
import dask.array as da
x = da.random.normal(10, 0.5, size=(30_000, 50_000), chunks=(3000, 5000))
x

| | Array | Chunk |
|---|---|---|
| Bytes | 11.18 GiB | 114.44 MiB |
| Shape | (30000, 50000) | (3000, 5000) |
| Count | 100 Tasks | 100 Chunks |
| Type | float64 | numpy.ndarray |


In [7]:
sx = x.mean(axis=0)
sx

| | Array | Chunk |
|---|---|---|
| Bytes | 390.62 kiB | 39.06 kiB |
| Shape | (50000,) | (5000,) |
| Count | 240 Tasks | 10 Chunks |
| Type | float64 | numpy.ndarray |


In [8]:
sx.compute()

array([ 9.99628186, 10.00304643,  9.99928851, ..., 10.00223246,
        9.9995785 ,  9.99940218])

In [9]:
sx.size

50000

Creating this much data with NumPy alone is difficult, because the whole array has to fit in memory at once.

In [10]:
%%time
xnp = np.random.normal(10, 0.1, size=(30_000, 50_000))

MemoryError: Unable to allocate 11.2 GiB for an array with shape (30000, 50000) and data type float64
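The sizes reported in the error message and in the Dask array summary can be checked by hand: a float64 value takes 8 bytes, so the size is the number of elements times 8. The helper name dense_nbytes below is hypothetical and only serves this sketch.

```python
import math
import numpy as np

def dense_nbytes(shape, dtype="float64"):
    """Bytes needed to hold a dense array of the given shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize

GiB = 2**30
MiB = 2**20

full_bytes = dense_nbytes((30_000, 50_000))   # the whole array
chunk_bytes = dense_nbytes((3_000, 5_000))    # a single Dask chunk

print(f"full array: {full_bytes / GiB:.2f} GiB")   # full array: 11.18 GiB
print(f"one chunk:  {chunk_bytes / MiB:.2f} MiB")  # one chunk:  114.44 MiB
```

NumPy has to allocate all 11.18 GiB at once, while Dask only needs to hold a handful of 114.44 MiB chunks at any moment.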

## A word about chunk sizes

If your data fits comfortably in RAM and you are not performance bound, then using NumPy might be the right choice. Dask adds another layer of complexity which may get in the way.

A common performance problem among Dask Array users is that they have chosen a chunk size that is
- too small, which leads to a lot of overhead;
- too large, so that you are likely to run out of working memory;
- poorly aligned with their data, which leads to inefficient reading. If your Dask array chunks aren’t multiples of the chunk shapes of your storage format, then you will have to read the same data repeatedly, which can be expensive.

You want to choose a chunk size that is large in order to reduce the number of chunks that Dask has to think about (which affects overhead) but also small enough so that many of them can fit in memory at once. Dask will often have as many chunks in memory as twice the number of active threads.
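The rule of thumb in the last sentence gives a rough estimate of the working memory a computation needs. The sketch below applies it to the chunk shape and cluster size used earlier in this notebook; the function name working_set_bytes is hypothetical, and the result is only an order-of-magnitude guide, since intermediate results can need additional memory.

```python
def working_set_bytes(chunk_bytes, n_threads, chunks_per_thread=2):
    """Rough estimate: about two chunks in memory per active thread."""
    return chunk_bytes * n_threads * chunks_per_thread

chunk_bytes = 3_000 * 5_000 * 8   # one (3000, 5000) float64 chunk
n_threads = 3 * 2                 # 3 workers with 2 threads each

estimate = working_set_bytes(chunk_bytes, n_threads)
print(f"{estimate / 2**30:.2f} GiB")  # 1.34 GiB
```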

For performance, a good choice of chunks follows these rules:

1. A chunk should be small enough to fit comfortably in memory. We’ll have many chunks in memory at once.
2. A chunk must be large enough so that computations on that chunk take significantly longer than the 1 ms overhead per task that Dask scheduling incurs. A task should take longer than 100 ms.
3. Chunk sizes between 10 MB and 1 GB are common, depending on the availability of RAM and the duration of computations.
4. Chunks should align with the computation that you want to do. For example, if you plan to frequently slice along a particular dimension, then it’s more efficient if your chunks are aligned so that you have to touch fewer chunks. If you want to add two arrays, then it’s convenient if those arrays have matching chunk patterns.
5. Chunks should align with your storage, if applicable. Array data formats are often chunked as well. When loading or saving data, it is useful to have Dask array chunks that are aligned with the chunking of your storage, often an even multiple times larger in each direction.

Source: https://docs.dask.org/en/stable/array-chunks.html

It is advisable to choose the chunk shape that best suits your computations.
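As an illustration of matching chunks to the computation, the sketch below selects one column from the same array stored with two different chunk layouts. Both layouts are assumptions made for this example (each chunk holds the same number of bytes), and da.zeros is used so that nothing is actually allocated. The number of blocks in the result indicates how many input chunks the selection has to touch.

```python
import dask.array as da

shape = (30_000, 5_000)

# Same array, two chunk layouts; da.zeros is lazy, so no memory is used here.
block_layout = da.zeros(shape, chunks=(3_000, 500))    # 10 x 10 grid of blocks
column_layout = da.zeros(shape, chunks=(30_000, 50))   # 1 x 100 grid of blocks

# Select the first column from each layout.
col_from_blocks = block_layout[:, 0]
col_from_columns = column_layout[:, 0]

print(col_from_blocks.numblocks)   # (10,)
print(col_from_columns.numblocks)  # (1,)
```

If column selections dominate the workload, the second layout reads one chunk instead of ten; for row selections the situation is reversed.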

In [11]:
z = da.random.normal(5, 0.1, size=(30000, 5000), chunks=(3000, 500))

The chunks of this array are tall: 3000 rows by 500 columns. What if we want wide chunks of 500 rows by 3000 columns instead?

In [12]:
z = z.rechunk((500, 3000))

This is done with the rechunk method, as shown above.

In [13]:
z

| | Array | Chunk |
|---|---|---|
| Bytes | 1.12 GiB | 11.44 MiB |
| Shape | (30000, 5000) | (500, 3000) |
| Count | 820 Tasks | 120 Chunks |
| Type | float64 | numpy.ndarray |
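The chunk layout before and after rechunking can also be inspected programmatically. This is a small sketch that assumes the same shape and chunk sizes as above, but uses da.zeros instead of random data so that it stays cheap; the attributes numblocks, chunksize and chunks are part of the Dask array interface.

```python
import dask.array as da

z = da.zeros((30_000, 5_000), chunks=(3_000, 500))
print(z.numblocks)    # (10, 10)

z2 = z.rechunk((500, 3_000))
print(z2.chunksize)   # (500, 3000): the largest chunk along each axis
print(z2.numblocks)   # (60, 2)
print(z2.chunks[1])   # (3000, 2000): 5000 columns do not divide evenly by 3000
```

The 60 × 2 grid accounts for the 120 chunks reported in the summary above.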


File formats such as HDF5, NetCDF and Zarr store arrays that are already chunked, and all of them can be read directly into a Dask array. It is advisable to align the Dask chunk sizes with the chunks already present in the file.

### Special types of arrays

How to create a sparse array:

In [14]:
x = da.random.random((100000, 100000), chunks=(1000, 1000))
x[x < 0.95] = 0

We map each Dask array chunk, which is actually a NumPy array, to a sparse.COO array:

In [15]:
import sparse
s = x.map_blocks(sparse.COO)

You can also go the other way around: create a sparse array and turn it into a Dask array.

In [16]:
x = sparse.COO({(10000, 10000): 1})
x = da.from_array(x, chunks=(1000, 1000), asarray=False)

How to create masked arrays:

In [17]:
import numpy as np
data = np.arange(6).reshape((2, 3))
x = da.from_array(data, chunks=(1, 1))
# Mask the element at [0, 1] and the element at [1, 2]
da.ma.masked_array(x, mask=[[False, True, False],
                            [False, False, True]])

| | Array | Chunk |
|---|---|---|
| Bytes | 48 B | 8 B |
| Shape | (2, 3) | (1, 1) |
| Count | 25 Tasks | 6 Chunks |
| Type | int64 | numpy.ma.core.MaskedArray |
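To see what the mask does, the masked array can be converted back to ordinary arrays. The sketch below repeats the construction above and then uses da.ma.filled and da.ma.getmaskarray from the dask.array.ma module; the fill value -1 is an arbitrary choice for illustration.

```python
import numpy as np
import dask.array as da

data = np.arange(6).reshape((2, 3))
x = da.from_array(data, chunks=(1, 1))
mask = np.array([[False, True, False],
                 [False, False, True]])
m = da.ma.masked_array(x, mask=mask)

# Replace masked entries with a fill value and compute a plain NumPy array.
filled = da.ma.filled(m, -1).compute()
print(filled)

# Count the entries that are not masked.
n_valid = int((~da.ma.getmaskarray(m)).sum().compute())
print(n_valid)  # 4
```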


Dask also implements the matrix inverse (da.linalg.inv). Inverting a large matrix is computationally very expensive.

In [18]:
import dask.array as da
x = da.random.random((10000, 10000), chunks=(5000, 5000))
x

| | Array | Chunk |
|---|---|---|
| Bytes | 762.94 MiB | 190.73 MiB |
| Shape | (10000, 10000) | (5000, 5000) |
| Count | 4 Tasks | 4 Chunks |
| Type | float64 | numpy.ndarray |


In [19]:
%%time
da.linalg.inv(x).compute()

CPU times: user 2.87 s, sys: 1.84 s, total: 4.71 s
Wall time: 1min 30s


array([[ 0.77756553,  0.67054084, -0.68808214, ...,  1.09463085,
        -1.04722772,  1.61512682],
       [ 0.10216635,  0.05352405, -0.07980771, ...,  0.12926145,
        -0.09719899,  0.15043324],
       [-0.52394971, -0.4627729 ,  0.47115203, ..., -0.75389514,
         0.67288331, -1.0645436 ],
       ...,
       [-0.17712045, -0.05484573,  0.12438653, ..., -0.19170338,
         0.18708789, -0.26470933],
       [ 0.43419134,  0.28090451, -0.33669136, ...,  0.56396854,
        -0.52423932,  0.80042559],
       [ 0.37586958,  0.33234414, -0.34441725, ...,  0.51196692,
        -0.49626775,  0.81207585]])
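A result like this can be sanity-checked by multiplying the matrix with its computed inverse, which should give the identity matrix up to rounding. The sketch below does this on a small 8 × 8 matrix so that it runs instantly; the matrix is made diagonally dominant (an assumption of this example, not of the notebook) to guarantee that it is invertible, and the chunks are square because the blocked LU decomposition behind da.linalg.inv expects square blocks.

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)
n = 8

# Random matrix with a large diagonal, so that it is safely invertible.
a = rng.random((n, n)) + n * np.eye(n)

x = da.from_array(a, chunks=(4, 4))   # 2 x 2 grid of square blocks
x_inv = da.linalg.inv(x).compute()

# a @ inverse(a) should be the identity matrix up to rounding error.
is_identity = np.allclose(a @ x_inv, np.eye(n))
print(is_identity)
```

For dense matrices the cost of inversion grows roughly with the cube of the matrix size, which is why the 10000 × 10000 case above takes so much longer.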

**Exercise:**
Create a random matrix of size 50000 × 50000 and add it to its transpose. Try different numbers of chunks. What happens?

## Xarray

Xarray is a Python package that extends the labeled data functionality of Pandas to N-dimensional array-like datasets. It shares a similar API to NumPy and Pandas and supports both Dask and NumPy arrays under the hood.

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. NumPy provides the fundamental data structure and API for working with raw N-dimensional arrays. However, real-world datasets are usually more than just raw numbers; they have labels which encode information about how the array values map to locations in space, time, etc.

This data model is borrowed from the netCDF file format, which also provides xarray with a natural and portable serialization format. NetCDF is very popular in the geosciences, and there are existing libraries for reading and writing netCDF in many programming languages, including Python.

Xarray integrates with Dask to support parallel computations and streaming computation on datasets that don’t fit into memory.
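The idea of dimensions, coordinates and attributes can be illustrated with a tiny hand-made array. In the sketch below all temperature values, the second station ID (11106) and the time labels are invented for illustration; only the variable name t2m and the station ID 11105 are taken from the dataset used later in this notebook.

```python
import numpy as np
import xarray as xr

# Two stations, three time steps; the values are made up for this example.
t2m = xr.DataArray(
    np.array([[270.0, 272.0, 274.0],
              [280.0, 281.0, 282.0]]),
    dims=("station_id", "time"),
    coords={"station_id": [11105, 11106], "time": [0, 6, 12]},
    name="t2m",
    attrs={"units": "K"},
)

# Select by label instead of by integer position.
one_station = t2m.sel(station_id=11105)
print(one_station.values)      # [270. 272. 274.]

# Reduce over a named dimension instead of an axis number.
mean_over_time = t2m.mean("time")
print(mean_over_time.values)   # [272. 281.]
```

With plain NumPy the same operations would be written as t2m[0] and t2m.mean(axis=1), which requires remembering what each axis means.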

In [2]:
import xarray as xr

In [12]:
ds = xr.open_dataset('../output.nc', chunks={"time": 10})

In [13]:
ds

Output (Dask array summaries of the dataset's coordinates and variables, condensed; entries with identical summaries are grouped):

| Number of arrays | Shape | Chunk shape | Type | Bytes | Chunk bytes | Count |
|---|---|---|---|---|---|---|
| 1 | (4,) | (4,) | float32 | 16 B | 16 B | 2 Tasks, 1 Chunks |
| 2 | (4,) | (4,) | int8 | 4 B | 4 B | 2 Tasks, 1 Chunks |
| 5 | (4,) | (4,) | float64 | 32 B | 32 B | 2 Tasks, 1 Chunks |
| 1 | (4,) | (4,) | object | 32 B | 32 B | 2 Tasks, 1 Chunks |
| 14 | (4, 51, 730, 21, 1) | (4, 51, 10, 21, 1) | float32 | 11.93 MiB | 167.34 kiB | 74 Tasks, 73 Chunks |
| 1 | (730, 21) | (10, 21) | datetime64[ns] | 119.77 kiB | 1.64 kiB | 74 Tasks, 73 Chunks |


In [22]:
ds.attrs

{'Conventions': 'CF-1.7',
 'GRIB_centre': 'ecmf',
 'GRIB_centreDescription': 'European Centre for Medium-Range Weather Forecasts',
 'GRIB_edition': 1,
 'GRIB_subCentre': 0,
 'history': '2022-07-05T05:36 GRIB to CDM+CF via cfgrib-0.9.10.1/ecCodes-2.24.2 with {"source": "N/A", "filter_by_keys": {}, "encode_cf": ["parameter", "time", "geography", "vertical"]}\nGrid point values extracted with xarray by Jonathan Demaeyer, August 2022',
 'institution': 'European Centre for Medium-Range Weather Forecasts',
 'land usage history': 'Retrieved from https://land.copernicus.eu/pan-european/corine-land-cover, July 2022',
 'land usage legend': "{1: {'label': '111 - Continuous urban fabric', 'numeric_label': 111, 'color': '#e6004d'}, 2: {'label': '112 - Discontinuous urban fabric', 'numeric_label': 112, 'color': '#ff0000'}, 3: {'label': '121 - Industrial or commercial units', 'numeric_label': 121, 'color': '#cc4df2'}, 4: {'label': '122 - Road and rail networks and associated land', 'numeric_label': 1

In [27]:
ds.t2m.values

array([[[[[274.17676],
          [268.31665],
          [276.7556 ],
          ...,
          [270.5235 ],
          [262.7539 ],
          [259.0979 ]],

         [[274.3474 ],
          [268.93628],
          [275.40686],
          ...,
          [267.82336],
          [260.63312],
          [254.68904]],

         [[270.1577 ],
          [269.92813],
          [272.27374],
          ...,
          [269.25146],
          [268.19293],
          [269.4193 ]],

         ...,

         [[273.49902],
          [270.89258],
          [274.65063],
          ...,
          [272.95117],
          [270.00525],
          [267.97906]],

         [[273.8108 ],
          [273.79175],
          [275.0725 ],
          ...,
          [270.62988],
          [266.7162 ],
          [268.35437]],

         [[274.29858],
          [273.7859 ],
          [276.8435 ],
          ...,
          [271.6775 ],
          [269.70718],
          [270.52444]]],


        [[[274.80713],
          [268.0027 ],
       

As usual with Dask, this is a lazy, Dask-backed structure. If we want to convert it to a NumPy array, we call load; accessing .values, as above, also triggers the computation.
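The difference between the lazy and the loaded form can be seen on a small synthetic dataset. The sketch below does not use the file from this notebook; the dataset ds_demo and its values are invented, and only the variable name t2m and the chunking along time follow the example above.

```python
import numpy as np
import dask.array as da
import xarray as xr

# Synthetic stand-in for the real dataset: 20 time steps, chunked by 10.
ds_demo = xr.Dataset({"t2m": (("time",), np.arange(20.0))}).chunk({"time": 10})

lazy = ds_demo.t2m
print(isinstance(lazy.data, da.Array))      # True: still a lazy Dask array

# compute() returns a new object backed by NumPy;
# load() does the same conversion in place.
loaded = lazy.compute()
print(isinstance(loaded.data, np.ndarray))  # True

print(float(loaded.mean()))                 # 9.5
```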

In [29]:
ds.t2m.time

In [34]:
ds.station_id

In [51]:
ds.t2m.station_id

Selecting just one station:

In [70]:
d2 = ds.sel(station_id=11105)
d2

In [68]:
d2.t2m.mean('time')