## Xarray engine: chunks on a Dask cluster

First, set up a default Dask cluster.

In [None]:
from dask.distributed import LocalCluster
cluster = LocalCluster()
client = cluster.get_client()
client

Connection method: Cluster object, Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Workers: 5 (each with 2 threads and 3.20 GiB memory), Total threads: 10, Total memory: 16.00 GiB
Status: running, Using processes: True


Next, we load 2 m temperature data for a whole year on a low-resolution regular latitude-longitude grid. It contains 2 fields per day (at 0 and 6 UTC). This data easily fits into memory, so it is used here for demonstration purposes only.

In [None]:
import earthkit.data as ekd
ds_fl = ekd.from_source("sample", "t2_1_year_hourly.grib")
len(ds_fl)

732

Now, we convert the GRIB fieldlist to Xarray using a chunk size of 10 fields along the `valid_time` dimension.

In [None]:
ds = ds_fl.to_xarray(time_dim_mode="valid_time",
                     chunks={"valid_time": 10},
                     add_earthkit_attrs=False)
ds["2t"]

| | Array | Chunk |
|---|---|---|
| Bytes | 1.74 MiB | 24.38 kiB |
| Shape | (732, 13, 24) | (10, 13, 24) |
| Dask graph | 74 chunks in 2 graph layers | |
| Data type | float64 numpy.ndarray | |
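
The numbers in this summary follow directly from the array shape and the chunk size. As a quick sanity check, the minimal sketch below reproduces them in plain Python. The variable names are illustrative only and are not part of earthkit-data, Xarray or Dask.

```python
import math

# Values taken from the array summary above.
shape = (732, 13, 24)   # time dimension first, then the two spatial dimensions
chunk_len = 10          # fields per chunk along valid_time
itemsize = 8            # bytes per float64 value

# Number of chunks along valid_time; the last chunk holds the remainder.
n_chunks = math.ceil(shape[0] / chunk_len)
last_chunk_len = shape[0] - (n_chunks - 1) * chunk_len

# Memory footprint of one full chunk and of the whole array.
chunk_bytes = chunk_len * shape[1] * shape[2] * itemsize
total_bytes = shape[0] * shape[1] * shape[2] * itemsize

print(n_chunks, last_chunk_len)            # 74 2
print(f"{chunk_bytes / 1024:.2f} KiB")     # 24.38 KiB
print(f"{total_bytes / 1024**2:.2f} MiB")  # 1.74 MiB
```

Since 732 is not a multiple of 10, the final chunk contains only 2 fields.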


Finally, we compute the mean along the temporal dimension. Xarray will load the data in chunks for this computation, keeping memory usage low.

In [None]:
m = ds["2t"].mean(dim="valid_time").load()
m
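
To illustrate why a chunked reduction keeps memory usage low, here is a minimal pure-Python sketch of a mean built from per-chunk partial sums. It is a conceptual illustration under simplifying assumptions (a 1-D list of made-up values and the hypothetical helpers `iter_chunks` and `chunked_mean`), not a description of how Dask or Xarray implement the reduction internally.

```python
from typing import Iterator, Sequence


def iter_chunks(values: Sequence[float], size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def chunked_mean(values: Sequence[float], size: int) -> float:
    """Mean computed one chunk at a time (hypothetical helper)."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(values, size):
        # Only the running sum and count are carried between chunks.
        total += sum(chunk)
        count += len(chunk)
    return total / count


# Made-up stand-in for the time series at a single grid point: 732 values.
series = [float(i % 7) for i in range(732)]

# The chunked result should agree with the mean computed in one pass.
print(abs(chunked_mean(series, size=10) - sum(series) / len(series)) < 1e-12)
```

Only one chunk plus two running totals need to be held at any time, which is the property the chunked computation above relies on.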

In the Dask dashboard you should see something like this:

![dask-dashboard.png](attachment:1c016fb3-c120-4452-b587-9c17c3506dfd.png)