## 3. Use cases: Uniform Zarr access across formats

In this notebook, we modify the server script from notebook #1 to enable **kerchunk input data**. Afterwards, we introduce and work with the `/kerchunk` API, a lightweight Zarr API that hosts raw chunks without server-side processing.

As an example, we host a full time series of an ERA5 dataset to show that cloudify simplifies data access on top of kerchunk:

- 1. clients see **native zarr** and are therefore relieved from the requirements
    - to have access to kerchunk tables 
    - to parse the tables, e.g. understand *parquet* files and reserve memory
    - to retrieve data from storage backends with specific software
- 2. providers can leverage server-side processing and optimize the dataset on the fly
    - adapt precision (64 to 32)
    - regrid

### Server script

**Adaptations to Notebook #1**:

- we open the data through the so-called *lazy reference* mapper with
    ```python
    fsspec.get_mapper(
        lazy=True,
        )
    ```
    which we pass to xarray afterwards. This only works for kerchunked input data.
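
A fuller sketch of this step, mirroring the arguments used in the server script below, could look as follows. The function names and the example path are hypothetical. `fsspec` and `xarray` are imported inside the function so that the small URL helper can be tried without them:

```python
def reference_url(parquet_path: str) -> str:
    """Prefix a kerchunk reference path with fsspec's `reference::` protocol."""
    return "reference::/" + parquet_path


def open_lazy_references(parquet_path: str, chunks=None):
    """Open kerchunk parquet references lazily and hand the mapper to xarray."""
    # Imported here so that the URL helper above works without these packages.
    import fsspec
    import xarray as xr

    fsmap = fsspec.get_mapper(
        reference_url(parquet_path),
        remote_protocol="file",  # the referenced chunks live on a local file system
        lazy=True,               # read reference tables on demand, not all at once
    )
    ds = xr.open_dataset(fsmap, engine="zarr", chunks=chunks or {}, consolidated=False)
    return fsmap, ds


# Hypothetical path, used only to show the resulting URL.
print(reference_url("/path/to/references.parquet"))
```

Returning the mapper together with the dataset is a design choice of this sketch: the same mapper object is what the server script later registers with the kerchunk plugin.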

**Newly introduced features**:

- we add a *dict* of fsspec mappers to the kerchunk plugin by setting `kp.mapper_dict`
- we convert float64 data variables to float32
- **Remapping** for the `/zarr` plugin: we remap the data to a regular grid using a linear interpolation across longitudes. For that, `unstack` and `interp` are applied lazily to all data variables for the dask-backed API
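
To make the remapping step more tangible, here is a pure-Python illustration of linear interpolation along a periodic longitude axis. It is a conceptual sketch only, not what xarray does internally: `interp_periodic` is a hypothetical helper, gaps are marked with `None` instead of NaN, and the longitudes are assumed to be distinct values in `[0, period)`.

```python
def interp_periodic(lons, values, period=360.0):
    """Fill None gaps by linear interpolation along a periodic longitude axis."""
    known = sorted((lon, v) for lon, v in zip(lons, values) if v is not None)
    if not known:
        return list(values)
    # Wrap the outermost known points around so gaps at the edges are bracketed.
    wrapped = (
        [(known[-1][0] - period, known[-1][1])]
        + known
        + [(known[0][0] + period, known[0][1])]
    )
    filled = []
    for lon, v in zip(lons, values):
        if v is not None:
            filled.append(v)
            continue
        # Find the two neighbouring known points and interpolate between them.
        for (x0, y0), (x1, y1) in zip(wrapped, wrapped[1:]):
            if x0 <= lon <= x1:
                filled.append(y0 + (y1 - y0) * (lon - x0) / (x1 - x0))
                break
    return filled


lons = [0.0, 90.0, 180.0, 270.0]
row = [0.0, None, 2.0, None]
print(interp_periodic(lons, row))
```

The gap at 270 degrees is filled from the values at 180 and at 0 (seen as 360) degrees, which is the effect of the `period=360.0` argument in the server script's `interp` function.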

In [1]:
%%writefile xpublish_references.py

#ssl_keyfile="/work/bm0021/k204210/cloudify/workshop/key.pem"
#ssl_certfile="/work/bm0021/k204210/cloudify/workshop/cert.pem"

from cloudify.plugins.stacer import *
from cloudify.utils.daskhelper import *
from cloudify.plugins.kerchunk import *
import fsspec  # used explicitly below for the lazy reference mapper
import xarray as xr
import xpublish as xp
import asyncio
import nest_asyncio
import sys
import os
import socket   
from contextlib import closing
nest_asyncio.apply()


def unstack(ds):
    onlydimname = [a for a in ds.dims if a not in ["time", "level", "lev"]]
    if not "lat" in ds.coords:
        if not "latitude" in ds.coords:
            raise ValueError("No latitude given")
        else:
            ds = ds.rename(latitude="lat")
    if not "lon" in ds.coords:
        if not "longitude" in ds.coords:
            raise ValueError("No longitude given")
        else:
            ds = ds.rename(longitude="lon")
    if len(onlydimname) > 1:
        raise ValueError("More than one dim: " + str(onlydimname))
    onlydimname = onlydimname[0]
    return (
        ds.rename({onlydimname: "latlon"})
        .set_index(latlon=("lat", "lon"))
        .unstack("latlon")
    )


def interp(ds):
    global equator_lons
    return ds.interpolate_na(dim="lon", method="linear", period=360.0).reindex(
        lon=equator_lons
    )


def find_free_port_in_range(start=9000, end=9100):
    for port in range(start, end + 1):
        try:
            with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
                s.bind(('', port))
                s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            return port
        except OSError:  # port is already in use or cannot be bound, try the next one
            continue
    raise RuntimeError("No free port found in the specified range.")
    
if __name__ == "__main__":  # This avoids infinite subprocess creation
    import dask
    zarrcluster = asyncio.get_event_loop().run_until_complete(get_dask_cluster())
    os.environ["ZARR_ADDRESS"]=zarrcluster.scheduler._address
    
    dsname=sys.argv[1]
    testvar=sys.argv[2]
    glob_inp=sys.argv[3]

    chunks = {}
    for coord in ["lon", "lat"]:
        chunk_size = os.environ.get(f"XPUBLISH_{coord.upper()}_CHUNK_SIZE", None)
        if chunk_size:
            chunks[coord] = int(chunk_size)

    chunks["time"] = 1
    
    dsdict={}
    source="reference::/"+glob_inp
    fsmap = fsspec.get_mapper(
        source,
        remote_protocol="file",
        lazy=True,
        cache_size=0
    )
    ds=xr.open_dataset(
        fsmap,
        engine="zarr",
        chunks=chunks,
        consolidated=False
    )
    kp = KerchunkPlugin()
    kp.mapper_dict = {source:fsmap}

    ds=ds[[testvar]]
    dvs = []
    l_el = False
    for dv in ds.variables:
        if "time" in dv:
            ds[dv] = ds[dv].load()
            ds[dv].encoding["dtype"] = "float64"
            ds[dv].encoding["compressor"] = None
    
    for dv in ds.data_vars:
        print(dv)
        ds[dv].encoding["dtype"] = "float32"
        template_unstack = unstack(ds[dv].isel(time=0).load())
        if not l_el:
            equator_lons = template_unstack.sel(lat=0.0, method="nearest").dropna(
                dim="lon"
            )["lon"]
            l_el = True
            latlonchunks = {a: len(template_unstack[a]) for a in template_unstack.dims}
        template_unstack = template_unstack.chunk(**latlonchunks)
        template_interp = interp(template_unstack)
        template_unstack = template_unstack.expand_dims(**{"time": ds["time"]}).chunk(
            time=1
        )
        template_interp = template_interp.expand_dims(**{"time": ds["time"]}).chunk(
            time=1
        )
        unstacked = ds[dv].map_blocks(unstack, template=template_unstack)
        interped = unstacked.map_blocks(interp, template=template_interp)
        dsv = dask.optimize(interped)[0]
        print("optimized")
        del template_unstack, template_interp
        dvs.append(dsv)
    ds = xr.combine_by_coords(dvs)
    print("combined")
    ds=ds.drop_encoding()    
    ds.encoding["source"]=source
    for dv in ds.data_vars:
        ds[dv].encoding["dtype"] = "float32"    
    dsdict[dsname]=ds
    
    collection = xp.Rest(dsdict)
    collection.register_plugin(Stac())
    collection.register_plugin(kp)
    freeport=find_free_port_in_range()
    listen_uri_fn=f"{os.environ['HOSTNAME']}_{freeport}"
    with open(listen_uri_fn, "w"):    
        collection.serve(
            host="0.0.0.0",
            port=freeport,
            #ssl_keyfile=ssl_keyfile,
            #ssl_certfile=ssl_certfile
        )

Overwriting xpublish_references.py


We run this app with a daily ERA5 2D surface analysis dataset. The remapping would work for all variables at once, but registering them would take a while. We therefore specify only a single test variable.

```
dsname="era5"
testvar="100u"
glob_inp="/work/bm1344/DKRZ/kerchunks_single/testera/E5_sf_an_1D.parquet"
```

We start the server in the background with these arguments:

In [None]:
%%bash --bg
source activate /work/bm0021/conda-envs/cloudify-test
python xpublish_references.py era5 100u /work/bm1344/DKRZ/kerchunks_single/testera/E5_sf_an_1D.parquet

If something goes wrong, you can list the running *cloudify* processes and *kill* them by process ID.

In [None]:
!ps -ef | grep k204210

In [None]:
!kill 1211291

In [2]:
import glob
server_pattern="*dkrz.de*"
cloudify_uris=[
    "http://"+':'.join(cu.split('_'))
    for cu in sorted(glob.glob(
        server_pattern
    ))
]
#use the first
cloudify_uri=cloudify_uris[0]
print(cloudify_uri)

http://l40157.lvt.dkrz.de:9000


For our purposes, we tell the Python clients not to verify SSL certificates.

**Xarray**

Our ERA5 dataset is available via both the *zarr* API **and** the *kerchunk* API.
Their URLs are named similarly:

In [3]:
dsname="era5"
zarr_url='/'.join([cloudify_uri,"datasets",dsname,"zarr"])
kerchunk_url='/'.join([cloudify_uri,"datasets",dsname,"kerchunk"])
print(kerchunk_url)

http://l40157.lvt.dkrz.de:9000/datasets/era5/kerchunk


In [4]:
import xarray as xr
ds_raw=xr.open_zarr(kerchunk_url)
ds_processed=xr.open_zarr(zarr_url)
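
That clients see plain Zarr can also be checked without xarray. The sketch below assumes that the `/zarr` endpoint exposes Zarr v2 consolidated metadata under `.zmetadata`, as xpublish's Zarr router does. Whether the `/kerchunk` endpoint does the same is not confirmed here. Both helper names are hypothetical, and the example document is hand-written rather than server output:

```python
import json
import urllib.request


def summarize_zmetadata(zmetadata: dict) -> dict:
    """Map each array name to (shape, chunks, dtype) from consolidated metadata."""
    summary = {}
    for key, meta in zmetadata["metadata"].items():
        if key.endswith("/.zarray"):
            name = key[: -len("/.zarray")]
            summary[name] = (tuple(meta["shape"]), tuple(meta["chunks"]), meta["dtype"])
    return summary


def fetch_zmetadata(store_url: str) -> dict:
    """GET `<store_url>/.zmetadata`; assumes the endpoint serves this document."""
    with urllib.request.urlopen(store_url.rstrip("/") + "/.zmetadata") as response:
        return json.load(response)


# Tiny hand-written example document, not output from the server.
example = {
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        "100u/.zarray": {"shape": [2, 4], "chunks": [1, 4], "dtype": "<f4"},
    },
}
print(summarize_zmetadata(example))
```

Against the running server, `summarize_zmetadata(fetch_zmetadata(zarr_url))` would list the served arrays with their shapes, chunk shapes and data types.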

In [5]:
ds_raw

Output (xarray repr of `ds_raw`, condensed; the same array summary repeats for every variable):

| Arrays | Shape | Chunk shape | Array size | Chunk size | Dask chunks | Data type |
|---|---|---|---|---|---|---|
| one-dimensional arrays | (542080,) | (542080,) | 4.14 MiB | 4.14 MiB | 1 chunk in 2 graph layers | float64 numpy.ndarray |
| two-dimensional arrays | (30955, 542080) | (1, 542080) | 125.02 GiB | 4.14 MiB | 30955 chunks in 2 graph layers | float64 numpy.ndarray |


In [6]:
ds_processed

Output (xarray repr of `ds_processed`, condensed):

| Shape | Chunk shape | Array size | Chunk size | Dask chunks | Data type |
|---|---|---|---|---|---|
| (30955, 640, 1280) | (1, 640, 1280) | 94.47 GiB | 3.12 MiB | 30955 chunks in 2 graph layers | float32 numpy.ndarray |


As expected, `ds_processed` is already presented on a regular grid although nothing has been computed yet. Each chunk is only calculated on request (`load()` or `compute()`), so the data on the regular grid can be computed easily.
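
The sizes reported for the two datasets follow directly from shape and data type. As a small worked check that uses only the numbers shown in the outputs above (`nbytes` is a hypothetical helper):

```python
from math import prod

MIB = 2**20
GIB = 2**30


def nbytes(shape, itemsize):
    """Uncompressed size in bytes of an array with the given shape and item size."""
    return prod(shape) * itemsize


n_time = 30955
raw_chunk = nbytes((1, 542080), 8)      # one float64 chunk of ds_raw
proc_chunk = nbytes((1, 640, 1280), 4)  # one float32 chunk of ds_processed

print(f"raw chunk:       {raw_chunk / MIB:.2f} MiB, total {n_time * raw_chunk / GIB:.2f} GiB")
print(f"processed chunk: {proc_chunk / MIB:.2f} MiB, total {n_time * proc_chunk / GIB:.2f} GiB")
```

The regular grid holds more points per time step (640 x 1280 = 819200 instead of 542080), yet each processed chunk is smaller because the values are served as float32.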

In [7]:
ds_raw.isel(time=-1).load()

In [8]:
computed=ds_processed.isel(time=-1).compute()
computed

In [9]:
import hvplot.xarray
ds_processed.isel(time=-1).squeeze().hvplot(coastline="50m")



**Intake**

The default **method** for intake datasets is *kerchunk*, i.e. the datasets are loaded through the kerchunk API by default.

In [10]:
intake_url='/'.join([cloudify_uri,"intake.yaml"])
print(intake_url)

http://l40157.lvt.dkrz.de:9000/intake.yaml


In [11]:
import intake
storage_options=dict(verify_ssl=False)
cat=intake.open_catalog(
    intake_url,
    storage_options=storage_options
)
list(cat)

['era5']

In [12]:
ds_raw=cat[dsname].to_dask()
ds_processed=cat[dsname](method="zarr").to_dask()

