# Kerchunk/Zarr Demonstration (Direct Link)

Here we demonstrate the use of Kerchunk reference files and Zarr stores for ESA CCI (Climate Change Initiative) use cases, using an example Cloud ECV dataset derived from ATSR2/AATSR measurements.

**more details on the dataset as needed**

In [1]:
import xarray as xr
import fsspec

# Kerchunk reference file (JSON) for the monthly clwCCI product, 1995-2012
file = 'https://dap.ceda.ac.uk/neodc/esacci/cloud/metadata/kerchunk/obs4MIPs/DWD/ESACCI-CLOUD-ATSR2-AATSR-3-0/mon/clwCCI/gr/v20200106/clwCCI_mon_ESACCI-CLOUD-ATSR2-AATSR-3-0_BE_gr_199501-201212_kr1.0.json'

## Kerchunk Basics
Kerchunk is a framework for legacy file formats such as NetCDF that can both aggregate existing multi-file datasets into a single virtual dataset and let cloud-based services access the data in place. Native NetCDF is poorly suited to cloud applications because, without extra tooling, a NetCDF file typically has to be downloaded in full before it can be read. Cloud-optimised formats instead break the data into smaller, independent objects that are easier to transfer across networks.

Kerchunk enables byte-range requests against NetCDF files that are exposed for download. A kerchunk reference file records the location (URL, byte offset and length) of each internal chunk of the NetCDF file, so each chunk can be requested as if it were an independent object. This avoids downloading the whole file. For CCI, a number of kerchunk reference files are archived and available for use.
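To make the byte-range mechanism concrete, here is a minimal sketch using only the Python standard library. It assumes the version-1 kerchunk reference layout, in which metadata is stored inline and each chunk key maps to `[url, offset, length]`. The URL, the variable name `tas`, the byte values and the `read_chunk` helper are all hypothetical, and an in-memory `io.BytesIO` buffer stands in for the remote NetCDF file, so `seek`/`read` plays the role of an HTTP `Range` request.

```python
import io
import json
import struct

# Hypothetical stand-in for a remote NetCDF file: a 16-byte header followed
# by two chunks, each holding four big-endian float32 values.
header = b"\x00" * 16
chunk0 = struct.pack(">4f", 1.0, 2.0, 3.0, 4.0)
chunk1 = struct.pack(">4f", 5.0, 6.0, 7.0, 8.0)
remote_file = io.BytesIO(header + chunk0 + chunk1)

URL = "https://example.org/data.nc"  # hypothetical source file

# Kerchunk-style (version 1) references, as they would look after json.load():
# metadata is stored inline, chunks as [url, byte offset, byte length].
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "tas/.zarray": json.dumps({
            "zarr_format": 2, "shape": [8], "chunks": [4], "dtype": ">f4",
            "compressor": None, "fill_value": None, "filters": None,
            "order": "C",
        }),
        "tas/0": [URL, 16, 16],
        "tas/1": [URL, 32, 16],
    },
}


def read_chunk(key):
    """Return the raw bytes of one chunk via a byte-range read (hypothetical helper)."""
    url, offset, length = refs["refs"][key]
    # Against a real server this would be an HTTP GET for `url` with the
    # header "Range: bytes={offset}-{offset + length - 1}"; here we seek
    # within the in-memory buffer instead.
    remote_file.seek(offset)
    return remote_file.read(length)


# Only the 16 bytes of the second chunk are read, not the whole "file".
values = struct.unpack(">4f", read_chunk("tas/1"))
print(values)  # (5.0, 6.0, 7.0, 8.0)
```

In practice `fsspec`'s reference filesystem performs this lookup and the range requests for you, and Zarr decodes the bytes using the `.zarray` metadata.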

In [2]:
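file = 'https://dap.ceda.ac.uk/neodc/esacci/cloud/metadata/kerchunk/obs4MIPs/DWD/ESACCI-CLOUD-ATSR2-AATSR-3-0/mon/clwCCI/gr/v20200106/clwCCI_mon_ESACCI-CLOUD-ATSR2-AATSR-3-0_BE_gr_199501-201212_kr1.0.json'
# Build a virtual Zarr store from the kerchunk references; chunk data are
# fetched lazily with byte-range requests to the original NetCDF data
mapper = fsspec.get_mapper('reference://', fo=file)
# consolidated=False: do not look for consolidated .zmetadata; the reference
# file already holds all of the metadata
ds = xr.open_zarr(mapper, consolidated=False)
ds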

Output (array summary from the dataset display; all arrays are lazy Dask arrays backed by NumPy chunks):

| Array shape | Chunk shape | Data type | Array size | Chunk size | Chunks |
|---|---|---|---|---|---|
| (216, 7, 7, 360, 720) | (1, 3, 3, 180, 360) | float32 | 10.22 GiB | 2.22 MiB | 7776 |
| (360, 2) | (360, 2) | float64 | 5.62 kiB | 5.62 kiB | 1 |
| (720, 2) | (720, 2) | float64 | 11.25 kiB | 11.25 kiB | 1 |
| (7, 2) | (7, 2) | float64 | 112 B | 112 B | 1 |
| (7, 2) | (7, 2) | float64 | 112 B | 112 B | 1 |
| (216, 2) | (1, 2) | datetime64[ns] | 3.38 kiB | 16 B | 216 |
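The sizes reported for the main 5-D data array follow directly from its shape, its chunk shape and the 4-byte float32 item size. The sketch below reproduces them with plain arithmetic and counts how many chunks a selection touches, which gives a rough number of byte-range requests if we assume one request per chunk. The `chunks_touched` helper is hypothetical. Note that the chunk size is the decompressed in-memory size; fewer bytes may cross the network if the underlying NetCDF chunks are compressed.

```python
import math

# Shapes taken from the dataset output above.
shape = (216, 7, 7, 360, 720)   # full 5-D array
chunks = (1, 3, 3, 180, 360)    # chunk shape exposed through kerchunk/Zarr
itemsize = 4                    # bytes per float32 value

# Chunks per dimension; partial chunks at the edges still count (7 / 3 -> 3).
chunks_per_dim = [math.ceil(s / c) for s, c in zip(shape, chunks)]
n_chunks = math.prod(chunks_per_dim)

total_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunks) * itemsize


def chunks_touched(start, stop):
    """Count chunks overlapping the half-open index box [start, stop) in each dimension."""
    counts = [(hi - 1) // c - lo // c + 1 for lo, hi, c in zip(start, stop, chunks)]
    return math.prod(counts)


print(chunks_per_dim)                   # [216, 3, 3, 2, 2]
print(n_chunks)                         # 7776
print(round(total_bytes / 2**30, 2))    # 10.22 (GiB)
print(round(chunk_bytes / 2**20, 2))    # 2.22 (MiB)
# A single time step, all other dimensions in full:
print(chunks_touched((0, 0, 0, 0, 0), (1, 7, 7, 360, 720)))  # 36
```

Selecting one month therefore reads 36 of the 7776 chunks rather than the whole 10 GiB array.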


## Zarr Stores
Zarr is a cloud-native format for chunked, compressed N-dimensional arrays. It has implementations in several programming languages, including Python, and can be read remotely when hosted on object storage. A Zarr store consists of JSON metadata files alongside binary objects, each holding one compressed chunk of a variable's array. Inspecting a Zarr store reveals a directory-like structure, with separate directories for the store's variables (including coordinate variables). Zarr is therefore not a single-file format: a store is a collection of many objects, so workflows that use Zarr stores must adopt an object-based rather than a file-based approach to data access.
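As an illustration of this layout, the following sketch uses only the Python standard library to list the keys of a minimal Zarr store holding a single array. It assumes the Zarr format 2 conventions (a `.zgroup` file, a per-array `.zarray` JSON file, and one object per chunk named by its grid indices joined with `.`). The array name `clw` and the `store_keys` helper are hypothetical, and stores written by real libraries usually carry extra metadata such as `.zattrs`. Zarr format 3 instead uses `zarr.json` metadata files and, by default, chunk keys such as `c/0/0`.

```python
import itertools
import json
import math


def store_keys(name, shape, chunks, dtype="<f4"):
    """Return {key: content} for a minimal Zarr v2 store holding one array.

    Chunk contents are left as None here; in a real store each would be a
    (usually compressed) binary object.
    """
    zarray = {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(chunks),
        "dtype": dtype,
        "compressor": None,
        "fill_value": None,
        "filters": None,
        "order": "C",
    }
    keys = {
        ".zgroup": json.dumps({"zarr_format": 2}),
        f"{name}/.zarray": json.dumps(zarray),
    }
    # One object per chunk, named by its grid index, e.g. "clw/1.0".
    grid = [range(math.ceil(s / c)) for s, c in zip(shape, chunks)]
    for idx in itertools.product(*grid):
        keys[f"{name}/" + ".".join(map(str, idx))] = None
    return keys


for key in store_keys("clw", shape=(4, 6), chunks=(2, 3)):
    print(key)
# .zgroup
# clw/.zarray
# clw/0.0
# clw/0.1
# clw/1.0
# clw/1.1
```

Each chunk key is a separate object, which is why a Zarr store on object storage is addressed key by key rather than opened as one file.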

In [None]:
# Set zstore to the path or URL of a Zarr store before running this cell
zstore = ''
ds = xr.open_dataset(zstore, engine='zarr')