## Reading OME-Zarr Data

## Learning Objectives

- Understand that OME-Zarr data is **lazily loaded**
- Understand how **compression** and **chunking** limit the amount of data fetched
- Learn how to load OME-Zarr datasets in Python

### Install and import notebook dependencies

In [1]:
import sys

!{sys.executable} -m pip install -q zarr 'fsspec[http]' ngff-zarr

In [16]:
from zarr.storage import FSStore
from ngff_zarr import from_ngff_zarr
import numpy as np
from rich import print

## Lazy Loading

In OME-Zarr, pixel data is not loaded until it is used, i.e. it is *lazily loaded*.

In [21]:
# Access an OME-Zarr via HTTPS
store = FSStore('https://s3.us-west-2.amazonaws.com/aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_647_Em_690.zarr')
multiscales = from_ngff_zarr(store)
print(multiscales)

The previous cell loaded quickly because we did not need to download all the data in the image first!

How large is the pixel data for the highest resolution scale?

In [18]:
print(multiscales.images[0].data.nbytes)

More than 600 GB!

What is the structure and content of the pixel data? How do we access pixel values?

[Dask](https://www.dask.org/) is a Python library that allows us to lazily load the Zarr array.

We can see the structure:

In [19]:
multiscales.images[0].data

| | Array | Chunk |
|---|---|---|
| Bytes | 592.80 GiB | 144.53 MiB |
| Shape | (1, 1, 4200, 10240, 7400) | (1, 1, 1, 10240, 7400) |
| Dask graph | 4200 chunks in 2 graph layers | 4200 chunks in 2 graph layers |
| Data type | uint16 numpy.ndarray | uint16 numpy.ndarray |
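The sizes reported for the array can be reproduced from its shape and data type alone. The following is a minimal sketch of that arithmetic. The shape, chunk shape, and `uint16` data type are copied from the output above; the variable names are only illustrative.

```python
from math import prod

import numpy as np

# Values copied from the Dask array summary above
shape = (1, 1, 4200, 10240, 7400)
chunk_shape = (1, 1, 1, 10240, 7400)
itemsize = np.dtype("uint16").itemsize  # 2 bytes per pixel

# Uncompressed size of the whole array and of a single chunk
total_bytes = prod(shape) * itemsize
chunk_bytes = prod(chunk_shape) * itemsize

# Number of chunks: ceiling division along each axis, then multiply
n_chunks = prod(-(-s // c) for s, c in zip(shape, chunk_shape))

print(f"{total_bytes / 2**30:.2f} GiB")  # 592.80 GiB
print(f"{chunk_bytes / 2**20:.2f} MiB")  # 144.53 MiB
print(n_chunks)                          # 4200
```

These are uncompressed sizes. Each chunk here is one full 10240 x 7400 plane, so the array splits into 4200 chunks.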


We can also load regions of the pixel data on demand.

Note that when the data is loaded, it is transferred in its compressed form. Only the chunks that are required to provide the requested region are loaded.

Use `np.asarray` with NumPy indexing on the Dask Array to obtain an in-memory `numpy.ndarray`.
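To make the effect of chunking concrete, the sketch below counts how many chunks a requested region intersects. It is an illustration only: `chunks_touched` is a hypothetical helper, not part of zarr, Dask, or ngff-zarr, and the chunk shape is assumed to be the one reported for the highest resolution scale in this dataset.

```python
from math import prod


def chunks_touched(region, chunk_shape):
    """Count the chunks intersected by a region.

    `region` holds one (start, stop) pair per axis, with stop > start.
    Hypothetical helper for illustration; not a library function.
    """
    counts = []
    for (start, stop), size in zip(region, chunk_shape):
        first = start // size        # index of the first chunk touched on this axis
        last = (stop - 1) // size    # index of the last chunk touched on this axis
        counts.append(last - first + 1)
    return prod(counts)


# Assumed chunk shape: one full plane per chunk
chunk_shape = (1, 1, 1, 10240, 7400)

# An 8x8 patch from the first plane
patch = [(0, 1), (0, 1), (0, 1), (0, 8), (0, 8)]
# The same 8x8 patch across the first 10 planes
slab = [(0, 1), (0, 1), (0, 10), (0, 8), (0, 8)]

print(chunks_touched(patch, chunk_shape))  # 1
print(chunks_touched(slab, chunk_shape))   # 10
```

With this layout, even a tiny patch requires fetching the whole plane-sized chunk that contains it, and a request spanning several planes fetches one chunk per plane. The chunk shape therefore sets the granularity of what is transferred.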

In [20]:
# Fetch an 8x8 patch from the first plane of the highest resolution scale
print(np.asarray(multiscales.images[0].data[0, 0, 0, :8, :8]))

## Exercises

### Exercise 1: Compare multiscale sizes

How do the sizes of the downsampled scales compare to the original resolution data?

In [33]:
# %load Reading_OME-Zarr_Data_Exercise_1_Solution.py

### Exercise 2: Fetch image

Fetch all the image pixel data for scale 4.

Print the image metadata for scale 4.

In [36]:
# %load Reading_OME-Zarr_Data_Exercise_2_Solution.py