In [1]:
from pprint import pprint as print
import pandas as pd
import numpy as np
import zarr

# Review: NumPy Arrays

In [2]:
a = np.zeros(shape=(10, 20), dtype='f8')
a.shape, a.dtype

((10, 20), dtype('float64'))

This array lives in memory.

How much memory does the array use?

In [3]:
a.nbytes

1600
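The figure follows from the shape and the dtype: 10 by 20 elements at 8 bytes each, since `'f8'` is a 64-bit float. A quick check that recreates the same array (the names `n_elements`, `bytes_per_element` and `computed` are only illustrative):

```python
import numpy as np

a = np.zeros(shape=(10, 20), dtype='f8')

# nbytes is the element count times the size of one element in bytes
n_elements = a.size              # 10 * 20 = 200
bytes_per_element = a.itemsize   # 8 for float64
computed = n_elements * bytes_per_element
```

Here `computed` equals `a.nbytes`.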

Get a piece of the data with slicing:

In [4]:
a[:2, :2]

array([[0., 0.],
       [0., 0.]])

Create a larger array and assign `a` into its top-left corner:

In [5]:
b = np.ones(shape=(40, 50), dtype='f8')
b[:10, :20] = a
b

array([[0., 0., 0., ..., 1., 1., 1.],
       [0., 0., 0., ..., 1., 1., 1.],
       [0., 0., 0., ..., 1., 1., 1.],
       ...,
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.]])
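To confirm which region was overwritten, the sketch below repeats the assignment and checks the regions separately. Only the top-left 10 by 20 block should hold zeros, and the rest should keep the ones. It also shows that the assignment copies values, so a later change to `a` does not reach `b`. The helper names are illustrative.

```python
import numpy as np

a = np.zeros(shape=(10, 20), dtype='f8')
b = np.ones(shape=(40, 50), dtype='f8')
b[:10, :20] = a  # copy the values of a into the top-left block of b

block_is_zero = bool((b[:10, :20] == 0).all())
n_ones = int(b.sum())  # every element outside the block is still 1

a[:] = 5  # modifying a afterwards leaves b untouched
block_unchanged = bool((b[:10, :20] == 0).all())
```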

## Zarr: N-dimensional arrays with any NumPy dtype
Zarr is a format for the storage of chunked, compressed, N-dimensional arrays inspired by [HDF5](https://www.hdfgroup.org/solutions/hdf5/), [h5py](https://www.h5py.org/) and [bcolz](https://bcolz.readthedocs.io/en/latest/).

## Highlights
- Create N-dimensional arrays with any NumPy dtype.
- Chunk arrays along any dimension.
- Compress and/or filter chunks using any NumCodecs codec.
- Store arrays in memory, on disk, inside a Zip file, on S3, …
- Read an array concurrently from multiple threads or processes.
- Write to an array concurrently from multiple threads or processes.
- Organize arrays into hierarchies via groups.

In [13]:
# The data is divided into chunks, and each chunk is compressed separately.
# The store argument sets where the chunks are written; a path string gives a directory on disk.
z = zarr.create(shape=(40, 50), chunks=(10, 10), dtype='f8', store='test.zarr')
z

<zarr.core.Array (40, 50) float64>

In [14]:
z.store

<zarr.storage.DirectoryStore at 0x7f97996988b0>

In [15]:
z.info

Type               : zarr.core.Array
Data type          : float64
Shape              : (40, 50)
Chunk shape        : (10, 10)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 16000 (15.6K)
No. bytes stored   : 337
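The sizes in this report can be reproduced by hand. The sketch below is plain arithmetic, not a Zarr API call, and its variable names are illustrative. It derives the uncompressed size and the number of chunks from the shape, chunk shape and item size shown above.

```python
import math

shape = (40, 50)
chunks = (10, 10)
itemsize = 8  # bytes per float64 element

# Uncompressed size of the whole array
nbytes = math.prod(shape) * itemsize

# Chunks needed along each dimension (rounding up covers partial edge chunks)
grid = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_chunks = math.prod(grid)

# Uncompressed size of one full chunk
chunk_nbytes = math.prod(chunks) * itemsize
```

`nbytes` matches the 16000 bytes reported, split over a 4 by 5 grid of 20 chunks, each 800 bytes before compression.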


In [16]:
z.fill_value

0.0

No data has been written to the array yet. If we try to access data, we will just get the fill value:

In [17]:
z[0, 0]

0.0

Assign data to the array with NumPy-style indexing. The data is written to the store immediately.

In [18]:
z[:] = 1

In [19]:
z.info

Type               : zarr.core.Array
Data type          : float64
Shape              : (40, 50)
Chunk shape        : (10, 10)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 16000 (15.6K)
No. bytes stored   : 1277 (1.2K)
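Comparing the two reports gives a rough sense of how well the data compresses. The sketch below only does arithmetic on the numbers printed above, and the variable names are illustrative. The 337 bytes stored before any write are presumably array metadata, which is an assumption and not something the output confirms.

```python
nbytes = 16000        # uncompressed size, from "No. bytes"
stored_before = 337   # "No. bytes stored" before writing
stored_after = 1277   # "No. bytes stored" after z[:] = 1

# Ratio of uncompressed size to bytes actually stored
ratio = nbytes / stored_after

# Bytes added to the store by the write
growth = stored_after - stored_before
```

An array of identical values compresses very well, so the ratio of about 12.5 here is not representative of real data.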
