# ASDF array storage

One of the powerful features of asdf is the ability to read and write array data.

These data can be small, large, or 'big', each presenting its own set of challenges.

Being familiar with how array data is serialized, written to disk, deserialized and read can help to avoid performance issues, memory problems and puzzling bugs.

## ASDF file layout: the tree and blocks

asdf, the Python implementation of the ASDF standard, supports reading and writing a wide variety of Python objects to ASDF files. ASDF files are made of two major components:

- a `tree` (stored as YAML)
- binary `blocks` (typically one for each array in the tree)


To begin exploring how asdf stores array data in blocks, let's start by making a simple ASDF file with a few arrays.

In [1]:
import asdf
import numpy as np

# construct an AsdfFile object with 3 arrays
example_af = asdf.AsdfFile({
    'array0': np.arange(42),
    'array1': np.arange(7),
    'array2': np.arange(26),
})

# print out summary info for the tree
example_af.info()

root (AsdfObject)
├─array0 (ndarray): shape=(42,), dtype=int64
├─array1 (ndarray): shape=(7,), dtype=int64
└─array2 (ndarray): shape=(26,), dtype=int64


Next, let's look at the bytes that are produced when asdf writes an AsdfFile object to an ASDF file. To avoid cluttering the working directory, we will write `example_af` to a file in a temporary directory and then print the raw contents of that file.

In [2]:
import pathlib
import tempfile

_temporary_directory = tempfile.TemporaryDirectory()
tmp_path = pathlib.Path(_temporary_directory.name)

example_fn = tmp_path / 'example.asdf'
example_af.write_to(example_fn)

with open(example_fn, 'rb') as f:
    print(f.read())

b'#ASDF 1.0.0\n#ASDF_STANDARD 1.5.0\n%YAML 1.1\n%TAG ! tag:stsci.edu:asdf/\n--- !core/asdf-1.1.0\nasdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: \'http://github.com/asdf-format/asdf\',\n  name: asdf, version: 3.0.0.dev307+gb0c9a50}\nhistory:\n  extensions:\n  - !core/extension_metadata-1.0.0\n    extension_class: asdf.extension._manifest.ManifestExtension\n    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0\n    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}\n  - !core/extension_metadata-1.0.0\n    extension_class: asdf.extension.BuiltinExtension\n    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}\narray0: !core/ndarray-1.0.0\n  source: 0\n  datatype: int64\n  byteorder: little\n  shape: [42]\narray1: !core/ndarray-1.0.0\n  source: 1\n  datatype: int64\n  byteorder: little\n  shape: [7]\narray2: !core/ndarray-1.0.0\n  source: 2\n  datatype: int64\n  byteorder: little\n  shape: [26]\n.

This wall of bytes isn't the easiest to view so let's define a few functions to split the ASDF file bytes into the ASDF tree and blocks. We can use the ASDF block 'magic' bytes `b'\xd3BLK'` (which occur at the start of each ASDF block) to break up the file into:

- tree (bytes before the first block)
- blocks
- block index (an optional YAML document at the end of the file that defines the byte offsets for each block)

We will ignore the block index throughout this document (please see the [Low-Level file layout documentation for the asdf-standard](https://asdf-standard.readthedocs.io/en/latest/file_layout.html) for details about the block index and the ASDF file format).
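As a rough illustration of what those byte offsets are, the sketch below scans a synthetic byte string for the block magic bytes and records where each occurrence starts. The names `find_block_offsets`, `fake_tree` and `fake_file` are made up for this example, and the "blocks" are placeholder bytes rather than real ASDF blocks. Note that scanning is only a heuristic: array data could contain the magic bytes by chance, so a robust reader would rely on the sizes stored in each block header or on the block index instead.

```python
BLOCK_MAGIC = b"\xd3BLK"


def find_block_offsets(asdf_bytes):
    """Return the byte offset of every occurrence of the block magic bytes."""
    offsets = []
    start = asdf_bytes.find(BLOCK_MAGIC)
    while start != -1:
        offsets.append(start)
        # continue searching after the magic bytes we just found
        start = asdf_bytes.find(BLOCK_MAGIC, start + len(BLOCK_MAGIC))
    return offsets


# a synthetic stand-in for a file: a fake tree followed by two fake blocks
fake_tree = b"#ASDF 1.0.0\n--- \nfoo: bar\n...\n"
fake_file = fake_tree + BLOCK_MAGIC + b"A" * 10 + BLOCK_MAGIC + b"B" * 20

offsets = find_block_offsets(fake_file)
print(f"tree length = {len(fake_tree)}, block offsets = {offsets}")
```

The first offset equals the length of the tree, which is why splitting on the magic bytes separates the tree from the blocks.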

In [3]:
def read_tree_and_blocks(filename):
    """
    Read the bytes that make up an ASDF file and split them into:
      - tree (decoded as ascii)
      - a list of blocks (as bytes containing block headers and contents)
    """
    with open(filename, 'rb') as f:
        asdf_bytes = f.read()


    # if these bytes don't contain any block 'magic' bytes
    # (which mark the start of a block) then the written bytes
    # contain no blocks (and only an ASDF tree)
    if b'\xd3BLK' not in asdf_bytes:
        return asdf_bytes.decode('ascii'), []

    # split the written bytes into the tree (which occurs first in the file)
    # and any blocks based on the block 'magic' bytes
    tree, *blocks = asdf_bytes.split(b'\xd3BLK')
    blocks[-1] = blocks[-1].split(b'#ASDF BLOCK INDEX')[0]
    return tree.decode('ascii'), blocks

asdf_tree, asdf_blocks = read_tree_and_blocks(example_fn)

print("=== ASDF Tree ===")
print(asdf_tree)

print("=== ASDF Blocks ===")
print(asdf_blocks)

=== ASDF Tree ===
#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.0.0.dev307+gb0c9a50}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}
array0: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [42]
array1: !core/ndarray-1.0.0
  source: 1
  datatype: int64
  byteorder: little
  shape: [7]
array2: !core/ndarray-1.0.0
  source: 2
  datatype: int64
  byteorder: little
  shape: [26]
...

=== ASDF Bloc

Since we're in Jupyter, we can take this a step further: render the ASDF tree YAML with syntax highlighting and tell Jupyter to display AsdfFile objects using our new function.

In [4]:
from IPython.display import display, Markdown


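def n_bytes_to_human(n_bytes):
    """Convert a number (of bytes) to a more 'human readable' format
    by scaling it by powers of 1024 and adding a unit"""
    _units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
    i = 0
    # use >= so that exactly 1024 bytes is reported as 1 KB and stop at
    # the largest unit to avoid indexing past the end of _units
    while n_bytes >= 1024 and i < len(_units) - 1:
        n_bytes /= 1024
        i += 1
    return f"{n_bytes:0.0f} {_units[i]}"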

def display_asdf_file(af):
    """
    Use Jupyter Markdown formatting to display the ASDF
    tree and blocks produced by writing an AsdfFile, af
    """
    tmp_fn = tmp_path / 'temp.asdf'
    af.write_to(tmp_fn)
    tree, blocks = read_tree_and_blocks(tmp_fn)
    # for the first 10 blocks, show the first 10 bytes and size of each block
    block_text = "\n".join([
        f"\tBlock{i}: {blk[:10]}...[{n_bytes_to_human(len(blk))}]"
        for (i, blk) in enumerate(blocks[:10])
    ])
    # if there are more than 10 blocks, show a message that the remaining
    # blocks were not displayed
    if len(blocks) > 10:
        block_text += f"\n... skipping display of {len(blocks)-10} blocks"
    # render the tree and block text as Markdown
    display(Markdown(f"""
### ASDF Tree

```yaml
{tree}
```

### ASDF Blocks [N = {len(blocks)}]
{block_text}
    """))

# register our 'display_asdf_file' function to display AsdfFile instances
get_ipython().display_formatter.formatters['text/plain'].for_type(
    asdf.AsdfFile, lambda n, p, cycle: display_asdf_file(n))

# ending this cell with our AsdfFile object will cause Jupyter to use our new function
example_af


### ASDF Tree

```yaml
#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.0.0.dev307+gb0c9a50}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}
array0: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [42]
array1: !core/ndarray-1.0.0
  source: 1
  datatype: int64
  byteorder: little
  shape: [7]
array2: !core/ndarray-1.0.0
  source: 2
  datatype: int64
  byteorder: little
  shape: [26]
...

```

### ASDF Blocks [N = 3]
	Block0: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[386 B]
	Block1: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[106 B]
	Block2: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[258 B]
    



Using our helper functions, we can observe that writing out our example AsdfFile produces an ASDF tree containing our 3 arrays, each with a `source` containing the index of the ASDF block that contains the corresponding array data. 

Each ASDF block consists of a header and a binary representation of the array data (more information can be found in the [Low-Level file layout documentation for the asdf-standard](https://asdf-standard.readthedocs.io/en/latest/file_layout.html)).
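To make the header concrete, here is a minimal sketch that builds a synthetic block and parses it with the standard library. The field layout in `HEADER_FORMAT` is an assumption based on the file layout documentation linked above, and `parse_block` is a hypothetical helper, not part of asdf. The block is constructed by hand rather than written by asdf.

```python
import hashlib
import struct

# Assumed header layout (big-endian), per the file layout documentation:
#   header_size (uint16), then flags (uint32), compression (4 bytes),
#   allocated_size, used_size, data_size (uint64 each), checksum (16 bytes)
HEADER_FORMAT = ">I4sQQQ16s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 48 bytes


def parse_block(block):
    """Parse a block whose leading magic bytes were already stripped."""
    (header_size,) = struct.unpack_from(">H", block, 0)
    flags, compression, allocated, used, data_size, checksum = struct.unpack_from(
        HEADER_FORMAT, block, 2
    )
    # the data starts after the 2-byte header_size field and the header
    data_start = 2 + header_size
    return {
        "header_size": header_size,
        "compression": compression,
        "used_size": used,
        "data_size": data_size,
        "checksum": checksum,
        "data": block[data_start:data_start + used],
    }


# build a synthetic, uncompressed block holding 42 little-endian int64 values
data = struct.pack("<42q", *range(42))
header = struct.pack(
    HEADER_FORMAT, 0, b"\0\0\0\0", len(data), len(data), len(data),
    hashlib.md5(data).digest(),
)
block = struct.pack(">H", HEADER_SIZE) + header + data

parsed = parse_block(block)
print(block[:10])
print(parsed["header_size"], parsed["data_size"], len(block))
```

Under this layout, the leading `b'\x000'` seen in the block listings is the 16-bit header size (0x0030, or 48), and the displayed block sizes are the array bytes plus 50 bytes of overhead: 42 × 8 + 50 = 386 B, 7 × 8 + 50 = 106 B and 26 × 8 + 50 = 258 B.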

Before returning to discussing asdf array storage let's define one more helper function to track the number of bytes read from a file (this will be useful when comparing some of the storage options).

In [5]:
class ReadMonitoringFile(asdf.generic_io.RandomAccessFile):
    """
    Class to record the number of bytes read from an ASDF file
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_bytes = 0

    def read(self, *args, **kwargs):
        bs = super().read(*args, **kwargs)
        self.n_bytes += len(bs)
        return bs

import contextlib

@contextlib.contextmanager
def asdf_open_tracking_reads(fn):
    with open(fn, 'rb') as f:
        with ReadMonitoringFile(f, 'r') as mf:
            with asdf.open(mf) as af:
                yield af, mf

## Lazy loading blocks
When an ASDF file is opened, array data is (by default) "lazy loaded", meaning the ASDF block contents will not be read until the array is accessed.

Note that this default behavior can be disabled by setting the keyword argument `lazy_load` to `False` in `asdf.open`.

In [6]:
# construct an AsdfFile object with 3 arrays
example_af = asdf.AsdfFile({'array0': np.arange(42), 'array1': np.arange(7), 'array2': np.arange(26)})

# write out our example file
example_fn = tmp_path / 'example.asdf'
example_af.write_to(example_fn)

# and a reminder of what it contains
example_af


### ASDF Tree

```yaml
#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.0.0.dev307+gb0c9a50}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}
array0: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [42]
array1: !core/ndarray-1.0.0
  source: 1
  datatype: int64
  byteorder: little
  shape: [7]
array2: !core/ndarray-1.0.0
  source: 2
  datatype: int64
  byteorder: little
  shape: [26]
...

```

### ASDF Blocks [N = 3]
	Block0: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[386 B]
	Block1: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[106 B]
	Block2: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[258 B]
    



In [7]:
# open with lazy loading (enabled by default)
with asdf.open(example_fn) as af:
    print("Prior to access the array is unloaded")
    print(f"\tArray = {af['array1']}")

    first_element = af['array1'][0]
    print(f"\nAccessing the first element({first_element}) of the array causes asdf to load the data")
    print(f"\tArray = {af['array1']}")

Prior to access the array is unloaded
	Array = <array (unloaded) shape: [7] dtype: int64>

Accessing the first element(0) of the array causes asdf to load the data
	Array = [0 1 2 3 4 5 6]


In [8]:
# open with lazy loading disabled
with asdf.open(example_fn, lazy_load=False) as af:
    print("With lazy_load=False the array data is immediately loaded")
    print(f"\tArray = {af['array1']}")

With lazy_load=False the array data is immediately loaded
	Array = [0 1 2 3 4 5 6]
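The general pattern behind lazy loading can be sketched in a few lines of plain Python. This is an illustration of the idea only and is not how asdf implements it; `LazyValue` and `expensive_loader` are hypothetical names, and the loader stands in for reading a block from disk.

```python
class LazyValue:
    """Hold a loader function and call it only on first access."""

    def __init__(self, loader, description):
        self._loader = loader
        self._description = description
        self._data = None
        self._loaded = False

    def _load(self):
        # run the loader once and cache the result
        if not self._loaded:
            self._data = self._loader()
            self._loaded = True
        return self._data

    def __getitem__(self, index):
        return self._load()[index]

    def __repr__(self):
        if not self._loaded:
            return f"<{self._description} (unloaded)>"
        return repr(self._data)


load_calls = []


def expensive_loader():
    # stand-in for reading a block from disk
    load_calls.append(1)
    return list(range(7))


lazy = LazyValue(expensive_loader, "array")
before = repr(lazy)   # nothing has been loaded yet
first = lazy[0]       # first access triggers the load
after = repr(lazy)
second = lazy[1]      # later accesses reuse the cached data
print(before, first, after, len(load_calls))
```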


## Memory mapping
For applications where only a small portion of the array data is needed, loading the entire ASDF block when any portion of the array data is requested would be inefficient.

For local files, asdf will memory map arrays allowing the operating system to read only the portion of the array that is requested.

Note that array memory mapping can be disabled by setting the keyword argument `copy_arrays` to `True` in `asdf.open`.

In [9]:
# open our example file with memory mapping (enabled by default)
with asdf.open(example_fn) as af:
    print("With default copy_arrays=False") 
    # the array data has a base array that is of type numpy.memmap
    print(f"\tarray 0 base array type = {type(af['array0'].base)}")

With default copy_arrays=False
	array 0 base array type = <class 'numpy.memmap'>


In [10]:
# open our example file with memory mapping disabled
with asdf.open(example_fn, copy_arrays=True) as af:
    print("With copy_arrays=True")
    print(f"\tarray 0 base array type = {type(af['array0'].base)}")

With copy_arrays=True
	array 0 base array type = <class 'numpy.ndarray'>
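For readers unfamiliar with memory mapping, the following sketch shows the underlying mechanism using only the standard library `mmap` module rather than `numpy.memmap`. It is an analogy, not asdf's code path: a scratch file holding 42 integers is mapped, and a single element is unpacked directly from the mapping, leaving it to the operating system to page in the needed part of the file.

```python
import mmap
import struct
import tempfile

ITEM_SIZE = struct.calcsize("<q")  # 8 bytes per int64

with tempfile.TemporaryFile() as f:
    # write 42 little-endian int64 values (0..41) to a scratch file
    f.write(struct.pack("<42q", *range(42)))
    f.flush()

    # map the whole file (length 0) read-only; this does not copy the data
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # unpack a single element straight from the mapping
        (element_10,) = struct.unpack_from("<q", mm, 10 * ITEM_SIZE)
        mapped_length = len(mm)

print(element_10, mapped_length)
```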


## Chunked array storage

Some applications benefit from breaking up large arrays into smaller chunks.

Each chunk can be stored in a different ASDF block which can be loaded only when the data for that chunk is required.

This can be useful when:
  - memory mapping isn't possible (such as network file access) 
  - memory mapping is inefficient (macOS has poor memory mapping performance for arrays approaching 800 MB)
  - paying per byte read (AWS S3)

To support chunked storage, asdf uses [Zarr](https://zarr.readthedocs.io/en/stable/), a widely used, open source chunked array storage format.

To compare the pros and cons of chunked array storage let's generate an AsdfFile containing a large non-chunked array.

In [11]:
large_array = np.ones((3, 4096, 4096), dtype='uint16')
large_example_af = asdf.AsdfFile({'arr': large_array})

# write it out to an ASDF file
large_example_fn = tmp_path / 'large_example.asdf'
large_example_af.write_to(large_example_fn)

In [12]:
# our large array is stored in one ASDF block
large_example_af


### ASDF Tree

```yaml
#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.0.0.dev307+gb0c9a50}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}
arr: !core/ndarray-1.0.0
  source: 0
  datatype: uint16
  byteorder: little
  shape: [3, 4096, 4096]
...

```

### ASDF Blocks [N = 1]
	Block0: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[96 MB]
    



Storing this entire large array in one block means that, when memory mapping is unavailable, accessing any part of the array results in reading the entire corresponding ASDF block.

In [13]:
# use a helper function to monitor bytes read from the file
with asdf_open_tracking_reads(large_example_fn) as (af, mon):
    af['arr'][0, 0, 0]
    print(f"Accessing one array element resulted in reading {n_bytes_to_human(mon.n_bytes)}")

Accessing one array element resulted in reading 96 MB


If we "chunk" this large array using Zarr we can store each chunk in a different ASDF block.

When an array element is read from the Zarr array, asdf will only need to load the ASDF block that contains the corresponding chunk.

This can drastically reduce the amount of data read from disk.

In [14]:
import zarr

# make a Zarr array from the large NumPy array
# Zarr, unless told otherwise, will pick a default chunk shape based on some
# simple heuristics that aim to produce 1MB uncompressed chunks
# we are setting the compressor to None to disable compression
za = zarr.array(large_example_af['arr'], compressor=None)

# add this Zarr array to an AsdfFile
chunked_example_af = asdf.AsdfFile({'arr': za})

# write it to an ASDF file for later use
chunked_example_fn = tmp_path / 'chunked_example.asdf'
chunked_example_af.write_to(chunked_example_fn)

In [15]:
# print out the ASDF file produced with this Zarr array
chunked_example_af


### ASDF Tree

```yaml
#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.0.0.dev307+gb0c9a50}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev307+gb0c9a50}
  - !core/extension_metadata-1.0.0
    extension_class: asdf_zarr.extensions.ZarrExtension
    extension_uri: asdf://stsci.edu/example-project/tags/zarr-1.0.0
    software: !core/software-1.0.0 {name: asdf-zarr, version: 0.0.1}
arr: !<asdf://stsci.edu/example-project/tags/zarr-1.0.0>
  .zarray:
    chunks: [1, 512, 1024]
    compressor: null
    dtype: <u2
    fill_value: 0
    filters: null
    order: C
    shape: [3, 4096, 4096]
    zarr_format: 2
  chunk_block_map: 96
...

```

### ASDF Blocks [N = 97]
	Block0: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[1 MB]
	Block1: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[1 MB]
	Block2: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[1 MB]
	Block3: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[1 MB]
	Block4: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[1 MB]
	Block5: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[1 MB]
	Block6: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[1 MB]
	Block7: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[1 MB]
	Block8: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[1 MB]
	Block9: b'\x000\x00\x00\x00\x00\x00\x00\x00\x00'...[1 MB]
... skipping display of 87 blocks
    



Zarr will default to chunks of ~1 MB.

When indexed or sliced, the corresponding blocks will be read, combined and returned as a standard NumPy array.

In [16]:
with asdf_open_tracking_reads(chunked_example_fn) as (af, mon):
    v = af['arr'][0, 0, 0]
    print(f"Accessing one array element resulted in reading {n_bytes_to_human(mon.n_bytes)}")
    print(f"\treturn type = {type(v)}")

Accessing one array element resulted in reading 1 MB
	return type = <class 'numpy.uint16'>
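The block count shown above can be checked with a little arithmetic on the array shape and the chunk shape reported in the `.zarray` metadata. This is a sketch of that calculation; the variable names are chosen for this example.

```python
import math

shape = (3, 4096, 4096)   # array shape from the example above
chunks = (1, 512, 1024)   # chunk shape reported in the .zarray metadata
itemsize = 2              # bytes per uint16 element

# number of chunks along each axis (rounding up for partial chunks)
grid = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_chunks = math.prod(grid)
chunk_nbytes = math.prod(chunks) * itemsize

print(f"chunk grid = {grid}, total chunks = {n_chunks}")
print(f"bytes per chunk = {chunk_nbytes} ({chunk_nbytes / 1024**2:g} MiB)")
```

The 96 chunks account for 96 of the 97 blocks; the remaining block is presumably the one referenced by `chunk_block_map: 96` in the tree.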


## Applications for chunked array storage

### Mosaic cutouts

Taking "cutouts" (accessing small sub-arrays) of a large mosaic is an obvious application where chunked array storage can be beneficial.

In [17]:
# generate a large 2D mosaic (~380 MB)
mosaic = zarr.array(np.zeros((10000, 10000), dtype='float32'), compressor=None)
mosaic_af = asdf.AsdfFile({'mosaic': mosaic})

# write it out to disk
mosaic_example_fn = tmp_path / 'mosaic.asdf'
mosaic_af.write_to(mosaic_example_fn)

Opening this large file and taking a small cutout involves very little data actually being read from disk.

In [18]:
with asdf_open_tracking_reads(mosaic_example_fn) as (af, mon):
    v = af['mosaic'][30:60, 90:120]
    print(f"Accessing one cutout resulted in reading {n_bytes_to_human(mon.n_bytes)}")

Accessing one cutout resulted in reading 2 MB


### Spectral cubes

High dimensional data (like spectral cubes) can also benefit from chunked array storage.

For this application, selection of chunk shape can dramatically impact the performance.

In [19]:
# generate a fake spectral cube with 2 spatial and one spectral dimension
shape = (32, 32, 2560)
cube_array = np.zeros(shape, dtype='float32')
print(f"Cube consumes {n_bytes_to_human(cube_array.nbytes)}")

Cube consumes 10 MB


In [20]:
# chunk the spatial dimensions, preserve the spectral dimension
chunks = (8, 8, 2560)  # array shape = 32, 32, 2560
chunked_spatial_fn = tmp_path / "chunked_spatial.asdf"
cube = zarr.array(cube_array, compressor=None, chunks=chunks)

# write to an ASDF file
cube_af = asdf.AsdfFile({'cube': cube})
cube_af.write_to(chunked_spatial_fn)

In [21]:
with asdf_open_tracking_reads(chunked_spatial_fn) as (af, mon):
    # accessing all pixels at one wavelength results in loading all chunks
    v = af['cube'][:, :, 0]
    print(f"read {n_bytes_to_human(mon.n_bytes)}")

read 10 MB


In [22]:
with asdf_open_tracking_reads(chunked_spatial_fn) as (af, mon):
    # accessing all wavelengths for one pixel results in loading a single chunk
    v = af['cube'][0, 0, :]
    print(f"read {n_bytes_to_human(mon.n_bytes)}")

read 671 KB


In [23]:
# chunk the spectral dimension, preserve the spatial dimensions
chunks = (32, 32, 256)  # array shape = 32, 32, 2560
chunked_spectral_fn = tmp_path / "chunked_spectral.asdf"
cube = zarr.array(cube_array, compressor=None, chunks=chunks)

# write to an ASDF file
cube_af = asdf.AsdfFile({'cube': cube})
cube_af.write_to(chunked_spectral_fn)

In [24]:
with asdf_open_tracking_reads(chunked_spectral_fn) as (af, mon):
    # accessing all pixels for one wavelength results in loading one chunk
    v = af['cube'][:, :, 0]
    print(f"read {n_bytes_to_human(mon.n_bytes)}")

read 1 MB


In [25]:
with asdf_open_tracking_reads(chunked_spectral_fn) as (af, mon):
    # accessing all wavelengths for one pixel results in loading all chunks
    v = af['cube'][0, 0, :]
    print(f"read {n_bytes_to_human(mon.n_bytes)}")

read 10 MB


In [26]:
# generate a larger spectral cube
shape = (160, 160, 2560)
large_cube_array = np.zeros(shape, dtype='float32')
print(f"Cube consumes {n_bytes_to_human(large_cube_array.nbytes)}")

# chunk both the spatial and the spectral dimensions
chunks = (32, 32, 256)  # array shape = 160, 160, 2560
chunked_cube_fn = tmp_path / "chunked_cube.asdf"
large_cube = zarr.array(large_cube_array, compressor=None, chunks=chunks)

# write to an ASDF file
large_cube_af = asdf.AsdfFile({'cube': large_cube})
large_cube_af.write_to(chunked_cube_fn)

Cube consumes 250 MB


In [27]:
with asdf_open_tracking_reads(chunked_cube_fn) as (af, mon):
    # accessing all pixels for one wavelength results in loading 25 chunks
    # (a 5 x 5 grid of spatial chunks)
    v = af['cube'][:, :, 0]
    print(f"read {n_bytes_to_human(mon.n_bytes)}")

read 25 MB


In [28]:
with asdf_open_tracking_reads(chunked_cube_fn) as (af, mon):
    # accessing all wavelengths for one pixel results in loading 10 chunks
    # (one for each chunk along the spectral dimension)
    v = af['cube'][0, 0, :]
    print(f"read {n_bytes_to_human(mon.n_bytes)}")

read 10 MB


The selection of the chunk size and shape is critical.

Chunking across the wrong dimensions or selecting a chunk size that is too small or too large can result in extra reads and poor performance.

Think about how the data will be accessed.

Treating the array as if it exists in RAM can result in poor performance and high storage access costs.
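One way to reason about an access pattern before writing any data is to count the chunks a selection would touch. The sketch below does this for the spectral cube examples above. The helpers `chunks_touched`, `estimated_bytes`, `one_wavelength` and `one_pixel` are hypothetical names for this illustration, and the estimate assumes that whole, uncompressed chunks are read and that the chunk shape divides the array shape evenly (true for the cubes used here). It ignores the bytes read for the tree and block headers.

```python
import math


def chunks_touched(chunks, selection):
    """Count the chunks that intersect a selection.

    `selection` holds one (start, stop) pair per axis (stop exclusive).
    """
    n = 1
    for chunk_len, (start, stop) in zip(chunks, selection):
        first = start // chunk_len
        last = (stop - 1) // chunk_len
        n *= last - first + 1
    return n


def estimated_bytes(chunks, selection, itemsize):
    """Estimate the bytes read, assuming whole uncompressed chunks are read."""
    return chunks_touched(chunks, selection) * math.prod(chunks) * itemsize


def one_wavelength(shape):
    """Selection equivalent to cube[:, :, 0]."""
    return ((0, shape[0]), (0, shape[1]), (0, 1))


def one_pixel(shape):
    """Selection equivalent to cube[0, 0, :]."""
    return ((0, 1), (0, 1), (0, shape[2]))


MIB = 1024 ** 2
cases = [
    ((32, 32, 2560), (8, 8, 2560)),     # chunked spatially
    ((32, 32, 2560), (32, 32, 256)),    # chunked spectrally
    ((160, 160, 2560), (32, 32, 256)),  # larger cube, chunked in both
]
for shape, chunks in cases:
    for name, select in (("one wavelength", one_wavelength), ("one pixel", one_pixel)):
        n = chunks_touched(chunks, select(shape))
        mib = estimated_bytes(chunks, select(shape), itemsize=4) / MIB
        print(f"shape={shape} chunks={chunks} {name}: {n} chunks, {mib:g} MiB")
```

The estimates line up with the reads measured above: for example, 25 chunks (about 25 MB) for one wavelength of the larger cube and 10 chunks (about 10 MB) for one pixel.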

## Questions?