## Outline

- Why a new data format?
- Main features of the ASDF standard
- Working with ASDF files
  - Read a file and look at the contents
  - Modify a file
  - Save a file to disk
  - File introspection
- Other functionality
- Documentation
- Exercise


### Why a New Data Format

Astronomy has long had a standard format called FITS (Flexible Image Transport System), created in 1981.

- Metadata is based on 80-character cards (a remnant of the punch-card era)
  - This limits keyword names to 8 characters; values and comments must fit on the rest of the card.
- Structure of file consists of a list of header/binary items, however:
  - More complex organization must be by convention or using nonstandard extensions
  - Astronomical data sets continue to become more complex
- The specific motivation for developing ASDF was that the FITS WCS conventions proved unusable for raw HST data, which included complex distortion models and required high accuracy. The experience with HST showed that those conventions would not work with the much more complex JWST WCS transforms.
  

The issues with FITS have been documented in a paper by B. Thomas, et al. (Learning from FITS: Limitations in use in modern astronomical research, Astron. Comput. (2015), 10.1016/j.ascom.2015.01.009, arXiv:1502.00996v2).
  
  

### Main Features of ASDF


- It has a hierarchical metadata structure, made up of basic dynamic data types such as strings, numbers, lists and mappings.

- Attribute names and values are not constrained by size as is the case for FITS header cards.

- It has human-readable metadata that can be edited directly in place in the file.

- The structure of the files can be automatically validated using associated schema files.

- It’s designed for extensibility: new conventions may be used without breaking backward compatibility with tools that do not understand those conventions. Versioning systems are used to prevent conflicting with alternative conventions.

- The binary array data (when compression is not used) is a raw memory dump, and techniques such as memory mapping can be used to efficiently access it.

- It is possible to read and write the file as a stream, without requiring random access.

- It’s built on top of industry standards, such as YAML and JSON Schema, to take advantage of a larger community working on the core problems of data representation. This also makes it easier to support ASDF in new programming languages and environments by building on top of existing libraries.

- Since every ASDF file has the version of the specification to which it is written, it will be possible, through careful planning, to evolve the ASDF format over time, allowing for files that use new features while retaining backward compatibility with older tools.

### Who Uses ASDF?

- James Webb Space Telescope (JWST)
- Nancy Grace Roman Space Telescope
- Daniel K. Inouye Solar Telescope (DKIST)
- Vera C. Rubin Observatory, as a WCS exchange format
- Other non-institutional projects using it in astronomy and other fields. 

In [None]:
import numpy as np

from astropy.modeling import models
from astropy.modeling.core import Model
from astropy.time import Time
from astropy import units as u

### Working with ASDF files

In this section we'll look at the structure of ASDF files and learn how to read, write, create and modify them. ASDF files have the extension `.asdf`. Since it's a human-readable format, we can simply look at the file with a command-line shell tool or an editor.

#### Anatomy of an ASDF file

ASDF is a hybrid text and binary format. The text uses YAML. The general layout of the file is

- Header 
- Tree (optional)
  The tree is a dictionary. Most Python types can be serialized directly, using YAML, as {key: value} pairs in the tree. 
- Binary blocks (optional)
- Binary block index (optional)

The header, tree and block index are text, while the blocks are raw binary.

Python primitives are supported natively in YAML.
 

In [None]:
!cat primitives.asdf

The Python ASDF library is a standalone package distributed through PyPI and conda-forge.

To open a file use the `open` function. It accepts several keyword arguments; there are options specifying in what mode a file should be opened or whether it should be validated during opening. For this example we will use the default behavior and look at the object. By default asdf opens files in memory mapping mode but there's an option to read the entire file in memory when opening it.

In [None]:
import asdf

In [None]:
af = asdf.open('primitives.asdf')

In [None]:
af.tree

#### Create an ASDF file

The tree is a Python dictionary. The code to create the above file is

In [None]:
tree = {
    'number': 6.0,
    'boolean': True,
    'integer': 11,
    'string': 'goodbye world',
    'list': [1, 4, 9, 16],
    'dictionary': {'x': [1, 3, 5], 'y': {'nests': True, 'top': False}}
    }

af = asdf.AsdfFile(tree)
# af.write_to('primitives.asdf')

Before writing the file to disk, let's look at what other types can be serialized to ASDF without writing custom code. These include

- numpy arrays
- many astropy types
  - models
  - coordinate frames
  - tables
  - Time objects
  - Units and Quantities
- Generalized World Coordinate (gwcs) objects

Adding new objects to the file is done by assigning to the `tree`. Note that assigning to the tree is equivalent to assigning to the AsdfFile object.

In [None]:
ar = np.random.randn(20)

af.tree['array_1'] = ar
af['array_2'] = ar
af['array_3'] = ar+1

# af.tree

In [None]:
gauss = models.Gaussian1D(amplitude=3.4, mean=2.3, stddev=1.6)
p = models.Polynomial1D(1, c0=0.2, c1=.1)
af['model'] = gauss + p

af['time'] = Time.now().isot

#af.tree

#### Add one or more `History`  entries to the file

In [None]:
af.add_history_entry("This file was generated during AAS.")

In [None]:
af.write_to('other_types.asdf')

In [None]:
!cat other_types.asdf

In [None]:
af1 = asdf.open('other_types.asdf')
af1.tree

#### Things to note

- Arrays are not loaded to memory until accessed

`'array_1': <array (unloaded) shape: [20] dtype: float64>`

- Identical objects are saved once and referenced elsewhere, not saved as copies

```
array_1: &id001 !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [20]
array_2: *id001
```

- Tags

The YAML tag syntax gives the asdf library a mechanism to do something special with the content that follows. The library has machinery that links tags to code that knows how to turn the content into a Python object, and that finds an associated schema to validate that content in the ASDF file. Every custom type has associated code that performs the conversion, called a `Converter`. Reading a file recreates the serialized objects in memory. For example, the model we read in is ready to be evaluated.

In [None]:
print(af['model'](1.2))

Anyone can define their own tags and write their own converters for turning tags into Python objects.

In [None]:
# It's possible to save arrays "inline", i.e. as text and not as a binary block, by passing
# a keyword argument to the "write_to" method. This is OK for small arrays and deteriorates 
# performance for large ones.

# af.write_to('other_types_inline.asdf', all_array_storage='inline')

# !cat other_types_inline.asdf

#### Getting information about an ASDF file

There are two functions that allow file introspection, `info` and `search`. They are available as methods on the `AsdfFile` object or on the command line. Both are configurable through multiple parameters.

In [None]:
af1.info()

In [None]:
af.search('array')

### Other capabilities

##### compression

ASDF supports array compression using **zlib**, **bzip2** (selected with the code `bzp2`), and **lz4**, and there is a mechanism for adding custom compression algorithms.

To specify which compression algorithm to use, pass the code to the *set_array_compression* method.

In [None]:
comp = asdf.AsdfFile()
ar_zeros = np.zeros((4000, 4000))
comp['compressed'] = ar_zeros
comp.set_array_compression(ar_zeros, 'bzp2')
comp.write_to('with_compression.asdf')

In [None]:
!cat with_compression.asdf

In [None]:
c = asdf.open('with_compression.asdf')

In [None]:
c.tree


##### command line tool

There's a command-line tool, `asdftool`, which performs many of the operations shown so far outside the Python interpreter. Check the options using `asdftool --help`.



##### validation using json schemas

ASDF uses JSON Schema to validate the contents of files. When used, this is a powerful way to make sure files are correct.

##### exploded form

ASDF supports the so-called exploded form. An ASDF file can be split into one file for the YAML content and one file for each of the binary blocks it contains. This makes the YAML easier to open in an editor and lets other programs access the binary data independently.

#### Future work

- Add support for chunking arrays using **zarr**

- Add support for efficient access of large files in the cloud

- Visualization support

- A C/C++ library, an IDL library?

- Add more compression options


#### Documentation

- Original ASDF Paper: https://www.sciencedirect.com/science/article/pii/S2213133715000645
- Standard Documentation: https://asdf-standard.readthedocs.io/en/latest/
- Python package documentation: https://asdf.readthedocs.io/en/stable/
- Tutorial at Scipy, 2022: https://github.com/spacetelescope/scipy2022tutorial

### Exercise

Reading and accessing data

- Open the file jwst.asdf in the 02_Working_With_ASDF_Files directory. 
- Look at the info method's help and display the file using some of the arguments to show more contents.
- Search for a few attributes - wcs, data (wcs stands for World Coordinate System. In astronomy it represents the transform from pixel coordinates to sky coordinates or some physical system.)
- Retrieve the wcs object following the path shown by the search method
- Look at the wcs object and print `wcs.forward_transform`
- Use matplotlib to display the data array
- Look at the data array and modify the value of data[0, 0] to 999.