# Introduction to ASDF

The [Advanced Scientific Data Format](https://asdf-standard.readthedocs.io/en/latest/index.html) (ASDF) is a next-generation interchange format for scientific data.

It has the following features:
* A hierarchical, human-readable metadata format (implemented using YAML)
* Numerical arrays stored as binary data blocks, which can be memory mapped and optionally compressed
* Automatic validation of the data structure using schemas (implemented using JSON Schema)
* Automatic serialization of native Python data types (numerical types, strings, dicts, lists)
* Extensibility to serialize custom data types

The reference implementation of ASDF is written in Python. It can be installed locally using `pip`:

    $ pip install asdf
    
This tutorial will demonstrate the basic features and functionality of ASDF in Python.

## First Steps

In [1]:
import asdf

ASDF enables the storage of arbitrarily nested data structures to disk. 

The fundamental data object in ASDF is the tree, which is a nested combination of basic data structures: dictionaries, lists, strings and numbers. In practice, the top-level object that is passed to ASDF is a Python **dict**, which represents the tree.

Let's begin with an example that contains a data array and some basic metadata:

In [2]:
import time

import numpy as np

# Create a 2D data array
data = np.ones((10, 10))
# Create an arbitrary data structure containing metadata
info = {'name': "My Data", 'author': "Me", 'time': time.time()}

Now we create the top-level tree object (just a **dict**) which will be stored to ASDF:

In [3]:
tree = {'data': data, 'metadata': info}

The names of the keys that we use in the tree are arbitrary, as is the organization of the data and metadata. **This is very powerful:** the layout of the file can follow whatever structure suits the data.

Now let's create an **AsdfFile** object from our tree:

In [4]:
af = asdf.AsdfFile(tree)

We can access the original tree by using the **AsdfFile.tree** attribute:

In [5]:
af.tree

{'data': array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]),
 'metadata': {'name': 'My Data', 'author': 'Me', 'time': 1553703366.963867}}

In [6]:
af.tree['data']

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

In [7]:
af.tree['metadata']['time']

1553703366.963867

Let's store the **AsdfFile** object to disk using the **write_to** method:

In [8]:
af.write_to('mydata.asdf')

In [9]:
!cat mydata.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.3.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0
  author: Space Telescope Science Institute
  homepage: http://github.com/spacetelescope/asdf
  name: asdf
  version: 2.3.2
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software:
      name: asdf
      version: 2.3.2
data: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape:
  - 10
  - 10
metadata:
  author: Me
  name: My Data
  time: 1553703366.963867
...
[binary block data follows; raw bytes not shown]

Notice that the structure of the **tree** is reflected in the YAML portion of the file. There is metadata about the data array in the tree, but the actual contents of the array are stored in a binary blob at the end of the file. There is also some file-level metadata that wasn't put there by us, but instead was written by **asdf**.

Optionally, the array can instead be stored inline in the YAML portion of the file:

In [10]:
af.write_to('mydata.inline.asdf', all_array_storage='inline')

In [22]:
!cat mydata.inline.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.3.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0
  author: Space Telescope Science Institute
  homepage: http://github.com/spacetelescope/asdf
  name: asdf
  version: 2.3.2
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software:
      name: asdf
      version: 2.3.2
data: !core/ndarray-1.0.0
  data:
  - - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
  - - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
  - - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
  - - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
  - - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    - 

When we read the file back in from disk, we see our original data structure restored (with a bit of additional metadata):

In [27]:
new_af = asdf.open('mydata.asdf')
new_af.tree

{'asdf_library': {'author': 'Space Telescope Science Institute',
  'homepage': 'http://github.com/spacetelescope/asdf',
  'name': 'asdf',
  'version': '2.3.2'},
 'history': {'extensions': [<asdf.tags.core.ExtensionMetadata at 0x12191eb70>]},
 'data': <array (unloaded) shape: [10, 10] dtype: float64>,
 'metadata': {'author': 'Me', 'name': 'My Data', 'time': 1553703366.963867}}

Note that for performance reasons, the data array remains unloaded until it is accessed:

In [28]:
new_af.tree['data']

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

## More Complex Data

Storing and retrieving data arrays and arbitrary metadata structures is useful on its own, but the real power of ASDF lies in its ability to handle more complex datatypes. In this section we will show examples of storing Astropy data in ASDF.

In [12]:
import astropy

### Data with units

Astropy provides many useful datatypes for astronomical data analysis and processing. Let's revisit the example from above that used a data array. Oftentimes it's useful to assign units to numerical arrays. Astropy allows us to do this:

In [19]:
import astropy.units as u
# Assign units to this data array, which creates a Quantity object
data = np.random.random(50) * u.Hz
print(type(data))
data

<class 'astropy.units.quantity.Quantity'>


<Quantity [0.5633423 , 0.06029182, 0.69691139, 0.97480791, 0.75570379,
           0.48918971, 0.7076413 , 0.79867789, 0.50716963, 0.68414445,
           0.72724503, 0.87505299, 0.92609204, 0.34484137, 0.95147688,
           0.07425507, 0.29291977, 0.81816169, 0.36425154, 0.1123006 ,
           0.60257235, 0.60565052, 0.54116743, 0.74275911, 0.70298961,
           0.08386269, 0.07718973, 0.96375796, 0.53064432, 0.55856835,
           0.01829612, 0.76210144, 0.19030687, 0.96365407, 0.40018301,
           0.27332319, 0.12697452, 0.74975221, 0.69431617, 0.58626024,
           0.60984303, 0.58041828, 0.58334713, 0.42434962, 0.70904853,
           0.45235555, 0.4690009 , 0.40581133, 0.59498298, 0.46091564] Hz>

We can store this new **Quantity** object to an ASDF file:

In [20]:
# Create the tree
tree = {'data': data}
af = asdf.AsdfFile(tree)
af.write_to('quantity.asdf')

In [21]:
!cat quantity.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.3.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0
  author: Space Telescope Science Institute
  homepage: http://github.com/spacetelescope/asdf
  name: asdf
  version: 2.3.2
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software:
      name: asdf
      version: 2.3.2
  - !core/extension_metadata-1.0.0
    extension_class: astropy.io.misc.asdf.extension.AstropyAsdfExtension
    software:
      name: astropy
      version: 3.1.2
data: !unit/quantity-1.1.0
  unit: !unit/unit-1.0.0 Hz
  value: !core/ndarray-1.0.0
    source: 0
    datatype: float64
    byteorder: little
    shape:
    - 50
...
[binary block data follows; raw bytes not shown]

When we read the file back in, the **Quantity** object is restored:

In [30]:
new_af = asdf.open('quantity.asdf')
new_af.tree

{'asdf_library': {'author': 'Space Telescope Science Institute',
  'homepage': 'http://github.com/spacetelescope/asdf',
  'name': 'asdf',
  'version': '2.3.2'},
 'history': {'extensions': [<asdf.tags.core.ExtensionMetadata at 0x121920198>,
   <asdf.tags.core.ExtensionMetadata at 0x121996048>]},
 'data': <Quantity [0.5633423 , 0.06029182, 0.69691139, 0.97480791, 0.75570379,
            0.48918971, 0.7076413 , 0.79867789, 0.50716963, 0.68414445,
            0.72724503, 0.87505299, 0.92609204, 0.34484137, 0.95147688,
            0.07425507, 0.29291977, 0.81816169, 0.36425154, 0.1123006 ,
            0.60257235, 0.60565052, 0.54116743, 0.74275911, 0.70298961,
            0.08386269, 0.07718973, 0.96375796, 0.53064432, 0.55856835,
            0.01829612, 0.76210144, 0.19030687, 0.96365407, 0.40018301,
            0.27332319, 0.12697452, 0.74975221, 0.69431617, 0.58626024,
            0.60984303, 0.58041828, 0.58334713, 0.42434962, 0.70904853,
            0.45235555, 0.4690009 , 0.40581133, 

In [32]:
new_af.tree['data'].unit

Unit("Hz")

### Tabular data

It is often useful to organize data into tables. Astropy provides the ability to create tables consisting of various datatypes. The following example is taken from the [astropy documentation](http://docs.astropy.org/en/stable/table/construct_table.html#list-of-columns):

In [34]:
from astropy.table import Table
a = np.array([1, 4], dtype=np.int32)
b = [2.0, 5.0]
c = ['x', 'y']
t = Table([a, b, c], names=('a', 'b', 'c'))
t

<Table length=2>
  a      b     c
int32 float64 str1
----- ------- ----
    1     2.0    x
    4     5.0    y


Astropy tables can be stored to ASDF files transparently:

In [36]:
tree = {'table': t}
af = asdf.AsdfFile(tree)
af.write_to('table.asdf')

The table is restored when we read the file:

In [38]:
new_af = asdf.open('table.asdf')
new_af.tree

{'asdf_library': {'author': 'Space Telescope Science Institute',
  'homepage': 'http://github.com/spacetelescope/asdf',
  'name': 'asdf',
  'version': '2.3.2'},
 'history': {'extensions': [<asdf.tags.core.ExtensionMetadata at 0x12191ea20>,
   <asdf.tags.core.ExtensionMetadata at 0x12192a710>]},
 'table': <Table length=2>
   a      b     c  
 int32 float64 str1
 ----- ------- ----
     1     2.0    x
     4     5.0    y}

In [39]:
new_af.tree['table']

<Table length=2>
  a      b     c
int32 float64 str1
----- ------- ----
    1     2.0    x
    4     5.0    y


## A Note About Extensions

ASDF is able to store several datatypes out of the box. These include all native Python datatypes, as well as **numpy** arrays.

In order to process more complex datatypes, such as those from Astropy, it is necessary to write custom "extension" code for ASDF, which consists of "tags" and "schemas". Astropy provides an ASDF extension for several key datatypes (including the **Quantity** and **Table** types shown above). The extension is registered as an ASDF plugin, which enables ASDF to recognize how to process these types when Astropy is installed.

More details on extending ASDF in this way can be found in the [documentation](https://asdf.readthedocs.io/en/latest/asdf/extensions.html).