Creating ASDF Files
===

This tutorial provides an overview of creating ASDF files using the Python library [asdf](https://pypi.org/project/asdf/).

In [1]:
import asdf
import numpy as np

np.random.seed(42)

# Introduction

Creating an ASDF file starts with creating an instance of the [AsdfFile](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile) class.

In [2]:
af = asdf.AsdfFile()

# 🌲 The "tree"
Data within an ASDF file is organized in a "tree" structure, an arbitrarily nested mapping of key/value pairs (think of this as a Python dict). This allows data to be hierarchically organized within the file.

For example, if you have some "data" values and some "meta" values describing the condition of the data, these can be organized within the tree under "data" and "meta" keys.

In [3]:
af.tree["meta"] = {"my": {"nested": "metadata"}}
af.tree["data"] = [1, 2, 3, 4]
print(af.tree)

{'meta': {'my': {'nested': 'metadata'}}, 'data': [1, 2, 3, 4]}


For ease-of-use the [AsdfFile](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile) instance can be used like a dictionary (removing the need to always access the `tree` attribute).

In [4]:
af["meta"]

{'my': {'nested': 'metadata'}}

# 🍁 Tree contents

Many of Python's built-in types are supported by ASDF and largely match the basic types in [YAML](https://yaml.org/spec/1.1/).

| Python type | YAML type |
| --- | --- |
| `dict` | `mapping` |
| `list` | `sequence` |
| `str` | `string` |
| `float` | `float` |
| `int` | `int` |
| `None` | `null` |

# Exercise 1: Make an ASDF tree
Create an [AsdfFile](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile) instance and build a tree containing all of the above supported types.

# 💾 Saving to disk
[AsdfFile.write_to](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile.write_to) is the main method used to save ASDF files to disk.

In [5]:
af.write_to("my_tree.asdf")
!ls my_tree.asdf

my_tree.asdf


As ASDF files contain a plain-text header, simple trees can result in files that are human-readable.

In [6]:
af = asdf.AsdfFile()
af["trunk"] = {"branches": [0, 1, 2], "roots": ["a", "b", "c"]}
af.write_to("trunk.asdf")
!cat trunk.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf, version: 3.5.0}
trunk:
  branches: [0, 1, 2]
  roots: [a, b, c]
...


# Exercise 2: Save your tree
Recreate (if necessary) your custom tree containing all of the supported types and write it to an ASDF file. Open the file in a text editor and view the contents.

# 📋 Standard metadata
In the above example we didn't add the `asdf_library` and `history` keys that appear in the file. These are standard metadata keys added to every ASDF file. They record:

- `asdf_library`: Software library used to produce the file
- `history`: ASDF extensions used to produce the file (and optional user-added history entries)

We won't cover these in more depth. Please see the documentation for more details:
- [AsdfFile.add_history_entry](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile.add_history_entry)
- [asdf-1.0.0 schema](https://asdf-standard.readthedocs.io/en/1.1.1/generated/stsci.edu/asdf/core/asdf-1.0.0.html)

# 🪄 "Tagged" Types
For more complicated types, ASDF supports "tagged" objects (as does YAML). A "tag" saved alongside an object in the file tells the asdf software that the object should be deserialized to a more complicated type. To cover this topic we'll use `complex` numbers.

In [7]:
af = asdf.AsdfFile()
af["z"] = complex(1, 1)
af.write_to("complex.asdf")
!cat complex.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf, version: 3.5.0}
z: !core/complex-1.0.0 (1+1j)
...


asdf stored our `complex` number `z` as a "tagged" string "(1+1j)" in the file using the `core/complex-1.0.0` tag. When we open this file, asdf will see the tag and recreate the `complex` number for us from the string.

In [8]:
af = asdf.open("complex.asdf")
type(af["z"])

complex

The mapping between tags and objects is handled by the asdf extension API. Support for new objects can be added by pip installing packages (like [asdf-astropy](https://pypi.org/project/asdf-astropy/)) or users can create and register their own extensions.

We won't go into details about creating an extension here but please see the documentation if there are objects you would like to store in an ASDF file (that aren't already supported):
https://asdf.readthedocs.io/en/latest/asdf/extending/extensions.html

# Exercise 3: Tagged objects
Open one of the ASDF files created above. What is the type of the value stored under the "asdf_library" key in the tree?

# 🔢 N-Dimensional arrays
In addition to plain-text representations, ASDF files can contain binary data, which is often used to store arrays of numerical data. Binary storage is efficient to read and write and avoids the loss of precision that can occur when numerical types are converted to and from text.
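
To make the precision point concrete, here is a small standard-library sketch (it does not use asdf). It assumes the text representation keeps a fixed number of significant digits, which is a common choice for human-readable output. Python's own `repr` of a float does round-trip exactly, so the loss depends on how the text is produced.

```python
import struct

value = 0.1 + 0.2  # stored as a 64-bit float

# Text limited to 6 significant digits drops the trailing bits.
as_text = f"{value:.6g}"
from_text = float(as_text)

# Packing the same value as 8 raw little-endian bytes keeps every bit.
as_bytes = struct.pack("<d", value)
from_bytes = struct.unpack("<d", as_bytes)[0]

print(as_text, from_text == value)
print(len(as_bytes), from_bytes == value)
```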

Binary data is stored in "blocks" written after the ASDF tree. Objects in the tree may contain references to binary "blocks", the most common being [NDArrayType](https://asdf.readthedocs.io/en/latest/api/asdf.tags.core.NDArrayType.html#asdf.tags.core.NDArrayType), the class asdf uses for `numpy.ndarray` instances.

In [9]:
af = asdf.AsdfFile()
af["data"] = np.arange(10)
af.write_to("binary_data.asdf")
!cat binary_data.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf, version: 3.5.0}
data: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [10]
...
�BLK 0               P       P       P�� Q��o3s�l��                                                               	       #ASDF BLOCK INDEX
%YAML 1.1
---
- 665
...


When we read this file back in we'll find an [NDArrayType](https://asdf.readthedocs.io/en/latest/api/asdf.tags.core.NDArrayType.html#asdf.tags.core.NDArrayType) instance for `data`.

In [10]:
af = asdf.open("binary_data.asdf")
type(af["data"])

asdf.tags.core.ndarray.NDArrayType

This can mostly be treated the same as a `numpy.ndarray` but provides a few asdf-specific features. By default the array is "lazy loaded". This means [NDArrayType](https://asdf.readthedocs.io/en/latest/api/asdf.tags.core.NDArrayType.html#asdf.tags.core.NDArrayType) will only load the binary data from disk when the array contents are accessed (to reduce disk IO and improve performance).

In [11]:
print(af["data"])

<array (unloaded) shape: [10] dtype: int64>


# Exercise 4: Saving arrays
Generate an ASDF file with 3 arrays and save it to disk. Examine the file contents.

# 👀 Array views
Array views will be stored in ASDF files as views of an ASDF block. For a file with multiple views of the same array this can save space on disk.

In [12]:
af = asdf.AsdfFile()
af["base_array"] = np.zeros((100, 100))
af["view"] = af["base_array"][0]
af.write_to("shared_array.asdf")
!cat shared_array.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf, version: 3.5.0}
base_array: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [100, 100]
view: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [100]
...
�BLK 0             8�     8�     8��B#|T���ek���                                                                                                                                                                                

# Exercise 5: Saving views
Save an ASDF file with a large array and a small view of the array. Open this file and change the view contents. What happens to the large array?

# 🗄 Storage options
For small arrays it is sometimes helpful to "inline" the array data. An "inline" array is stored as human-readable text in the YAML header instead of an ASDF block. This is controlled by calling [AsdfFile.set_array_storage](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile.set_array_storage).

In [13]:
af = asdf.AsdfFile()
af["small_array"] = np.arange(5)
af.set_array_storage(af["small_array"], "inline")
af.write_to("inline_array.asdf")
!cat inline_array.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf, version: 3.5.0}
small_array: !core/ndarray-1.0.0
  data: [0, 1, 2, 3, 4]
  datatype: int64
  shape: [5]
...


For large arrays it may be preferable to compress the ASDF block. Every installation of asdf supports the [bzip2](http://www.bzip.org/) and [zlib](http://www.zlib.net/) compression algorithms (more can be added via extensions). To tell asdf to compress an array, provide a supported 4-character code (such as "bzp2" or "zlib") to [AsdfFile.set_array_compression](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile.set_array_compression).

In [14]:
af = asdf.AsdfFile()
af["compressed_array"] = np.zeros((1000, 1000))
af.set_array_compression(af["compressed_array"], "bzp2")
af.write_to("compressed.asdf")
!cat compressed.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf, version: 3.5.0}
compressed_array: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [1000, 1000]
...
���ц� �o��BZh91AY&SY%~�� =F@@�  @  0�	�i��!�"����"�(H�C� #ASDF BLOCK INDEX
%YAML 1.1
---
- 687
...


# Exercise 6: Array storage options
Generate an ASDF file with:
- one array compressed with "zlib"
- a second array that is uncompressed

What happens if you read and then rewrite the file to a new filename?

# ⭐ The asdf-astropy extension

As mentioned above, asdf provides an extension API that can be used to save custom objects to ASDF files. [asdf-astropy](https://pypi.org/project/asdf-astropy/) is an ASDF extension that supports many [Astropy](https://www.astropy.org/) objects. To set up the extension all you need to do is `pip install asdf-astropy`.

Once the extension is installed many [Astropy](https://www.astropy.org/) objects will be supported including:

- [unit](https://docs.astropy.org/en/stable/units/ref_api.html#module-astropy.units) and [quantity](https://docs.astropy.org/en/stable/units/quantity.html) objects
- (most) [modeling](https://docs.astropy.org/en/stable/modeling/index.html) objects
- [time](https://docs.astropy.org/en/stable/time/index.html) objects
- [coordinate](https://docs.astropy.org/en/stable/coordinates/index.html) objects
- [tables](https://docs.astropy.org/en/stable/table/index.html)

For example, to save an astropy [Table](https://docs.astropy.org/en/stable/api/astropy.table.Table.html#astropy.table.Table) simply add it to the ASDF tree:

In [15]:
from astropy.table import Table

af = asdf.AsdfFile()
af["table"] = Table(dtype=[("a", "f4"), ("b", "i4"), ("c", "S2")])
af.write_to("table.asdf")
!cat table.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf-astropy, version: 0.6.1}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://astropy.org/astropy/extensions/astropy-1.1.0
    software: !core/software-1.0.0 {name: asdf-astropy, version: 0.6.1}
table: !<tag:astropy.org:astropy/table/table-1.1.0>
  colnames: [a, b, c]
  columns:
  - !core/column-1.0.0
    data: !core/ndarray-1.0.0
      source: 0
      datatype: float32
 

Notice above that the Table is broken down into several nested "tagged" mappings. When loaded, asdf will reconstruct the Table without the user needing to be aware of any of this serialization and deserialization.

In [16]:
af = asdf.open("table.asdf")
print(type(af["table"]))
af["table"]

<class 'astropy.table.table.Table'>


a,b,c
float32,int32,bytes2


# Exercise 7: Saving Astropy objects

Write an ASDF file containing the following `astropy` objects:
1. [Quantity](https://docs.astropy.org/en/stable/units/quantity.html)
2. A [model](https://docs.astropy.org/en/stable/api/astropy.modeling.Model.html#astropy.modeling.Model)

   Hint: The [astropy.modeling](https://docs.astropy.org/en/stable/modeling/index.html) package provides a framework for representing models and performing model evaluation and fitting. Models are initialized using their parameters like in the following example for [Gaussian1D](https://docs.astropy.org/en/stable/api/astropy.modeling.functional_models.Gaussian1D.html#astropy.modeling.functional_models.Gaussian1D):
   ```
   from astropy.modeling import models
   gauss = models.Gaussian1D(amplitude=10, mean=3, stddev=1.2)
   ```
3. A [Time](https://docs.astropy.org/en/stable/time/index.html) object

    Hint: The [astropy.time](https://docs.astropy.org/en/stable/time/ref_api.html#module-astropy.time) package provides functionality for manipulating times and dates. To initialize it supply a string and a [format](https://docs.astropy.org/en/stable/time/index.html#id3), or supply a datetime object.
    
4. An [ICRS](https://docs.astropy.org/en/stable/api/astropy.coordinates.ICRS.html) coordinate object.