Creating ASDF Files
===

(60 min)

- Summary of serializing basic types and numpy arrays
  - the ASDF "tree"
  - bool, str, int, None, complex (and limitations)
  - list, dict (and recursion)
  - arrays
    - storage type (worth covering?)
    - ndarray vs NDArrayType
    - shared data
- Exercises
  - Write an asdf file with
- Serializing custom types
  - astropy objects
    - Time
    - Quantity (Unit)
    - SkyCoord
    - Table
    - Models (more on that later)
- Exercises
  - 3, 4, 5, 6 and 7

In [1]:
import asdf
import numpy as np

np.random.seed(42)

# 3 - Creating ASDF Files

## Introduction

Creating an ASDF file involves creating an instance of the [AsdfFile](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile) class.

In [2]:
af = asdf.AsdfFile()

## The "tree"
Data within an ASDF file is organized in a "tree" structure: an arbitrarily nested mapping of key/value pairs (think of a Python `dict`). This allows data to be organized hierarchically within the file. For example, if you have some "data" values and some "meta" values describing the condition of the data, these can be stored in the tree under the "data" and "meta" keys.

In [3]:
af.tree["meta"] = {"my": {"nested": "metadata"}}
af.tree["data"] = [1, 2, 3, 4]
print(af.tree)

{'meta': {'my': {'nested': 'metadata'}}, 'data': [1, 2, 3, 4]}


For ease of use, the [AsdfFile](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile) instance can be used like a dictionary, which removes the need to always go through the `tree` attribute.

In [4]:
af["meta"]

{'my': {'nested': 'metadata'}}

## Tree contents

Many of the Python built-in types are supported by ASDF, and they largely match the basic types in [YAML](https://yaml.org/spec/1.1/).

| Python type | YAML type |
| --- | --- |
| `dict` | `mapping` |
| `list` | `sequence` |
| `str` | `string` |
| `float` | `float` |
| `int` | `int` |
| `None` | `null` |

# Exercise 1: Make an ASDF tree
Create an [AsdfFile](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile) instance and build a tree containing all of the above supported types.

## Saving to disk
[AsdfFile.write_to](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile.write_to) is the main method used to save ASDF files to disk.

In [5]:
af.write_to("my_tree.asdf")
!ls

Creating ASDF Files.ipynb from_scipy.ipynb          random.asdf
all_compressed.asdf       hello.asdf                scipy_2.ipynb
binary_data.asdf          inline.asdf               selected_compressed.asdf
complex.asdf              model.asdf                table.asdf
compound.asdf             my_tree.asdf              trunk.asdf
foo.asdf                  partial_inline.asdf


Because the tree is stored as plain-text YAML at the start of the file, simple trees result in files that are human-readable.

In [6]:
af = asdf.AsdfFile()
af["trunk"] = {"branches": [0, 1, 2], "roots": ["a", "b", "c"]}
af.write_to("trunk.asdf")
!cat trunk.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf, version: 3.5.0}
trunk:
  branches: [0, 1, 2]
  roots: [a, b, c]
...


# Exercise 2: Save your tree
Recreate (if necessary) your custom tree containing all of the supported types and write it to an ASDF file. Open the file in a text editor and view the contents.

## Standard metadata
In the above example you may notice that we didn't add the `asdf_library` and `history` keys, yet they appeared in the file. These are standard metadata keys added to every ASDF file to help record:

- `asdf_library`: Software library used to produce the file
- `history`: ASDF extensions used to produce the file (and optional user-added history entries)

We won't cover these in more depth. Please consult the documentation for more details:
- [AsdfFile.add_history_entry](https://asdf.readthedocs.io/en/latest/api/asdf.AsdfFile.html#asdf.AsdfFile.add_history_entry)
- [asdf-1.0.0 schema](https://asdf-standard.readthedocs.io/en/1.1.1/generated/stsci.edu/asdf/core/asdf-1.0.0.html)

## "Tagged" Types
For more complicated types, ASDF supports "tagged" objects (as does YAML). By adding a "tag" to an object and saving it to the file, we inform the asdf software that this "tagged" object should be deserialized to a more complicated type. To cover this topic we'll use `complex` numbers.

In [7]:
af = asdf.AsdfFile()
af["z"] = complex(1, 1)
af.write_to("complex.asdf")
!cat complex.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf, version: 3.5.0}
z: !core/complex-1.0.0 (1+1j)
...


To store our `complex` number `z`, asdf stored the string "(1+1j)" in the file and added the `core/complex-1.0.0` tag to that string. When we reload this file, asdf will see the tag and the string and recreate the `complex` number for us.

In [8]:
af = asdf.open("complex.asdf")
type(af["z"])

complex

The mapping between tags and objects is handled by the asdf extension API. Support for new objects can be added by pip installing a package (like [asdf-astropy](https://pypi.org/project/asdf-astropy/)), or users can create and register their own extensions.

We won't go into details about creating an extension here but please see the documentation if there are objects you would like to store in an ASDF file (that aren't already supported):
https://asdf.readthedocs.io/en/latest/asdf/extending/extensions.html

# Exercise 3:
?

## N-Dimensional Arrays
In addition to plain-text representations, ASDF files can contain binary data. Although binary data isn't human-readable, it can be read and written efficiently and doesn't suffer from the loss of precision that can occur when numerical types are converted to and from text.

Binary data is stored in "blocks" that are written after the ASDF tree. Objects in the tree may contain references to binary "blocks", the most common being [NDArrayType](https://asdf.readthedocs.io/en/latest/api/asdf.tags.core.NDArrayType.html#asdf.tags.core.NDArrayType), the class asdf uses for `numpy.ndarray` instances.

In [9]:
af = asdf.AsdfFile()
af["data"] = np.arange(42)
af.write_to("binary_data.asdf")
!cat binary_data.asdf

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf, version: 3.5.0}
data: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [42]
...
�BLK 0              P      P      P�=�;q��fC�kF��                                                               	       
                                                                                                                                             !       "       #       $       %       &  

When we read this file back in, we'll get an [NDArrayType](https://asdf.readthedocs.io/en/latest/api/asdf.tags.core.NDArrayType.html#asdf.tags.core.NDArrayType) instance for `data`.

In [10]:
af = asdf.open("binary_data.asdf")
type(af["data"])

asdf.tags.core.ndarray.NDArrayType

This can mostly be treated the same as a `numpy.ndarray` but provides a few asdf-specific features. By default the array is "lazy loaded". This means [NDArrayType](https://asdf.readthedocs.io/en/latest/api/asdf.tags.core.NDArrayType.html#asdf.tags.core.NDArrayType) will only load the binary data from disk when the array contents are accessed (to reduce disk IO and improve performance).

In [11]:
print(af["data"])

<array (unloaded) shape: [42] dtype: int64>


# Exercise 4: Saving arrays

TODO: views and shared blocks

# Exercise 5: Saving views

TODO: array storage options and compression

# Exercise 6: Array storage options

## Serializing Other Objects

As mentioned above, ASDF can also serialize other types of objects, including objects outside
the ASDF standard. However, support for these objects requires an ASDF extension, which
we will describe in a later tutorial.

For our current purposes, recall that these objects are denoted in the YAML metadata via a YAML
tag. Indeed, some of the objects already discussed are tagged in the metadata. ASDF uses these tags
to determine which extension to use when reading an ASDF file. This enables the "seamless" nature
of reading objects from an ASDF file, provided the necessary ASDF extension is installed. Note that
when a tagged object is present in an ASDF file but no extension can be found to handle that tag, ASDF
will issue a warning and return that "object" in its "raw" form, meaning you will get the nested dictionary
object rather than a fully realized instance of the object you wrote.

On the other hand, ASDF extensions specify what Python objects they support. This is how ASDF can
seamlessly recognize a complex object and serialize it with no input from the user (other than installing
the correct ASDF extensions).

For example, as part of the installation for this course we installed the `asdf-astropy` package, which provides
extensions for writing many `astropy` objects. `asdf-astropy` enables ASDF support for:

- `astropy` `Unit` and `Quantity` objects.
- (Most) `astropy` model objects.
- `astropy` `Time` objects.
- `astropy` coordinate and frame objects.
- `astropy` `Table` objects.

Serializing an `astropy` `Table` object therefore works like any other tree entry:

In [12]:
from astropy.table import Table

tree = {"table": Table(dtype=[("a", "f4"), ("b", "i4"), ("c", "S2")])}
af = asdf.AsdfFile(tree)
af.write_to("table.asdf")

Notice that no additional effort was needed to write the ASDF file, since `asdf-astropy` was already
installed. Now let's perform a cursory inspection of the `table.asdf` file:

In [13]:
with open("table.asdf", "r", encoding="unicode_escape") as f:
    print(f.read())

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.5.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    manifest_software: !core/software-1.0.0 {name: asdf_standard, version: 1.1.1}
    software: !core/software-1.0.0 {name: asdf-astropy, version: 0.6.1}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://astropy.org/astropy/extensions/astropy-1.1.0
    software: !core/software-1.0.0 {name: asdf-astropy, version: 0.6.1}
table: !<tag:astropy.org:astropy/table/table-1.1.0>
  colnames: [a, b, c]
  columns:
  - !core/column-1.0.0
    data: !core/ndarray-1.0.0
      source: 0
      datatype: float32
 

# Exercise 7: Saving `astropy` objects

Write an ASDF file containing the following `astropy` objects:
1. `Quantity`
2. A `model`

   Hint: The `astropy.modeling` package provides a framework for representing models and performing model evaluation and fitting. Models are initialized using their parameters:
   ```
   from astropy.modeling import models
   gauss = models.Gaussian1D(amplitude=10, mean=3, stddev=1.2)
   ```
3. A `Time` object

   Hint: The `astropy.time` package provides functionality for manipulating times and dates. To initialize a `Time` object, supply a string and a format, or supply a `datetime` object.

4. A celestial coordinate object (astronomy specific), for example a `SkyCoord`.