In [None]:
import asdf
import numpy as np

# 3 - Creating ASDF Files

## Introduction

ASDF files store their information using a tree structure. This allows the stored information to be
hierarchically organized within the file. Without any extensions, this tree is a nested combination
of basic data structures: maps, lists, arrays, strings, Booleans, and numbers. In Python these types
correspond to `dict`, `list`, `np.ndarray`, `str`, `bool`, and the numeric types `int`, `float`, and
`complex`, where `np.ndarray` is treated in a special way. These data types can be extended to other
types and objects using extensions.

In the ASDF Python library, this tree can be created from a Python dictionary of key/value pairs.
Indeed, one can interact with any `AsdfFile` tree as if it were a dictionary. Note that ASDF restricts
keys to the `bool`, `int`, and `str` types only, while the values can be any of the above data types.

## Creating ASDF files using basic python types

Let's first create an ASDF file with the key/value pair `"hello": "world"`:

In [None]:
tree = {"hello": "world"}
af = asdf.AsdfFile(tree)
af.write_to("hello.asdf")
af["hello"]

Open the `hello.asdf` file in your favorite text editor. You should see something that looks like:

In [None]:
with open("hello.asdf") as f:
    print(f.read())

Notice that the file contains more information than just the `"hello": "world"` key/value pair that we
entered. It contains information on the library used to create the file under `asdf_library`, and
information on what the ASDF library needs (schemas, extensions, etc.) to deserialize the stored
data under `history`.

Next, let's create a file that stores information using all the other basic Python types
(avoiding arrays for now):

In [None]:
tree = {
    "hello": "world",
    "foo": 42,
    "bar": 3.14,
    "true": False,
    "imaginary": complex(2, 3),
    "animals": ["cat", "dog", "bird"],
    "data": {"mean": 3.14, "std": 2.71},
}
af = asdf.AsdfFile(tree)
af.write_to("example.asdf")

Now open `example.asdf` in your text editor. You should see something like:

In [None]:
with open("example.asdf") as f:
    print(f.read())

Again, observe that all the data we added to our tree is contained within our ASDF file. Notice in
particular that the `imaginary` entry now has a YAML tag denoting that the data is a complex number.
ASDF will use this tag to correctly deserialize the data as a `complex` type later.

### Updating your ASDF files

Recall that ASDF files can be opened with the `asdf.open` function, much like the standard
Python `open` function. Note that one must typically pass the open mode explicitly using the `mode`
keyword, because the `uri` argument comes before the `mode` argument in the `asdf.open`
interface.

For example:

In [None]:
with asdf.open("example.asdf") as af:
    print(f'{af["hello"]=}')
    print(f'{af["foo"]=}')
    print(f'{af["bar"]=}')
    print(f'{af["imaginary"]=}')
    print(f'{af["animals"]=}')
    print(f'{af["data"]=}')

ASDF files can also be updated in a similar way, this time by opening the
file as writable and calling the `update` method:

In [None]:
with asdf.open("example.asdf", mode="rw") as af:
    af["new"] = "cool new stuff"
    af.update()

Note that updates can be expensive if they require rewriting the whole file. This can be mitigated
by padding strategies in both the metadata section and within binary blocks.

In [None]:
with open("example.asdf", "r") as f:
    print(f.read())

## Creating ASDF files with numpy arrays

Beyond the maps, lists, strings, and numbers built into Python, ASDF can save arrays, in particular
numpy arrays (`np.ndarray`). Indeed, much of ASDF is dedicated to efficiently saving arrays.

For example, suppose we want to save a random 8x8 numpy array:

In [None]:
tree = {"random_array": np.random.rand(8, 8)}
af = asdf.AsdfFile(tree)
af.write_to("random.asdf")

Now opening this file in your text editor will result in something like:

In [None]:
with open("random.asdf", "r", encoding="unicode_escape") as f:
    print(f.read())

Observe that there is some binary data at the end of the file. This binary data contains the information
in the random array we wrote. Indeed, when ASDF writes arrays to the file, it stores them as binary data in a block after
the YAML section of the file rather than in the section itself. Note that `random_array` in the YAML section stores some
information about the nature of the array and includes the `source` key. This `source` value references the binary block
(in this case block `0`) in which the data is stored.

Indeed, if we update the file by adding another array, we get a second block:

In [None]:
with asdf.open("random.asdf", mode="rw") as af:
    af.tree.update({"new_array": np.random.rand(10, 10)})
    af.update()

Opening `random.asdf` in your text editor will give something like:

In [None]:
with open("random.asdf", encoding="unicode_escape") as f:
    print(f.read())

The file now has a `new_array` key whose `source` value is `1`, meaning there is a second binary block.

ASDF is also smart about how it stores arrays as binary data: if an array is shared between
entries in the tree, the same binary block is used. Indeed, this extends to sharing views on the data:

In [None]:
duplicated_array = np.random.rand(8, 8)
multi_view_array = np.random.rand(8, 8)
tree = {
    "duplicated_array_0": duplicated_array,
    "duplicated_array_1": duplicated_array,
    "multi_view_array": multi_view_array,
    "new_view": multi_view_array[2:4, 3:6],
}
with asdf.open("random.asdf", mode="rw") as af:
    af.tree.update(tree)
    af.update()

Opening `random.asdf` in your text editor once again gives something like:

In [None]:
with open("random.asdf", encoding="unicode_escape") as f:
    print(f.read())

First, note that `duplicated_array_1` is simply listed using a YAML alias that points to the anchor on `duplicated_array_0`. Second,
observe that the `source` in both `multi_view_array` and `new_view` is the same value (`2`) rather than two distinct values.
So ASDF did not unnecessarily duplicate the binary data.

## Serializing other objects

The ASDF library supports writing extensions for objects outside of the ASDF-standard, which we will explain
in detail in another lecture. Assuming that one has installed an ASDF extension to support some custom Python
objects, ASDF will be able to seamlessly save those objects with no additional effort.

For example, as part of the install for this course we installed the `asdf-astropy` package, which provides
extensions for writing many `astropy` objects.

In [None]:
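# Import the astropy subpackages explicitly; a bare `import astropy`
# does not guarantee that they are loaded.
import astropy.coordinates
import astropy.modeling.models
import astropy.time
import astropy.units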

quantity = 50 * astropy.units.m
model = astropy.modeling.models.Gaussian2D(quantity, 2, 3, 4, 5)
time = astropy.time.Time("J2000")
coord = astropy.coordinates.ICRS(ra=1 * astropy.units.deg, dec=2 * astropy.units.deg)

tree = {
    "quantity": quantity,
    "model": model,
    "time": time,
    "coord": coord,
}

af = asdf.AsdfFile(tree)
af.write_to("astropy.asdf")

Notice that since `asdf-astropy` was installed, no extra effort was required to write these objects,
even though they clearly fall outside the ASDF standard objects we previously discussed.

Moreover, examining `astropy.asdf` in your text editor shows:

In [None]:
with open("astropy.asdf", encoding="unicode_escape") as f:
    print(f.read())

This output shows that all the `astropy` objects we intended to save were written to the file.