In [None]:
import asdf
import numpy as np

# 3 - Creating ASDF Files

## Introduction

ASDF files store their information using a tree (nested key/value) structure. This allows the stored
information to be hierarchically organized within the file. Without any extensions, this tree is a
nested combination of basic data structures:
- maps,
- lists,
- arrays,
- strings,
- booleans,
- and numbers.

All of these are stored using `yaml`. More complex structures (ones not directly supported
by `yaml`) are denoted using `yaml` tags. However, those tagged "sub-trees" are still composed of the
above basic structures and other tagged sub-trees. Additional tagged objects are supported via
ASDF extensions.

The Python analogs for these types are:
- maps -> `dict`,
- lists -> `list`,
- arrays -> `np.ndarray`,
- strings -> `str`,
- booleans -> `bool`,
- and numbers -> `int`, `float`, `complex` (depending on the type of number).

`np.ndarray` objects are handled specially: their data is stored in binary blocks rather than as
regular `yaml`. Note that dictionary keys in an ASDF tree are limited to `bool`, `int`, or `str` types
only, while values can be any of the above data types.

Typically, when creating an ASDF file using the Python library, one begins by creating a nested Python
dictionary which corresponds to the nested tree structure one wants the file to have. Indeed, one can
interact with any `AsdfFile` object as if it were a dictionary representing this tree structure.

## Creating ASDF files using basic Python types

Let's first create an ASDF file with the key/value pair `"hello": "world"`:

In [None]:
tree = {"hello": "world"}
af = asdf.AsdfFile(tree)
af.write_to("hello.asdf")
af["hello"]

Open the `hello.asdf` file in your favorite text editor. You should see something that looks like:

In [None]:
with open("hello.asdf") as f:
    print(f.read())

Notice that the file contains more information than just the `"hello": "world"` key/value pair that we
entered. It contains information on the library used to create the file under `asdf_library`, and
information on what the ASDF library needs (schemas, extensions, etc.) to deserialize the stored
data under `history`.


### Exercise 1
Create an ASDF file that stores information using all the basic Python types
except `np.ndarray`:

## Creating ASDF files with `np.ndarray`

Beyond the maps, lists, strings, and numbers built into Python, ASDF can save arrays, in particular
numpy arrays (`np.ndarray`). Indeed, much of ASDF is dedicated to efficiently saving arrays.

For example, suppose we want to save a random 8x8 numpy array:

In [4]:
tree = {"random_array": np.random.rand(8, 8)}
af = asdf.AsdfFile(tree)
af.write_to("random.asdf")

Now opening this file in your text editor will result in something like:

In [5]:
with open("random.asdf", "r", encoding="unicode_escape") as f:
    print(f.read())

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 2.12.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 2.12.0}
random_array: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [8, 8]
...
ÓBLK 0                             ¤ØíÀâ-kîÇÏ§Ðó½ â?iì*å?Ê 
CbÓ?ôsGÄ? ÕmÑÌÇ?Ü_é?i]Ï\è?@?_Ë«½?¼w8T
å?²¹<QöÕ?gÑÙñÛ?º!§FÚ?Ô¿Ú$ã?üX7^Öé?of¶ÓGï?vU:ûë?[7î¥oÌâ?j»ÈÜ?Dtò/ÛÉ?GEù4xà?µ|ÝÑüSï?õz&íì?nÜØöIí?´w)Á?Ì¥Îó9ùè?ðdbn·?apÿ¶í½ç?L#ïúÞë? Ê¥mÁ?¡mKðÿå?ÿXö¸Æá?KÊ»]¦Ñ?ì/[µ×â?à¦_î?àü#&ê?ÏæSÒ­í?ð!?Ôô²?dtÂ¾ Ò?q¨j\zä?\}@À?V5{aÔ?Jæ¸²á?Á²¨Ø¤å?rfbïj·í?8>ÇÎ4±? ÂÌõ«p?,»¶oã?è¿Ã±r·â?:ÛO9W¾Ý?y Ò?ÛÂHþä?&Ó@ÏL(ç?©Þtê?HÀ<":

Observe that the end of the file contains what appears to be binary data. This binary data contains the information
in the random array we wrote. Indeed, when ASDF writes arrays to the file it stores them as binary data in a block after
the YAML section of the file rather than in the section itself. Note that `random_array` in the YAML section stores some
information about the nature of the array and includes the `source` key. This `source` value references which binary block
(in this case block `0`) the data is stored in.

Note that ASDF stores this data efficiently. An array shared between different objects in the ASDF tree is stored
only once as a binary block, and both entries in the `yaml` metadata reference that same block. This also extends
to objects that reference different views of the same data: all of the views reference the same binary block, and
only the information describing each view is stored separately.

### Exercise 2

Create a tree containing the same `np.ndarray` twice, and multiple views on the same `np.ndarray`.

## Serializing Other Objects

As mentioned above, other types of objects can also be serialized by ASDF, including objects outside
the ASDF standard. However, support for these objects requires the creation of an ASDF extension, which
we will describe in a later tutorial.

For our current purposes, recall that these objects are denoted in the `yaml` metadata via a `yaml`
tag. Indeed, some of the objects already discussed are tagged in the metadata. These tags are used by
ASDF to determine which extension to use when reading an ASDF file. This enables the "seamless" nature
of reading objects from an ASDF file, provided the necessary ASDF extension is installed. Note that
when a tagged object is present in an ASDF file but no extension can be found to handle that tag, ASDF
will raise a warning and return that "object" in its "raw" form, meaning you will get the nested dictionary
object rather than a fully realized instance of the object you wrote.

On the other hand, ASDF extensions specify what Python objects they support. This is how ASDF can
seamlessly recognize a complex object and serialize it with no input from the user (other than installing
the correct ASDF extensions).

For example, as part of the install for this course we installed the `asdf-astropy` package, which provides
extensions for writing many `astropy` objects. Indeed, `asdf-astropy` enables ASDF support for:

- `astropy` `Unit` and `Quantity` objects.
- (Most) `astropy` model objects.
- `astropy` `Time` objects.
- `astropy` coordinate and frame objects.
- `astropy` `Table` objects.

Thus, serializing an `astropy` `Table` object requires nothing beyond placing it in the tree:

In [8]:
from astropy.table import Table

tree = {"table": Table(dtype=[('a', 'f4'), ('b', 'i4'), ('c', 'S2')])}
af = asdf.AsdfFile(tree)
af.write_to("table.asdf")

Notice how no additional effort was needed to write the ASDF file since `asdf-astropy` was already
installed. Now let's perform a cursory inspection of the `table.asdf` file:

In [10]:
with open("table.asdf", "r", encoding="unicode_escape") as f:
    print(f.read())

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 2.12.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 2.12.0}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://astropy.org/astropy/extensions/astropy-1.0.0
    software: !core/software-1.0.0 {name: asdf-astropy, version: 0.2.1}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf-astropy, version: 0.2.1}
table: !<tag:astropy.org:astropy/table/table-1.0.0>
  colnames: [a, b, c]
  columns:
  - !core/column-1.0.0

### Exercise 3

Write an ASDF file containing the following `astropy` objects:
1. A `Quantity` object.
2. A model object.
3. A `Time` object.
4. A coordinate object.