## Introduction

This tutorial is intended for readers who only need the essentials of ASDF. It is not a full description of the format and should be treated as a basic introduction. More advanced aspects are addressed in other tutorials.

### Contrasting with FITS

As a refresher, a FITS file consists of one or more HDUs (Header Data Units), each made up of a header and an optional data unit. Each header is a sequence of 80-byte ASCII records, padded to a multiple of 36 records (i.e., the header size must be a multiple of 2880 bytes). Most of these 80-byte records (a remnant of the old punched-card size) consist of a keyword and a value, with an optional comment. Keywords are limited to 8 characters; values may be of type Boolean, integer, float, or string and may be as long as there is space left in the record. Comments can take up any remaining characters.
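
To make the fixed-width layout concrete, here is a minimal sketch in plain Python that formats simplified 80-character records and pads a header to a multiple of 2880 bytes. The helpers `make_card` and `pad_header` are hypothetical names used only for illustration; they are not part of any FITS library, and the formatting rules are deliberately simplified compared with the real standard.

```python
CARD_LEN = 80      # one header record
BLOCK_LEN = 2880   # 36 records of 80 bytes


def make_card(keyword, value, comment=""):
    """Format one simplified keyword/value/comment record (illustrative only)."""
    if len(keyword) > 8:
        raise ValueError("FITS keywords are limited to 8 characters")
    if isinstance(value, bool):
        field = ("T" if value else "F").rjust(20)
    elif isinstance(value, str):
        field = ("'" + value.ljust(8) + "'").ljust(20)
    else:
        field = repr(value).rjust(20)
    card = keyword.ljust(8) + "= " + field
    if comment:
        card += " / " + comment
    if len(card) > CARD_LEN:
        raise ValueError("record does not fit in 80 characters")
    return card.ljust(CARD_LEN)


def pad_header(cards):
    """Append the END record and pad with spaces to a whole number of blocks."""
    text = "".join(cards) + "END".ljust(CARD_LEN)
    padding = -len(text) % BLOCK_LEN
    return text + " " * padding


cards = [
    make_card("SIMPLE", True, "file conforms to FITS standard"),
    make_card("BITPIX", 16, "bits per data value"),
    make_card("NAXIS", 2),
]
header = pad_header(cards)
print(len(cards[0]), len(header))  # prints: 80 2880
```

Even this tiny three-keyword header occupies a full 2880-byte block, which is the kind of rigidity the YAML header in ASDF avoids.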

The data unit must also occupy a multiple of 2880 bytes; the data itself need not be exactly that size, since the remainder is padded. Data may be an array, a binary table, an ASCII table, or one of several more specialized types used in particular fields (e.g., radio astronomy). The kind of data, and the types of the numbers and elements it contains, are specified in the associated header.
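
The padding rule is simple arithmetic. The sketch below, using a hypothetical helper `data_unit_sizes`, computes the raw and padded sizes of an image array from its shape and its number of bits per value (the FITS `BITPIX` convention, where negative values denote floating point).

```python
from math import prod

BLOCK_LEN = 2880  # FITS block size in bytes


def data_unit_sizes(shape, bitpix):
    """Return (raw_bytes, padded_bytes) for an image array of the given shape."""
    raw = prod(shape) * abs(bitpix) // 8
    padded = -(-raw // BLOCK_LEN) * BLOCK_LEN  # round up to a whole block
    return raw, padded


raw, padded = data_unit_sizes((100, 100), 16)
print(raw, padded, padded - raw)  # prints: 20000 20160 160
```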

### ASDF structure

An ASDF file has only one header, which is pure YAML (YAML Ain't Markup Language), followed by zero or more binary blocks.

YAML is a descriptive text format meant to be reasonably readable by people (as opposed to, say, XML). JSON, a common web interchange format, is essentially a subset of YAML. YAML can represent complex organizations of information by means of two basic structures: dictionaries (mappings in YAML) and lists (sequences in YAML). Python terminology is used from here on. YAML has two different syntaxes for representing such structures: one is Python-like and uses indentation; the other is more C-like and uses grouping delimiters such as `{` and `}`. Both can be used within the same file.
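
As a small illustration of the two syntaxes, the sketch below parses the same made-up content written in indented style, in delimiter style, and in a mixture of both. It assumes the PyYAML package (`yaml`) is available; the keys and values are invented for the example.

```python
import yaml  # PyYAML

# Indentation-based (Python-like) syntax
block_style = """
target:
  name: M31
  coordinates:
    - 10.68
    - 41.27
observers:
  - Ada
  - Grace
"""

# Delimiter-based (C-like) syntax
flow_style = "{target: {name: M31, coordinates: [10.68, 41.27]}, observers: [Ada, Grace]}"

# Both syntaxes mixed in one document
mixed_style = """
target: {name: M31, coordinates: [10.68, 41.27]}
observers:
  - Ada
  - Grace
"""

a = yaml.safe_load(block_style)
b = yaml.safe_load(flow_style)
c = yaml.safe_load(mixed_style)
print(a == b == c)  # all three describe the same dictionaries and lists
```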

Advantages of YAML over FITS headers:

- the equivalent of keywords can be very long (between 128 and 1024 characters, depending on the library)
- value lengths are not restricted, and values need not be simple: they may be dictionaries or lists
- the tag feature allows libraries to handle specific contents in a custom way (more later)
- references allow the same item to appear in several places without duplication (more later)
- grouping is implicit in the use of dictionaries and lists, something for which FITS has no equivalent mechanism other than defining special conventions for specific data sets

The ASDF standard adds a few features to YAML's capabilities, most notably the ability to include binary data; these are summarized later.

### Examples

First, an example without binary data, viewed as a simple text file:

In [None]:
import asdf
from astropy.utils.data import download_file
asdf_file_url = 'https://data.science.stsci.edu/redirect/ASDF/asdf_tut_simple_1.asdf'
filepath = download_file(asdf_file_url)

In [None]:
with open(filepath) as asdffile:
    print(asdffile.read())

The lines before the line starting with `---` indicate that this is an ASDF file, give the versions of the relevant components, and finally indicate where tags are defined. We will ignore tags for now. The actual YAML starts at the `---`, and its end is indicated by the `...`, which in this case is also the end of the file. The YAML begins with information about the libraries involved and their homepage (the `asdf_library` and `history` items). The user content starts with the `target` entry. Although this is a very simple file, indentation and proximity make it apparent that the first `name` key belongs to `target` and the second `name` key belongs to `proposer`.
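
The layout just described can be sketched with a few lines of plain Python. The `sample` text below is a hand-written stand-in that follows the same layout (it is not the contents of the tutorial file, and the version numbers and names in it are only examples), and `split_asdf_text` is a hypothetical helper.

```python
sample = """\
#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
target:
  name: M31
proposer:
  name: Jane Doe
...
"""


def split_asdf_text(text):
    """Split ASDF text into (lines before '---', YAML lines up to '...')."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("---"))
    end = next(i for i, line in enumerate(lines) if line == "...")
    return lines[:start], lines[start + 1:end]


preamble, body = split_asdf_text(sample)
print(preamble)  # file identification, versions, and tag definition lines
print(body)      # the YAML content itself
```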

Now to access this file in Python:

In [None]:
af = asdf.open(filepath)
af.info()

The `info` method on the opened ASDF file shows essentially the same information as the raw text, along with the type of each value. Accessing particular items is simple:

In [None]:
af['target']

In [None]:
af['target']['frame']['name']

In [None]:
af['co-Is'][1]

Let's add a couple of small data arrays and write the result out to a new file.

In [None]:
import numpy as np
af['data1'] = np.ones((10,10))
af['data2'] = np.zeros((10,))
af.info()

In [None]:
af.write_to('tut_with_data.asdf')

In [None]:
# Open as an ordinary file first
with open('tut_with_data.asdf', 'rb') as taf2:
    print(taf2.read())

In [None]:
# Print only the text (YAML) part, which ends at the '...' line, decoded to a string
with open('tut_with_data.asdf', 'rb') as taf2:
    raw = taf2.read()
print(raw[:raw.index(b'\n...\n') + 5].decode('utf-8'))

## Tags

Tags are the mechanism ASDF uses to indicate that what follows is to be interpreted in a special way; in Python this usually means it should be converted to a specific Python object. Besides the `core/asdf-1.1.0`, `core/software-1.0.0`, and `core/extension_metadata-1.0.0` tags already present in the previous example, only one kind of tag is used in this example: `!core/ndarray-1.0.0`, which indicates that the items that follow describe an array, which the Python library converts to a numpy array when reading. Note that what follows the tag gives the type, shape, and byte order of the array, as well as the binary block in which the data is found (other attributes are possible in more complex cases). When the file is read, this information becomes part of the resulting numpy array.
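
To show the general idea of a tag triggering a conversion, here is a minimal sketch using PyYAML directly (assumed to be installed). It is not how the asdf library is implemented: `ArrayStub`, `DemoLoader`, and `construct_ndarray` are hypothetical names, and the stub only records the metadata instead of building a real numpy array.

```python
from dataclasses import dataclass

import yaml  # PyYAML


@dataclass
class ArrayStub:
    """Hypothetical stand-in for the object a real library would build."""
    source: int
    datatype: str
    byteorder: str
    shape: list


class DemoLoader(yaml.SafeLoader):
    """Private loader subclass so the global SafeLoader is left untouched."""


def construct_ndarray(loader, node):
    # Read the mapping that follows the tag and wrap it in the stub object
    fields = loader.construct_mapping(node, deep=True)
    return ArrayStub(**fields)


DemoLoader.add_constructor("!core/ndarray-1.0.0", construct_ndarray)

text = """
data1: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [10, 10]
"""

tree = yaml.load(text, Loader=DemoLoader)
print(tree["data1"])
```

The tagged mapping comes back as an `ArrayStub` rather than a plain dictionary, which mirrors, in a much reduced form, how a tagged array entry becomes a numpy array on reading.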

In [None]:
# Now open as an asdf file
af2 = asdf.open('tut_with_data.asdf')
af2.info()

In [None]:
af2['data1'].shape

In [None]:
af2['data2']

## Growing the asdf structure

Once created, the ASDF structure can be dynamically extended in arbitrary ways. The following cells show some of the flexibility available.

In [None]:
# Add a new dict node
af2['misc'] = {'my_very_lucky_numbers': [3, 7]}

In [None]:
af2.info(max_rows=None)

In [None]:
af2['misc']

In [None]:
# Modify that node in the asdf tree
af2['misc']['my_very_lucky_numbers'] += [13, 27]

In [None]:
af2['misc']

In [None]:
# Add more items to misc
import astropy.units as u
cornbread = {
    'flour': 0.25 * u.l,
    'yellow cornmeal': 0.25 * u.l,
    'white sugar': 1/6 * u.l,
    'salt': 5 * u.ml,
    'egg': 1,
    'milk': 0.25 * u.l,
    'vegetable oil': 0.25 * u.l
}
af2['misc']['ingredients_for_cornbread'] = cornbread
af2['misc']['speech'] = '''
Four score and seven years ago our fathers brought forth on this continent,
a new nation, conceived in Liberty,
and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war,
testing whether that nation, or any nation so conceived and so dedicated,
can long endure. We are met on a great battle-field of that war.
We have come to dedicate a portion of that field, as a final resting
place for those who here gave their lives that that nation might live.
It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate -- we can not consecrate
-- we can not hallow -- this ground. The brave men, living and dead,
who struggled here, have consecrated it, far above our poor power to
add or detract. The world will little note, nor long remember what we
say here, but it can never forget what they did here.
It is for us the living, rather, to be dedicated here to the
unfinished work which they who fought here have thus far so nobly advanced.
It is rather for us to be here dedicated to the great task remaining before us
-- that from these honored dead we take increased devotion to that cause for
which they gave the last full measure of devotion -- 
that we here highly resolve that these dead shall not have died in vain --
that this nation, under God, shall have a new birth of freedom --
and that government of the people, by the people, for the people,
shall not perish from the earth.
'''
# Delete a node
del af2.tree['target']

In [None]:
af2.info()

In [None]:
af2['misc']

In [None]:
print(af2['misc']['speech'])

## References

References provide a way of sharing the same item among different attributes or lists in the file without duplicating the information. Within the Python library this is handled automatically if you assign exactly the same object to two different attributes or list items.

In [None]:
# Illustrate with a small array
refdata = np.array([1, 1, 2, 3, 5, 8, 13, 21])
af2['ref1'] = refdata
af2['misc']['ref2'] = refdata
# change one of these
af2['ref1'][1] = -100
# print the other
af2['misc']['ref2']

In [None]:
# Save to a new file
af2.write_to('tut_with_data2.asdf')
# Read back in
af3 = asdf.open('tut_with_data2.asdf',mode='rw')
af3['ref1']


In [None]:
af3['misc']['ref2']

In [None]:
af3['misc']['ref2'][0] = 333
af3['ref1']

Note that after saving and reading back in, these two attributes still share the same object.

In [None]:
# Let's look at the text (YAML) part of the last file, which ends at the '...' line
with open('tut_with_data2.asdf', 'rb') as taf2:
    raw = taf2.read()
print(raw[:raw.index(b'\n...\n') + 5].decode('utf-8'))

References are handled by marking the first occurrence with a special identifier (here `&id001`) that can later be referred to using the syntax `*id001`, which indicates that the attribute or list item is to use exactly the same object. The same reference may be used any number of times.
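
The same anchor and alias mechanism can be seen with PyYAML alone (assumed to be installed), independently of ASDF. In this small sketch one Python list is placed in two locations of a made-up tree; on dumping, the first occurrence gets an anchor and the second becomes an alias, and on loading both locations refer to one object again.

```python
import yaml  # PyYAML

shared = [1, 1, 2, 3, 5, 8]
tree = {"ref1": shared, "misc": {"ref2": shared}}

# The dumped text contains one anchor (&id001) and one alias (*id001)
text = yaml.safe_dump(tree)
print(text)

# After loading, both entries are the very same list object
loaded = yaml.safe_load(text)
loaded["ref1"][1] = -100
print(loaded["misc"]["ref2"])
```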

A practical example of utilizing references is to use the same mask or data quality array
for all the arrays associated with the data such as error arrays, net integration time,
etc.

Also note the appearance of tags that indicate astropy quantities (i.e., values with units, `!unit/quantity-1.1.0`) and tags that identify the units themselves (e.g., `!unit/unit-1.0.0`).

Finally, all arrays can be saved as text instead of binary by using the `inline` option (arrays can also be individually designated as inline).

In [None]:
af3.write_to('tut_with_inline_data.asdf', all_array_storage='inline')

In [None]:
# Read as text
with open('tut_with_inline_data.asdf', 'rb') as tafinline:
    print(tafinline.read().decode('utf-8'))

## Exercise 1

Using the original example file, set the proposer's name to Zaphod Beeblebrox and the proposer's institution to HHGTTG. Then save the ASDF file under a different name, read it back in, and verify the new name and institution using `info`.

## Exercise 2

Using the file saved for Exercise 1, add a third data array with the attribute name `data3` that is a 3-dimensional boolean array with dimensions (10, 10, 10). Delete `data2` and save the file under a different name. Open the saved file and verify that it contains the new items.