# The Advanced Scientific Data Format (ASDF)

## A Practical Guide

## Outline

### Working With ASDF Files
- Reading files
- Accessing and modifying metadata
- Writing files
- Exercise 1

### Serializing Custom Objects To ASDF

- Transforms
- Coordinate Frames
- Time
- Tables
- Units and Quantities

### Generalized World Coordinate System (GWCS)

- Motivation for GWCS
- Features
- Imaging Example

### Reading Files

The Python ASDF library is a standalone package distributed through PyPI and conda-forge.

In [5]:
import asdf

To open a file, use the *asdf.open* function. Its keyword arguments are worth looking up: they control, for example, the mode in which the file is opened and whether it is validated on opening. For this example we use the default behavior and look at the resulting object.

In [3]:
import os
import shutil
from pathlib import Path

from astropy.utils.data import download_file

# Remote location of the example Roman WFI calibrated image
REMOTE_URL = "https://data.science.stsci.edu/redirect/Roman/Roman_Data_Workshop/ExampleData/Build14/"
remote_path = "r0000101001001001001_01101_0001_WFI01_cal.asdf"

# Save the file in the current working directory
LOCAL_DIRECTORY = Path(os.curdir)
local_path = LOCAL_DIRECTORY / remote_path

In [None]:
# Download the file (into the astropy cache), then move it to local_path
filename = download_file(REMOTE_URL + remote_path)
shutil.move(filename, local_path)

In [66]:
afr = asdf.open(local_path)



In [11]:
asdf.info(afr, max_rows=200)

[1mroot[0m (AsdfObject)
[2m├─[0m[1masdf_library[0m (Software)
[2m│ ├─[0m[1mauthor[0m (str): The ASDF Developers
[2m│ ├─[0m[1mhomepage[0m (str): http://github.com/asdf-format/asdf
[2m│ ├─[0m[1mname[0m (str): asdf
[2m│ └─[0m[1mversion[0m (str): 3.1.0
[2m├─[0m[1mhistory[0m (dict)
[2m│ └─[0m[1mextensions[0m (list)
[2m│   ├─[0m[[1m0[0m] (ExtensionMetadata)[3m ...[0m
[2m│   ├─[0m[[1m1[0m] (ExtensionMetadata)[3m ...[0m
[2m│   ├─[0m[[1m2[0m] (ExtensionMetadata)[3m ...[0m
[2m│   ├─[0m[[1m3[0m] (ExtensionMetadata)[3m ...[0m
[2m│   ├─[0m[[1m4[0m] (ExtensionMetadata)[3m ...[0m
[2m│   └─[0m[[1m5[0m] (ExtensionMetadata)[3m ...[0m
[2m└─[0m[1mroman[0m (WfiImage)[2m[3m # The Schema for WFI Level 2 Images.[0m[0m
[2m  ├─[0m[1mmeta[0m (dict)
[2m  │ ├─[0m[1maperture[0m (Aperture)[2m[3m # Aperture Information[0m[0m[3m ...[0m
[2m  │ ├─[0m[1mcal_step[0m (L2CalStep)[2m[3m # Level 2 Calibration Status[0m[0m[3m ...[

In [70]:
afr.search(key='exposure')

Modifying a value is done by assigning it.

In [23]:
print(afr['roman']['meta']['exposure']['exposure_time'])
afr['roman']['meta']['exposure']['exposure_time'] = 200
print(afr['roman']['meta']['exposure']['exposure_time'])

133.76
200


In [45]:
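# Write the modified tree to a new file. The tree is validated against its
# schemas before writing; the error below was raised because optical_element
# had previously been set to 'FW23', which the schema does not allow.
afr['roman']['meta']['exposure']['exposure_time'] = 200
afr.write_to('test.asdf')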

ValidationError: 'FW23' is not one of ['F062', 'F087', 'F106', 'F129', 'F146', 'F158', 'F184', 'F213', 'GRISM', 'PRISM', 'DARK']

Failed validating 'enum' in schema['properties']['optical_element']:
    {'$schema': 'asdf://stsci.edu/datamodels/roman/schemas/rad_schema-1.0.0',
     'description': 'Name of the filter element used. See the RDox Optical '
                    'Element page for more\n'
                    'details on available optical elements and their '
                    'properties.\n',
     'enum': ['F062',
              'F087',
              'F106',
              'F129',
              'F146',
              'F158',
              'F184',
              'F213',
              'GRISM',
              'PRISM',
              'DARK'],
     'id': 'asdf://stsci.edu/datamodels/roman/schemas/wfi_optical_element-1.0.0',
     'title': 'Optical Element',
     'type': 'string'}

On instance['optical_element']:
    'FW23'

**Exercise 1**

- Look up the parameters of the *asdf.open* function and experiment with opening the same file using different parameters, for example:
  - `memmap=True`, which memory-maps the arrays in the file on disk
  - `lazy_tree=True`, which does not load objects into memory until they are accessed
- Use `asdf.info` with different parameters to see most of the file.
- Use `search` to find the path to `optical_element` and change its value.
- Run the `validate` method of the opened file (e.g. `afr.validate()`) to validate the change in memory.

ASDF uses schemas to validate files. In the case of `optical_element`, the schema includes an enumerated list of allowed values. Although the type is correct (a string), the file does not validate because the assigned value is not one of the allowed ones.

### Creating ASDF Files


ASDF files store their information using a tree (nested key/value) structure. This allows the stored information to be hierarchically organized within the file. Without any extensions, this tree is a nested combination of basic data structures:

- maps
- lists
- arrays
- strings
- booleans
- numbers

All of these are stored using YAML.

The Python analogs for these types are:

- maps -> `dict`
- lists -> `list`
- arrays -> `np.ndarray`
- strings -> `str`
- booleans -> `bool`
- numbers -> `int`, `float`, or `complex` (depending on the type of number)

NumPy arrays (`np.ndarray`) are treated differently from regular YAML: their data are stored in binary blocks rather than in the YAML tree itself. Note that dictionary keys are limited to `str`, `int`, or `bool` types, while values can be any of the above data types.

Typically, when creating an ASDF file using the Python library, one begins by creating a nested Python dictionary that corresponds to the tree structure the file should have. Indeed, one can interact with any AsdfFile object as if it were a dictionary representing this tree structure.

Note that more complex structures (ones not directly supported by YAML) are denoted using YAML tags. However, those tagged "sub-trees" are still composed of the above basic structures and other tagged sub-trees. Additional tagged objects are supported via ASDF extensions.

In [49]:
af = asdf.AsdfFile()
tree = {'greetings': 'Hello'}
af.tree = tree
af.write_to("basic_types.asdf")

ASDF can save arrays, in particular NumPy arrays (`np.ndarray`). Indeed, much of ASDF is dedicated to saving arrays efficiently.

For example, saving a random 8x8 numpy array:

In [54]:
import numpy as np

af = asdf.AsdfFile()
# Shortcut: item assignment on an AsdfFile writes into af.tree
af["random_array"] = np.random.rand(8, 8)
af.write_to("random.asdf")

**Exercise 2: Create an ASDF file which serializes** 

- an example of each of the above primitive types
- a numpy array
- Bonus: Save a compressed array to disk. *Hint:* Look at the write_to parameters to see the available compression algorithms.

In [71]:
# Enter solution here


### Serializing Custom Objects To ASDF

As mentioned above, ASDF can also serialize other types of objects, including objects outside the ASDF Standard. Support for these objects requires an ASDF extension. The process of creating an extension is described in the ASDF documentation and is beyond the scope of this tutorial; however, several extensions already exist, and next we'll look at some of the custom types they can serialize.

In general, using extensions is transparent to end users. All one needs to do is create a supported object in memory and assign it to the ASDF tree; the extension takes care of serializing it. When the file is read, the extension deserializes the object and recreates it in memory.


#### Transforms

The `astropy.modeling` package provides a framework for representing models and performing model evaluation and fitting. The `astropy.modeling` models are serializable to ASDF.

**Exercise 3: Serialize a transform**

Serialize a Gaussian model.
*Hint:* All models are available under a common namespace, `astropy.modeling.models`. Models are initialized by passing parameters to them; the parameter names are listed in the `param_names` attribute.

In [57]:
from astropy.modeling import models

models.Gaussian1D.param_names

('amplitude', 'mean', 'stddev')

In [None]:
# Enter solution here

#### Coordinate Frames

The `astropy.coordinates` package provides classes for representing a variety of celestial/spatial coordinates and their velocity components, as well as tools for converting between common coordinate systems in a uniform way. It also provides the `SpectralCoord` and `StokesCoord` classes.

In [61]:
from astropy import coordinates as coord

sky = coord.SkyCoord(5.6, -70.2, unit=('deg', 'deg'))
af = asdf.AsdfFile()
af['sky'] = sky
af.write_to('sky.asdf')

**Exercise 4: Serialize a coordinate frame**

*Hint:* `astropy.coordinates` has classes for commonly used celestial frames. Serialize an FK5 frame without data.

In [None]:
# Enter solution here

#### Time

The `astropy.time` package provides functionality for manipulating times and dates. To create a `Time` object, supply a string (or a list of strings) together with a format, or supply a `datetime` object.

**Exercise 5: Serialize a Time object in isot format**

*Hint:* To generate a Time object use
```
from astropy.time import Time

times = ['1999-01-01T00:00:00.123456789', '2010-01-01T00:00:00']
t = Time(times, format='isot', scale='utc')
```

In [63]:
# Enter solution here



#### Units and Quantities

`astropy.units` handles defining, converting between, and performing arithmetic with physical quantities, such as meters, seconds, Hz, etc. 

**Exercise 6: Serialize an array of wavelengths in microns**

*Hint:* To generate a quantity representing an array of wavelengths in microns use

```
import numpy as np
from astropy import units as u

q = np.linspace(0.5, 2.6, 10) * u.um
```

In [65]:
# Enter solution here



#### Tables

`astropy.table` provides functionality for storing and manipulating heterogeneous tables of data in a way that is familiar to numpy users.


**Exercise 7: Serialize an astropy table with quantities**


*Hint:* Astropy supports tables with quantities. To generate a table use

```
from astropy.table import QTable
import astropy.units as u
import numpy as np

a = np.array([1, 4, 5], dtype=np.int32)
b = [2.0, 5.0, 8.5]
c = ['x', 'y', 'z']
d = [10, 20, 30] * u.m / u.s

t = QTable([a, b, c, d],
           names=('a', 'b', 'c', 'd'),
           meta={'name': 'first table'})
```

Write the table to an ASDF file and look at the result, for example with `asdf.info`.



In [None]:
# Enter solution here



### Generalized World Coordinate System (GWCS)

**Overview**

We call "WCS" the mapping from "pixel" coordinates to some "real-world" physical coordinates - celestial, spectral, time, etc. GWCS is a generalized implementation of WCS aiming to avoid the limitations of the FITS WCS standard. It is a flexible toolkit for expressing and evaluating transformations between pixel and world coordinates, as well as intermediate coordinates. The GWCS object supports a data model which includes the entire transformation pipeline from input pixel coordinates to world coordinates (and vice versa).

GWCS is based on astropy and supports the Common Interface for WCS (the shared Python WCS API described in astropy's APE 14). The WCS "pipeline" is a list of steps, where each step is a tuple of a coordinate frame and the transform from that frame to the next one. The transform in the last step is None, marking the final (output) coordinate frame of the pipeline.

Transforms are based on astropy.modeling and include support for coordinate units. Coordinate frames utilize astropy.coordinates. The GWCS object is serialized to ASDF using the ASDF WCS and transforms extensions.

GWCS is currently used by JWST, DKIST, and Roman.
Let's look at the file we opened initially, which is a Roman simulation for the WFI instrument, and search for the string "wcs".

In [69]:
afr.search(key='wcs')

[1mroot[0m (AsdfObject)
[2m└─[0m[1mroman[0m (WfiImage)[2m[3m # The Schema for WFI Level 2 Images.[0m[0m
[2m  └─[0m[1mmeta[0m (dict)
[2m    ├─[0m[1mcal_step[0m (L2CalStep)[2m[3m # Level 2 Calibration Status[0m[0m
[2m    │ └─[0m[1massign_wcs[0m (str): COMPLETE[2m[3m # Assign WCS Step[0m[0m
[2m    ├─[0m[1mwcs[0m (WCS)
[2m    ├─[0m[1mwcs_fit_results[0m (dict): {'<rot>': 2.1166473207256526e-06, '<scale>': 1.0, 'center': [-[2m[3m (truncated)[0m[0m
[2m    └─[0m[1mwcsinfo[0m (Wcsinfo): {'v2_ref': 1546.3846181707652, 'v3_ref': -892.7916365721071, 'vpari[2m[3m (truncated)[0m[0m

In [68]:
print(afr['roman']['meta']['wcs'])

  From                  Transform                
-------- ----------------------------------------
detector                            CompoundModel
    v2v3 JWST tangent-plane linear correction. v1
v2v3corr                                 v23tosky
   world                                     None


**Example: Imaging WCS**
