In [None]:
import asdf
import os
import numpy as np

from dataclasses import dataclass
from pathlib import Path

# 6 - Creating Custom ASDF Extensions

Often we want to be able to save our "custom" Python objects to ASDF in a "seamless"
fashion. Earlier we were able to save various `astropy` objects using `asdf-astropy`
in [tutorial 3](../03-Creating_ASDF_Files/03-Creating_ASDF_Files.ipynb). Here we will
discuss how to create the necessary ASDF extension(s) to support doing this for
a "custom" object so that ASDF can read (deserialize) and write (serialize) that
object. Note that for a given object, we typically expect an ASDF extension
supporting that object to "round-trip" that object, meaning the object can be serialized
to ASDF and then deserialized from ASDF to an object which is an exact functional
copy of the original object.

## Example Object

Let's create a relatively simple Python object, which we would like to handle seamlessly with ASDF. For our purposes, let's consider a geometric ellipse described by its
- semi-major axis
- semi-minor axis

In [None]:
@dataclass
class Ellipse:
    """An ellipse defined by semi-major and semi-minor axes.

    Note: Using a dataclass to define the object so that we get `==` for free.
    """

    semi_major: float
    semi_minor: float

Note that ASDF will also handle objects nested inside the object you wish to serialize, provided each nested object is either supported natively by ASDF or covered by an extension that is available to ASDF. For example, if we specified the axes of the ellipse as `astropy` `Quantity` objects (to attach units), ASDF would handle them transparently so long as `asdf-astropy` is installed.

## Introduction to Writing an Extension

An ASDF extension requires two components to function properly:

1. A `tag` for the object, so that ASDF can identify and validate the object when deserializing it from an ASDF
file.
2. A `Converter` for the object, so that ASDF knows how to serialize and deserialize the object to and from
an ASDF file.

The `tag` is defined through schemas and related resources that ASDF loads, while the `Converter`
is a Python object which provides the code the ASDF library executes to handle the serialization
and deserialization process. The `Converter` is then wrapped inside an ASDF `Extension` object (which
can contain several different `Converter`s), which is in turn added to ASDF, typically via an entry
point.

### Creating a `tag`

Recall that ASDF supports the use of schemas for validating the correctness of the information stored within its files.
Often one wishes to create a schema for a specific object so that the schema describing that object can be
reused in other schemas. A `tag` is a reference to a specific schema or set of schemas that a particular value in an ASDF
file tree needs to satisfy. This `tag` is then used within the `yaml` metadata to identify the sub-tree which represents
the object within the ASDF file. Thus the tag serves two purposes:

1. Identifying the schema used to validate a sub-tree of the ASDF tree.
2. Identifying the object a particular sub-tree describes.

This means that in order to create a `tag` for a given Python object we need to create resource `yaml` files for ASDF
that do two things:

1. Contain the schema(s) used by that `tag`.
2. Create an association between the schema(s) and the `tag`.

#### Creating a Schema

Recall that in tutorial 4, we discussed in depth how to create ASDF schemas. In particular, note that schemas are typically
stored in `yaml` files which are then loaded into ASDF via an entry point.

To begin with, let's create the schema for `Ellipse` dynamically (without needing to package our code for an entry point):

In [None]:
ellipse_uri = "asdf://example.com/example-project/schemas/ellipse-1.0.0"

ellipse_schema_content = f"""
%YAML 1.1
---
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
id: {ellipse_uri}

type: object
properties:
  semi_major:
    type: number
  semi_minor:
    type: number
required: [semi_major, semi_minor]
...
"""

This can then be dynamically added to ASDF using the `add_resource_mapping` method. This adds a map (`dict`) between a `uri`
(uniform resource identifier) string and the content of the resource to ASDF. Later, when working with the entry
points directly, we will need to specify how to build these mappings.

Note that we highly recommend, as a best practice, always making the `id` of any resource and its `uri` string the same. This
limits confusion about how a given schema is looked up: JSON schema (the base language/library used
for ASDF schemas) uses the `id` field for references between resources, while ASDF uses the `uri` as the key to find
those resources on disk. One does not have to follow this practice, but deviating from it is highly discouraged.
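
The recommendation above can be checked mechanically. The following is a minimal sketch, assuming resources are kept in a plain `dict` of `uri` to `yaml` content; the helper name `find_mismatched_ids` and the `resources` mapping are hypothetical and not part of ASDF:

```python
import yaml

# Hypothetical resource mapping: URI keys mapped to YAML content.
EXAMPLE_URI = "asdf://example.com/example-project/schemas/ellipse-1.0.0"
resources = {
    EXAMPLE_URI: f"""\
%YAML 1.1
---
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
id: {EXAMPLE_URI}
type: object
...
""",
}


def find_mismatched_ids(mapping):
    """Return the URIs whose document `id` differs from the mapping key."""
    mismatched = []
    for uri, content in mapping.items():
        document = yaml.safe_load(content)
        if document.get("id") != uri:
            mismatched.append(uri)
    return mismatched


mismatches = find_mismatched_ids(resources)
print(mismatches)
```

An empty list means every resource can be found under the same string that JSON schema references would use.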

In [None]:
asdf.get_config().add_resource_mapping({ellipse_uri: ellipse_schema_content})

Let's now load the schema we just created and check that it is a valid schema:

In [None]:
schema = asdf.schema.load_schema(ellipse_uri)
asdf.schema.check_schema(schema)

Note that `asdf.schema.check_schema` will work directly on any `yaml` file loaded through the `pyyaml` interface.
Let's also attempt to validate a portion of an ASDF tree for `Ellipse` against this schema:

In [None]:
# Valid tree

test_ellipse_object = {"semi_major": 1.0, "semi_minor": 2.0}
asdf.schema.validate(test_ellipse_object, schema=schema)

In [None]:
# Invalid tree

test_ellipse_object = {"semi_major": 3.0}
asdf.schema.validate(test_ellipse_object, schema=schema)

Note that ASDF provides a `pytest` plugin which can be configured to automatically generate unit tests which will
check and validate all of the schemas in your package. In fact, you can include "examples" of the tree the schema
is checking and the plugin will test that those examples do correctly validate against the schema itself.

##### Exercise 1

Create, add, and check a schema for the `Rectangle` object below:

In [None]:
@dataclass
class Rectangle:
    base: float
    height: float

#### Creating the `tag` Itself

ASDF uses a special resource file, called a `manifest`, to specify the `tag`s for a given ASDF extension.
The `manifest` lists each `tag` as a pair of `uri`s:

- `tag_uri`, the `uri` which will be used for the `tag`.
- `schema_uri`, the ASDF `uri` used to reference the specific `schema`(s) involved.

This allows a given `schema` to be reused for multiple `tag`s, for example for objects which contain the same serializable
data but have different Python functionality that needs to be distinguished.

The following is an example for creating/adding a manifest for an extension which has the resources for the `Ellipse` object:

In [None]:
shapes_manifest_uri = "asdf://example.com/example-project/manifests/shapes-1.0.0"
shapes_extension_uri = "asdf://example.com/example-project/extensions/shapes-1.0.0"
ellipse_tag = "asdf://example.com/example-project/tags/ellipse-1.0.0"

shapes_manifest_content = f"""
%YAML 1.1
---
id: {shapes_manifest_uri}
extension_uri: {shapes_extension_uri}

title: Example Shape extension 1.0.0
description: Tags for example shape objects.

tags:
  - tag_uri: {ellipse_tag}
    schema_uri: {ellipse_uri}
...
"""

asdf.get_config().add_resource_mapping({shapes_manifest_uri: shapes_manifest_content})

We again add this `manifest` to ASDF via the `add_resource_mapping` interface. Note that the `extension_uri` field defines
the `uri` that the whole `Extension` (resource(s) combined with Converter(s)) uses within ASDF. The `extension_uri` will
be referenced later by the `Extension` object so that the extension code will be available when the correct resources are
available and vice-versa.

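We can check the `manifest` as well, by parsing its content and validating the result against the manifest schema that ASDF provides:

In [None]:
# check: parse the manifest and validate it against ASDF's extension manifest schema
import yaml

manifest = yaml.safe_load(shapes_manifest_content)
manifest_schema = asdf.schema.load_schema(
    "asdf://asdf-format.org/core/schemas/extension_manifest-1.0.0"
)
asdf.schema.validate(manifest, schema=manifest_schema)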

##### Exercise 2

Add a `tag` for your `Rectangle` schema to the `shapes-1.0.0` manifest and add your manifest to ASDF.

### Create a `Converter`

All converters should be constructed as subclasses of the abstract type `asdf.extension.Converter`,
which requires that you define two methods:

1. `to_yaml_tree`: which converts a Python object into an ASDF tree.
2. `from_yaml_tree`: which converts an ASDF tree into a Python object.

Note that these methods can take into account the type or tag of the object being converted.

Moreover, your converter also needs to define the following two class attributes:

1. `tags`: A list of tags that this converter will use when reading ASDF.
2. `types`: A list of Python (object) types that this converter will use when writing ASDF.

Note that these lists do not need to correspond to each other element by element, and that in order for
the converter to actually be used by ASDF, at least one of the `tags` needs to be registered as a
resource with ASDF (usually via the entry point).

An example converter for `Ellipse`:

In [None]:
class EllipseConverter(asdf.extension.Converter):
    tags = [ellipse_tag]
    types = [Ellipse]

    def to_yaml_tree(self, obj, tag, ctx):
        return {
            "semi_major": obj.semi_major,
            "semi_minor": obj.semi_minor,
        }

    def from_yaml_tree(self, node, tag, ctx):
        return Ellipse(semi_major=node["semi_major"], semi_minor=node["semi_minor"])

Recall that the converter itself will be added to ASDF via an `Extension`, not as a resource like the
`schema`s above. Note that, for performance of the entry points, one will normally defer the `import`
of the object (`Ellipse` in this case) until `from_yaml_tree` is actually called.

#### Exercise 3

Create a converter for the `Rectangle` object.

### Create the Full Extension

Now let's dynamically create an extension for ASDF to support the `Ellipse` object using the `EllipseConverter` we just
created and the `ellipse_tag` we created earlier.

This can be accomplished using the `asdf.extension.ManifestExtension.from_uri` constructor, which in our case requires
two arguments:

1. The `manifest_uri`, the `uri` the manifest was added under.
2. The `converters`, a list of instances of `Converter` classes.

Note that one can also pass a list of `Compressor` objects (ASDF objects to handle custom binary block compression).

An instance of the extension object can then be dynamically added to ASDF using the `add_extension` method.

In [None]:
shapes_extension = asdf.extension.ManifestExtension.from_uri(
    shapes_manifest_uri, converters=[EllipseConverter()]
)
asdf.get_config().add_extension(shapes_extension)

Alternatively, one can create an `Extension` object directly by subclassing `asdf.extension.Extension` and
specifying the class attributes:

1. `extension_uri`, the `extension_uri` specified within the `manifest` in question.
2. `converters`, the list of `Converter` objects making up the extension.
3. `tags`, the list of tags those `Converter` objects use from the `manifest` referenced.

which can then be dynamically added in exactly the same way.

Note that the `from_uri` constructor figures out all of this information from the `uri` and
`Converter` objects themselves.

In [None]:
class EllipseExtension(asdf.extension.Extension):
    extension_uri = shapes_extension_uri
    converters = [EllipseConverter()]
    tags = [ellipse_tag]


asdf.get_config().add_extension(EllipseExtension())

#### Testing the `Ellipse` Extension

Let's now check that we can round-trip an `Ellipse` object through ASDF:

In [None]:
ellipse = Ellipse(1.0, 2.0)

with asdf.AsdfFile() as af:
    af["ellipse"] = ellipse
    af.write_to("ellipse.asdf")

Let's examine the contents of the ASDF file and then read/compare them to our original object:

In [None]:
with open("ellipse.asdf") as f:
    print(f.read())

with asdf.open("ellipse.asdf") as af:
    print(af["ellipse"])
    assert af["ellipse"] == ellipse

##### Exercise 4

Create, add, and test an extension for your `RectangleConverter`.

## Using Entry-Points to Automatically Extend ASDF

Obviously, having to dynamically add all the resources and extensions to ASDF every time you want to work with a custom
object is tedious. Indeed, `asdf-astropy` only needs to be installed for its `Extension`s to be available for ASDF to
use. This is accomplished by using Python entry-points (a mechanism for one Python package to communicate information to
another Python package) to enable automatic discovery and loading of resources and extensions for ASDF.

Since entry-points are a means for Python packages to communicate with one another, their use requires you to package
your Python code, which can be a complex issue. Thus we will assume that you have an existing Python package that
you wish to add our example ASDF extension to.

To create our entry-points we will need to make two modifications to the packaging components of the existing Python
package:

1. Create an entry point to add the resources to ASDF.
2. Create an entry point to add the extensions to ASDF.

Note that we will assume that you are using the `setup.cfg` file to configure your Python package. This can also be
done via the `pyproject.toml` file in a similar fashion (see ASDF docs for details).

### Create an Entry-Point for the Resources

ASDF treats the information it receives from the entry-points it checks for resources as a function that it can evaluate
to get a list of resource mappings. To begin, suppose that your package is called `asdf_shapes` and the function you
need to call in order to get this list of mappings is called `get_resource_mappings` and is located in the `integration`
module; that is, you need to import `get_resource_mappings` from `asdf_shapes.integration`. Given this setup you will need
to add the following to your `setup.cfg`:

```
[options.entry_points]
asdf.resource_mappings =
    asdf_shapes_schemas = asdf_shapes.integration:get_resource_mappings
```

Breaking this down:

- The entry-point ASDF checks for resources is `asdf.resource_mappings`.
- The identifier for your package's resources in this case is `asdf_shapes_schemas`.
- The function that needs to be executed is `asdf_shapes.integration:get_resource_mappings`, which corresponds to the form
`module:function`.

Now let's talk about how to create the `get_resource_mappings` function. First, let's go ahead and create the `yaml` files
for the resources we used in our example in order to illustrate an example organization of these resource files:

In [None]:
schema_root = "resources/schemas"
manifest_root = "resources/manifests"

os.makedirs(schema_root, exist_ok=True)
os.makedirs(manifest_root, exist_ok=True)

with open(f"{schema_root}/ellipse-1.0.0.yaml", "w") as f:
    f.write(ellipse_schema_content)

with open(f"{manifest_root}/shapes-1.0.0.yaml", "w") as f:
    f.write(shapes_manifest_content)

Normally we organize the resource files into a directory structure (as we just did) in which the file path mirrors part of
the URI (`id`) used for each resource document. This is done so that ASDF can add resources
by crawling these directories, with the directory structure helping to determine each `uri`.

ASDF provides the `asdf.resource.DirectoryResourceMapping` object to crawl resource directories.  It allows us to turn
these directory structures into resource mappings, which can subsequently be added to ASDF using the entry-points.

These objects require two input parameters:

1. A path to the root directory which contains the resources to be added.
2. The prefix that will be used together with the file names to generate the `uri` for the resource in question.

There are some optional inputs:

1. `recursive`: (default: `False`) determines whether the object will search recursively through subdirectories.
2. `filename_pattern`: (default: `*.yaml`) glob pattern for the files that should be added.
3. `stem_filename`: (default: `True`) determines whether the file extension should be removed when creating the `uri`.

In this case we do not need to set any of the optional inputs; we only need to do the following:

In [None]:
# In module asdf_shapes.integration
def get_resource_mappings():
    schema_prefix = "asdf://example.com/example-project/schemas/"
    manifest_prefix = "asdf://example.com/example-project/manifests/"
    return [
        asdf.resource.DirectoryResourceMapping(schema_root, schema_prefix),
        asdf.resource.DirectoryResourceMapping(manifest_root, manifest_prefix),
    ]

This function can then be referenced by the entry-point. Note that for performance reasons, we suggest you limit the top-level
imports of the file(s) you load your entry points from to as few as possible, going as far as deferring imports to inside
the entry-point functions when possible. This is because asdf will import all of these modules immediately when `asdf.open`
is called, meaning large imports will cause noticeable delays, especially when using the command-line interface.

### Create an Entry-Point for the Extensions

In a similar fashion to resources, ASDF treats the entry-points it checks for extensions as functions which return
lists of `asdf.extension.Extension` objects. Thus let's assume your function is called `get_extensions` and is in the
`asdf_shapes.integration` module alongside `get_resource_mappings`. Adding the entry-point to `setup.cfg` for this would
look something like:

```
[options.entry_points]
asdf.extensions =
    asdf_shapes_extensions = asdf_shapes.integration:get_extensions
```

Breaking this down:

- The entry-point that ASDF checks for `Extension`s is `asdf.extensions`.
- The identifier for your package's extensions in this case is `asdf_shapes_extensions`.
- The function that needs to be executed is `asdf_shapes.integration:get_extensions`, which corresponds to the form
`module:function`.

The structure of `get_extensions` will be similar to that for `get_resource_mappings`:

In [None]:
# In module asdf_shapes.integration
def get_extensions():
    # import EllipseConverter inside this function
    return [
        asdf.extension.ManifestExtension.from_uri(
            shapes_manifest_uri, converters=[EllipseConverter()]
        )
    ]

Once your package is installed with these changes, ASDF will automatically detect and use your ASDF extension as needed
in a seamless fashion.
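
One way to confirm that an installed package is actually advertising its entry-points is to query them with the standard library. This is a sketch using `importlib.metadata`, not an ASDF API; the helper name `list_entry_points` is hypothetical, and the names it prints depend entirely on what is installed in the current environment.

```python
from importlib.metadata import entry_points


def list_entry_points(group):
    """Return the names of installed entry points registered under `group`."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10+ selection API.
        selected = all_entry_points.select(group=group)
    else:
        # Older interpreters return a dict keyed by group name.
        selected = all_entry_points.get(group, [])
    return sorted(entry_point.name for entry_point in selected)


for group in ("asdf.resource_mappings", "asdf.extensions"):
    print(group, list_entry_points(group))
```

With the `setup.cfg` entries from this section installed, one would expect `asdf_shapes_schemas` and `asdf_shapes_extensions` to appear in the respective groups.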

## Extending Our Example Object

Suppose that we want to extend our object so that it represents an ellipse rotated about the origin of the XY plane;
that is, we add a position angle:

In [None]:
@dataclass
class RotatedEllipse(Ellipse):
    position_angle: float

### Extend the Schema

JSON schema does not support the concept of inheritance, which makes "extending" an existing schema somewhat awkward.
What we do instead is create a schema which adds attributes to the existing schema via the `allOf` operation. In this case,
we can define a schema for `RotatedEllipse` by adding a `position_angle` property:

In [None]:
rotated_ellipse_uri = "asdf://example.com/example-project/schemas/rotated_ellipse-1.0.0"

rotated_ellipse_schema_content = f"""
%YAML 1.1
---
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
id: {rotated_ellipse_uri}

allOf:
  - $ref: {ellipse_uri}
  - properties:
      position_angle:
        type: number
    required: [position_angle]
...
"""

asdf.get_config().add_resource_mapping(
    {rotated_ellipse_uri: rotated_ellipse_schema_content}
)

# check
schema = asdf.schema.load_schema(rotated_ellipse_uri)
asdf.schema.check_schema(schema)

test_rotated_ellipse_object = {
    "semi_major": 1.0,
    "semi_minor": 2.0,
    "position_angle": 3.0,
}
asdf.schema.validate(test_rotated_ellipse_object, schema=schema)

#### Exercise 5

Create a schema for the `RectangularPrism` object below using the one for `Rectangle`:

In [None]:
@dataclass
class RectangularPrism(Rectangle):
    depth: float

### Create an Updated Manifest

Now let's extend the `shapes-1.0.0` manifest to include a `rotated_ellipse-1.0.0`
tag. Note that if a manifest is already released and in use, it is recommended that
one create a new manifest whenever schemas or tags need to be modified.

In [None]:
rotated_ellipse_tag = "asdf://example.com/example-project/tags/rotated_ellipse-1.0.0"

shapes_manifest_content = f"""
%YAML 1.1
---
id: {shapes_manifest_uri}
extension_uri: {shapes_extension_uri}

title: Example Shape extension 1.0.0
description: Tags for example shape objects.

tags:
  - tag_uri: {ellipse_tag}
    schema_uri: {ellipse_uri}

  - tag_uri: {rotated_ellipse_tag}
    schema_uri: {rotated_ellipse_uri}
...
"""

asdf.get_config().add_resource_mapping({shapes_manifest_uri: shapes_manifest_content})


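# check: parse the manifest and validate it against ASDF's extension manifest schema
import yaml

manifest = yaml.safe_load(shapes_manifest_content)
manifest_schema = asdf.schema.load_schema(
    "asdf://asdf-format.org/core/schemas/extension_manifest-1.0.0"
)
asdf.schema.validate(manifest, schema=manifest_schema)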

##### Exercise 6

Update the manifest with a tag for `RectangularPrism`.

### Create an Updated `Converter`

The "simplest" approach to creating a `Converter` for `RotatedEllipse` would be to simply
create a new converter as we did above for `Ellipse`; however, we can also take advantage
of the fact that multiple `tags` and `types` can be listed. Note that when multiple tags
are handled by the same `Converter`, we need to also implement a `select_tag` method:

In [None]:
class RotatedEllipseConverter(asdf.extension.Converter):
    tags = [ellipse_tag, rotated_ellipse_tag]
    types = [Ellipse, RotatedEllipse]

    def select_tag(self, obj, tags, ctx):
        if isinstance(obj, RotatedEllipse):
            return rotated_ellipse_tag
        elif isinstance(obj, Ellipse):
            return ellipse_tag
        else:
            raise ValueError(f"Unknown object {type(obj)}")

    def to_yaml_tree(self, obj, tag, ctx):
        tree = {
            "semi_major": obj.semi_major,
            "semi_minor": obj.semi_minor,
        }

        if tag == rotated_ellipse_tag:
            tree["position_angle"] = obj.position_angle

        return tree

    def from_yaml_tree(self, node, tag, ctx):
        if tag == ellipse_tag:
            return Ellipse(**node)
        elif tag == rotated_ellipse_tag:
            return RotatedEllipse(**node)
        else:
            raise ValueError(f"Unknown tag {tag}")
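
One detail of `select_tag` above is worth spelling out: because `RotatedEllipse` is a subclass of `Ellipse`, the subclass must be checked first. The following sketch uses only the standard library; the short tag strings and the two function names are hypothetical stand-ins for the full tag URIs and the converter method.

```python
from dataclasses import dataclass


@dataclass
class Ellipse:
    semi_major: float
    semi_minor: float


@dataclass
class RotatedEllipse(Ellipse):
    position_angle: float


# Hypothetical short names standing in for the full tag URIs.
ELLIPSE_TAG = "ellipse-1.0.0"
ROTATED_ELLIPSE_TAG = "rotated_ellipse-1.0.0"


def select_tag_subclass_first(obj):
    """Check the most derived type first, as the converter above does."""
    if isinstance(obj, RotatedEllipse):
        return ROTATED_ELLIPSE_TAG
    if isinstance(obj, Ellipse):
        return ELLIPSE_TAG
    raise ValueError(f"Unknown object {type(obj)}")


def select_tag_base_first(obj):
    """Deliberately wrong ordering, shown only for comparison."""
    if isinstance(obj, Ellipse):
        return ELLIPSE_TAG
    if isinstance(obj, RotatedEllipse):
        return ROTATED_ELLIPSE_TAG
    raise ValueError(f"Unknown object {type(obj)}")


rotated = RotatedEllipse(1.0, 2.0, 3.0)
print(select_tag_subclass_first(rotated))  # rotated_ellipse-1.0.0
print(select_tag_base_first(rotated))  # ellipse-1.0.0
```

With the base class checked first, a `RotatedEllipse` would be written under the plain ellipse tag and its `position_angle` would be lost on the way to the file.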

#### Exercise 7

Create a converter to support `Rectangle` and `RectangularPrism`.

### Creating an Updated `Extension`

We can now use this converter to create a new "updated" extension:

In [None]:
shapes_extension = asdf.extension.ManifestExtension.from_uri(
    shapes_manifest_uri, converters=[RotatedEllipseConverter()]
)
asdf.get_config().add_extension(shapes_extension)

### Checking the New Extension

Let's check this new extension by writing both an `Ellipse` and a `RotatedEllipse` object
to ASDF:

In [None]:
ellipse = Ellipse(1.0, 2.0)
rotated_ellipse = RotatedEllipse(1.0, 2.0, 3.0)

with asdf.AsdfFile() as af:
    af["ellipse"] = ellipse
    af["rotated_ellipse"] = rotated_ellipse
    af.write_to("rotated_ellipse.asdf")

# Check
with open("rotated_ellipse.asdf") as f:
    print(f.read())

with asdf.open("rotated_ellipse.asdf") as af:
    print(af["ellipse"])
    assert af["ellipse"] == ellipse

    print(af["rotated_ellipse"])
    assert af["rotated_ellipse"] == rotated_ellipse

#### Exercise 8

Create and check an extension supporting `Ellipse`, `RotatedEllipse`, `Rectangle`, and `RectangularPrism`.