Writing your second ASDF converter tutorial
===================================

This tutorial adds ASDF schemas to the converter machinery so
that the ASDF files written and read by the converters can be
validated.

What are ASDF schemas for?
----------------------------------

Using schemas represents a different level of sophistication and,
consequently, more work. The benefit is that files are checked
against the expectations of the converter before it tries to read
them. For example, if someone hand edits an ASDF file that contains
your serialized object and modifies that object in a way that
invalidates it, the schema mechanism provides a way of detecting
that they have broken it. This makes it easier to determine whether
a problem lies with the file itself or somewhere else.

An aside: the schema helps eliminate many problems, but not
necessarily all of them. For example, if one of the attributes of your
object must be a prime number, there is no schema mechanism
to ensure it is a prime number, but it can ensure that the file
has the required attributes, that they are of the right type,
and can place some constraints on their values (e.g., min and 
max values, or that strings match some regular expression, or 
are one of an enumerated set of permitted values).
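
A constraint that the schema cannot express, such as primality, can still be enforced in Python code, for example when the converter rebuilds the object. The following is a minimal sketch of that idea. Both `is_prime` and `check_badge_number` are hypothetical helpers, and the `badge_number` attribute is invented for illustration; none of them are part of ASDF or of the classes used later in this tutorial.

```python
def is_prime(n):
    """Return True if n is a prime integer (simple trial division)."""
    if not isinstance(n, int) or n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def check_badge_number(node):
    """Raise ValueError if the hypothetical 'badge_number' entry is not prime."""
    value = node["badge_number"]
    if not is_prime(value):
        raise ValueError(f"badge_number must be prime, got {value!r}")
    return value
```

A check like this could be called from a converter's `from_yaml_tree` method, so that the schema handles types and simple value limits while the Python code handles the rest.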

Goal of this tutorial
-----------------------

The goal is to write converters that use schemas, and to keep those
schemas language independent (i.e., usable without installing Python)
in case they are ever used by libraries that are not written in
Python. Doing so adds some complexity, so this tutorial is aimed
mostly at those who will be writing schemas for pipeline code that
generates products that should not intrinsically require Python to
read. (A separate tutorial may be written for cases where schemas
are desired, but only for Python.)

It takes a bit of effort to understand how to write schemas. 
If they seem odd and a bit confusing, that is normal. They 
are their own language of a sort and it takes a little time
to get used to them.

Using schemas also involves providing a mechanism that links the
schemas to tags, so that the library knows how to associate the
schemas and tags with each other.

Finally, we will be using schema manifests, which provide a
more convenient way of handling multiple schemas and tags,
and to illustrate, we will use two tags this time around.
 
Groundwork
--------------

This tutorial requires making two separate packages, one for
the schemas, a separate one for the converters (again, so that
the schemas are not tied directly to Python; but note there
will be a small amount of code in the schema package for Python,
and we expect that if other languages are used, code for those
may appear there as well).

A utility has been developed so that the schema package is 
fairly easily generated and this tutorial will use that utility.
The structure of the converter package will be very similar to
that of the first tutorial.

Required Software
---------------------

- numpy
- asdf v2.8 or higher

The following is recommended when creating your own package but 
is not directly needed for this tutorial:

- cookiecutter (do a `pip install cookiecutter`)

Software uninstall
---------------------

If you did the first tutorial (Your_first_ASDF_converter), type
`pip uninstall mfconverter` in a terminal window, since this tutorial reuses the same
`PhotoID` class and we don't want confusion between the two converters.

Create schema package
----------------------------

First ensure you are in the directory in which you want to create both packages. The cell below uses `~/converter_tutorial`; change it to the directory you want before executing.

In [None]:
cd ~/converter_tutorial


The following utility will build a schema package. It will
prompt for answers to a series of questions. Here is an 
explanation of what is expected for these questions. **You must
run this utility from a terminal window.** Type the command as
shown in this directory and provide the answers as shown below.
If this is rerun you may get an initial prompt mentioning the
template has been downloaded before; just type return to accept
the default `yes`

`
cookiecutter gh:asdf-format/schemas-package-template
`

`
package_name [asdf-example-schemas]: myschemas
`

This will be the directory name it will go in.

`
module_name [myschemas]: 
`

Accept the default (providing a different answer will create a
different module name within the package).

`
Short_description [mfconverter]: my first schemas
`

Self evident

`
author_name [Author Name]: Bullwinkle Moose
`

Provide a phony answer. You don't really want people to know who 
wrote this, do you?

`
author_email [author@example.com]: bullwinkle@stsci.edu
`

`
github_project [github-org/myschemas]: bullwinkle/myschemas
`

This won't actually create a repository on GitHub, but it is useful
if you later add the package to GitHub.

`
project_url [https://github.com/bullwinkle/myschemas]: 
`

`
uri_authority [example.com]: stsci.edu 
`

`
uri_project [example-project]:
`

Because this tutorial is Jupyter-based, we will do part of what the
utility does manually later, partly because editing files from within
a notebook is not simple.

Setting up the converters
-----------------------------

This will be similar to the first tutorial, but with some changes
to the structure. There will be two converters to illustrate how to
handle multiple converters.

In [None]:
mkdir myconverters

In [None]:
mkdir myconverters/src

In [None]:
mkdir myconverters/src/myconverters

In [None]:
cd myconverters

Add the usual files

In [None]:
%%writefile src/myconverters/photo_id.py
class PhotoID:
    "Holds Photo ID information"

    def __init__(self, last_name, first_name, image):
        "expects a monochromatic numpy array for image"
        self.last_name = last_name
        self.first_name = first_name
        self.photo = image
    
    def name(self):
        return self.last_name + ', ' + self.first_name

Add a second class that references the first.

In [None]:
%%writefile src/myconverters/traffic_citation.py
from .photo_id import PhotoID


class TrafficCitation:
    "Record of a traffic violation"

    def __init__(self, ociffer, violation, date, time, photo_id):
        self.ociffer = ociffer
        self.violation = violation
        self.date = date
        self.time = time
        if not isinstance(photo_id, PhotoID):
            raise ValueError("not a PhotoID instance")
        self.photo_id = photo_id

In [None]:
%%writefile src/myconverters/converters.py
from asdf.extension import Converter
from .photo_id import PhotoID
from .traffic_citation import TrafficCitation


class PhotoIDConverter(Converter):
    tags = ["asdf://stsci.edu/example-project/tags/photo_id-*"]
    types = ["myconverters.photo_id.PhotoID"]
    # The above registers the tag that the converter is used for, as well as
    # associating the class that the converter is used for.

    # This method converts from the Python object to yaml
    def to_yaml_tree(self, obj, tag, ctx):
        # The yaml conversion expects a dictionary returned
        node = {}
        node['first_name'] = obj.first_name
        node['last_name'] = obj.last_name
        node['photo'] = obj.photo
        return node

    # This method converts from yaml to the Python object
    def from_yaml_tree(self, node, tag, ctx):
        return PhotoID(node['last_name'],
                       node['first_name'],
                       node['photo'])


class TrafficCitationConverter(Converter):
    tags = ["asdf://stsci.edu/example-project/tags/traffic_citation-1.0.0"]
    types = ["myconverters.traffic_citation.TrafficCitation"]

    def to_yaml_tree(self, obj, tag, ctx):
        node = {}
        node['ociffer'] = obj.ociffer
        node['violation'] = obj.violation
        node['date'] = obj.date
        node['time'] = obj.time
        node['photo_id'] = obj.photo_id
        return node

    def from_yaml_tree(self, node, tag, ctx):
        return TrafficCitation(node['ociffer'],
                               node['violation'],
                               node['date'],
                               node['time'],
                               node['photo_id'])

In [None]:
%%writefile src/myconverters/extensions.py
from asdf.extension import ManifestExtension
from .converters import (
    PhotoIDConverter,
    TrafficCitationConverter,)

MY_CONVERTERS = [
    PhotoIDConverter(),
    TrafficCitationConverter(),
]

MY_EXTENSIONS = [
    ManifestExtension.from_uri(
        "asdf://stsci.edu/example-project/manifests/allmyschemas-1.0",
        converters=MY_CONVERTERS)
]

Notice the changes to the last file as compared to the first tutorial.
It now references a manifest that lives elsewhere and passes it a list
of converter instances. This manifest will be located in the
schema package and is used to associate schemas with tags.

In addition, a separate file is created to hold this
module's entry point function.

In [None]:
%%writefile src/myconverters/integration.py
def get_extensions():
    from . import extensions
    return extensions.MY_EXTENSIONS

Add the `__init__.py`

In [None]:
%%writefile src/myconverters/__init__.py



And now add the setup files. Note that the entry point for get_extensions now refers to integration.py.

In [None]:
%%writefile setup.cfg
[metadata]
name = myconverters
description = TODO
long_description = TODO
author = Bullwinkle
version = 0.1.0
license = BSD-3-Clause

[options]
zip_safe = True
python_requires = >=3.6
setup_requires =
    setuptools_scm
install_requires =
    jsonschema>=3.0.2
    asdf>=2.8
    psutil>=5.7.2
    numpy>=1.16
package_dir =
    =src
packages = find:

[options.entry_points]
asdf.extensions =
    my_extension = myconverters.integration:get_extensions

[options.packages.find]
where = src

In [None]:
%%writefile setup.py
#!/usr/bin/env python3
from setuptools import setup

setup()

Dealing with the Schemas
------------------------------
The utility mentioned previously would have constructed a directory tree for us with many files populated.

Only the strictly necessary parts will be created manually here, partly to minimize what has to be done, and partly because it isn't easy to edit files in a tutorial notebook.

In [None]:
cd ..

In [None]:
mkdir myschemas

In [None]:
cd myschemas

In [None]:
mkdir src

In [None]:
mkdir src/myschemas

In [None]:
mkdir src/myschemas/resources

In [None]:
mkdir src/myschemas/resources/manifests

In [None]:
mkdir src/myschemas/resources/schemas

Create the setup files

In [None]:
%%writefile setup.py
#!/usr/bin/env python3
from setuptools import setup

setup()

In [None]:
%%writefile setup.cfg
[metadata]
name = myschemas
description = Schemas for second converter tutorial
long_description = TODO
author = Bullwinkle
version = 0.1.0
license = BSD-3-Clause

[options]
zip_safe = True
python_requires = >=3.6
setup_requires =
    setuptools_scm
install_requires =
    jsonschema>=3.0.2
    asdf>=2.8
    psutil>=5.7.2
    numpy>=1.16
package_dir =
    =src
packages = find:

[options.entry_points]
asdf.resource_mappings =
    myschemas = myschemas.integration:get_resource_mappings

[options.package_data]
myschemas.resources = manifests/*.yaml, schemas/*.yaml

[options.packages.find]
where = src


Note that the form of the entry points is different now that schemas are involved.
There is also a need to indicate that the schema and manifest files must be
installed along with the software.

Now create the two needed schema files. The explanation of how the schema works
will follow the creation of the schemas.

In [None]:
%%writefile src/myschemas/resources/schemas/photo_id-1.0.0.yaml
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: asdf://stsci.edu/example-project/schemas/photo_id-1.0.0

title: Photo ID information
type: object
properties:
  last_name:
    title: Last name of the ID holder
    type: string
  first_name:
    title: First name of ID holder
    type: string
  photo:
    title: Monochromatic photo of ID holder
    tag: tag:stsci.edu:asdf/core/ndarray-1.0.0
    datatype: int8
    ndim: 2

propertyOrder: [last_name, first_name, photo]
flowStyle: block
required: [last_name, first_name, photo]
...


In [None]:
%%writefile src/myschemas/resources/schemas/traffic_citation-1.0.0.yaml
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: asdf://stsci.edu/example-project/schemas/traffic_citation-1.0.0

title: Traffic Citation Information
type: object
properties:
  ociffer:
    title: Name of ociffer issuing citation
    type: string
  violation:
    title: Type of traffic violation
    type: string
    enum: ["speeding", "DWI", "Failure to Signal", "Driving like a jerk"]
  date:
    title: Date of violation
    type: string 
  time:
    title: Time of day of violation
    type: string
  photo_id:
    title: photo_id of the driver committing violation
    tag: asdf://stsci.edu/example-project/tags/photo_id-1.0.0

propertyOrder: [ociffer, violation, date, time, photo_id]
flowStyle: block
required: [ociffer, violation, date, time, photo_id]
...


As mentioned previously, the purpose of the schemas is to indicate what 
is required of the associated ASDF yaml content corresponding to the
tag that identifies that content.

The schemas follow the definition of schema elements defined by
JSON Schema, which is a standard for schemas. Nominally this standard
expects schemas to be written in JSON, but just to confuse you, we 
write schemas in YAML to be consistent with the fact that we are
heavily centered on YAML (essentially there is a simple one-to-one
correspondence between the elements defined for the JSON representation
and the YAML representation; in fact, JSON is a subset of YAML).
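
To illustrate that correspondence, the sketch below writes part of a schema as plain Python data and serializes it with the standard-library `json` module. The fragment mirrors a portion of the photo ID schema from this tutorial. It is only an illustration of the notation; ASDF itself reads the YAML files, and nothing here is part of the ASDF API.

```python
import json

# Part of the photo ID schema, written as plain Python data.
schema = {
    "title": "Photo ID information",
    "type": "object",
    "properties": {
        "last_name": {"type": "string"},
        "first_name": {"type": "string"},
    },
    "required": ["last_name", "first_name"],
}

# Serialize to JSON text. The keys and nesting are the same ones that
# appear in the YAML version of the schema; only the notation differs.
as_json = json.dumps(schema, indent=2)
print(as_json)

# Reading the JSON text back gives the same structure.
round_tripped = json.loads(as_json)
```

The YAML form is usually easier to read and allows comments, which is one practical reason to prefer it for hand-written schemas.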

A full and fairly accessible explanation of JSON Schema can be found
in this online book: https://json-schema.org/understanding-json-schema/

What follows is an annotated version of the photo ID schema:
```
%YAML 1.1
---
# The previous is a standard YAML header with --- indicating the start
# of the YAML content

# The following line identifies the meta-schema that applies to the 
# schema itself. It is possible to customize that for more specialized
# versions
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
# The following line identifies the schema uniquely
id: asdf://stsci.edu/example-project/schemas/photo_id-1.0.0

# Wherever you see title as an attribute it is intended as a short
# description of the contents
title: Photo ID information

# "type: object" means that the contents matching this schema should
# essentially be a dictionary at the first level (there can be more
# than one level in an object schema).
type: object

# The properties attribute lists all the attributes that have any 
# information about them, and particularly any constraints on their
# types or values
properties:
  last_name:
    title: Last name of the ID holder
    # type indicates the type that the actual value in the YAML content
    # must have. It will fail validation if the value is not the right type.
    type: string
  first_name:
    title: First name of ID holder
    type: string
  photo:
    title: Monochromatic photo of ID holder
    # tag is an alternate way of specifying the type. In this case the
    # appearance of this type means that the content of the photo
    # attribute must be consistent with what this tag specifies (through
    # the corresponding schema). In this example, a 2-d byte array is
    # required, and the datatype and ndim attributes are extensions of
    # the JSON Schema machinery for ASDF (they are not part of the basic
    # JSON Schema system.)
    tag: tag:stsci.edu:asdf/core/ndarray-1.0.0
    datatype: int8
    ndim: 2

# In the following, only "required" is part of validation. It means these
# attributes must be present in the content covered by the schema. Without
# this, content with type or value restrictions is generally optional.
# propertyOrder and flowStyle are indications to the library as to the
# desired way the YAML content is to be written to a file. propertyOrder
# specifies the order that attributes should appear in the ASDF file, and
# flowStyle indicates which of the two ways of representing YAML should be
# used to write the file (block means using an indented style rather than
# the flow style, which uses delimiters such as brackets and braces).
propertyOrder: [last_name, first_name, photo]
flowStyle: block
required: [last_name, first_name, photo]
...
```

Regarding the traffic_citation schema, all the above applies as well,
but there is one new element for the `violation` attribute.

```
  violation:
    title: Type of traffic violation
    type: string
    # The following enum attribute indicates that the required string can
    # only have four allowed values, those listed for enum. Any other 
    # value will cause a validation error.
    enum: ["speeding", "DWI", "Failure to Signal", "Driving like a jerk"]
```

The rest of the content of this package involves making the necessary
connections to ASDF and to the associated tags.

We will start with the manifest file. This file makes it simpler to 
put all the connections in one place, and it allows sharing schemas
between different types.

In [None]:
%%writefile src/myschemas/resources/manifests/allmyschemas-1.0.yaml
%YAML 1.1
---
id: asdf://stsci.edu/example-project/manifests/allmyschemas-1.0
extension_uri: asdf://stsci.edu/example-project/extensions/allmyschemas-1.0
title: All my schemas 1.0
description: |-
  A set of tags for serializing all my schemas.
asdf_standard_requirement:
  gte: 1.1.0
tags:
# Object Modules
- tag_uri: asdf://stsci.edu/example-project/tags/photo_id-1.0.0
  schema_uri: asdf://stsci.edu/example-project/schemas/photo_id-1.0.0
  title: Photo ID information
  description: |-
    Name on Photo ID and the photo from the Photo ID
- tag_uri: asdf://stsci.edu/example-project/tags/traffic_citation-1.0.0
  schema_uri: asdf://stsci.edu/example-project/schemas/traffic_citation-1.0.0
  title: Information about an issued traffic citation
  description: |-
    Information about an issued traffic citation including ociffer name, type of violation,
    date, time, and photo ID.
...




Note that the manifest associates a tag with a schema, and ASDF will use
that to map from one to the other.
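
Conceptually, the `tags` section of the manifest acts as a lookup table from tag URI to schema URI. The sketch below reproduces that section as plain Python data and builds such a table. It is an illustration of the idea only, not how the ASDF library is implemented, and the name `schema_for_tag` is made up for this example.

```python
# The "tags" section of the manifest, written as plain Python data.
manifest_tags = [
    {
        "tag_uri": "asdf://stsci.edu/example-project/tags/photo_id-1.0.0",
        "schema_uri": "asdf://stsci.edu/example-project/schemas/photo_id-1.0.0",
    },
    {
        "tag_uri": "asdf://stsci.edu/example-project/tags/traffic_citation-1.0.0",
        "schema_uri": "asdf://stsci.edu/example-project/schemas/traffic_citation-1.0.0",
    },
]

# Build a lookup table from tag URI to schema URI.
schema_for_tag = {entry["tag_uri"]: entry["schema_uri"] for entry in manifest_tags}

# Given the tag found on a node in the file, find the schema to validate with.
print(schema_for_tag["asdf://stsci.edu/example-project/tags/photo_id-1.0.0"])
```

Because the mapping lives in one file, two tags could point at the same schema URI, which is what allows schemas to be shared between types.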

The last file to add is the Python code that the entry point calls to
tell ASDF where the schema and manifest files are located.

In [None]:
%%writefile src/myschemas/integration.py
import sys

from asdf.resource import DirectoryResourceMapping

if sys.version_info < (3, 9):
    import importlib_resources
else:
    import importlib.resources as importlib_resources


def get_resource_mappings():
    """
    Get the resource mapping instances for myschemas
    and manifests.  This method is registered with the
    asdf.resource_mappings entry point.

    Returns
    -------
    list of collections.abc.Mapping
    """
    from . import resources
    resources_root = importlib_resources.files(resources)

    return [
        DirectoryResourceMapping(
            resources_root / "schemas", "asdf://stsci.edu/example-project/schemas/"),
        DirectoryResourceMapping(
            resources_root / "manifests", "asdf://stsci.edu/example-project/manifests/"),
    ]


The key elements in this file are the last few lines. They indicate
that when a schema ID is encountered whose beginning matches the
URI prefix given as the second argument, the remainder of the
schema ID is appended to the local path of the resource directory
plus the "schemas" subdirectory to construct the path to the actual
schema file.

Likewise for the manifest, whose ID is referenced in the
extensions.py file of the converter package. Basically, the mapping tells
the converter package where the manifest file may be found on the local
filesystem, and from that, where the schemas may be found.
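
The prefix matching described above can be sketched in a few lines of plain Python. The helper `uri_to_path` is hypothetical and is not the asdf implementation. The `.yaml` suffix is an assumption based on the file names used in this tutorial, and the relative root path stands in for wherever the package is installed.

```python
from pathlib import PurePosixPath


def uri_to_path(uri, uri_prefix, root, suffix=".yaml"):
    """Map a resource URI to a file path under root, or return None.

    Hypothetical helper for illustration; not the asdf implementation.
    """
    if not uri.startswith(uri_prefix):
        return None
    # Whatever follows the prefix names the file within the directory.
    remainder = uri[len(uri_prefix):]
    return PurePosixPath(root) / (remainder + suffix)


path = uri_to_path(
    "asdf://stsci.edu/example-project/schemas/photo_id-1.0.0",
    "asdf://stsci.edu/example-project/schemas/",
    "src/myschemas/resources/schemas",
)
print(path)
```

A URI that does not start with the registered prefix yields `None` in this sketch, which corresponds to the mapping simply not providing that resource.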

Did we say last file? Not quite. We need an empty `__init__.py` to make
this a package.

In [None]:
%%writefile src/myschemas/__init__.py



In [None]:
%%writefile src/myschemas/resources/__init__.py



Installing the packages
---------------------------

In [None]:
pip install --editable .

In [None]:
cd ../myconverters

In [None]:
pip install --editable .
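
Once both packages are installed, one way to confirm that their entry points were registered is to list them with the standard-library `importlib.metadata` module (Python 3.8 or later). This is a sketch for checking the installation, not a step the tutorial requires. The helper name `list_entry_points` is made up for this example, and the group names are the ones used in the two `setup.cfg` files above.

```python
from importlib.metadata import entry_points


def list_entry_points(group):
    """Return the sorted names of installed entry points in the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a dict keyed by group name
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


for group in ("asdf.extensions", "asdf.resource_mappings"):
    print(group, list_entry_points(group))
```

If the installs succeeded, `my_extension` should appear under `asdf.extensions` and `myschemas` under `asdf.resource_mappings`, alongside any entries registered by asdf itself or other packages.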

Testing the converters
--------------------------

The following code can restart the kernel without having
to go through the notebook again.

In [None]:
from IPython.display import display_html

def restartkernel():
    # Ask the notebook front end to restart the kernel
    display_html("<script>Jupyter.notebook.kernel.restart()</script>", raw=True)

restartkernel()

In [None]:
cd ~

In [None]:
import numpy as np
import asdf
from myconverters.photo_id import PhotoID
from myconverters.traffic_citation import TrafficCitation

In [None]:
im = np.zeros((10,10), dtype=np.int8)
p = PhotoID('man', 'invisible', im)
tc = TrafficCitation('Dudley Doright','speeding','2021-7-1', '11:11:11', p)
af = asdf.AsdfFile()
#af.tree = {'id': p}
af.tree = {'citation': tc}
af.write_to('test.asdf', auto_inline=200)

In [None]:
more test.asdf

In [None]:
af2 = asdf.open('test.asdf') # See what happens when we read it back in.

In [None]:
type(af2.tree['citation'])

In [None]:
type(af2.tree['citation'].photo_id)

In [None]:
af2.tree['citation'].ociffer

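What happens if we break the law? Assigning a number to `ociffer` violates the schema, which requires a string, so writing the file should fail validation.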

In [None]:
af2.tree['citation'].ociffer = 27

In [None]:
af2.write_to('test2.asdf')

In [None]:
asdf.__version__