Writing your first ASDF converter tutorial
=================================

This tutorial illustrates the basics of writing ASDF converters:
Python code that serializes Python objects to ASDF files and
reads them back.

This first part deals only with the converter itself.
A second tutorial will add schemas for converters.
The purpose of a schema is to give the validation machinery
what it needs to check that any ASDF file containing the
tag is consistent with the schema.

What is an ASDF converter?
--------------------------------

The converter is the machinery for turning a Python object
into the YAML content in the ASDF file (and, in the case of
data, also the binary content in the ASDF file),
and vice versa: reading an ASDF
file that has the object serialized and turning it back into
a Python object equivalent to the one it started out as.
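
The round trip can be sketched in plain Python without ASDF at all. In the sketch below, the functions `to_node` and `from_node` stand in for the two halves of a converter, and the `Point` class is invented purely for illustration; none of these names come from ASDF.

```python
# Illustrative only: Point, to_node and from_node are made-up names,
# not part of ASDF.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def to_node(obj):
    """Turn the object into a plain dictionary that YAML can hold."""
    return {"x": obj.x, "y": obj.y}


def from_node(node):
    """Rebuild an equivalent object from the dictionary."""
    return Point(node["x"], node["y"])


original = Point(1.5, -2.0)
restored = from_node(to_node(original))

# The restored object carries the same values as the original.
print((restored.x, restored.y) == (original.x, original.y))
```

A real converter implements the same two directions as methods on a converter class, as shown later in this tutorial.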

Note: The object to be serialized to ASDF must consist
only of elements that ASDF knows how to serialize. ASDF
knows how to serialize numpy arrays and all standard
Python primitive types (e.g., strings, numbers, booleans,
lists and dictionaries), as well as objects that have
serialization defined (e.g., astropy models and GWCS).
For example, if you define a class that has as one of
its attributes another Python class instance that doesn't
have a converter defined, the conversion will fail.
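
As a rough illustration of that rule, the sketch below walks a nested structure and reports any element that is not built from supported types. The `SUPPORTED_LEAVES` tuple, the `find_unsupported` function and the `Badge` class are all assumptions made for this example; ASDF's real set of supported types is larger (numpy arrays, plus anything with an installed converter) and it performs its own checks.

```python
# Assumption: a simplified stand-in for "types ASDF can already
# serialize". The real set also includes numpy arrays and any type
# that has a converter installed.
SUPPORTED_LEAVES = (str, int, float, bool, type(None))


def find_unsupported(value, path="root"):
    """Return the paths of elements that are not of a supported type."""
    if isinstance(value, SUPPORTED_LEAVES):
        return []
    if isinstance(value, dict):
        problems = []
        for key, item in value.items():
            problems += find_unsupported(item, f"{path}.{key}")
        return problems
    if isinstance(value, (list, tuple)):
        problems = []
        for index, item in enumerate(value):
            problems += find_unsupported(item, f"{path}[{index}]")
        return problems
    # Anything else would need a converter of its own.
    return [path]


class Badge:  # hypothetical class with no converter defined
    pass


tree = {"name": "invisible", "tags": [1, 2.5, True], "badge": Badge()}
print(find_unsupported(tree))
```

Only the `Badge` instance is reported, which mirrors the failure described above: one attribute without a converter is enough to stop the whole object from being serialized.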

Goal of this tutorial
----------------------

There are several ways of adding converters, some simpler
than the one shown here. This tutorial, however, assumes
that the user of the ASDF extension wants to do as little
extra work as possible to use the extension with ASDF, ideally
none at all, and that means using entry points.
Furthermore, although much of this particular example could be
placed in fewer files, the separate files used in this tutorial
keep the entry point module light, so that ASDF does not have to
import the class definitions just to discover the extension.

Groundwork
--------------

For ASDF to be aware of the converters, the package's
setup.cfg file must declare an entry point for ASDF.
Entry points are a Python packaging mechanism that makes
plug-ins easy for their users to install and use.
They also remove the need for the core package to be continually
updated to know about each new extension package.

This information is provided in the converter package's
setup.cfg (it could be in
setup.py, but the .cfg file is the usual place to put this
information). What happens when the package is installed is
that information about entry points is saved by Python. Python
provides an API to the core package for it to discover what 
entry points have been designated for that package so that it can
make use of them.
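
The discovery step can be tried directly with the standard library. The sketch below assumes Python 3.8 or later (for `importlib.metadata`); the helper name `find_entry_points` is made up for this example, and this is not necessarily how ASDF performs the lookup internally.

```python
from importlib.metadata import entry_points


def find_entry_points(group):
    """Return the installed entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later provide select()
        return list(eps.select(group=group))
    # Python 3.8 and 3.9 return a plain dict keyed by group name
    return list(eps.get(group, []))


# Lists whatever packages have registered under the group name that
# this tutorial uses in setup.cfg. The list is empty if none have.
for ep in find_entry_points("asdf.extensions"):
    print(ep.name, "->", ep.value)
```

Once the tutorial package is installed, an entry named `mfconverter` should appear in this output.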

In this example we will bundle the converters with the class
definitions that the converters will serialize. The class
definitions are not required to be in the same package as the
converters; they are here only to keep the tutorial simple.

We will name the converter package mfconverter (for My First
Converter), so we will create a package of that name in
the current directory. This tutorial provides the bare-bones
files needed for that, and the cells below generate
the bare-bones directory structure.

Since this tutorial is effectively editing files and needs
to move between different directories, it is more awkward than
usual for a notebook.

Required Software
----------------------

- numpy
- asdf v2.8 or higher

Let's begin
-------------

This will create a converter_tutorial directory in your home directory.
If you wish it to be elsewhere, make the appropriate changes to the
dependent commands, but if there is no strong reason to change it,
leave it be.

In [None]:
mkdir ~/converter_tutorial

In [None]:
cd ~/converter_tutorial

In [None]:
# Record current directory for restarts
import os
curdir = os.getcwd()
print(curdir)

In [None]:
mkdir mfconverter

In [None]:
cd mfconverter

In [None]:
mkdir src

In [None]:
mkdir src/mfconverter

Now we will create a module that holds a very simple photo ID class, and add a package `__init__.py`.

In [None]:
%%writefile src/mfconverter/photo_id.py
class PhotoID:
    "Holds Photo ID information"

    def __init__(self, last_name, first_name, photo):
        "expects a monochromatic numpy array for photo"
        self.last_name = last_name
        self.first_name = first_name
        self.photo = photo
    
    def name(self):
        return self.last_name + ', ' + self.first_name

In [None]:
%%writefile src/mfconverter/__init__.py



Next we create the file that contains the converter code.

In [None]:
%%writefile src/mfconverter/converter.py
from asdf.extension import Converter

class PhotoIDConverter(Converter):
    tags = ["asdf://stsci.edu/example-project/tags/photo_id-*"]
    types = ["mfconverter.photo_id.PhotoID"]
    # The above registers the tag that the converter is used for, as well as
    # associating the class that the converter is used for.
    
    # This method converts from the Python object to yaml
    def to_yaml_tree(self, obj, tag, ctx):
        # The yaml conversion expects a dictionary returned
        node = {}
        node['first_name'] = obj.first_name
        node['last_name'] = obj.last_name
        node['photo'] = obj.photo
        return node
    
    # This method converts from yaml to the Python object
    def from_yaml_tree(self, node, tag, ctx):
        from .photo_id import PhotoID # Deferred import to avoid always importing 
                                      # when ASDF gets entry points.
        return PhotoID(node['last_name'],
                       node['first_name'],
                       node['photo'])

In the above example, the class attribute `types` is a list since it is
possible for multiple types to use the same converter. It is also
possible to supply the type either as a string (as done above) or as the
actual class itself. The former is preferred for performance reasons, as
using the actual class itself forces the ASDF package to import the
module containing the class, even if it is never used in the program
using ASDF.
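
To see why a string is cheap, consider one way a type given as a string could be matched against an object: by comparing it with the fully qualified name of the object's class. This is only a sketch of the idea, not a description of ASDF's internals; `qualified_name` and the `registered` dictionary are hypothetical, and `fractions.Fraction` is used simply because it is in the standard library.

```python
from fractions import Fraction


def qualified_name(obj):
    """Return 'module.ClassName' for the class of obj."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


# The registry side only holds strings, so building it does not
# require importing any of the classes it names.
registered = {"fractions.Fraction": "a converter would be stored here"}

# The lookup happens only when an object of that class actually shows
# up, by which point the program itself has imported the module.
print(qualified_name(Fraction(1, 3)) in registered)
```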

The wildcard in the tag entry indicates that this converter will work
with all versions of the tag. It isn't strictly needed here, but it is generally
good practice if one wants the converter code to handle multiple versions.
(How to handle versioning is a topic in its own right and is not covered here.)
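
The effect of the wildcard can be illustrated with the standard library's `fnmatchcase`. This is an assumption-laden stand-in: ASDF applies its own URI matching rules, which may differ in detail, and the `1.1.0` and `passport` tags below are hypothetical examples that do not exist in this tutorial.

```python
from fnmatch import fnmatchcase

# fnmatchcase is only a stand-in here; ASDF uses its own URI matching.
pattern = "asdf://stsci.edu/example-project/tags/photo_id-*"

candidates = [
    "asdf://stsci.edu/example-project/tags/photo_id-1.0.0",
    "asdf://stsci.edu/example-project/tags/photo_id-1.1.0",  # hypothetical later version
    "asdf://stsci.edu/example-project/tags/passport-1.0.0",  # hypothetical other tag
]

# Keep only the tags that the wildcard pattern covers.
matches = [tag for tag in candidates if fnmatchcase(tag, pattern)]
for tag in matches:
    print(tag)
```

Both `photo_id` versions match and the unrelated tag does not, which is the behaviour the wildcard is meant to provide.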

We need to create a module for the entry point handling.

In [None]:
%%writefile src/mfconverter/extensions.py
from asdf.extension import Extension
from .converter import PhotoIDConverter

class MFExtension(Extension):
    extension_uri = "asdf://stsci.edu/example-project/photo_id-1.0.0"
    tags = ["asdf://stsci.edu/example-project/tags/photo_id-1.0.0"]
    converters = [PhotoIDConverter()]

# The following will be called by ASDF when looking for ASDF entry points    
def get_extensions():
    return [MFExtension()]

And finally, the entry point reference in setup.cfg, provided here as a whole file.

In [None]:
%%writefile setup.cfg
[metadata]
name = mfconverter
description = TODO
long_description = TODO
author = TODO
version = 0.1.0
license = BSD-3-Clause

[options]
zip_safe = True
python_requires = >=3.6
setup_requires =
    setuptools_scm
install_requires =
    jsonschema>=3.0.2
    asdf>=2.8
    psutil>=5.7.2
    numpy>=1.16
package_dir =
    =src
packages = find:

[options.entry_points]
asdf.extensions =
    mfconverter = mfconverter.extensions:get_extensions

[options.packages.find]
where = src


If you wish to learn more about entry points, see:
https://packaging.python.org/guides/creating-and-discovering-plugins/
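
The entry point line packs three pieces of information into one string: a name, a module path, and a function within that module. The sketch below pulls them apart with the standard library's `configparser`, using a copy of just the relevant section from the setup.cfg above. It is meant only to show how the line reads, not how setuptools or ASDF process it.

```python
import configparser

# A copy of just the entry point section from the setup.cfg above.
SETUP_CFG = """
[options.entry_points]
asdf.extensions =
    mfconverter = mfconverter.extensions:get_extensions
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

declared = {}
raw = parser["options.entry_points"]["asdf.extensions"]
for line in raw.strip().splitlines():
    # Each line has the form "name = module.path:function"
    name, _, target = line.partition("=")
    module, _, function = target.strip().partition(":")
    declared[name.strip()] = (module, function)

print(declared)
```

The module and function recovered here are exactly the `extensions.py` module and the `get_extensions` function written earlier.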

We need the setup.py file too.

In [None]:
%%writefile setup.py
#!/usr/bin/env python3
from setuptools import setup

setup()

Install package
------------------

This is best done from a terminal window using the command

`pip install --editable .`

This notebook runs the same command in the next cell. One consequence of installing from the notebook is that whenever the package has to be reinstalled, the Jupyter kernel must be restarted to pick up the new installation.

In [None]:
!pip install --editable .

Now restart the kernel (you don't need to understand this cell).

In [None]:
from IPython.display import display_html
def restartkernel():
    display_html("<script>Jupyter.notebook.kernel.restart()</script>", raw=True)
restartkernel()
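
Once the kernel is back, a quick way to confirm that a package is visible to Python is to ask the import system for its spec. The helper name `is_installed` is made up for this sketch; `importlib.util.find_spec` is part of the standard library.

```python
from importlib.util import find_spec


def is_installed(package_name):
    """Return True if Python can locate the top-level package."""
    return find_spec(package_name) is not None


# Expected to print True only after the pip install above succeeded
# and the kernel has been restarted.
print(is_installed("mfconverter"))
```

If this prints `False`, the installation did not take effect in the running kernel, and the cells that follow will not find the converter.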

In [None]:
import asdf
af = asdf.AsdfFile()

Before trying a full test, it is good to see if asdf knows about this converter.
Since the first thing that one will do is create an ASDF file with this object,
does ASDF know about this type?

In [None]:
af.extension_manager._converters_by_type

Depending on how many ASDF extensions you already have installed,
the output may be fairly long. If your new converter is known to ASDF,
you should see it in this list. If it is not in this list, the likely
cause is a mismatch between the tag URIs: the tag pattern in
converter.py must match the tag listed in extensions.py character for
character, including the separator that follows stsci.edu.

After correcting any mismatch, reinstalling the package and restarting the kernel, you should see our converter in the list associated with the class.

We will test making an ASDF file, instructing ASDF to write the
array inline instead of as binary, to keep the resulting file all text.

In [None]:
import asdf
import numpy as np
import mfconverter.photo_id as pid
image = np.zeros((10,10), dtype=np.byte)
p = pid.PhotoID('man', 'invisible', image)
af = asdf.AsdfFile()
af.tree = {'id': p}
af.write_to('test1.asdf', auto_inline=200)

Let's look at the file contents.

In [None]:
!cat test1.asdf

One can see that under the `id` attribute there are three components:
first and last name, and a small image representing a black and white photo.

Next we read in the file to see if it comes back as we expect (i.e., as the
same type of object it was originally, with the values it was created
with).

In [None]:
af2 = asdf.open('test1.asdf')
p2 = af2.tree['id']
print(type(p2))
print(p2.name())
print(p2.photo)