## Your First Metador Plugin

In this tutorial you will learn how Metador plugins are defined by creating a new Python package
that provides a simple custom metadata schema. We will keep it simple and focus on the general
aspects which apply to any kind of Metador plugin. In later tutorials, we will discuss
all the specifics for different kinds of plugins in much more depth.

**Prerequisites:**

* Working knowledge of the Linux shell (basic navigation and using CLI tools)
* Working knowledge of version control with `git` (creating and managing a repository)

**Learning Goals:**

* Learn how to create a new Python package that provides new Metador plugins
* Learn to define and register a new plugin providing a simple metadata schema

### Creating a New Python Package

Metador plugins use the standard `entrypoint` system which is widely supported
and used in various Python projects. In order to use this system and correctly register plugins,
you cannot just write some Python files, but must organize them as a proper **Python package**.

<div class="alert alert-block alert-info">
    If you already know how to create a Python package
    (i.e. something you can upload to PyPI and <tt>pip install</tt>)
    and know what an entrypoint is, feel free to create a fresh Python package in
    the way which is familiar or comfortable to you. In that case, adapt all general
    steps to your setup. If you do this, we cannot provide any assistance if you encounter problems.
    Therefore, if you are not sure, stick with the tutorial for reproducible results.
</div>


We recommend creating Python packages using a tool called `poetry`.
It is also the tool used for the development of `metador-core`.
If you have no experience with entrypoints and Python packages, don't worry - we will guide you through the
process of setting up a new package. 

<div class="alert alert-block alert-info">
    This is <b>not</b> a Linux shell, <tt>git</tt> or <tt>poetry</tt> tutorial!
    If you have general questions about these things, consult the corresponding documentation.
</div>

#### Install Poetry

First, make sure that you have `poetry` available. To install it, follow the 
steps in the [documentation](https://python-poetry.org/docs/)
(this is important - it is **not** supposed to be installed 
using `pip install`).  Check that poetry is installed by running 
`poetry --version`, which should reply with something like `Poetry version 1.1.14`.

#### Create the Package

<div class="alert alert-block alert-warning">
  Metador requires Python 3.8 or newer. Consider using pyenv to install a recent version if you do not have one available!
</div>
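
If you are unsure which version your interpreter has, you can check it from Python itself. The following is only a minimal sketch: the constant `MINIMUM_PYTHON` and the function name are our own choices for illustration and are not part of Metador.

```python
import sys

# Minimum version taken from the note above; the constant name is illustrative.
MINIMUM_PYTHON = (3, 8)


def python_is_recent_enough(version_info=sys.version_info):
    """Return True if the given interpreter version satisfies the minimum."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


print("Python", sys.version.split()[0], "is recent enough:", python_is_recent_enough())
```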

First, navigate on the command line to the directory where
you want to create your new Metador plugin package, e.g.:

```bash
cd ~/projects                 # <- replace with path where your project should live
poetry new my-metador-plugin
cd my-metador-plugin          # <- this directory was created by poetry for you
```

You should find yourself in your fresh project directory which already contains:

* a directory `my_metador_plugin` where your code will live
* a directory `tests` where you can place all the tests for your code (hopefully many!)
* an empty file `README.rst` (as a reminder that you should write a README)
* a `pyproject.toml` file that tracks all dependencies to other packages (**!!!**)

#### Put the directory under version control

We will not discuss this further, but at this point you
probably should run `git init` in the project directory, add a `.gitignore`
which is suitable for Python projects
(e.g. [this one](https://github.com/github/gitignore/blob/main/Python.gitignore))
and do the first commit. In the future, also make sure to include changes
to a file called `poetry.lock` in your commits (if there is one).
We assume that you are able to take care of proper version control for your project
and will not mention it anymore.

#### Add general information to `pyproject.toml`

Make sure that the automatic `authors` entry is correct and has an e-mail address 
that can be used to contact you.

Write a brief `description` of what this package will provide.

If you have already connected your local directory
to a remote public git repository hosting service, such as GitLab or GitHub, also
add the URL as `repository` next to the other fields under `[tool.poetry]`. This
information is used by Metador to help users get your plugin in case they need it.

If you have some time, feel free to also add other [package metadata](https://python-poetry.org/docs/pyproject/).

<div class="alert alert-block alert-warning">
    Please remember to keep this package metadata information up-to-date!
</div>

#### Enter the virtual environment

Run `poetry shell` to activate a project-specific virtual environment.
Poetry will create one for you, if it does not exist yet.

You can tell that you are in a virtual environment because your command prompt in the terminal will begin with something like `(my-metador-plugin-KQKVg0oX-py3.8)`. The names of virtual environments created by poetry always start with your project name and contain an automatic identifier (like `KQKVg0oX`).

In case you are used to running `activate` and `deactivate` for your virtual environments - these commands are not used together with poetry. Instead:
* Use `poetry shell` inside the project directory to activate (whenever you want to work on your package)
* Use `exit` anywhere to deactivate (e.g. when you want to switch to another project or some custom environment)
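
If the prompt is not conclusive, you can also ask the interpreter. This is a small sketch that relies on the standard library only: inside an environment created by `venv` or a recent `virtualenv`, `sys.prefix` differs from `sys.base_prefix`. The function name is our own.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points to the environment,
    # while sys.base_prefix still points to the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtual_environment())
print("interpreter prefix:", sys.prefix)
```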

#### Add `metador-core` as a dependency

Run the following command to add `metador-core` as a dependency to your project:

```bash
poetry add git+ssh://git@github.com:Materials-Data-Science-and-Informatics/metador-core.git
```

<div class="alert alert-block alert-warning">
    When metador-core is published on PyPI, you can simply add <tt>metador-core</tt> to your dependencies.
    <br />
    Until then, as an early adopter you install the current version from the <tt>main</tt> branch.
    <br />
    Make sure that your public SSH key is properly registered on GitHub to access the private repository.
</div>

If everything went smoothly (it can take a couple of minutes), then: 

* the output of the poetry command contains a line like: ` * Installing metador-core (...)`
* the `pyproject.toml` has a new entry for `metador-core` under `[tool.poetry.dependencies]`
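
As an additional check, you can ask Python whether the `metador_core` module can be found in the active environment. This sketch uses only the standard library; the helper name `is_importable` is hypothetical, and it is meant for top-level module names.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Run this inside the poetry shell of your project.
print("metador_core available:", is_importable("metador_core"))
```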

Now we are ready to get started with development!

### The Metador Plugin System

Before we go on and define our schema plugin, let us first take a closer look at the Metador plugin system.

#### Plugin Groups: Bags of Similar Plugins

Every Metador plugin belongs to a **plugin group**. Even *plugin groups themselves* are just plugins of a plugin group called `plugingroup`, and they can define new kinds of objects that can be provided as plugins (but this is an advanced topic that you most likely will never need to worry about). Probably the most important plugin group is the `schema` plugin group, as everything in Metador depends on schemas.

In general, if you want to define a new well-behaved plugin, but do not know the specifics of the relevant plugin group yet and no other documentation is provided, a good place to start is the documentation of the class where the plugin group itself is defined. It should contain all important information for writing a suitable plugin, that is, one that provides the **expected interface and behavior**.

The plugin system will try to check all plugins and validate them, to catch simple but common implementation mistakes. This includes things such as forgetting to implement a method or to set a required attribute. Nevertheless, you cannot rely on the automatic superficial checks. **You are responsible for the correctness of your plugins**, because most higher-level requirements for plugins cannot be checked automatically.

<div class="alert alert-block alert-warning">
    Strictly follow the requirements for the interface and behavior of plugins expected by a plugin group, and always write tests for your plugins.
</div>

#### Anatomy of a Plugin

Each plugin group defines its own requirements for what each plugin must provide, but there is a minimum of requirements shared by *all* plugins: they must provide a special inner class that defines at least the plugin name and version. So the common skeleton shared by all plugins looks like this:

```python
class MyNewPlugin:

    class Plugin:
        name = "my.newplugin"
        version = (0, 1, 0)
        # (... possibly other plugin group specific declarations ...)

    # (... all methods required by the plugin group ...)

    # (... implementation details of YOUR plugin ...)
```
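
To get a feeling for the kind of superficial mistake that can be caught automatically, here is a simplified sketch of such a check. It is an illustration only and **not** the validation that Metador actually performs; the function `missing_plugin_declarations` and the two example classes are hypothetical.

```python
def missing_plugin_declarations(cls):
    """List what a class lacks compared to the common plugin skeleton.

    Illustration only; this is not how Metador validates plugins.
    """
    info = getattr(cls, "Plugin", None)
    if info is None:
        return ["inner class 'Plugin'"]

    problems = []
    if not isinstance(getattr(info, "name", None), str):
        problems.append("Plugin.name (a string)")

    version = getattr(info, "version", None)
    is_triple = (
        isinstance(version, tuple)
        and len(version) == 3
        and all(isinstance(part, int) for part in version)
    )
    if not is_triple:
        problems.append("Plugin.version (a triple of integers)")
    return problems


class Complete:
    class Plugin:
        name = "my.newplugin"
        version = (0, 1, 0)


class Incomplete:
    class Plugin:
        name = "my.incomplete"  # the version is missing


print(missing_plugin_declarations(Complete))
print(missing_plugin_declarations(Incomplete))
```

A check like this can tell you that something is missing, but not whether your plugin behaves as the plugin group expects, which is why tests remain your responsibility.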

#### Plugin naming

The name of a plugin, aside from being meaningful, must satisfy two properties:

##### The plugin entrypoint must be equal to `name__x.y.z`, i.e. the name and version of the plugin

Luckily, this is one property that the plugin system will check for all plugins and warn you when you try to load them. Soon you will see how to declare the entrypoint for your plugin.
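
The expected entrypoint name can be derived mechanically from the two declarations in the inner `Plugin` class. The helper below is a sketch with a hypothetical name (`entrypoint_name`); it only illustrates the `name__x.y.z` convention and is not a function provided by Metador.

```python
def entrypoint_name(name, version):
    """Build the expected entrypoint name from a plugin name and version triple."""
    major, minor, patch = version
    return f"{name}__{major}.{minor}.{patch}"


class MyNewPlugin:
    class Plugin:
        name = "my.newplugin"
        version = (0, 1, 0)


print(entrypoint_name(MyNewPlugin.Plugin.name, MyNewPlugin.Plugin.version))
```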

##### The plugin name must be unique within its plugin group

This property cannot be checked automatically. To avoid problems, you should not use too general or too short names for your plugins. Otherwise, there is a risk that someone else will pick the same name for *their* plugin, which can lead to serious problems. To avoid or at least constrain this problem, you must add a "namespace prefix" to the names of all plugins that you define.

<div class="alert alert-block alert-warning">
    Pick a suitable namespace prefix that you consistently use for all plugins that you develop!
    <br />
    This means: All your plugins should have names of the form <tt>MYPREFIX.PLUGINNAME</tt>,
    where <tt>MYPREFIX</tt> is your chosen "namespace prefix".
</div>

The prefix that you pick should be something that other people will most likely not use, but short enough that people are not too annoyed by typing out the name of your plugin by hand (remember that metadata in containers is accessed based on the schema plugin name).

Suitable choices for a prefix are:

* your last name
* your username on GitHub
* the short name of your employer organization or institute

If you work for a larger organization, feel free to refine this namespacing approach to a suitable level that will prevent plugin name collisions, e.g. use plugin names such as `my-org.my-dept.my-plugin`.
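
If you maintain several plugins, a tiny helper can keep the prefix consistent. This is merely a sketch of the naming convention; the function names and the example prefix are made up for illustration.

```python
def with_prefix(prefix, plugin_name):
    """Return the plugin name qualified with a namespace prefix."""
    if not prefix or not plugin_name:
        raise ValueError("prefix and plugin name must both be non-empty")
    return f"{prefix}.{plugin_name}"


def names_without_prefix(names, prefix):
    """Return all names that do not start with the given namespace prefix."""
    return [name for name in names if not name.startswith(prefix + ".")]


MY_PREFIX = "my-org.my-dept"  # hypothetical prefix, pick your own

print(with_prefix(MY_PREFIX, "my-plugin"))
print(names_without_prefix(["my-org.my-dept.my-plugin", "schema"], MY_PREFIX))
```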

#### Plugin versioning

All plugins must respect [semantic versioning](https://semver.org/), so a version is a triple `MAJOR.MINOR.PATCH`. For different plugin groups this translates to slightly different requirements, which are usually explicitly spelled out for the specific context of a plugin group. In general, semantic versioning means:

* You increase `MAJOR` by 1 whenever other things could break if they update to the new version
* You increase `MINOR` by 1 whenever you added new features without breaking anything
* You increase `PATCH` by 1 whenever you fixed problems in your plugin without changing its intended behaviour

The initial version you pick for your plugin is not important, but for consistency we recommend setting the first version to `0.1.0` (written as `(0, 1, 0)` in the example above), which is common practice in most software projects.
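
The three rules can be expressed compactly on version triples as used in the `Plugin` class. The sketch below uses a hypothetical helper `bump`; note that, following the semantic versioning specification, the lower components are reset to zero when a higher one is increased.

```python
def bump(version, part):
    """Return the next version triple after a change of the given kind."""
    major, minor, patch = version
    if part == "major":  # other things could break when updating
        return (major + 1, 0, 0)
    if part == "minor":  # new features, nothing breaks
        return (major, minor + 1, 0)
    if part == "patch":  # fixes without changing intended behaviour
        return (major, minor, patch + 1)
    raise ValueError(f"unknown version part: {part!r}")


version = (0, 1, 0)
print(bump(version, "patch"))
print(bump(version, "minor"))
print(bump(version, "major"))
```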

### Defining the Schema Plugin

Now that we have gained insight into the general workings of the Metador plugin system, we are finally ready to get to work.

#### Write the Schema

In your project, create a new file `schemas.py` inside the `my_metador_plugin` directory (which currently only contains an `__init__.py` file) and add the following contents:

```python
"""Metador schema plugins provided by this package."""
from metador_core.schema import MetadataSchema
from metador_core.schema.types import Int, Str


class MyFirstSchema(MetadataSchema):

    class Plugin:
        name = "dummy.my-first-schema"
        version = (0, 1, 0)

    magic_number: Int
    some_text: Str = "(no text)"
```

You can see that the minimal schema plugin is very close to the general "plugin skeleton" we discussed above.
What makes this a schema plugin is that our plugin class is a subclass of `MetadataSchema` and the inner `Plugin` class is a subclass of `SchemaPlugin` (which itself is a subclass of `PluginBase` that you have seen above). These are the minimal requirements imposed by the schema plugin group.

Schemas are defined mostly by using Python type hints - if you are familiar with [dataclasses](https://docs.python.org/3/library/dataclasses.html), then think of schemas as *very fancy dataclasses*. In another tutorial we will take a deep dive into schema development, but for now it is enough to know that our schema expects a metadata object that requires a field called `magic_number`, which must be an integer value, and also supports an optional field `some_text`, which if provided will override the default value `"(no text)"`.
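
To make the dataclass analogy concrete, here is the same pair of fields written as a plain standard library dataclass. This is only an analogy and not a Metador schema: the class name `MyFirstRecord` is made up, and unlike a schema, a dataclass does not validate the types of its fields.

```python
from dataclasses import dataclass


@dataclass
class MyFirstRecord:
    magic_number: int              # required: no default value
    some_text: str = "(no text)"   # optional: falls back to the default


print(MyFirstRecord(magic_number=42))
print(MyFirstRecord(magic_number=1, some_text="hello"))

try:
    MyFirstRecord()  # the required field is missing
except TypeError as err:
    print("rejected:", err)
```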

#### Declare the Entrypoint

Now open your `pyproject.toml` file and define the entry point by adding these two lines (e.g. just before the `[build-system]` section):

```toml
[tool.poetry.plugins.metador_schema]
'dummy.my-first-schema__0.1.0' = "my_metador_plugin.schemas:MyFirstSchema"
```

The first line says that we want to declare a `schema` plugin (for plugin group `X`, the section would be `[tool.poetry.plugins.metador_X]`). 

The second line declares the entry point, with our plugin name on the left and the location of our plugin on the right.

The location string on the right corresponds to how the class is imported: `from my_metador_plugin.schemas import MyFirstSchema`.
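
The way such a `module:object` location string is resolved can be sketched with the standard library. The helper `load_object` is hypothetical and simplified; to keep the example runnable everywhere, it is demonstrated on the standard `json` module rather than on your package.

```python
import importlib
import json


def load_object(location):
    """Resolve a 'package.module:Object' location string to the object."""
    module_name, _, attribute_path = location.partition(":")
    obj = importlib.import_module(module_name)
    # The part after the colon may itself be dotted (e.g. "Outer.Inner").
    for attribute in attribute_path.split("."):
        obj = getattr(obj, attribute)
    return obj


# Equivalent to: from json import dumps
dumps = load_object("json:dumps")
print(dumps is json.dumps)
```

With your package installed, `load_object("my_metador_plugin.schemas:MyFirstSchema")` would resolve to the schema class in the same way.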

Finally, run `poetry install`, which will make poetry re-register your package and thus make the entrypoint known to the environment.
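
If you are curious, you can inspect which entrypoints the environment knows for a group by using the standard library module `importlib.metadata`. The sketch below is independent of Metador; the function name is our own, and the two branches account for the different return types of `entry_points()` across Python versions.

```python
from importlib import metadata  # available since Python 3.8


def entry_point_names(group):
    """Return the sorted names of all entrypoints registered in a group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and newer
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: a dict mapping group names to entrypoints
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


# After `poetry install`, the name declared above should be listed here.
print(entry_point_names("metador_schema"))
```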

#### We are done!

In order to see if everything worked, make sure that you are still inside `poetry shell` (remember that we said that you should work inside of it!) and try to import your schema in the `python3` interpreter (lines **A** and **B** below).

If you run this notebook in the same virtual environment which is used for the plugin package, you can just restart the notebook and evaluate the following cell:

```python
try:
    # run these two lines in your python interpreter:
    from metador_core.plugins import schemas           # A
    MyFirstSchema = schemas["dummy.my-first-schema"]   # B
    # ----

    print("Congratulations, your new plugin was registered correctly! :)")
except KeyError:
    print("Your plugin was not found :(")
```

```
Your plugin was not found :(
```


Assuming that you test in the regular `python3` shell:

* If line **A** fails, then you are probably not in the correct virtual environment, because the interpreter cannot find `metador_core`.
* If line **B** fails, then you either made a mistake with the entry point in the `pyproject.toml` file, or forgot to run `poetry install`.

You might wonder why we always access a schema through the plugin system, instead of simply importing it. One reason is that we wanted to verify that the plugin is registered correctly. Another reason is that
plugin code can be moved by developers within a package, or even into a completely different package. You should not need to know where a plugin comes from in order to use it; this is one of the main purposes of the Metador plugin system.

<div class="alert alert-block alert-warning">
When working with plugins that are not provided by yourself (and even then), <b>access them through the plugin system</b> instead of importing them directly, in order to avoid breakage of your code only because something was "moved" to a different location!
</div>

If you want, now you can try instantiating your schema in various ways to get a feeling for it:

```python
print(MyFirstSchema(magic_number=42))
print(MyFirstSchema(magic_number=23, nonsense=True))
print(MyFirstSchema(magic_number=1, some_text="hello"))
```

```
{
  "magic_number": 42,
  "some_text": "(no text)"
}
{
  "magic_number": 23,
  "some_text": "(no text)",
  "nonsense": true
}
{
  "magic_number": 1,
  "some_text": "hello"
}
```


As an exercise, you can create a `MetadorContainer` and attach an instance of your metadata schema to a node - your schema is on equal footing with the default schemas you have already seen, and it can be used in exactly the same ways. In a later tutorial, you will learn how to make use of schema inheritance and make your schema aligned with semantic standards.

### Summary

#### Python Packages

* Metador plugins must be provided by Python packages using the entrypoint system
* The easiest and most modern way to create a package is using the `poetry` tool
* Entrypoints are declared within the `pyproject.toml` file that contains all package metadata

#### Metador Plugins

* Each plugin belongs to a plugin group, which corresponds to an entry point group
* All plugins must have a unique name and possess a version tag that follows semantic versioning
* Plugins must satisfy additional requirements that depend on the respective plugin group