# Tutorial

## Before we start

Please make sure you have created the file `~/.artifactory_python.cfg` with your Artifactory credentials:

``` cfg
[artifactory.audeering.com/artifactory]
username = MY_USERNAME
password = MY_API_KEY
```
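To check that the file is well-formed before running the tutorial, it can be parsed with Python's standard `configparser` module. This is only a minimal sketch: the helper `read_credentials` is hypothetical and not part of `audmodel` or `audfactory`, and the example parses an in-memory string with the placeholder values from above rather than your real file.

``` python
import configparser

# Same content as the example file above (placeholder credentials)
EXAMPLE_CFG = """
[artifactory.audeering.com/artifactory]
username = MY_USERNAME
password = MY_API_KEY
"""


def read_credentials(text, section):
    """Return (username, password) stored under `section` (hypothetical helper)."""
    # Disable interpolation so that a '%' inside an API key is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(section):
        raise KeyError('missing section [{}]'.format(section))
    return parser[section]['username'], parser[section]['password']


username, password = read_credentials(
    EXAMPLE_CFG,
    'artifactory.audeering.com/artifactory',
)
print(username)
```

To inspect the real file instead, the text could be read from `os.path.expanduser('~/.artifactory_python.cfg')` and passed to the same helper.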

## Introduction

* We want to **publish** models on [Artifactory](https://artifactory.audeering.com)
* We want to **load** models from [Artifactory](https://artifactory.audeering.com)
* We want to tag models with **properties** (e.g. the sampling rate they were trained on)

The only assumption we currently make is that a model consists of one or more files that are stored in some folder on your local disk, e.g.:

```
|-- <root>/
    |-- file.yaml
    |-- file.txt
    |-- bin/
        |-- another-file.pkl     
```

In addition, a (possibly empty) dictionary holding the properties needs to be passed, e.g.:

``` python
params = {
    'task': 'anger',
    'rate': 8000,
}
```

If we now publish the model (e.g. as version `1.0.0`), the following will happen:

1. A random `<id>` is created.
2. The model folder is zipped and published as artifact `<id>-<version>.zip`.
3. A row is added to the lookup table with `<id>` as index and `params` as values.

And if we later download the model:

1. The `<id>` for the requested `params` is resolved from the lookup table.
2. The artifact `<id>-<version>.zip` is downloaded.
3. The archive is unpacked to the local model cache folder.


![workflow](pics/workflow.dot.svg)
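The publish and download steps listed above can be sketched with the standard library alone. This is an illustration under simplifying assumptions, not the actual `audmodel` implementation: a local folder `repo` stands in for Artifactory, a single in-memory dictionary `lookup` stands in for the lookup table, and the functions `publish` and `load` are hypothetical.

``` python
import os
import tempfile
import uuid
import zipfile

lookup = {}  # hypothetical lookup table: id -> params


def publish(root, params, version, repo):
    """Zip `root` into `repo` and register `params` under a random id."""
    uid = str(uuid.uuid4())  # 1. create a random id
    archive = os.path.join(repo, '{}-{}.zip'.format(uid, version))
    with zipfile.ZipFile(archive, 'w') as zf:  # 2. zip the model folder
        for folder, _, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                zf.write(path, os.path.relpath(path, root))
    lookup[uid] = dict(params)  # 3. add a row to the lookup table
    return uid


def load(params, version, repo, cache):
    """Resolve the id for `params` and unpack the archive below `cache`."""
    # 1. resolve the id (raises StopIteration if no row matches)
    uid = next(u for u, p in lookup.items() if p == params)
    archive = os.path.join(repo, '{}-{}.zip'.format(uid, version))  # 2. "download"
    target = os.path.join(cache, uid)
    with zipfile.ZipFile(archive) as zf:  # 3. unpack to the cache folder
        zf.extractall(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, 'model')
    repo = os.path.join(tmp, 'repo')
    cache = os.path.join(tmp, 'cache')
    os.makedirs(os.path.join(root, 'bin'))
    os.makedirs(repo)
    # create two empty files that play the role of the model
    for name in ['file.yaml', os.path.join('bin', 'another-file.pkl')]:
        open(os.path.join(root, name), 'w').close()
    params = {'task': 'anger', 'rate': 8000}
    uid = publish(root, params, '1.0.0', repo)
    target = load(params, '1.0.0', repo, cache)
    # collect the unpacked files relative to the cache folder
    unpacked = sorted(
        os.path.relpath(os.path.join(folder, name), target)
        for folder, _, files in os.walk(target)
        for name in files
    )
print(unpacked)
```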

## Usage

Some imports and helper functions, and we're ready to go...

In [None]:
import os
import shutil
import glob
import uuid
import audeer
import audfactory
import audmodel

# create a unique group id so that we do not interfere
# with another notebook running in parallel
audmodel.config.GROUP_ID += '.audmodel.' + str(uuid.uuid1())

def create_model(name, files):
    root = os.path.join(os.getcwd(), 'models', name)
    audeer.mkdir(root)
    for file in files:
        path = os.path.join(root, file)
        audeer.mkdir(os.path.dirname(path))
        with open(path, 'w'):
            pass
    return root

def show_model(path):
    path = audeer.safe_path(path)
    for root, dirs, files in os.walk(path):
        level = root.replace(path, '').count(os.sep)
        indent = ' ' * 4 * (level)
        print('{}{}/'.format(indent, os.path.basename(root)))
        subindent = ' ' * 4 * (level + 1)
        for f in files:
            print('{}{}'.format(subindent, f))

### Publish a model

Let's create a test model first (some empty files will do)...

In [None]:
files = ['meta.yaml', 'network.txt', 'bin/mlp-weights.pkl']
root_mlp = create_model('mymodel-mlp', files)
show_model(root_mlp)

And define the properties...

In [None]:
params_mlp = {
    'task': 'anger',
    'sampling_rate': 16000,
    'network': 'mlp',
}

Ready to release `1.0.0`? Let's go...

In [None]:
uid = audmodel.publish(
    root=root_mlp, 
    name='mymodel',
    params=params_mlp, 
    version='1.0.0',
)
uid

If the operation was successful, we get the model's unique id. We can use it to check that two artifacts were actually created: a CSV file containing the lookup table and a ZIP file containing our model...

In [None]:
url = audmodel.get_model_url(
    name='mymodel',
    uid=uid,
)
path = audfactory.artifactory_path(url).parent.parent.parent
for p in path.glob("**/*"):
    if p.is_file():
        print(p)

**Important**: There is only one lookup table per version, and it supports exactly one set of parameter names. Publishing under the same version with a different set of parameters will therefore not work...

In [None]:
try:
    audmodel.publish(
        root=root_mlp, 
        name='mymodel',
        params={
            'same': 'version',
            'different': 'parameters',
        },
        version='1.0.0',
    )
except RuntimeError as ex:
    print(ex)
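The restriction can be pictured as a comparison between the parameter names of the new model and the columns of the existing lookup table. The following is a minimal sketch of that idea; the helper `check_params` and its error message are hypothetical, and the check performed by `audmodel` may differ in detail.

``` python
def check_params(columns, params):
    """Raise RuntimeError unless `params` uses exactly the names in `columns`."""
    if set(params) != set(columns):
        raise RuntimeError(
            'expected parameters {}, got {}'.format(
                sorted(columns),
                sorted(params),
            )
        )


# columns of the (hypothetical) lookup table for version 1.0.0
columns = ['task', 'sampling_rate', 'network']

# same names, different values: accepted
check_params(
    columns,
    {'task': 'anger', 'sampling_rate': 16000, 'network': 'lstm'},
)

# different names: rejected
try:
    check_params(columns, {'same': 'version', 'different': 'parameters'})
except RuntimeError as ex:
    message = str(ex)
    print(message)
```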

### Load a model

Loading the model is just as simple...

In [None]:
root = audmodel.load(
    name='mymodel',
    params=params_mlp,
    version='1.0.0',    
)
root

On success we get the root folder where the model was unpacked. By default, models are unpacked to the default model cache directory, which can be checked by...

In [None]:
audmodel.get_default_cache_root()

Note that the default cache location (`~/audmodel`) can be overridden with the environment variable `AUDMODEL_CACHE_ROOT`, or for a single call by passing a non-empty `root` argument to `load()`. Within the cache, the model is placed in a unique sub-folder, namely `<name>/<version>/<uid>`. As the name *cache* implies, a model is only downloaded if it does not yet exist in the cache (unless you call `load()` with `force=True`).
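The rules for the cache location can be summarized in a few lines of Python. This is a sketch, not the library's code: the helpers `resolve_cache_dir` and `needs_download` are hypothetical, and the precedence of an explicit `root` over `AUDMODEL_CACHE_ROOT` is an assumption.

``` python
import os


def resolve_cache_dir(name, version, uid, root=None):
    """Return the folder a model would be unpacked to (hypothetical helper)."""
    # assumed precedence: explicit argument, environment variable, default
    root = (
        root
        or os.environ.get('AUDMODEL_CACHE_ROOT')
        or '~/audmodel'
    )
    return os.path.join(os.path.expanduser(root), name, version, uid)


def needs_download(path, force=False):
    """A model is fetched if it is not cached yet or if `force` is set."""
    return force or not os.path.exists(path)


path = resolve_cache_dir('mymodel', '1.0.0', 'some-uid')
print(path, needs_download(path))
```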

Now, let's check if everything worked out as expected...

In [None]:
show_model(root)

### Another flavor

Let's assume that your first approach using a standard *MLP* network wasn't very successful. Hence, you decide to train another model using *LSTMs*. Since it uses the same training data, you don't want to publish a new version, but rather another *flavor* of the model. This is exactly what properties are good for...

In [None]:
files = ['meta.yaml', 'network.txt', 'bin/lstm-weights.pkl']
root_lstm = create_model('mymodel-lstm', files)
params_lstm = {
    'task': 'anger',
    'sampling_rate': 16000,
    'network': 'lstm',
}
uid = audmodel.publish(
    root=root_lstm, 
    name='mymodel',
    params=params_lstm, 
    version='1.0.0',
)
uid

Already losing track of which models you have uploaded so far? The lookup table will tell you the truth...

In [None]:
df = audmodel.get_lookup_table(
    name='mymodel',
    version='1.0.0',
)
df

The lookup table is returned as a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) and lists model flavors with their ids. If you are only interested in the model id, you can also do...

In [None]:
uid = audmodel.get_model_id(
    name='mymodel', 
    params=params_lstm,
    version='1.0.0')
uid

Which offers another way to load a model...

In [None]:
root = audmodel.load_by_id(
    name='mymodel',
    uid=uid)
show_model(root)

### More parameters

After doing some analysis, you find out the model will improve if you normalize the audio data during training and add a little bit of white noise. You therefore introduce two new parameters: `normalize`, which is either `True` or `False`, and `noise_db`, which defines the decibel level at which noise is added (or `None` to omit)...

In [None]:
df = audmodel.extend_params(
    name='mymodel',
    version='1.0.0',
    new_params={
        'normalize': False,
        'noise_db': None,
    }
)
df

We see that the table now holds two additional columns. For the existing models, the new parameters are automatically set to the given default values. We can now add a new model trained on normalized audio with some noise...

In [None]:
params_lstm = {
    'task': 'anger',
    'sampling_rate': 16000,
    'network': 'lstm',
    'normalize': True,
    'noise_db': -10.0,
}
audmodel.publish(
    root=root_lstm, 
    name='mymodel',
    params=params_lstm, 
    version='1.0.0',
)
df = audmodel.get_lookup_table(
    name='mymodel',
    version='1.0.0',
)
df
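What happens to the lookup table in this section can be mimicked with plain dictionaries. The sketch below is illustrative only: the helper `add_columns` and the ids are hypothetical, and the real table is a `pandas.DataFrame` managed by `audmodel`.

``` python
def add_columns(table, new_params):
    """Add each new parameter to every row, filled with its default value."""
    for row in table.values():
        for key, default in new_params.items():
            # existing values are kept, missing ones get the default
            row.setdefault(key, default)
    return table


# hypothetical table before the extension: id -> params
table = {
    'id-mlp': {'task': 'anger', 'sampling_rate': 16000, 'network': 'mlp'},
    'id-lstm': {'task': 'anger', 'sampling_rate': 16000, 'network': 'lstm'},
}

add_columns(table, {'normalize': False, 'noise_db': None})

# a new flavor that uses non-default values for the new parameters
table['id-lstm-noise'] = {
    'task': 'anger',
    'sampling_rate': 16000,
    'network': 'lstm',
    'normalize': True,
    'noise_db': -10.0,
}

for uid, row in table.items():
    print(uid, row)
```

Because the old rows receive the defaults, the two *LSTM* flavors remain distinguishable: they differ in `normalize` and `noise_db`.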

### A new version

Since the *LSTM* network gives promising results, you decide to retrain with more data and publish version `2.0.0`...

In [None]:
# do retraining
uid = audmodel.publish(
    root=root_lstm, 
    name='mymodel',
    params=params_lstm, 
    version='2.0.0',
)
uid

To load the model you can either explicitly ask for version `2.0.0` or just get the latest model...

In [None]:
root = audmodel.load(
    name='mymodel',
    params=params_lstm,
    version=None,  # load latest version, i.e. 2.0.0
)
root

This will also work for the old *MLP* model, as it will automatically get the latest version that matches the properties, which is still `1.0.0`...

In [None]:
params_mlp['normalize'] = False
params_mlp['noise_db'] = None
root = audmodel.load(
    name='mymodel',
    params=params_mlp,
    version=None,  # load latest version, i.e. 1.0.0
)
root
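The behavior of `version=None` can be thought of as picking the highest version whose lookup table contains the requested parameters. Here is a minimal sketch of that selection, assuming versions of the form `major.minor.patch`; the function `latest_version` and the `tables` structure are hypothetical and do not reflect how `audmodel` stores its data.

``` python
def latest_version(tables, params):
    """Return the highest version whose table contains `params`."""
    matches = [
        version
        for version, rows in tables.items()
        if params in rows
    ]
    if not matches:
        raise RuntimeError('no model matches the requested parameters')
    # compare versions numerically, so that 10.0.0 ranks above 2.0.0
    return max(matches, key=lambda v: tuple(int(x) for x in v.split('.')))


mlp = {
    'task': 'anger', 'sampling_rate': 16000, 'network': 'mlp',
    'normalize': False, 'noise_db': None,
}
lstm = {
    'task': 'anger', 'sampling_rate': 16000, 'network': 'lstm',
    'normalize': True, 'noise_db': -10.0,
}

# hypothetical lookup tables: version -> list of published parameter sets
tables = {
    '1.0.0': [mlp, lstm],
    '2.0.0': [lstm],
}

print(latest_version(tables, lstm))
print(latest_version(tables, mlp))
```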

### Clean up

Finally, clean up local files and Artifactory...

In [None]:
def cleanup():
    root = os.path.join(os.getcwd(), 'models')
    if os.path.exists(root):
        shutil.rmtree(root)
    path = audfactory.artifactory_path(
        audfactory.server_url(audmodel.config.GROUP_ID,
                              name='mymodel',
                              repository='models-public-local')).parent
    if path.exists():
        path.rmdir()
    
cleanup()