# How to train a model on Azure ML

This notebook takes you through the steps of training a model on Azure ML for The Ocean Cleanup. We train models through Azure ML so that every training run is recorded, which lets us see later why and how a model was created.

When the result of a training run is satisfactory, the model can be registered directly from that run, after which it can be deployed.

There are a few concepts to know about first:

- Workspace: The entire Azure ML environment you are working in. The Workspace contains all the other elements.
- Experiment: A collection of Runs (see below). It is a logical container for training a model with different parameters to determine which works best.
- Run: A single train/test run of a model. Each Run is tied to an Experiment. If you want to train the same model with different parameters so that you can compare them, these are different Runs under the same Experiment.
- Environment: The environment your code runs in, including things like the required Python packages. Multiple options exist here, from using your local environment to fully curated environments provided by Azure.
- Dataset: A single dataset as registered in the Azure ML workspace.
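
To make the relationship between an Experiment and its Runs concrete, here is a minimal sketch of how a set of candidate parameter values expands into one Run per combination. The helper names `parameter_grid` and `submit_all`, the `submit` callable, and the parameter names and values are illustrative assumptions. They are not part of `toc_azurewrapper` or the Azure ML SDK.

```python
from itertools import product


def parameter_grid(options):
    """Yield one parameter dict per combination of the given option lists."""
    names = sorted(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def submit_all(submit, options):
    """Start one Run per parameter combination.

    `submit` is a hypothetical hook: any callable that takes a parameter
    dict and starts a Run in the Experiment. Returns (parameters, run) pairs.
    """
    return [(params, submit(params)) for params in parameter_grid(options)]


# Hypothetical candidate values; each combination would become its own Run.
grid = list(parameter_grid({"param_a": [1, 10], "param_b": [0.1, 0.3]}))
print(len(grid))  # one dict per combination (2 x 2 = 4 here)
```

Keeping all combinations under a single Experiment is what makes the resulting Runs directly comparable.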

With that out of the way, let's dive right in. Looking at these components, our first step will be to get the correct Workspace:

In [1]:
from toc_azurewrapper.workspace import get_workspace

subscription_id = "a00eaec6-b320-4e7c-ae61-60a30aec1cfc"
resource_group = "MachineLearning"
workspace_name = "RiverImageAnalysis"
tenant_id = "86f9fea7-9eb0-4325-8b58-7ed0db623956"

workspace = get_workspace(subscription_id, resource_group, workspace_name, tenant_id=tenant_id)

## Create experiment

Now that we have a workspace available, we need to create an experiment. As described above, an experiment is the container for multiple runs, in which we can train and compare the model using different parameters.

The experiment needs a name. Choose one that is descriptive and clear to anyone who sees it.

In [2]:
from toc_azurewrapper.train import create_experiment
experiment = create_experiment(workspace, "model-A-v-1-1")

## Create the environment

We now need to create an environment. In this case we create it from a pip requirements file, but it can also be created from a conda specification or an existing conda environment. We set `override=True` to ensure any changes are applied. Note that we could also have loaded an existing environment, including one of the curated environments from Azure.

In [3]:
from toc_azurewrapper.environment import get_environment
environment = get_environment(
    workspace,
    "howto_model_deployment_environment",
    pip_requirements="examples/example_model/requirements.txt",
    override=True
)

## Prepare model wrapper

Now it's time to perform our first Run of the experiment. However, before we can do this, we will need a wrapper around our model. This wrapper needs to do a few things:

- Initialize and train the model with:
  - The desired parameters
  - The desired data
- Evaluate the performance of the trained model
- Register the parameters and the performance in the Run object
- Add the generated model artifacts to the Run object

Skeleton code for this is available in `skeleton_files/train.py`. In this file you fill in which parameters you expect, create, train and evaluate the model using these parameters and the loaded dataset(s), and register the results and the created artifacts with the Run.
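
The contents of `skeleton_files/train.py` are not reproduced in this notebook. The following is only a rough sketch of the four responsibilities listed above. It assumes the Run object exposes a `log(name, value)` method, as the Azure ML `Run` class does, and that artifacts are written to an `outputs` folder. `RecordingRun`, `train_and_log`, the placeholder model and the toy labels are all hypothetical and exist only so the sketch can be run locally.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path


class RecordingRun:
    """Local stand-in for the Run object; it only records what gets logged."""

    def __init__(self):
        self.metrics = {}

    def log(self, name, value):
        self.metrics[name] = value


def train_and_log(run, parameters, train_labels, test_labels, output_dir="outputs"):
    # 1. Register the parameters with the Run.
    for name, value in parameters.items():
        run.log(name, value)

    # 2. "Train": this placeholder model just remembers the most common label.
    model = {"label": Counter(train_labels).most_common(1)[0][0]}

    # 3. Evaluate on the test labels and register the result.
    correct = sum(1 for label in test_labels if label == model["label"])
    run.log("accuracy", correct / len(test_labels))

    # 4. Write the model artifact to the output folder.
    model_path = Path(output_dir) / "model.pkl"
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as handle:
        pickle.dump(model, handle)
    return model_path


# Toy labels, written to a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_run = RecordingRun()
    train_and_log(demo_run, {"param_a": 1, "param_b": 0.1},
                  ["plastic", "plastic", "wood"], ["plastic", "wood"], output_dir=tmp)
    print(demo_run.metrics)
```

Inside a real training script, the stand-in would be replaced by the Run that Azure ML provides to the script (for example via `Run.get_context()` from `azureml.core`).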

For this how-to, we will use the example provided in `examples/example_model/train.py`. This is an implementation of the file mentioned above. It expects two parameters: `param_a` and `param_b`. The model is a dummy: It picks a label from all the encountered labels based on the length of the filepath and some calculations from the provided parameters.
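
To give a feel for what such a dummy model might look like, here is a minimal sketch. The exact calculation used in `examples/example_model/train.py` is not shown in this notebook, so the formula below (filepath length scaled by both parameters, modulo the number of labels) and the class name `DummyModel` are assumptions made purely for illustration.

```python
class DummyModel:
    """Toy classifier: the 'prediction' depends only on the filepath length."""

    def __init__(self, param_a, param_b):
        self.param_a = param_a
        self.param_b = param_b
        self.labels = []

    def fit(self, labels):
        # "Training" only collects the distinct labels that were encountered.
        self.labels = sorted(set(labels))
        return self

    def predict(self, filepath):
        # Assumed formula: scale the path length by both parameters and
        # use the result as an index into the known labels.
        index = int(len(filepath) * self.param_a * self.param_b) % len(self.labels)
        return self.labels[index]


model = DummyModel(param_a=1, param_b=0.1).fit(["wood", "plastic", "wood"])
print(model.predict("images/example_0001.jpg"))
```

A model like this learns nothing useful, but it is deterministic and cheap, which makes it convenient for exercising the training pipeline end to end.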

## Run the experiment

Now we need to create and run the experiment. First, we fetch the desired datasets and combine them into training and test sets. Note that we can provide multiple sets for both training and testing, and that each set consists of both a label dataset and an image dataset.

In [4]:
from azureml.core import Dataset

train_images = Dataset.get_by_name(workspace, name="campaign-26-10-2020_images")
train_labels = Dataset.get_by_name(workspace, name="campaign-26-10-2020_labels")
test_images = Dataset.get_by_name(workspace, name="campaign-22-10-2020_images")
test_labels = Dataset.get_by_name(workspace, name="campaign-22-10-2020_labels")
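# Each set is a (labels, images) pair; several pairs can be combined.
# Note: the second pair below is also the test set. That is fine for this
# dummy example, but it would leak test data into training for a real model.
trainsets = [
    (train_labels, train_images),
    (test_labels, test_images)
]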
testsets = [
    (test_labels, test_images)
]

We now have everything we need to perform the run locally. Let's do so, choosing 1 for `param_a` and 0.1 for `param_b`:

In [5]:
from toc_azurewrapper.train import perform_run


run = perform_run(experiment, 'train.py', 'examples/example_model', environment=environment,
                  trainsets=trainsets, testsets=testsets, parameters={'param_a': 1, 'param_b': 0.1})
run.wait_for_completion(show_output=True)

RunId: model-A-v-1-1_1605023805_a82535e7
Web View: https://ml.azure.com/experiments/model-A-v-1-1/runs/model-A-v-1-1_1605023805_a82535e7?wsid=/subscriptions/29d66431-a7ce-4709-93f7-3bdb01a243b3/resourcegroups/ExperimentationJayke/workspaces/ExperimentationJayke

Streaming azureml-logs/60_control_log.txt

[2020-11-10T15:56:46.710376] Using urllib.request Python 3.0 or later
Streaming log file azureml-logs/60_control_log.txt
Starting the daemon thread to refresh tokens in background for process with pid = 11453
Running: ['/bin/bash', '/tmp/azureml_runs/model-A-v-1-1_1605023805_a82535e7/azureml-environment-setup/conda_env_checker.sh']
Materialized conda environment not found on target: /home/jayke/.azureml/envs/azureml_2ae71643b30d234dc8dfd07ee35de96c


Logging experiment preparation status in history service.
Running: ['/bin/bash', '/tmp/azureml_runs/model-A-v-1-1_1605023805_a82535e7/azureml-environment-setup/conda_env_builder.sh']
Running: ['conda', '--version']
conda 4.8.4

Creating co

{'runId': 'model-A-v-1-1_1605023805_a82535e7',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2020-11-10T15:59:58.938582Z',
 'endTimeUtc': '2020-11-10T16:02:15.424058Z',
 'properties': {'_azureml.ComputeTargetType': 'local',
  'ContentSnapshotId': '6299703a-f309-47dc-b76d-1295b204a69a',
  'azureml.git.repository_uri': 'git@github.com:TheOceanCleanup/AIDataPipeLine.git',
  'mlflow.source.git.repoURL': 'git@github.com:TheOceanCleanup/AIDataPipeLine.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': 'b58905affe064302dee06388a902a8dfcacb808d',
  'mlflow.source.git.commit': 'b58905affe064302dee06388a902a8dfcacb808d',
  'azureml.git.dirty': 'True'},
 'inputDatasets': [{'dataset': {'id': '49fac13f-e0d8-4622-a4ac-f3634ec5e35b'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'test_images_0', 'mechanism': 'Mount'}}, {'dataset': {'id': 'bc619fbb-e949-42f5-9259-41301434c740'}, 'consumptionDetails': {'type': 'RunInput', 'inpu

Finally, if the result is to our liking, we register the model, which makes it available for deployment. Provide a name for the model, the path to either a single artifact or the folder containing all required artifacts, and optionally a description, properties and tags.

In [6]:
run.register_model(
    "example_model",
    model_path="outputs/model.pkl",
    description="Example model implementation",
    properties={
        "location": "somewhere",
        "time": "night"
    }
)

Model(workspace=Workspace.create(name='ExperimentationJayke', subscription_id='29d66431-a7ce-4709-93f7-3bdb01a243b3', resource_group='ExperimentationJayke'), name=example_model, id=example_model:1, version=1, tags={}, properties={'location': 'somewhere', 'time': 'night'})
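
Whether a result is "to our liking" can also be checked in code before registering. The sketch below assumes the training script logged a scalar metric under a name such as `accuracy` (a hypothetical name, as is the threshold) and relies on the `get_metrics()` and `register_model()` methods of the Azure ML `Run` class. `register_if_good` and `FakeRun` are illustrative helpers, not part of `toc_azurewrapper`.

```python
def register_if_good(run, metric_name, threshold, model_name, model_path, **register_kwargs):
    """Register the model only when the logged metric reaches the threshold."""
    value = run.get_metrics().get(metric_name)
    if value is None or value < threshold:
        return None  # metric missing or too low: do not register
    return run.register_model(model_name, model_path=model_path, **register_kwargs)


class FakeRun:
    """Local stand-in so the helper can be tried without a workspace."""

    def __init__(self, metrics):
        self._metrics = metrics

    def get_metrics(self):
        return self._metrics

    def register_model(self, model_name, model_path=None, **kwargs):
        return {"name": model_name, "path": model_path, **kwargs}


good = register_if_good(FakeRun({"accuracy": 0.9}), "accuracy", 0.8,
                        "example_model", "outputs/model.pkl")
bad = register_if_good(FakeRun({"accuracy": 0.5}), "accuracy", 0.8,
                       "example_model", "outputs/model.pkl")
print(good is not None, bad is None)
```

With a real Run, the same call would pass the `run` returned by `perform_run` instead of `FakeRun`. Note that a metric logged more than once under the same name may come back as a list rather than a scalar, which this sketch does not handle.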