Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# Use ONNX Runtime training

This tutorial shows how to submit an ONNX Runtime (ORT) training job. Compared with a regular training job, the only additional requirement is the ORT training Docker image. The ITP team has published this image to the ITP container registry; please reach out to the ITP team if you need access.

## Prerequisites
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration notebook to install the Azure Machine Learning Python SDK and create a workspace.

Make sure the following packages are installed before you submit a training job to ITP compute.


In [None]:
# Install the CMAKS SDK into the notebook kernel's environment
%pip install --disable-pip-version-check --extra-index-url https://azuremlsdktestpypi.azureedge.net/CmAks-Compute-Test/D58E86006C65 azureml-pipeline-steps azureml-contrib-pipeline-steps azureml_contrib_itp --upgrade

## Initialize workspace

Create a workspace object from the existing workspace. A [Workspace](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) is a class that accepts your Azure subscription and resource information. It also creates a cloud resource to monitor and track your model runs. `Workspace.from_config()` reads the file **config.json** and loads the authentication details into an object named `ws`. `ws` is used throughout the rest of the code in this tutorial.

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')
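
If `Workspace.from_config()` fails, a common cause is an incomplete **config.json**. The following is a minimal sketch, using only the standard library, that checks the file for the three keys a workspace config normally carries (`subscription_id`, `resource_group`, `workspace_name`). The helper name `check_workspace_config` is hypothetical and not part of the Azure ML SDK.

```python
import json
from pathlib import Path

# Keys a workspace config.json is expected to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def check_workspace_config(path="config.json"):
    """Return the required keys that are missing or empty in the config file."""
    config = json.loads(Path(path).read_text())
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    # Assumes config.json sits in the current working directory.
    if Path("config.json").is_file():
        print("Missing keys:", check_workspace_config() or "none")
```

An empty list means all three keys are present; it does not confirm that the values point to a workspace you can access.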

## Create Experiment

**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments.

In [None]:
from azureml.core import Experiment

experiment_name = 'orttraining-cmaks'
experiment = Experiment(workspace=ws, name=experiment_name)

## Specify CMAKS compute target

List all of the ITP compute targets in your workspace.

In [None]:
from azureml.core.compute import ComputeTarget
from azureml.contrib.core.compute.cmakscompute import CmAksCompute
for target in ws.compute_targets.values():
    if isinstance(target, CmAksCompute):
        print('Found compute target:{}\ttype:{}\tprovisioning_state:{}\tlocation:{}'.format(target.name, target.type, target.provisioning_state, target.location))

Specify one of your CMAKS compute targets.

In [None]:
# Specify your CMAKS compute target; replace the placeholder with its name
compute_name = "<compute_name>"
compute_target = ComputeTarget(workspace=ws, name=compute_name)

## Training on CMAKS compute

### Create project directory

Create a directory that will contain all the necessary code that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.


In [None]:
import os
import shutil

project_folder = './orttraining'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('mnist_training.py', project_folder)
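
When the training script depends on additional files, it can help to verify that every file exists before copying, so that a missing file surfaces locally rather than on the remote compute. The sketch below shows one way to do this; `stage_files` is a hypothetical helper, not part of the Azure ML SDK.

```python
import os
import shutil


def stage_files(files, project_folder):
    """Copy each file into project_folder and return the destination paths."""
    # Fail early if any source file is missing.
    missing = [f for f in files if not os.path.isfile(f)]
    if missing:
        raise FileNotFoundError("Missing source files: " + ", ".join(missing))
    os.makedirs(project_folder, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return [shutil.copy(f, project_folder) for f in files]
```

For this tutorial the call would be `stage_files(['mnist_training.py'], project_folder)`.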

### Create Estimator

To host an ORT training job, use `ContainerRegistry` to point to your custom ORT training Docker image. If you don't have that image, please reach out to the ITP team to get access to the ITP container registry.


In [None]:
from azureml.core.container_registry import ContainerRegistry
from azureml.train.estimator import Estimator

container_registry = ContainerRegistry()

# Replace with your own container registry information
container_registry.address = "philly2aml.azurecr.io"
container_registry.username = "6b5ec445-f666-4cd1-812c-d9ebb0d83413"
# Never commit a real password to a notebook; substitute it at run time
container_registry.password = "<secret>"


est = Estimator(
    compute_target=compute_target,
    use_gpu=True,
    image_registry_details=container_registry,
    custom_docker_image="ort_pytorch:latest",
    user_managed=True,
    source_directory=project_folder,
    entry_script="mnist_training.py"
)
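
Registry credentials are best kept out of the notebook itself. One option, sketched below under assumptions, is to read them from environment variables and assign them to the `ContainerRegistry` object before building the estimator. The variable names (`ORT_REGISTRY_ADDRESS`, `ORT_REGISTRY_USERNAME`, `ORT_REGISTRY_PASSWORD`) and the helper `registry_settings_from_env` are hypothetical; use whatever secret store your team prefers.

```python
import os


def registry_settings_from_env(env=None):
    """Collect registry settings from environment variables.

    The variable names below are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    names = {
        "address": "ORT_REGISTRY_ADDRESS",
        "username": "ORT_REGISTRY_USERNAME",
        "password": "ORT_REGISTRY_PASSWORD",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Unset environment variables: " + ", ".join(missing))
    return {field: env[var] for field, var in names.items()}


# Usage with the container_registry object created earlier:
# for field, value in registry_settings_from_env().items():
#     setattr(container_registry, field, value)
```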


### Submit training job

In [None]:
run = experiment.submit(est)
run

Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run).
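
After submission, you may want to block until the job finishes and then inspect the result. The sketch below wraps three `Run` methods from `azureml.core` (`wait_for_completion`, `get_status`, `get_metrics`) in a small helper; `wait_and_report` is a hypothetical name, and it assumes the `run` object returned by `experiment.submit(est)`.

```python
def wait_and_report(run, show_output=False):
    """Block until the run finishes, then return its final status and metrics."""
    # By default, wait_for_completion raises if the run ends in failure.
    run.wait_for_completion(show_output=show_output)
    return {"status": run.get_status(), "metrics": run.get_metrics()}


# Usage with the run submitted earlier:
# summary = wait_and_report(run, show_output=True)
# print(summary["status"])
```

Set `show_output=True` to stream the job logs into the notebook while waiting.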