Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Train in a remote VM (MLC managed DSVM)
* Create Workspace
* Create Experiment
* Upload data to a blob in workspace
* Configure the run config for the remote compute target
* Submit the experiment to the remote compute target
* Register the retrained model

# Prerequisites
Complete the [00. Installation and Configuration](00.configuration.ipynb) notebook first if you have not already done so.

## Install Azure ML SDK

* !pip install azureml-core
* !pip install azureml-contrib-iot
* !pip install azure-mgmt-containerregistry

## Check the conda environment
Make sure you started the notebook from the correct conda environment. The path printed below should point inside that environment.

In [1]:
import os
print(os.__file__)

/home/arun/anaconda3/envs/tf_dev/lib/python3.5/os.py
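The check above can also be done programmatically. The following is a minimal sketch, assuming conda is in use and sets the `CONDA_DEFAULT_ENV` variable on activation; the helper name `describe_environment` is hypothetical and not part of the original notebook.

```python
import os
import sys


def describe_environment():
    """Return the interpreter prefix and the conda environment name, if any."""
    return {
        # Directory the running interpreter was installed into,
        # e.g. .../anaconda3/envs/tf_dev for a conda environment.
        "prefix": sys.prefix,
        # conda sets CONDA_DEFAULT_ENV on activation; None outside conda.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Major.minor version of the running interpreter.
        "python": "{}.{}".format(*sys.version_info[:2]),
    }


info = describe_environment()
print(info["prefix"], info["conda_env"], info["python"], sep="\n")
```

If the printed prefix does not match the environment you expect, restart the notebook server from the right environment before continuing.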


In [2]:
# Check core SDK version number
import azureml.core as azcore

print("SDK version:", azcore.VERSION)

SDK version: 1.0.2


## Initialize Workspace

Initialize a workspace object from persisted configuration.

In [3]:
from azureml.core import Workspace

ws = Workspace.from_config('./aml_config/config.json')
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

Found the config file in: /home/arun/Documents/tensorflow-for-poets-2/aml_config/config.json
peabody
peabody
eastus
54646fde-e2bd-4f13-bb8a-2eb1174d1240
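Before calling `Workspace.from_config`, it can help to confirm that the config file is complete. The sketch below assumes the file is JSON with `subscription_id`, `resource_group`, and `workspace_name` keys; the helper `load_workspace_config` is hypothetical, and the demonstration writes a throwaway file with placeholder values rather than touching `./aml_config/config.json`.

```python
import json
import os
import tempfile

# Keys the workspace config file is assumed to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path):
    """Load a workspace config file and verify the expected keys are present."""
    with open(path, "r") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config is missing: " + ", ".join(missing))
    return config


# Demonstration with a temporary file and placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config.json")
    with open(demo_path, "w") as f:
        json.dump({"subscription_id": "<subscription-id>",
                   "resource_group": "peabody",
                   "workspace_name": "peabody"}, f)
    demo_config = load_workspace_config(demo_path)
    print(demo_config["workspace_name"])
```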


## Create Experiment

**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments.

In [4]:
experiment_name = 'soda_cans'

from azureml.core import Experiment
exp = Experiment(workspace = ws, name = experiment_name)

## Upload data files into datastore
Every workspace comes with a default datastore (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and access it from the compute target.

In [5]:
# get the default datastore
ds = ws.get_default_datastore()
print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)

workspaceblobstore AzureBlob peabody2715213321 azureml-blobstore-f8535f97-d577-4cb0-bdc8-71e1e2d2ac94


Now let's upload the data into the default datastore:

In [6]:
data_path = experiment_name + '_training_data'
#ds.upload(src_dir='./soda_cans', target_path=data_path, overwrite=True)

If you have an existing Azure storage account, you can register one of its blob containers as a datastore in your workspace instead.

In [None]:
import os
from azureml.core.datastore import Datastore

# Read the storage account key from the environment instead of embedding it in the notebook.
# AZURE_STORAGE_ACCOUNT_KEY is an assumed variable name; export it before running this cell.
ds = Datastore.register_azure_blob_container(workspace=ws,
                                             datastore_name='henry',
                                             container_name='images',
                                             account_name='hjtrainblob',
                                             account_key=os.environ['AZURE_STORAGE_ACCOUNT_KEY'],
                                             create_if_not_exists=False)
# Path to the folder in the blob container. Set this to None to get all the contents.
data_path = None
print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)
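A storage account key is a secret, so it is safer to keep it out of the notebook itself. The sketch below shows one way to do that with environment variables; the helpers `require_env` and `mask` and the variable name `AZURE_STORAGE_ACCOUNT_KEY` are assumptions for illustration, and a placeholder value is set only so the example runs on its own.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError("Set the {} environment variable first".format(name))
    return value


def mask(secret, visible=4):
    """Mask a secret for logging, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# AZURE_STORAGE_ACCOUNT_KEY is a hypothetical variable name; the placeholder
# value below exists only so this sketch runs end to end.
os.environ.setdefault("AZURE_STORAGE_ACCOUNT_KEY", "placeholder-key-value")
account_key = require_env("AZURE_STORAGE_ACCOUNT_KEY")
print(mask(account_key))
```

The value returned by `require_env` could then be passed as `account_key` when registering the datastore.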

## Configure for using ACI
Linux-based ACI is available in `West US`, `East US`, `West Europe`, `North Europe`, `West US 2`, `Southeast Asia`, `Australia East`, `East US 2`, and `Central US` regions. See details [here](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-quotas#region-availability).

Create a `DataReferenceConfiguration` object to tell the system which data folder to download to the compute target.

In [7]:
from azureml.core.runconfig import DataReferenceConfiguration
dr = DataReferenceConfiguration(datastore_name=ds.name, 
                   path_on_datastore=data_path, 
                   mode='download', # download files from datastore to compute target
                   overwrite=True)

Set the system to build a conda environment based on the run configuration. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs.

In [8]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
cluster_name = "cpucluster3"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_D3', max_nodes=2)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# Use the 'status' property to get a detailed status for the current AmlCompute. 
print(compute_target.status.serialize())


Found existing compute target.
{'nodeStateCounts': {'preparingNodeCount': 0, 'leavingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'preemptedNodeCount': 0, 'unusableNodeCount': 0}, 'targetNodeCount': 0, 'provisioningStateTransitionTime': None, 'allocationState': 'Steady', 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D3', 'modifiedTime': '2018-12-19T01:21:19.162967+00:00', 'errors': None, 'provisioningState': 'Succeeded', 'creationTime': '2018-12-19T01:20:00.920301+00:00', 'scaleSettings': {'maxNodeCount': 2, 'nodeIdleTimeBeforeScaleDown': 'PT120S', 'minNodeCount': 0}, 'currentNodeCount': 0, 'allocationStateTransitionTime': '2019-01-05T01:38:28.772000+00:00'}
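The serialized status is easier to read once it is reduced to a few fields. The sketch below works on a subset of the dictionary printed above and converts the `nodeIdleTimeBeforeScaleDown` value, which looks like an ISO 8601 duration, into seconds. The helpers `duration_seconds` and `summarize` are hypothetical, and the duration parser only handles the simple `PT..H..M..S` form.

```python
import re

# Subset of the status dictionary printed above.
status = {
    "nodeStateCounts": {"preparingNodeCount": 0, "leavingNodeCount": 0,
                        "runningNodeCount": 0, "idleNodeCount": 0,
                        "preemptedNodeCount": 0, "unusableNodeCount": 0},
    "allocationState": "Steady",
    "provisioningState": "Succeeded",
    "scaleSettings": {"maxNodeCount": 2,
                      "nodeIdleTimeBeforeScaleDown": "PT120S",
                      "minNodeCount": 0},
    "currentNodeCount": 0,
}

_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(text):
    """Convert a simple ISO 8601 time duration (PT..H..M..S) to seconds."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError("unsupported duration: " + text)
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def summarize(status):
    """Reduce the status dictionary to the fields that matter before submitting."""
    scale = status["scaleSettings"]
    return {
        "ready": status["provisioningState"] == "Succeeded",
        "active_nodes": sum(status["nodeStateCounts"].values()),
        "node_range": (scale["minNodeCount"], scale["maxNodeCount"]),
        "idle_seconds": duration_seconds(scale["nodeIdleTimeBeforeScaleDown"]),
    }


print(summarize(status))
```

With a minimum node count of 0, the cluster has no nodes while idle and has to scale up when a run is submitted.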


In [10]:
from azureml.core.runconfig import RunConfiguration, DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies

# create a new runconfig object
run_config = RunConfiguration(framework = "python")

# Set compute target
run_config.target = compute_target.name

# set the data reference of the run configuration
run_config.data_references = {ds.name: dr}

# enable Docker 
run_config.environment.docker.enabled = True

# set Docker base image to the default CPU-based image
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE

# use conda_dependencies.yml to create a conda environment in the Docker image for execution
run_config.environment.python.user_managed_dependencies = False

# auto-prepare the Docker image when used for execution (if it is not already prepared)
run_config.auto_prepare_environment = True

# specify CondaDependencies obj
run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['tensorflow==1.8.0'])

### Submit the Experiment
Submit the script to run in the Docker image in the remote VM. On the first run, the system downloads the base image, layers the packages specified in the `conda_dependencies.yml` file on top of it, creates a container, and then executes the script in the container.

In [20]:
from azureml.core import Run
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory = './scripts', script = 'retrain.py', run_config = run_config, 
                      # pass the datastore reference as a parameter to the training script
                      arguments=['--image_dir', str(ds.as_download()),
                                 '--architecture', 'mobilenet_1.0_224',
                                 '--output_graph', 'outputs/retrained_graph.pb',
                                 '--output_labels', 'outputs/output_labels.txt',
                                 '--model_download_url', 'https://raw.githubusercontent.com/rakelkar/models/master/model_output/',
                                 '--model_file_name', 'imagenet_2_frozen.pb'
                                ])
run = exp.submit(config=src)
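The `arguments` list is handed to the training script as command-line flags. The notebook does not show how `retrain.py` reads them, so the following is only a sketch of a parser that would accept the same flags; the function `build_parser`, the defaults, and the example image directory are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags passed in the arguments list above."""
    parser = argparse.ArgumentParser(description="Retraining script arguments")
    parser.add_argument("--image_dir", required=True,
                        help="Folder of labeled training images")
    parser.add_argument("--architecture", default="mobilenet_1.0_224",
                        help="Base model architecture")
    parser.add_argument("--output_graph", default="outputs/retrained_graph.pb",
                        help="Where to write the retrained graph")
    parser.add_argument("--output_labels", default="outputs/output_labels.txt",
                        help="Where to write the label file")
    parser.add_argument("--model_download_url", default=None,
                        help="Base URL of the pretrained model")
    parser.add_argument("--model_file_name", default=None,
                        help="File name of the frozen graph inside the archive")
    return parser


# Parse an explicit list so the sketch does not depend on sys.argv.
# The image directory is a hypothetical stand-in for the resolved data reference.
args = build_parser().parse_args([
    "--image_dir", "/tmp/soda_cans_training_data",
    "--architecture", "mobilenet_1.0_224",
    "--output_graph", "outputs/retrained_graph.pb",
    "--output_labels", "outputs/output_labels.txt",
    "--model_file_name", "imagenet_2_frozen.pb",
])
print(args.image_dir, args.architecture, args.output_graph, sep="\n")
```

Writing results under `outputs/` matters here because the registration step later in the notebook reads the model from `outputs/`.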

### View run history details

In [21]:
run

Experiment,Id,Type,Status,Details Page,Docs Page
soda_cans,soda_cans_1547167621516,azureml.scriptrun,Queued,Link to Azure Portal,Link to Documentation


In [22]:
run.wait_for_completion(show_output=True)

RunId: soda_cans_1547167621516

Streaming azureml-logs/80_driver_log.txt

https://raw.githubusercontent.com/rakelkar/models/master/model_output/mobilenet_v1_1.0_224_frozen.tgz
/tmp/imagenet
/tmp/imagenet/mobilenet_v1_1.0_224_frozen.tgz

>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.0%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.1%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.1%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.2%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.2%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.3%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.3%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.4%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.4%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.5%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.5%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.6%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.6%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 0.7%
>> Downloading mobilenet_v1_1.0_224

>> Downloading mobilenet_v1_1.0_224_frozen.tgz 88.7%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 88.8%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 88.8%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 88.9%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.0%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.0%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.1%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.1%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.2%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.2%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.3%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.3%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.4%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.4%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.5%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.5%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.6%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz 89.6%
>> Downloading mobilenet_v1_1.0_224_frozen.tgz

INFO:tensorflow:2019-01-11 00:59:05.477212: Step 3830: Train accuracy = 100.0%
INFO:tensorflow:2019-01-11 00:59:05.477461: Step 3830: Cross entropy = 0.000301
INFO:tensorflow:2019-01-11 00:59:05.538305: Step 3830: Validation accuracy = 100.0% (N=100)
INFO:tensorflow:2019-01-11 00:59:06.187961: Step 3840: Train accuracy = 100.0%
INFO:tensorflow:2019-01-11 00:59:06.188207: Step 3840: Cross entropy = 0.000170
INFO:tensorflow:2019-01-11 00:59:06.249204: Step 3840: Validation accuracy = 100.0% (N=100)
INFO:tensorflow:2019-01-11 00:59:06.912074: Step 3850: Train accuracy = 100.0%
INFO:tensorflow:2019-01-11 00:59:06.912315: Step 3850: Cross entropy = 0.000156
INFO:tensorflow:2019-01-11 00:59:06.978428: Step 3850: Validation accuracy = 100.0% (N=100)
INFO:tensorflow:2019-01-11 00:59:07.606812: Step 3860: Train accuracy = 100.0%
INFO:tensorflow:2019-01-11 00:59:07.607070: Step 3860: Cross entropy = 0.000188
INFO:tensorflow:2019-01-11 00:59:07.667221: Step 3860: Validation accuracy = 100.0% (N=1

{'logFiles': {'azureml-logs/60_control_log.txt': 'https://peabody2715213321.blob.core.windows.net/azureml/ExperimentRun/dcid.soda_cans_1547167621516/azureml-logs/60_control_log.txt?sv=2018-03-28&sr=b&sig=g%2BE7VHevlyD7VOV%2FVEDq%2Bb9vY3PN4dJgwQSb4VhMS1s%3D&st=2019-01-11T00%3A49%3A21Z&se=2019-01-11T08%3A59%3A21Z&sp=r',
  'azureml-logs/80_driver_log.txt': 'https://peabody2715213321.blob.core.windows.net/azureml/ExperimentRun/dcid.soda_cans_1547167621516/azureml-logs/80_driver_log.txt?sv=2018-03-28&sr=b&sig=vD1SPU4vXeHLWgO%2BUbwTC%2BF9oXNtlJNwCEb7qUv3LLw%3D&st=2019-01-11T00%3A49%3A21Z&se=2019-01-11T08%3A59%3A21Z&sp=r'},
 'properties': {'ContentSnapshotId': 'c4cef8da-f324-449a-bba8-2e3d704fff81',
  'azureml.runsource': 'experiment'},
 'runDefinition': {'AmlCompute': {'ClusterMaxNodeCount': 1,
   'Name': None,
   'RetainCluster': False,
   'VmPriority': None,
   'VmSize': None},
  'Arguments': ['--image_dir',
   '$AZUREML_DATAREFERENCE_workspaceblobstore',
   '--architecture',
   'mobilenet
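The streamed driver log is long, and the training metrics are easier to inspect once extracted. The sketch below parses a few of the log lines shown above with a regular expression; the helper `collect_metrics` is hypothetical, and the pattern assumes the `Step N: <metric> = <value>` layout seen in this output.

```python
import re

# A few lines copied from the driver log above.
log_lines = [
    "INFO:tensorflow:2019-01-11 00:59:05.477212: Step 3830: Train accuracy = 100.0%",
    "INFO:tensorflow:2019-01-11 00:59:05.477461: Step 3830: Cross entropy = 0.000301",
    "INFO:tensorflow:2019-01-11 00:59:05.538305: Step 3830: Validation accuracy = 100.0% (N=100)",
    "INFO:tensorflow:2019-01-11 00:59:06.187961: Step 3840: Train accuracy = 100.0%",
    "INFO:tensorflow:2019-01-11 00:59:06.188207: Step 3840: Cross entropy = 0.000170",
]

_METRIC = re.compile(
    r"Step (\d+): (Train accuracy|Cross entropy|Validation accuracy) = ([\d.]+)")


def collect_metrics(lines):
    """Group metric values by step number, skipping lines that do not match."""
    metrics = {}
    for line in lines:
        match = _METRIC.search(line)
        if match:
            step, name, value = match.groups()
            metrics.setdefault(int(step), {})[name] = float(value)
    return metrics


for step, values in sorted(collect_metrics(log_lines).items()):
    print(step, values)
```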

### Register the Model

In [23]:
from azureml.core.model import Model

model = run.register_model(model_name = experiment_name, model_path = 'outputs/')
#model = Model.register(model_path = "model",
#                      model_name = "distracted_driver",
#                      tags = {"data": "Imagenet", "model": "object_detection", "type": "imagenet"},
#                      description = "Retrained with model downloaded from Github",
#                      workspace = ws)
print(model.name, model.url, model.version, model.id, model.created_time)

Deprecated, use RunHistoryFacade.assets instead.


soda_cans aml://asset/f1e20626c60d41f7826623f823708548 4 soda_cans:4 2019-01-11 01:00:17.665493+00:00


## Convert Model

In [24]:
from azureml.contrib.iot.model_converters import SnpeConverter

# submit a compile request
compile_request = SnpeConverter.convert_tf_model(
    ws,
    source_model=model,
    input_node="input",
    input_dims="1,224,224,3",
    outputs_nodes = ["final_result"],
    allow_unconsumed_nodes = True)
print(compile_request._operation_id)

f0cdfed7-f229-43b4-bf4b-aa0ffd838a7d


In [25]:
# wait for the request to complete
compile_request.wait_for_completion(show_output=True)

Running.
Failed
Operation f0cdfed7-f229-43b4-bf4b-aa0ffd838a7d completed, operation state "Failed"
sas url to download model conversion logs https://peabody2715213321.blob.core.windows.net/azureml/ExperimentRun/a9948a127d2b47488e49c7da06436f9d/soda_cans.4.dlc.tar.gz-userlog?sv=2018-03-28&sr=b&sig=6C057Tkg4ER3Y2G5bWFKo5nas17AIyW2WutVZ%2FAffiw%3D&st=2019-01-11T00%3A50%3A35Z&se=2019-01-11T09%3A00%3A35Z&sp=r
[2019-01-11 01:00:32Z]: Starting model conversion process
[2019-01-11 01:00:32Z]: Downloading model for conversion
[2019-01-11 01:00:32Z]: Conversion completed with result Failure

Model convert failed, unexpected error response:
{'code': 'ModelConvertFailed', 'details': [{'code': 'CompileModelFailed', 'message': 'aml://artifact/ExperimentRun/a9948a127d2b47488e49c7da06436f9d/soda_cans.4.dlc.tar.gz-userlog'}]}

False
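In the output above the conversion failed and `wait_for_completion` returned `False`, so the cells that read `compile_request.result` would have no model to work with. The sketch below shows one way to stop early with a readable message. The `ConversionError` class and `ensure_converted` helper are hypothetical, and the error dictionary is copied from the output above.

```python
# Error response copied from the failed conversion output above.
error_response = {
    "code": "ModelConvertFailed",
    "details": [{
        "code": "CompileModelFailed",
        "message": "aml://artifact/ExperimentRun/a9948a127d2b47488e49c7da06436f9d/"
                   "soda_cans.4.dlc.tar.gz-userlog",
    }],
}


class ConversionError(RuntimeError):
    """Raised when a model conversion request does not succeed."""


def ensure_converted(succeeded, response=None):
    """Raise ConversionError with the collected error codes unless succeeded is truthy."""
    if succeeded:
        return
    response = response or {}
    codes = [response.get("code", "unknown")]
    codes += [d.get("code", "unknown") for d in response.get("details", [])]
    raise ConversionError("model conversion failed: " + " / ".join(codes))


try:
    ensure_converted(False, error_response)
except ConversionError as err:
    print(err)
```

In the notebook, the first argument would be the boolean returned by `compile_request.wait_for_completion(show_output=True)`.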

In [None]:
# get the compiled model
compiled_model = compile_request.result
print(compiled_model.name, compiled_model.url, compiled_model.version, compiled_model.id, compiled_model.created_time)

In [None]:
compiled_model.download(target_dir="./converted/", exist_ok=True)

### Create Docker Image

### Show the sample application file

In [None]:
with open('./main.py', 'r') as f:
    print(f.read())


In [None]:
from azureml.core.image import Image
from azureml.contrib.iot import IotContainerImage

image_config = IotContainerImage.image_configuration(
                                 architecture="arm32v7",
                                 execution_script="main.py", 
                                 dependencies=["cameraapi.py","iot.py","ipcprovider.py","utility.py"],
                                 docker_file="Dockerfile",
                                 tags = ["mobilenet"],
                                 description = "MobileNet based demo module")
image = Image.create(name = "peabodymobilenet",
                     # this is the model object 
                     models = [compiled_model],
                     image_config = image_config, 
                     workspace = ws)

In [None]:
image.wait_for_creation(show_output = True)

### Enter your container registry credentials

#### List the image to get URI

In [None]:
container_reg = ws.get_details()["containerRegistry"]
reg_name=container_reg.split("/")[-1]
resource_group_name = ws.resource_group
container_url = "\"" + image.image_location + "\","
subscription_id = ws.subscription_id
print('{}'.format(image.image_location))
print('{}'.format(reg_name))
print('{}'.format(subscription_id))
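The cell above takes the registry name as the last `/`-separated segment of the `containerRegistry` value. The sketch below generalizes that, assuming the value is an Azure resource ID of the usual `/subscriptions/.../resourceGroups/.../providers/.../registries/<name>` form. The helper `parse_resource_id` and the example ID are hypothetical placeholders.

```python
def parse_resource_id(resource_id):
    """Split a resource ID into subscription, resource group, and resource name."""
    parts = [p for p in resource_id.split("/") if p]
    # Segments alternate key, value: subscriptions/<id>/resourceGroups/<rg>/...
    # Keys are lowercased because their casing is not always consistent.
    pairs = {key.lower(): value for key, value in zip(parts[0::2], parts[1::2])}
    return {
        "subscription_id": pairs.get("subscriptions"),
        "resource_group": pairs.get("resourcegroups"),
        "name": parts[-1] if parts else None,
    }


# Placeholder ID for illustration only.
example_id = ("/subscriptions/00000000-0000-0000-0000-000000000000"
              "/resourceGroups/peabody"
              "/providers/Microsoft.ContainerRegistry/registries/exampleregistry")
registry = parse_resource_id(example_id)
print(registry["name"], registry["resource_group"], sep="\n")
```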

In [None]:
from azure.mgmt.containerregistry import ContainerRegistryManagementClient

# Reuse the workspace authentication to query the container registry
client = ContainerRegistryManagementClient(ws._auth, subscription_id)
# Fetch the admin credentials for the registry attached to the workspace
result = client.registries.list_credentials(resource_group_name, reg_name, custom_headers=None, raw=False)
username = result.username
password = result.passwords[0].value

### Build your Deployment.json file

In [None]:
%%writefile ./deploymentpb.json

{
  "modulesContent": {
    "$edgeAgent": {
      "properties.desired": {
        "schemaVersion": "1.0",
        "runtime": {
          "type": "docker",
          "settings": {
            "minDockerVersion": "v1.25",
            "loggingOptions": "",
            "registryCredentials": {
                

In [None]:
#Automatically adding your acr details
acr_details = "\"" + reg_name +"\": {\n\t\t\t\"username\": \""+ username + "\",\n\t\t\t" + "\"password\":\"" + password + "\",\n\t\t\t" + "\"address\":\"" + reg_name + ".azurecr.io\"" + "\n\t\t}"
print('{}'.format(acr_details))
%store acr_details >> deploymentpb.json

In [None]:
%%writefile -a ./deploymentpb.json
            }
          }
        },
        "systemModules": {
          "edgeAgent": {
            "type": "docker",
            "settings": {
              "image": "mcr.microsoft.com/azureiotedge-agent:1.0",
              "createOptions": "{}",
              "env": {
                "UpstreamProtocol": {
                  "value": "MQTT"
                }
              }
            }
          },
          "edgeHub": {
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "mcr.microsoft.com/azureiotedge-hub:1.0",
              "createOptions": "{\"User\":\"root\",\"HostConfig\":{\"PortBindings\":{\"5671/tcp\":[{\"HostPort\":\"5671\"}], \"8883/tcp\":[{\"HostPort\":\"8883\"}],\"443/tcp\":[{\"HostPort\":\"443\"}]}}}",
              "env": {
                "UpstreamProtocol": {
                  "value": "MQTT"
                }
              }
            }
          }
        },
        "modules": {
          "VisionSampleModule": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": 

In [None]:
#adding your container URL
%store container_url >> deploymentpb.json

In [None]:
%%writefile -a ./deploymentpb.json
              "createOptions": "{\"HostConfig\":{\"Binds\":[\"/data/misc/camera:/app/vam_model_folder\"],\"NetworkMode\":\"host\"},\"NetworkingConfig\":{\"EndpointsConfig\":{\"host\":{}}}}"
            }
          }
        }
      }
    },
    "$edgeHub": {
      "properties.desired": {
        "schemaVersion": "1.0",
        "routes": {
          "route": "FROM /messages/* INTO $upstream"
        },
        "storeAndForwardConfiguration": {
          "timeToLiveSecs": 7200
        }
      }
    }
  }
}
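The cells above assemble the deployment file by appending text fragments, which makes it easy to end up with invalid JSON. As an alternative sketch, the same manifest can be built as a Python dictionary and serialized with `json`, which handles quoting and escaping. The function `build_manifest` is hypothetical, and the registry name, credentials, and image location passed to it below are placeholders.

```python
import json


def build_manifest(registry_name, username, password, image_location):
    """Assemble the deployment manifest shown above as a dictionary."""
    hub_options = {"User": "root", "HostConfig": {"PortBindings": {
        "5671/tcp": [{"HostPort": "5671"}],
        "8883/tcp": [{"HostPort": "8883"}],
        "443/tcp": [{"HostPort": "443"}]}}}
    module_options = {
        "HostConfig": {"Binds": ["/data/misc/camera:/app/vam_model_folder"],
                       "NetworkMode": "host"},
        "NetworkingConfig": {"EndpointsConfig": {"host": {}}}}
    protocol = {"UpstreamProtocol": {"value": "MQTT"}}
    agent = {
        "schemaVersion": "1.0",
        "runtime": {"type": "docker", "settings": {
            "minDockerVersion": "v1.25",
            "loggingOptions": "",
            "registryCredentials": {registry_name: {
                "username": username,
                "password": password,
                "address": registry_name + ".azurecr.io"}}}},
        "systemModules": {
            "edgeAgent": {"type": "docker", "settings": {
                "image": "mcr.microsoft.com/azureiotedge-agent:1.0",
                "createOptions": "{}",
                "env": protocol}},
            "edgeHub": {
                "type": "docker", "status": "running",
                "restartPolicy": "always",
                "settings": {
                    "image": "mcr.microsoft.com/azureiotedge-hub:1.0",
                    # createOptions is itself a JSON string inside the manifest.
                    "createOptions": json.dumps(hub_options),
                    "env": protocol}}},
        "modules": {"VisionSampleModule": {
            "version": "1.0", "type": "docker", "status": "running",
            "restartPolicy": "always",
            "settings": {
                "image": image_location,
                "createOptions": json.dumps(module_options)}}},
    }
    hub = {
        "schemaVersion": "1.0",
        "routes": {"route": "FROM /messages/* INTO $upstream"},
        "storeAndForwardConfiguration": {"timeToLiveSecs": 7200},
    }
    return {"modulesContent": {
        "$edgeAgent": {"properties.desired": agent},
        "$edgeHub": {"properties.desired": hub}}}


# Placeholder values; in the notebook these come from reg_name, username,
# password, and image.image_location.
manifest = build_manifest("exampleregistry", "<username>", "<password>",
                          "exampleregistry.azurecr.io/peabodymobilenet:1")
manifest_text = json.dumps(manifest, indent=2)
print(len(manifest_text.splitlines()), "lines of JSON")
```

Writing `manifest_text` to `deploymentpb.json` would replace the `%%writefile` and `%store` cells with a single step.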

## Deploy image as an IoT module

### Set subscription to the same as your workspace

In [None]:
%%writefile ./setsub
az account set --subscription 

In [None]:
iot_sub=ws.subscription_id
%store iot_sub >> setsub
!sh setsub 
print ('{}'.format(iot_sub))

### Provision Azure IoT Hub

In [None]:
#RG and location to create hub
iot_rg="vaidk_"+resource_group_name
iot_location=ws.get_details()["location"]
#temp to delete
iot_location="eastus2"
iot_hub_name="iothub-"+ ws.get_details()["name"]
iot_device_id="vadik_"+ ws.get_details()["name"]
iot_deployment_id="dpl"+ "cstmvaidk"
print('{}'.format(iot_hub_name))

In [None]:
%%writefile ./create
#Command to create hub and device


In [None]:
# Adding initialization steps
regcommand="\n echo Installing Extension ... \naz extension add --name azure-cli-iot-ext \n"+ "\n echo CREATING RG "+iot_rg+"... \naz group create --name "+ iot_rg +" --location "+ iot_location+ "\n" +"\n echo CREATING HUB "+iot_hub_name+"... \naz iot hub create --name "+ iot_hub_name + " --resource-group "+ iot_rg +" --sku S1"
#print('{}'.format(regcommand))
%store regcommand >> create

### Create Identity for your device 

In [None]:
#Adding Device ID 
create_device="\n echo CREATING DEVICE ID "+iot_device_id+"... \n az iot hub device-identity create --device-id "+ iot_device_id + " --hub-name " +  iot_hub_name +" --edge-enabled"
#print('{}'.format(create_device))
%store create_device >> create

In [None]:
#Run the commands to create the hub and configure the device
!sh create
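The provisioning script above is built by concatenating strings, which breaks if a name contains spaces or shell metacharacters. The sketch below builds the same `az` invocations as argument lists and quotes each argument with `shlex.quote` before rendering them as script lines. The helpers `provisioning_commands` and `render` and the example names are hypothetical; the commands themselves mirror the ones in the cells above.

```python
import shlex


def provisioning_commands(resource_group, location, hub_name, device_id):
    """Return the az CLI invocations used above as argument lists."""
    return [
        ["az", "extension", "add", "--name", "azure-cli-iot-ext"],
        ["az", "group", "create", "--name", resource_group,
         "--location", location],
        ["az", "iot", "hub", "create", "--name", hub_name,
         "--resource-group", resource_group, "--sku", "S1"],
        ["az", "iot", "hub", "device-identity", "create",
         "--device-id", device_id, "--hub-name", hub_name, "--edge-enabled"],
    ]


def render(commands):
    """Render argument lists as shell lines with every argument safely quoted."""
    return "\n".join(" ".join(shlex.quote(arg) for arg in cmd)
                     for cmd in commands)


# Example names for illustration only.
script = render(provisioning_commands("example-rg", "eastus2",
                                      "iothub-example", "example-device"))
print(script)
```

The argument lists can also be passed directly to `subprocess.run`, which avoids the intermediate script file altogether.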

### Create Deployment

In [None]:
%%writefile ./deploy
#Commands to delete any previous deployment and create a new one


In [None]:
#Add deployment command
deploy_device="\necho DELETING "+iot_deployment_id+" ... \naz iot edge deployment delete --deployment-id \"" + iot_deployment_id +"\" --hub-name \"" +  iot_hub_name +"\"\necho DEPLOYING "+iot_deployment_id+" ... \naz iot edge deployment create --deployment-id \"" + iot_deployment_id + "\" --content \"deploymentpb.json\" --hub-name \"" +  iot_hub_name +"\" --target-condition \"deviceId='"+iot_device_id+"'\" --priority 1"
print('{}'.format(deploy_device))
%store deploy_device >> deploy

In [None]:
#run deployment to stage all work for when the model is ready 
!sh deploy
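The deploy script first deletes any previous deployment and then creates a new one. On a first run there is nothing to delete, so that step may fail while the create step still needs to run. The sketch below shows one way to express "optional" and "required" steps. The `run_steps` helper is hypothetical, and the commands are stand-ins that use the Python interpreter so the example runs without the Azure CLI.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (args, required) steps in order; raise only when a required step fails."""
    results = []
    for args, required in steps:
        code = subprocess.call(args)
        results.append(code)
        if code != 0 and required:
            raise RuntimeError("step failed: " + " ".join(args))
    return results


# Stand-in commands: the first mimics a delete that fails because nothing
# exists yet, the second mimics a create that succeeds.
demo_steps = [
    ([sys.executable, "-c", "import sys; sys.exit(1)"], False),
    ([sys.executable, "-c", "import sys; sys.exit(0)"], True),
]
print(run_steps(demo_steps))
```

In the notebook, the two steps would be the `az iot edge deployment delete` and `az iot edge deployment create` commands built above.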

### Use this connection string on your camera to initialize it

In [None]:
%%writefile ./showdetails
#Command to show the device connection string

In [None]:
#Add the command that shows the connection string
get_string="\n echo THIS IS YOUR CONNECTION STRING ... \naz iot hub device-identity show-connection-string --device-id  \"" + iot_device_id + "\" --hub-name \"" +  iot_hub_name+"\""
#print('{}'.format(get_string))
%store get_string >> showdetails
!sh showdetails