# How to Train Models via AML Pipelines
First we need our Azure SDK imports and authentication.

If you're running this yourself, use your own tenant ID. The workspace authenticates with InteractiveLoginAuthentication, and it won't work with my tenant because you're not signed into my Azure account. You can find your tenant ID by searching for "tenant properties" under services.

1. We get our workspace.
2. We grab the dataset from our registered dataset list.
3. Using "as_mount" converts it into a DatasetConsumptionConfig object, which is what pipelines use to transfer data between steps.

The Tiny ImageNet dataset already comes prepped, so we didn't need a data-prep step. Other datasets may need one added here.

In [17]:
from azureml.core import Workspace, Dataset, Datastore #azureml-core of version 1.0.72 or higher is required
from azureml.core.authentication import InteractiveLoginAuthentication
# If someone else is running this, put your tenant id here... auth is being finicky
#interactive_auth = InteractiveLoginAuthentication(tenant_id="72f988bf-86f1-41af-91ab-2d7cd011db47")
# get the workspace from the local config.json
ws = Workspace.from_config()

# get dataset (FileDataset object)
tiny_imagenet = Dataset.get_by_name(ws, name='Tiny ImageNet')
# Convert to DatasetConsumptionConfig object
#ds_input = tiny_imagenet.as_named_input("tiny_imagenet_dataset")
ds_input = tiny_imagenet.as_mount(path_on_compute="./data")
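
The commented-out InteractiveLoginAuthentication line can be wired into `Workspace.from_config` through its `auth` argument. A minimal sketch, assuming the tenant ID is supplied through an environment variable; the variable name `AZURE_TENANT_ID` and both helper functions are illustrative choices, not part of the original notebook:

```python
import os


def resolve_tenant_id(env=None):
    """Return the tenant ID from the environment, or None if it is not set."""
    env = os.environ if env is None else env
    tenant_id = env.get("AZURE_TENANT_ID", "").strip()
    return tenant_id or None


def get_workspace(tenant_id=None):
    """Load the workspace from config.json, forcing a tenant when one is given."""
    # Imported lazily so resolve_tenant_id works without the SDK installed.
    from azureml.core import Workspace
    from azureml.core.authentication import InteractiveLoginAuthentication

    if tenant_id is None:
        # No tenant supplied: fall back to the default authentication flow.
        return Workspace.from_config()
    auth = InteractiveLoginAuthentication(tenant_id=tenant_id)
    return Workspace.from_config(auth=auth)
```

With this in place, `ws = get_workspace(resolve_tenant_id())` would replace the bare `Workspace.from_config()` call.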

In [18]:
type(ds_input)

azureml.data.dataset_consumption_config.DatasetConsumptionConfig

In [19]:
from azureml.data import OutputFileDatasetConfig

The "OutputFileDatasetConfig" import is what lets us pass output from one script to another. In this case, I'm using it to store our saved models in our default datastore so I can check that everything is working.
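
When an OutputFileDatasetConfig is passed as a script argument, the script sees it as an ordinary directory path on the compute target, and files written there end up in the datastore destination. A minimal sketch of the script side, assuming `train.py` receives that path as `save_dest`. The function name and the pickled contents are hypothetical, since `train.py` is not shown here; the file name `history.pkl` matches the download log later in this notebook:

```python
import os
import pickle
import tempfile


def save_history(save_dest, history):
    """Pickle a training-history dict into the output directory."""
    os.makedirs(save_dest, exist_ok=True)
    path = os.path.join(save_dest, "history.pkl")
    with open(path, "wb") as f:
        pickle.dump(history, f)
    return path


# Local stand-in for the output directory the pipeline would provide.
with tempfile.TemporaryDirectory() as tmp:
    written = save_history(os.path.join(tmp, "run_demo"), {"loss": [2.0, 1.5]})
    with open(written, "rb") as f:
        restored = pickle.load(f)
```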

# Compute Clusters
If the cluster we need doesn't already exist in the workspace, create it. We really only need 1 node, so ignore max_nodes. This is a WIP.

In [20]:
from azureml.core.compute import ComputeTarget, AmlCompute

#compute_name = "compute-cluster"
#vm_size = "STANDARD_NC12"
compute_name = "GPU-METRIC"
vm_size = "STANDARD_NC16AS_T4_V3"
if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and isinstance(compute_target, AmlCompute):
        print('Found compute target: ' + compute_name)
else:
    print('Creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size=vm_size,  # GPU-enabled VM size set above
                                                                vm_priority='lowpriority',
                                                                min_nodes=1,
                                                                max_nodes=4)
    # create the compute target
    compute_target = ComputeTarget.create(
        ws, compute_name, provisioning_config)

    # Can poll for a minimum number of nodes and for a specific timeout.
    # If no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(
        show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current cluster status, use the 'status' property
    print(compute_target.status.serialize())

Found compute target: GPU-METRIC
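
Because the cell above sets `min_nodes=1`, one node stays allocated even when nothing is running. If only a single node is needed, a configuration like the following lets the cluster scale down to zero when idle. This is a sketch with assumed values (the 20-minute idle window and the helper names are illustrative), not the configuration used for the runs in this notebook:

```python
def single_node_cluster_kwargs(vm_size, idle_minutes=20):
    """Keyword arguments for a cluster that scales between 0 and 1 nodes."""
    return {
        "vm_size": vm_size,
        "vm_priority": "lowpriority",
        "min_nodes": 0,  # release the node when the cluster is idle
        "max_nodes": 1,  # training here only needs one node
        "idle_seconds_before_scaledown": idle_minutes * 60,
    }


def create_cluster(ws, name, vm_size):
    """Create the cluster and block until provisioning finishes."""
    # Imported lazily so the kwargs helper can be used without the SDK.
    from azureml.core.compute import AmlCompute, ComputeTarget

    config = AmlCompute.provisioning_configuration(
        **single_node_cluster_kwargs(vm_size))
    target = ComputeTarget.create(ws, name, config)
    target.wait_for_completion(show_output=True)
    return target


kwargs = single_node_cluster_kwargs("STANDARD_NC16AS_T4_V3")
```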


# Configure Training Environment
We need to include the dependencies required to train the model. A curated environment uses pre-built Docker images from the Microsoft Container Registry. Here we build our own instead and install everything through pip, because conda dependencies and pip dependencies do not like each other.

In [21]:
from azureml.core.runconfig import RunConfiguration, DockerConfiguration, MpiConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core import Environment 

# Use the 4 nodes we have
#distributed_training_config = MpiConfiguration(node_count=4)

aml_run_config = RunConfiguration()
aml_run_config.target = compute_target

USE_CURATED_ENV = False

if USE_CURATED_ENV:
    # We don't have a curated environment set up
    curated_environment = Environment.get(workspace=ws, name="AzureML-Tutorial")
    aml_run_config.environment = curated_environment
else: 
    aml_run_config.environment.python.user_managed_dependencies = False
    # base docker environment
    #aml_run_config.environment.docker = DockerConfiguration(use_docker=True)
    aml_run_config.environment.docker.base_image = 'mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04'
    
    # Add the packages the training step relies on
    conda_dependencies = CondaDependencies.create(
        #conda_packages=['tensorflow-gpu==2.2.0'],
        conda_packages=[], 
        pin_sdk_version=False)
    # random, math, os and warnings are standard-library modules, not pip extras, so they are not listed here
    pip_packages=['joblib', 'pandas', 'tensorflow==2.2.0', 'keras', 'pillow','azureml-sdk', 'azureml-dataprep[fuse,pandas]'] 
    for pip_package in pip_packages:
        conda_dependencies.add_pip_package(pip_package)
    aml_run_config.environment.python.conda_dependencies = conda_dependencies
aml_run_config.environment.name = "InceptionV3_TinyImagenet"

In [22]:
aml_run_config
# Need to name the environment in order to save it to the workspace
aml_run_config.environment.register(workspace=ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "InceptionV3_TinyImagenet",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
               

# Set Datastore to get output
We will use the workspace's default datastore for the outputs of our training.

In [23]:
def_blob_store = Datastore(ws, "workspaceblobstore")
def_blob_store

{
  "name": "workspaceblobstore",
  "container_name": "azureml-blobstore-5f3f7498-5d05-44a6-a3f4-727830af8434",
  "account_name": "tinyimagstorage0691ba34b",
  "protocol": "https",
  "endpoint": "core.windows.net"
}

This "def_blob_store" object is essentially a reference to our default datastore's blob container. The OutputFileDatasetConfig imported earlier will point at it. Think of it as passing along the path of the directory where we want to save our files.

# Constructing Pipeline Steps

In [34]:
# Parameters
steps_per_epoch=150
num_epochs=10
batch_size=64
model="densenet" #"inception" or "densenet". 
trainable=False
run_eval=True

In [47]:
from azureml.pipeline.steps import PythonScriptStep

def_blob_store = Datastore(ws, "workspaceblobstore")
train_source_dir = "./train"
entry_point = "train.py"

# Where the trained model is saved. At run time this resolves to a path that train.py can write to.
output = OutputFileDatasetConfig(destination=(
            def_blob_store,'./model/run_{run-id}'))
# In the datastore, inside the container listed in the cell above,
# you should see a "model" directory with one subdirectory per run.

# Use the 4 nodes we have
#distributed_training_config = MpiConfiguration(node_count=4)

train_step = PythonScriptStep(
                script_name=entry_point,
                arguments=[
                        "--data_path", "./data", # Path the data is mounted to. Look in train.py
                        "--steps_per_epoch", steps_per_epoch, #params
                        "--num_epochs", num_epochs,
                        "--batch_size", batch_size,
                        #"--model", "inception",
                        "--model", model,
                        "--trainable", trainable,
                        "--save_dest", output,  #The output config object from earlier. See train.py
                        "--run_eval", run_eval # Needs to be implemented still
                          ],
                inputs=[ds_input],
                compute_target=compute_target,
                source_directory=train_source_dir,
                runconfig=aml_run_config,
                allow_reuse=True
            )
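
The arguments above reach `train.py` as command-line strings, so `trainable` and `run_eval` arrive as text rather than Python booleans (assumed here to be "True" or "False"). With `argparse`, `type=bool` would treat any non-empty string, including "False", as true. The parser below is a hypothetical sketch of how the script could handle this. The real `train.py` is not shown in this notebook, and the `str2bool` helper and the defaults are assumptions:

```python
import argparse


def str2bool(value):
    """Convert common textual booleans to a real bool."""
    text = str(value).strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser for the flags passed by the pipeline step."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--data_path", default="./data")
    parser.add_argument("--steps_per_epoch", type=int, default=150)
    parser.add_argument("--num_epochs", type=int, default=10)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--model", choices=["inception", "densenet"],
                        default="densenet")
    parser.add_argument("--trainable", type=str2bool, default=False)
    parser.add_argument("--save_dest", default="./outputs")
    parser.add_argument("--run_eval", type=str2bool, default=True)
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--num_epochs", "15", "--trainable", "False", "--run_eval", "True"])
```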

# Metric Logging Experiment

In [48]:
from azureml.pipeline.core import Pipeline
from azureml.core import Experiment

pipeline = Pipeline(workspace=ws, steps=[train_step])
print("Pipeline made. Submitting run...")
exp = Experiment(ws, "InceptionV3_Metric_Logging")
pipeline_run = exp.submit(pipeline)
#run = pipeline_run.start_logging(outputs=None, snapshot_directory=None)
print("Submitted. Waiting for completion...")
pipeline_run.wait_for_completion()
print("Completed! Experiment should appear under the Experiments page.")

Pipeline made. Submitting run...
Created step train.py [aaf0c025][83f2dcb2-bf45-40a3-8235-7d051a66c77e], (This step will run and generate new outputs)
Submitted PipelineRun 2e8bd64b-42ec-4027-9a1d-8fce9002829e
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/2e8bd64b-42ec-4027-9a1d-8fce9002829e?wsid=/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/azureml_uw_imageclassification/workspaces/tiny-image-net&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
Submitted. Waiting for completion...
PipelineRunId: 2e8bd64b-42ec-4027-9a1d-8fce9002829e
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/2e8bd64b-42ec-4027-9a1d-8fce9002829e?wsid=/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/azureml_uw_imageclassification/workspaces/tiny-image-net&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
PipelineRun Status: Running


StepRunId: d2c9ccc4-903e-4b8f-91ff-9c78bbe5e22a
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/d2c9ccc4-903e-

ExperimentExecutionException: ExperimentExecutionException:
	Message: The output streaming for the run interrupted.
But the run is still executing on the compute target. 
Details for canceling the run can be found here: https://aka.ms/aml-docs-cancel-run
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "The output streaming for the run interrupted.\nBut the run is still executing on the compute target. \nDetails for canceling the run can be found here: https://aka.ms/aml-docs-cancel-run"
    }
}

## Get Model from run

In [49]:
# Single-step pipeline: after the loop, run_id holds the training step's run id
for step in pipeline_run.get_steps():
    run_id = step.id

In [58]:
run_id

In [59]:
def_blob_store.download(target_path="./run_output",
                        prefix=f"model/run_{run_id}", 
                        show_progress=True)

Downloading model/run_e8b34de1-d74c-4d99-9fdd-d8033c6e9676/densenet_imagenet/saved_model.pb
Downloaded model/run_e8b34de1-d74c-4d99-9fdd-d8033c6e9676/densenet_imagenet/saved_model.pb, 1 files out of an estimated total of 6
Downloading model/run_e8b34de1-d74c-4d99-9fdd-d8033c6e9676/densenet_imagenet/variables/variables.data-00000-of-00002
Downloaded model/run_e8b34de1-d74c-4d99-9fdd-d8033c6e9676/densenet_imagenet/variables/variables.data-00000-of-00002, 2 files out of an estimated total of 6
Downloading model/run_e8b34de1-d74c-4d99-9fdd-d8033c6e9676/densenet_imagenet/variables/variables.index
Downloaded model/run_e8b34de1-d74c-4d99-9fdd-d8033c6e9676/densenet_imagenet/variables/variables.index, 3 files out of an estimated total of 6
Downloading model/run_e8b34de1-d74c-4d99-9fdd-d8033c6e9676/history.pkl
Downloaded model/run_e8b34de1-d74c-4d99-9fdd-d8033c6e9676/history.pkl, 4 files out of an estimated total of 6
Downloading model/run_e8b34de1-d74c-4d99-9fdd-d8033c6e9676/run_params.pkl
Down

6
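
The download includes `history.pkl`. Assuming it holds a Keras-style history dict (metric name mapped to a list of per-epoch values, which is an assumption since `train.py` is not shown), the best epoch can be pulled out like this. The function names are hypothetical and the numbers are made up purely to exercise the helper:

```python
import pickle


def load_history(path):
    """Load a pickled history dict, e.g. ./run_output/model/run_<id>/history.pkl."""
    with open(path, "rb") as f:
        return pickle.load(f)


def best_epoch(history, metric="accuracy"):
    """Return (1-based epoch, value) for the highest value of a metric."""
    values = history[metric]
    index = max(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Made-up values, not results from the runs in this notebook.
epoch, value = best_epoch({"accuracy": [0.31, 0.52, 0.49]})
```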

In [53]:
from azureml.core import Model

# Keep the registered Model in its own variable so the `model` string from the parameters cell is not overwritten.
if model == "inception":
    registered_model = Model.register(
                model_path=f"./run_output/model/run_{run_id}/{model}_imagenet/",
                model_name="Inceptionv3_tinyimagenet_model_quickrun",
                tags={'area': 'Tiny Imagenet Image Classification', 'type': 'image classification'},
                description="InceptionV3 model to predict 200 classes",
                workspace=ws,
                model_framework=Model.Framework.TENSORFLOW,
                model_framework_version="2.2.0")
else: # Densenet
    registered_model = Model.register(
        model_path=f"./run_output/model/run_{run_id}/{model}_imagenet/",
        model_name="Densenet_TinyImagenet",
        tags={'area': 'Tiny Imagenet Image Classification', 'type': 'image classification'},
        description="Densenet model to predict 200 classes",
        workspace=ws,
        model_framework=Model.Framework.TENSORFLOW,
        model_framework_version="2.2.0")

Registering model Densenet_TinyImagenet


# Clean Up Files

Run this cell if you no longer need the downloaded history, model, etc.

In [54]:
!rm -rf ./run_output/

In [19]:
from azureml.core import Model
# Store the result under a new name so the `model` string used in the path is not overwritten.
registered_model = Model.register(
                    model_path=f"./run_output/model/run_{run_id}/{model}_imagenet/",
                    model_name="Inceptionv3_tinyimagenet_model_quickrun",
                    tags={'area': 'Tiny Imagenet Image Classification', 'type': 'image classification'},
                    description="InceptionV3 model to predict 200 classes",
                    workspace=ws,
                    model_framework=Model.Framework.TENSORFLOW,
                    model_framework_version="2.2.0")

Registering model Inceptionv3_tinyimagenet_model


In [18]:
Model.Framework.TENSORFLOW

'TensorFlow'

This training run took 30 minutes and 1 second

# Doing Another Run with Different Parameters

In [21]:
from azureml.pipeline.steps import PythonScriptStep

def_blob_store = Datastore(ws, "workspaceblobstore")
train_source_dir = "./train"
entry_point = "train.py"

# Where the trained model is saved. At run time this resolves to a path that train.py can write to.
output = OutputFileDatasetConfig(destination=(
            def_blob_store,'./model/run_{run-id}'))
# In the datastore, inside the container listed earlier,
# you should see a "model" directory with one subdirectory per run.

# Use the 4 nodes we have
#distributed_training_config = MpiConfiguration(node_count=4)

train_step = PythonScriptStep(
                script_name=entry_point,
                arguments=[
                        "--data_path", "./data", # Path the data is mounted to. Look in train.py
                        "--steps_per_epoch", 150, #params
                        "--num_epochs", 15,
                        "--batch_size", 32,
                        "--save_dest", output,  #The output config object from earlier. See train.py
                        "--run_eval", True # Needs to be implemented still
                          ],
                inputs=[ds_input],
                compute_target=compute_target,
                source_directory=train_source_dir,
                runconfig=aml_run_config,
                allow_reuse=True
            )

# Metric Logging Experiment, Part 2

In [22]:
from azureml.pipeline.core import Pipeline
from azureml.core import Experiment

pipeline = Pipeline(workspace=ws, steps=[train_step])
print("Pipeline made. Submitting run...")
exp = Experiment(ws, "InceptionV3_Metric_Logging")
pipeline_run = exp.submit(pipeline)
#run = pipeline_run.start_logging(outputs=None, snapshot_directory=None)
print("Submitted. Waiting for completion...")
pipeline_run.wait_for_completion()
print("Completed! Experiment should appear under the Experiments page.")

Pipeline made. Submitting run...
Created step train.py [ef68cbb6][d87d35b5-41ae-43fb-bec3-c2ea672fd7c7], (This step will run and generate new outputs)
Submitted PipelineRun 47b0da5d-2ccb-4452-8ed4-d870dda21346
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/47b0da5d-2ccb-4452-8ed4-d870dda21346?wsid=/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/azureml_uw_imageclassification/workspaces/tiny-image-net&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
Submitted. Waiting for completion...
PipelineRunId: 47b0da5d-2ccb-4452-8ed4-d870dda21346
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/47b0da5d-2ccb-4452-8ed4-d870dda21346?wsid=/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/azureml_uw_imageclassification/workspaces/tiny-image-net&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
PipelineRun Status: Running


StepRunId: 2e129d23-6071-48aa-be8e-7098eaa7aa41
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/2e129d23-6071-

Found 100000 images belonging to 200 classes.
Found 10000 validated image filenames belonging to 200 classes.
Epoch 1/15
2021-05-27 22:26:19.766033: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-05-27 22:26:20.622423: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7

  1/150 [..............................] - ETA: 0s - loss: 5.4901 - accuracy: 0.0000e+
  2/150 [..............................] - ETA: 11s - loss: 9.2926 - accuracy: 0.0156  
  3/150 [..............................] - ETA: 34s - loss: 10.3373 - accuracy: 0.01
  4/150 [..............................] - ETA: 47s - loss: 10.4231 - accuracy: 0.00
  5/150 [>.............................] - ETA: 1:21 - loss: 10.1037 - accuracy: 0.006
  6/150 [>.............................] - ETA: 1:31 - loss: 9.8524 - accuracy: 0.0052 
  7/150 [>.............................] - ETA: 1:35 - loss: 9.4708 - ac

Epoch 2/15

  1/150 [..............................] - ETA: 0s - loss: 2.7038 - accuracy: 0.40
  2/150 [..............................] - ETA: 1:08 - loss: 3.0339 - accuracy: 0.35
  3/150 [..............................] - ETA: 1:30 - loss: 2.9928 - accuracy: 0.35
  4/150 [..............................] - ETA: 1:28 - loss: 2.9382 - accuracy: 0.35
  5/150 [>.............................] - ETA: 1:37 - loss: 2.9062 - accuracy: 0.35
  6/150 [>.............................] - ETA: 1:42 - loss: 2.8937 - accuracy: 0.34
  7/150 [>.............................] - ETA: 1:36 - loss: 2.8355 - accuracy: 0.3616
  8/150 [>.............................] - ETA: 1:33 - loss: 2.8779 - accuracy: 0.35
  9/150 [>.............................] - ETA: 1:31 - loss: 2.7799 - accuracy: 0.37
 10/150 [=>............................] - ETA: 1:34 - loss: 2.7713 - accuracy: 0.38
 11/150 [=>............................] - ETA: 1:32 - loss: 2.7422 - accuracy: 0.38
 12/150 [=>............................] - ETA: 1:30 

Epoch 3/15

  1/150 [..............................] - ETA: 0s - loss: 2.2959 - accuracy: 0.43
  2/150 [..............................] - ETA: 46s - loss: 2.2192 - accuracy: 0.484
  3/150 [..............................] - ETA: 1:03 - loss: 2.2525 - accuracy: 0.46
  4/150 [..............................] - ETA: 1:11 - loss: 2.1698 - accuracy: 0.49
  5/150 [>.............................] - ETA: 1:20 - loss: 2.0815 - accuracy: 0.50
  6/150 [>.............................] - ETA: 1:20 - loss: 2.1322 - accuracy: 0.50
  7/150 [>.............................] - ETA: 1:19 - loss: 2.0952 - accuracy: 0.49
  8/150 [>.............................] - ETA: 1:23 - loss: 2.1575 - accuracy: 0.4883


  9/150 [>.............................] - ETA: 1:21 - loss: 2.1535 - accuracy: 0.48
 10/150 [=>............................] - ETA: 1:21 - loss: 2.1156 - accuracy: 0.49
 11/150 [=>............................] - ETA: 1:21 - loss: 2.0457 - accuracy: 0.50
 12/150 [=>............................] - ETA: 1:21 - loss: 2.0221 - accuracy: 0.51
 13/150 [=>............................] - ETA: 1:21 - loss: 2.0655 - accuracy: 0.50
 14/150 [=>............................] - ETA: 1:19 - loss: 2.0727 - accuracy: 0.50
 15/150 [==>...........................] - ETA: 1:20 - loss: 2.0831 - accuracy: 0.51
 16/150 [==>...........................] - ETA: 1:22 - loss: 2.1080 - accuracy: 0.50
 17/150 [==>...........................] - ETA: 1:21 - loss: 2.1249 - accuracy: 0.50
 18/150 [==>...........................] - ETA: 1:22 - loss: 2.1420 - accuracy: 0.49
 19/150 [==>...........................] - ETA: 1:20 - loss: 2.1848 - accuracy: 0.49
 20/150 [===>..........................] - ETA: 1:19 - loss: 2.18

Epoch 4/15

  1/150 [..............................] - ETA: 0s - loss: 2.5380 - accuracy: 0.37
  2/150 [..............................] - ETA: 55s - loss: 2.0526 - accuracy: 0.484
  3/150 [..............................] - ETA: 1:17 - loss: 1.9425 - accuracy: 0.52
  4/150 [..............................] - ETA: 1:25 - loss: 1.9671 - accuracy: 0.50
  5/150 [>.............................] - ETA: 1:24 - loss: 1.8748 - accuracy: 0.54
  6/150 [>.............................] - ETA: 1:24 - loss: 1.8134 - accuracy: 0.55
  7/150 [>.............................] - ETA: 1:30 - loss: 1.8631 - accuracy: 0.54
  8/150 [>.............................] - ETA: 2:06 - loss: 1.8603 - accuracy: 0.54
  9/150 [>.............................] - ETA: 2:01 - loss: 1.8462 - accuracy: 0.54
 10/150 [=>............................] - ETA: 1:56 - loss: 1.8133 - accuracy: 0.54
 11/150 [=>............................] - ETA: 1:52 - loss: 1.8559 - accuracy: 0.53
 12/150 [=>............................] - ETA: 1:49 - 

Epoch 5/15

  1/150 [..............................] - ETA: 0s - loss: 1.9769 - accuracy: 0.46
  2/150 [..............................] - ETA: 1:16 - loss: 2.3435 - accuracy: 0.42
  3/150 [..............................] - ETA: 1:22 - loss: 2.3666 - accuracy: 0.44
  4/150 [..............................] - ETA: 1:15 - loss: 2.1804 - accuracy: 0.48
  5/150 [>.............................] - ETA: 1:36 - loss: 2.1370 - accuracy: 0.5000
  6/150 [>.............................] - ETA: 1:33 - loss: 2.1113 - accuracy: 0.49
  7/150 [>.............................] - ETA: 1:31 - loss: 2.1154 - accuracy: 0.50
  8/150 [>.............................] - ETA: 1:29 - loss: 1.9691 - accuracy: 0.52
  9/150 [>.............................] - ETA: 1:28 - loss: 2.0364 - accuracy: 0.50
 10/150 [=>............................] - ETA: 1:23 - loss: 2.0185 - accuracy: 0.51
 11/150 [=>............................] - ETA: 1:19 - loss: 1.9695 - accuracy: 0.51
 12/150 [=>............................] - ETA: 1:17 

 34/150 [=====>........................] - ETA: 1:04 - loss: 1.9779 - accuracy: 0.54


Epoch 6/15

  1/150 [..............................] - ETA: 0s - loss: 1.9272 - accuracy: 0.53
  2/150 [..............................] - ETA: 57s - loss: 2.0815 - accuracy: 0.5156
  3/150 [..............................] - ETA: 1:23 - loss: 2.2095 - accuracy: 0.47
  4/150 [..............................] - ETA: 1:34 - loss: 2.2097 - accuracy: 0.50
  5/150 [>.............................] - ETA: 1:38 - loss: 2.1577 - accuracy: 0.51
  6/150 [>.............................] - ETA: 1:35 - loss: 2.1552 - accuracy: 0.53
  7/150 [>.............................] - ETA: 1:36 - loss: 2.2003 - accuracy: 0.52
  8/150 [>.............................] - ETA: 1:37 - loss: 2.1310 - accuracy: 0.53
  9/150 [>.............................] - ETA: 1:33 - loss: 2.1080 - accuracy: 0.53
 10/150 [=>............................] - ETA: 1:28 - loss: 2.0881 - accuracy: 0.53
 11/150 [=>............................] - ETA: 1:26 - loss: 2.0519 - accuracy: 0.54
 12/150 [=>............................] - ETA: 1:23 -

Epoch 7/15

  1/150 [..............................] - ETA: 0s - loss: 2.3015 - accuracy: 0.40
  2/150 [..............................] - ETA: 52s - loss: 1.9373 - accuracy: 0.468
  3/150 [..............................] - ETA: 1:20 - loss: 1.8634 - accuracy: 0.47
  4/150 [..............................] - ETA: 1:21 - loss: 1.8999 - accuracy: 0.49
  5/150 [>.............................] - ETA: 1:27 - loss: 1.9697 - accuracy: 0.49
  6/150 [>.............................] - ETA: 1:34 - loss: 1.8574 - accuracy: 0.52
  7/150 [>.............................] - ETA: 1:30 - loss: 1.8881 - accuracy: 0.51
  8/150 [>.............................] - ETA: 1:30 - loss: 1.9475 - accuracy: 0.51
  9/150 [>.............................] - ETA: 1:28 - loss: 1.9227 - accuracy: 0.5139
 10/150 [=>............................] - ETA: 1:28 - loss: 1.8830 - accuracy: 0.52
 11/150 [=>............................] - ETA: 1:26 - loss: 1.8849 - accuracy: 0.52
 12/150 [=>............................] - ETA: 1:22 

Epoch 8/15

  1/150 [..............................] - ETA: 0s - loss: 1.4098 - accuracy: 0.65
  2/150 [..............................] - ETA: 23s - loss: 1.6844 - accuracy: 0.625
  3/150 [..............................] - ETA: 32s - loss: 2.2551 - accuracy: 0.552
  4/150 [..............................] - ETA: 1:32 - loss: 2.4378 - accuracy: 0.52
  5/150 [>.............................] - ETA: 1:54 - loss: 2.3049 - accuracy: 0.53
  6/150 [>.............................] - ETA: 1:47 - loss: 2.2431 - accuracy: 0.53
  7/150 [>.............................] - ETA: 1:40 - loss: 2.1551 - accuracy: 0.55
  8/150 [>.............................] - ETA: 1:33 - loss: 2.1306 - accuracy: 0.54
  9/150 [>.............................] - ETA: 1:29 - loss: 2.0633 - accuracy: 0.55
 10/150 [=>............................] - ETA: 1:30 - loss: 2.0146 - accuracy: 0.56
 11/150 [=>............................] - ETA: 1:27 - loss: 1.9775 - accuracy: 0.57
 12/150 [=>............................] - ETA: 1:24 - 

 20/150 [===>..........................] - ETA: 1:09 - loss: 1.9087 - accuracy: 0.57
 21/150 [===>..........................] - ETA: 1:09 - loss: 1.9233 - accuracy: 0.56
 22/150 [===>..........................] - ETA: 1:08 - loss: 1.9451 - accuracy: 0.56
 23/150 [===>..........................] - ETA: 1:06 - loss: 1.9346 - accuracy: 0.56
 24/150 [===>..........................] - ETA: 1:06 - loss: 1.9243 - accuracy: 0.56
 25/150 [====>.........................] - ETA: 1:04 - loss: 1.9447 - accuracy: 0.56
 26/150 [====>.........................] - ETA: 1:04 - loss: 1.9303 - accuracy: 0.56
 27/150 [====>.........................] - ETA: 1:03 - loss: 1.9048 - accuracy: 0.56
 28/150 [====>.........................] - ETA: 1:02 - loss: 1.9064 - accuracy: 0.56
 29/150 [====>.........................] - ETA: 1:01 - loss: 1.8921 - accuracy: 0.56
 30/150 [=====>........................] - ETA: 1:00 - loss: 1.9039 - accuracy: 0.56
 31/150 [=====>........................] - ETA: 1:00 - loss: 1.89

Epoch 9/15

  1/150 [..............................] - ETA: 0s - loss: 2.5683 - accuracy: 0.37
  2/150 [..............................] - ETA: 39s - loss: 2.5592 - accuracy: 0.4219
  3/150 [..............................] - ETA: 57s - loss: 2.0535 - accuracy: 0.520
  4/150 [..............................] - ETA: 1:01 - loss: 2.1166 - accuracy: 0.50
  5/150 [>.............................] - ETA: 1:08 - loss: 1.9578 - accuracy: 0.54
  6/150 [>.............................] - ETA: 1:16 - loss: 1.8568 - accuracy: 0.55
  7/150 [>.............................] - ETA: 1:13 - loss: 1.8401 - accuracy: 0.54
  8/150 [>.............................] - ETA: 1:21 - loss: 1.8609 - accuracy: 0.54
  9/150 [>.............................] - ETA: 1:20 - loss: 1.8739 - accuracy: 0.54
 10/150 [=>............................] - ETA: 1:18 - loss: 1.8955 - accuracy: 0.54
 11/150 [=>............................] - ETA: 1:16 - loss: 1.9129 - accuracy: 0.53
 12/150 [=>............................] - ETA: 1:18 -

Epoch 10/15

  1/150 [..............................] - ETA: 0s - loss: 1.4435 - accuracy: 0.68
  2/150 [..............................] - ETA: 25s - loss: 1.5897 - accuracy: 0.609
  3/150 [..............................] - ETA: 42s - loss: 1.6512 - accuracy: 0.625
  4/150 [..............................] - ETA: 55s - loss: 1.7089 - accuracy: 0.593
  5/150 [>.............................] - ETA: 1:24 - loss: 1.7748 - accuracy: 0.59
  6/150 [>.............................] - ETA: 1:21 - loss: 1.8768 - accuracy: 0.58
  7/150 [>.............................] - ETA: 1:16 - loss: 1.8225 - accuracy: 0.59
  8/150 [>.............................] - ETA: 1:14 - loss: 1.7626 - accuracy: 0.58
  9/150 [>.............................] - ETA: 1:19 - loss: 1.7274 - accuracy: 0.59
 10/150 [=>............................] - ETA: 1:15 - loss: 1.6944 - accuracy: 0.59
 11/150 [=>............................] - ETA: 1:13 - loss: 1.7406 - accuracy: 0.57
 12/150 [=>............................] - ETA: 1:09 -



Epoch 11/15

  1/150 [..............................] - ETA: 0s - loss: 1.6102 - accuracy: 0.62
  2/150 [..............................] - ETA: 28s - loss: 1.5804 - accuracy: 0.656
  3/150 [..............................] - ETA: 39s - loss: 1.8504 - accuracy: 0.583
  4/150 [..............................] - ETA: 57s - loss: 1.6329 - accuracy: 0.625
  5/150 [>.............................] - ETA: 1:04 - loss: 1.7074 - accuracy: 0.61
  6/150 [>.............................] - ETA: 59s - loss: 1.7023 - accuracy: 0.6042
  7/150 [>.............................] - ETA: 1:01 - loss: 1.7240 - accuracy: 0.59
  8/150 [>.............................] - ETA: 59s - loss: 1.7592 - accuracy: 0.5898
  9/150 [>.............................] - ETA: 1:00 - loss: 1.7463 - accuracy: 0.58
 10/150 [=>............................] - ETA: 1:04 - loss: 1.8133 - accuracy: 0.57
 11/150 [=>............................] - ETA: 1:06 - loss: 1.8225 - accuracy: 0.57
 12/150 [=>............................] - ETA: 1:06

Epoch 12/15

  1/150 [..............................] - ETA: 0s - loss: 1.2178 - accuracy: 0.68
  2/150 [..............................] - ETA: 33s - loss: 1.7772 - accuracy: 0.6094
  3/150 [..............................] - ETA: 38s - loss: 2.1178 - accuracy: 0.541
  4/150 [..............................] - ETA: 41s - loss: 1.8538 - accuracy: 0.585
  5/150 [>.............................] - ETA: 47s - loss: 1.9588 - accuracy: 0.575
  6/150 [>.............................] - ETA: 59s - loss: 1.9470 - accuracy: 0.578
  7/150 [>.............................] - ETA: 58s - loss: 1.9141 - accuracy: 0.571
  8/150 [>.............................] - ETA: 55s - loss: 1.8820 - accuracy: 0.589
  9/150 [>.............................] - ETA: 55s - loss: 1.8248 - accuracy: 0.607
 10/150 [=>............................] - ETA: 54s - loss: 1.8408 - accuracy: 0.603
 11/150 [=>............................] - ETA: 56s - loss: 1.8416 - accuracy: 0.596
 12/150 [=>............................] - ETA: 57s -

Epoch 13/15

  1/150 [..............................] - ETA: 0s - loss: 1.0261 - accuracy: 0.78
  2/150 [..............................] - ETA: 42s - loss: 1.4198 - accuracy: 0.687
  3/150 [..............................] - ETA: 1:09 - loss: 1.5582 - accuracy: 0.68
  4/150 [..............................] - ETA: 1:20 - loss: 1.7471 - accuracy: 0.63
  5/150 [>.............................] - ETA: 1:13 - loss: 1.7233 - accuracy: 0.63
  6/150 [>.............................] - ETA: 1:07 - loss: 1.6985 - accuracy: 0.6302
  7/150 [>.............................] - ETA: 1:21 - loss: 1.7281 - accuracy: 0.62
  8/150 [>.............................] - ETA: 1:18 - loss: 1.7438 - accuracy: 0.62
  9/150 [>.............................] - ETA: 1:16 - loss: 1.6945 - accuracy: 0.63
 10/150 [=>............................] - ETA: 1:16 - loss: 1.6431 - accuracy: 0.64
 11/150 [=>............................] - ETA: 1:12 - loss: 1.6606 - accuracy: 0.62
 12/150 [=>............................] - ETA: 1:09



Epoch 14/15

  1/150 [..............................] - ETA: 0s - loss: 1.2157 - accuracy: 0.78
  2/150 [..............................] - ETA: 35s - loss: 1.3499 - accuracy: 0.671
  3/150 [..............................] - ETA: 47s - loss: 1.6691 - accuracy: 0.645
  4/150 [..............................] - ETA: 48s - loss: 1.6610 - accuracy: 0.656
  5/150 [>.............................] - ETA: 59s - loss: 1.5966 - accuracy: 0.643
  6/150 [>.............................] - ETA: 58s - loss: 1.6017 - accuracy: 0.645
  7/150 [>.............................] - ETA: 57s - loss: 1.5576 - accuracy: 0.651
  8/150 [>.............................] - ETA: 56s - loss: 1.6444 - accuracy: 0.625
  9/150 [>.............................] - ETA: 52s - loss: 1.6632 - accuracy: 0.614
 10/150 [=>............................] - ETA: 50s - loss: 1.6552 - accuracy: 0.612
 11/150 [=>............................] - ETA: 49s - loss: 1.6555 - accuracy: 0.622
 12/150 [=>............................] - ETA: 49s - 

Epoch 15/15

  1/150 [..............................] - ETA: 0s - loss: 1.6104 - accuracy: 0.68
  2/150 [..............................] - ETA: 37s - loss: 1.7736 - accuracy: 0.609
  3/150 [..............................] - ETA: 1:09 - loss: 1.9270 - accuracy: 0.57
  4/150 [..............................] - ETA: 1:04 - loss: 1.8998 - accuracy: 0.56
  5/150 [>.............................] - ETA: 1:02 - loss: 1.8980 - accuracy: 0.54
  6/150 [>.............................] - ETA: 1:02 - loss: 1.8481 - accuracy: 0.53
  7/150 [>.............................] - ETA: 59s - loss: 1.7250 - accuracy: 0.5580
  8/150 [>.............................] - ETA: 1:03 - loss: 1.7054 - accuracy: 0.55
  9/150 [>.............................] - ETA: 1:01 - loss: 1.7685 - accuracy: 0.5556
 10/150 [=>............................] - ETA: 1:01 - loss: 1.7906 - accuracy: 0.55
 11/150 [=>............................] - ETA: 1:02 - loss: 1.7559 - accuracy: 0.55
 12/150 [=>............................] - ETA: 1:0

Finished first round of training.
2021-05-27 22:52:24.129071: W tensorflow/python/util/util.cc:329] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
TODO: Going to run evaluation on eval split!


[2021-05-27T22:52:50.400283] The experiment completed successfully. Finalizing run...
Cleaning up all outstanding Run operations, waiting 900.0 seconds
2 items cleaning up...
Cleanup took 0.12830734252929688 seconds
[2021-05-27T22:52:50.721702] Finished context manager injector.
2021/05/27 22:52:53 Attempt 1 of http call to http://10.0.0.4:16384/sendlogstoartifacts/status
2021/05/27 22:52:53 Not exporting to RunHistory as the exporter is either stopped or there is no data.
Stopped: false
OriginalData: 2
FilteredData: 0.
2021/05/27 22:52:53 Process Exiting with Code:  0
2021/05/27 22:52:53 All App Insights Logs was send successfully

Streaming azureml-


StepRun(train.py) Execution Summary
StepRun( train.py ) Status: Finished
{'runId': '2e129d23-6071-48aa-be8e-7098eaa7aa41', 'target': 'compute-cluster', 'status': 'Completed', 'startTimeUtc': '2021-05-27T22:23:39.860292Z', 'endTimeUtc': '2021-05-27T22:53:24.437286Z', 'properties': {'ContentSnapshotId': '3578eb64-426a-41f5-a03f-aae8fe85d87c', 'StepType': 'PythonScriptStep', 'ComputeTargetType': 'AmlCompute', 'azureml.moduleid': 'd87d35b5-41ae-43fb-bec3-c2ea672fd7c7', 'azureml.runsource': 'azureml.StepRun', 'azureml.nodeid': 'ef68cbb6', 'azureml.pipelinerunid': '47b0da5d-2ccb-4452-8ed4-d870dda21346', '_azureml.ComputeTargetType': 'amlcompute', 'ProcessInfoFile': 'azureml-logs/process_info.json', 'ProcessStatusFile': 'azureml-logs/process_status.json'}, 'inputDatasets': [{'dataset': {'id': '3ab6de1d-dd35-4e73-9658-566d601dd1bb'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'input__3ab6de1d', 'mechanism': 'Mount', 'pathOnCompute': './data'}}], 'outputDatasets': [{'identifier': 



PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': '47b0da5d-2ccb-4452-8ed4-d870dda21346', 'status': 'Completed', 'startTimeUtc': '2021-05-27T22:23:26.144767Z', 'endTimeUtc': '2021-05-27T22:53:27.542759Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://tinyimagstorage0691ba34b.blob.core.windows.net/azureml/ExperimentRun/dcid.47b0da5d-2ccb-4452-8ed4-d870dda21346/logs/azureml/executionlogs.txt?sv=2019-02-02&sr=b&sig=wA5l4fgGhMT6xXBpwbPYoQi3Wtn63afF%2Ffhm9S%2F0K4w%3D&st=2021-05-27T22%3A43%3A28Z&se=2021-05-28T06%3A53%3A28Z&sp=r', 'logs/azureml/stderrlogs.txt': 'https://tinyimagstorage0691ba34b.blob.core.windows.net/azureml/ExperimentRun/dcid.47b0da5d-2ccb-4452-8ed4-d870dda21346/logs/azureml/stderrlogs.txt?sv=2019-02-02&sr=b&sig=48fbFhOOGCbxsn36ljd0oP%2F%2FjmeoCSqeyZlUbHfJMGM%3D&st=2021-05-27T22%

## Get Model from run

In [23]:
# The pipeline has a single step (train.py), so the loop leaves run_id
# set to that step's id.
for step in pipeline_run.get_steps():
    run_id = step.id

In [56]:
run_id = 'e8b34de1-d74c-4d99-9fdd-d8033c6e9676'

In [57]:
def_blob_store.download(target_path="./run_output",
                        prefix=f"outputs/run_{run_id}", 
                        show_progress=True)

0

In [25]:
from azureml.core import Model
model = Model.register(
                    model_path=f"./run_output/model/run_{run_id}/inceptionv3_imagenet/",
                    model_name="Inceptionv3_tinyimagenet_model_run_2",
                    tags={'area': 'Tiny Imagenet Image Classification', 'type': 'image classification'},
                    description="InceptionV3 model to predict 200 classes",
                    workspace=ws,
                    model_framework=Model.Framework.TENSORFLOW,
                    model_framework_version="2.2.0")

Registering model Inceptionv3_tinyimagenet_model_run_2


In [18]:
Model.Framework.TENSORFLOW

'TensorFlow'

This training run took 30 minutes and 1 second, measured from the pipeline run's start time to its end time.
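As a quick check on that figure, here is a minimal sketch that recomputes the duration from the `startTimeUtc` and `endTimeUtc` values in the PipelineRun execution summary. The helper name `run_duration` is ours, not part of the SDK.

```python
from datetime import datetime
from typing import Tuple

# Timestamps copied from the PipelineRun execution summary.
START_UTC = "2021-05-27T22:23:26.144767Z"
END_UTC = "2021-05-27T22:53:27.542759Z"


def run_duration(start_utc: str, end_utc: str) -> Tuple[int, int]:
    """Return (minutes, whole seconds) between two UTC timestamps."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    elapsed = datetime.strptime(end_utc, fmt) - datetime.strptime(start_utc, fmt)
    # Drop the fractional seconds, then split into minutes and seconds.
    return divmod(int(elapsed.total_seconds()), 60)


minutes, seconds = run_duration(START_UTC, END_UTC)
print(f"{minutes} minutes and {seconds} second(s)")  # prints "30 minutes and 1 second(s)"
```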

# Part of original notebook

In [None]:
from azureml.pipeline.core import Pipeline
from azureml.core import Experiment

pipeline = Pipeline(workspace=ws, steps=[train_step])
print("Pipeline made. Submitting run...")
exp = Experiment(ws, "Training_InceptionV3")
pipeline_run = exp.submit(pipeline)
#run = pipeline_run.start_logging(outputs=None, snapshot_directory=None)
print("Submitted. Waiting for completion...")
pipeline_run.wait_for_completion()
print("Completed! Experiment should appear under the Experiments page.")

In [None]:
import os
print(output_data.datastore)

# Model Saved
The model has been saved to the workspace's default datastore. We can now load it into our workspace, save a local copy, register the model, etc.

We tried using the built-in workflow AzureML provides by saving directly to the outputs or logs directory, but we ran into issues that we don't fully understand yet.

Saving to those directories stored the files in a directory named "outputs" belonging to each step, not to the experiment. This meant we couldn't get the files from the PipelineRun object directly after training. We did find a workaround that has some merits, and we are still looking into why the files are not saved to the experiment's outputs and logs folders.

We can get to the saved model's location in a couple of steps. Previously, we specified an output file path for saving the model, so we can retrieve the files using the same datastore object.

This is done with **def_blob_store.download()**, which needs the local path to save the files to and the prefix of the path on the datastore.

We saved the models under the prefix model/run_{run_id}.

We can get the run id from the PipelineRun object we used to submit the training job.

In [None]:
for step in pipeline_run.get_steps():
    run_id = step.id
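The loop above keeps the id of whichever step is yielded last, which is fine for a single-step pipeline. For a pipeline with several steps, a minimal sketch of picking the step by name could look like the following. It assumes each step run exposes `name` and `id` attributes; `find_step_run_id` and the stand-in step objects (including the `prep.py` step) are hypothetical and only there so the example runs without a workspace.

```python
from types import SimpleNamespace


def find_step_run_id(steps, step_name):
    """Return the id of the first step run whose name matches step_name."""
    for step in steps:
        if step.name == step_name:
            return step.id
    raise LookupError(f"No step named {step_name!r} in this pipeline run")


# Stand-in step runs (hypothetical). With a real run, pass
# pipeline_run.get_steps() instead of this list.
fake_steps = [
    SimpleNamespace(name="prep.py", id="step-run-1"),
    SimpleNamespace(name="train.py", id="step-run-2"),
]
train_run_id = find_step_run_id(fake_steps, "train.py")
print(train_run_id)
```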

We had to use the id of the step rather than the id of the pipeline run. Every resource we looked at said that saving to **outputs** or **logs**, the two special directory names, would persist the files in the experiment's output.

E.g. saving an object named "example" as example.txt by doing joblib.dump(example, "objects/example.txt") in train.py would save it in the output of the step, not the experiment. This download workaround is temporary until we can figure out why this part is not working.

In [None]:
def_blob_store.download(target_path="./run_output",
                        prefix=f"model/run_{run_id}", 
                        show_progress=True)
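`download()` returns the number of files it downloaded, so a result of 0 (like the one shown under the earlier download cell) suggests the prefix matched nothing on the datastore. A minimal sketch of a wrapper that fails loudly in that case is below. `download_run_model` and `FakeDatastore` are hypothetical names; the fake only mimics the `download()` call so the example runs without a workspace.

```python
def download_run_model(datastore, run_id, target_path="./run_output"):
    """Download everything under model/run_{run_id}; raise if nothing matched."""
    prefix = f"model/run_{run_id}"
    # The datastore reports how many files were downloaded.
    n_files = datastore.download(target_path=target_path,
                                 prefix=prefix,
                                 show_progress=True)
    if n_files == 0:
        raise FileNotFoundError(f"No files found under prefix {prefix!r}")
    return n_files


class FakeDatastore:
    """Hypothetical stand-in for def_blob_store."""

    def __init__(self, known_prefixes):
        self.known_prefixes = known_prefixes

    def download(self, target_path, prefix=None, show_progress=True):
        # Pretend three files live under every known prefix.
        return 3 if prefix in self.known_prefixes else 0


store = FakeDatastore({"model/run_abc123"})
print(download_run_model(store, "abc123"))
```

With the real datastore, the call would be `download_run_model(def_blob_store, run_id)`.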

With the model downloaded, we can register the model, save a local copy, etc.

In [None]:
!ls ./run_output/model

In [None]:
!ls ./run_output/model/run_b9d700aa-5b98-4ef2-84f0-851393f692cd

We can see the saved files there. This solution is not as automated as we would like, however.
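One small step toward automating this is to list the downloaded files from Python using the current run id, instead of pasting the id into an `ls` command. This is only a sketch: `list_model_files` is a hypothetical helper, and it assumes the files were downloaded to ./run_output/model/run_{run_id} as in the cells above.

```python
from pathlib import Path


def list_model_files(run_id, root="./run_output"):
    """Return the relative paths of all files downloaded for one run, sorted."""
    run_dir = Path(root) / "model" / f"run_{run_id}"
    # rglob walks the run directory recursively; a missing directory yields nothing.
    return sorted(p.relative_to(run_dir).as_posix()
                  for p in run_dir.rglob("*") if p.is_file())
```

In the notebook this would be called as `list_model_files(run_id)`. An empty list means nothing was downloaded for that run.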

# Clean Up

Clean up the downloaded model once we're done using it. Run this cell only if you no longer need the model files.

In [None]:
!rm -rf ./run_output/