You should already have train.py, driver_training.py, and parameters.json in an experiment folder. The first step of the Machine Learning pipeline, which is created and run later in this notebook, uses these files.

In [1]:
!pwd

/mnt/batch/tasks/shared/LS_root/mounts/clusters/davew202105/code/git/MLOps-E2E-sdkv2/Lab12


In [2]:
# Set the folder for the experiment files used in Challenge 2
training_folder = 'driver-training'

## register_model.py
This script locates the saved model file and registers it in the workspace. It is the second step in the pipeline. For convenience, the script is written to the experiment folder from this notebook.

Note: AMLS pipelines are optional. If we prefer a more DevOps-centric approach, we can use GitHub Actions or Azure DevOps pipelines instead of AMLS pipelines.

In [3]:
%%writefile $training_folder/register_model.py
# Import libraries
import argparse
import joblib
from azureml.core import Workspace, Model, Run

# Get parameters
parser = argparse.ArgumentParser()
parser.add_argument('--model_folder', type=str, dest='model_folder', default="driver_model", help='model location')
args = parser.parse_args()
model_folder = args.model_folder

# Get the experiment run context
run = Run.get_context()

# Build the path to the saved model file
print("Loading model from " + model_folder)
model_name = 'driver_model'
model_file = model_folder + "/" + model_name + ".pkl"

# Get metrics for registration
## TODO
## HINT: Try storing the metrics in the parent run, which will be
##       accessible during both the training and registration
##       child runs using the 'run.parent' API.
## See https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run(class)?view=azure-ml-py#parent

Model.register(workspace=run.experiment.workspace,
               model_path=model_file,
               model_name=model_name,
               tags={'Training context': 'Pipeline'})

run.complete()

Writing driver-training/register_model.py


## Create an Azure Machine Learning Pipeline to Run the Scripts

See [this tutorial](https://github.com/MicrosoftDocs/mslearn-aml-labs/blob/master/05-Creating_a_Pipeline.ipynb) for a starting point.

Use the scikit-learn and lightgbm packages. In the environment cell below, scikit-learn is installed with conda and lightgbm with pip.

In [4]:
import azureml.core
from azureml.core import Workspace

# Load the workspace
ws = Workspace.from_config()

In [5]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "automl"  # Change this to the name of your compute cluster

# Verify that the compute cluster exists
try:
    pipeline_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If not, create it
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', 
                                                           vm_priority='lowpriority', 
                                                           max_nodes=4)
    pipeline_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

pipeline_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [6]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration

env_name = training_folder + '-env'

# Create a Python environment for the experiment
env = Environment(env_name)
env.python.user_managed_dependencies = False  # Let Azure ML manage dependencies
env.docker.enabled = True                     # Use a docker container

# Create the pip and conda package dependencies
dependencies = CondaDependencies.create(
    conda_packages=['scikit-learn', 'pandas'],
    pip_packages=['azureml-sdk','lightgbm'])

# Add the package dependencies to the Python environment for the experiment
env.python.conda_dependencies = dependencies

# Register the environment 
env.register(workspace=ws)
registered_env = Environment.get(ws, env_name)

# Create a new runconfig object for the pipeline
pipeline_run_config = RunConfiguration()

# Assign the target of the runconfig object to the cluster created above  
pipeline_run_config.target = pipeline_cluster

# Assign the environment of the runconfig object to the registered environment
pipeline_run_config.environment = registered_env

print("Run configuration created.")

'enabled' is deprecated. Please use the azureml.core.runconfig.DockerConfiguration object with the 'use_docker' param instead.


Run configuration created.
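
The deprecation warning above names DockerConfiguration with its use_docker parameter as the replacement for env.docker.enabled. A minimal sketch of that change follows. It assumes a recent azureml-core release in which RunConfiguration exposes a docker attribute; the helper name make_docker_run_config is hypothetical.

```python
def make_docker_run_config(environment, compute_target):
    """Build a RunConfiguration that enables Docker through DockerConfiguration."""
    # Imported lazily so the helper can be defined without the SDK installed.
    from azureml.core.runconfig import DockerConfiguration, RunConfiguration

    run_config = RunConfiguration()
    run_config.target = compute_target
    run_config.environment = environment
    # Assumed replacement for the deprecated `env.docker.enabled = True`.
    run_config.docker = DockerConfiguration(use_docker=True)
    return run_config


# Possible usage with the objects created in the cell above:
# pipeline_run_config = make_docker_run_config(registered_env, pipeline_cluster)
```

With this approach, the env.docker.enabled line would be dropped from the environment definition.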


In [7]:
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep, EstimatorStep
from azureml.train.estimator import Estimator

# Create a PipelineData (temporary Data Reference) for the model folder
model_folder = PipelineData(
    "model_folder", 
    datastore=ws.get_default_datastore())

# Create Estimator to train the model like we did before
estimator = Estimator(
    source_directory=training_folder,
    compute_target = pipeline_cluster,
    environment_definition=pipeline_run_config.environment,
    entry_script='driver_training.py')

# Create Step 1, which runs the estimator to train the model
train_step = EstimatorStep(
    name = "Train Model",
    estimator=estimator, 
    estimator_entry_script_arguments=['--output_folder', model_folder],
    outputs=[model_folder],
    compute_target = pipeline_cluster,
    allow_reuse = True)

# Create Step 2, which runs the model registration script
register_step = PythonScriptStep(name = "Register Model",
                                source_directory = training_folder,
                                script_name = "register_model.py",
                                arguments = ['--model_folder', model_folder],
                                inputs=[model_folder],
                                compute_target = pipeline_cluster,
                                runconfig = pipeline_run_config,
                                allow_reuse = True)

print("Pipeline steps defined")

'Estimator' is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or an Azure ML curated environment.
'EstimatorStep' is deprecated. Please use 'CommandStep' from 'azureml.pipeline.steps.CommandStep' instead. For an example see https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-commandstep.ipynb.


Pipeline steps defined
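
The warnings above mark Estimator and EstimatorStep as deprecated and recommend CommandStep. As a different and simpler alternative, the training step could reuse PythonScriptStep, which this notebook already uses for registration. The sketch below assumes that substitution. The helper name build_train_step is hypothetical, and its step_cls parameter exists only so the helper can be exercised without the SDK installed.

```python
def build_train_step(source_directory, model_folder, compute_target,
                     run_config, step_cls=None):
    """Create the training step as a PythonScriptStep instead of an EstimatorStep."""
    if step_cls is None:
        # Imported lazily so the helper can be defined without the SDK installed.
        from azureml.pipeline.steps import PythonScriptStep as step_cls

    return step_cls(
        name="Train Model",
        source_directory=source_directory,
        script_name="driver_training.py",
        # Same argument and output wiring as the EstimatorStep above.
        arguments=["--output_folder", model_folder],
        outputs=[model_folder],
        compute_target=compute_target,
        runconfig=run_config,
        allow_reuse=True,
    )


# Possible usage with the objects defined in this notebook:
# train_step = build_train_step(training_folder, model_folder,
#                               pipeline_cluster, pipeline_run_config)
```

The registration step would stay unchanged, since it already takes model_folder as its input.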


In [8]:
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.widgets import RunDetails

# Construct the pipeline, which contains Step 1 & 2
pipeline_steps = [train_step, register_step]
pipeline = Pipeline(workspace = ws, steps=pipeline_steps)
print("Pipeline is built.")

# Create an experiment and run the pipeline
experiment = Experiment(workspace = ws, name = 'driver-training-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
print("Pipeline submitted for execution.")

RunDetails(pipeline_run).show()
pipeline_run.wait_for_completion()

# Try monitoring the run from the Azure portal as well,
# and make sure you understand roughly what each step is doing.

'EstimatorStep' is deprecated. Please use 'CommandStep' from 'azureml.pipeline.steps.CommandStep' instead. For an example see https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-commandstep.ipynb.


Pipeline is built.
Created step Train Model [5a0d70b8][37131c3c-7720-404b-98fb-8dc06b1c8178], (This step will run and generate new outputs)
Created step Register Model [cd5139c5][e2d4d68c-8c17-47fe-85b8-bdfb3640cb1e], (This step will run and generate new outputs)
Submitted PipelineRun de66bf8b-fa49-4584-9184-d1c676565692
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/de66bf8b-fa49-4584-9184-d1c676565692?wsid=/subscriptions/52061d21-01dd-4f9e-aca9-60fff4d67ee2/resourcegroups/MLOpsWorkshop/workspaces/mlops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
Pipeline submitted for execution.



PipelineRunId: de66bf8b-fa49-4584-9184-d1c676565692
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/de66bf8b-fa49-4584-9184-d1c676565692?wsid=/subscriptions/52061d21-01dd-4f9e-aca9-60fff4d67ee2/resourcegroups/MLOpsWorkshop/workspaces/mlops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
PipelineRun Status: NotStarted
PipelineRun Status: Running


StepRunId: db0a8f0f-3511-495b-88cf-137a698bd522
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/db0a8f0f-3511-495b-88cf-137a698bd522?wsid=/subscriptions/52061d21-01dd-4f9e-aca9-60fff4d67ee2/resourcegroups/MLOpsWorkshop/workspaces/mlops&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
StepRun( Train Model ) Status: NotStarted
StepRun( Train Model ) Status: Running

Streaming azureml-logs/20_image_build_log.txt
2021/05/13 17:56:06 Downloading source code...
2021/05/13 17:56:07 Finished downloading source code
2021/05/13 17:56:08 Creating Docker network: acb_default_network, driver: 'bridge'
2021/05/13 17:56:08 Successfully set

'Finished'
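
Once the pipeline run has finished, the status of each step can also be checked in code rather than in the portal. This is a minimal sketch that assumes the pipeline run exposes get_children(), and that each child run has a name and a get_status() method. The helper names summarize_steps and failed_steps are hypothetical.

```python
def summarize_steps(pipeline_run):
    """Return {step name: status} for each child step run of a pipeline run."""
    return {child.name: child.get_status() for child in pipeline_run.get_children()}


def failed_steps(summary):
    """Return the names of steps whose status is not 'Finished'."""
    return [name for name, status in summary.items() if status != "Finished"]


# Possible usage after pipeline_run.wait_for_completion():
# summary = summarize_steps(pipeline_run)
# print(summary, failed_steps(summary))
```

An empty list from failed_steps means every step reported the same 'Finished' status that the pipeline run returned above.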

In [9]:
# Print each registered model's name, version, tags, and properties
from azureml.core import Model

for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name, tag in model.tags.items():
        print('\t', tag_name, ':', tag)
    for prop_name, prop in model.properties.items():
        print('\t', prop_name, ':', prop)
    print('\n')

driver_model version: 2
	 Training context : Pipeline


driver_model.pkl version: 2


driver_model version: 1
	 Training context : Pipeline


driver_model.pkl version: 1


compliance-classifier version: 17
	 type : classification
	 run_id : 0767a613-e8c2-4811-ae6b-64369e340725
	 build_number : 20201217.1


BikeBuyer.mml version: 4


AutoMLb9be0a22f28 version: 1


compliance-classifier version: 16
	 type : classification
	 run_id : a7eb32c9-3207-4a2c-865a-d65fba7946ba
	 build_number : 20201119.1


compliance-classifier version: 15
	 type : classification
	 run_id : 8e84d5f5-54c9-48c9-b733-9bbbcd86558b
	 build_number : 20201118.1


IBM_attrition_model version: 1
	 area : HR
	 type : attrition


AutoML015fd913221 version: 1


diabetes_model version: 1
	 Training context : Inline Training
	 AUC : 0.8857431111811085
	 Accuracy : 0.9002222222222223


glove-text-classifier version: 1
	 type : classification
	 run_id : glove-embeddings-classifier_1597854176_900c5cea


compliance-classifier ver