## Set experiment folder

**NOTE:** This template notebook assumes you have successfully completed Challenge 2. You should already have train.py, driver_training.py, and parameters.json in an experiment folder. This notebook creates another script, register_model.py, which registers the model and is added to the experiment folder (defined in the cell below).

In [1]:
# Set the folder for the experiment files used in Challenge 2
training_folder = 'driver-training'

## register_model.py
This file loads the model from where it was saved, and then registers it in the workspace.  
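The script below builds the model path from the `--model_folder` argument and loads the file directly, so a missing file only surfaces as a load error. As a minimal sketch of a fail-fast check, the snippet below uses a hypothetical helper, `resolve_model_file`, which is not part of the original script. It assumes the model file is named `driver_model.pkl`, as in the script, and uses a temporary directory as a stand-in for the model folder.

```python
import os
import tempfile


def resolve_model_file(model_folder, file_name="driver_model.pkl"):
    """Return the path to the model file, or raise if it is missing.

    Hypothetical helper for illustration; not part of register_model.py.
    """
    model_file = os.path.join(model_folder, file_name)
    if not os.path.isfile(model_file):
        raise FileNotFoundError("No model file at " + model_file)
    return model_file


# Try the helper against a throwaway folder standing in for the model folder.
with tempfile.TemporaryDirectory() as folder:
    # Create an empty placeholder file with the expected name.
    open(os.path.join(folder, "driver_model.pkl"), "wb").close()
    found_name = os.path.basename(resolve_model_file(folder))

    # A folder without the file should raise instead of returning a path.
    try:
        resolve_model_file(os.path.join(folder, "missing"))
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
```

A check like this could run just before the `joblib.load` call, so that a wrong folder argument produces a clear message in the step log.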

In [2]:
%%writefile $training_folder/register_model.py
# Import libraries
import argparse
import joblib
from azureml.core import Workspace, Model, Run

# Get parameters
parser = argparse.ArgumentParser()
parser.add_argument(
    '--model_folder',
    type=str,
    dest='model_folder',
    default="driver_model",
    help='model location')
args = parser.parse_args()
model_folder = args.model_folder

# Get the experiment run context
run = Run.get_context()

# Load the model
print("Loading model from " + model_folder)
model_file = model_folder + "/driver_model.pkl"
model = joblib.load(model_file)

Model.register(workspace=run.experiment.workspace,
               model_path = model_file,
               model_name = 'driver_model',
               tags={'Training context':'Pipeline'})

run.complete()

Overwriting driver-training/register_model.py


In [3]:
import azureml.core
from azureml.core import Workspace

# Load the workspace
ws = Workspace.from_config()

## Create an Azure Machine Learning Pipeline to Run the Scripts as a Pipeline

See [this tutorial](https://github.com/MicrosoftDocs/mslearn-aml-labs/blob/master/05-Creating_a_Pipeline.ipynb) for a starting point.

Use the scikit-learn and lightgbm conda packages.
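The pipeline in this notebook has two steps that communicate only through a shared folder: the training step writes the model file into it, and the registration step reads the file back. The sketch below is a local stand-in for that handoff, not Azure ML code. It assumes that driver_training.py saves `driver_model.pkl` into the folder it receives as `--output_folder`; it uses `pickle` and a placeholder dictionary instead of joblib and a real model, and a temporary directory in place of the pipeline's intermediate storage. The function names are hypothetical.

```python
import os
import pickle
import tempfile


def train_step(output_folder):
    """Stand-in for the training script: save a 'model' into output_folder."""
    os.makedirs(output_folder, exist_ok=True)
    model = {"name": "placeholder model"}  # not a real LightGBM model
    with open(os.path.join(output_folder, "driver_model.pkl"), "wb") as f:
        pickle.dump(model, f)


def register_step(model_folder):
    """Stand-in for the registration script: load the model from model_folder."""
    with open(os.path.join(model_folder, "driver_model.pkl"), "rb") as f:
        return pickle.load(f)


# The temporary directory plays the role of the folder shared by both steps.
with tempfile.TemporaryDirectory() as shared_folder:
    train_step(shared_folder)               # Step 1 writes the model file
    loaded = register_step(shared_folder)   # Step 2 reads the same file
```

The point of the sketch is the contract between the steps: both must agree on the folder and the file name, which is why the same folder object is passed as an output of the first step and an input of the second.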

In [4]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Verify that the compute cluster exists
# If not, create it
## TODO
cluster_name = "team5hacker2"

try:
    pipeline_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If not, create it
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4,
                                                           idle_seconds_before_scaledown=1800)
    pipeline_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

pipeline_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [8]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration

# Create a Python environment for the experiment
# Let Azure ML manage dependencies by setting user_managed_dependencies to False
# Use docker containers by setting docker.enabled to True
## TODO
driver_env = Environment("driver-pipeline-env")
driver_env.python.user_managed_dependencies = False # Let Azure ML manage dependencies
driver_env.docker.enabled = True # Use a docker container

# Create the pip and conda package dependencies
## TODO
driver_packages = CondaDependencies.create(conda_packages=['scikit-learn','pandas','lightgbm'],
                                           pip_packages=['azureml-sdk'])

# Add the package dependencies to the Python environment for the experiment
## TODO
driver_env.python.conda_dependencies = driver_packages

# Register the environment
## TODO
driver_env.register(workspace=ws)
registered_env = Environment.get(ws, 'driver-pipeline-env')

# Create a new runconfig object for the pipeline
## TODO
pipeline_run_config = RunConfiguration()

# Assign the target of the runconfig object to the cluster created above  
## TODO
pipeline_run_config.target = pipeline_cluster

# Assign the environment of the runconfig object to the registered environment
## TODO
pipeline_run_config.environment = registered_env

print ("Run configuration created.")


Run configuration created.


In [12]:
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep, EstimatorStep
from azureml.train.estimator import Estimator

# Create a PipelineData (temporary Data Reference) for the model folder
## TODO
training_folder = 'driver-training'
experiment_folder = 'driver-training'
model_folder = PipelineData("model_folder", datastore=ws.get_default_datastore())

driver_ds = ws.datasets.get("driver dataset")

# Create Estimator to train the model as in Challenge 2
## TODO
estimator = Estimator(source_directory=training_folder,
                      compute_target = pipeline_cluster,
                      environment_definition=pipeline_run_config.environment,
                      entry_script='driver_training.py')

# Create Step 1, which runs the estimator to train the model
## TODO
train_step = EstimatorStep(name = "Train Model",
                           estimator=estimator, 
                           estimator_entry_script_arguments=['--output_folder', model_folder],
                           inputs=[driver_ds.as_named_input('driver_train')],
                           outputs=[model_folder],
                           compute_target = pipeline_cluster,
                           allow_reuse = True)
# Create Step 2, which runs the model registration script
## TODO
register_step = PythonScriptStep(name = "Register Model",
                                source_directory = experiment_folder,
                                script_name = "register_model.py",
                                arguments = ['--model_folder', model_folder],
                                inputs=[model_folder],
                                compute_target = pipeline_cluster,
                                runconfig = pipeline_run_config,
                                allow_reuse = True)

print("Pipeline steps defined")

Pipeline steps defined


In [13]:
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.widgets import RunDetails

# Construct the pipeline, which contains Steps 1 and 2
pipeline_steps = [train_step, register_step]
pipeline = Pipeline(workspace = ws, steps=pipeline_steps)
print("Pipeline is built.")

# Create an experiment and run the pipeline
experiment = Experiment(workspace = ws, name = 'driver-training-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
print("Pipeline submitted for execution.")

RunDetails(pipeline_run).show()
pipeline_run.wait_for_completion()

Pipeline is built.
Created step Train Model [8517720d][7003a134-8b49-4efb-9382-25bd7973ad52], (This step will run and generate new outputs)
Created step Register Model [c9f80262][570e562e-9f5a-4694-aa4d-49d7893b28ca], (This step will run and generate new outputs)
Submitted PipelineRun 410437fd-0559-43e4-9ecf-6973d097bcb7
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/driver-training-pipeline/runs/410437fd-0559-43e4-9ecf-6973d097bcb7?wsid=/subscriptions/b4f30574-19b5-4753-926d-877888e82fc4/resourcegroups/oh-dsdata-data/workspaces/team5ws
Pipeline submitted for execution.



PipelineRunId: 410437fd-0559-43e4-9ecf-6973d097bcb7
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/driver-training-pipeline/runs/410437fd-0559-43e4-9ecf-6973d097bcb7?wsid=/subscriptions/b4f30574-19b5-4753-926d-877888e82fc4/resourcegroups/oh-dsdata-data/workspaces/team5ws
PipelineRun Status: Running


StepRunId: a446cd3b-0f32-4c90-8d71-f7b114a226d0
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/driver-training-pipeline/runs/a446cd3b-0f32-4c90-8d71-f7b114a226d0?wsid=/subscriptions/b4f30574-19b5-4753-926d-877888e82fc4/resourcegroups/oh-dsdata-data/workspaces/team5ws
StepRun( Train Model ) Status: NotStarted
StepRun( Train Model ) Status: Queued

Streaming azureml-logs/20_image_build_log.txt
StepRun( Train Model ) Status: Running
2020/05/27 12:34:33 Downloading source code...
2020/05/27 12:34:34 Finished downloading source code
2020/05/27 12:34:34 Creating Docker network: acb_default_network, driver: 'bridge'
2020/05/27 12:34:35 Successfull

'Finished'

In [14]:
# Print the name, version, tags, and properties of each registered model
from azureml.core import Model

for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name, tag in model.tags.items():
        print('\t', tag_name, ':', tag)
    for prop_name, prop in model.properties.items():
        print('\t', prop_name, ':', prop)
    print('\n')

driver_model version: 1
	 Training context : Pipeline


driver_model.pkl version: 12
	 auc : 0.6377511613946426


driver_model.pkl version: 11
	 auc : 0.6380025131414137


driver_model.pkl version: 10
	 metrics : {'driver-training_1590531244_e6526452': {'learning_rate': 0.04, 'boosting_type': 'gbdt', 'objective': 'binary', 'metric': 'auc', 'sub_feature': 0.7, 'num_leaves': 60, 'min_data': 100, 'verbose': 0, 'min_hessian': 1, 'auc': 0.6380025131414137}}


driver_model.pkl version: 9
	 metrics : {'driver-training_1590530540_fe469c8e': {'learning_rate': 0.02, 'boosting_type': 'gbdt', 'objective': 'binary', 'metric': 'auc', 'num_leaves': 60, 'sub_feature': 0.7, 'verbose': 0, 'min_hessian': 1, 'min_data': 100, 'auc': 0.6377511613946426}}


driver_model.pkl version: 8
	 auc : {'driver-training_1590530540_fe469c8e': {'learning_rate': 0.02, 'boosting_type': 'gbdt', 'objective': 'binary', 'metric': 'auc', 'num_leaves': 60, 'sub_feature': 0.7, 'verbose': 0, 'min_hessian': 1, 'min_data': 100, 'au