Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Build AML Pipeline with azureml modules

In this tutorial you will learn how to work with Azure ML modules:

1. Set up the environment: install the module CLI and the module/pipeline SDK.
2. Register a few sample modules in your AML workspace using the CLI.
3. Use the module/pipeline SDK to create a pipeline from the modules registered in step 2.




## Setup environment and prepare workspace
* Install the Azure CLI with the azure-cli-ml extension and set up an Azure ML workspace by following the [instructions here](azureml-module-get-started.ipynb).
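
`Workspace.from_config()`, used later in this tutorial, reads the workspace details from a `config.json` file. As a quick sanity check before running the notebook, a minimal sketch like the one below can confirm that the file contains the three fields the SDK expects. The helper names `missing_workspace_keys` and `check_workspace_config` are hypothetical and not part of the azureml SDK, and the default path assumes the file sits next to the notebook.

```python
import json
from pathlib import Path

# Fields that a workspace config.json is expected to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config):
    """Return the required keys that are absent or empty in a parsed config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def check_workspace_config(path="config.json"):
    """Load a config file (path is an assumption) and report missing keys."""
    config = json.loads(Path(path).read_text())
    return missing_workspace_keys(config)
```

An empty list means the file has everything `Workspace.from_config()` needs to locate the workspace; it does not verify that the values are valid or that you have access.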

## Prepare datasets

* Install azureml-dataprep, which is needed for dataset registration.
* Create the datasets to be used in the pipeline.

In [1]:
!pip install azureml-dataprep



Restart the kernel here so that the newly installed packages take effect.
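
After the restart, it can be useful to confirm that the package is actually visible to the kernel before continuing. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical, and `azureml.dataprep` is assumed to be the import name of the azureml-dataprep package.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package of a dotted name is missing
        return False


# Assumed import name for the azureml-dataprep package
dataprep_ready = is_importable("azureml.dataprep")
```

If `dataprep_ready` is `False`, the kernel has most likely not been restarted, or the package was installed into a different environment.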

In [2]:
from azureml.core import Workspace, Run, Dataset
from azureml.pipeline.wrapper import Pipeline, Module, dsl

ws = Workspace.from_config()

In [3]:
# get dataset
training_data_name = 'aml_module_training_data'
test_data_name = 'aml_module_test_data'

if training_data_name not in ws.datasets:
    print('Registering a training dataset for sample pipeline ...')
    train_data = Dataset.File.from_files(path=['https://dprepdata.blob.core.windows.net/demo/Titanic.csv'])
    train_data.register(workspace = ws, 
                        name = training_data_name, 
                        description = 'Training data (just for illustrative purpose)')
    print('Registered')

if test_data_name not in ws.datasets:
    print('Registering a test dataset for sample pipeline ...')
    test_data = Dataset.File.from_files(path=['https://dprepdata.blob.core.windows.net/demo/Titanic.csv'])
    test_data.register(workspace = ws, 
                       name = test_data_name, 
                       description = 'Test data (just for illustrative purpose)')
    print('Registered')

train_data = Dataset.get_by_name(ws, name=training_data_name)
test_data = Dataset.get_by_name(ws, name=test_data_name)
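
The two registration blocks above follow the same register-if-missing pattern. One possible way to factor it out is sketched below. The helper `ensure_file_dataset` and its `dataset_cls` parameter are hypothetical; the parameter exists only so the helper can be exercised without the SDK, and it defaults to `azureml.core.Dataset`, using the same calls as the cell above.

```python
def ensure_file_dataset(ws, name, url, description, dataset_cls=None):
    """Register a file dataset under `name` if it is missing, then return it."""
    if dataset_cls is None:
        # Imported lazily so the helper can be defined without the SDK installed
        from azureml.core import Dataset as dataset_cls

    if name not in ws.datasets:
        dataset = dataset_cls.File.from_files(path=[url])
        dataset.register(workspace=ws, name=name, description=description)

    # Always fetch by name so callers get the registered version
    return dataset_cls.get_by_name(ws, name=name)


# Usage inside the notebook (assumes `ws` from the earlier cell):
# train_data = ensure_file_dataset(
#     ws, 'aml_module_training_data',
#     'https://dprepdata.blob.core.windows.net/demo/Titanic.csv',
#     'Training data (just for illustrative purpose)')
```

Because the helper checks `ws.datasets` first, calling it repeatedly does not register the dataset again.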

## Register modules

* Register the modules to be used in the pipeline using the azureml CLI.
* Load the modules with the module SDK.

In [4]:
!az ml module register --spec-file resources/create_pipeline/mpi_train/mpi_train.yaml
!az ml module register --spec-file resources/create_pipeline/score/score.yaml
!az ml module register --spec-file resources/create_pipeline/evaluate/eval.yaml
!az ml module register --spec-file resources/create_pipeline/compare_two_models/compare2.yaml

Version 0.0.1 already exists in module MPI Train (namespace: microsoft.com/aml/samples)
{}
Version 0.0.1 already exists in module Score (namespace: microsoft.com/aml/samples)
{}
Version 0.0.1 already exists in module Evaluate (namespace: microsoft.com/aml/samples)
{}
Version 0.0.1 already exists in module Compare 2 Models (namespace: microsoft.com/aml/samples)
{}

In [5]:
# list the custom modules available in the AML workspace
!az ml module list -o table

DefaultVersion    Name              Namespace                  Status
----------------  ----------------  -------------------------  --------
0.0.1             MPI Train         microsoft.com/aml/samples  Active
0.0.1             Score             microsoft.com/aml/samples  Active
0.0.1             Evaluate          microsoft.com/aml/samples  Active
0.0.1             Compare 2 Models  microsoft.com/aml/samples  Active

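If you want to check the listing programmatically, for example to confirm that every module is `Active` before loading it, the table text can be parsed with the standard library. The sketch below is illustrative only: `parse_cli_table` is a hypothetical helper, the sample text is copied from the output above, and it assumes columns are separated by at least two spaces and that the terminal may add ANSI escape sequences.

```python
import re

# Matches terminal control sequences such as ESC[K or ESC[33m
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def parse_cli_table(text):
    """Parse `-o table` style output into a list of dicts keyed by column header."""
    clean = ANSI_ESCAPE.sub("", text)
    lines = [line.strip() for line in clean.splitlines() if line.strip()]
    if not lines:
        return []
    header = re.split(r"\s{2,}", lines[0])
    rows = []
    for line in lines[1:]:
        if set(line) <= {"-", " "}:
            continue  # skip the dashed separator row
        rows.append(dict(zip(header, re.split(r"\s{2,}", line))))
    return rows


SAMPLE = """\x1b[KDefaultVersion    Name              Namespace                  Status
----------------  ----------------  -------------------------  --------
0.0.1             MPI Train         microsoft.com/aml/samples  Active
0.0.1             Compare 2 Models  microsoft.com/aml/samples  Active
\x1b[0m"""

modules = parse_cli_table(SAMPLE)
inactive = [m["Name"] for m in modules if m["Status"] != "Active"]
```

In a notebook, the text could be captured with IPython's `output = !az ml module list -o table` and passed in as `"\n".join(output)`.
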
In [6]:
# get modules
train_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='MPI Train')
score_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Score')
eval_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Evaluate')
compare_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Compare 2 Models')

# If you have an unpublished module locally or on GitHub, the functions below let you test it as an anonymous module
# compare_module_func = Module.load_from_yaml(ws, yaml_file='./CompareModdels/compare2.yaml')
# compare_module_func = Module.load_from_yaml(ws, yaml_file='https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/compare2.yaml')

## Create compute target

* If the compute target doesn't exist yet, create one to run the pipeline.

In [7]:
COMPUTE_TARGET_NAME = 'pipeline-compute'

In [8]:
!az ml computetarget create amlcompute --name=$COMPUTE_TARGET_NAME --vm-size=Standard_DS2_v2 --max-nodes=4

Provisioning compute resources...
Resource creation submitted successfully.
{
  "location": "eastus",
  "name": "pipeline-compute",
  "provisioningErrors": null,
  "provisioningState": "Succeeded"
}

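The CLI command above always attempts to create the compute target. A check-then-create variant can also be written with the Python SDK, as in the minimal sketch below. The helper name `get_or_create_compute` is hypothetical; it assumes the `azureml.core.compute` classes from the v1 SDK and reuses the VM size and node count from the CLI call.

```python
def get_or_create_compute(ws, name, vm_size='Standard_DS2_v2', max_nodes=4):
    """Return the named compute target, provisioning it only if it is missing."""
    existing = ws.compute_targets
    if name in existing:
        return existing[name]

    # Imported lazily so the lookup path works without the SDK installed
    from azureml.core.compute import AmlCompute, ComputeTarget

    config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                   max_nodes=max_nodes)
    target = ComputeTarget.create(ws, name, config)
    target.wait_for_completion(show_output=True)
    return target


# Usage inside the notebook (assumes `ws` and COMPUTE_TARGET_NAME from earlier cells):
# compute_target = get_or_create_compute(ws, COMPUTE_TARGET_NAME)
```
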
## Create pipeline
You can build a pipeline either with the SDK or by drag-and-drop in the [Designer](https://ml.azure.com) in the workspace portal.

The module SDK:
* Simplifies the syntax to give an experience consistent with drag-and-drop
* Supports IntelliSense and docstrings, so you don't have to work with dicts all the time
* Supports creating a pipeline with unpublished modules


In [9]:
# define a sub pipeline
@dsl.pipeline(name = 'A sub pipeline including train/score/eval', 
              description = 'train model and evaluate model perf')
def training_pipeline(input_data, learning_rate):
    train = train_module_func(
        training_data=input_data, 
        max_epochs=5, 
        learning_rate=learning_rate)

    train.runsettings.configure(process_count_per_node = 2, node_count = 2)

    score = score_module_func(
        model_input=train.outputs.model_output, 
        test_data=test_data)

    # named `evaluate` to avoid shadowing the built-in eval()
    evaluate = eval_module_func(scoring_result=score.outputs.score_output)
    
    return {'eval_output': evaluate.outputs.eval_output, 'model_output': train.outputs.model_output}

In [10]:
# define pipeline with sub pipeline
@dsl.pipeline(name = 'A dummy pipeline that trains multiple models and output the best one', 
              description = 'select best model trained with different learning rate',
              default_compute_target = COMPUTE_TARGET_NAME)
def dummy_automl_pipeline():
    train_and_evaluate_model1 = training_pipeline(train_data, 0.01)
    train_and_evaluate_model2 = training_pipeline(train_data, 0.02)
    
    compare = compare_module_func(
        model1=train_and_evaluate_model1.outputs.model_output, 
        eval_result1=train_and_evaluate_model1.outputs.eval_output,
        model2=train_and_evaluate_model2.outputs.model_output,
        eval_result2=train_and_evaluate_model2.outputs.eval_output
    )

    return {**compare.outputs}

# create a pipeline
pipeline = dummy_automl_pipeline()

In [12]:
# validate pipeline and visualize the graph
pipeline.validate()

Tuple timeout setting is deprecated


{'result': 'validation passed', 'errors': []}
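
When the notebook runs unattended, it may be safer to stop before saving or submitting if validation reports problems. The sketch below assumes that `pipeline.validate()` returns a dict shaped like the output above and that failures are listed under `'errors'`; the helper name `raise_on_validation_errors` is hypothetical.

```python
def raise_on_validation_errors(result):
    """Raise if the validation result lists any errors; otherwise return it."""
    errors = result.get('errors') or []
    if errors:
        raise ValueError(
            f'Pipeline validation failed with {len(errors)} error(s): {errors}')
    return result


# Usage inside the notebook (assumes `pipeline` from the earlier cell):
# raise_on_validation_errors(pipeline.validate())
```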

In [13]:
# save as a draft
pipeline.save(experiment_name = 'pipeline-with-azureml-module')

| Name | Id | Details page | Pipeline type | Updated on | Created by | Tags |
| --- | --- | --- | --- | --- | --- | --- |
| A dummy pipeline that trains multiple models and output the best one | 2a805968-37f1-4d27-8018-0a447db537db | Link | TrainingPipeline | June 09, 2020 07:12 AM | Zhidong Zhu | azureml.Designer: true |


In [14]:
# Submit a pipeline run
run = pipeline.submit(experiment_name = 'pipeline-with-azureml-module')
run.wait_for_completion()


StepRunId: a16e6eca-411c-43ff-8256-4266e022c2e1
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/pipeline-with-azureml-module/runs/a16e6eca-411c-43ff-8256-4266e022c2e1?wsid=/subscriptions/e9b2ec51-5c94-4fa8-809a-dc1e695e4896/resourcegroups/sample-ws-rg/workspaces/sample-ws

StepRun(Evaluate) Execution Summary
StepRun( Evaluate ) Status: Completed
{'runId': 'a16e6eca-411c-43ff-8256-4266e022

<RunStatus.completed: 'Completed'>