Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.


# How to Publish a Pipeline and Invoke the REST endpoint
This notebook shows how to publish a pipeline and then invoke it through its REST endpoint.

### Initialization Steps

In [1]:
import azureml.core
from azureml.core import Workspace, Datastore, Experiment, Dataset
from azureml.data import OutputFileDatasetConfig
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core.graph import PipelineParameter

print("Pipeline SDK-specific imports completed")

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

# Default datastore (Azure blob storage)
# def_blob_store = ws.get_default_datastore()
def_blob_store = Datastore(ws, "workspaceblobstore")
print("Blobstore's name: {}".format(def_blob_store.name))

SDK version: 1.20.0
Pipeline SDK-specific imports completed
opendatasetspmworkspace2
opendatasetspmrg
eastus2
21d8f407-c4c4-452e-87a4-e609bfb86248
Blobstore's name: workspaceblobstore


### Compute Targets
#### Retrieve an already attached Azure Machine Learning Compute

In [14]:
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your cluster.
amlcompute_cluster_name = "cpu-cluster"

found = False
# Check if this compute target already exists in the workspace.
cts = ws.compute_targets
if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':
    found = True
    print('Found existing compute target.')
    # Bind the existing cluster to aml_compute, the name the pipeline steps below use.
    aml_compute = cts[amlcompute_cluster_name]
    
if not found:
    print('Creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = "STANDARD_D2_V2", # for GPU, use "STANDARD_NC6"
                                                                #vm_priority = 'lowpriority', # optional
                                                                max_nodes = 4)

    # Create the cluster.
    aml_compute = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)
    
    # Can poll for a minimum number of nodes and for a specific timeout.
    # If no min_node_count is provided, it will use the scale settings for the cluster.
    aml_compute.wait_for_completion(show_output = True, timeout_in_minutes = 10)
    
    # For a more detailed view of current AmlCompute status, use get_status().

Found existing compute target.


In [4]:
# For a more detailed view of current Azure Machine Learning Compute status, use get_status()
# example: un-comment the following line.
# print(aml_compute.get_status().serialize())

## Building Pipeline Steps with Inputs and Outputs
A step in the pipeline can take a [dataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) as input. This dataset can be a data source that lives in one of the accessible data locations, or intermediate data produced by a previous step in the pipeline.

In [5]:
# Uploading data to the datastore
data_path = def_blob_store.upload_files(["./20news.pkl"], target_path="20newsgroups", overwrite=True)

Uploading an estimated of 1 files
Uploading ./20news.pkl
Uploaded ./20news.pkl, 1 files out of an estimated total of 1
Uploaded 1 files


In [15]:
# Reference the data uploaded to blob storage using file dataset
# Assign the datasource to blob_input_data variable
blob_input_data = Dataset.File.from_files(data_path).as_named_input("test_data")
print("Dataset created")

Dataset created


In [16]:
# Define intermediate data using OutputFileDatasetConfig
processed_data1 = OutputFileDatasetConfig(name="processed_data1")
print("Output dataset object created")

Output dataset object created


#### Define a Step that consumes a dataset and produces intermediate data.
This step consumes a dataset and produces intermediate data.

**Open `train.py` on your local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.**

The best practice is to use a separate folder for each step's script and its dependent files, and to specify that folder as the step's `source_directory`. This reduces the size of the snapshot created for the step, since only that folder is snapshotted. Because a change to any file in the `source_directory` triggers a re-upload of the snapshot, keeping the folders separate also lets a step be reused when nothing in its own `source_directory` has changed.

In [17]:
# trainStep consumes the file dataset (blob_input_data) created above
# and produces processed_data1

source_directory = "publish_run_train"

trainStep = PythonScriptStep(
    script_name="train.py",
    arguments=["--input_data", blob_input_data.as_mount(), "--output_train", processed_data1],
    compute_target=aml_compute,
    source_directory=source_directory
)
print("trainStep created")

trainStep created
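
The contents of `train.py` are not shown in this notebook. As a minimal sketch, assuming the script uses `argparse`, it might read the two arguments passed by `trainStep` as follows. The function names, the `processed.txt` file name, and the file contents are hypothetical; only the argument names `--input_data` and `--output_train` come from the step definition above.

```python
import argparse
import os
import tempfile


def parse_args(argv=None):
    """Parse the two arguments that trainStep passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py sketch")
    # At run time each argument arrives as a plain path string.
    parser.add_argument("--input_data", type=str, help="path of the mounted input dataset")
    parser.add_argument("--output_train", type=str, help="folder for the step's output")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Create the output folder before writing to it.
    os.makedirs(args.output_train, exist_ok=True)
    marker = os.path.join(args.output_train, "processed.txt")
    with open(marker, "w") as handle:
        handle.write("input was read from: {}\n".format(args.input_data))
    return marker


# Demo: a temporary folder stands in for the paths supplied at run time.
with tempfile.TemporaryDirectory() as tmp:
    written = main(["--input_data", tmp, "--output_train", os.path.join(tmp, "out")])
    print(os.path.basename(written))  # processed.txt
```

If the argument names in the script and in the step's `arguments` list do not match, `argparse` rejects the command line, which is why the names matter.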


#### Define a Step that consumes intermediate data and produces intermediate data
This step consumes the intermediate data produced by the previous step and produces new intermediate data.

**Open `extract.py` on your local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.**

In [18]:
# extractStep to use the intermediate data produced by trainStep
# This step also produces an output processed_data2
processed_data2 = OutputFileDatasetConfig(name="processed_data2")
source_directory = "publish_run_extract"

extractStep = PythonScriptStep(
    script_name="extract.py",
    arguments=["--input_extract", processed_data1.as_input(), "--output_extract", processed_data2],
    compute_target=aml_compute, 
    source_directory=source_directory)
print("extractStep created")

extractStep created


#### Define a Step that consumes multiple intermediate data and produces intermediate data
This step consumes the intermediate data produced by both previous steps and produces new intermediate data.

### PipelineParameter

This step also takes a [PipelineParameter](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.graph.pipelineparameter?view=azure-ml-py) argument, whose value can be set when calling the REST endpoint of the published pipeline.

In [19]:
# We will use this later when publishing the pipeline
pipeline_param = PipelineParameter(name="pipeline_arg", default_value=10)
print("pipeline parameter created")

pipeline parameter created


**Open `compare.py` on your local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.**

In [20]:
# Now define compareStep, which takes two inputs (both intermediate data) and produces an output
processed_data3 = OutputFileDatasetConfig(name="processed_data3")

# You can register the output as a dataset after job completion
processed_data3 = processed_data3.register_on_complete("compare_result")

source_directory = "publish_run_compare"

compareStep = PythonScriptStep(
    script_name="compare.py",
    arguments=[
        "--compare_data1", processed_data1.as_input(),
        "--compare_data2", processed_data2.as_input(),
        "--output_compare", processed_data3,
        "--pipeline_param", pipeline_param,
    ],
    compute_target=aml_compute,
    source_directory=source_directory)
print("compareStep created")

compareStep created
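
The contents of `compare.py` are not shown here either. The sketch below assumes the script uses `argparse` and that the pipeline parameter reaches it as a command-line string, so `type=int` is used to convert it. The function name is hypothetical, and the local default of 10 simply mirrors the `default_value` given to the `PipelineParameter`.

```python
import argparse


def parse_compare_args(argv=None):
    """Parse the four arguments that compareStep passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical compare.py sketch")
    parser.add_argument("--compare_data1", type=str, help="first intermediate input")
    parser.add_argument("--compare_data2", type=str, help="second intermediate input")
    parser.add_argument("--output_compare", type=str, help="folder for the step's output")
    # Command-line values are strings, so convert the parameter explicitly.
    parser.add_argument("--pipeline_param", type=int, default=10)
    return parser.parse_args(argv)


args = parse_compare_args([
    "--compare_data1", "/tmp/in1",
    "--compare_data2", "/tmp/in2",
    "--output_compare", "/tmp/out",
    "--pipeline_param", "45",
])
print(args.pipeline_param)  # 45
```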


#### Build the pipeline

In [22]:
pipeline1 = Pipeline(workspace=ws, steps=[compareStep])
print("Pipeline is built")

Pipeline is built
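
Only `compareStep` is passed to `Pipeline`, yet the train and extract steps still become part of it, because `compareStep` consumes the data they produce. The toy model below illustrates that idea with plain dictionaries. It is not the SDK's implementation: the function name and the dictionary layout are made up for illustration, and the step graph is assumed to be acyclic.

```python
def resolve_steps(final_steps, steps):
    """Return step names in a runnable order, pulling in upstream producers.

    `steps` maps a step name to a dict with "inputs" and "outputs" lists.
    """
    # Map every output name to the step that produces it.
    producer = {out: name for name, spec in steps.items() for out in spec["outputs"]}
    ordered = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for data in steps[name]["inputs"]:
            # External datasets have no producer, so they need no upstream step.
            if data in producer:
                visit(producer[data])
        # A step is appended only after everything it depends on.
        ordered.append(name)

    for name in final_steps:
        visit(name)
    return ordered


steps = {
    "train.py": {"inputs": ["test_data"], "outputs": ["processed_data1"]},
    "extract.py": {"inputs": ["processed_data1"], "outputs": ["processed_data2"]},
    "compare.py": {"inputs": ["processed_data1", "processed_data2"],
                   "outputs": ["processed_data3"]},
}
print(resolve_steps(["compare.py"], steps))  # ['train.py', 'extract.py', 'compare.py']
```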


## Run published pipeline
### Publish the pipeline

In [23]:
published_pipeline1 = pipeline1.publish(name="My_New_Pipeline", description="My Published Pipeline Description", continue_on_step_failure=True)
published_pipeline1

Created step compare.py [b584a867][4dd8abd7-a672-42aa-aeed-c60f32710e53], (This step will run and generate new outputs)
Created step train.py [c3f35dca][cf5b513e-9d8e-42bf-ba38-35886ad97557], (This step will run and generate new outputs)
Created step extract.py [281e8e0d][428bdecc-00de-4c66-a301-362034ef1ae8], (This step will run and generate new outputs)


Name,Id,Status,Endpoint
My_New_Pipeline,a01a396f-cf92-4c59-8ed4-dcf5979aef18,Active,REST Endpoint


Note: the `continue_on_step_failure` parameter specifies whether execution of the other steps in the pipeline continues if one step fails. The default value is `False`: when one step fails, pipeline execution stops and any running steps are canceled.

### Publish the pipeline from a submitted PipelineRun
It is also possible to publish a pipeline from a submitted `PipelineRun`.

In [24]:
# submit a pipeline run
pipeline_run1 = Experiment(ws, 'Pipeline_experiment').submit(pipeline1)
# publish a pipeline from the submitted pipeline run
published_pipeline2 = pipeline_run1.publish_pipeline(name="My_New_Pipeline2", description="My Published Pipeline Description", version="0.1", continue_on_step_failure=True)
published_pipeline2

Submitted PipelineRun 3f60e1ac-0f42-4527-9247-63c0db12fedd
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/Pipeline_experiment/runs/3f60e1ac-0f42-4527-9247-63c0db12fedd?wsid=/subscriptions/21d8f407-c4c4-452e-87a4-e609bfb86248/resourcegroups/opendatasetspmrg/workspaces/opendatasetspmworkspace2


Name,Id,Status,Endpoint
My_New_Pipeline2,3aa728fe-fde8-48e7-85a8-ec84caab5288,Active,REST Endpoint


### Get published pipeline

You can retrieve a published pipeline by its **pipeline id**.

To get all the published pipelines for a given workspace (`ws`):
```python
all_pub_pipelines = PublishedPipeline.get_all(ws)
```

In [25]:
from azureml.pipeline.core import PublishedPipeline

pipeline_id = published_pipeline1.id # use your published pipeline id
published_pipeline = PublishedPipeline.get(ws, pipeline_id)
published_pipeline

Name,Id,Status,Endpoint
My_New_Pipeline,a01a396f-cf92-4c59-8ed4-dcf5979aef18,Active,REST Endpoint


### Run published pipeline using its REST endpoint
[This notebook](https://aka.ms/pl-restep-auth) shows how to authenticate to an Azure Machine Learning workspace.

In [26]:
from azureml.core.authentication import InteractiveLoginAuthentication
import requests

auth = InteractiveLoginAuthentication()
aad_token = auth.get_authentication_header()

rest_endpoint = published_pipeline.endpoint

print("You can perform HTTP POST on URL {} to trigger this pipeline".format(rest_endpoint))

# specify the param when running the pipeline
response = requests.post(rest_endpoint, 
                         headers=aad_token, 
                         json={"ExperimentName": "My_Pipeline1",
                               "RunSource": "SDK",
                               "ParameterAssignments": {"pipeline_arg": 45}})

You can perform HTTP POST on URL https://eastus2.api.azureml.ms/pipelines/v1.0/subscriptions/21d8f407-c4c4-452e-87a4-e609bfb86248/resourceGroups/opendatasetspmrg/providers/Microsoft.MachineLearningServices/workspaces/opendatasetspmworkspace2/PipelineRuns/PipelineSubmit/a01a396f-cf92-4c59-8ed4-dcf5979aef18 to trigger this pipeline


In [27]:
try:
    response.raise_for_status()
except requests.exceptions.HTTPError as err:
    # Re-raise with the endpoint and response details, keeping the original error chained.
    raise Exception('Received bad response from the endpoint: {}\n'
                    'Response Code: {}\n'
                    'Headers: {}\n'
                    'Content: {}'.format(rest_endpoint, response.status_code, response.headers, response.content)) from err

run_id = response.json().get('Id')
print('Submitted pipeline run: ', run_id)

Submitted pipeline run:  34b8b506-3f58-4a47-9a1f-77fb15151c08
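
The request body used above can be factored into a small helper, which is convenient when the same published pipeline is triggered with different parameter values. This is a sketch: the helper name is hypothetical, the field names are copied from the request in this notebook, and leaving out `ParameterAssignments` when there are no overrides is an assumption rather than documented behavior.

```python
import json


def build_submit_body(experiment_name, parameter_assignments=None, run_source="SDK"):
    """Assemble the JSON body for a pipeline submission request."""
    body = {"ExperimentName": experiment_name, "RunSource": run_source}
    if parameter_assignments:
        # Keys are expected to match the PipelineParameter names, e.g. "pipeline_arg".
        body["ParameterAssignments"] = dict(parameter_assignments)
    return body


body = build_submit_body("My_Pipeline1", {"pipeline_arg": 45})
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the `json` argument of `requests.post`, together with the authentication header obtained earlier.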


# Next: Data Transfer
The next [notebook](https://aka.ms/pl-data-trans) will showcase data transfer steps between different types of data stores.