Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Build an ML Pipeline

In this notebook, you learn how to create a machine learning training pipeline by using Azure Machine Learning components.

1. Prepare components and create them in the workspace.
2. Use the component and pipeline SDK to create a pipeline from the registered components.

## Prerequisites
* Install the azure-ai-ml SDK by following the [instructions here](../../README.md).
* Initialize a credential and create compute clusters by following the [instructions here](../../configuration.ipynb).

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1 Import the required libraries

In [1]:
# Import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import load_component, Input, Output
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes, InputOutputModes

import os

# enable internal components in v2
os.environ["AZURE_ML_INTERNAL_COMPONENTS_ENABLED"] = "True"

## 1.2 Configure credential

We use `DefaultAzureCredential` to access the workspace.
`DefaultAzureCredential` should be able to handle most Azure SDK authentication scenarios.

If it does not work for you, see the [configure credential example](../../configuration.ipynb) and the [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python) for other available credentials.

In [2]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()
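
The try/except above is a two-step fallback chain. A minimal sketch of the same idea, generalized to an ordered list of candidates, might look like the following. The helper `first_working_credential` and the two stand-in credential classes are hypothetical and exist only so the example runs without Azure access; in this notebook the candidates would be `DefaultAzureCredential` and `InteractiveBrowserCredential`. Unlike the cell above, this sketch also probes the last candidate before returning it.

```python
from typing import Callable, Sequence

# Same scope the notebook uses to probe the credential
SCOPE = "https://management.azure.com/.default"


class BrokenCredential:
    """Hypothetical stand-in for a credential that cannot get a token."""

    def get_token(self, *scopes: str) -> str:
        raise RuntimeError("no usable login found")


class WorkingCredential:
    """Hypothetical stand-in for a credential that succeeds."""

    def get_token(self, *scopes: str) -> str:
        return "fake-token"


def first_working_credential(
    factories: Sequence[Callable[[], object]], scope: str = SCOPE
):
    """Return the first credential whose get_token(scope) call succeeds."""
    errors = []
    for factory in factories:
        try:
            credential = factory()
            credential.get_token(scope)  # probe: raises if unusable
            return credential
        except Exception as ex:  # broad on purpose, mirroring the cell above
            errors.append(f"{getattr(factory, '__name__', factory)}: {ex}")
    raise RuntimeError("no credential worked: " + "; ".join(errors))


credential_demo = first_working_credential([BrokenCredential, WorkingCredential])
print(type(credential_demo).__name__)
```

With the real classes, the call would be `first_working_credential([DefaultAzureCredential, InteractiveBrowserCredential])`, which may open a browser window when the second candidate is probed.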

## 1.3 Get a handle to the workspace

We use a config file to connect to the workspace. The Azure ML workspace should be configured with a compute cluster. See [this notebook to configure a workspace](../../configuration.ipynb).

In [3]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Name of an existing Azure Machine Learning compute cluster attached to the workspace.
cluster_name = "cpu-cluster"

Found the config file in: D:\programs\azureml-examples\sdk\.azureml\config.json
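
`MLClient.from_config` locates a `config.json` file (here under `.azureml/`) that identifies the workspace. As a rough sketch, the file holds three keys: `subscription_id`, `resource_group`, and `workspace_name`. The helper `write_workspace_config` and the placeholder values below are hypothetical and only illustrate the file layout; replace them with your own workspace details.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(
    root: Path, subscription_id: str, resource_group: str, workspace_name: str
) -> Path:
    """Write a workspace config file to <root>/.azureml/config.json."""
    config_path = root / ".azureml" / "config.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config_path


# Write the file into a temporary directory and read it back
with tempfile.TemporaryDirectory() as tmp:
    path = write_workspace_config(
        Path(tmp), "<subscription-id>", "<resource-group>", "<workspace-name>"
    )
    loaded = json.loads(path.read_text())
    print(sorted(loaded))
```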


In [4]:
# Import the components used in the pipeline from the local src.components module.
# Input, load_component, AssetTypes, InputOutputModes, and pipeline were imported in section 1.1.
from src.components import write_output, write_output_annotation

In [5]:
# @pipeline
# def inner_pipeline():
#     node_0 = write_output()
#     # node_0.outputs.my_output.path = "azureml://datastores/workspaceblobstore/hod/copypath/test_0/output_0"
#     node_0.outputs.my_output.path = "azureml://datastores/workspaceblobstore/paths/hod/copypath/test_1/output_0"
#     return node_0.outputs


# @pipeline(name="hod_test_path_uri_nested_pipeline")
# def my_pipeline():
#     node_0 = inner_pipeline()
#     return node_0.outputs


In [6]:
@pipeline(name="hod_test_path_uri")
def my_pipeline():
    node_0 = write_output()
    node_0.outputs.my_output.path = "azureml://datastores/workspaceblobstore/paths/hod/copypath/test_3/output_0"
    return node_0.outputs
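
The output path above follows the pattern `azureml://datastores/<datastore_name>/paths/<relative_path>`; the commented-out variant without the `/paths/` segment does not. The helper below is a hypothetical convenience, not part of the SDK, sketching how such a URI could be assembled so the `/paths/` segment is never left out.

```python
def datastore_uri(datastore_name: str, relative_path: str) -> str:
    """Build an azureml:// datastore URI for a path relative to the datastore root."""
    if not datastore_name or "/" in datastore_name:
        raise ValueError("datastore_name must be a non-empty name without '/'")
    # Drop leading/trailing slashes so the URI has no empty path segments
    cleaned = relative_path.strip("/")
    if not cleaned:
        raise ValueError("relative_path must not be empty")
    return f"azureml://datastores/{datastore_name}/paths/{cleaned}"


uri = datastore_uri("workspaceblobstore", "hod/copypath/test_3/output_0")
print(uri)
```

Inside the pipeline function, the result could be assigned the same way as the literal string: `node_0.outputs.my_output.path = datastore_uri(...)`.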


In [7]:
# Create a pipeline job from the pipeline function
pipeline_job = my_pipeline()
pipeline_job.settings.default_compute = cluster_name
pipeline_job.settings.force_rerun = True
# pipeline_job.settings.default_datastore = "workspaceblobstore"
# pipeline_job.settings._dataset_access_mode = "DatasetInDpv2"
# pipeline_job.settings._enable_dataset_mode = True

# Validating the pipeline
ml_client.jobs.validate(pipeline_job)

Method validate: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


{
  "result": "Succeeded"
}

In [8]:
# Submit the pipeline job to the workspace under the given experiment name.
created_pipeline_job = ml_client.jobs.create_or_update(pipeline_job, experiment_name="hod_copy_path")

# Show details of the submitted job
created_pipeline_job

| Experiment | Name | Type | Status | Details Page |
|---|---|---|---|---|
| hod_copy_path | orange_clock_rqbmjwldrv | pipeline | Preparing | Link to Azure Machine Learning studio |
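
The job is still in the `Preparing` state right after submission. One way to block until it finishes is to stream its logs and then fetch the job again to read its final status. The sketch below assumes `ml_client.jobs.stream(name)` and `ml_client.jobs.get(name)` behave as in the azure-ai-ml SDK; the helper `wait_for_job` and the fake client used to exercise it are hypothetical, so the block runs without a workspace.

```python
from types import SimpleNamespace


def wait_for_job(ml_client, job_name: str) -> str:
    """Stream a job's logs until it finishes, then return its final status."""
    ml_client.jobs.stream(job_name)
    return ml_client.jobs.get(job_name).status


# Hypothetical stand-in for MLClient, used only to exercise the helper offline
streamed = []
fake_jobs = SimpleNamespace(
    stream=lambda name: streamed.append(name),
    get=lambda name: SimpleNamespace(name=name, status="Completed"),
)
fake_client = SimpleNamespace(jobs=fake_jobs)

final_status = wait_for_job(fake_client, "orange_clock_rqbmjwldrv")
print(final_status)
```

In this notebook the call would be `wait_for_job(ml_client, created_pipeline_job.name)`.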
