# Build pipeline with dsl.command_component

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2, installed - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define a `CommandComponent` using a Python function and the `dsl.command_component` decorator
- Create a `Pipeline` using components defined with `dsl.command_component`

**Motivations** - This notebook explains how to define a `CommandComponent` from a Python function with `@dsl.command_component`, then use that command component to build a pipeline.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

In [1]:
# Import required libraries
from azure.identity import InteractiveBrowserCredential
from azure.ml import MLClient, dsl
from azure.ml.entities import JobInput
import piprunpkg

## 1.2. Configure credential

We use `DefaultAzureCredential` to get access to the workspace. When an access token is needed, it requests one from multiple identities (`EnvironmentCredential`, `ManagedIdentityCredential`, `SharedTokenCacheCredential`, `VisualStudioCodeCredential`, `AzureCliCredential`, `AzurePowerShellCredential`) in turn, stopping when one provides a token.
See the [`DefaultAzureCredential` reference](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for more information.

`DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios.
If it does not work for you, see the [list of all available credentials](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

In [11]:
from azure.identity import DefaultAzureCredential

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token('https://management.azure.com/.default')
except Exception as ex:
    # If retrieving a token fails, exclude the failing credential and try again.
    # For example, to exclude the VS Code credential:
    # credential = DefaultAzureCredential(exclude_visual_studio_code_credential=True)
    raise Exception("Failed to retrieve a token from the included credentials due to the following exception, try to add `exclude_xxx_credential=True` to `DefaultAzureCredential` and try again.") from ex

EnvironmentCredential.get_token failed: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
ManagedIdentityCredential.get_token failed: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
SharedTokenCacheCredential.get_token failed: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
VisualStudioCodeCredential.get_token failed: Failed to get Azure user details from Visual Studio Code.
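The log above shows each credential in the chain being tried in turn. The following is a minimal sketch of that "try each, stop at the first token" pattern. It uses only stand-in classes: `StubCredential`, `CredentialUnavailable`, and `first_available_token` are hypothetical names made up for illustration and are not part of `azure.identity`, whose real implementation may differ.

```python
class CredentialUnavailable(Exception):
    """Raised by a stand-in credential that cannot provide a token."""


class StubCredential:
    """Hypothetical stand-in for a credential; NOT an azure.identity class."""

    def __init__(self, name, token=None):
        self.name = name
        self.token = token

    def get_token(self, scope):
        if self.token is None:
            raise CredentialUnavailable(f"{self.name} authentication unavailable")
        return self.token


def first_available_token(credentials, scope):
    """Try each credential in order and return (name, token) from the first that succeeds."""
    errors = []
    for credential in credentials:
        try:
            return credential.name, credential.get_token(scope)
        except CredentialUnavailable as ex:
            # Remember why this credential failed, then move on to the next one.
            errors.append(str(ex))
    raise CredentialUnavailable("; ".join(errors))


chain = [
    StubCredential("EnvironmentCredential"),
    StubCredential("ManagedIdentityCredential"),
    StubCredential("AzureCliCredential", token="stub-token"),
]
winner, token = first_available_token(chain, "https://management.azure.com/.default")
print(winner)
```

In this sketch the first two stand-ins fail and the third one supplies the token, which mirrors why the failure messages above are not necessarily fatal.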


## 1.3. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace. 

In [12]:
try:
    ml_client = MLClient.from_config(credential=credential)
except Exception as ex:
    # NOTE: Update the following workspace information if it is not already configured correctly
    client_config = {
        "subscription_id": "d511f82f-71ba-49a4-8233-d7be8a3650f4",
        "resource_group": "mire2etesting",
        "workspace_name": "gewa_ws"
    }

    if client_config["subscription_id"].startswith('<'):
        print("please update your <SUBSCRIPTION_ID> <RESOURCE_GROUP> <WORKSPACE_NAME> in notebook cell")
        raise ex
    else:  # write and reload from config file
        import json, os
        config_path = "../../.azureml/config.json"
        os.makedirs(os.path.dirname(config_path), exist_ok=True)
        with open(config_path, "w") as fo:
            fo.write(json.dumps(client_config))
        ml_client = MLClient.from_config(credential=credential, path=config_path)
print(ml_client)

Found the config file in: D:\practice\azureml-examples\sdk\jobs\.azureml\config.json


MLClient(credential=<azure.identity._credentials.default.DefaultAzureCredential object at 0x00000257BA71ED90>,
         subscription_id=d511f82f-71ba-49a4-8233-d7be8a3650f4,
         resource_group_name=mire2etesting,
         workspace_name=gewa_ws)
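For reference, the config file that the cell above writes holds just the three workspace identifiers. Below is a minimal sketch that writes and reads such a file using only the standard library. The helper names `write_workspace_config` and `read_workspace_config` are hypothetical, and the `<...>` values are placeholders to replace with your own details.

```python
import json
import os
import tempfile


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a .azureml/config.json file under `directory` and return its path."""
    config_dir = os.path.join(directory, ".azureml")
    os.makedirs(config_dir, exist_ok=True)
    config_path = os.path.join(config_dir, "config.json")
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(config_path, "w") as fo:
        json.dump(config, fo, indent=2)
    return config_path


def read_workspace_config(config_path):
    """Load the workspace identifiers back from the config file."""
    with open(config_path) as fin:
        return json.load(fin)


# Demonstrate in a temporary directory so no real project files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = write_workspace_config(
        tmp_dir, "<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<WORKSPACE_NAME>"
    )
    print(read_workspace_config(path))
```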


## 1.4. Retrieve or create an Azure Machine Learning compute target

In [4]:
# Retrieve an already attached Azure Machine Learning Compute.
cluster_name = "cpu-cluster"
try:
    ml_client.compute.get(name=cluster_name)
except Exception:
    print('Creating a new compute target...')
    from azure.ml.entities import AmlCompute
    compute = AmlCompute(
        name=cluster_name,
        size="Standard_D2_v2",
        max_instances=2
    )
    ml_client.compute.begin_create_or_update(compute)

# 2. Import components that are defined with python function

We define three sample components using `dsl.command_component` in [dsl_components.py](dsl_components.py).

In [2]:
with open("src/dsl_components.py") as fin:
    print(fin.read())

from pathlib import Path
from random import randint
from uuid import uuid4

from azure.ml import dsl, ArtifactInput, ArtifactOutput
from azure.ml.entities import Environment

# init customer environment with conda YAML
# the YAML file shall be put under your code folder.
conda_env = Environment(
    conda_file=Path(__file__).parent / "conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04"
)


@dsl.command_component(
    name="dsl_train_model",
    display_name="Train",
    description="A dummy train component defined by dsl component.",
    version="0.0.2",
    # specify distribution type if needed
    # distribution={'type': 'mpi'},
    # specify customer environment, note that azure-ml must be included.
    environment=conda_env,
    # specify your code folder, default code folder is current file's parent
    # code='.'
)
def train_model(
    training_data: ArtifactInput,
    max_epochs: int,
    model_output: ArtifactOutput,
    learning_rate=0.02,
):
    lines 
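To make the link between the function signature and the generated component more concrete, here is a rough sketch of how a signature like `train_model`'s could be turned into the `--params` portion of the command string that appears in the printed pipeline node in section 3.1. This is only an illustration of that observed mapping, not the SDK's implementation: the `ArtifactInput` and `ArtifactOutput` classes below are local stand-ins, and `placeholder` and `sketch_params` are hypothetical helpers.

```python
import inspect


class ArtifactInput:
    """Local stand-in for the SDK's input annotation (assumption, not the real class)."""


class ArtifactOutput:
    """Local stand-in for the SDK's output annotation (assumption, not the real class)."""


def train_model(
    training_data: ArtifactInput,
    max_epochs: int,
    model_output: ArtifactOutput,
    learning_rate=0.02,
):
    """Same signature as the sample component; the body is irrelevant here."""


def placeholder(kind, name):
    """Build an argument such as '--max_epochs ${{inputs.max_epochs}}'."""
    return "--" + name + " ${{" + kind + "." + name + "}}"


def sketch_params(func):
    """Derive command arguments from a signature: inputs first, then outputs."""
    inputs, outputs = [], []
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is ArtifactOutput:
            outputs.append(placeholder("outputs", name))
        else:
            arg = placeholder("inputs", name)
            # Parameters with a default value are optional, shown in brackets.
            if param.default is not inspect.Parameter.empty:
                arg = "[" + arg + "]"
            inputs.append(arg)
    return " ".join(inputs + outputs)


print(sketch_params(train_model))
```

The point of the sketch is that annotations decide whether a parameter is an input or an output, and a default value marks an input as optional.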

In [3]:
%load_ext autoreload
%autoreload 2

from src.dsl_components import train_model, score_data, eval_model

print(train_model)



<function train_model at 0x000001DC403E9430>


You can also register dsl component functions to the workspace using `ml_client.components.create_or_update()`.

In [4]:
print(train_model)

<function train_model at 0x000001DC403E9430>
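The cell above only prints the function, so here is a hedged sketch of what registration might look like. It assumes, based on the sentence above, that `ml_client.components.create_or_update()` accepts a dsl component function directly; check this against your SDK version. `register_components` is a hypothetical helper, not an SDK function.

```python
def register_components(ml_client, component_funcs):
    """Register each dsl component function and return the results keyed by function name.

    Assumption: `ml_client.components.create_or_update` accepts the decorated
    function itself, as the text above suggests.
    """
    registered = {}
    for func in component_funcs:
        registered[func.__name__] = ml_client.components.create_or_update(func)
    return registered


# Example call, assuming `ml_client` and the imported component functions exist:
# registered = register_components(ml_client, [train_model, score_data, eval_model])
```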


# 3. Sample pipeline job

## 3.1 Build pipeline

In [5]:

cluster_name = "cpu-cluster"
# define a pipeline with dsl component
@dsl.pipeline(
    name='A-training-pipeline',
    description='E2E dummy train-score-eval pipeline with components defined via python function components',
    default_compute=cluster_name,
)
def pipeline_with_python_function_components(input_data, test_data, learning_rate):
    # Call component obj as function: apply given inputs & parameters to create a node in pipeline
    train_with_sample_data = train_model(
        training_data=input_data, max_epochs=5, learning_rate=learning_rate
    )

    print(train_with_sample_data)

    score_with_sample_data = score_data(
        model_input=train_with_sample_data.outputs.model_output, test_data=test_data
    )

    eval_with_sample_data = eval_model(scoring_result=score_with_sample_data.outputs.score_output)

    # Return: pipeline outputs
    return {
        'eval_output': eval_with_sample_data.outputs.eval_output,
        'model_output': train_with_sample_data.outputs.model_output,
    }

pipeline = pipeline_with_python_function_components(
    input_data=JobInput(path="./data/Titanic.csv", type="uri_file"), 
    test_data=JobInput(path="./data/Titanic.csv", type="uri_file"), 
    learning_rate=0.1
)

print(pipeline)

$schema: '{}'
type: command
inputs:
  training_data: ${{parent.inputs.input_data}}
  max_epochs: 5
  learning_rate: ${{parent.inputs.learning_rate}}
outputs: {}
command: python -m azure.ml.dsl.executor --file dsl_components.py --name dsl_train_model
  --params --training_data ${{inputs.training_data}} --max_epochs ${{inputs.max_epochs}}
  [--learning_rate ${{inputs.learning_rate}}] --model_output ${{outputs.model_output}}
code: d:/practice/azureml-examples/sdk/jobs/pipelines/1b_pipeline_with_python_function_components/src
environment_variables: {}
component:
  name: dsl_train_model
  version: 0.0.2
  display_name: Train
  description: A dummy train component defined by dsl component.
  type: command
  inputs:
    training_data:
      type: uri_folder
    max_epochs:
      type: integer
    learning_rate:
      type: number
      optional: true
      default: '0.02'
  outputs:
    model_output:
      type: uri_folder
  command: python -m azure.ml.dsl.executor --file dsl_components.py --
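The wiring in the pipeline function implies an execution order: scoring consumes the training node's `model_output`, and evaluation consumes the scoring node's `score_output`. As a small illustration, the sketch below writes those data dependencies down by hand and orders them with the standard library's `graphlib` (Python 3.9+). It does not call any Azure ML API, and the service's actual scheduling may differ.

```python
from graphlib import TopologicalSorter

# Map each node to the set of upstream nodes whose outputs it consumes,
# transcribed by hand from the pipeline function above.
dependencies = {
    "train_with_sample_data": set(),
    "score_with_sample_data": {"train_with_sample_data"},
    "eval_with_sample_data": {"score_with_sample_data"},
}

# static_order() yields every node after all of its upstream nodes.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```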

In [6]:
from piprunpkg import run_pipeline
import logging
import os  # needed for os.getcwd() below
import sys

# Send debug-level logs to stdout so the local run is visible in the notebook
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
print(os.getcwd())
output_root_dir = "./local-run-output"
run_pipeline(output_root_dir=output_root_dir, pipeline_job=pipeline)

d:\practice\azureml-examples\sdk\jobs\pipelines\1b_pipeline_with_python_function_components
DEBUG:docker.utils.config:Trying paths: ['C:\\Users\\xiaopwan\\.docker\\config.json', 'C:\\Users\\xiaopwan\\.dockercfg']
DEBUG:docker.utils.config:Found file at path: C:\Users\xiaopwan\.docker\config.json
DEBUG:docker.auth:Found 'auths' section
DEBUG:docker.auth:Auth data for viennadroptest.azurecr.io is absent. Client might be using a credentials store instead.
DEBUG:docker.auth:Found 'credsStore' section
DEBUG:urllib3.connectionpool:http://localhost:None "GET /version HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:http://localhost:None "GET /v1.41/images/cliv2anonymousenvironment:e7b1221ce8d02665371bf65bbdfe4d40/json HTTP/1.1" 200 None
INFO:piprunpkg.image_builder:image with name cliv2anonymousenvironment:e7b1221ce8d02665371bf65bbdfe4d40 already exist, skip image building...
DEBUG:executor:docker image: cliv2anonymousenvironment:e7b1221ce8d02665371bf65bbdfe4d40
DEBUG:docker.utils.config:Tryin

APIError: 400 Client Error for http+docker://localnpipe/v1.41/containers/create?name=train_with_sample_data: Bad Request ("invalid mount config for type "bind": invalid mount path: 'wasbs:/demo@dprepdata.blob.core.windows.net/Titanic.csv' mount path must be absolute")

## 3.2 Submit pipeline job

In [13]:
# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(pipeline, experiment_name="pipeline_samples")
print(f'Job link: {pipeline_job.services["Studio"].endpoint}')
pipeline_job

Job link: https://ml.azure.com/runs/good_dog_x3gl8dcyy7?wsid=/subscriptions/d511f82f-71ba-49a4-8233-d7be8a3650f4/resourcegroups/mire2etesting/workspaces/gewa_ws&tid=72f988bf-86f1-41af-91ab-2d7cd011db47


Experiment,Name,Type,Status,Details Page
pipeline_samples,good_dog_x3gl8dcyy7,pipeline,Failed,Link to Azure Machine Learning studio


In [10]:
# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)

RunId: good_dog_x3gl8dcyy7
Web View: https://ml.azure.com/runs/good_dog_x3gl8dcyy7?wsid=/subscriptions/d511f82f-71ba-49a4-8233-d7be8a3650f4/resourcegroups/mire2etesting/workspaces/gewa_ws

Execution Summary
RunId: good_dog_x3gl8dcyy7
Web View: https://ml.azure.com/runs/good_dog_x3gl8dcyy7?wsid=/subscriptions/d511f82f-71ba-49a4-8233-d7be8a3650f4/resourcegroups/mire2etesting/workspaces/gewa_ws


Exception: Exception : 
 "Detailed error not set on the Run. Please check the logs for details." 

# Next Steps
You can find further examples of running a pipeline job [here](/sdk/jobs/pipelines/).