# E2E registered components

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- Installed Azure Machine Learning Python SDK v2 - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define `commandComponent` using YAML
- Register it to the workspace
- Create a `pipeline` using these registered components

**Motivations** - This notebook explains how to define a `commandComponent` in YAML and then use command components to build a pipeline. The command component is a fundamental building block of Azure Machine Learning pipelines. It runs a task on a specified compute target, either local or in the cloud. A command component accepts an `environment` that sets up the required runtime, and a `command` that runs in that environment with the given `inputs`. The same `component` can be reused across different pipelines.
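To make the relationship between `command` and `inputs` concrete, here is a minimal sketch of how a command template with `${{inputs.<name>}}` and `${{outputs.<name>}}` placeholders could be expanded into a concrete command line. The `resolve_command` helper and the example paths are hypothetical; Azure Machine Learning performs this binding itself when the job runs, and this is not its implementation.

```python
import re

# Matches placeholders such as ${{inputs.learning_rate}} or ${{outputs.model_output}}.
PLACEHOLDER = re.compile(r"\$\{\{(inputs|outputs)\.([A-Za-z_][A-Za-z0-9_]*)\}\}")


def resolve_command(template: str, inputs: dict, outputs: dict) -> str:
    """Hypothetical helper: substitute bound values into a command template."""

    def substitute(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        values = inputs if kind == "inputs" else outputs
        if name not in values:
            # A placeholder with no bound value is a wiring error.
            raise KeyError(f"no value bound for {kind}.{name}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


# A command line in the style of a scoring step; the paths are made up.
template = (
    "python score.py --model_input ${{inputs.model_input}}"
    " --test_data ${{inputs.test_data}} --score_output ${{outputs.score_output}}"
)
print(resolve_command(
    template,
    inputs={"model_input": "/mnt/model", "test_data": "/mnt/data"},
    outputs={"score_output": "/mnt/scores"},
))
```

Failing loudly on an unbound placeholder mirrors the idea that every input a command references must be supplied when the component is used in a pipeline.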

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

```python
# Import required libraries
from azure.ml import MLClient
from azure.ml import dsl
from azure.ml.entities import JobInput, load_component
from azure.identity import InteractiveBrowserCredential
from azure.ml._constants import AssetTypes
```

## 1.2. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace. We use the default [interactive authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.interactivebrowsercredential?view=azure-python) for this tutorial. More advanced connection methods can be found [here](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

```python
# Enter details of your AML workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace = '<AML_WORKSPACE_NAME>'
```

```python
# Get a handle to the workspace
ml_client = MLClient(InteractiveBrowserCredential(), subscription_id, resource_group, workspace)
```
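Together, the three identifiers address a single Azure Resource Manager (ARM) resource. The sketch below assembles the workspace resource ID in the standard ARM format. `workspace_resource_id` is a hypothetical helper for illustration only; `MLClient` does not require you to build this string yourself.

```python
def workspace_resource_id(subscription_id: str, resource_group: str, workspace: str) -> str:
    """Hypothetical helper: build the ARM resource ID of an Azure ML workspace."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace}"
    )


print(workspace_resource_id("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<AML_WORKSPACE_NAME>"))
```

Assets registered in the workspace, such as components, get IDs that extend this prefix.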

# 2. Pipeline job with registered components
## 2.1. Register components

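```python
parent_dir = '.'
# Load the component definition from YAML and register it in the workspace.
# Keep the returned object: it carries the name and version assigned at registration.
train_component = ml_client.components.create_or_update(
    load_component(yaml_file=parent_dir + "/train.yml")
)
```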


```python
# Register the score component and keep the registered version
score_component = ml_client.components.create_or_update(
    load_component(yaml_file=parent_dir + "/score.yml")
)
```

```python
# Register the eval component and keep the registered version
eval_component = ml_client.components.create_or_update(
    load_component(yaml_file=parent_dir + "/eval.yml")
)
```
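Registration stores each component under a name and a version, which is what lets the pipeline below fetch components by `name` and `version` instead of re-reading the YAML files. The following toy, in-memory registry is a hypothetical sketch of that lookup model only; `ComponentSpec` and `ComponentRegistry` are invented names, and this is not how the Azure ML service stores components.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComponentSpec:
    name: str
    version: str
    command: str


@dataclass
class ComponentRegistry:
    """Toy stand-in for a workspace component registry, keyed by (name, version)."""

    _items: dict = field(default_factory=dict)

    def create_or_update(self, spec: ComponentSpec) -> ComponentSpec:
        # Store (or overwrite) the spec under its (name, version) key.
        self._items[(spec.name, spec.version)] = spec
        return spec

    def get(self, name: str, version: str) -> ComponentSpec:
        try:
            return self._items[(name, version)]
        except KeyError:
            raise LookupError(f"component {name}:{version} is not registered") from None


registry = ComponentRegistry()
registered = registry.create_or_update(ComponentSpec("train", "1", "python train.py"))
# Look the component up again by the name and version returned at registration time.
print(registry.get(registered.name, registered.version).command)
```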

## 2.2. Build the pipeline

```python
# 1. Load component funcs
train_func = load_component(
    client=ml_client,
    name=train_component.name,
    version=train_component.version,
)
score_func = load_component(
    client=ml_client,
    name=score_component.name,
    version=score_component.version,
)
eval_func = load_component(
    client=ml_client,
    name=eval_component.name,
    version=eval_component.version,
)

# 2. Construct pipeline
@dsl.pipeline(
    compute="cpu-cluster",
    description="E2E dummy train-score-eval pipeline with registered components",
)
def sample_pipeline(
        pipeline_job_training_input,
        pipeline_job_test_input,
        pipeline_job_training_max_epocs,
        pipeline_job_training_learning_rate,
        pipeline_job_learning_rate_schedule,
):
    train_job = train_func(
        training_data=pipeline_job_training_input,
        max_epocs=pipeline_job_training_max_epocs,
        learning_rate=pipeline_job_training_learning_rate,
        learning_rate_schedule=pipeline_job_learning_rate_schedule,
    )
    score_job = score_func(model_input=train_job.outputs.model_output, test_data=pipeline_job_test_input)
    score_job.outputs.score_output.mode = "upload"
    evaluate_job = eval_func(scoring_result=score_job.outputs.score_output)
    return {
        "pipeline_job_trained_model": train_job.outputs.model_output,
        "pipeline_job_scored_data": score_job.outputs.score_output,
        "pipeline_job_evaluation_report": evaluate_job.outputs.eval_output,
    }

pipeline = sample_pipeline(
    JobInput(type=AssetTypes.URI_FOLDER, path=parent_dir + "/data/"),
    JobInput(type=AssetTypes.URI_FOLDER, path=parent_dir + "/data/"),
    20,
    1.8,
    "time-based",
)
pipeline.outputs.pipeline_job_trained_model.mode = "upload"
pipeline.outputs.pipeline_job_scored_data.mode = "upload"
pipeline.outputs.pipeline_job_evaluation_report.mode = "upload"
```
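The step order in this pipeline is never declared explicitly; it follows from the data dependencies, because `score_job` consumes `train_job.outputs.model_output` and `evaluate_job` consumes `score_job.outputs.score_output`. The sketch below reproduces that reasoning with the standard-library `graphlib` module (Python 3.9+). The dependency dictionary is written by hand to mirror this pipeline; it illustrates topological ordering and is not how the Azure ML scheduler is implemented.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it consumes.
dependencies = {
    "train_job": set(),
    "score_job": {"train_job"},     # uses train_job.outputs.model_output
    "evaluate_job": {"score_job"},  # uses score_job.outputs.score_output
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['train_job', 'score_job', 'evaluate_job']
```

Because this pipeline is a simple chain, there is only one valid order; steps without a dependency path between them could run in parallel.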

## 2.3. Submit the pipeline job

```python
# Submit the pipeline job to the workspace
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    experiment_name="e2e_registered_components",
    continue_run_on_step_failure=True,
)
# Display the link to the job in Azure ML studio
pipeline_job.services["Studio"].endpoint
```

'https://ml.azure.com/runs/e4249158-6563-4768-8dc8-eefecdc62f46?wsid=/subscriptions/6560575d-fa06-4e7d-95fb-f962e74efd7a/resourcegroups/azureml-examples/workspaces/main&tid=72f988bf-86f1-41af-91ab-2d7cd011db47'
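The Studio endpoint is an ordinary URL, so its parts can be pulled apart with the standard library, for example to log the run ID alongside other experiment metadata. The sketch below uses `urllib.parse`; the `/runs/<run-id>` path shape and the `wsid` and `tid` query keys are read from the sample URL above, `describe_studio_url` is a hypothetical helper, and the placeholder values are illustrative.

```python
from urllib.parse import parse_qs, urlparse


def describe_studio_url(url: str) -> dict:
    """Hypothetical helper: split a Studio run URL into run, workspace, and tenant IDs."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    return {
        # Assumes the path has the form /runs/<run-id>, as in the sample output.
        "run_id": parsed.path.rstrip("/").split("/")[-1],
        "workspace_id": query.get("wsid", [None])[0],
        "tenant_id": query.get("tid", [None])[0],
    }


url = (
    "https://ml.azure.com/runs/<RUN_ID>"
    "?wsid=/subscriptions/<SUBSCRIPTION_ID>/resourcegroups/<RESOURCE_GROUP>"
    "/workspaces/<AML_WORKSPACE_NAME>&tid=<TENANT_ID>"
)
print(describe_studio_url(url))
```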