# Build pipeline with components

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- Installed Azure Machine Learning Python SDK v2 - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define a `CommandComponent` using YAML or `dsl.command_component`
- Register components in the workspace
- Create a `Pipeline` using registered components

**Motivations** - This notebook explains different methods of creating components with the SDK, and then shows how to use these components to build a pipeline.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

In [None]:
# Import required libraries
from azure.ml import MLClient, dsl, ArtifactInput, ArtifactOutput
from azure.ml.entities import JobInput, load_component
from azure.ml._constants import AssetTypes

## 1.2. Configure credential

We use `DefaultAzureCredential` to access the workspace. When an access token is needed, it tries several identities in turn (`EnvironmentCredential`, `ManagedIdentityCredential`, `SharedTokenCacheCredential`, `VisualStudioCodeCredential`, `AzureCliCredential`, `AzurePowerShellCredential`) and stops as soon as one of them provides a token.
See the [`DefaultAzureCredential` reference](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for more information.

`DefaultAzureCredential` should be able to handle most Azure SDK authentication scenarios.
If it does not work for you, see the [azure-identity reference](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python) for all available credentials.
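Conceptually, the fallback behavior described above is a chain: try each credential in order and return the first token obtained. The sketch below models that idea with hypothetical stand-in classes (`FailingCredential`, `StaticCredential`, `ChainedCredentialSketch`); it is not the `azure-identity` implementation, and for simplicity it returns a plain string rather than an access-token object.

```python
from dataclasses import dataclass
from typing import List, Protocol


class Credential(Protocol):
    """Anything with a get_token(scope) method (hypothetical interface)."""
    def get_token(self, scope: str) -> str: ...


class CredentialUnavailable(Exception):
    """Raised by a stand-in credential that cannot produce a token."""


@dataclass
class FailingCredential:
    name: str

    def get_token(self, scope: str) -> str:
        raise CredentialUnavailable(f"{self.name} is not configured")


@dataclass
class StaticCredential:
    name: str
    token: str

    def get_token(self, scope: str) -> str:
        return self.token


@dataclass
class ChainedCredentialSketch:
    credentials: List[Credential]

    def get_token(self, scope: str) -> str:
        errors = []
        for cred in self.credentials:
            try:
                # Stop at the first credential that returns a token.
                return cred.get_token(scope)
            except CredentialUnavailable as ex:
                errors.append(str(ex))
        # Every credential failed: report all reasons together.
        raise CredentialUnavailable("all credentials failed: " + "; ".join(errors))


chain = ChainedCredentialSketch([
    FailingCredential("EnvironmentCredential"),
    FailingCredential("ManagedIdentityCredential"),
    StaticCredential("AzureCliCredential", token="fake-token"),
])
print(chain.get_token("https://management.azure.com/.default"))  # fake-token
```

If you need a custom order of credentials, `azure-identity` also provides `ChainedTokenCredential`, which accepts the credentials to try in sequence.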

In [None]:
from azure.identity import DefaultAzureCredential

try:
    credential = DefaultAzureCredential()
    # Check whether the credential can retrieve a token successfully.
    credential.get_token('https://management.azure.com/.default')
except Exception as ex:
    # If retrieving a token fails, exclude the failing credential and try again, for example:
    # Exclude the VS Code credential:
    # credential = DefaultAzureCredential(exclude_visual_studio_code_credential=True)
    raise Exception("Failed to retrieve a token from the included credentials due to the following exception. Try adding `exclude_xxx_credential=True` to `DefaultAzureCredential` and try again.") from ex

## 1.3. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need identifier parameters: a subscription, a resource group, and a workspace name. We pass these details to `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace. We use the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. More advanced connection methods can be found [here](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).
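Together, these three identifiers pin down the workspace's Azure Resource Manager (ARM) resource ID. The helper below is a minimal illustration that assembles that ID from the same values used in `client_config`; `workspace_arm_id` is a hypothetical helper, not part of the SDK.

```python
def workspace_arm_id(subscription_id: str, resource_group: str, workspace_name: str) -> str:
    """Build the ARM resource ID of an Azure ML workspace (illustrative helper)."""
    for label, value in [("subscription_id", subscription_id),
                         ("resource_group", resource_group),
                         ("workspace_name", workspace_name)]:
        if not value:
            raise ValueError(f"{label} must be non-empty")
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace_name}"
    )


print(workspace_arm_id("00000000-0000-0000-0000-000000000000", "my-rg", "my-ws"))
```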

In [None]:
try:
    # Get a handle to the workspace
    ml_client = MLClient.from_config(credential=credential)
except Exception as ex:
    # NOTE: Update the following workspace information if no workspace config file was set up beforehand
    client_config = {
        "subscription_id": "<SUBSCRIPTION_ID>",
        "resource_group": "<RESOURCE_GROUP>",
        "workspace_name": "<WORKSPACE_NAME>"
    }

    if client_config["subscription_id"].startswith('<'):
        print("Please update <SUBSCRIPTION_ID>, <RESOURCE_GROUP> and <WORKSPACE_NAME> in this notebook cell")
        raise ex
    else:  # write and reload from config file
        import json, os
        config_path = "../../.azureml/config.json"
        os.makedirs(os.path.dirname(config_path), exist_ok=True)
        with open(config_path, "w") as fo:
            fo.write(json.dumps(client_config))
        ml_client = MLClient.from_config(credential=credential, path=config_path)
print(ml_client)
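The cell above only checks `subscription_id` for a leftover `<...>` placeholder. A slightly stricter variant, sketched below with a hypothetical `write_workspace_config` helper, rejects any of the three fields that still holds a placeholder and then writes the same three keys that the cell above writes to `config.json`.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def write_workspace_config(client_config: dict, config_path: str) -> str:
    """Validate workspace identifiers and write them to config_path as JSON."""
    missing = [k for k in REQUIRED_KEYS if k not in client_config]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    # Reject any value still in "<PLACEHOLDER>" form, not just subscription_id.
    placeholders = [k for k in REQUIRED_KEYS
                    if str(client_config[k]).startswith("<")]
    if placeholders:
        raise ValueError(f"update placeholder values for: {placeholders}")
    os.makedirs(os.path.dirname(config_path) or ".", exist_ok=True)
    with open(config_path, "w") as fo:
        json.dump({k: client_config[k] for k in REQUIRED_KEYS}, fo)
    return config_path


# Example against a temporary directory instead of ../../.azureml/
with tempfile.TemporaryDirectory() as tmp:
    path = write_workspace_config(
        {"subscription_id": "sub", "resource_group": "rg", "workspace_name": "ws"},
        os.path.join(tmp, ".azureml", "config.json"),
    )
    with open(path) as fi:
        print(json.load(fi))
```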

# 2. Define and register components in the workspace
## 2.1 Load component definitions from YAML

In [None]:
parent_dir = '.'
train_model = load_component(yaml_file=parent_dir + "/train_model.yml")
score_data = load_component(yaml_file=parent_dir + "/score_data.yml")
# print the component as yaml
print(score_data)

## 2.2 Register components in the workspace

In [None]:
# Register the components in the workspace
train_model = ml_client.components.create_or_update(train_model)
score_data = ml_client.components.create_or_update(score_data)

# print the component as yaml
# NOTE: resources such as code and environment are resolved to remote ARM IDs.
print(score_data)

In [None]:
# Retrieve a registered component from the workspace by name and version
score_data = ml_client.components.get(name=score_data.name, version=score_data.version)

## 2.3 Create components using `dsl.command_component`

**Note:** A `dsl.command_component` function needs to be written in a separate .py file, which will be included in the component's code snapshot.

In [None]:
%%writefile eval_src/dsl_component.py

from azure.ml import dsl, ArtifactInput, ArtifactOutput

@dsl.command_component(
    name="eval_model",
    display_name="Eval Model",
    description="A dummy eval component defined by dsl component.",
    version="0.0.5",
)
def eval_model(
    scoring_result: ArtifactInput,
    eval_output: ArtifactOutput,
):
    from pathlib import Path
    from datetime import datetime
    print("hello evaluation world...")

    lines = [
        f'Scoring result path: {scoring_result}',
        f'Evaluation output path: {eval_output}',
    ]

    for line in lines:
        print(line)

    # Evaluate the incoming scoring result and output evaluation result.
    # Here only output a dummy file for demo.
    curtime = datetime.now().strftime("%b-%d-%Y %H:%M:%S")
    eval_msg = f"Eval done at {curtime}\n"
    (Path(eval_output) / 'eval_result.txt').write_text(eval_msg)
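Because the component body only prints paths and writes a file into the `eval_output` folder, its logic can be sanity-checked locally before registration. The sketch below copies the body into a plain, undecorated function (`eval_model_logic`, a hypothetical name) and runs it against a temporary directory; it does not call the decorated component itself.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def eval_model_logic(scoring_result: str, eval_output: str) -> Path:
    """Undecorated copy of the eval_model body, for local testing only."""
    print(f"Scoring result path: {scoring_result}")
    print(f"Evaluation output path: {eval_output}")
    curtime = datetime.now().strftime("%b-%d-%Y %H:%M:%S")
    # Same dummy output as the component: one text file in the output folder.
    result_file = Path(eval_output) / "eval_result.txt"
    result_file.write_text(f"Eval done at {curtime}\n")
    return result_file


with tempfile.TemporaryDirectory() as tmp:
    out = eval_model_logic(scoring_result="unused-in-this-demo", eval_output=tmp)
    print(out.read_text().startswith("Eval done at"))  # True
```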

In [None]:
%load_ext autoreload
%autoreload 2
# Automatically reload the component module in the notebook whenever its code changes
from eval_src.dsl_component import eval_model

print(type(eval_model))
help(eval_model)
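The decorator turns the plain function into a component object whose interface comes from the function signature: `scoring_result` is supplied as an input when the component is called, and `eval_output` is later read through `.outputs` in the pipeline. The sketch below imitates that idea with `inspect.signature`; the decorator `command_component_sketch` and the marker classes `InputMarker`/`OutputMarker` are hypothetical stand-ins and do not reproduce the SDK's implementation.

```python
import inspect
from dataclasses import dataclass, field
from typing import Callable, List


class InputMarker:   # hypothetical stand-in for ArtifactInput
    pass


class OutputMarker:  # hypothetical stand-in for ArtifactOutput
    pass


@dataclass
class ComponentSpec:
    name: str
    version: str
    func: Callable
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def command_component_sketch(name: str, version: str):
    """Hypothetical decorator: derive a component interface from annotations."""
    def wrap(func: Callable) -> ComponentSpec:
        spec = ComponentSpec(name=name, version=version, func=func)
        for pname, param in inspect.signature(func).parameters.items():
            # Parameters annotated as outputs become outputs;
            # every other parameter is treated as an input in this sketch.
            if param.annotation is OutputMarker:
                spec.outputs.append(pname)
            else:
                spec.inputs.append(pname)
        return spec
    return wrap


@command_component_sketch(name="eval_model", version="0.0.5")
def eval_model_demo(scoring_result: InputMarker, eval_output: OutputMarker):
    pass


print(eval_model_demo.inputs, eval_model_demo.outputs)  # ['scoring_result'] ['eval_output']
```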

In [None]:
try:
    # Try to retrieve the component defined with dsl.command_component
    eval_model = ml_client.components.get(name="eval_model", version="0.0.5")
except Exception:
    # Register it if it does not exist yet
    eval_model = ml_client.components.create_or_update(eval_model)

print(eval_model)
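The cell above follows a get-or-create pattern: look the component up by name and version, and register it only if the lookup fails. The in-memory sketch below shows the same control flow with a hypothetical `InMemoryComponentRegistry`. It catches one specific exception (`KeyError`) rather than every exception, which avoids masking unrelated failures such as authentication errors; with the real SDK, you would catch whichever not-found exception `ml_client.components.get` raises in your SDK version.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    version: str


class InMemoryComponentRegistry:
    """Hypothetical stand-in for a workspace component store keyed by (name, version)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Component] = {}

    def get(self, name: str, version: str) -> Component:
        return self._store[(name, version)]  # raises KeyError if not registered

    def create_or_update(self, component: Component) -> Component:
        self._store[(component.name, component.version)] = component
        return component


def get_or_create(registry: InMemoryComponentRegistry, component: Component) -> Component:
    try:
        # Reuse the registered version if it already exists.
        return registry.get(component.name, component.version)
    except KeyError:
        # Otherwise register it now.
        return registry.create_or_update(component)


registry = InMemoryComponentRegistry()
first = get_or_create(registry, Component("eval_model", "0.0.5"))
second = get_or_create(registry, Component("eval_model", "0.0.5"))
print(first is second)  # True
```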


# 3. Sample pipeline job
## 3.1 Build pipeline

In [None]:
# Construct pipeline
@dsl.pipeline(
    default_compute="cpu-cluster",
    description="E2E dummy train-score-eval pipeline with registered components",
)
def pipeline_with_registered_components(
    training_input,
    test_input,
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
):
    # Call component obj as function: apply given inputs & parameters to create a node in pipeline
    train_with_sample_data = train_model(
        training_data=training_input,
        max_epochs=training_max_epochs,
        learning_rate=training_learning_rate,
        learning_rate_schedule=learning_rate_schedule,
    )

    score_with_sample_data = score_data(
        model_input=train_with_sample_data.outputs.model_output, 
        test_data=test_input
    )
    score_with_sample_data.outputs.score_output.mode = "upload"

    eval_with_sample_data = eval_model(
        scoring_result=score_with_sample_data.outputs.score_output
    )

    # Return: pipeline outputs
    return {
        "trained_model": train_with_sample_data.outputs.model_output,
        "scored_data": score_with_sample_data.outputs.score_output,
        "evaluation_report": eval_with_sample_data.outputs.eval_output,
    }

pipeline = pipeline_with_registered_components(
    training_input=JobInput(type=AssetTypes.URI_FOLDER, path=parent_dir + "/data/"),
    test_input=JobInput(type=AssetTypes.URI_FOLDER, path=parent_dir + "/data/"),
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
)
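Passing `train_with_sample_data.outputs.model_output` into `score_data`, and `score_with_sample_data.outputs.score_output` into `eval_model`, is what creates the dependencies between the pipeline's nodes. The sketch below models that data flow with the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to show the resulting execution order; it illustrates the dependency graph only and is not how Azure ML schedules jobs.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose outputs it consumes.
pipeline_graph = {
    "train_with_sample_data": set(),
    "score_with_sample_data": {"train_with_sample_data"},  # model_output -> model_input
    "eval_with_sample_data": {"score_with_sample_data"},   # score_output -> scoring_result
}

order = list(TopologicalSorter(pipeline_graph).static_order())
print(order)
# ['train_with_sample_data', 'score_with_sample_data', 'eval_with_sample_data']
```

Because each step consumes the previous step's output, the three nodes form a linear chain and can only run in this order.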

In [None]:
print(pipeline)

## 3.2 Submit pipeline job

In [None]:
# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(pipeline, experiment_name="pipeline_samples")
pipeline_job

In [None]:
# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)
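`ml_client.jobs.stream` blocks until the job finishes. If you prefer not to stream logs, a polling loop is an alternative. The sketch below assumes a hypothetical `get_status` callable (for example, one that wraps a job lookup) and treats `Completed`, `Failed`, and `Canceled` as terminal states; check the status values your SDK version reports before relying on these names.

```python
import time
from typing import Callable

TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage with a fake status sequence instead of a real job.
statuses = iter(["Queued", "Running", "Completed"])
print(wait_for_terminal_status(lambda: next(statuses), poll_seconds=0.01))  # Completed
```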

# Next Steps
You can find further examples of running pipeline jobs [here](/sdk/jobs/pipelines/).