# Build pipeline with components

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- Installed Azure Machine Learning Python SDK v2 - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define a `CommandComponent` using YAML
- Register components in the workspace
- Create a `pipeline` using registered components, anonymous components, and dsl components

**Motivations** - This notebook shows the different ways to create components with the SDK and how to use those components to build a pipeline.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

In [None]:
#import required libraries
from azure.identity import InteractiveBrowserCredential
from azure.ml import MLClient, dsl
from azure.ml.entities import JobInput, load_component
from azure.ml._constants import AssetTypes
from azure.ml.dsl._types import DataInput, DataOutput

## 1.2. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace. We use the default [interactive authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.interactivebrowsercredential?view=azure-python) for this tutorial. More advanced connection methods can be found [here](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

In [None]:
#Enter details of your AML workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace = '<AML_WORKSPACE_NAME>'

In [None]:
#get a handle to the workspace
ml_client = MLClient(InteractiveBrowserCredential(), subscription_id, resource_group, workspace)
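
The three identifiers above are placeholders, so a quick check before creating the client can catch values that were never filled in. The following is a minimal sketch, assuming placeholders keep the `<NAME>` form used in this notebook; `find_unfilled` and `example_settings` are hypothetical names for illustration and are not part of the SDK.

```python
def find_unfilled(settings):
    """Return the names of settings that are empty or still look like '<PLACEHOLDER>'."""
    unfilled = []
    for name, value in settings.items():
        text = (value or "").strip()
        # Treat empty strings and '<...>' tokens as values that were never set.
        if not text or (text.startswith("<") and text.endswith(">")):
            unfilled.append(name)
    return unfilled


# Example values only (hypothetical); in the notebook, pass the variables
# subscription_id, resource_group and workspace defined above.
example_settings = {
    "subscription_id": "<SUBSCRIPTION_ID>",
    "resource_group": "my-resource-group",
    "workspace": "",
}
missing = find_unfilled(example_settings)
print(missing)
```

If the returned list is non-empty, it is probably better to stop and fix the values than to let the authentication call fail later with a less obvious error.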

# 2. Define and register components in the workspace
## 2.1 Load component definitions from YAML

In [None]:
parent_dir = '.'
train_func = load_component(yaml_file=parent_dir + "/train.yml")

## 2.2 Register components

In [None]:
score_func = load_component(yaml_file=parent_dir + "/score.yml")
# register the component in the workspace
score_func = ml_client.components.create_or_update(score_func)
# retrieve the registered component from the workspace
score_func = ml_client.components.get(name=score_func.name, version=score_func.version)

In [None]:
# print the component as yaml
print(score_func)

In [None]:
# inspect more information
print(type(score_func))
help(score_func._func)

## 2.3 Create components using dsl.component

**Note:** The current version only supports dsl.component functions that are defined in a separate .py file.

In [None]:
%%writefile eval_dsl_component.py

from azure.ml import dsl
from azure.ml.entities import Environment
from azure.ml.dsl._types import DataInput, DataOutput

@dsl.command_component(
    name="Eval",
    display_name="Eval",
    description="A dummy eval component defined by dsl component.",
    version="0.0.1",
)
def eval_func(
    scoring_result: DataInput,
    eval_output: DataOutput,
):
    from pathlib import Path
    from datetime import datetime
    print("hello evaluation world...")

    lines = [
        f'Scoring result path: {scoring_result}',
        f'Evaluation output path: {eval_output}',
    ]

    for line in lines:
        print(line)

    # Evaluate the incoming scoring result and output evaluation result.
    # Here only output a dummy file for demo.
    curtime = datetime.now().strftime("%b-%d-%Y %H:%M:%S")
    eval_msg = f"Eval done at {curtime}\n"
    (Path(eval_output) / 'eval_result.txt').write_text(eval_msg)

In [None]:
%load_ext autoreload
%autoreload 2
# in a notebook, automatically reload the component whenever its code changes
from eval_dsl_component import eval_func

print(type(eval_func))
help(eval_func)
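
Because the body of `eval_func` is ordinary Python, its logic can be exercised locally before the component is used in a pipeline. The sketch below is an assumption-laden illustration: `eval_body` is a hypothetical plain-function copy of the component body (without the dsl decorator), and temporary directories stand in for the input and output folders that the service would provide.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def eval_body(scoring_result, eval_output):
    """Plain-Python copy of the component body, without the dsl decorator."""
    print(f"Scoring result path: {scoring_result}")
    print(f"Evaluation output path: {eval_output}")
    curtime = datetime.now().strftime("%b-%d-%Y %H:%M:%S")
    result_file = Path(eval_output) / "eval_result.txt"
    result_file.write_text(f"Eval done at {curtime}\n")
    return result_file


# Throwaway directories replace the folders passed to the real component.
with tempfile.TemporaryDirectory() as scoring_dir, tempfile.TemporaryDirectory() as output_dir:
    written = eval_body(scoring_dir, output_dir)
    content = written.read_text()
    print(content)
```

This only checks the file-writing logic; it says nothing about how the decorated component behaves once it runs on a compute cluster.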

# 3. Sample pipeline job
## 3.1 Build pipeline

In [None]:
# Construct pipeline
@dsl.pipeline(
    default_compute="cpu-cluster",
    description="E2E dummy train-score-eval pipeline with registered components",
)
def basic_pipeline(
    training_input,
    test_input,
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
):
    train_job = train_func(
        training_data=training_input,
        max_epochs=training_max_epochs,
        learning_rate=training_learning_rate,
        learning_rate_schedule=learning_rate_schedule,
    )
    score_job = score_func(model_input=train_job.outputs.model_output, test_data=test_input)
    score_job.outputs.score_output.mode = "upload"
    evaluate_job = eval_func(scoring_result=score_job.outputs.score_output)
    return {
        "trained_model": train_job.outputs.model_output,
        "scored_data": score_job.outputs.score_output,
        "evaluation_report": evaluate_job.outputs.eval_output,
    }

pipeline = basic_pipeline(
    training_input=JobInput(type=AssetTypes.URI_FOLDER, path=parent_dir + "/data/"),
    test_input=JobInput(type=AssetTypes.URI_FOLDER, path=parent_dir + "/data/"),
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based"
)
# demonstrate how to set pipeline output settings such as mode
pipeline.outputs.trained_model.mode = "upload"
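
The pipeline function never states an execution order explicitly; the order is implied by which step's outputs feed which step's inputs. As a rough illustration of that idea (not a description of how the service schedules jobs internally), the sketch below models the three steps as a dependency graph with the standard library `graphlib` module (Python 3.9+). The dictionary keys are labels chosen for this example.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps whose outputs it consumes.
dependencies = {
    "train_job": set(),              # only reads the pipeline input
    "score_job": {"train_job"},      # consumes train_job.outputs.model_output
    "evaluate_job": {"score_job"},   # consumes score_job.outputs.score_output
}

# A topological sort yields an order in which every step runs after its inputs exist.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

For this linear chain there is only one valid order: train, then score, then evaluate.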

In [None]:
print(pipeline)

## 3.2 Submit pipeline job

In [None]:
# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(pipeline, experiment_name="basic_pipeline")
print(f'Job link: {pipeline_job.services["Studio"].endpoint}')
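
After submission, the job runs asynchronously, so a script may want to wait until it finishes. The sketch below is a generic polling helper under stated assumptions: `wait_for_terminal_status` is a hypothetical name, the `get_status` callable is left for the reader to supply (this notebook does not show how to read a job's status), and the terminal status strings are assumed values that should be checked against the SDK documentation.

```python
import time


def wait_for_terminal_status(get_status, terminal=("Completed", "Failed", "Canceled"),
                             poll_seconds=30.0, max_polls=120):
    """Call get_status() repeatedly until it returns a terminal status."""
    status = None
    for _ in range(max_polls):
        status = get_status()
        if status in terminal:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job still in status {status!r} after {max_polls} polls")


# Stand-in status source so the sketch runs without a workspace (assumed values).
fake_statuses = iter(["NotStarted", "Running", "Completed"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
```

Passing the status source as a callable keeps the helper independent of any particular client call, which also makes it easy to test with fake data as shown.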

# Next Steps
You can find further examples of running a pipeline job [here](/sdk/jobs/pipelines/).