# Build a pipeline with dsl.command_component

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2, installed - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define a `commandComponent` using a Python function and the `dsl.command_component` decorator
- Create a `pipeline` using components defined by `dsl.command_component`

**Motivations** - This notebook explains how to define a `commandComponent` from a Python function with the `@dsl.command_component` decorator, and then use such components to build a pipeline.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

In [None]:
# Import required libraries
from azure.identity import InteractiveBrowserCredential
from azure.ml import MLClient, dsl
from azure.ml.entities import JobInput

## 1.2. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need three identifiers: a subscription ID, a resource group, and a workspace name. We pass these details to the `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace. This tutorial uses [interactive browser authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.interactivebrowsercredential?view=azure-python). More advanced connection methods can be found [here](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

In [None]:
# Enter details of your AML workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace = '<AML_WORKSPACE_NAME>'

In [None]:
# Get a handle to the workspace
ml_client = MLClient(InteractiveBrowserCredential(), subscription_id, resource_group, workspace)
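Hard-coding the identifiers works for a tutorial, but they can also be supplied from outside the notebook. The following is a minimal sketch, assuming hypothetical environment variable names `SUBSCRIPTION_ID`, `RESOURCE_GROUP`, and `AML_WORKSPACE_NAME`; the helper `load_workspace_details` is illustrative and not part of the SDK. It fails early if a value is missing or is still a `<...>` placeholder.

```python
import os

# Assumed (hypothetical) environment variable names; adjust to your own setup.
REQUIRED_KEYS = ("SUBSCRIPTION_ID", "RESOURCE_GROUP", "AML_WORKSPACE_NAME")


def load_workspace_details(env=None):
    """Collect the three workspace identifiers from a mapping (os.environ by default).

    Raises ValueError if a value is missing, empty, or still a <PLACEHOLDER>.
    """
    env = os.environ if env is None else env
    details = {}
    for key in REQUIRED_KEYS:
        value = env.get(key, "")
        is_placeholder = value.startswith("<") and value.endswith(">")
        if not value or is_placeholder:
            raise ValueError(f"{key} is not set to a real value")
        details[key] = value
    return details


# Example with an explicit mapping instead of the real environment
example = load_workspace_details(
    {
        "SUBSCRIPTION_ID": "0000-1111",
        "RESOURCE_GROUP": "my-rg",
        "AML_WORKSPACE_NAME": "my-ws",
    }
)
```

The returned values could then be passed to `MLClient` in place of the hard-coded variables.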

# 2. Import components defined with Python functions

We defined three sample components using `dsl.command_component` in [dsl_components.py](dsl_components.py).

In [None]:
with open("dsl_components.py") as fin:
    print(fin.read())

In [None]:
%load_ext autoreload
%autoreload 2

from dsl_components import train_model, score_data, eval_model

help(train_model)
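To build intuition for how a decorator can turn a plain Python function into a component description, here is a minimal sketch in pure Python. It is not the SDK's implementation: `sketch_component`, `component_spec`, and `toy_train` are hypothetical names introduced only for illustration. The sketch simply reads the function signature and records parameter names, types, and defaults.

```python
import inspect


def sketch_component(func):
    """Hypothetical stand-in for a component decorator (NOT the SDK's).

    Attaches a dict describing the function's name, docstring, and
    parameters; the function itself is returned unchanged.
    """
    inputs = {}
    for name, param in inspect.signature(func).parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            type_name = "any"
        else:
            type_name = getattr(annotation, "__name__", str(annotation))
        default = None if param.default is inspect.Parameter.empty else param.default
        inputs[name] = {"type": type_name, "default": default}

    func.component_spec = {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "inputs": inputs,
    }
    return func


@sketch_component
def toy_train(training_data: str, max_epochs: int = 5, learning_rate: float = 0.01):
    """Pretend to train a model."""
    return f"trained on {training_data} for {max_epochs} epochs at lr={learning_rate}"


spec = toy_train.component_spec
print(spec["name"], list(spec["inputs"]))
```

The real decorator does considerably more (for example, describing outputs and how the code runs on a compute target), but the idea of deriving an interface from the function signature is the same.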

You can also register these component functions to the workspace using `ml_client.components.create_or_update()`.

In [None]:
print(train_model)

# 3. Sample pipeline job

## 3.1 Build pipeline

In [None]:

cluster_name = "cpu-cluster"
# define a pipeline with dsl component
@dsl.pipeline(
    name='A-training-pipeline',
    description='E2E dummy train-score-eval pipeline with components defined via python function components',
    default_compute=cluster_name,
)
def pipeline_with_python_function_components(input_data, test_data, learning_rate):
    # Call component obj as function: apply given inputs & parameters to create a node in pipeline
    train_with_sample_data = train_model(
        training_data=input_data, max_epochs=5, learning_rate=learning_rate
    )

    score_with_sample_data = score_data(
        model_input=train_with_sample_data.outputs.model_output, test_data=test_data
    )

    eval_with_sample_data = eval_model(scoring_result=score_with_sample_data.outputs.score_output)

    # Return: pipeline outputs
    return {
        'eval_output': eval_with_sample_data.outputs.eval_output,
        'model_output': train_with_sample_data.outputs.model_output,
    }

pipeline = pipeline_with_python_function_components(
    input_data=JobInput(path="https://dprepdata.blob.core.windows.net/demo/Titanic.csv"), 
    test_data=JobInput(path="https://dprepdata.blob.core.windows.net/demo/Titanic.csv"), 
    learning_rate=0.1
)
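In the pipeline function above, passing one node's output as another node's input is what defines the order of the steps: scoring consumes the trained model, and evaluation consumes the scoring result. The sketch below models that dependency structure with the standard library's `graphlib` (Python 3.9+). It is only an illustration of the resulting graph, not how the service schedules jobs, and the `dependencies` dictionary is written by hand for this example.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it consumes,
# mirroring the wiring in pipeline_with_python_function_components.
dependencies = {
    "train_model": set(),            # reads only the pipeline input data
    "score_data": {"train_model"},   # needs train_model's model_output
    "eval_model": {"score_data"},    # needs score_data's score_output
}

# static_order() yields every step after all of its prerequisites
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)  # ['train_model', 'score_data', 'eval_model']
```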

## 3.2 Submit pipeline job

In [None]:
# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(pipeline, experiment_name="pipeline_with_python_function_components")
print(f'Job link: {pipeline_job.services["Studio"].endpoint}')

In [None]:
# Uncomment the line below to stream logs and wait until the job completes
# ml_client.jobs.stream(pipeline_job.name)

# Next Steps
You can find further examples of running a pipeline job [here](/sdk/jobs/pipelines/).