# Build a pipeline with the command function

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [Configure workspace](../../configuration.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2 installed - [install instructions](../../../README.md) - check the getting started section

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define a `CommandComponent` using the `command()` function
- Create a `Pipeline` using components defined by the `command()` function

**Motivations** - This notebook explains how to define a `CommandComponent` with the `command()` function and then use it to build a pipeline. The command component is a fundamental construct of Azure Machine Learning pipelines. It can be used to run a task on a specified compute target (either local or in the cloud). The command component accepts an `Environment` to set up the required infrastructure, and you define a `Command` to run on that infrastructure with `inputs`. You can reuse the same `Component` in different pipelines.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1 Import the required libraries

In [None]:
# import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, command, Input, Output
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import Environment, BuildContext, JobService

## 1.2 Configure credential

We use `DefaultAzureCredential` to get access to the workspace.
`DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios.

If it does not work for you, see the other available credentials: [configure credential example](../../configuration.ipynb), [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

In [None]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

## 1.3 Get a handle to the workspace

We use a config file to connect to the workspace. The Azure ML workspace should be configured with a compute cluster. [See this notebook to configure a workspace](../../configuration.ipynb)
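
As a rough illustration of what such a config file contains, the sketch below writes a `config.json` with the three keys that `MLClient.from_config` reads. The values are placeholders, and the temporary directory is an assumption made only to keep the example free of side effects; in practice the file typically sits next to the notebook or in a parent directory.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values (assumptions): replace them with your own identifiers.
workspace_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

# Write the file to a temporary directory so this sketch does not touch
# the working directory of the notebook.
config_dir = Path(tempfile.mkdtemp())
config_path = config_dir / "config.json"
config_path.write_text(json.dumps(workspace_config, indent=2))

# Read the file back to confirm that it holds the expected keys.
loaded = json.loads(config_path.read_text())
print(sorted(loaded))
```

You can also download a ready-made `config.json` for your workspace from the Azure portal instead of writing it by hand.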

In [None]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Retrieve an already attached Azure Machine Learning Compute.
cluster_name = "cpu-cluster"
print(ml_client.compute.get(cluster_name))

# 2. Define Command objects via the command function
Use the `command` function to create a `Command` object, which can be used in a `@pipeline` function.
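
The `command` strings in this section contain placeholders such as `${{inputs.url}}` and `${{outputs.output_folder}}`, which Azure ML resolves to concrete values and paths when the job runs. The sketch below only mimics that idea in plain Python so you can see what the final shell command might look like. The helper `render_command` and the mount path are hypothetical and are not part of the SDK.

```python
import re

# Matches ${{inputs.<name>}} and ${{outputs.<name>}} placeholders.
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def render_command(template, inputs, outputs):
    """Hypothetical helper: substitute placeholders with concrete values."""
    values = {"inputs": inputs, "outputs": outputs}

    def substitute(match):
        section, key = match.group(1), match.group(2)
        return str(values[section][key])

    return PLACEHOLDER.sub(substitute, template)


rendered = render_command(
    "curl ${{inputs.url}} > ${{outputs.output_folder}}/${{inputs.file_name}}",
    inputs={
        "url": "https://azuremlexamples.blob.core.windows.net/datasets/iris.csv",
        "file_name": "data.csv",
    },
    # Assumed mount location, chosen only for illustration.
    outputs={"output_folder": "/mnt/outputs/output_folder"},
)
print(rendered)
```

Literal inputs (such as `file_name`) are inserted as text, while folder inputs and outputs are replaced by a path that the job can read from or write to.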


In [None]:
download_url = "https://azuremlexamples.blob.core.windows.net/datasets/iris.csv"
file_name = "data.csv"
custom_path = "azureml://datastores/workspaceblobstore/paths/custom_path/${{name}}/"
environment = "AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:1"
# 1. Create a command component to download the input data
download_data = command(
    name="download-input",
    display_name="Download Data",
    description="Download an input from a remote URL and return it in the output.",
    tags=dict(),
    command="curl ${{inputs.url}} > ${{outputs.output_folder}}/${{inputs.file_name}}",
    environment=environment,
    inputs=dict(url=download_url, file_name=file_name),
    # Example of how to change the output path at the step level.
    # Note: if the output is promoted to the pipeline level, you need to change the path at the pipeline job level.
    outputs=dict(
        output_folder=Output(type="uri_folder", mode="rw_mount", path=custom_path)
    ),
)
# 2. Create an R script command to train on the data
train_data_with_r = command(
    name="train-data-with-R",
    display_name="Train Data with R",
    description="Train data with R.",
    tags=dict(),
    command="Rscript train.R --data_folder ${{inputs.iris}}/${{inputs.file_name}}",
    environment=Environment(build=BuildContext(path="docker_context")),
    code="./src",
    inputs=dict(
        iris=Input(type="uri_folder"),
        file_name=file_name,
    ),
    outputs={},
    services={
        "Jupyterlab endpoint": JobService(job_service_type="jupyter_lab"),
        "Vscode endpoint": JobService(job_service_type="vs_code"),
        # Uncomment and add an SSH public key to access the job container via SSH:
        # "My_ssh": JobService(
        #     job_service_type="ssh",
        #     properties={"sshPublicKeys": "<add-public-key>"},
        # ),
    },
)
# 3. Create a command component to show the input data
show_data = command(
    name="show-data",
    display_name="Show Data",
    description="Show data in command line and return it in output.",
    tags=dict(),
    command="cat ${{inputs.input_folder}}/${{inputs.file_name}} && cp ${{inputs.input_folder}}/${{inputs.file_name}} ${{outputs.output_folder}}/${{inputs.file_name}}",
    environment=environment,
    inputs=dict(input_folder=Input(type="uri_folder"), file_name=file_name),
    outputs=dict(output_folder=Output(type="uri_folder", mode="rw_mount")),
)

# 3. Basic pipeline job

## 3.1 Build pipeline

Build a pipeline with predefined commands.
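
In the pipeline function in this section, passing `download_data_node.outputs.output_folder` into another step makes that step depend on the download step. The sketch below does not use the Azure ML SDK. It models the same three steps with the standard-library `graphlib` module (Python 3.9+) to show which steps can only start after others have finished. The stage grouping illustrates the dependency structure and is not a description of how the service schedules jobs.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps whose outputs it consumes.
dependencies = {
    "download_data": set(),
    "train_data_with_r": {"download_data"},
    "show_data": {"download_data"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

# Collect the steps in "stages": every step in a stage has all of its
# upstream steps completed in an earlier stage.
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

print(stages)
```

Both `train_data_with_r` and `show_data` read from the download step only, so neither depends on the other.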

In [None]:
@pipeline(
    tags={"owner": "sdkteam", "tag": "tagvalue"},
)
def pipeline_with_non_python_components(url, file_name):
    """The hello world pipeline job."""
    download_data_node = download_data(url=url, file_name=file_name)
    train_data_with_r(
        iris=download_data_node.outputs.output_folder, file_name=file_name
    )
    show_data_node = show_data(
        input_folder=download_data_node.outputs.output_folder, file_name=file_name
    )

    return {"output_file": show_data_node.outputs.output_folder}


pipeline_job = pipeline_with_non_python_components(download_url, file_name)

# Example of how to change the output path at the pipeline level
pipeline_job.outputs.output_file = Output(
    type="uri_folder", mode="rw_mount", path=custom_path
)

# Set the pipeline-level default compute (the cluster retrieved in section 1.3)
pipeline_job.settings.default_compute = cluster_name

## 3.2 Submit pipeline job

In [None]:
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples"
)
pipeline_job

In [None]:
# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)
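
`ml_client.jobs.stream` blocks while forwarding the job logs. If you would rather poll the job status without streaming logs, a minimal sketch could look like the following. The helper name `wait_for_job`, the polling interval, the timeout, and the set of terminal status strings are assumptions made for illustration; check them against the SDK version you use.

```python
import time

# Statuses after which no further state change is expected (assumed set).
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(ml_client, job_name, poll_seconds=30, timeout_seconds=3600):
    """Poll a job until it reaches a terminal status and return that status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        # jobs.get returns the current job entity, including its status.
        status = ml_client.jobs.get(job_name).status
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_name} is still in status {status!r}")
        time.sleep(poll_seconds)
```

With the objects from this notebook, the call would be `wait_for_job(ml_client, pipeline_job.name)`.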

# Next Steps
You can find further examples of running a pipeline job [here](../).