# E2E registered components

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2, installed - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define a `CommandComponent` with the Python SDK
- Register it to the workspace
- Create a `pipeline` from the registered components

**Motivations** - This notebook explains how to define a `CommandComponent`, register it to the workspace, and then use the registered components to build a pipeline. The command component is a fundamental construct of an Azure Machine Learning pipeline. It runs a task on a specified compute target (either local or in the cloud). A command component accepts an `environment` that sets up the required infrastructure, and a `command` that runs on this infrastructure with the given `inputs`. The same `component` can be reused in different pipelines.
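
To make the relationship between `command` and `inputs` concrete, here is a minimal sketch of how placeholders of the form `${{inputs.<name>}}` and `${{outputs.<name>}}` could be turned into a concrete command line, with declared defaults filling in any input that is not supplied. This is plain Python for illustration only, not Azure ML's implementation: `resolve_command` is a hypothetical helper and the `/mnt/...` paths are made-up values.

```python
import re

# Matches placeholders such as ${{inputs.training_data}} or ${{outputs.model_output}}.
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def resolve_command(command, inputs, outputs, defaults=None):
    """Substitute ${{inputs.x}} / ${{outputs.y}} placeholders with concrete values.

    Hypothetical helper for illustration; the real substitution is done by the service.
    """
    # Supplied inputs take precedence over declared defaults.
    merged_inputs = {**(defaults or {}), **inputs}
    values = {"inputs": merged_inputs, "outputs": outputs}

    def substitute(match):
        kind, name = match.group(1), match.group(2)
        if name not in values[kind]:
            raise KeyError(f"no value supplied for {kind}.{name}")
        return str(values[kind][name])

    return PLACEHOLDER.sub(substitute, command)


command = (
    "python train.py --training_data ${{inputs.training_data}} "
    "--max_epocs ${{inputs.max_epocs}} --learning_rate ${{inputs.learning_rate}}"
    " --model_output ${{outputs.model_output}}"
)
resolved = resolve_command(
    command,
    inputs={"training_data": "/mnt/data", "max_epocs": 20},  # made-up values
    outputs={"model_output": "/mnt/model"},  # made-up value
    defaults={"learning_rate": 0.01},
)
print(resolved)
```

Because `learning_rate` is not passed explicitly, the default of `0.01` ends up in the resolved command.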


# Prerequisite

In [None]:
# Import required libraries
from azure.ml import MLClient
from azure.ml.entities import Code, Dataset
from azure.identity import InteractiveBrowserCredential

In [None]:
# Enter the details of your AML workspace
# subscription_id = '<SUBSCRIPTION_ID>'
# resource_group = '<RESOURCE_GROUP>'
# workspace = '<AML_WORKSPACE_NAME>'

# Example values; replace them with your own workspace details
subscription_id = 'ee85ed72-2b26-48f6-a0e8-cb5bcf98fbd9'
resource_group = 'pipeline-pm'
workspace = 'pm-dev'
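
Before connecting, it can help to confirm that the workspace settings have actually been filled in. The following is a minimal sketch of such a check. `check_workspace_details` is a hypothetical helper (not part of the Azure ML SDK), it assumes all three values are strings, and the all-zero GUID used in the second call is a made-up value.

```python
import uuid


def check_workspace_details(subscription_id, resource_group, workspace):
    """Return a list of problems found in the workspace settings (empty if none)."""
    problems = []
    try:
        # Subscription IDs are GUIDs, so an unfilled template value fails to parse.
        uuid.UUID(subscription_id)
    except ValueError:
        problems.append("subscription_id is not a valid GUID")
    for label, value in (("resource_group", resource_group), ("workspace", workspace)):
        # Template values look like '<RESOURCE_GROUP>' until they are filled in.
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(f"{label} still looks like a placeholder")
    return problems


# Unfilled template values are all reported.
print(check_workspace_details("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<AML_WORKSPACE_NAME>"))
# Filled-in values produce an empty list.
print(check_workspace_details("00000000-0000-0000-0000-000000000000", "my-rg", "my-ws"))
```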

In [None]:
# Get a handle to the workspace
ml_client = MLClient(InteractiveBrowserCredential(), subscription_id, resource_group, workspace)

# Pipeline job with registered component
## Register components

In [None]:
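from azure.ml.entities import CommandComponent

# Base directory for the component source folders and the data folder
parent_dir = '.'
# Environment (name:version) shared by all three components
environment = "AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:5"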

train_component = CommandComponent(
    name="Train",
    version="32",
    inputs=dict(
        training_data=dict(type="path"),
        max_epocs=dict(type="integer"),
        learning_rate=dict(type="number", default=0.01),
        learning_rate_schedule=dict(type="string", default="time-based")
    ),
    outputs=dict(
        model_output=dict(type="path")
    ),
    code=Code(local_path=parent_dir + "/train_src"),
    environment=environment,
    command="python train.py --training_data ${{inputs.training_data}} --max_epocs ${{inputs.max_epocs}} "
            "--learning_rate ${{inputs.learning_rate}} "
            "--learning_rate_schedule ${{inputs.learning_rate_schedule}} "
            "--model_output ${{outputs.model_output}} "
)
ml_client.components.create_or_update(train_component)


In [None]:
score_component = CommandComponent(
    name="Score",
    version="32",
    inputs=dict(
        model_input=dict(type="path"),
        test_data=dict(type="path"),
    ),
    outputs=dict(
        score_output=dict(type="path")
    ),
    code=Code(local_path=parent_dir + "/score_src"),
    environment=environment,
    command="python score.py --model_input ${{inputs.model_input}} --test_data ${{inputs.test_data}} "
            "--score_output ${{outputs.score_output}} "
)
ml_client.components.create_or_update(score_component)

In [None]:
eval_component = CommandComponent(
    name="Eval",
    version="32",
    inputs=dict(
        scoring_result=dict(type="path"),
    ),
    outputs=dict(
        eval_output=dict(type="path")
    ),
    code=Code(local_path=parent_dir + "/eval_src"),
    environment=environment,
    command="python eval.py --scoring_result ${{inputs.scoring_result}} --eval_output ${{outputs.eval_output}}"
)
ml_client.components.create_or_update(eval_component)
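
Each `command` string refers to the component's inputs and outputs through `${{inputs.<name>}}` and `${{outputs.<name>}}` placeholders, so a misspelled name is easy to introduce. A small local check can catch it before registration. This is a sketch only: `undeclared_references` is a hypothetical helper, and the regular expression is an assumption about the placeholder syntax based on the commands above.

```python
import re

# Assumed placeholder syntax: ${{inputs.<name>}} or ${{outputs.<name>}}.
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def undeclared_references(command, inputs, outputs):
    """Return placeholders in `command` that are not declared as inputs or outputs."""
    declared = {"inputs": set(inputs), "outputs": set(outputs)}
    return [
        f"{kind}.{name}"
        for kind, name in PLACEHOLDER.findall(command)
        if name not in declared[kind]
    ]


eval_inputs = {"scoring_result": {"type": "path"}}
eval_outputs = {"eval_output": {"type": "path"}}

good_command = (
    "python eval.py --scoring_result ${{inputs.scoring_result}} "
    "--eval_output ${{outputs.eval_output}}"
)
# Same command with a deliberately misspelled input name.
bad_command = good_command.replace("inputs.scoring_result", "inputs.scoring_results")

print(undeclared_references(good_command, eval_inputs, eval_outputs))
print(undeclared_references(bad_command, eval_inputs, eval_outputs))
```

An empty list means every placeholder matches a declared input or output; the second call reports the misspelled `inputs.scoring_results`.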

## Build pipeline

In [None]:
from azure.ml import dsl, MLClient
from azure.ml.dsl import Pipeline
from azure.ml.entities import Component as ComponentEntity, Dataset

def generate_dsl_pipeline(
        client: MLClient,
        train_component: ComponentEntity,
        score_component: ComponentEntity,
        eval_component: ComponentEntity,
    ) -> Pipeline:
    # 1. Load component funcs
    train_func = dsl.load_component(
        client=client,
        name=train_component.name,
        version=train_component.version,
    )
    score_func = dsl.load_component(
        client=client,
        name=score_component.name,
        version=score_component.version,
    )
    eval_func = dsl.load_component(
        client=client,
        name=eval_component.name,
        version=eval_component.version,
    )

    # 2. Construct pipeline
    @dsl.pipeline(
        compute="cpu-cluster",
        description="E2E dummy train-score-eval pipeline with registered components",
    )
    def sample_pipeline(
            pipeline_job_training_input,
            pipeline_job_test_input,
            pipeline_job_training_max_epocs,
            pipeline_job_training_learning_rate,
            pipeline_job_learning_rate_schedule,
    ):
        train_job = train_func(
            training_data=pipeline_job_training_input,
            max_epocs=pipeline_job_training_max_epocs,
            learning_rate=pipeline_job_training_learning_rate,
            learning_rate_schedule=pipeline_job_learning_rate_schedule,
        )
        score_job = score_func(model_input=train_job.outputs.model_output, test_data=pipeline_job_test_input)
        score_job.outputs.score_output.mode = "upload"
        evaluate_job = eval_func(scoring_result=score_job.outputs.score_output)
        return {
            "pipeline_job_trained_model": train_job.outputs.model_output,
            "pipeline_job_scored_data": score_job.outputs.score_output,
            "pipeline_job_evaluation_report": evaluate_job.outputs.eval_output,
        }

    pipeline = sample_pipeline(
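        Dataset(local_path=parent_dir + "/data/"),  # pipeline_job_training_input
        Dataset(local_path=parent_dir + "/data/"),  # pipeline_job_test_input
        20,  # pipeline_job_training_max_epocs
        1.8,  # pipeline_job_training_learning_rate
        "time-based",  # pipeline_job_learning_rate_schedule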
    )
    pipeline.outputs.pipeline_job_trained_model.mode = "upload"
    pipeline.outputs.pipeline_job_scored_data.mode = "upload"
    pipeline.outputs.pipeline_job_evaluation_report.mode = "upload"
    return pipeline
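
In `sample_pipeline`, the steps are linked only through their data: `score_job` consumes `train_job.outputs.model_output`, and `evaluate_job` consumes `score_job.outputs.score_output`. The sketch below shows how such output-to-input wiring implies an execution order. It uses the standard library's `graphlib` (Python 3.9+) purely as an illustration and makes no claim about how Azure ML schedules steps internally; the `dependencies` dictionary is written by hand from the pipeline above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps whose outputs it consumes.
dependencies = {
    "train_job": set(),             # reads only pipeline-level inputs
    "score_job": {"train_job"},     # needs train_job.outputs.model_output
    "evaluate_job": {"score_job"},  # needs score_job.outputs.score_output
}

# A topological order lists every step after all of the steps it depends on.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

For this linear chain the only valid order is train, then score, then evaluate.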

## Submit pipeline job

In [None]:
# create pipeline instance
pipeline = generate_dsl_pipeline(ml_client, train_component, score_component, eval_component)
# submit job to workspace
ml_client.jobs.create_or_update(pipeline, experiment_name="e2e_registered_components", continue_run_on_step_failure=True)