# Build a Pipeline with Components from YAML

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [Configure workspace](../../configuration.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2, installed - [Install instructions](../../../README.md) - check the getting started section

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define and load `CommandComponent` from YAML
- Create a `Pipeline` using the loaded components

**Motivations** - This notebook covers the scenario in which a user defines components in YAML and then uses those components to build a pipeline.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1 Import the required libraries

In [1]:
# Import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component



## 1.2 Configure credential

We use `DefaultAzureCredential` to get access to the workspace.
`DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios.

If it does not work for you, see these references for other available credentials: [configure credential example](../../configuration.ipynb), [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

In [2]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
	EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot.this issue.
	ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
	SharedTokenCacheCredential: SharedTokenCacheCredential authentication unavailable. Multiple accounts
were found in the cache. Use username and tenant id to disambiguate.
	VisualStudioCodeCredential: Azure Active Directory error '(invalid_grant) AADSTS700082: The refresh token has expired due to inactivity. The token was issued on 2021-07-15T22:47:38.7310935Z and was inactive for 90.00:00:00.
Trace ID: f57e2700-8ac7-4d3b-b708-b461b77b3800
Correlation ID: 136312ed-e83e-4ff5-8b6e-9df54dfcda6f
Timestamp: 2022-08-19 18:21:52Z'
Content: {"error":"in

## 1.3 Get a handle to the workspace

We use a config file to connect to the workspace. The Azure ML workspace should be configured with a compute cluster. [See this notebook to configure a workspace](../../configuration.ipynb).
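As a minimal sketch of what such a config file contains, the snippet below writes a `config.json` with the three keys the SDK reads: `subscription_id`, `resource_group`, and `workspace_name`. The helper `write_workspace_config` is hypothetical, the values are placeholders, and the file is written to a temporary directory so that an existing `config.json` is not overwritten.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the three workspace identifiers (hypothetical helper)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return path


# Placeholder values (assumptions); substitute your own before connecting.
config_dir = tempfile.mkdtemp()
config_path = write_workspace_config(
    config_dir, "<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<WORKSPACE_NAME>"
)
print(config_path.read_text(encoding="utf-8"))
```

Once the file holds real values, `MLClient.from_config(credential=credential, path=str(config_path))` should pick it up, assuming your SDK version accepts the `path` argument.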

In [3]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Retrieve an already attached Azure Machine Learning Compute.
cluster_name = "lin-cpu-iso-0"
print(ml_client.compute.get(cluster_name))

Found the config file in: .\config.json


AmlCompute({'type': 'amlcompute', 'created_on': None, 'provisioning_state': 'Succeeded', 'provisioning_errors': None, 'name': 'lin-cpu-iso-0', 'description': 'CPU compute for detonation chamber', 'tags': {}, 'properties': {}, 'id': '/subscriptions/60d27411-7736-4355-ac95-ac033929fe9d/resourceGroups/MOP.HERON.PROD.c499b17e-00ac-4b99-a999-4932adb26a8d/providers/Microsoft.MachineLearningServices/workspaces/amlworkspace725jmzufbhxww/computes/lin-cpu-iso-0', 'Resource__source_path': None, 'base_path': 'c:\\Users\\fufang\\repo\\azureml-examples\\sdk\\jobs\\pipelines\\1a_pipeline_with_components_from_yaml', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x0000020BF8A4BB08>, 'resource_id': None, 'location': 'southcentralus', 'size': 'STANDARD_D2_V2', 'min_instances': 0, 'max_instances': 4, 'idle_time_before_scale_down': 900.0, 'identity': <azure.ai.ml.entities._compute._identity.IdentityConfiguration object at 0x0000020BF8A4B088>, 'ssh_public_access_enabled':

# 2. Define and create components in the workspace
## 2.1 Load components from YAML

In [16]:
parent_dir = "."
train_model = load_component(path=parent_dir + "/train_model.yml")
score_data = load_component(path=parent_dir + "/score_data.yml")
eval_model = load_component(path=parent_dir + "/eval_model.yml")

## 2.2 Inspect loaded component

In [19]:
# Print the component as yaml
print(train_model)

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: azureml_anonymous
version: 0.0.1
display_name: Train Model
description: A dummy training component
type: command
inputs:
  training_data:
    type: uri_folder
    optional: true
  max_epochs:
    type: integer
    optional: true
  learning_rate:
    type: number
    default: '0.01'
  learning_rate_schedule:
    type: string
    default: time-based
outputs:
  model_output:
    type: uri_folder
command: python train.py  [--training_data ${{inputs.training_data}}] [--max_epochs
  ${{inputs.max_epochs}}]  --learning_rate ${{inputs.learning_rate}}  --learning_rate_schedule
  ${{inputs.learning_rate_schedule}}  --model_output ${{outputs.model_output}}
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:5
code: ./train_src
datastore: heron_sandbox_storage
tags: {}
is_deterministic: true
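
In the printed command above, the segments in square brackets wrap the inputs declared with `optional: true`. The sketch below illustrates the apparent intent: a bracketed segment is dropped when its input is not supplied, and the remaining `${{...}}` placeholders are replaced by values. This is a pure-Python illustration and not the SDK's implementation; `render_command` and the output path are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")
OPTIONAL_SEGMENT = re.compile(r"\[([^\[\]]*)\]")


def render_command(template, inputs, outputs):
    """Illustrative only: resolve placeholders and drop unset optional segments."""

    def lookup(match):
        kind, name = match.group(1), match.group(2)
        values = inputs if kind == "inputs" else outputs
        return str(values[name])

    def resolve_optional(match):
        segment = match.group(1)
        names = [
            m.group(2) for m in PLACEHOLDER.finditer(segment) if m.group(1) == "inputs"
        ]
        # Keep the segment only if every input it references was supplied.
        if all(inputs.get(name) is not None for name in names):
            return PLACEHOLDER.sub(lookup, segment)
        return ""

    command = OPTIONAL_SEGMENT.sub(resolve_optional, template)
    command = PLACEHOLDER.sub(lookup, command)
    # Collapse the whitespace left behind by removed segments.
    return " ".join(command.split())


template = (
    "python train.py [--training_data ${{inputs.training_data}}] "
    "[--max_epochs ${{inputs.max_epochs}}] "
    "--learning_rate ${{inputs.learning_rate}} "
    "--learning_rate_schedule ${{inputs.learning_rate_schedule}} "
    "--model_output ${{outputs.model_output}}"
)
rendered = render_command(
    template,
    inputs={"max_epochs": 20, "learning_rate": 1.8, "learning_rate_schedule": "time-based"},
    outputs={"model_output": "<MODEL_OUTPUT_PATH>"},
)
print(rendered)
```

Because `training_data` is not supplied, its bracketed segment disappears from the rendered command, while `--max_epochs 20` is kept.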



# 3. Sample pipeline job
## 3.1 Build pipeline

In [17]:
# Construct pipeline
@pipeline(default_datastore="heron_sandbox_storage")
def pipeline_with_components_from_yaml(
    # training_input,
    # test_input,
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
):
    """E2E dummy train-score-eval pipeline with components defined via yaml."""
    # Call component obj as function: apply given inputs & parameters to create a node in pipeline
    train_with_sample_data = train_model(
        # training_data=training_input,
        max_epochs=training_max_epochs,
        learning_rate=training_learning_rate,
        # learning_rate_schedule=learning_rate_schedule,
    )

    # score_with_sample_data = score_data(
    #     model_input=train_with_sample_data.outputs.model_output, test_data=test_input
    # )
    # score_with_sample_data.outputs.score_output.mode = "upload"

    # eval_with_sample_data = eval_model(
    #     scoring_result=score_with_sample_data.outputs.score_output
    # )

    # Return: pipeline outputs
    return {
        "trained_model": train_with_sample_data.outputs.model_output,
        # "scored_data": score_with_sample_data.outputs.score_output,
        # "evaluation_report": eval_with_sample_data.outputs.eval_output,
    }

from azure.ai.ml.constants import AssetTypes

# Look up a data asset that is already registered in the workspace
registered_data_asset = ml_client.data.get(name="sample_csv_compliance", version="1")

# sample_input = Input(type=AssetTypes.URI_FILE, path=registered_data_asset.id)

# Local folder input; named sample_input to avoid shadowing the built-in input()
sample_input = Input(type="uri_folder", path=parent_dir + "/data/")

pipeline_job = pipeline_with_components_from_yaml(
    # training_input=sample_input,
    # test_input=sample_input,
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
)

# set pipeline level compute
pipeline_job.settings.default_compute = "lin-cpu-iso-0"

In [None]:
# Inspect built pipeline
print(pipeline_job)

## 3.2 Submit pipeline job

In [18]:
# Submit pipeline job to workspace
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="fufang_test_dpv2"
)
pipeline_job

HttpResponseError: (AuthorizationFailed) The client 'fufang_debug@prdtrs01.prod.outlook.com' with object id '4a38d706-3c49-4817-82b1-7794fdc6311a' does not have authorization to perform action 'Microsoft.MachineLearningServices/workspaces/datastores/listSecrets/action' over scope '/subscriptions/60d27411-7736-4355-ac95-ac033929fe9d/resourceGroups/MOP.HERON.PROD.c499b17e-00ac-4b99-a999-4932adb26a8d/providers/Microsoft.MachineLearningServices/workspaces/amlworkspace725jmzufbhxww/datastores/workspaceblobstore' or the scope is invalid. If access was recently granted, please refresh your credentials.
Code: AuthorizationFailed
Message: The client 'fufang_debug@prdtrs01.prod.outlook.com' with object id '4a38d706-3c49-4817-82b1-7794fdc6311a' does not have authorization to perform action 'Microsoft.MachineLearningServices/workspaces/datastores/listSecrets/action' over scope '/subscriptions/60d27411-7736-4355-ac95-ac033929fe9d/resourceGroups/MOP.HERON.PROD.c499b17e-00ac-4b99-a999-4932adb26a8d/providers/Microsoft.MachineLearningServices/workspaces/amlworkspace725jmzufbhxww/datastores/workspaceblobstore' or the scope is invalid. If access was recently granted, please refresh your credentials.

In [None]:
# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)
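
If you would rather poll for completion than stream logs, a minimal sketch follows. `wait_for_status` is a hypothetical helper that takes any callable returning the current status string, and the set of terminal statuses is an assumption to check against what your SDK version reports. The demonstration uses a fake status source so it runs without a workspace.

```python
import time

# Assumed set of terminal job statuses; verify against your SDK version.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_status(get_status, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout passes."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still in status {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Fake status source standing in for a real job lookup.
fake_statuses = iter(["Queued", "Running", "Completed"])
final_status = wait_for_status(lambda: next(fake_statuses), poll_seconds=0.0)
print(final_status)
```

With a real client, `get_status` could be `lambda: ml_client.jobs.get(pipeline_job.name).status`.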

# Next Steps
Further examples of running a pipeline job are available [here](../README.md).