# Build pipeline using schedule

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [Configure workspace](../../configuration.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2 installed - [install instructions](../../../README.md) (see the getting started section)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define and load a `CommandComponent` from YAML
- Create a `Pipeline` from the loaded components
- Set a schedule on the `Pipeline`

**Motivations** - This notebook explains how to create a pipeline job that runs on a schedule, defined either as a cron expression or as a recurrence pattern.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1 Import the required libraries

In [1]:
# Import required libraries
from azure.ml import MLClient, dsl, Input
from azure.ml.entities import (
    CronSchedule,
    RecurrenceSchedule,
    RecurrencePattern,
    ScheduleStatus,
    load_component,
)

## 1.2 Configure credential

We use `DefaultAzureCredential` to get access to the workspace.  
`DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios. 

If it does not work for you, see the other available credentials: [configure credential example](../../configuration.ipynb), [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).
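One credential source that `DefaultAzureCredential` tries is a service principal read from environment variables. As a minimal sketch for troubleshooting, the helper below reports which of those variables are unset. The helper name `missing_service_principal_vars` is hypothetical and not part of any SDK; the three variable names are the ones used for client-secret authentication.

```python
import os

# Environment variables read for service principal (client secret) authentication.
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(env=None):
    """Return the names of the service principal variables that are unset or empty.

    Hypothetical helper for troubleshooting; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]


missing = missing_service_principal_vars()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("All service principal variables are set.")
```

An empty result does not guarantee that authentication succeeds; it only rules out one common cause of failure.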

In [2]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

## 1.3 Get a handle to the workspace

We use a config file to connect to the workspace. The Azure ML workspace should be configured with a compute cluster. [See this notebook to configure a workspace](../../configuration.ipynb)
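The config file is a small JSON document named `config.json` that identifies the workspace. As a rough sketch, the snippet below writes one into an `.azureml` folder, which is where this notebook's output reports finding it. The helper name `write_workspace_config` is hypothetical, and the angle-bracket values are placeholders to replace with your own identifiers.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write .azureml/config.json under `directory` (hypothetical helper)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / ".azureml" / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return path


# Demo in a temporary directory so nothing in your project is overwritten.
with tempfile.TemporaryDirectory() as project_dir:
    config_path = write_workspace_config(
        project_dir, "<subscription-id>", "<resource-group>", "<workspace-name>"
    )
    written = json.loads(config_path.read_text(encoding="utf-8"))
    print(json.dumps(written, indent=4))
```

In a real project you would pass your project folder instead of a temporary directory, or download the file from the workspace page in the Azure portal.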

In [3]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Retrieve an already attached Azure Machine Learning Compute.
cluster_name = "cpu-cluster"
print(ml_client.compute.get(cluster_name))

Found the config file in: C:\Users\biyi\azureml-examples\sdk\.azureml\config.json


AmlCompute({'type': 'amlcompute', 'created_on': None, 'provisioning_state': 'Succeeded', 'provisioning_errors': None, 'name': 'cpu-cluster', 'description': None, 'tags': {}, 'properties': {}, 'id': '/subscriptions/96aede12-2f73-41cb-b983-6d11a904839b/resourceGroups/sdk/providers/Microsoft.MachineLearningServices/workspaces/sdk-canary/computes/cpu-cluster', 'base_path': './', 'creation_context': None, 'serialize': <msrest.serialization.Serializer object at 0x000002065B143DC0>, 'resource_id': None, 'location': 'eastus2euap', 'size': 'STANDARD_DS3_V2', 'min_instances': 0, 'max_instances': 8, 'idle_time_before_scale_down': 3600.0, 'identity': None, 'ssh_public_access_enabled': True, 'ssh_settings': None, 'network_settings': None, 'tier': 'dedicated'})


# 2. Define and create components in the workspace
## 2.1 Load components definition from YAML

In [4]:
parent_dir = "."
train_model = load_component(yaml_file=parent_dir + "/train_model.yml")
score_data = load_component(yaml_file=parent_dir + "/score_data.yml")
eval_model = load_component(yaml_file=parent_dir + "/eval_model.yml")

## 2.2 Inspect loaded component

In [5]:
# Print the component as yaml
print(train_model)

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: train_model
version: 0.0.1
display_name: Train Model
description: A dummy training component
type: command
inputs:
  training_data:
    type: uri_folder
  max_epochs:
    type: integer
    optional: true
  learning_rate:
    type: number
    default: '0.01'
  learning_rate_schedule:
    type: string
    default: time-based
outputs:
  model_output:
    type: uri_folder
command: python train.py  --training_data ${{inputs.training_data}}  [--max_epochs
  ${{inputs.max_epochs}}]  --learning_rate ${{inputs.learning_rate}}  --learning_rate_schedule
  ${{inputs.learning_rate_schedule}}  --model_output ${{outputs.model_output}}
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:5
code: azureml:./train_src
is_deterministic: true
tags: {}
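In the `command` above, the square brackets around `--max_epochs ${{inputs.max_epochs}}` mark a segment tied to the optional `max_epochs` input. The sketch below illustrates the intended effect, assuming the segment is dropped when the optional input is not supplied. It is a simplified stand-in written for this explanation, not the service's actual resolution logic, and `render_command` is a hypothetical name.

```python
import re

# Shortened copy of the component command, for illustration only.
COMMAND = (
    "python train.py --training_data ${{inputs.training_data}} "
    "[--max_epochs ${{inputs.max_epochs}}] "
    "--learning_rate ${{inputs.learning_rate}}"
)

PLACEHOLDER = re.compile(r"\$\{\{inputs\.(\w+)\}\}")
OPTIONAL_SEGMENT = re.compile(r"\[([^\[\]]*)\]")


def render_command(template, inputs):
    """Fill input placeholders; drop bracketed segments whose inputs are missing."""

    def keep_or_drop(match):
        segment = match.group(1)
        names = PLACEHOLDER.findall(segment)
        # Keep the segment (without brackets) only if every input it uses is provided.
        return segment if all(name in inputs for name in names) else ""

    resolved = OPTIONAL_SEGMENT.sub(keep_or_drop, template)
    # A missing required input raises KeyError here.
    filled = PLACEHOLDER.sub(lambda m: str(inputs[m.group(1)]), resolved)
    # Collapse the extra whitespace left behind by dropped segments.
    return " ".join(filled.split())


print(render_command(COMMAND, {"training_data": "./data", "learning_rate": 0.01}))
print(render_command(COMMAND, {"training_data": "./data", "learning_rate": 0.01, "max_epochs": 20}))
```

The first call omits `--max_epochs` entirely, while the second includes `--max_epochs 20`.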



In [6]:
# Inspect more information
print(type(train_model))
help(train_model._func)

<class 'azure.ml.entities._component.command_component.CommandComponent'>
Help on function [component] Train Model:

[component] Train Model(*, training_data: 'uri_folder' = None, max_epochs: 'int' = None, learning_rate: 'float' = None, learning_rate_schedule: 'str' = None)
    A dummy training component
    
    Component yaml:
    ```yaml
    $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
    type: command
    
    name: train_model
    display_name: Train Model
    description: A dummy training component
    version: 0.0.1
    inputs:
      training_data: 
        type: uri_folder
      max_epochs:
        type: integer
        optional: true
      learning_rate: 
        type: number
        default: 0.01
      learning_rate_schedule: 
        type: string
        default: time-based 
    outputs:
      model_output:
        type: uri_folder
    code: ./train_src
    environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:5
    command: >-


# 3. Sample pipeline job
## 3.1 Build pipeline

In [7]:
# Construct pipeline
@dsl.pipeline(
    default_compute="cpu-cluster",
    description="E2E dummy train-score-eval pipeline using schedule",
)
def pipeline_using_schedule(
    training_input,
    test_input,
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
):
    # Call component obj as function: apply given inputs & parameters to create a node in pipeline
    train_with_sample_data = train_model(
        training_data=training_input,
        max_epochs=training_max_epochs,
        learning_rate=training_learning_rate,
        learning_rate_schedule=learning_rate_schedule,
    )

    score_with_sample_data = score_data(
        model_input=train_with_sample_data.outputs.model_output, test_data=test_input
    )
    score_with_sample_data.outputs.score_output.mode = "upload"

    eval_with_sample_data = eval_model(
        scoring_result=score_with_sample_data.outputs.score_output
    )

    # Return: pipeline outputs
    return {
        "trained_model": train_with_sample_data.outputs.model_output,
        "scored_data": score_with_sample_data.outputs.score_output,
        "evaluation_report": eval_with_sample_data.outputs.eval_output,
    }
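The pipeline function does not state an execution order explicitly. The order follows from the data dependencies: `score_data` consumes the output of `train_model`, and `eval_model` consumes the output of `score_data`. As an illustration only, the sketch below rebuilds that dependency graph by hand and sorts it with the standard library's `graphlib` (Python 3.9+). The dictionary is written manually for this example and is not read from the SDK.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose outputs it consumes.
dependencies = {
    "train_with_sample_data": set(),
    "score_with_sample_data": {"train_with_sample_data"},
    "eval_with_sample_data": {"score_with_sample_data"},
}

# A topological order lists every node after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
# ['train_with_sample_data', 'score_with_sample_data', 'eval_with_sample_data']
```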

In [16]:
pipeline = pipeline_using_schedule(
    training_input=Input(type="uri_folder", path=parent_dir + "/data/"),
    test_input=Input(type="uri_folder", path=parent_dir + "/data/"),
    training_max_epochs=20,
    training_learning_rate=1.8,
    learning_rate_schedule="time-based",
)

name: loving_reggae_pb58zdscyy
display_name: pipeline_using_schedule
description: E2E dummy train-score-eval pipeline using schedule
type: pipeline
inputs:
  training_input:
    mode: ro_mount
    type: uri_folder
    path: azureml:./data/
  test_input:
    mode: ro_mount
    type: uri_folder
    path: azureml:./data/
  training_max_epochs: 20
  training_learning_rate: 1
  learning_rate_schedule: time-based
outputs:
  trained_model: null
  scored_data: null
  evaluation_report: null
tags: {}
jobs:
  train_with_sample_data:
    $schema: '{}'
    type: command
    inputs:
      training_data: ${{parent.inputs.training_input}}
      max_epochs: ${{parent.inputs.training_max_epochs}}
      learning_rate: ${{parent.inputs.training_learning_rate}}
      learning_rate_schedule: ${{parent.inputs.learning_rate_schedule}}
    outputs:
      model_output: ${{parent.outputs.trained_model}}
    command: python train.py  --training_data ${{inputs.training_data}}  [--max_epochs
      ${{inputs.max_ep

## 3.2 Configure schedule of pipeline job

In [33]:
# Create a cron schedule that fires at 17:00 and 17:30 UTC every day
cron_schedule = CronSchedule(
    expression="0,30 17 * * *",
    start_time="2022-04-21T01:15:00",
    time_zone="Universal Coordinated Time",
    status=ScheduleStatus.ENABLED,
)
pipeline.schedule = cron_schedule
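The expression `0,30 17 * * *` has five fields: minute, hour, day of month, month, and day of week. To make the firing times concrete, the sketch below implements a deliberately small cron matcher that supports only `*`, comma lists, and ranges, and lists the first matches after the schedule's start time. It is a simplified illustration under standard cron conventions, not the scheduler that Azure ML uses, and `parse_field` and `next_fire_times` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def parse_field(field, low, high):
    """Expand one cron field into the set of values it matches ('*', 'a,b', 'a-b')."""
    if field == "*":
        return set(range(low, high + 1))
    values = set()
    for part in field.split(","):
        if "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return values


def next_fire_times(expression, start, count, max_minutes=366 * 24 * 60):
    """Return up to `count` matching minutes at or after `start`.

    Simplification: day-of-month and day-of-week must both match.
    The search is bounded by `max_minutes` so it always terminates.
    """
    minute_f, hour_f, dom_f, month_f, dow_f = expression.split()
    minutes = parse_field(minute_f, 0, 59)
    hours = parse_field(hour_f, 0, 23)
    days = parse_field(dom_f, 1, 31)
    months = parse_field(month_f, 1, 12)
    weekdays = parse_field(dow_f, 0, 6)  # cron convention: 0 = Sunday

    # Start from the first whole minute at or after `start`.
    candidate = start.replace(second=0, microsecond=0)
    if candidate < start:
        candidate += timedelta(minutes=1)

    results = []
    for _ in range(max_minutes):
        cron_weekday = (candidate.weekday() + 1) % 7  # Python Monday=0 -> cron Monday=1
        if (
            candidate.minute in minutes
            and candidate.hour in hours
            and candidate.day in days
            and candidate.month in months
            and cron_weekday in weekdays
        ):
            results.append(candidate)
            if len(results) == count:
                break
        candidate += timedelta(minutes=1)
    return results


start = datetime(2022, 4, 21, 1, 15, tzinfo=timezone.utc)
fire_times = next_fire_times("0,30 17 * * *", start, count=3)
for moment in fire_times:
    print(moment.isoformat())
```

With the start time used in this notebook, the first three matches are 17:00 and 17:30 on 2022-04-21, followed by 17:00 on 2022-04-22.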

In [17]:
# Alternative: create a recurrence schedule that fires at 10:00 AM and 10:01 AM Pacific time every day
# pattern = RecurrencePattern(hours=10, minutes=[0, 1])
# recurrence_schedule = RecurrenceSchedule(
#     frequency='day',
#     interval=1,
#     pattern=pattern,  # assumed keyword: pass the pattern defined above, otherwise it is unused
#     start_time="2022-04-21T01:15:00",
#     time_zone="Pacific Standard Time",
#     status=ScheduleStatus.ENABLED
# )
# pipeline.schedule = recurrence_schedule
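A recurrence pattern and a cron expression can describe the same daily firing times. The sketch below assumes that a pattern fires at every combination of its listed hours and minutes, and shows the equivalent cron expression for a daily schedule with an interval of 1. Both helper functions are hypothetical, written for illustration, and independent of the SDK.

```python
from itertools import product


def as_list(value):
    """Accept either a single integer or an iterable of integers."""
    return [value] if isinstance(value, int) else list(value)


def daily_fire_times(hours, minutes):
    """List the HH:MM times produced by every hour and minute combination."""
    return [
        f"{hour:02d}:{minute:02d}"
        for hour, minute in product(sorted(as_list(hours)), sorted(as_list(minutes)))
    ]


def daily_pattern_to_cron(hours, minutes):
    """Build the cron expression that fires at the same times every day."""
    minute_field = ",".join(str(m) for m in sorted(as_list(minutes)))
    hour_field = ",".join(str(h) for h in sorted(as_list(hours)))
    return f"{minute_field} {hour_field} * * *"


print(daily_fire_times(10, [0, 1]))       # ['10:00', '10:01']
print(daily_pattern_to_cron(10, [0, 1]))  # 0,1 10 * * *
```

The time zone is configured separately from the pattern, so the same expression means 10:00 in whichever zone the schedule names.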

In [19]:
print(pipeline)

PipelineJob({'inputs': {'training_input': <azure.ml.entities._job.pipeline._io.PipelineInput object at 0x000002067FA69610>, 'test_input': <azure.ml.entities._job.pipeline._io.PipelineInput object at 0x000002067FA69670>, 'training_max_epochs': <azure.ml.entities._job.pipeline._io.PipelineInput object at 0x000002067FA696D0>, 'training_learning_rate': <azure.ml.entities._job.pipeline._io.PipelineInput object at 0x000002067FA69730>, 'learning_rate_schedule': <azure.ml.entities._job.pipeline._io.PipelineInput object at 0x000002067FA69790>}, 'outputs': {'trained_model': <azure.ml.entities._job.pipeline._io.PipelineOutput object at 0x000002067FA694C0>, 'scored_data': <azure.ml.entities._job.pipeline._io.PipelineOutput object at 0x000002067FA697F0>, 'evaluation_report': <azure.ml.entities._job.pipeline._io.PipelineOutput object at 0x000002067FA69580>}, 'component': _PipelineComponent({'components': OrderedDict([('train_with_sample_data', {'code': {}, 'command': {}}), ('score_with_sample_data',

## 3.3 Submit pipeline job

In [18]:
# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(
    pipeline, experiment_name="pipeline_samples"
)

Datetime with no tzinfo will be considered UTC.


Experiment,Name,Type,Status,Details Page
pipeline_samples,loving_reggae_pb58zdscyy,pipeline,NotStarted,Link to Azure Machine Learning studio


Note: The job runs according to the defined schedule, so it might not start immediately.
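The message `Datetime with no tzinfo will be considered UTC.` refers to the `start_time` string, which carries no offset. The sketch below uses only the standard library to show what that means: the parsed value is naive, and interpreting it as UTC corresponds to an evening time on the previous day in a UTC-7 zone. The fixed UTC-7 offset is an assumption chosen for illustration, and nothing here calls the SDK.

```python
from datetime import datetime, timedelta, timezone

# The start time used in this notebook has no offset, so it parses as a naive datetime.
start_time = datetime.fromisoformat("2022-04-21T01:15:00")
print(start_time.tzinfo)  # None

# Interpreting it as UTC, as the message describes.
start_time_utc = start_time.replace(tzinfo=timezone.utc)
print(start_time_utc.isoformat())  # 2022-04-21T01:15:00+00:00

# The same instant seen from a fixed UTC-7 offset (illustrative assumption).
utc_minus_7 = timezone(timedelta(hours=-7))
local_view = start_time_utc.astimezone(utc_minus_7)
print(local_view.isoformat())  # 2022-04-20T18:15:00-07:00
```

If you intend the start time to be local rather than UTC, convert it before writing the string.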

In [13]:
pipeline_job

Experiment,Name,Type,Status,Details Page
pipeline_samples,nifty_sun_wrgkfrc7wq,pipeline,NotStarted,Link to Azure Machine Learning studio


# Next Steps
You can find further examples of running a pipeline job [here](../).