# Attention: AML Workspace & Auth
> When connecting to the AML workspace, watch for a device code prompt: you will use it to complete an interactive authentication in your browser.
> The first section may not require a workspace, but we recommend having an AML workspace ready before running the full notebook.

# Creating Azure ML pipeline with Zendikon

In this example, we will learn how to create an AML pipeline using Zendikon.

The code for the pipeline steps has been provided in the `steps` folder.

We will assume all steps are using the same conda environment specified in the `conda_dependencies.yml` file.
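Because every step class points at a script in `steps` and every step shares one conda file, a missing file only surfaces once the pipeline is built or submitted. The sketch below is a hypothetical pre-flight check (not part of Zendikon or the AML SDK). It assumes the script names used later in this notebook and reports any expected file that is not present.

```python
from pathlib import Path
from typing import List

# Scripts referenced by the step classes in this notebook (assumed layout: ./steps/<script>).
EXPECTED_SCRIPTS = ["load_data.py", "preprocess_data.py", "train.py"]


def find_missing_files(project_dir: str = ".") -> List[str]:
    """Return the relative paths of expected files that are not present under project_dir."""
    root = Path(project_dir)
    expected = [Path("steps") / name for name in EXPECTED_SCRIPTS]
    # The conda environment file shared by all steps.
    expected.append(Path("conda_dependencies.yml"))
    return [str(p) for p in expected if not (root / p).is_file()]


if __name__ == "__main__":
    missing = find_missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All step scripts and the conda file are in place.")
```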

# Running our pipeline on AML

From this point on, we will require an AML workspace to work with.
For more details on creating an AML workspace, see [Quickstart: Create workspace resources you need to get started with Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources).

**Replace the placeholder AML workspace details below with the workspace you have access to.**
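Instead of editing the IDs in the cell below, you could keep them in the `config.json` file that the AML SDK's `Workspace.write_config()` / `Workspace.from_config()` use (keys `subscription_id`, `resource_group`, `workspace_name`). The stdlib-only sketch below reads that file and lets environment variables override it; the `AML_*` variable names are hypothetical choices for this example, not something the SDK reads itself.

```python
import json
import os
from pathlib import Path
from typing import Dict

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_details(config_path: str = "config.json") -> Dict[str, str]:
    """Read workspace details from a JSON file, letting environment variables override them.

    The environment variable names (AML_SUBSCRIPTION_ID, AML_RESOURCE_GROUP,
    AML_WORKSPACE_NAME) are hypothetical and only used by this sketch.
    """
    details: Dict[str, str] = {}
    path = Path(config_path)
    if path.is_file():
        details.update(json.loads(path.read_text()))
    for key in REQUIRED_KEYS:
        env_value = os.environ.get("AML_" + key.upper())
        if env_value:
            details[key] = env_value
    missing = [key for key in REQUIRED_KEYS if not details.get(key)]
    if missing:
        raise ValueError("Missing workspace details: " + ", ".join(missing))
    # Only return the keyword arguments that Workspace(...) is given in this notebook.
    return {key: details[key] for key in REQUIRED_KEYS}
```

With this helper, the setup cell could call `Workspace(**load_workspace_details())` and keep IDs out of the notebook itself.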

# Experiment Setup

We begin by using the AML SDK to connect to the AML workspace and to set up the experiment and compute target we will use.

```python
from azureml.core import Workspace, Experiment, ComputeTarget, Dataset, Environment
from azureml.core.compute import AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Replace these with your own AML workspace details
subscription_id = "7df29d08-a878-4f14-8044-00033608a1db"
resource_group = "zendikon-test"
workspace_name = "zendikon-test"

# Choose a name for your CPU cluster; the name of an existing cluster can also be used.
cpu_cluster_name = "zendikon-cpu-ds3"

# AML setup: connect to the workspace (this may prompt for device-code authentication)
workspace = Workspace(
    subscription_id=subscription_id,
    resource_group=resource_group,
    workspace_name=workspace_name
)
print('Workspace name: ' + workspace.name,
      'Subscription id: ' + workspace.subscription_id,
      'Resource group: ' + workspace.resource_group, sep='\n')

experiment_name = "Simple_AML_Pipeline"
experiment = Experiment(workspace=workspace, name=experiment_name)

# Reuse the cluster if it already exists; otherwise provision a new one
try:
    compute_target = ComputeTarget(workspace=workspace, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4,
                                                           idle_seconds_before_scaledown=2400)
    compute_target = ComputeTarget.create(workspace, cpu_cluster_name, compute_config)

# Block until the cluster is ready
compute_target.wait_for_completion(show_output=True)
```

```
Workspace name: zendikon-test
Subscription id: 7df29d08-a878-4f14-8044-00033608a1db
Resource group: zendikon-test
Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
```


## Create a step class for each step in the pipeline

We manage steps in Zendikon by creating a class that represents each step. Having a class per step lets you reuse the step as many times as needed, within one pipeline or across different pipelines, and change the step's configuration dynamically later.
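To make the "one class per step" idea concrete, here is a minimal, hypothetical sketch of a class factory built with Python's `type()`. It is not Zendikon's implementation of `create_step_class`; `SketchBaseStep`, `SketchStepConfig`, and `make_step_class` are illustrative names. The generated class fixes *which script* runs, while each instance carries its own configuration, which is what allows one class to back several differently configured steps.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchStepConfig:
    """Hypothetical stand-in for a step configuration (name, inputs, outputs, arguments)."""
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    arguments: Dict[str, Any] = field(default_factory=dict)


class SketchBaseStep:
    """Hypothetical base class: subclasses fix the script, instances fix the configuration."""
    source_directory = ""
    script_name = ""

    def __init__(self, config: SketchStepConfig):
        self.config = config

    def describe(self) -> str:
        return f"{self.config.name}: {self.source_directory}/{self.script_name}"


def make_step_class(source_directory: str, script_name: str, class_name: str) -> type:
    """Create a new subclass of SketchBaseStep bound to one script."""
    return type(class_name, (SketchBaseStep,), {
        "source_directory": source_directory,
        "script_name": script_name,
    })


if __name__ == "__main__":
    LoadDataSketch = make_step_class("./steps", "load_data.py", "LoadDataSketch")
    # The same class backs two steps with different configurations.
    full = LoadDataSketch(SketchStepConfig("load all", arguments={"ratio": 1.0}))
    split = LoadDataSketch(SketchStepConfig("load split", arguments={"ratio": 0.7}))
    print(full.describe())
    print(split.describe())
```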


```python
from zendikon.pipelines.step.base_step import BasePipelineStep

# ================ PIPELINE STEPS SET UP =================================

LoadDataStep = BasePipelineStep.create_step_class(
    source_directory="./steps", script_name="load_data.py", class_name="LoadDataStep")
PreprocessDataStep = BasePipelineStep.create_step_class(
    source_directory="./steps", script_name="preprocess_data.py", class_name="PreprocessDataStep")
TrainModelStep = BasePipelineStep.create_step_class(
    source_directory="./steps", script_name="train.py", class_name="TrainModelStep")
```


## Create the steps for the pipeline

Once the step classes are defined, we can combine them into a pipeline with the configuration we want.

Note: We do not specify any input datasets for the pipeline or for its first step, because the `load data step` downloads the dataset it uses from `sklearn` directly in its code. See `steps/load_data.py` for details.

```python
from zendikon.pipelines.pipeline import PipelineStepInfo, Pipeline
from zendikon.pipelines.step.step_config import StepConfig
# ================ PIPELINE SET UP ==============

# All steps share the same conda environment file
conda_file = "./conda_dependencies.yml"

steps_info = [
    PipelineStepInfo(LoadDataStep,
                     StepConfig("load data step", inputs=[], outputs=["raw_features", "targets"],
                                arguments={"active": True, "ratio": 0.7},
                                conda_dependencies_file=conda_file)),
    PipelineStepInfo(PreprocessDataStep,
                     StepConfig("preprocess data step", inputs=["raw_features"],
                                outputs=["processed_features"],
                                conda_dependencies_file=conda_file)),
    PipelineStepInfo(TrainModelStep,
                     StepConfig("train a model", inputs=["processed_features", "targets"],
                                outputs=["predicted"],
                                conda_dependencies_file=conda_file)),
]
# No input datasets: the load data step fetches its data itself
pipeline = Pipeline(input_datasets=[], compute_targets=[compute_target], steps_info=steps_info)
```
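The step configurations above are linked by name: `raw_features` produced by the load step is consumed by the preprocess step, and `processed_features` plus `targets` feed the training step. The plain-Python sketch below is not Zendikon code; it only illustrates that wiring by checking that every declared input comes from an input dataset or an earlier step, and by deriving a valid execution order (a simple topological sort).

```python
from typing import Dict, List, Sequence

# Mirrors the names/inputs/outputs of the StepConfig objects in this notebook.
STEPS = [
    {"name": "load data step", "inputs": [], "outputs": ["raw_features", "targets"]},
    {"name": "preprocess data step", "inputs": ["raw_features"], "outputs": ["processed_features"]},
    {"name": "train a model", "inputs": ["processed_features", "targets"], "outputs": ["predicted"]},
]


def order_steps(steps: Sequence[Dict], input_datasets: Sequence[str] = ()) -> List[str]:
    """Return step names so that every input is produced before it is consumed.

    Raises ValueError if some input is never produced or the steps form a cycle.
    """
    available = set(input_datasets)
    pending = list(steps)
    ordered: List[str] = []
    while pending:
        # Steps whose inputs are all available can run now.
        ready = [s for s in pending if set(s["inputs"]) <= available]
        if not ready:
            unresolved = sorted({i for s in pending for i in s["inputs"]} - available)
            raise ValueError("Unresolved inputs: " + ", ".join(unresolved))
        for step in ready:
            ordered.append(step["name"])
            available.update(step["outputs"])
            pending.remove(step)
    return ordered


if __name__ == "__main__":
    # No input datasets are needed, matching input_datasets=[] above.
    print(order_steps(STEPS))
```

Listing the steps in a different order yields the same execution order, and dropping the load step raises an error because nothing produces `raw_features` or `targets`.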


# Submit Pipeline
The first time you submit the pipeline, you need to set `add_zendikon_feed=True` and provide a `personal_access_token`. This allows the Zendikon package to be installed from the Zendikon feed when the pipeline runs on your AML workspace. This only needs to be done once: for later submissions, you can use `add_zendikon_feed=False` (the default) and leave `personal_access_token` as `None`.
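Pasting a personal access token into a notebook cell makes it easy to commit by accident. A minimal sketch, assuming you store the token in an environment variable (the name `ZENDIKON_PAT` and the helper `zendikon_submit_kwargs` are hypothetical), builds the two feed-related arguments for `pipeline.submit` according to the first-submission rule described above.

```python
import os
from typing import Any, Dict, Optional


def zendikon_submit_kwargs(first_submission: bool,
                           pat_env_var: str = "ZENDIKON_PAT") -> Dict[str, Any]:
    """Build the add_zendikon_feed / personal_access_token arguments for pipeline.submit.

    ZENDIKON_PAT is a hypothetical environment variable name used only by this sketch.
    """
    if not first_submission:
        # Later submissions: default feed setting and no token.
        return {"add_zendikon_feed": False, "personal_access_token": None}
    token: Optional[str] = os.environ.get(pat_env_var)
    if not token:
        raise RuntimeError(f"Set the {pat_env_var} environment variable before the first submission.")
    return {"add_zendikon_feed": True, "personal_access_token": token}
```

For the first run this would be used as `pipeline.submit(experiment, wait_for_completion=False, **zendikon_submit_kwargs(first_submission=True))`.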

The pipeline will now execute remotely on our specified compute target, and we can track the progress in AML Studio with the generated link below:

```python
# First submission: add the Zendikon feed and pass your personal access token (PAT)
pipeline.submit(experiment, wait_for_completion=False, add_zendikon_feed=True, personal_access_token="<YOUR PAT>")
```

```
Created step load data step [67f02983][a52df852-6188-4b53-a588-0225dbc44ca2], (This step is eligible to reuse a previous run's output)
Created step preprocess data step [e7ffa6fe][230263cd-d68c-43b1-84b0-a063a075e145], (This step is eligible to reuse a previous run's output)
Created step train a model [4ac81f96][d95ef809-a9bc-422b-8422-83d0bf32b32a], (This step is eligible to reuse a previous run's output)
Submitted PipelineRun 16deab2d-ef61-4bc8-9d9a-11b99a665633
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/16deab2d-ef61-4bc8-9d9a-11b99a665633?wsid=/subscriptions/7df29d08-a878-4f14-8044-00033608a1db/resourcegroups/zendikon-test/workspaces/zendikon-test&tid=72f988bf-86f1-41af-91ab-2d7cd011db47
```


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| Simple_AML_Pipeline | 16deab2d-ef61-4bc8-9d9a-11b99a665633 | azureml.PipelineRun | Running | Link to Azure Machine Learning studio | Link to Documentation |
