# Build pipeline with sweep node

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2, installed - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Create a sweep node with `sweep()`
- Create a `Pipeline` that contains a sweep node

**Motivations** - This notebook explains how to create a sweep node with `sweep()` and use it in a pipeline. A sweep node enables hyperparameter tuning for a specific command component on a specified compute (either local or in the cloud). You define a `search_space` of candidate hyperparameter values and an `objective` (a metric and whether to minimize or maximize it), and the sweep searches for the trial that best meets that objective.  

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

In [None]:
# Import the required libraries
from azure.ml import MLClient, dsl, Input

## 1.2. Configure credential

We use `DefaultAzureCredential` to get access to the workspace. When an access token is needed, it requests one using multiple identities (`EnvironmentCredential`, `ManagedIdentityCredential`, `SharedTokenCacheCredential`, `VisualStudioCodeCredential`, `AzureCliCredential`, `AzurePowerShellCredential`) in turn, stopping when one provides a token.
See the [`DefaultAzureCredential` reference](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for more information.

`DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios. 
If it does not work for you, see the [`azure.identity` reference](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python) for all available credentials.  
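
The "try each identity in turn, stop at the first token" behaviour described above can be illustrated with a small, self-contained sketch. This is not the `azure.identity` implementation: `CredentialUnavailable`, `first_available_token`, and the three source functions are hypothetical names invented for this illustration only.

```python
from typing import Callable, Sequence


class CredentialUnavailable(Exception):
    """Raised by a (hypothetical) credential source that cannot supply a token."""


def first_available_token(sources: Sequence[Callable[[], str]]) -> str:
    """Try each source in order and return the first token obtained."""
    errors = []
    for source in sources:
        try:
            return source()  # stop at the first source that succeeds
        except CredentialUnavailable as exc:
            errors.append(str(exc))  # remember why this source was skipped
    raise CredentialUnavailable("no source provided a token: " + "; ".join(errors))


# Hypothetical sources standing in for the identities listed above.
def from_environment() -> str:
    raise CredentialUnavailable("environment variables not set")


def from_managed_identity() -> str:
    raise CredentialUnavailable("no managed identity endpoint")


def from_cli() -> str:
    return "token-from-cli"


token = first_available_token([from_environment, from_managed_identity, from_cli])
print(token)
```

The order of the list matters: a source that appears earlier takes precedence whenever it can supply a token, which is why the fallback to `InteractiveBrowserCredential` in the next cell only runs when the whole chain fails.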

In [None]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token('https://management.azure.com/.default')
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

## 1.3. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace. 

In [None]:
try:
    ml_client = MLClient.from_config(credential=credential)
except Exception as ex:
    # NOTE: Update the following workspace information if it has not been configured correctly
    client_config = {
        "subscription_id": "<SUBSCRIPTION_ID>",
        "resource_group": "<RESOURCE_GROUP>",
        "workspace_name": "<WORKSPACE_NAME>"
    }

    if client_config["subscription_id"].startswith('<'):
        print("Please update <SUBSCRIPTION_ID>, <RESOURCE_GROUP> and <WORKSPACE_NAME> in this notebook cell")
        raise ex
    else:  # write and reload from config file
        import json, os
        config_path = "../../.azureml/config.json"
        os.makedirs(os.path.dirname(config_path), exist_ok=True)
        with open(config_path, "w") as fo:
            fo.write(json.dumps(client_config))
        ml_client = MLClient.from_config(credential=credential, path=config_path)
print(ml_client)
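
The fallback branch above writes the three identifiers to a JSON config file and then reloads them. A minimal sketch of that round trip, using only the standard library and a temporary directory instead of the notebook's `../../.azureml/config.json`, looks like this. The helpers `has_placeholders` and `write_config` are hypothetical names introduced for illustration, and `has_placeholders` checks every value, whereas the cell above checks only `subscription_id`.

```python
import json
import os
import tempfile

client_config = {
    "subscription_id": "<SUBSCRIPTION_ID>",
    "resource_group": "<RESOURCE_GROUP>",
    "workspace_name": "<WORKSPACE_NAME>",
}


def has_placeholders(config: dict) -> bool:
    """Return True if any value still looks like an unfilled <PLACEHOLDER>."""
    return any(str(value).startswith("<") for value in config.values())


def write_config(config: dict, directory: str) -> str:
    """Write config to <directory>/.azureml/config.json and return the path."""
    config_path = os.path.join(directory, ".azureml", "config.json")
    os.makedirs(os.path.dirname(config_path), exist_ok=True)
    with open(config_path, "w") as fo:
        json.dump(config, fo, indent=2)
    return config_path


with tempfile.TemporaryDirectory() as tmp_dir:
    path = write_config(client_config, tmp_dir)
    with open(path) as fi:
        loaded = json.load(fi)

print(loaded)
print("placeholders remaining:", has_placeholders(loaded))
```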

## 1.4. Retrieve or create an Azure Machine Learning compute target

In [None]:
# Retrieve an already attached Azure Machine Learning Compute.
cluster_name = "cpu-cluster"
try:
    ml_client.compute.get(name=cluster_name)
except Exception:
    print('Creating a new compute target...')
    from azure.ml.entities import AmlCompute
    compute = AmlCompute(name=cluster_name, size="Standard_D2_v2", max_instances=2)
    ml_client.compute.begin_create_or_update(compute)

# 2. Pipeline job with hyperparameter sweep
Similar to [1b_pipeline_with_python_function_components](../1b_pipeline_with_python_function_components/), we first define 3 sample components using `dsl.command_component` in [dsl_component.py](dsl_component.py) and then build the pipeline.

## 2.1 Build pipeline
In the pipeline definition, we define the `search_space` for the hyperparameter sweep in the inputs of `train_model` and call `train_model.sweep()` to create a sweep node based on `train_model` with specific run settings. The run settings include:
- objective_primary_metric 
- objective_goal
- sampling_algorithm
- limits
- early_termination_policy
- compute

See section **3. Run a sweep on this command** in [Run hyperparameter sweep on a Command or CommandComponent](../../single-step/lightgbm/iris/lightgbm-iris-sweep.ipynb) for a detailed explanation of the above concepts and settings.
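
To make the roles of the search space, the sampling algorithm, the trial limit, and the objective goal concrete, here is a plain-Python sketch of random sampling. It does not use the SDK: `uniform`, `choice`, `sample_trials`, `pick_best`, and `toy_metric` are hypothetical helpers, and the toy metric stands in for whatever a real training run would log.

```python
import random


def uniform(min_value, max_value):
    """Return a sampler that draws a float between min_value and max_value."""
    return lambda rng: rng.uniform(min_value, max_value)


def choice(values):
    """Return a sampler that picks one of the given values."""
    return lambda rng: rng.choice(values)


# Mirrors the shape of the search space used in this notebook's pipeline.
search_space = {
    "c_value": uniform(0.5, 0.9),
    "kernel": choice(["rbf", "linear", "poly"]),
    "coef0": uniform(0.1, 1),
}


def sample_trials(space, max_total_trials, seed=0):
    """Draw max_total_trials independent parameter sets (random sampling)."""
    rng = random.Random(seed)
    return [
        {name: draw(rng) for name, draw in space.items()}
        for _ in range(max_total_trials)
    ]


def toy_metric(params):
    """Stand-in for a logged metric: peaks when c_value is 0.7."""
    return 1.0 - abs(params["c_value"] - 0.7)


def pick_best(results, goal):
    """Select the best (params, metric) pair according to the objective goal."""
    select = max if goal == "maximize" else min
    return select(results, key=lambda item: item[1])


trials = sample_trials(search_space, max_total_trials=20)
results = [(params, toy_metric(params)) for params in trials]
best_params, best_value = pick_best(results, goal="maximize")
print(best_params, best_value)
```

In the real sweep, each sampled parameter set becomes one trial job on the compute target, and limits such as `max_concurrent_trials` and `timeout` bound how those trials are scheduled.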

Note that the **primary metric** of the sweep objective must be **logged** by the training code behind `train_model`; otherwise the sweep has nothing to compare across trials.
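
As an illustration of what "logging the primary metric" means, the sketch below computes a macro-averaged F1 score by hand and reports it under the name `training_f1_score`. The helpers `macro_f1` and `log_primary_metric` are hypothetical, and the sample labels are made up. The sketch records the value in a dictionary; in an actual training script, assuming MLflow is used for tracking, `mlflow.log_metric` could be passed as `log_fn` instead.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def log_primary_metric(name, value, log_fn):
    """Report the metric under the exact name the sweep objective expects."""
    log_fn(name, value)


logged = {}


def record(name, value):
    """Stand-in logger that stores metrics in a dict (assumption: not MLflow)."""
    logged[name] = value


# Made-up labels for three classes, only to exercise the functions.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

score = macro_f1(y_true, y_pred)
log_primary_metric("training_f1_score", score, log_fn=record)
print(logged)
```

The name passed to the logger has to match the `primary_metric` given to `sweep()` exactly, since the metric is looked up by name.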

In [None]:
from azure.ml import dsl
from azure.ml.entities import load_component
from azure.ml.sweep import (
    Choice,
    Uniform,
)

train_component_func = load_component(yaml_file="./train.yml")
score_component_func = load_component(yaml_file="./predict.yml")

# define a pipeline with dsl component
@dsl.pipeline(
    description="Tune hyperparameters using sample components",
    default_compute="cpu-cluster",
)
def pipeline_with_hyperparameter_sweep():
    train_model = train_component_func(
        data=Input(type="uri_file", path="wasbs://datasets@azuremlexamples.blob.core.windows.net/iris.csv"),
        c_value=Uniform(min_value=0.5, max_value=0.9),
        kernel=Choice(["rbf", "linear", "poly"]),
        coef0=Uniform(min_value=0.1, max_value=1),
        degree=3,
        gamma="scale",
        shrinking=False,
        probability=False,
        tol=0.001,
        cache_size=1024,
        verbose=False,
        max_iter=-1,
        decision_function_shape="ovr",
        break_ties=False,
        random_state=42
    )
    sweep_step = train_model.sweep(
        primary_metric="training_f1_score",
        goal="maximize",  # a higher F1 score is better
        sampling_algorithm="random",
        compute="cpu-cluster",
    )
    sweep_step.set_limits(max_total_trials=20, max_concurrent_trials=10, timeout=7200)

    score_data = score_component_func(
        model=sweep_step.outputs.model_output, 
        test_data=sweep_step.outputs.test_data
    )
    

pipeline = pipeline_with_hyperparameter_sweep()


## 2.2 Submit pipeline job

In [None]:
# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(pipeline, experiment_name="pipeline_samples")
pipeline_job

In [None]:
# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)

# Next Steps
You can find further examples of running a pipeline job in the [pipelines samples folder](/sdk/jobs/pipelines/).