# Run hyperparameter sweep on a CommandJob or CommandComponent

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2, installed - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Create and run a `CommandJob` from the Python SDK.
- Use the `SweepJob` to run a hyperparameter sweep on this `CommandJob`.
- Create a `CommandComponent` from the Python SDK.
- Use the `SweepJob` to run a hyperparameter sweep on the `CommandComponent`.

**Motivations** - This example shows how to automate hyperparameter tuning in Azure Machine Learning with the `SweepJob`. Hyperparameters are adjustable parameters that let you control the model training process. Hyperparameter tuning, also called hyperparameter optimization, is the process of finding the configuration of hyperparameters that results in the best performance.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

In [None]:
#import required libraries
from azure.ml import MLClient
from azure.ml import ArtifactInput, command, NumberInput, StringInput
from azure.ml.sweep import Choice, Uniform, MedianStopping
from azure.identity import InteractiveBrowserCredential

## 1.2. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace. We use the default [interactive authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.interactivebrowsercredential?view=azure-python) for this tutorial. More advanced connection methods can be found [here](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

In [None]:
#Enter details of your AML workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace = '<AML_WORKSPACE_NAME>'

In [None]:
#get a handle to the workspace
ml_client = MLClient(InteractiveBrowserCredential(), subscription_id, resource_group, workspace)

# 2. Configure and run the CommandJob
In this section we will configure and run the `CommandJob`.

## 2.1 Configure the CommandJob
The `CommandJob` lets you configure the following key aspects:
- `code` - This is the path where the code to run the command is located
- `command` - This is the command that needs to be run
- `inputs` - This is the dictionary of inputs, given as name-value pairs, to the CommandJob. The key is a name for the input within the context of the job and the value is the input value. Inputs can be referenced in the `command` using the `${{inputs.<input_name>}}` expression. To use files or folders as inputs, we can use the `ArtifactInput` class, as in the code below. The `ArtifactInput` class supports three parameters:
    - `type` - The type of input. This can be a `uri_file` or `uri_folder`. The default is `uri_folder`.
    - `path` - The path to the file or folder. This can be a local or remote file or folder. For remote files, `http`/`https` and `wasb` paths are supported.
        - Azure ML `data`/`dataset` or `datastore` are of type `uri_folder`. To use `data`/`dataset` as input, you can use a registered dataset in the workspace using the format `<data_name>:<version>`. For example: `ArtifactInput(type='uri_folder', path='my_dataset:1')`
    - `mode` - How the data should be delivered to the compute target. Allowed values are `ro_mount`, `rw_mount` and `download`. The default is `ro_mount`.
- `environment` - This is the environment needed for the command to run. Curated or custom environments from the workspace can be used. Or a custom environment can be created and used as well. Check out the [environment](/sdk/assets/environment/environment.ipynb) notebook for more examples.
- `compute` - The compute on which the CommandJob will run. In this example we are using a compute called `cpu-cluster` that is present in the workspace. You can replace it with any other compute in the workspace. You can also run the job on the local machine by using `local` for the compute. In that case the CommandJob runs locally, and all the run details and output of the job are uploaded to the Azure ML workspace.
- `distribution` - Distribution configuration for distributed training scenarios. Azure Machine Learning supports PyTorch, TensorFlow, and MPI-based distributed training. The allowed values are `PyTorch`, `TensorFlow` or `Mpi`.
- `display_name` - The display name of the Job
- `description` - The description of the job
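
To make the `${{inputs.<input_name>}}` expression more concrete, here is a minimal pure-Python sketch of how such placeholders could be resolved against an `inputs` dictionary. It is an illustration only, not the substitution logic that Azure Machine Learning actually uses; the `render_command` helper is a hypothetical name introduced for this example.

```python
import re

def render_command(template: str, inputs: dict) -> str:
    """Replace each ${{inputs.<name>}} placeholder with the matching input value.

    Hypothetical helper for illustration; the service performs its own substitution.
    """
    pattern = re.compile(r"\$\{\{\s*inputs\.(\w+)\s*\}\}")

    def substitute(match):
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"No input named {name!r} was provided")
        return str(inputs[name])

    return pattern.sub(substitute, template)

template = "python main.py --learning-rate ${{inputs.learning_rate}} --boosting ${{inputs.boosting}}"
rendered = render_command(template, {"learning_rate": 0.9, "boosting": "gbdt"})
print(rendered)
```

Each key in `inputs` therefore has to match a placeholder name in the `command` string exactly.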

In [None]:
#define the command job
command_job=command(
    code='./src',
    command = 'python main.py --iris-csv ${{inputs.iris_csv}} --learning-rate ${{inputs.learning_rate}} --boosting ${{inputs.boosting}}',
    environment='AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu@latest',
    inputs={'iris_csv':ArtifactInput(type='uri_file', path='https://azuremlexamples.blob.core.windows.net/datasets/iris.csv'),'learning_rate': 0.9, 'boosting': 'gbdt'},
    compute='cpu-cluster',
    display_name='lightgbm-iris-example',
    experiment_name='lightgbm-iris-example',
    description='Train a LightGBM model on the Iris dataset.'
)

## 2.2 Run the CommandJob
Using the `MLClient` created earlier, we will now run this CommandJob in the workspace.

In [None]:
#submit the command job
returned_job = ml_client.create_or_update(command_job)
#get a URL for the status of the job
returned_job.services["Studio"].endpoint

# 3. Run a sweep on this command job
Hyperparameters are adjustable parameters that let you control the model training process. Hyperparameter tuning, also called hyperparameter optimization, is the process of finding the configuration of hyperparameters that results in the best performance. Azure Machine Learning provides the `SweepJob` to do hyperparameter tuning. To start, let us look at two concepts:

* **Trial Job** - This is the job that needs to be optimized. The SweepJob runs this job several times, each time with a different set of parameter values, so that it can be tuned. The set of parameters is defined in the search space.
* **Search Space** - The `search space` is a set of hyperparameters. Each of these parameters can have a discrete or continuous value. The SweepJob runs the `trial` job template multiple times using different combinations of hyperparameter values, as specified in the `search space`.
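
The relationship between a search space and the trials drawn from it can be sketched in plain Python. This is a simplified stand-in for random sampling, not the SDK's `Uniform` and `Choice` classes; the tuple-based space format and the `sample_configuration` helper are assumptions made for this illustration.

```python
import random

# Hypothetical, simplified description of a search space:
# a continuous range for learning_rate and a discrete set for boosting.
search_space = {
    "learning_rate": ("uniform", 0.01, 0.9),
    "boosting": ("choice", ["gbdt", "dart"]),
}

def sample_configuration(space, rng):
    """Draw one hyperparameter combination from the search space."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "uniform":
            _, low, high = spec
            config[name] = rng.uniform(low, high)
        elif kind == "choice":
            config[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"Unsupported parameter expression: {kind}")
    return config

rng = random.Random(0)  # seeded so that repeated runs draw the same values
trials = [sample_configuration(search_space, rng) for _ in range(5)]
for trial in trials:
    print(trial)
```

Each dictionary printed here corresponds to the parameter values that one run of the trial job would receive.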

The Azure Machine Learning `SweepJob` lets you configure the following key aspects:
- `search_space` - Dictionary of name-value pairs that defines the hyperparameter search space. The key is the name of the hyperparameter and the value is the parameter expression. Parameters provided in the search space override parameters provided as inputs within the command job itself. Hyperparameters can be referenced in the `trial.command` using the `${{search_space.<hyperparameter>}}` expression.
- `sampling_algorithm` - The hyperparameter sampling algorithm to use over the `search_space`. Allowed values are `random`, `grid` and `bayesian`.
- `objective` - The objective of the sweep
  - `primary_metric` - The name of the primary metric reported by each trial job. The metric must be logged in the user's training script using `mlflow.log_metric()` with the same corresponding metric name.
  - `goal` - The optimization goal of the objective.primary_metric. The allowed values are `maximize` and `minimize`.
- `compute` - Name of the compute target to execute the job on.
- `inputs` - This is the dictionary of inputs using name value pairs to the SweepJob. The key is a name for the input within the context of the job and the value is the input value. Inputs can be referenced in the `command` using the `${{inputs.<input_name>}}` expression. 
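- `limits` - Limits for the sweep job, such as the maximum total number of trials, the maximum number of concurrent trials, and the overall timeout.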
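
The role of `objective.primary_metric` and `objective.goal` can be illustrated with a short sketch that picks the best trial from a list of finished trials. The `best_trial` helper and the metric values below are made up for this example; in a real sweep the metrics come from what each trial logs.

```python
def best_trial(trials, primary_metric, goal):
    """Return the trial whose primary metric is best under the given goal."""
    goal = goal.lower()
    if goal not in ("minimize", "maximize"):
        raise ValueError("goal must be 'minimize' or 'maximize'")
    pick = min if goal == "minimize" else max
    return pick(trials, key=lambda trial: trial["metrics"][primary_metric])

# Illustrative results only; these numbers are not from a real run.
finished_trials = [
    {"params": {"learning_rate": 0.50, "boosting": "gbdt"}, "metrics": {"test-multi_logloss": 0.42}},
    {"params": {"learning_rate": 0.05, "boosting": "dart"}, "metrics": {"test-multi_logloss": 0.18}},
    {"params": {"learning_rate": 0.90, "boosting": "gbdt"}, "metrics": {"test-multi_logloss": 0.67}},
]

best = best_trial(finished_trials, "test-multi_logloss", "minimize")
print(best["params"])
```

Because the metric here is a loss, the goal is `minimize`; for a metric such as accuracy the goal would be `maximize`.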

## 3.1 Define search_space and limits
We will now define the search space and limits for the sweep job. Note that parameters provided in the search space override parameters provided as inputs within the command job itself.
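
The override rule can be pictured as a dictionary merge in which sampled values take precedence over the inputs defined on the command job. This is only a sketch of the described behavior, assuming plain dictionaries and a local file name in place of the SDK objects.

```python
# Inputs as defined on the command job (the file name is a placeholder for illustration).
command_inputs = {"iris_csv": "iris.csv", "learning_rate": 0.9, "boosting": "gbdt"}

# One combination sampled from the search space for a single trial.
sampled = {"learning_rate": 0.05, "boosting": "dart"}

# Later keys win, so sampled hyperparameters replace the fixed input values,
# while inputs that are not part of the search space are kept unchanged.
effective_inputs = {**command_inputs, **sampled}
print(effective_inputs)
```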

In [None]:
# reuse the command_job defined above by calling it as a function with new inputs
# 'iris_csv' is not passed again, so the value defined above is reused
command_job_for_sweep = command_job(learning_rate=Uniform(min_value=0.01, max_value=0.9), 
                                    boosting=Choice(values=['gbdt', 'dart']))


## 3.2 Define the SweepJob 
We will now define the sweep job. In this example we configure the sweep job with `trial`, `search_space`, `sampling_algorithm`, `objective`, `limits` and `compute`.

We will use the `command_job` created earlier as the `trial` for this `SweepJob`.

In [None]:
# apply the sweep parameter to obtain the sweep_job
sweep_job = command_job_for_sweep.sweep(
    compute='cpu-cluster',
    sampling_algorithm='random',
    objective_primary_metric='test-multi_logloss',
    objective_goal='Minimize',
    display_name='lightgbm-iris-sweep-example',
    experiment_name='lightgbm-iris-sweep-example',
    description='Run a hyperparameter sweep job for LightGBM on Iris dataset.'
)

#define the limits for this sweep
sweep_job.set_limits(max_total_trials=20, max_concurrent_trials=10, timeout=7200)
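
As a rough reading of the limits above, the following sketch estimates how the trial budget and the concurrency cap interact. It assumes that `timeout` is expressed in seconds and that trials run in lockstep waves, which is a simplification; the `summarize_limits` helper is hypothetical.

```python
import math

def summarize_limits(max_total_trials, max_concurrent_trials, timeout):
    """Estimate the minimum number of trial waves and convert the timeout to hours."""
    min_waves = math.ceil(max_total_trials / max_concurrent_trials)
    return {"min_waves": min_waves, "timeout_hours": timeout / 3600}

summary = summarize_limits(max_total_trials=20, max_concurrent_trials=10, timeout=7200)
print(summary)
```

With these values, at most 10 trials run at once, so the 20 trials need at least two waves.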

## 3.3 Run the SweepJob
Using the `MLClient` created earlier, we will now run this SweepJob in the workspace.

In [None]:
#submit the sweep job
returned_sweep_job = ml_client.create_or_update(sweep_job)
#get a URL for the status of the job
returned_sweep_job.services["Studio"].endpoint

# 4. Run a sweep on a Command Component
An Azure Machine Learning [component](https://docs.microsoft.com/en-us/azure/machine-learning/concept-component) is a self-contained piece of code that performs a step in a machine learning job or pipeline. Components can do tasks such as data processing, model training, model scoring, and so on. A component is analogous to a function - it has a name, parameters, expects input, and returns output. This makes them very useful to reuse code/functionality.

## 4.1 Create a reusable component
Let us create a component that provides the same functionality as the `command_job` we created earlier. A component lets you define the following key parameters:
- `command` - The command to execute
- `code` - Local path to the source code directory to be uploaded and used for the component.
- `environment` - The environment to use for the component. This can be either a reference to an existing versioned environment in the workspace or an inline environment specification.
- `inputs` - This is the dictionary of inputs using name value pairs to the CommandComponent. Inputs can be referenced in the `command` using the `${{inputs.<input_name>}}` expression.
- `instance_count` - The number of nodes to use for the component
- `name` - Name of the component
- `version` - Version of the component
- `distribution` - The distribution configuration for distributed training scenarios. Allowed values are `PyTorch`, `TensorFlow`, or `Mpi`.
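
Since a component is analogous to a function, its inputs behave much like function parameters: some have defaults, while others must be supplied when the component is used. The sketch below mimics that binding step in plain Python. The `REQUIRED` sentinel and the `bind_inputs` helper are assumptions made for this illustration and are not part of the SDK.

```python
REQUIRED = object()  # sentinel marking an input that has no default

# Declared inputs of a hypothetical component: name -> default value.
declared_inputs = {
    "iris_csv": REQUIRED,
    "learning_rate": 0.9,
    "boosting": "gbdt",
}

def bind_inputs(declared, **values):
    """Combine declared defaults with the values passed when the component is used."""
    unknown = set(values) - set(declared)
    if unknown:
        raise TypeError(f"Unknown inputs: {sorted(unknown)}")
    bound = {}
    for name, default in declared.items():
        if name in values:
            bound[name] = values[name]
        elif default is REQUIRED:
            raise ValueError(f"Input {name!r} has no default and must be provided")
        else:
            bound[name] = default
    return bound

job_inputs = bind_inputs(declared_inputs, iris_csv="iris.csv", boosting="dart")
print(job_inputs)
```

Calling `bind_inputs(declared_inputs)` without `iris_csv` raises a `ValueError`, which mirrors why an input without a default has to be passed when the component is used.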

In [None]:
#define the command component -- this is the same as the command job above, except that no compute is specified and the inputs carry type information instead of values
command_component=command(
    code='./src',
    command = 'python main.py --iris-csv ${{inputs.iris_csv}} --learning-rate ${{inputs.learning_rate}} --boosting ${{inputs.boosting}}',
    environment='AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu@latest',
    inputs={'iris_csv':ArtifactInput(type='uri_file'),'learning_rate': NumberInput(default=0.9), 'boosting': StringInput(default='gbdt')},
    display_name='lightgbm-iris-example',
    experiment_name='lightgbm-iris-example',
    description='Train a LightGBM model on the Iris dataset.'
)

## 4.2 Define the SweepJob 
We will now define the sweep job. In this example we configure the sweep job with `trial`, `search_space`, `sampling_algorithm`, `objective`, `limits` and `compute`.

We will use the `command_component` created earlier as the `trial` for this `SweepJob`.

In [None]:
# run the sweep using this component
# this time we provide an input for iris_csv, since no default was provided in the component definition
command_job_for_sweep = command_component(learning_rate=Uniform(min_value=0.01, max_value=0.9),
                                          boosting=Choice(values=['gbdt', 'dart']),
                                          iris_csv=ArtifactInput(type='uri_file', path='wasbs://datasets@azuremlexamples.blob.core.windows.net/iris.csv', mode='rw_mount'))

# this is the same as above
cmd_sweep_job = command_job_for_sweep.sweep(
    compute='cpu-cluster',
    sampling_algorithm='random',
    objective_primary_metric='test-multi_logloss',
    objective_goal='Minimize',
    display_name='lightgbm-iris-sweep-example',
    experiment_name='lightgbm-iris-sweep-example',
    description='Run a hyperparameter sweep job for LightGBM on Iris dataset.'
)

#define the limits for this sweep
cmd_sweep_job.set_limits(max_total_trials=20, max_concurrent_trials=10, timeout=7200)

# set early stopping on this one
cmd_sweep_job.early_termination = MedianStopping(delay_evaluation=5, evaluation_interval=2)
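
To give an intuition for median stopping, here is a simplified pure-Python sketch: a trial is stopped when its best metric so far is worse than the median of the running averages of the other trials. The exact schedule implied by `delay_evaluation` and `evaluation_interval` in the service may differ; the `should_stop` helper and the metric histories are assumptions made for this illustration.

```python
from statistics import median

def should_stop(current, others, *, goal, delay_evaluation, evaluation_interval):
    """Decide whether a trial should be stopped early (simplified median stopping).

    current: metric values reported so far by the trial being checked
    others:  metric histories of the other trials
    """
    step = len(current)
    # Skip the first intervals, then only evaluate every `evaluation_interval` steps.
    if step <= delay_evaluation or step % evaluation_interval != 0:
        return False
    # Running average of each peer over the same number of intervals.
    peer_averages = [sum(h[:step]) / step for h in others if len(h) >= step]
    if not peer_averages:
        return False
    threshold = median(peer_averages)
    if goal == "minimize":
        return min(current) > threshold
    return max(current) < threshold

# Illustrative loss curves only; these are not real results.
others = [
    [1.0, 0.9, 0.8, 0.7, 0.6, 0.5],
    [1.0, 0.8, 0.7, 0.6, 0.5, 0.4],
    [1.1, 1.0, 0.9, 0.8, 0.7, 0.6],
]
slow = [1.2, 1.15, 1.1, 1.05, 1.0, 0.95]
fast = [0.9, 0.7, 0.5, 0.4, 0.3, 0.2]

policy = dict(goal="minimize", delay_evaluation=5, evaluation_interval=2)
stop_slow = should_stop(slow, others, **policy)
stop_fast = should_stop(fast, others, **policy)
stop_early = should_stop(slow[:4], others, **policy)
print(stop_slow, stop_fast, stop_early)
```

In this sketch the slow trial is stopped at the sixth interval, the fast trial continues, and no trial is evaluated during the first five intervals.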

## 4.3 Run the SweepJob
Using the `MLClient` created earlier, we will now run this SweepJob in the workspace.

In [None]:
#submit the sweep job
returned_sweep_job_cmd = ml_client.jobs.create_or_update(cmd_sweep_job)
#get a URL for the status of the job
returned_sweep_job_cmd.services["Studio"].endpoint

# Next Steps
You can find further examples of running a job [here](/sdk/jobs/single-step/).