<h1 style="font-family:'Glacial Indifference', sans-serif; font-size:32px; text-align:center; background-color:teal; color:white; border-radius: 15px 50px; ">Azure ML</h1>


> Disclaimer: This notebook summarizes common Python SDK (v2) commands used in the Azure Machine Learning platform. All original content belongs to Microsoft and is available at [Microsoft Learn](https://learn.microsoft.com/en-us/training/courses/dp-100t01).


# Setting up

In [None]:
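# Install the Azure ML SDK (v2) and the Azure identity library used for authentication
%pip install azure-ai-ml azure-identity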

---

# Workspace

## Create a workspace

In [None]:
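from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace
from azure.identity import DefaultAzureCredential

# A workspace is created at the resource-group level, so no workspace name is passed here
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
)

workspace_name = "mlw-example"

ws_basic = Workspace(
    name=workspace_name,
    location="eastus",
    display_name="Basic workspace-example",
    description="This example shows how to create a basic workspace",
)
# begin_create returns a poller; .result() waits until provisioning finishes
ml_client.workspaces.begin_create(ws_basic).result()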

## Connect to a workspace

### Step 1: Define the authentication

Connecting authenticates our environment so that it can interact with the workspace and create and manage assets and resources. To connect, we need the subscription ID, the resource group name, and the workspace name.

In [None]:
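from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Replace the placeholders with the values of your workspace
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
workspace = "<workspace-name>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)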

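When working on a compute instance managed by Azure Machine Learning, we can use the default values to connect to the workspace. The cell below tries `DefaultAzureCredential` first and falls back to interactive browser login if that credential can't get a token.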

In [None]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient

try:
    credential = DefaultAzureCredential()
    # Check if the given credential can get a token successfully
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential doesn't work
    credential = InteractiveBrowserCredential()

# Read the subscription, resource group and workspace name from a config.json file
ml_client = MLClient.from_config(credential=credential)

### Step 2: Connect to the workspace when submitting a job

After defining the authentication, we use the `MLClient` object (`ml_client`) to interact with the workspace. We call it whenever we want to create or update an asset or resource in the workspace, for example to submit a job:

In [None]:
from azure.ai.ml import command

# configure job
job = command(
    code="./src",
    command="python train.py",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    experiment_name="train-model"
)

# connect to workspace and submit job
returned_job = ml_client.create_or_update(job)

---

# Data source

To work with data in Azure Machine Learning, we can access data by using **Uniform Resource Identifiers** (URIs). 

When we work with a data source or a specific file or folder **repeatedly**, we can create **datastores** and **data assets** within the Azure Machine Learning workspace. Datastores and data assets allow us to securely store the connection information to our data.

## Create a datastore

**Definition**: In Azure Machine Learning, datastores are abstractions for cloud data sources.

Whenever we want to connect another Azure storage service to the Azure Machine Learning workspace, we can create a datastore. Note that creating a datastore creates the connection between the workspace and the storage; it doesn't create the storage service itself.

To create a datastore that connects to an existing storage service, we need to specify:

- The class to indicate with what type of storage service we want to connect. The example below connects to a Blob storage (`AzureBlobDatastore`).
- `name`: The display name of the datastore in the Azure Machine Learning workspace.
- `description`: Optional description to provide more information about the datastore.
- `account_name`: The name of the Azure Storage Account.
- `container_name`: The name of the container to store blobs in the Azure Storage Account.
- `credentials`: Provide the method of authentication and the credentials to authenticate. The example below uses an account key.

Below is an example when we want to create a datastore to connect to an Azure Blob Storage container:

In [None]:
from azure.ai.ml.entities import AzureBlobDatastore, AccountKeyConfiguration

blob_datastore = AzureBlobDatastore(
    name="blob_example",
    description="Datastore pointing to a blob container",
    account_name="mytestblobstore",
    container_name="data-container",
    credentials=AccountKeyConfiguration(
        account_key="XXXxxxXXXxXXXXxxXXX"
    ),
)
ml_client.create_or_update(blob_datastore)

Alternatively, we can create a datastore to connect to an Azure Blob Storage container by using a SAS token to authenticate:

In [None]:
from azure.ai.ml.entities import AzureBlobDatastore, SasTokenConfiguration

blob_datastore = AzureBlobDatastore(
    name="blob_sas_example",
    description="Datastore pointing to a blob container",
    account_name="mytestblobstore",
    container_name="data-container",
    credentials=SasTokenConfiguration(
        sas_token="?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX"
    ),
)
ml_client.create_or_update(blob_datastore)

To check all the available datastores, use `datastores.list()`:

In [None]:
stores = ml_client.datastores.list()
for ds in stores:
    print(ds.name)

## Create a data asset

**Definition**: In Azure Machine Learning, data assets are references to where the data is stored, how to get access to it, and any other relevant metadata.

Data assets are most useful when executing machine learning tasks as Azure Machine Learning jobs.

To point to a specific folder or file in a datastore, we can create data assets. There are three types of data assets:

- **URI_FILE** points to a specific file.
- **URI_FOLDER** points to a specific folder.
- **MLTABLE** points to a MLTable file which specifies how to read one or more files within a folder.

### Create a URI file data asset

The supported paths we can use when creating a URI file data asset are:

- Local: `./<path>`
- Azure Blob Storage: `wasbs://<container_name>@<account_name>.blob.core.windows.net/<folder>/<file>`
- Azure Data Lake Storage (Gen 2): `abfss://<file_system>@<account_name>.dfs.core.windows.net/<folder>/<file>`
- Datastore: `azureml://datastores/<datastore_name>/paths/<folder>/<file>`
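The datastore path format is plain string concatenation, so when paths are built in code it can help to wrap it in a small helper. The sketch below is hypothetical (`datastore_uri` and `split_datastore_uri` are not part of the SDK) and only assumes the `azureml://datastores/<datastore_name>/paths/<path>` layout listed above.

```python
from typing import Tuple

# Prefix of the datastore path format listed above
DATASTORE_PREFIX = "azureml://datastores/"


def datastore_uri(datastore_name: str, relative_path: str) -> str:
    """Build a datastore URI from a datastore name and a path inside it (hypothetical helper)."""
    return f"{DATASTORE_PREFIX}{datastore_name}/paths/{relative_path.strip('/')}"


def split_datastore_uri(uri: str) -> Tuple[str, str]:
    """Split a datastore URI back into (datastore_name, relative_path) (hypothetical helper)."""
    if not uri.startswith(DATASTORE_PREFIX) or "/paths/" not in uri:
        raise ValueError(f"not a datastore URI: {uri}")
    name, path = uri[len(DATASTORE_PREFIX):].split("/paths/", 1)
    return name, path


uri = datastore_uri("blob_example", "/data/diabetes.csv")
print(uri)                       # azureml://datastores/blob_example/paths/data/diabetes.csv
print(split_datastore_uri(uri))  # ('blob_example', 'data/diabetes.csv')
```

The resulting string can be used as `my_path` in the cells below.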

In [None]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_path = '<supported-path>'

my_data = Data(
    path=my_path,
    type=AssetTypes.URI_FILE,
    description="<description>",
    name="<name>",
    version="<version>"
)

ml_client.data.create_or_update(my_data)

### Create a URI folder data asset

In [None]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_path = '<supported-path>'

my_data = Data(
    path=my_path,
    type=AssetTypes.URI_FOLDER,
    description="<description>",
    name="<name>",
    version='<version>'
)

ml_client.data.create_or_update(my_data)

### Create a MLTable data asset

An MLTable data asset allows us to point to tabular data. When we create an MLTable data asset, we specify the schema definition used to read the data. Therefore, an MLTable data asset is a good fit when the schema of the data is complex or changes frequently: instead of changing how to read the data in every script that uses it, we only have to change it in the data asset itself.

For certain features in Azure Machine Learning, like Automated Machine Learning, we need to use an MLTable data asset, as Azure Machine Learning needs to know how to read the data.

#### Step 1: Define the schema

In [None]:
# This is YAML, not Python: save it in a file named MLTable inside the data folder
type: mltable

paths:
  - pattern: ./*.txt
transformations:
  - read_delimited:
      delimiter: ','
      encoding: ascii
      header: all_files_same_headers
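Azure Machine Learning expects this schema in a file named `MLTable` inside the folder the data asset points to. As a sketch, the file can be written from Python with only the standard library; the helper `write_mltable` and the folder name `./data-mltable` are assumptions for illustration.

```python
from pathlib import Path

# Same schema as the YAML cell above
MLTABLE_YAML = """\
type: mltable

paths:
  - pattern: ./*.txt
transformations:
  - read_delimited:
      delimiter: ','
      encoding: ascii
      header: all_files_same_headers
"""


def write_mltable(folder: str) -> Path:
    """Write the MLTable definition into `folder` (hypothetical helper)."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    mltable_file = target / "MLTable"  # the file name must be exactly "MLTable"
    mltable_file.write_text(MLTABLE_YAML, encoding="utf-8")
    return mltable_file


path = write_mltable("./data-mltable")
print(path.read_text(encoding="utf-8").splitlines()[0])  # type: mltable
```

The folder containing the `MLTable` file (here `./data-mltable`, together with the `.txt` files it should read) is what goes into `path` in Step 2.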

#### Step 2: Create an MLTable data asset with the defined schema

In [None]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_path = '<path-including-mltable-file>'

my_data = Data(
    path=my_path,
    type=AssetTypes.MLTABLE,
    description="<description>",
    name="<name>",
    version='<version>'
)

ml_client.data.create_or_update(my_data)

### List all data assets

In [None]:
datasets = ml_client.data.list()
for data_asset in datasets:
    print(data_asset.name)

### Parsing data

When we pass a URI file, URI folder, or MLTable data asset as input to an Azure Machine Learning job, the script first needs to read the data before it can work with it. The snippets below show how a script can read each type.

#### Parsing with URI file

In [None]:
import argparse
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

df = pd.read_csv(args.input_data)
print(df.head(10))

#### Parsing with URI folder

In [None]:
import argparse
import glob
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

data_path = args.input_data
all_files = glob.glob(data_path + "/*.csv")
df = pd.concat((pd.read_csv(f) for f in all_files), sort=False)
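For a folder input, the script simply receives a directory path and globs for CSV files. To test that logic locally without pandas, here is a dependency-free sketch using the `csv` module; the helper `read_csv_folder` and the two sample files written to a temporary folder are purely illustrative.

```python
import csv
import glob
import os
import tempfile
from typing import Dict, List


def read_csv_folder(data_path: str) -> List[Dict[str, str]]:
    """Concatenate the rows of every *.csv file in data_path (like the pandas cell above)."""
    rows: List[Dict[str, str]] = []
    for file_name in sorted(glob.glob(os.path.join(data_path, "*.csv"))):
        with open(file_name, newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


# Simulate a URI folder input containing two small CSV files
with tempfile.TemporaryDirectory() as folder:
    for i, values in enumerate([("1", "0"), ("2", "1")]):
        with open(os.path.join(folder, f"part-{i}.csv"), "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["Pregnancies", "Diabetic"])
            writer.writerow(values)
    rows = read_csv_folder(folder)
    print(len(rows), rows[0])  # 2 {'Pregnancies': '1', 'Diabetic': '0'}
```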

#### Parsing with MLTable

In [None]:
import argparse
import mltable
import pandas

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

tbl = mltable.load(args.input_data)
df = tbl.to_pandas_dataframe()

print(df.head(10))

## Use data in a job (Complete example)

After experimenting in a notebook, we can use scripts to train machine learning models. A script can be run as a job, and for each job we can specify inputs and outputs.

We can use either **data assets** or **datastore paths** as inputs or outputs of a job.
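Inside the `command` string, inputs and outputs are referenced with the `${{inputs.<name>}}` and `${{outputs.<name>}}` placeholders, and Azure Machine Learning replaces them with concrete paths on the compute target when the job runs. The toy function below only models that substitution to show what the script ends up receiving; `render_command` is hypothetical, not the service's implementation, and the mounted paths are made up.

```python
import re


def render_command(command: str, inputs: dict, outputs: dict) -> str:
    """Replace ${{inputs.x}} / ${{outputs.x}} placeholders with paths (toy model)."""
    values = {"inputs": inputs, "outputs": outputs}

    def substitute(match):
        kind, name = match.group(1), match.group(2)
        return values[kind][name]

    return re.sub(r"\$\{\{(inputs|outputs)\.(\w+)\}\}", substitute, command)


cmd = "python move-data.py --input_data ${{inputs.local_data}} --output_datastore ${{outputs.datastore_data}}"
rendered = render_command(
    cmd,
    inputs={"local_data": "/mnt/inputs/diabetes.csv"},          # made-up mount path
    outputs={"datastore_data": "/mnt/outputs/datastore-path"},  # made-up mount path
)
print(rendered)
```

The script itself only ever sees ordinary file system paths, which is why `argparse` plus `pd.read_csv()` is enough to read the input.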

The example below creates the **move-data.py** script in the src folder. The script reads the input data with the `read_csv()` function. The script then stores the data as a CSV file in the output path.

In [None]:
import os

# create a folder for the script files
script_folder = 'src'
os.makedirs(script_folder, exist_ok=True)
print(script_folder, 'folder created')

In [None]:
%%writefile $script_folder/move-data.py
# import libraries
import argparse
import pandas as pd
import numpy as np
from pathlib import Path

def main(args):
    # read data
    df = get_data(args.input_data)

    # write the data as a CSV file to the output folder
    df.to_csv(Path(args.output_datastore) / "diabetes.csv", index=False)

# function that reads the data
def get_data(path):
    df = pd.read_csv(path)

    # Count the rows and print the result
    row_count = (len(df))
    print('Analyzing {} rows of data'.format(row_count))
    
    return df

def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--input_data", dest='input_data',
                        type=str)
    parser.add_argument("--output_datastore", dest='output_datastore',
                        type=str)

    # parse args
    args = parser.parse_args()

    # return args
    return args

# run script
if __name__ == "__main__":
    # add space in logs
    print("\n\n")
    print("*" * 60)

    # parse args
    args = parse_args()

    # run main function
    main(args)

    # add space in logs
    print("*" * 60)
    print("\n\n")

**Data asset:** We create a URI file data asset that points to the local **diabetes.csv** file:

In [None]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_path = './data/diabetes.csv'

my_data = Data(
    path=my_path,
    type=AssetTypes.URI_FILE,
    description="Data asset pointing to a local file, automatically uploaded to the default datastore",
    name="diabetes-local"
)

ml_client.data.create_or_update(my_data)

Finally, we submit a job that runs the **move-data.py** script with the data asset `diabetes-local` (which points to the local **diabetes.csv** file) as input. The output is a path pointing to a folder in the datastore `blob_training_data`, which must already exist in the workspace.

In [None]:
from azure.ai.ml import Input, Output
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml import command

# configure input and output
my_job_inputs = {
    "local_data": Input(type=AssetTypes.URI_FILE, path="azureml:diabetes-local:1")
}

my_job_outputs = {
    "datastore_data": Output(type=AssetTypes.URI_FOLDER, path="azureml://datastores/blob_training_data/paths/datastore-path")
}

# configure job
job = command(
    code="./src",
    # reference the job's inputs and outputs with ${{inputs.<name>}} and ${{outputs.<name>}}
    command="python move-data.py --input_data ${{inputs.local_data}} --output_datastore ${{outputs.datastore_data}}",
    inputs=my_job_inputs,
    outputs=my_job_outputs,
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    display_name="move-diabetes-data",
    experiment_name="move-diabetes-data"
)

# submit job
returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)

---

# Compute target

**Definition**: In Azure Machine Learning, compute targets are physical or virtual computers on which jobs are run.

## Compute instance

### Create

In [None]:
from azure.ai.ml.entities import ComputeInstance

ci_basic_name = "basic-ci-12345"
ci_basic = ComputeInstance(
    name=ci_basic_name, 
    size="STANDARD_DS3_v2"
)
ml_client.begin_create_or_update(ci_basic).result()

## Compute cluster

### Create

When we create a compute cluster, there are three main parameters we need to consider:

- `size`: Specifies the virtual machine type of each node within the compute cluster, based on the available Azure virtual machine sizes. Through the size, we also choose whether the nodes use CPUs or GPUs.
- `max_instances`: Specifies the maximum number of nodes your compute cluster can scale out to. The number of parallel workloads your compute cluster can handle is analogous to the number of nodes your cluster can scale to.
- `tier`: Specifies whether your virtual machines are low priority or dedicated. Low-priority nodes lower costs, but their availability isn't guaranteed.
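As a back-of-the-envelope illustration of how `max_instances` bounds parallelism: if every job needs one node and all jobs take about the same time, the queue drains in `ceil(jobs / max_instances)` rounds. This is a simplifying assumption (real scheduling also depends on job duration, node startup time and, for low-priority nodes, availability), and `scheduling_rounds` is a hypothetical helper.

```python
import math


def scheduling_rounds(n_jobs: int, max_instances: int) -> int:
    """Rounds needed to run n_jobs single-node jobs on at most max_instances nodes."""
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    return math.ceil(n_jobs / max_instances)


for nodes in (1, 2, 4):
    print(nodes, "node(s):", scheduling_rounds(10, nodes), "rounds for 10 jobs")
# 1 node(s): 10 rounds for 10 jobs
# 2 node(s): 5 rounds for 10 jobs
# 4 node(s): 3 rounds for 10 jobs
```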

In [None]:
from azure.ai.ml.entities import AmlCompute

cluster_basic = AmlCompute(
    name="cpu-cluster",
    type="amlcompute",
    size="STANDARD_DS3_v2",
    location="westus",
    min_instances=0,
    max_instances=2,
    idle_time_before_scale_down=120,
    tier="low_priority",
)
ml_client.begin_create_or_update(cluster_basic).result()

### Modify the configuration

Once created, we can only change the configuration for:

- `min_instances`: Minimum number of nodes
- `max_instances`: Maximum number of nodes
- `idle_time_before_scale_down`: Idle time before scale down

For example, to change `max_instances` to 2:

In [None]:
from azure.ai.ml.entities import AmlCompute

cluster_scale = AmlCompute(
    name="aml-cluster",
    max_instances=2,
)
ml_client.begin_create_or_update(cluster_scale)

### Check the configuration

When the compute cluster is updated, we can verify its configuration by printing its attributes.

In [None]:
cpu_cluster = ml_client.compute.get("aml-cluster")

print(
    f"AMLCompute with name {cpu_cluster.name} has a maximum of {cpu_cluster.max_instances} nodes"
)

### Use

There are three main scenarios in which we can use a compute cluster:

- Running a pipeline job we built in the Designer.
- Running an Automated Machine Learning job.
- Running a script as a job.

Example of using compute cluster to run a script as a command job:

In [None]:
from azure.ai.ml import command

# configure job
job = command(
    code="./src",
    command="python diabetes-training.py",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="cpu-cluster",
    display_name="train-with-cluster",
    experiment_name="diabetes-training"
    )

# submit job
returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)

---

# Environment

As a data scientist, we want to write code that works in any development environment. Whether we're using local or cloud compute, the code should successfully execute to train a machine learning model for example.

To run code, we need to ensure necessary packages, libraries, and dependencies are installed on the compute we use to run the code. In Azure Machine Learning, environments list and store the necessary packages that we can reuse across compute targets. Azure Machine Learning builds environment definitions into Docker images and conda environments. When we use an environment, Azure Machine Learning builds the environment on the Azure Container registry associated with the workspace.

In [None]:
# List the environments using the Python SDK:
envs = ml_client.environments.list()
for env in envs:
    print(env.name)

# Review the details of a specific environment
env = ml_client.environments.get(name="my-environment", version="1")
print(env)

## Curated environment

**Definition**: Curated environments are prebuilt environments for the most common machine learning workloads, available in your workspace by default. Curated environments use the prefix **AzureML-** and are designed to support scripts that use popular machine learning frameworks and tooling.

In [None]:
# Review the description and tags of a curated environment with the Python SDK
env = ml_client.environments.get("AzureML-sklearn-0.24-ubuntu18.04-py37-cpu", version="44")
print(env.description, env.tags)

### Use

To specify which environment we want to use to run your script, we reference an environment using the `<curated-environment-name>:<version>` or `<curated-environment-name>@latest` syntax.

In [None]:
from azure.ai.ml import command

# configure job
job = command(
    code="./src",
    command="python train.py",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    display_name="train-with-curated-environment",
    experiment_name="train-with-curated-environment"
)

# submit job
returned_job = ml_client.create_or_update(job)
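Both reference forms are plain strings. A hypothetical helper (not part of the SDK) that splits a reference into its name and version, treating `@latest` as "resolve the newest version", can make it easier to validate references before submitting a job. It only handles the two forms described above.

```python
from typing import Optional, Tuple


def parse_environment_reference(reference: str) -> Tuple[str, Optional[str]]:
    """Split '<name>:<version>' or '<name>@latest' into (name, version).

    Returns version=None for '@latest', meaning "use the newest version".
    """
    if reference.endswith("@latest"):
        return reference[: -len("@latest")], None
    name, sep, version = reference.rpartition(":")
    if not sep or not name or not version:
        raise ValueError(f"expected '<name>:<version>' or '<name>@latest', got {reference!r}")
    return name, version


print(parse_environment_reference("AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest"))
# ('AzureML-sklearn-0.24-ubuntu18.04-py37-cpu', None)
print(parse_environment_reference("docker-image-plus-conda-example:1"))
# ('docker-image-plus-conda-example', '1')
```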

## Custom environment

### Create

#### Option 1: Using Docker image

Docker images can be hosted in a public registry like [Docker Hub](https://hub.docker.com/) or privately stored in an Azure Container registry.

In [None]:
from azure.ai.ml.entities import Environment

env_docker_image = Environment(
    image="pytorch/pytorch:latest",
    name="public-docker-image-example",
    description="Environment created from a public Docker image.",
)
ml_client.environments.create_or_update(env_docker_image)

Alternatively, we can create an environment from one of the Azure Machine Learning base images, which are similar to the images used by curated environments:

In [None]:
from azure.ai.ml.entities import Environment

env_docker_image = Environment(
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04",
    name="aml-docker-image-example",
    description="Environment created from an Azure ML Docker image.",
)
ml_client.environments.create_or_update(env_docker_image)

#### Option 2: Using a conda specification file

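Though Docker images contain all necessary packages when working with a specific framework, we may need to include other packages to run our code. In that case, we can list them in a conda specification file. Note that the conda environment is built on top of a base Docker image, so the SDK may require `image` to be set as well, as in Option 3.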

##### Step 1: Create conda specification yaml file

In [None]:
# This is YAML, not Python: save it as conda-env.yml (the file name used in Step 2)
name: basic-env-cpu
channels:
  - conda-forge
dependencies:
  - python=3.7
  - scikit-learn
  - pandas
  - numpy
  - matplotlib

##### Step 2: Create an environment using yaml conda specification file

In [None]:
from azure.ai.ml.entities import Environment

env_docker_conda = Environment(
    conda_file="./conda-env.yml",
    name="docker-image-plus-conda-example",
    description="Environment created from a Docker image plus Conda environment.",
)
ml_client.environments.create_or_update(env_docker_conda)

#### Option 3: Using a Docker image as the base environment and adding packages with a conda specification file

We simply use both parameters `image` and `conda_file`:

In [None]:
from azure.ai.ml.entities import Environment

env_docker_conda = Environment(
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04",
    conda_file="./conda-env.yml",
    name="docker-image-plus-conda-example",
    description="Environment created from a Docker image plus Conda environment.",
)
ml_client.environments.create_or_update(env_docker_conda)

### Use

Similarly to curated environments, to specify which environment we want to use to run our script, we reference the environment using the `<environment-name>:<version>` or `<environment-name>@latest` syntax:

In [None]:
from azure.ai.ml import command

# configure job
job = command(
    code="./src",
    command="python train.py",
    environment="docker-image-plus-conda-example:1",
    compute="aml-cluster",
    display_name="train-custom-env",
    experiment_name="train-custom-env"
)

# submit job
returned_job = ml_client.create_or_update(job)

---

# Automated Machine Learning (AutoML)

## Preprocess data and configure featurization

In order for AutoML to understand how to read the data, we need to create an **MLTable data asset** that includes the schema of the data.

In [None]:
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml import Input

my_training_data_input = Input(type=AssetTypes.MLTABLE, path="azureml:input-data-automl:1")

AutoML applies scaling and normalization to numeric data automatically, helping prevent any large-scale features from dominating training. During an AutoML experiment, multiple scaling or normalization techniques will be applied.
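To make "scaling and normalization" concrete, the sketch below applies two common techniques, min-max scaling and standardization (z-scores), to a single numeric feature. AutoML chooses and applies such techniques itself; this only illustrates what they compute, and the sample values are made up.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    span = high - low
    return [0.0 if span == 0 else (v - low) / span for v in values]


def standardize(values):
    """Center values on 0 with unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


glucose = [80.0, 100.0, 120.0, 140.0]  # made-up sample feature values
print(min_max_scale(glucose))
print(standardize(glucose))
```

Either way, a feature measured in the hundreds ends up on the same scale as one measured in single digits.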

Other optional featurization steps include:
- Missing value imputation to eliminate nulls in the training dataset.
- Categorical encoding to convert categorical features to numeric indicators.
- Dropping high-cardinality features, such as record IDs.
- Feature engineering (for example, deriving individual date parts from DateTime features)
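The first two steps can be illustrated in a few lines: mean imputation fills missing numeric values, and one-hot encoding turns a categorical column into 0/1 indicator columns. This is a simplified stand-in for what AutoML does internally (its actual imputation and encoding strategies may differ), with made-up sample data.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode categories as indicator dicts, one key per distinct category."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]


print(impute_mean([22.0, None, 30.0]))  # [22.0, 26.0, 30.0]
print(one_hot(["yes", "no", "yes"]))
# [{'no': 0, 'yes': 1}, {'no': 1, 'yes': 0}, {'no': 0, 'yes': 1}]
```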

## AutoML experiment

### Configure

When we use the Python SDK (v2) to configure an AutoML experiment or job, we configure the experiment using the `automl` module. For classification, we use the `automl.classification` function as shown in the following example:

In [None]:
from azure.ai.ml import automl

# configure the classification job
classification_job = automl.classification(
    compute="aml-cluster",
    experiment_name="auto-ml-class-dev",
    training_data=my_training_data_input,
    target_column_name="Diabetic",
    primary_metric="accuracy",
    n_cross_validations=5,
    enable_model_explainability=True
)

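Parameter explanation:

- Uses the compute cluster named `aml-cluster`
- Sets `Diabetic` as the target column
- Sets `accuracy` as the primary metric
- Uses 5-fold cross-validation (`n_cross_validations=5`)
- Enables model explainability (`enable_model_explainability=True`)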

**Note**: For a full list of primary metrics, use the command below:

In [None]:
from azure.ai.ml.automl import ClassificationPrimaryMetrics
 
list(ClassificationPrimaryMetrics)

### Set the limits

To minimize costs and time spent on training, we can set limits to an AutoML experiment or job by using `set_limits()`.

There are several options to set limits to an AutoML experiment:

- `timeout_minutes`: Number of minutes after which the complete AutoML experiment is terminated.
- `trial_timeout_minutes`: Maximum number of minutes one trial can take.
- `max_trials`: Maximum number of trials, or models that will be trained.
- `enable_early_termination`: Whether to end the experiment if the score isn't improving in the short term.

In [None]:
classification_job.set_limits(
    timeout_minutes=60, 
    trial_timeout_minutes=20, 
    max_trials=5,
    enable_early_termination=True,
)
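The interplay of `max_trials` and `enable_early_termination` can be pictured as a loop over candidate models that stops either when the trial budget is used up or when the best score stops improving. The sketch below is a toy model of that idea only: the `run_trials` helper, its `patience` rule and the scores are assumptions for illustration, not AutoML's actual termination policy.

```python
def run_trials(scores, max_trials, patience=None):
    """Return (trials_run, best_score) for a sequence of trial scores.

    Stops after max_trials, or (if patience is set) after `patience`
    consecutive trials without improving the best score.
    """
    best = float("-inf")
    stale = 0
    trials_run = 0
    for score in scores[:max_trials]:
        trials_run += 1
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if patience is not None and stale >= patience:
                break  # toy "early termination"
    return trials_run, best


accuracies = [0.71, 0.74, 0.73, 0.72, 0.75, 0.76]  # made-up trial scores
print(run_trials(accuracies, max_trials=5))              # (5, 0.75)
print(run_trials(accuracies, max_trials=5, patience=2))  # (4, 0.74)
```

The second call shows the trade-off: stopping early saves a trial but misses the later, better model.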

Parameter explanation:
- Times out after `60` minutes of total training time
- Stops any single trial after `20` minutes
- Trains a maximum of `5` models
- Ends the experiment early if the score isn't improving

### Submit

To submit, we use the code below:

In [None]:
# submit the AutoML job
returned_job = ml_client.jobs.create_or_update(
    classification_job
)

To monitor AutoML job runs in Azure ML studio, we can use the code below to get the direct link to the job.

In [None]:
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)

---

# MLflow

**Definition**: MLflow is an open-source library for tracking and managing your machine learning experiments. In particular, **MLflow Tracking** is a component of MLflow that logs everything about the model we're training, such as **parameters**, **metrics**, and **artifacts**.

In [None]:
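# Setting up
# On an Azure ML compute instance, mlflow and azureml-mlflow are typically preinstalled
import mlflow

# On a local device, install the packages first (notebook magic, run in its own cell):
# %pip install mlflow azureml-mlflow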

## Create an MLflow experiment

In [None]:
import mlflow
experiment_name = "mlflow-experiment-diabetes"
mlflow.set_experiment(experiment_name)

## Train and track models

We have 2 options:
- Use autologging
- Use custom logging

### Option 1: Autologging

To enable autologging, use `mlflow.sklearn.autolog()`:

In [None]:
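import mlflow
from sklearn.linear_model import LogisticRegression

# X_train and y_train are assumed to be defined earlier (for example, from a train/test split)
with mlflow.start_run():
    mlflow.sklearn.autolog()

    model = LogisticRegression(C=1/0.1, solver="liblinear").fit(X_train, y_train)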

#### Flavor

**Definition**: A flavor is the machine learning library with which the model was created.

The framework we use to train your model is automatically identified and included as the **flavor** of your model. 

Optionally, we can specify which flavor we want your model to be identified as by using `mlflow.<flavor>.autolog()`. Some common flavors that we can use with autologging are:
- Keras: `mlflow.keras.autolog()`
- Scikit-learn: `mlflow.sklearn.autolog()`
- LightGBM: `mlflow.lightgbm.autolog()`
- XGBoost: `mlflow.xgboost.autolog()`
- TensorFlow: `mlflow.tensorflow.autolog()`
- PyTorch: `mlflow.pytorch.autolog()`
- ONNX: no autologging; ONNX models are logged explicitly with `mlflow.onnx.log_model()`

### Option 2: Custom logging

First, we disable autologging:

In [None]:
mlflow.sklearn.autolog(disable=True)

Common functions used with custom logging are:

- `mlflow.log_param()`: Logs a single key-value parameter. Use this function for an input parameter we want to log.
- `mlflow.log_metric()`: Logs a single key-value metric. The value must be a number. Use this function for any output we want to store with the run.
- `mlflow.log_artifact()`: Logs a file. Use this function for any plot we want to log; save the plot as an image file first.
- `mlflow.<flavor>.log_model()` (for example, `mlflow.sklearn.log_model()`): Logs a model. Use this function to create an MLflow model, which may include a custom signature, environment, and input examples.

The code below uses custom logging to log one parameter, one metric and one artifact (a PNG file containing the ROC curve):

In [None]:
import mlflow
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.tree import DecisionTreeClassifier

# X_train, X_test, y_train and y_test are assumed to be defined earlier

with mlflow.start_run():
    model = DecisionTreeClassifier().fit(X_train, y_train)

    y_hat = model.predict(X_test)
    acc = np.average(y_hat == y_test)

    # plot ROC curve
    y_scores = model.predict_proba(X_test)

    fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
    fig = plt.figure(figsize=(6, 4))
    # Plot the diagonal 50% line
    plt.plot([0, 1], [0, 1], 'k--')
    # Plot the FPR and TPR achieved by our model
    plt.plot(fpr, tpr)
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve')
    plt.savefig("ROC-Curve.png")

    mlflow.log_param("estimator", "DecisionTreeClassifier")
    mlflow.log_metric("Accuracy", acc)
    mlflow.log_artifact("ROC-Curve.png")
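`roc_curve` sweeps a decision threshold over the predicted probabilities and records the false positive rate and true positive rate at each threshold. A minimal pure-Python sketch of that computation, for intuition only: `roc_points` is a hypothetical helper, scikit-learn also drops redundant points by default so its arrays may be shorter, and the sketch assumes both classes are present in `y_true`.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists, starting at (0, 0), one point per distinct threshold."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    fpr, tpr = [0.0], [0.0]
    for threshold in sorted(set(y_score), reverse=True):
        predicted = [s >= threshold for s in y_score]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return fpr, tpr


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, y_score)
print(fpr)  # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)  # [0.0, 0.5, 0.5, 1.0, 1.0]

# Area under the curve with the trapezoidal rule
auc = sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(fpr)))
print(auc)  # 0.75
```

The diagonal `[0, 1], [0, 1]` line in the plot corresponds to an AUC of 0.5, the score of random guessing.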

All logged values can be found on the Jobs page of Azure ML studio:
- **Params** and **Metrics** in Overview
- **Artifacts** in Outputs + logs

#### Signature

As logging the model allows us to easily deploy it, we may want to customize the model's expected inputs and outputs. The schemas of the expected inputs and outputs are defined as the **signature** in the MLmodel file. The signature is stored in `JSON` format in the MLmodel file, together with other metadata of the model.
The model signature can be inferred from datasets or created manually.

##### Option 1: Infer signature

To log a model with a signature that is inferred from the training dataset and model predictions, we can use `infer_signature()`. The following example uses the training dataset to infer the schema of the inputs, and the model's predictions to infer the schema of the output:

In [None]:
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature

iris = datasets.load_iris()
iris_train = pd.DataFrame(iris.data, columns=iris.feature_names)
clf = RandomForestClassifier(max_depth=7, random_state=0)
clf.fit(iris_train, iris.target)

# Infer the signature from the training dataset and model's predictions
signature = infer_signature(iris_train, clf.predict(iris_train))

# Log the scikit-learn model with the custom signature
mlflow.sklearn.log_model(clf, "iris_rf", signature=signature)
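Conceptually, `infer_signature` inspects the column names and types of the example input and output and maps them to MLflow's schema type names, such as the `double` and `long` used in the manual signature below. The sketch is a heavily simplified, hypothetical model of that idea for lists of Python records (`infer_columns` and the type mapping are assumptions covering only basic types); it is not MLflow's implementation, and the real output format differs in details.

```python
import json

# Simplified mapping from Python types to MLflow column type names (basic types only)
PY_TO_MLFLOW = {bool: "boolean", int: "long", float: "double", str: "string"}


def infer_columns(records):
    """Infer a [{"name": ..., "type": ...}] column spec from the first dict record."""
    first = records[0]
    return [{"name": name, "type": PY_TO_MLFLOW[type(value)]} for name, value in first.items()]


rows = [{"sepal length (cm)": 5.1, "petal width (cm)": 0.2}]
predictions = [{"prediction": 0}]
signature_sketch = {
    "inputs": json.dumps(infer_columns(rows)),
    "outputs": json.dumps(infer_columns(predictions)),
}
print(signature_sketch["inputs"])
print(signature_sketch["outputs"])
```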

##### Option 2: Manually created signature

In [None]:
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec

# Define the schema for the input data
input_schema = Schema([
  ColSpec("double", "sepal length (cm)"),
  ColSpec("double", "sepal width (cm)"),
  ColSpec("double", "petal length (cm)"),
  ColSpec("double", "petal width (cm)"),
])

# Define the schema for the output data
output_schema = Schema([ColSpec("long")])

# Create the signature object
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

## Register an MLflow model

After training, we want to deploy a machine learning model in order to integrate the model with an application. In Azure Machine Learning, we can easily deploy a model to a batch or online endpoint when we register the model with MLflow.

MLflow uses the MLModel format to store all relevant model assets in a folder or directory. One essential file in the directory is the `MLmodel` file. The `MLmodel` file is the single source of truth about how the model should be loaded and used.

The `MLmodel` file may include:

- `artifact_path`: During the training job, the model is logged to this path.
- `flavors`: The machine learning library with which the model was created.
- `model_uuid`: The unique identifier of the registered model.
- `run_id`: The unique identifier of the job run during which the model was created.
- `signature`: Specifies the schema of the model's inputs and outputs:
    - `inputs`: Valid input to the model. For example, a subset of the training dataset.
    - `outputs`: Valid model output. For example, model predictions for the input dataset.

Here is an example of what an MLmodel file created for a computer vision model trained with `fastai` may look like:

In [None]:
# This is YAML, not Python
artifact_path: classifier
flavors:
  fastai:
    data: model.fastai
    fastai_version: 2.4.1
  python_function:
    data: model.fastai
    env: conda.yaml
    loader_module: mlflow.fastai
    python_version: 3.8.12
model_uuid: e694c68eba484299976b06ab9058f636
run_id: e13da8ac-b1e6-45d4-a9b2-6a0a5cfac537
signature:
  inputs: '[{"type": "tensor",
             "tensor-spec": 
                 {"dtype": "uint8", "shape": [-1, 300, 300, 3]}
           }]'
  outputs: '[{"type": "tensor", 
              "tensor-spec": 
                 {"dtype": "float32", "shape": [-1,2]}
            }]'
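In the `MLmodel` file, the `inputs` and `outputs` entries of the signature are JSON strings embedded in YAML. With only the standard library, the strings from the example above can be decoded to check, for instance, the expected tensor type and shape, where `-1` marks a dimension (here the batch size) that can vary.

```python
import json

# The signature strings from the fastai MLmodel example above
inputs_json = '[{"type": "tensor", "tensor-spec": {"dtype": "uint8", "shape": [-1, 300, 300, 3]}}]'
outputs_json = '[{"type": "tensor", "tensor-spec": {"dtype": "float32", "shape": [-1, 2]}}]'

input_spec = json.loads(inputs_json)[0]["tensor-spec"]
output_spec = json.loads(outputs_json)[0]["tensor-spec"]

print("input:", input_spec["dtype"], input_spec["shape"])     # input: uint8 [-1, 300, 300, 3]
print("output:", output_spec["dtype"], output_spec["shape"])  # output: float32 [-1, 2]
```

So the model expects batches of 300x300 RGB images with 8-bit pixel values and returns two float scores per image.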

There are three types of models we can register:

- **MLflow**: Model trained and tracked with MLflow. Recommended for standard use cases.
- **Custom**: Model type with a custom standard not currently supported by Azure Machine Learning.
- **Triton**: Model type for deep learning workloads. Commonly used for TensorFlow and PyTorch model deployments.

To register an MLflow model, we first submit a training script that logs the model as a command job. For example, for our diabetes prediction model:

In [None]:
from azure.ai.ml import command

# configure job

job = command(
    code="./src",
    command="python train-model-signature.py --training_data diabetes.csv",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    display_name="diabetes-train-signature",
    experiment_name="diabetes-training"
    )

# submit job
returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)

Once the job is completed and the model is trained, use the job name to find the job run and register the model from its outputs.

In [None]:
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

job_name = returned_job.name

run_model = Model(
    path=f"azureml://jobs/{job_name}/outputs/artifacts/paths/model/",
    name="mlflow-diabetes",
    description="Model created from run.",
    type=AssetTypes.MLFLOW_MODEL,
)
ml_client.models.create_or_update(run_model)

### A complete example of a trained and registered MLflow model

#### Case 1: Autologging with specified flavor (sklearn)

In [None]:
%%writefile $script_folder/train-model-sklearn.py
# import libraries
import mlflow
import argparse
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

def main(args):
    ### AUTOLOGGING AND SPECIFIED FLAVOR ###
    mlflow.sklearn.autolog()

    # read data
    df = get_data(args.training_data)

    # split data
    X_train, X_test, y_train, y_test = split_data(df)

    # train model
    model = train_model(args.reg_rate, X_train, X_test, y_train, y_test)

    eval_model(model, X_test, y_test)

# function that reads the data
def get_data(path):
    print("Reading data...")
    df = pd.read_csv(path)
    
    return df

# function that splits the data
def split_data(df):
    print("Splitting data...")
    X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness',
    'SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

    return X_train, X_test, y_train, y_test

# function that trains the model
def train_model(reg_rate, X_train, X_test, y_train, y_test):
    mlflow.log_param("Regularization rate", reg_rate)
    print("Training model...")
    model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train)

    return model

# function that evaluates the model
def eval_model(model, X_test, y_test):
    # calculate accuracy
    y_hat = model.predict(X_test)
    acc = np.average(y_hat == y_test)
    print('Accuracy:', acc)

    # calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test,y_scores[:,1])
    print('AUC: ' + str(auc))

    # plot ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
    fig = plt.figure(figsize=(6, 4))
    # Plot the diagonal 50% line
    plt.plot([0, 1], [0, 1], 'k--')
    # Plot the FPR and TPR achieved by our model
    plt.plot(fpr, tpr)
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve')
    plt.savefig("ROC-Curve.png") 

def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--training_data", dest='training_data',
                        type=str)
    parser.add_argument("--reg_rate", dest='reg_rate',
                        type=float, default=0.01)

    # parse args
    args = parser.parse_args()

    # return args
    return args

# run script
if __name__ == "__main__":
    # add space in logs
    print("\n\n")
    print("*" * 60)

    # parse args
    args = parse_args()

    # run main function
    main(args)

    # add space in logs
    print("*" * 60)
    print("\n\n")

#### Case 2: Custom logging with defined signature

In [None]:
%%writefile $script_folder/train-model-signature.py
# import libraries
import mlflow
import argparse
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
import mlflow.sklearn
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec

def main(args):
    ### DISABLE AUTOLOGGING ###
    mlflow.autolog(log_models=False)

    # read data
    df = get_data(args.training_data)

    # split data
    X_train, X_test, y_train, y_test = split_data(df)

    # train model
    model = train_model(args.reg_rate, X_train, X_test, y_train, y_test)

    # evaluate model
    y_hat = eval_model(model, X_test, y_test)

    ### DEFINE SCHEMA ###
    input_schema = Schema([
        ColSpec("integer", "Pregnancies"),
        ColSpec("integer", "PlasmaGlucose"),
        ColSpec("integer", "DiastolicBloodPressure"),
        ColSpec("integer", "TricepsThickness"),
        ColSpec("integer", "SerumInsulin"),
        ColSpec("double", "BMI"),
        ColSpec("double", "DiabetesPedigree"),
        ColSpec("integer", "Age"),
    ])

    output_schema = Schema([ColSpec("boolean")])

    # Create the signature object
    signature = ModelSignature(inputs=input_schema, outputs=output_schema)

    # manually log the model
    mlflow.sklearn.log_model(model, "model", signature=signature)

# function that reads the data
def get_data(path):
    print("Reading data...")
    df = pd.read_csv(path)
    
    return df

# function that splits the data
def split_data(df):
    print("Splitting data...")
    X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness',
    'SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

    return X_train, X_test, y_train, y_test

# function that trains the model
def train_model(reg_rate, X_train, X_test, y_train, y_test):
    mlflow.log_param("Regularization rate", reg_rate)
    print("Training model...")
    model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train)

    return model

# function that evaluates the model
def eval_model(model, X_test, y_test):
    # calculate accuracy
    y_hat = model.predict(X_test)
    acc = np.average(y_hat == y_test)
    print('Accuracy:', acc)
 
    return y_hat

def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--training_data", dest='training_data',
                        type=str)
    parser.add_argument("--reg_rate", dest='reg_rate',
                        type=float, default=0.01)

    # parse args
    args = parser.parse_args()

    # return args
    return args

# run script
if __name__ == "__main__":
    # add space in logs
    print("\n\n")
    print("*" * 60)

    # parse args
    args = parse_args()

    # run main function
    main(args)

    # add space in logs
    print("*" * 60)
    print("\n\n")
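
Because `input_schema` is typed out by hand, it can drift from the feature list the model is actually trained on; a repeated or missing column is easy to overlook. A quick plain-Python check, with both lists copied by hand from the script above, can catch that before the model is logged. `check_schema` is a hypothetical helper, not part of MLflow:

```python
from collections import Counter

# Feature columns used by split_data() in the training script above
features = ['Pregnancies', 'PlasmaGlucose', 'DiastolicBloodPressure', 'TricepsThickness',
            'SerumInsulin', 'BMI', 'DiabetesPedigree', 'Age']

# Column names declared in input_schema (copied by hand from the ColSpec list)
schema_columns = ['Pregnancies', 'PlasmaGlucose', 'DiastolicBloodPressure', 'TricepsThickness',
                  'SerumInsulin', 'BMI', 'DiabetesPedigree', 'Age']

def check_schema(schema_columns, features):
    """Return a list of problems found when comparing schema columns to training features."""
    problems = []
    duplicates = [name for name, count in Counter(schema_columns).items() if count > 1]
    if duplicates:
        problems.append(f"duplicate columns: {duplicates}")
    if schema_columns != features:
        problems.append("schema columns do not match the feature order used for training")
    return problems

print(check_schema(schema_columns, features) or "schema matches the training features")
```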

Assuming we continue with the second script (Case 2), we can run the code below to train the model:

In [None]:
from azure.ai.ml import command

# configure job

job = command(
    code="./src",
    command="python train-model-signature.py --training_data diabetes.csv",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    display_name="diabetes-train-signature",
    experiment_name="diabetes-training"
    )

# submit job
returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)

Finally, we register the model:

In [None]:
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

job_name = returned_job.name

run_model = Model(
    path=f"azureml://jobs/{job_name}/outputs/artifacts/paths/model/",
    name="mlflow-diabetes",
    description="Model created from run.",
    type=AssetTypes.MLFLOW_MODEL,
)
ml_client.models.create_or_update(run_model)

## View and search for experiments

### Search all the active experiments in the workspace

In [None]:
import mlflow
experiments = mlflow.search_experiments()
for exp in experiments:
    print(exp.name)

To include archived experiments, use `ViewType.ALL`:

In [None]:
from mlflow.entities import ViewType

experiments = mlflow.search_experiments(view_type=ViewType.ALL)
for exp in experiments:
    print(exp.name)

To retrieve a specific experiment:

In [None]:
experiment_name = "diabetes-training"  # the experiment used by the training jobs above
exp = mlflow.get_experiment_by_name(experiment_name)
print(exp)

### Retrieve runs

MLflow allows us to search for runs inside any experiment:

In [None]:
mlflow.search_runs(exp.experiment_id)

We can use `search_all_experiments=True` if we want to search across all the experiments in the workspace. By default, runs are returned in descending order of `start_time`, which is the time the run was queued in Azure Machine Learning. We can change this default with the `order_by` parameter.

For example, to sort by start time and show only the two most recent runs:

In [None]:
mlflow.search_runs(exp.experiment_id, order_by=["start_time DESC"], max_results=2)

We can also look for runs with a specific hyperparameter value:

In [None]:
mlflow.search_runs(
    exp.experiment_id, filter_string="params.num_boost_round='100'", max_results=2
)

We can also write a query to filter the runs. Filter query strings use a simplified version of the SQL WHERE clause. The code below retrieves runs that trained a LogisticRegression model with an AUC higher than 0.8:

In [None]:
query = "metrics.AUC > 0.8 and tags.model_type = 'LogisticRegression'"
mlflow.search_runs(exp.experiment_id, filter_string=query)
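
Filter strings are plain text, so it can help to assemble them from structured conditions. The `build_filter` helper below is hypothetical (MLflow only ever sees the resulting string). It reproduces the query above and wraps string values in single quotes, which matters for `params`, because MLflow stores parameter values as strings:

```python
def build_filter(metrics=None, params=None, tags=None):
    """Build an MLflow-style filter string from (key, operator, value) conditions.

    Hypothetical helper for illustration: numeric values stay unquoted,
    string values are wrapped in single quotes, and clauses are joined with "and".
    """
    clauses = []
    for prefix, conditions in (("metrics", metrics), ("params", params), ("tags", tags)):
        for key, op, value in conditions or []:
            literal = f"'{value}'" if isinstance(value, str) else str(value)
            clauses.append(f"{prefix}.{key} {op} {literal}")
    return " and ".join(clauses)

query = build_filter(metrics=[("AUC", ">", 0.8)], tags=[("model_type", "=", "LogisticRegression")])
print(query)  # metrics.AUC > 0.8 and tags.model_type = 'LogisticRegression'
```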

---

# Run a training script as a command job 

## Create a production-ready script

When we've used notebooks for experimentation and development, we'll first need to convert the notebook to a script. Scripts are ideal for testing and automation in a production environment. To create a production-ready script, we'll need to:

- Remove nonessential code.
- Refactor the code into functions.
- Test the script in the terminal.

### Remove nonessential code

The main benefit of using notebooks is being able to quickly explore your data. For example, we can use `print()` and `df.describe()` statements to explore your data and variables. When we create a script that will be used for automation, we want to avoid including code written for exploratory purposes.

### Refactor code into functions

Notebook version:

In [None]:
import pandas as pd

# read and visualize the data
print("Reading data...")
df = pd.read_csv('diabetes.csv')
df.head()

# split data
print("Splitting data...")
X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

Script version (refactored):

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split

def main(csv_file):
    # read data
    df = get_data(csv_file)

    # split data
    X_train, X_test, y_train, y_test = split_data(df)

# function that reads the data
def get_data(path):
    df = pd.read_csv(path)
    
    return df

# function that splits the data
def split_data(df):
    X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness',
    'SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

    return X_train, X_test, y_train, y_test

### Test script

- One simple way to test a script is to run it in a terminal. Within the Azure Machine Learning workspace, we can quickly run a script in the terminal of the compute instance.
- To do so, navigate to Compute > Terminal and run the code below to execute a Python script named `train.py`:

In [None]:
python train.py

## Configure a command job

To configure a command job, we'll use the `command` function. To run a script, we'll need to specify values for the following parameters:

- `code`: The folder that includes the script to run.
- `command`: Specifies which file to run.
- `environment`: The necessary packages to be installed on the compute before running the command.
- `compute`: The compute to use to run the command.
- `display_name`: The name of the individual job.
- `experiment_name`: The name of the experiment the job belongs to.

The code below configures a command job to run a file named `train.py` on the compute cluster named `aml-cluster`:

In [None]:
from azure.ai.ml import command

# configure job
job = command(
    code="./src",
    command="python train.py",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    display_name="train-model",
    experiment_name="train-classification-model"
    )

## Submit a command job

In [None]:
# submit job
returned_job = ml_client.create_or_update(job)

---

# Hyperparameters tuning / Run a sweep job

In Azure Machine Learning, we can tune hyperparameters by submitting a script as a **sweep job**. A sweep job will run a **trial** for each hyperparameter combination to be tested. Each trial uses a training script with parameterized hyperparameter values to train a model, and logs the target performance metric achieved by the trained model.

## Define a search space

To define a search space for hyperparameter tuning, create a dictionary with the appropriate parameter expression for each named hyperparameter.

For example, the following search space indicates that the `batch_size` hyperparameter can have the value 16, 32, or 64, and the `learning_rate` hyperparameter can have any value from a normal distribution with a mean of 10 and a standard deviation of 3.

In [None]:
from azure.ai.ml.sweep import Choice, Normal

command_job_for_sweep = job(
    batch_size=Choice(values=[16, 32, 64]),  # discrete: one of the listed values
    learning_rate=Normal(mu=10, sigma=3),    # continuous: drawn from a normal distribution
)

For **discrete** hyperparameters:
- `QUniform(min_value, max_value, q)`: Returns a value like round(Uniform(min_value, max_value) / q) * q
- `QLogUniform(min_value, max_value, q)`: Returns a value like round(exp(Uniform(min_value, max_value)) / q) * q
- `QNormal(mu, sigma, q)`: Returns a value like round(Normal(mu, sigma) / q) * q
- `QLogNormal(mu, sigma, q)`: Returns a value like round(exp(Normal(mu, sigma)) / q) * q
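
To get a feel for these expressions, the formulas above can be reproduced with Python's standard `random` module. This is a sketch of the math only, not Azure ML's sampler, and the function names are our own:

```python
import math
import random

def q_uniform(min_value, max_value, q, rng=random):
    # round(Uniform(min_value, max_value) / q) * q
    return round(rng.uniform(min_value, max_value) / q) * q

def q_log_uniform(min_value, max_value, q, rng=random):
    # round(exp(Uniform(min_value, max_value)) / q) * q
    return round(math.exp(rng.uniform(min_value, max_value)) / q) * q

def q_normal(mu, sigma, q, rng=random):
    # round(Normal(mu, sigma) / q) * q
    return round(rng.gauss(mu, sigma) / q) * q

def q_log_normal(mu, sigma, q, rng=random):
    # round(exp(Normal(mu, sigma)) / q) * q
    return round(math.exp(rng.gauss(mu, sigma)) / q) * q

rng = random.Random(0)  # seeded so the draws are repeatable
print([q_uniform(0, 100, 10, rng) for _ in range(5)])  # multiples of 10 between 0 and 100
```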

For **continuous** hyperparameters:
- `Uniform(min_value, max_value)`: Returns a value uniformly distributed between min_value and max_value
- `LogUniform(min_value, max_value)`: Returns a value drawn according to exp(Uniform(min_value, max_value)) so that the logarithm of the return value is uniformly distributed
- `Normal(mu, sigma)`: Returns a real value that's normally distributed with mean mu and standard deviation sigma
- `LogNormal(mu, sigma)`: Returns a value drawn according to exp(Normal(mu, sigma)) so that the logarithm of the return value is normally distributed
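
A detail that is easy to miss with `LogUniform` and `LogNormal` is that their parameters describe the natural logarithm of the value. In this plain-Python sketch (standard library only, not Azure ML's sampler, with helper names of our own), passing `math.log(1e-4)` and `math.log(1e-1)` as the bounds spreads learning-rate candidates evenly across orders of magnitude:

```python
import math
import random

def log_uniform(min_value, max_value, rng=random):
    # exp(Uniform(min_value, max_value)): log(result) is uniform on [min_value, max_value)
    return math.exp(rng.uniform(min_value, max_value))

def log_normal(mu, sigma, rng=random):
    # exp(Normal(mu, sigma)): log(result) is normally distributed
    return math.exp(rng.gauss(mu, sigma))

rng = random.Random(42)
# The bounds are exponents, so pass natural logs to target values between 1e-4 and 1e-1
samples = [log_uniform(math.log(1e-4), math.log(1e-1), rng) for _ in range(3000)]

# Each decade (1e-4..1e-3, 1e-3..1e-2, 1e-2..1e-1) receives roughly a third of the draws
for lo in (1e-4, 1e-3, 1e-2):
    share = sum(lo <= s < lo * 10 for s in samples) / len(samples)
    print(f"[{lo:g}, {lo * 10:g}): {share:.2f}")
```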

## Configure a sampling method

There are three main sampling methods available in Azure Machine Learning:

- Grid sampling: Tries every possible combination.
- Random sampling: Randomly chooses values from the search space.
    - Sobol: Adds a seed to random sampling to make the results reproducible.
- Bayesian sampling: Chooses new values based on previous results.

### Grid sampling

Grid sampling can only be applied when all hyperparameters are discrete, and is used to try every possible combination of parameters in the search space.

The example below uses grid sampling to try every possible combination of the discrete `batch_size` and `learning_rate` values:

In [None]:
from azure.ai.ml.sweep import Choice

command_job_for_sweep = job(
    batch_size=Choice(values=[16, 32, 64]),
    learning_rate=Choice(values=[0.01, 0.1, 1.0]),
)

sweep_job = command_job_for_sweep.sweep(
    sampling_algorithm = "grid",
    ...
)
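
The size of a grid sweep is the product of the number of values for each hyperparameter. As an illustration in plain Python (`itertools.product`, not the Azure ML sampler), the search space above expands to nine trials:

```python
from itertools import product

search_space = {
    "batch_size": [16, 32, 64],
    "learning_rate": [0.01, 0.1, 1.0],
}

# Grid sampling tries every combination: 3 x 3 = 9 trials for this search space
trials = [dict(zip(search_space, values)) for values in product(*search_space.values())]
print(len(trials))  # 9
print(trials[0])    # {'batch_size': 16, 'learning_rate': 0.01}
```

Adding a third hyperparameter with three values would triple the count to 27, which is why grid sampling is usually kept to small, discrete search spaces.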

### Random sampling

Random sampling is used to randomly select a value for each hyperparameter, which can be a mix of discrete and continuous values as shown in the following code example:

In [None]:
from azure.ai.ml.sweep import Choice, Normal

command_job_for_sweep = job(
    batch_size=Choice(values=[16, 32, 64]),
    learning_rate=Normal(mu=10, sigma=3),
)

sweep_job = command_job_for_sweep.sweep(
    sampling_algorithm = "random",
    ...
)
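
For intuition, each random-sampling trial draws every hyperparameter independently from its expression. The sketch below mimics the search space above with Python's `random` module (`sample_trial` is a hypothetical helper, not an Azure ML API); seeding the generator makes the draws repeatable:

```python
import random

def sample_trial(rng):
    # Mirrors the search space above: batch_size from Choice, learning_rate from Normal(10, 3)
    return {
        "batch_size": rng.choice([16, 32, 64]),
        "learning_rate": rng.gauss(10, 3),
    }

rng = random.Random(123)
trials = [sample_trial(rng) for _ in range(5)]
for trial in trials:
    print(trial)
```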

#### Sobol

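Sobol is a quasi-random rule for random sampling. Combined with a seed, it makes a random sampling sweep job reproducible and spreads the sampled values more evenly across the search space: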

In [None]:
from azure.ai.ml.sweep import RandomSamplingAlgorithm

sweep_job = command_job_for_sweep.sweep(
    sampling_algorithm = RandomSamplingAlgorithm(seed=123, rule="sobol"),
    ...
)
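
Sobol sequences are low-discrepancy (quasi-random): successive points fill the range more evenly than independent random draws. For illustration only, the base-2 van der Corput sequence below matches the first coordinate of a Sobol sequence up to the order of the points. It is not Azure ML's sampler, and the learning-rate range at the end is hypothetical:

```python
def van_der_corput(n, base=2):
    """Return the n-th point of the van der Corput sequence in [0, 1)."""
    value, denominator = 0.0, 1.0
    while n:
        n, remainder = divmod(n, base)
        denominator *= base
        value += remainder / denominator
    return value

points = [van_der_corput(i) for i in range(1, 8)]
print(points)  # [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]

# Map the unit-interval points onto a hypothetical learning_rate range [0.05, 0.1]
print([0.05 + p * (0.1 - 0.05) for p in points])
```

Each new point lands in the largest remaining gap, so even a handful of trials covers the range without clustering.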

### Bayesian sampling

Bayesian sampling chooses hyperparameter values based on the Bayesian optimization algorithm, which tries to select parameter combinations that improve on the performance of previous selections. The following code example shows how to configure Bayesian sampling:

In [None]:
from azure.ai.ml.sweep import Uniform, Choice

command_job_for_sweep = job(
    batch_size=Choice(values=[16, 32, 64]),    
    learning_rate=Uniform(min_value=0.05, max_value=0.1),
)

sweep_job = command_job_for_sweep.sweep(
    sampling_algorithm = "bayesian",
    ...
)

## Configure early termination

When we configure a sweep job in Azure Machine Learning, we can also set a maximum number of trials. A more sophisticated approach may be to stop a sweep job when newer models **don't produce significantly better results**. To stop a sweep job based on the performance of the models, we can use an **early termination policy**.

We'll most likely want to use an early termination policy when working with **continuous hyperparameters** and a **random** or **Bayesian sampling** method.

There are two main parameters when we choose to use an early termination policy:
- `evaluation_interval`: Specifies at which interval we want the policy to be evaluated. Every time the primary metric is logged for a trial counts as an interval.
- `delay_evaluation`: Specifies when to start evaluating the policy. This parameter lets a minimum number of trials complete before the early termination policy can affect them.

There are three options to determine the extent to which a model should perform better than previous trials:

### Option 1: Bandit policy

Bandit policy: Uses a `slack_factor` (relative) or `slack_amount` (absolute). Any new model must perform within the slack range of the best performing model.

For example, the following code applies a bandit policy with a delay of five trials, evaluates the policy at every interval, and allows an absolute slack amount of 0.2.

In [None]:
from azure.ai.ml.sweep import BanditPolicy

sweep_job.early_termination = BanditPolicy(
    slack_amount = 0.2, 
    delay_evaluation = 5, 
    evaluation_interval = 1
)
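
The bandit check can be written out in a few lines. The sketch below assumes the goal is to maximize the primary metric and compares a trial's metric with the best metric at the same interval. With a best metric of 0.8, a `slack_amount` of 0.2 puts the cut-off at 0.8 - 0.2 = 0.6, while a `slack_factor` of 0.2 puts it at 0.8 / (1 + 0.2) ≈ 0.667. `bandit_terminates` is a hypothetical helper, not part of the SDK:

```python
def bandit_terminates(metric, best_metric, slack_amount=None, slack_factor=None):
    """Return True if a trial falls outside the slack allowed by a bandit policy.

    Assumes the goal is to maximize the primary metric.
    """
    if slack_amount is not None:
        threshold = best_metric - slack_amount        # absolute slack
    elif slack_factor is not None:
        threshold = best_metric / (1 + slack_factor)  # relative slack
    else:
        raise ValueError("set slack_amount or slack_factor")
    return metric < threshold

print(bandit_terminates(0.59, 0.8, slack_amount=0.2))  # True: below 0.6
print(bandit_terminates(0.65, 0.8, slack_amount=0.2))  # False
print(bandit_terminates(0.65, 0.8, slack_factor=0.2))  # True: below about 0.667
```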

### Option 2: Median stopping policy

Median stopping policy: Uses the median of the averages of the primary metric. Any new model must perform better than the median.

For example, the following code applies a median stopping policy with a delay of five trials and evaluates the policy at every interval.

In [None]:
from azure.ai.ml.sweep import MedianStoppingPolicy

sweep_job.early_termination = MedianStoppingPolicy(
    delay_evaluation = 5, 
    evaluation_interval = 1
)
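
A plain-Python sketch of the median stopping rule, assuming the goal is to maximize the metric: average each other trial's logged values, take the median of those averages, and stop the trial if its best value is below that median. `median_stopping_terminates` and the metric histories are hypothetical:

```python
from statistics import mean, median

def median_stopping_terminates(trial_best, other_trial_histories):
    """Return True if trial_best is worse than the median of other trials' running averages.

    Assumes the goal is to maximize the primary metric; each history is the list of
    primary-metric values a trial has logged so far.
    """
    running_averages = [mean(history) for history in other_trial_histories]
    return trial_best < median(running_averages)

histories = [[0.60, 0.70, 0.80], [0.50, 0.55, 0.60], [0.70, 0.75, 0.80]]
# running averages are about 0.70, 0.55 and 0.75, so the median is about 0.70
print(median_stopping_terminates(0.65, histories))  # True
print(median_stopping_terminates(0.72, histories))  # False
```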

### Option 3: Truncation selection policy

Truncation selection policy: Uses a `truncation_percentage`, which is the percentage of lowest performing trials. Any new model must perform better than the lowest performing trials.

For example, the following code applies a truncation selection policy with a delay of four trials, evaluates the policy at every interval, and uses a truncation percentage of 20%.

In [None]:
from azure.ai.ml.sweep import TruncationSelectionPolicy

sweep_job.early_termination = TruncationSelectionPolicy(
    evaluation_interval=1, 
    truncation_percentage=20, 
    delay_evaluation=4 
)
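
A sketch of the truncation step at one evaluation interval, assuming the goal is to maximize the metric and that the number of trials to cancel is rounded down (Azure ML's exact rounding is not shown here). The helper and the metric values are hypothetical:

```python
import math

def truncation_candidates(metrics_at_interval, truncation_percentage=20):
    """Return the trials in the lowest `truncation_percentage` percent at this interval.

    Assumes the goal is to maximize the primary metric and rounds the count down.
    """
    n_cancel = math.floor(len(metrics_at_interval) * truncation_percentage / 100)
    ranked = sorted(metrics_at_interval, key=metrics_at_interval.get)  # worst first
    return ranked[:n_cancel]

metrics = {"trial_1": 0.81, "trial_2": 0.74, "trial_3": 0.88, "trial_4": 0.79, "trial_5": 0.69}
print(truncation_candidates(metrics))  # ['trial_5']
```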

## Use a sweep job

### Step 1: Create a training script for hyperparameter tuning

To run a sweep job, we need to create a training script just as we would for any other training job, except that the script must:
- Include an argument for each hyperparameter we want to vary.
- Log the target performance metric with MLflow. A logged metric enables the sweep job to evaluate the performance of the trials it initiates, and identify the one that produces the best performing model.

For example, the following script trains a logistic regression model using a `--regularization` argument to set the regularization rate hyperparameter, and logs the accuracy metric with the name `Accuracy`:

In [None]:
import argparse
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import mlflow

# get regularization hyperparameter
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01)
args = parser.parse_args()
reg = args.reg_rate

# load the training dataset
data = pd.read_csv("data.csv")

# separate features and labels, and split for training/validation
X = data[['feature1','feature2','feature3','feature4']].values
y = data['label'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

# train a logistic regression model with the reg hyperparameter
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate and log accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
mlflow.log_metric("Accuracy", acc)

### Step 2: Create a base command job that specifies script and define parameters

In [None]:
from azure.ai.ml import command

# configure command job as base
job = command(
    code="./src",
    command="python train.py --regularization ${{inputs.reg_rate}}",
    inputs={
        "reg_rate": 0.01,
    },
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    )
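
Each sweep trial reuses this command with a different value for `reg_rate`. The sketch below approximates what happens in plain Python: the `${{inputs.reg_rate}}` placeholder is filled in (the hypothetical `render_command` helper stands in for Azure ML, whose implementation is not shown), and the training script's `argparse` setup stores `--regularization` as `args.reg_rate` because of `dest`:

```python
import argparse
import re
import shlex

def render_command(template, inputs):
    """Replace ${{inputs.<name>}} placeholders with values (illustrative only)."""
    return re.sub(r"\$\{\{inputs\.(\w+)\}\}", lambda m: str(inputs[m.group(1)]), template)

template = "python train.py --regularization ${{inputs.reg_rate}}"
command_line = render_command(template, {"reg_rate": 0.1})
print(command_line)  # python train.py --regularization 0.1

# The same argparse setup as train.py: --regularization is stored as args.reg_rate
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01)
args = parser.parse_args(shlex.split(command_line)[2:])  # drop "python train.py"
print(args.reg_rate)  # 0.1
```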

### Step 3: Override the input parameters with the search space

In [None]:
from azure.ai.ml.sweep import Choice

command_job_for_sweep = job(
    reg_rate=Choice(values=[0.01, 0.1, 1]),
)

### Step 4: Configure and submit the sweep job

To configure the sweep job, set the following parameters:
- `compute`: Name of the compute target to execute the job on.
- `sampling_algorithm`: The hyperparameter sampling algorithm to use over the search space. Allowed values are `random`, `grid` and `bayesian`.
- `primary_metric`: The name of the primary metric reported by each trial job. The metric must be logged in the user's training script using `mlflow.log_metric()` with the same corresponding metric name.
- `goal`: The optimization goal of the primary_metric. The allowed values are `maximize` and `minimize`.
- `limits`: Limits for the sweep job. For example, the maximum amount of trials or models we want to train.

In [None]:
from azure.ai.ml import MLClient

# apply the sweep parameter to obtain the sweep_job
sweep_job = command_job_for_sweep.sweep(
    compute="aml-cluster",
    sampling_algorithm="grid",
    primary_metric="Accuracy",
    goal="Maximize",
)

# set the name of the sweep job experiment
sweep_job.experiment_name="sweep-example"

# define the limits for this sweep
sweep_job.set_limits(max_total_trials=4, max_concurrent_trials=2, timeout=7200)

# submit the sweep
returned_sweep_job = ml_client.create_or_update(sweep_job)

### Step 5: Monitor a sweep job

We can monitor sweep jobs in Azure Machine Learning studio. The sweep job will initiate trials for each hyperparameter combination to be tried. For each trial, we can review all logged metrics. The code below gives the direct link to monitor the sweep job:

In [None]:
aml_url = returned_sweep_job.studio_url
print("Monitor your job at", aml_url)

---

# Pipeline

**Definition**: In Azure Machine Learning, a pipeline is a workflow of machine learning tasks in which each task is defined as a component.

## Component

Components allow us to create reusable scripts that can easily be shared across users within the same Azure Machine Learning workspace. They are an effective way to build an Azure ML pipeline.

A component consists of three parts:
- **Metadata**: Includes the component's name, version, etc.
- **Interface**: Includes the expected input parameters (like a dataset or hyperparameter) and expected output (like metrics and artifacts).
- **Command**, **code** and **environment**: Specifies how to run the code.

To create a component, we need two files:
- A **script** that contains the workflow we want to execute.
- A **YAML** file to define the metadata, interface, and command, code, and environment of the component.

For example, let's say we have a Python script `prep.py` that prepares the data by removing missing values and normalizing it:

In [None]:
# import libraries
import argparse
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.preprocessing import MinMaxScaler

# setup arg parser
parser = argparse.ArgumentParser()

# add arguments
parser.add_argument("--input_data", dest='input_data',
                    type=str)
parser.add_argument("--output_data", dest='output_data',
                    type=str)

# parse args
args = parser.parse_args()

# read the data
df = pd.read_csv(args.input_data)

# remove missing values
df = df.dropna()

# normalize the data    
scaler = MinMaxScaler()
num_cols = ['feature1','feature2','feature3','feature4']
df[num_cols] = scaler.fit_transform(df[num_cols])

# save the data as a csv in the output folder
df.to_csv(
    (Path(args.output_data) / "prepped-data.csv"), 
    index = False
)
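
`MinMaxScaler` with its default `feature_range=(0, 1)` rescales each column as (x - min) / (max - min). The same arithmetic in plain Python, as an illustration only (`min_max_scale` is our own helper and handles one column at a time):

```python
def min_max_scale(values):
    """Scale values to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(x - lo) / (hi - lo) for x in values]

print(min_max_scale([2, 4, 6, 10]))  # [0.0, 0.25, 0.5, 1.0]
```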

### Create

To create a component for the `prep.py` script, we'll need a YAML file `prep.yml`:

In [None]:
# This is a yml code, not Python
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_data
display_name: Prepare training data
version: 1
type: command
inputs:
  input_data: 
    type: uri_file
outputs:
  output_data:
    type: uri_folder
code: ./src
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
command: >-
  python prep.py 
  --input_data ${{inputs.input_data}}
  --output_data ${{outputs.output_data}}

### Load

Now we can load the component:

In [None]:
from azure.ai.ml import load_component
parent_dir = ""

loaded_component_prep = load_component(source=parent_dir + "./prep.yml")

### Register

Finally, to make the components accessible to other users in the workspace, we can also register components to the Azure Machine Learning workspace:

In [None]:
prep = ml_client.components.create_or_update(loaded_component_prep)

## Create a pipeline

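An Azure Machine Learning pipeline is defined in a YAML file that includes the pipeline job name, inputs, outputs, and settings. We can write the YAML file ourselves, or build the pipeline in Python with the `@pipeline()` decorator, as in the code that follows.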

For example, if we want to build a pipeline that first prepares the data and then trains the model, the steps are:
1. The pipeline is built by defining the function `pipeline_function_name`.
2. The pipeline function expects `pipeline_job_input` as the overall pipeline input.
3. The first pipeline step requires a value for the input parameter `input_data`. The value for the input will be the value of `pipeline_job_input`.
4. The first pipeline step is defined by the loaded component for `prep_data`.
5. The value of the `output_data` of the first pipeline step is used for the expected input `training_data` of the second pipeline step.
6. The second pipeline step is defined by the loaded component for `train_model` and results in a trained model referred to by `model_output`.
7. Pipeline outputs are defined by returning variables from the pipeline function. There are two outputs:
    - `pipeline_job_transformed_data` with the value of `prep_data.outputs.output_data`
    - `pipeline_job_trained_model` with the value of `train_model.outputs.model_output`

We can achieve this pipeline by using the code below:

In [None]:
from azure.ai.ml.dsl import pipeline

@pipeline()
def pipeline_function_name(pipeline_job_input):
    prep_data = loaded_component_prep(input_data=pipeline_job_input)
    # loaded_component_train is assumed to be loaded from a train.yml file, the same way as prep.yml
    train_model = loaded_component_train(training_data=prep_data.outputs.output_data)

    return {
        "pipeline_job_transformed_data": prep_data.outputs.output_data,
        "pipeline_job_trained_model": train_model.outputs.model_output,
    }

To pass a registered data asset as the pipeline job input, we can call the function we created with the data asset as input:

In [None]:
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

pipeline_job = pipeline_function_name(
    Input(type=AssetTypes.URI_FILE, path="azureml:data:1")
)

We can then review the resulting pipeline job configuration:

In [None]:
print(pipeline_job)

We can also change any parameter of the pipeline job configuration by referring to the parameter and specifying the new value:

In [None]:
# change the output mode
pipeline_job.outputs.pipeline_job_transformed_data.mode = "upload"
pipeline_job.outputs.pipeline_job_trained_model.mode = "upload"
# set pipeline level compute
pipeline_job.settings.default_compute = "aml-cluster"
# set pipeline level datastore
pipeline_job.settings.default_datastore = "workspaceblobstore"

# print the pipeline job again to review the changes
print(pipeline_job)

## Run a pipeline job

In [None]:
# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_job"
)

## Schedule a pipeline job

A pipeline is ideal if we want to get our model ready for production. Pipelines are especially useful for automating the retraining of a machine learning model. To automate retraining, we can schedule a pipeline.

To schedule a pipeline job, we'll use the `JobSchedule` class to associate a schedule to a pipeline job.

### Step 1: Create a schedule

There are various ways to create a schedule. A simple approach is to create a time-based schedule using the `RecurrenceTrigger` class with the following parameters:

- `frequency`: Unit of time to describe how often the schedule fires. Value can be either minute, hour, day, week, or month.
- `interval`: Number of frequency units to describe how often the schedule fires. Value needs to be an integer.

The code below creates a schedule that fires every minute:

In [None]:
from azure.ai.ml.entities import RecurrenceTrigger

schedule_name = "run_every_minute"

recurrence_trigger = RecurrenceTrigger(
    frequency="minute",
    interval=1,
)
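
To see what `frequency` and `interval` mean in practice, the sketch below lists the next few fire times for a given start time. It is a plain-`datetime` approximation, not how Azure ML computes schedules: it ignores time zones and leaves out `month`, whose length varies, and `next_fire_times` is a hypothetical helper:

```python
from datetime import datetime, timedelta

# Assumption: fixed-length units only; "month" is omitted because month lengths vary
UNITS = {"minute": timedelta(minutes=1), "hour": timedelta(hours=1),
         "day": timedelta(days=1), "week": timedelta(weeks=1)}

def next_fire_times(start, frequency, interval, count):
    """Return the next `count` fire times after `start` for a frequency/interval pair."""
    step = UNITS[frequency] * interval
    return [start + step * k for k in range(1, count + 1)]

start = datetime(2024, 1, 1, 9, 0)
for t in next_fire_times(start, "minute", 1, 3):
    print(t)  # 2024-01-01 09:01:00, then 09:02:00 and 09:03:00
```

Calling `next_fire_times(start, "day", 1, 3)` instead would list three consecutive daily runs at 09:00.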

### Step 2: Schedule a pipeline

To schedule a pipeline, we'll need `pipeline_job` to represent the pipeline we've built:

In [None]:
from azure.ai.ml.entities import JobSchedule

job_schedule = JobSchedule(
    name=schedule_name, trigger=recurrence_trigger, create_job=pipeline_job
)

job_schedule = ml_client.schedules.begin_create_or_update(
    schedule=job_schedule
).result()

### Delete a schedule

To delete a schedule, we first need to disable it:

In [None]:
ml_client.schedules.begin_disable(name=schedule_name).result()
ml_client.schedules.begin_delete(name=schedule_name).result()

---

# Responsible AI (RAI) dashboard

When we compare and evaluate our machine learning models, we'll want to review more than just their performance metrics. Azure Machine Learning allows us to create a Responsible AI dashboard to explore how a model performs on different cohorts of the data.

## Step 1: Create the data assets

To create the responsible AI dashboard, we need to register the training and testing datasets as MLtable data assets. The MLtable data assets reference the Parquet files we created earlier.

In [None]:
train_data_path = "train-data/"
test_data_path = "test-data/"
data_version = "1"

In [None]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

input_train_data = "diabetes_train_mltable"
input_test_data = "diabetes_test_mltable"

try:
    # Try getting data already registered in workspace
    train_data = ml_client.data.get(
        name=input_train_data,
        version=data_version,
    )
    test_data = ml_client.data.get(
        name=input_test_data,
        version=data_version,
    )
except Exception as e:
    train_data = Data(
        path=train_data_path,
        type=AssetTypes.MLTABLE,
        description="RAI diabetes training data",
        name=input_train_data,
        version=data_version,
    )
    ml_client.data.create_or_update(train_data)

    test_data = Data(
        path=test_data_path,
        type=AssetTypes.MLTABLE,
        description="RAI diabetes test data",
        name=input_test_data,
        version=data_version,
    )
    ml_client.data.create_or_update(test_data)

## Step 2: Build the pipeline to create the responsible AI dashboard

### Step 2.1: Get Azure ML registry for RAI components

In [None]:
from azure.ai.ml import MLClient

# Get a handle to the azureml registry for the built-in RAI components
registry_name = "azureml"
ml_client_registry = MLClient(
    credential=credential,
    subscription_id=ml_client.subscription_id,
    resource_group_name=ml_client.resource_group_name,
    registry_name=registry_name,
)
print(ml_client_registry)

### Step 2.2: Register the model

In [None]:
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

file_model = Model(
    path="model",
    type=AssetTypes.MLFLOW_MODEL,
    name="local-mlflow-diabetes",
    description="Model created from local file.",
)
model = ml_client.models.create_or_update(file_model)

### Step 2.3: Setting up RAI pipeline

In [None]:
model_name = model.name
expected_model_id = f"{model_name}:{model.version}"  # use the version that was just registered
azureml_model_id = f"azureml:{expected_model_id}"
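The strings built above use the short-form `azureml:<name>:<version>` reference syntax, and the `Input` paths in Step 2.5 use it too. The sketch below shows how to build and split such references. The helper names are hypothetical, not SDK functions, and the parser only accepts the explicit-version form (not `@latest`).

```python
from typing import Tuple, Union

def asset_ref(name: str, version: Union[str, int]) -> str:
    """Build a short-form reference such as 'azureml:local-mlflow-diabetes:1'."""
    return f"azureml:{name}:{version}"

def parse_asset_ref(ref: str) -> Tuple[str, str]:
    """Split 'azureml:<name>:<version>' back into (name, version)."""
    parts = ref.split(":")
    if len(parts) != 3 or parts[0] != "azureml":
        raise ValueError(f"not a short-form azureml reference: {ref!r}")
    return parts[1], parts[2]

ref = asset_ref("local-mlflow-diabetes", 1)
print(ref)                   # azureml:local-mlflow-diabetes:1
print(parse_asset_ref(ref))  # ('local-mlflow-diabetes', '1')
```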

In [None]:
label = "latest"

# Start with RAI Insights dashboard constructor component:
rai_constructor_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_insight_constructor", label=label
)

# we get latest version and use the same version for all components
version = rai_constructor_component.version
print("The current version of RAI built-in components is: " + version)

# Add Error Analysis to RAI Insights dashboard component
rai_erroranalysis_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_erroranalysis", version=version
)

# Add Explanation to RAI Insights dashboard component
rai_explanation_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_explanation", version=version
)

# End with a Gather RAI Insights dashboard component
rai_gather_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_insight_gather", version=version
)

### Step 2.4: Setting up the semi-complete pipeline

Next, we build the pipeline and connect the components in the appropriate order:
1. Construct the dashboard.
2. Add error analysis.
3. Add explanations.
4. Gather all insights and visualize them in the dashboard.

In [None]:
from azure.ai.ml import Input, dsl
from azure.ai.ml.constants import AssetTypes

compute_name = "aml-cluster"

@dsl.pipeline(
    compute=compute_name,
    description="RAI insights on diabetes data",
    experiment_name=f"RAI_insights_{model_name}",
)
def rai_decision_pipeline(
    target_column_name, train_data, test_data
):
    # Initiate the RAIInsights
    create_rai_job = rai_constructor_component(
        title="RAI dashboard diabetes",
        task_type="classification",
        model_info=expected_model_id,
        model_input=Input(type=AssetTypes.MLFLOW_MODEL, path=azureml_model_id),
        train_dataset=train_data,
        test_dataset=test_data,
        target_column_name=target_column_name,
    )
    create_rai_job.set_limits(timeout=300)

    # Add error analysis
    error_job = rai_erroranalysis_component(
        rai_insights_dashboard=create_rai_job.outputs.rai_insights_dashboard,
    )
    error_job.set_limits(timeout=300)

    # Add explanations
    explanation_job = rai_explanation_component(
        rai_insights_dashboard=create_rai_job.outputs.rai_insights_dashboard,
        comment="add explanation", 
    )
    explanation_job.set_limits(timeout=300)

    # Combine everything
    rai_gather_job = rai_gather_component(
        constructor=create_rai_job.outputs.rai_insights_dashboard,
        insight_3=error_job.outputs.error_analysis,
        insight_4=explanation_job.outputs.explanation,
    )
    rai_gather_job.set_limits(timeout=300)

    rai_gather_job.outputs.dashboard.mode = "upload"

    return {
        "dashboard": rai_gather_job.outputs.dashboard,
    }
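The `@dsl.pipeline` function never states the execution order explicitly. The order follows from which outputs feed which inputs. The conceptual sketch below writes those same dependencies as a graph and orders them with the standard-library `graphlib` module (Python 3.9+). It is plain Python and does not describe how Azure ML schedules steps internally.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose outputs it consumes, mirroring the pipeline above
dependencies = {
    "construct_dashboard": set(),
    "error_analysis": {"construct_dashboard"},
    "explanation": {"construct_dashboard"},
    "gather_insights": {"construct_dashboard", "error_analysis", "explanation"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The constructor always comes first and the gather step comes last. Error analysis and explanation depend only on the constructor, so neither has to wait for the other.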

### Step 2.5: Define inputs

Now that the pipeline has been built, we need to define its two required inputs: the training and test datasets.

In [None]:
from azure.ai.ml import Input
target_feature = "Diabetic"

diabetes_train_pq = Input(
    type="mltable",
    path=f"azureml:{input_train_data}:{data_version}",
    mode="download",
)
diabetes_test_pq = Input(
    type="mltable",
    path=f"azureml:{input_test_data}:{data_version}",
    mode="download",
)

### Step 2.6: Setting up complete pipeline

Finally, we'll put everything together: assign the inputs to the pipeline and set the target column (the predicted label).

In [None]:
import uuid
from azure.ai.ml import Output

# Pipeline to construct the RAI Insights
insights_pipeline_job = rai_decision_pipeline(
    target_column_name=target_feature,  # "Diabetic", defined in Step 2.5
    train_data=diabetes_train_pq,
    test_data=diabetes_test_pq,
)

# Workaround to enable the download
rand_path = str(uuid.uuid4())
insights_pipeline_job.outputs.dashboard = Output(
    path=f"azureml://datastores/workspaceblobstore/paths/{rand_path}/dashboard/",
    mode="upload",
    type="uri_folder",
)

## Step 3: Run the complete pipeline

In [None]:
from azure.ai.ml.entities import PipelineJob
from IPython.display import HTML, display
import time

def submit_and_wait(ml_client, pipeline_job) -> PipelineJob:
    created_job = ml_client.jobs.create_or_update(pipeline_job)
    assert created_job is not None

    print("Pipeline job can be accessed in the following URL:")
    display(HTML('<a href="{0}">{0}</a>'.format(created_job.studio_url)))

    while created_job.status not in [
        "Completed",
        "Failed",
        "Canceled",
        "NotResponding",
    ]:
        time.sleep(30)
        created_job = ml_client.jobs.get(created_job.name)
        print("Latest status : {0}".format(created_job.status))
    assert created_job.status == "Completed"
    return created_job


# This is the actual submission
insights_job = submit_and_wait(ml_client, insights_pipeline_job)
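`submit_and_wait` polls forever if a job never reaches a terminal state. The sketch below generalizes that loop with a timeout. `wait_for_terminal_status` is a hypothetical helper, and the usage line feeds it a fake status sequence, so no Azure calls are made.

```python
import time
from typing import Callable

# Same terminal states as submit_and_wait above
TERMINAL_STATES = {"Completed", "Failed", "Canceled", "NotResponding"}

def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_seconds: float = 30,
    timeout_seconds: float = 3600,
) -> str:
    """Poll get_status() until it returns a terminal state; raise TimeoutError after timeout_seconds."""
    deadline = time.monotonic() + timeout_seconds
    status = get_status()
    while status not in TERMINAL_STATES:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)
        status = get_status()
    return status

# Local usage with a fake status sequence (no Azure calls)
fake_statuses = iter(["Queued", "Running", "Completed"])
print(wait_for_terminal_status(lambda: next(fake_statuses), poll_seconds=0))  # Completed
```

For a real job, `get_status` could be `lambda: ml_client.jobs.get(created_job.name).status`. Unlike `submit_and_wait`, this sketch returns the final status instead of asserting that it is `Completed`, and it stops waiting after the timeout.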

---

# Deployment

## Deploy a model to a managed online endpoint

### Create

In [None]:
from azure.ai.ml.entities import ManagedOnlineEndpoint

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name="endpoint-example",
    description="Online endpoint",
    auth_mode="key",
)

ml_client.begin_create_or_update(endpoint).result()

### Deploy

#### Option 1: MLflow model

In [None]:
from azure.ai.ml.entities import Model, ManagedOnlineDeployment
from azure.ai.ml.constants import AssetTypes

# create a blue deployment
model = Model(
    path="./model",
    type=AssetTypes.MLFLOW_MODEL,
    description="my sample mlflow model",
)

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="endpoint-example",
    model=model,
    instance_type="Standard_F4s_v2",
    instance_count=1,
)

ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

##### Route traffic to a specific deployment

In [None]:
# blue deployment takes 100% of the traffic
endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()
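`endpoint.traffic` maps deployment names to the percentage of incoming requests each deployment receives, and this mapping is what enables blue/green rollouts. The local simulation below is only a sketch. The `green` deployment is hypothetical, and the check that the values total 100 is an assumption about what the service expects. `route` only imitates a weighted split and does not reflect how the endpoint routes requests internally.

```python
import random
from collections import Counter
from typing import Dict

def validate_traffic(traffic: Dict[str, int]) -> None:
    """Check that traffic is given as whole, non-negative percentages totalling 100."""
    if any(not isinstance(p, int) or p < 0 for p in traffic.values()):
        raise ValueError("traffic values must be non-negative integers")
    total = sum(traffic.values())
    if total != 100:
        raise ValueError(f"traffic must total 100, got {total}")

def route(traffic: Dict[str, int], rng: random.Random) -> str:
    """Pick the deployment for one request, weighted by its traffic percentage."""
    names = list(traffic)
    return rng.choices(names, weights=[traffic[n] for n in names])[0]

# Hypothetical blue/green split: 90% to "blue", 10% to a "green" deployment
traffic = {"blue": 90, "green": 10}
validate_traffic(traffic)
rng = random.Random(0)
print(Counter(route(traffic, rng) for _ in range(1000)))
```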

#### Option 2: Custom model

For a custom model, additional requirements must be met:
- Model files stored on a local path or as a registered model.
- A scoring script (details below).
- An execution environment (to create an environment, see section 5, Environment).

To deploy a model to an endpoint, we can specify the compute configuration with two parameters:
- `instance_type`: Virtual machine (VM) size to use. Review the list of supported sizes.
- `instance_count`: Number of instances to use.

To deploy the model, use the `ManagedOnlineDeployment` class and run the following command:

In [None]:
from azure.ai.ml.entities import Model, ManagedOnlineDeployment, CodeConfiguration

model = Model(path="./model")

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="endpoint-example",
    model=model,
    environment="deployment-environment",
    code_configuration=CodeConfiguration(
        code="./src", scoring_script="score.py"
    ),
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

#### Scoring script

The scoring script needs to include two functions:
- `init()`: called when the deployment is created or updated, to load and cache the model from the model registry.
- `run()`: called every time the endpoint is invoked, to generate predictions from the input data.

Below is an example:

In [None]:
import json
import joblib
import numpy as np
import os

# called when the deployment is created or updated
def init():
    global model
    # get the path to the registered model file and load it
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')
    model = joblib.load(model_path)

# called when a request is received
def run(raw_data):
    # get the input data as a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # get a prediction from the model
    predictions = model.predict(data)
    # return the predictions as any JSON serializable format
    return predictions.tolist()
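Before deploying, you can exercise the `init()`/`run()` contract locally. The sketch below substitutes a stand-in for the real model. `StubModel` is a hypothetical class that replaces the `joblib`-loaded `model.pkl`, so the cell runs without Azure, NumPy, or a trained model. Its prediction rule is arbitrary.

```python
import json

class StubModel:
    """Hypothetical stand-in for the joblib model: predicts 1 when the first feature exceeds 0.15."""
    def predict(self, rows):
        return [1 if row[0] > 0.15 else 0 for row in rows]

def init():
    # score.py loads model.pkl from AZUREML_MODEL_DIR here; the sketch uses the stub instead
    global model
    model = StubModel()

def run(raw_data):
    # Same request/response contract as score.py: {"data": [...]} in, JSON-serializable list out
    data = json.loads(raw_data)["data"]
    return list(model.predict(data))

init()
request = json.dumps({"data": [[0.1, 2.3, 4.1, 2.0], [0.2, 1.8, 3.9, 2.1]]})
print(run(request))  # [0, 1]
```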

### Test

Typically, we send data to the deployed model in JSON format with the following structure:

In [None]:
# This is JSON code, not Python
{
  "data": [
      [0.1, 2.3, 4.1, 2.0],  // 1st case
      [0.2, 1.8, 3.9, 2.1],  // 2nd case
      ...
  ]
}
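The `//` comments and `...` above are only annotations, because strict JSON allows neither. To invoke the endpoint with `request_file="sample-data.json"`, that file must contain valid JSON. The minimal sketch below writes one from Python using the two illustrative cases above. The feature values are placeholders, not real patient data.

```python
import json

# Two example cases with the {"data": [[...], [...]]} structure shown above
payload = {
    "data": [
        [0.1, 2.3, 4.1, 2.0],  # 1st case
        [0.2, 1.8, 3.9, 2.1],  # 2nd case
    ]
}

# Save the payload so it can be passed as request_file="sample-data.json"
with open("sample-data.json", "w") as f:
    json.dump(payload, f)
```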

The response from the deployed model is a JSON collection with a prediction for each case that was submitted in the data. The following code sample invokes an endpoint and displays the response:

In [None]:
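import json

online_endpoint_name = "endpoint-example"  # name given to the endpoint when it was created

# test the blue deployment with some sample data
response = ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="blue",
    request_file="sample-data.json",
)

# invoke() returns the response body as a string; assuming the scoring script returns
# a JSON list (as in the example above), parse it and report each case
for i, prediction in enumerate(json.loads(response), start=1):
    print(f"Case {i}:", "Yes" if str(prediction) == "1" else "No")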

### View endpoints

To list all endpoints, use `online_endpoints.list()`:

In [None]:
endpoints = ml_client.online_endpoints.list()
for endp in endpoints:
    print(endp.name)

To get an endpoint's details, use `online_endpoints.get()`:

In [None]:
# Get the details for online endpoint
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)

# existing traffic details
print(endpoint.traffic)

# Get the scoring URI
print(endpoint.scoring_uri)

### Delete

In [None]:
ml_client.online_endpoints.begin_delete(name="endpoint-example")

## Deploy a model to a batch endpoint

### Create

In [None]:
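from azure.ai.ml.entities import BatchEndpoint

# create a batch endpoint
endpoint = BatchEndpoint(
    name="endpoint-example",
    description="A batch endpoint",
)

# wait for the endpoint to be created before adding deployments to it
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()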

### Deploy

#### Option 1: MLflow model

In [None]:
# Register the model
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

model_name = 'mlflow-model'
model = ml_client.models.create_or_update(
    Model(name=model_name, path='./model', type=AssetTypes.MLFLOW_MODEL)
)

To deploy an MLflow model to a batch endpoint, we'll use the `BatchDeployment` class:

In [None]:
from azure.ai.ml.entities import BatchDeployment, BatchRetrySettings
from azure.ai.ml.constants import BatchDeploymentOutputAction

deployment = BatchDeployment(
    name="classifier-diabetes-mlflow",
    description="A diabetes classifier",
    endpoint_name=endpoint.name,
    model=model,
    compute="aml-cluster",
    instance_count=2,
    max_concurrency_per_instance=2,
    mini_batch_size=2,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
    logging_level="info",
)
ml_client.batch_deployments.begin_create_or_update(deployment)

In this example, the deployment is configured with the following parameters:

- `name`: Name of the deployment.
- `description`: Optional description to further clarify what the deployment represents.
- `endpoint_name`: Name of the previously created endpoint the model should be deployed to.
- `model`: The registered model to deploy.
- `compute`: Compute to be used when invoking the deployed model to generate predictions.
- `instance_count`: Number of compute nodes to use for generating predictions.
- `max_concurrency_per_instance`: Maximum number of parallel scoring script runs per compute node.
- `mini_batch_size`: Number of files passed per scoring script run.
- `output_action`: How predictions are written; `APPEND_ROW` appends each new prediction as a new row to the output file.
- `output_file_name`: File to which predictions will be appended.
- `retry_settings`: Retry settings applied when a mini-batch fails.
- `logging_level`: The log verbosity level. Allowed values are warning, info, and debug.
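These settings combine multiplicatively. Take 10 input files with the settings above: they produce 5 mini-batches, and up to 4 scoring runs can execute at once (2 nodes × 2 runs per node). The back-of-the-envelope sketch below works through that arithmetic. `batch_plan` is a hypothetical helper, and it assumes equal-length runs spread evenly, which the real scheduler need not do.

```python
import math

def batch_plan(num_files: int, mini_batch_size: int, instance_count: int,
               max_concurrency_per_instance: int) -> dict:
    """Rough estimate of how a batch job splits its work (illustrative, not the Azure ML scheduler)."""
    mini_batches = math.ceil(num_files / mini_batch_size)          # scoring script runs needed
    parallel_runs = instance_count * max_concurrency_per_instance  # runs that can execute at once
    waves = math.ceil(mini_batches / parallel_runs)                # rounds if all runs take equal time
    return {"mini_batches": mini_batches, "parallel_runs": parallel_runs, "waves": waves}

print(batch_plan(10, mini_batch_size=2, instance_count=2, max_concurrency_per_instance=2))
# {'mini_batches': 5, 'parallel_runs': 4, 'waves': 2}
```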

#### Option 2: Custom model

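The requirements for a custom model running batch predictions are the same as those for an online endpoint deployment: model files, a scoring script, and an execution environment. One difference is that the batch scoring script's `run()` receives a mini-batch of file paths instead of a request body. The code below demonstrates an example configuration and creation of a deployment: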

In [None]:
from azure.ai.ml.entities import BatchDeployment, BatchRetrySettings, CodeConfiguration
from azure.ai.ml.constants import BatchDeploymentOutputAction

deployment = BatchDeployment(
    name="classifier-diabetes-custom",
    description="A diabetes classifier",
    endpoint_name=endpoint.name,
    model=model,
    # a custom model also needs an environment and a scoring script
    # ("./src" and "batch_score.py" are example paths)
    environment="deployment-environment",
    code_configuration=CodeConfiguration(
        code="./src", scoring_script="batch_score.py"
    ),
    compute="aml-cluster",
    instance_count=2,
    max_concurrency_per_instance=2,
    mini_batch_size=2,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
    logging_level="info",
)
ml_client.batch_deployments.begin_create_or_update(deployment)
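A batch scoring script also defines `init()` and `run()`, but here `run()` receives `mini_batch`, a list of file paths. The stdlib-only sketch below makes several assumptions. It expects headerless CSV files with numeric features, and it uses a hypothetical `StubModel` in place of a model loaded from `AZUREML_MODEL_DIR`. It returns one `[file_name, prediction]` row per input row. As we understand it, with `APPEND_ROW` each returned element becomes a row of `predictions.csv`.

```python
import csv
import os

class StubModel:
    """Hypothetical stand-in model: predicts 1 when the first feature exceeds 0.15."""
    def predict(self, rows):
        return [1 if row[0] > 0.15 else 0 for row in rows]

def init():
    # A real batch script would load the model from os.environ["AZUREML_MODEL_DIR"] here
    global model
    model = StubModel()

def run(mini_batch):
    """Score each file in the mini-batch and return one [file_name, prediction] row per input row."""
    results = []
    for file_path in mini_batch:
        # Assumes a headerless CSV with one case per line
        with open(file_path, newline="") as f:
            rows = [[float(value) for value in row] for row in csv.reader(f) if row]
        for prediction in model.predict(rows):
            results.append([os.path.basename(file_path), prediction])
    return results
```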

### Submit the job

Now that we have deployed a model to a batch endpoint, we're ready to invoke the endpoint to generate predictions on the data.

In [None]:
# Define the input by referring to the registered data asset
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

input = Input(type=AssetTypes.URI_FOLDER, path=patient_dataset_unlabeled.id)

In [None]:
# Invoke the endpoint, which will submit a pipeline job
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name, 
    input=input)

ml_client.jobs.get(job.name)

### Get the results

When the pipeline job that invokes the batch endpoint is completed, we can view the results. All predictions are collected in the `predictions.csv` file that is stored in the default datastore. We can download the file and visualize the data by running the following cells.

In [None]:
ml_client.jobs.download(name=job.name, download_path=".", output_name="score")

In [None]:
with open("predictions.csv", "r") as f:
    data = f.read()

In [None]:
from ast import literal_eval
import pandas as pd

score = pd.DataFrame(
    literal_eval(data.replace("\n", ",")), columns=["file", "prediction"]
)
score

<h1 style="font-family:'Glacial Indifference', sans-serif; font-size:32px; text-align:center; background-color:teal; color:white; border-radius: 15px 50px; ">THE END</h1>