# How to fine tune a foundation model on Azure Machine Learning using SDK v2

Fine tuning a [foundational model](https://learn.microsoft.com/en-us/azure/machine-learning/concept-foundation-models?view=azureml-api-2) has several advantages:
 
* A foundation model may not be optimized for your specific use case, and fine tuning would allow you to customize it for your needs and better performance.
* Fine tuning allows you to incorporate your own data into the model, resulting in better accuracy and more relevant results. 
* Training on your own data could also reduce bias and be more reflective of the unique characteristics of your domain. 

Ultimately, fine tuning gives you a competitive edge on your product. Customizing the model to your specific needs can make a big difference in your product experience. 

In this tutorial, you'll walk through the steps to fine tune a natural language processing (NLP) model to analyze sentiments expressed in single sentences written in English.  The tutorial uses the `emotion dataset` and `text-classification` components from the Azure Machine Learning system registry. 

By the end of this tutorial, you'll have the fine tuned model deployed to an online endpoint for real time inference, which can classify input texts into one of the six emotions: anger, fear, joy, love, sadness, and surprise.  Let's get started!  

The steps are:

>* Pick a model to fine tune
>* Setup pre-requisites such as compute
>* Pick and explore training data
>* Configure & submit the fine tuning job
>* Review training and evaluation metrics
>* Register the fine tuned model
>* Deploy the fine tuned model for real time inference
>* Clean up resources

**Training data**

You'll use the [emotion](https://huggingface.co/datasets/dair-ai/emotion) dataset. A copy of this dataset is available in the [emotion-dataset](./emotion-dataset/) folder. 

**Model**

Models that can perform the `fill-mask` task are generally good foundation models to fine tune for `text-classification`. We will use the `bert-base-uncased` model in this notebook. 


## Prerequisites

1. Open in studio and select a compute instance.
    * If you opened this notebook from Azure Machine Learning studio, you need a compute instance to run the code. If you don't have a compute instance, select **Create compute** on the toolbar to first create one.  You can use all the default settings.  
    
    ![Screenshot shows how to create a compute instance.](../get-started-notebooks/media/create-compute.png)
    
    * If you're seeing this notebook elsewhere, complete [Create resources you need to get started](https://docs.microsoft.com/azure/machine-learning/quickstart-create-resources) to create an Azure Machine Learning workspace and a compute instance.
    
1. View your VM quota and ensure you have enough quota available to create online deployments. In this tutorial, you will need at least 12 cores of `Standard_NC6s_v3` and 4 cores of `Standard_DS3_v2`. @@IS THIS RIGHT?@@ To view your VM quota usage and request quota increases, see [Manage resource quotas](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas#view-your-usage-and-quotas-in-the-azure-portal).  

## Set your kernel

* If your compute instance is stopped, start it now.  
        
    ![Screenshot shows where to start the compute instance.](../get-started-notebooks/media/start-compute.png)

* Once your compute instance is running, make sure the that the kernel, found on the top right, is `Python 3.10 - SDK v2`.  If not, use the dropdown to select this kernel.

    ![Screenshot shows setting the kernel.](../get-started-notebooks/media/set-kernel.png)

### Pick a model to fine tune

For `text-classification`, models that support `fill-mask` tasks are good candidates because they're pretrained language models that can understand the context of a given text and predict the missing words or tokens in it. This ability to understand the context of a text and predict missing words make `fill-mask` models highly effective in capturing the meaning of the text and identifying its underlying sentiment or emotion.

Let's select a model to fine tune.

1. Sign into [Azure Machine Learning studio](ml.azure.com)
2. Select `model catalog` on the left navigation bar
3. Search for `bert-base-uncased` on the model catalog
4. Select the `bert-base-uncased` model to see the model card 

![Screenshot of the model catalog.](./media/model_catalog.png)

On the model card, you can find the model name `bert-base-uncased`. This is the only reference you need in order to fine tune the model on the Notebook using SDK v2. 

## Set up your workstation for fine tuning

Set up your workstation so you can use `Azure Machine Learning SDK v2` to fine tune the model. Follow these steps: 

### Install dependencies.

Install dependencies by running the next cell. This isn't an optional step if running in a new environment. @@DO WE NEED ALL OF THESE ON A COMPUTE INSTANCE?@@

In [13]:
%pip install azure-ai-ml
%pip install azure-identity
%pip install datasets==2.9.0
%pip install mlflow
%pip install azureml-mlflow

Collecting azure-storage-blob<13.0.0,>=12.10.0
  Using cached azure_storage_blob-12.16.0-py3-none-any.whl (387 kB)
Installing collected packages: azure-storage-blob
  Attempting uninstall: azure-storage-blob
    Found existing installation: azure-storage-blob 12.13.0
    Uninstalling azure-storage-blob-12.13.0:
      Successfully uninstalled azure-storage-blob-12.13.0
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
azureml-mlflow 1.49.0 requires azure-storage-blob<=12.13.0,>=12.5.0, but you have azure-storage-blob 12.16.0 which is incompatible.[0m[31m
[0mSuccessfully installed azure-storage-blob-12.16.0
[0mNote: you may need to restart the kernel to use updated packages.
[0mNote: you may need to restart the kernel to use updated packages.
[0mNote: you may need to restart the kernel to use updated packages.
[0mNote: you may need to restart the kernel

### Create handle to workspace
Before we dive in the code, you need a way to reference your workspace. Create `ml_client` for a handle to the workspace. Then use `ml_client` to manage resources and jobs.

In the next cell, enter your `Subscription ID`, `Resource Group` name and `Workspace` name. To find these values:

- In the upper right Azure Machine Learning studio toolbar, select your workspace name.
- Copy the value for workspace, resource group and subscription ID into the code.
- You'll need to copy one value, close the area and paste, then come back for the next one.

In [14]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
    ClientSecretCredential,
)
from azure.ai.ml.entities import AmlCompute
import time

try:
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    credential = InteractiveBrowserCredential()

try:
    workspace_ml_client = MLClient.from_config(credential=credential)
except:
    workspace_ml_client = MLClient(
        credential,
        subscription_id="ed2cab61-14cc-4fb3-ac23-d72609214cfd",
        resource_group_name="training_rg",
        workspace_name="swatig_ws",
    )

# If you already have a gpu cluster, mention it here. Else will create a new one with the name 'gpu-cluster-big'
compute_cluster = "gpu-cluster-big"
try:
    compute = workspace_ml_client.compute.get(compute_cluster)
except Exception as ex:
    compute = AmlCompute(
        name=compute_cluster,
        size="Standard_NC24rs_v3",
        max_instances=2,  # For multi node training set this to an integer value more than 1
    )
    workspace_ml_client.compute.begin_create_or_update(compute).wait()

### Connect to `azureml` system registry & import the model

In order to access the preregistered foundation models hosted on the model catalog, you need to connect to `azureml` registry. Run the next cell to connect to the system registry and import the `bert-base-uncased` model.

In [15]:
# the models, fine tuning pipelines and environments are available in the AzureML system registry, "azureml-preview"
registry_ml_client = MLClient(credential, registry_name="azureml")

model_name = "bert-base-uncased"
model_version = "3"
foundation_model = registry_ml_client.models.get(model_name, model_version)
print(
    "\n\nUsing model name: {0}, version: {1}, id: {2} for fine tuning".format(
        foundation_model.name, foundation_model.version, foundation_model.id
    )
)



Using model name: bert-base-uncased, version: 3, id: azureml://registries/azureml/models/bert-base-uncased/versions/3 for fine tuning


### Set an optional experiment name
This step is optional but useful if you want to find this fine tuning job easily.

In [16]:
experiment_name = "text-classification-emotion-detection"

### Check or create a compute cluster  @@OR USE SERVERLESS???@@

For fine tuning tasks, you need a GPU compute cluster for the best results. The duration of the fine tuning depends on the capacity of the GPU SKU you choose. That is because a single GPU node can have multiple GPU cards. 

For example, in one node of `Standard_ND40rs_v2` there are eight NVIDIA GPUs. Meanwhile in `Standard_NC12s_v2` there are two NVIDIA V100 GPUs. When all GPUs in the node get utilized (by configuring the parameter in `gpus_per_node`), you get the most efficient fine tune run. You can read more about Azure's [GPU optimized VM offerings](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-gpu) and the recommended compute SKUs ([ncv3-series](https://learn.microsoft.com/en-us/azure/virtual-machines/ncv3-series), [ndv2-series](https://learn.microsoft.com/en-us/azure/virtual-machines/ndv2-series)).

In this tutorial, you'll use `Standard_NC6s_v3` which takes about 15-20 minutes to complete the fine tuning run.

In [17]:


# This is the number of GPUs in a single node of the selected 'vm_size' compute.
# Setting this to less than the number of GPUs will result in underutilized GPUs, taking longer to train.
# Setting this to more than the number of GPUs will result in an error.
gpu_count_found = False
workspace_compute_sku_list = workspace_ml_client.compute.list_sizes()
available_sku_sizes = []
for compute_sku in workspace_compute_sku_list:
    available_sku_sizes.append(compute_sku.name)
    if compute_sku.name.lower() == compute.size.lower():
        gpus_per_node = compute_sku.gpus
        gpu_count_found = True
# if gpu_count_found not found, then print an error
if gpu_count_found:
    print(f"Number of GPU's in copute {compute.size}: {gpus_per_node}")
else:
    raise ValueError(
        f"Number of GPU's in copute {compute.size} not found. Available skus are: {available_sku_sizes}."
        f"This should not happen. Please check the selected compute cluster: {compute_cluster} and try again."
    )
# CPU based finetune works only for single-node single-process
if gpus_per_node == 0:
    print(
        "WARNING! Selected compute doesn't have GPU. CPU based finetune is experimental and works on a single process in a single node"
    )
    gpus_per_node = 1

# genrating a unique timestamp that can be used for names and versions that need to be unique
timestamp = str(int(time.time()))

Number of GPU's in copute STANDARD_ND40RS_V2: 8


## Prepare the dataset for fine-tuning the model

There are two options to prepare the dataset for fine tuning. The first option is to choose the fine tune option on the model catalog where you found ` bert-base-uncased` model earlier. The second option is to prepare a dataset that matches your use case for fine tuning. This tutorial focuses on the second option.  

You're going to use the [emotion](https://huggingface.co/datasets/dair-ai/emotion) dataset. You can find a copy of this dataset in the emotion-dataset folder that came with this notebook. 

### Start by downloading the dataset


In [18]:
# download the dataset using the helper script. This needs datasets library: https://pypi.org/project/datasets/
import os

exit_status = os.system("python ./download-dataset.py --download_dir emotion-dataset")
if exit_status != 0:
    raise Exception("Error downloading dataset")

Downloading builder script: 100%|██████████| 3.97k/3.97k [00:00<00:00, 3.32MB/s]
Downloading metadata: 100%|██████████| 3.28k/3.28k [00:00<00:00, 2.69MB/s]
Downloading readme: 100%|██████████| 8.78k/8.78k [00:00<00:00, 5.88MB/s]
No config specified, defaulting to: emotion/split
No config specified, defaulting to: emotion/split


Downloading and preparing dataset emotion/split to /root/.cache/huggingface/datasets/dair-ai___emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd...


Downloading data files:   0%|          | 0/3 [00:00<?, ?it/s]
Downloading data: 100%|██████████| 592k/592k [00:00<00:00, 9.54MB/s]
Downloading data files:  33%|███▎      | 1/3 [00:00<00:01,  1.86it/s]
Downloading data: 100%|██████████| 74.0k/74.0k [00:00<00:00, 5.94MB/s]
Downloading data files:  67%|██████▋   | 2/3 [00:00<00:00,  2.09it/s]
Downloading data: 100%|██████████| 74.9k/74.9k [00:00<00:00, 6.37MB/s]
Downloading data files: 100%|██████████| 3/3 [00:01<00:00,  2.15it/s]
Extracting data files: 100%|██████████| 3/3 [00:00<00:00, 80.49it/s]
Creating json from Arrow format: 100%|██████████| 16/16 [00:00<00:00, 244.38ba/s]      


Dataset emotion downloaded and prepared to /root/.cache/huggingface/datasets/dair-ai___emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd. Subsequent calls will reuse this data.


No config specified, defaulting to: emotion/split
Found cached dataset emotion (/root/.cache/huggingface/datasets/dair-ai___emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd)
Creating json from Arrow format: 100%|██████████| 2/2 [00:00<00:00, 279.34ba/s]
No config specified, defaulting to: emotion/split
Found cached dataset emotion (/root/.cache/huggingface/datasets/dair-ai___emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd)
Creating json from Arrow format: 100%|██████████| 2/2 [00:00<00:00, 358.86ba/s]
No config specified, defaulting to: emotion/split
No config specified, defaulting to: emotion/split
Found cached dataset emotion (/root/.cache/huggingface/datasets/dair-ai___emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd)


### Visualize some data rows
It's important to understand the data and its features. Let's start by taking a look.

In [19]:
# load the ./emotion-dataset/train.jsonl file into a pandas dataframe and show the first 5 rows
import pandas as pd

pd.set_option(
    "display.max_colwidth", 0
)  # set the max column width to 0 to display the full text
df = pd.read_json("./emotion-dataset/train.jsonl", lines=True)
df.head()

Unnamed: 0,text,label
0,i didnt feel humiliated,0
1,i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake,0
2,im grabbing a minute to post i feel greedy wrong,3
3,i am ever feeling nostalgic about the fireplace i will know that it is still on the property,2
4,i am feeling grouchy,3


### Replace numerical categories in data with the actual string labels

This data set uses numerical categories. For example, 0 refers to `sadness`. To get string labels such as `anger`, `joy`, etc., replace the categories. Run the next cell to get the string labels.

You can see the detailed mapping in the [./emotion-dataset/label.json](./emotion-dataset/label.json). If you skip this step, the model returns numerical categories such as 0, 1, 2, etc. and you have to map them to what the category represents yourself.

In [20]:
# load the id2label json element of the ./emotion-dataset/label.json file into pandas table with keys as 'label' column of int64 type and values as 'label_string' column as string type
import json

with open("./emotion-dataset/label.json") as f:
    id2label = json.load(f)
    id2label = id2label["id2label"]
    label_df = pd.DataFrame.from_dict(
        id2label, orient="index", columns=["label_string"]
    )
    label_df["label"] = label_df.index.astype("int64")
    label_df = label_df[["label", "label_string"]]
label_df.head()

Unnamed: 0,label,label_string
0,0,sadness
1,1,joy
2,2,love
3,3,anger
4,4,fear


In [21]:
# load test.jsonl, train.jsonl and validation.jsonl form the ./emotion-dataset folder into pandas dataframes
test_df = pd.read_json("./emotion-dataset/test.jsonl", lines=True)
train_df = pd.read_json("./emotion-dataset/train.jsonl", lines=True)
validation_df = pd.read_json("./emotion-dataset/validation.jsonl", lines=True)
# join the train, validation and test dataframes with the id2label dataframe to get the label_string column
train_df = train_df.merge(label_df, on="label", how="left")
validation_df = validation_df.merge(label_df, on="label", how="left")
test_df = test_df.merge(label_df, on="label", how="left")
# show the first 5 rows of the train dataframe
train_df.head()

Unnamed: 0,text,label,label_string
0,i didnt feel humiliated,0,sadness
1,i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake,0,sadness
2,im grabbing a minute to post i feel greedy wrong,3,anger
3,i am ever feeling nostalgic about the fireplace i will know that it is still on the property,2,love
4,i am feeling grouchy,3,anger


### Save data
Now the string labels are applied, let's save the dataset.

For the fine tuning tutorial demonstration purposes, you're going to save a smaller dataset containing 10% of the original dataset into `train`, `validation` and `test` files. **Keep in mind that the fine tuned model will have lower accuracy, hence it should not be put to real-world use.**

In [22]:
# save 10% of the rows from the train, validation and test dataframes into files with small_ prefix in the ./emotion-dataset folder
frac = 1
train_df.sample(frac=frac).to_json(
    "./emotion-dataset/small_train.jsonl", orient="records", lines=True
)
validation_df.sample(frac=frac).to_json(
    "./emotion-dataset/small_validation.jsonl", orient="records", lines=True
)
test_df.sample(frac=frac).to_json(
    "./emotion-dataset/small_test.jsonl", orient="records", lines=True
)

## Configure and submit the fine tuning job using the model and data as inputs

To submit a fine tuning job using a foundation model, you're going to build a pipeline. There are two reasons for using a pipeline. 

First, since you're fine tuning an existing foundation model, you may not have access to the training code. Azure Machine Learning can generate the training code, which is hosted in the `azureml` registry, which requires using a pipeline. Second, fine tuning job requires several steps, including tokenization, converting English text to numeric representation, passing tokenized data to fine tune, and evaluation. It would make sense to componentize these discrete steps, building a pipeline.

You're going to create a job that uses the `text-classification` pipeline component. 

This tutorial is fine tuning a model from the `azureml` system registery.  If you instead want to fine tune a model that is available on HuggingFace, but not available in `azureml` system registry, you can either [import](https://github.com/Azure/azureml-examples) the model or use the `huggingface_id` parameter to instruct the components to pull the model directly from [HuggingFace](https://huggingface.co). 

In [23]:
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import CommandComponent, PipelineComponent, Job, Component
from azure.ai.ml import PyTorchDistribution, Input

# fetch the pipeline component
pipeline_component_func = registry_ml_client.components.get(
    name="text_classification_pipeline", label="latest"
)


# define the pipeline job
@pipeline()
def create_pipeline():
    text_classification_pipeline = pipeline_component_func(
        # specify the foundation model available in the azureml system registry id identified in step #3
        mlflow_model_path=foundation_model.id,
        # huggingface_id = 'bert-base-uncased', # if you want to use a huggingface model, uncomment this line and comment the above line
        compute_model_import=compute_cluster,
        compute_preprocess=compute_cluster,
        compute_finetune=compute_cluster,
        compute_model_evaluation=compute_cluster,
        # map the dataset splits to parameters
        train_file_path=Input(
            type="uri_file", path="./emotion-dataset/small_train.jsonl"
        ),
        validation_file_path=Input(
            type="uri_file", path="./emotion-dataset/small_validation.jsonl"
        ),
        test_file_path=Input(
            type="uri_file", path="./emotion-dataset/small_test.jsonl"
        ),
        evaluation_config=Input(
            type="uri_file", path="./text-classification-config.json"
        ),
        # The following parameters map to the dataset fields
        sentence1_key="text",
        label_key="label_string",
        # Training settings
        number_of_gpu_to_use_finetuning=gpus_per_node,  # set to the number of GPUs available in the compute
        num_train_epochs=3,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        learning_rate=2e-5,
        metric_for_best_model="f1_macro",
    )
    return {
        # map the output of the fine tuning job to the output of pipeline job so that we can easily register the fine tuned model
        # registering the model is required to deploy the model to an online or batch endpoint
        "trained_model": text_classification_pipeline.outputs.mlflow_model_folder
    }


pipeline_object = create_pipeline()

# don't use cached results from previous jobs
pipeline_object.settings.force_rerun = True

Now the pipeline job is configured, submit the job.  @@HOW LONG DOES THIS TAKE?@@ DO WE NEED TO WAIT FOR IT TO FINISH?@@

In [24]:
# submit the pipeline job
pipeline_job = workspace_ml_client.jobs.create_or_update(
    pipeline_object, experiment_name=experiment_name
)
# wait for the pipeline job to complete
workspace_ml_client.jobs.stream(pipeline_job.name)

Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
[32mUploading small_train.jsonl[32m (< 1 MB): 100%|██████████| 2.27M/2.27M [00:02<00:00, 983kB/s]
[39m

[32mUploading small_validation.jsonl[32m (< 1 MB): 100%|██████████| 280k/280k [00:00<00:00, 636kB/s]
[39m

[32mUploading small_test.jsonl[32m (< 1 MB): 100%|██████████| 283k/283k [00:00<00:00, 707kB/s]
[39m

[32mUploading text-classification-config.json[32m (< 1 MB): 100%|██████████| 768/768 [00:00<00:00, 7.97kB/s]
[39m



RunId: clever_roti_y385lbt48h
Web View: https://ml.azure.com/runs/clever_roti_y385lbt48h?wsid=/subscriptions/ed2cab61-14cc-4fb3-ac23-d72609214cfd/resourcegroups/training_rg/workspaces/swatig_ws

Streaming logs/azureml/executionlogs.txt

[2023-06-08 20:42:11Z] Submitting 1 runs, first five are: 3fd3b58b:7520ea3e-4b28-408a-b0a1-68ba7b4bf586
[2023-06-08 21:08:17Z] Completing processing run id 7520ea3e-4b28-408a-b0a1-68ba7b4bf586.

Execution Summary
RunId: clever_roti_y385lbt48h
Web View: https://ml.azure.com/runs/clever_roti_y385lbt48h?wsid=/subscriptions/ed2cab61-14cc-4fb3-ac23-d72609214cfd/resourcegroups/training_rg/workspaces/swatig_ws



## Review training and evaluation metrics

Now the pipeline job is submitted, you can view the job in Azure Machine Learning studio to analyze logs, metrics, and outputs of jobs. This way, you can create custom charts and compare metrics across different fine tuning jobs. See [View jobs/runs information in the studio](https://learn.microsoft.com/azure/machine-learning/how-to-log-view-metrics?tabs=interactive#view-jobsruns-information-in-the-studio) to learn more about job metrics.

You may also want to programmatically log the same information so that it can be used by other services. In that case, use the following MLflow code, which is the recommended client for logging and querying metrics.

In [25]:
import mlflow, json

mlflow_tracking_uri = workspace_ml_client.workspaces.get(
    workspace_ml_client.workspace_name
).mlflow_tracking_uri
mlflow.set_tracking_uri(mlflow_tracking_uri)
# concat 'tags.mlflow.rootRunId=' and pipeline_job.name in single quotes as filter variable
filter = "tags.mlflow.rootRunId='" + pipeline_job.name + "'"
runs = mlflow.search_runs(
    experiment_names=[experiment_name], filter_string=filter, output_format="list"
)
training_run = None
evaluation_run = None
# get the training and evaluation runs.
# using a hacky way till 'Bug 2320997: not able to show eval metrics in FT notebooks - mlflow client now showing display names' is fixed
for run in runs:
    # check if run.data.metrics.epoch exists
    if "epoch" in run.data.metrics:
        training_run = run
    # else, check if run.data.metrics.accuracy exists
    elif "accuracy" in run.data.metrics:
        evaluation_run = run

In [26]:
if training_run:
    print("Training metrics:\n\n")
    print(json.dumps(training_run.data.metrics, indent=2))
else:
    print("No Training job found")

Training metrics:


{
  "loss": 0.1013,
  "learning_rate": 0.0,
  "epoch": 3.0,
  "eval_loss": 0.1942712813615799,
  "eval_accuracy": 0.9405,
  "eval_f1_macro": 0.9139922655171362,
  "eval_mcc": 0.9217377761763651,
  "eval_precision_macro": 0.9231576560328755,
  "eval_recall_macro": 0.906752103717829,
  "eval_runtime": 6.4123,
  "eval_samples_per_second": 311.899,
  "eval_steps_per_second": 38.987,
  "train_runtime": 456.156,
  "train_samples_per_second": 105.227,
  "train_steps_per_second": 13.153,
  "total_flos": 1.2629784051843072e+16,
  "train_loss": 0.2425000457763672
}


In [27]:
if evaluation_run:
    print("Evaluation metrics:\n\n")
    print(json.dumps(evaluation_run.data.metrics, indent=2))
else:
    print("No Evaluation job found")

Evaluation metrics:


{
  "average_precision_score_macro": 0.9616133732880718,
  "AUC_macro": 0.993413475301303,
  "recall_score_macro": 0.8795230156675254,
  "average_precision_score_binary": NaN,
  "average_precision_score_micro": 0.9868280556869524,
  "AUC_binary": NaN,
  "recall_score_micro": 0.934,
  "AUC_micro": 0.9961019499999999,
  "norm_macro_recall": 0.8554276188010306,
  "average_precision_score_weighted": 0.983899560696251,
  "weighted_accuracy": 0.9523587534732753,
  "precision_score_micro": 0.934,
  "f1_score_binary": NaN,
  "precision_score_macro": 0.9040907619133997,
  "f1_score_micro": 0.934,
  "precision_score_weighted": 0.9336396752210118,
  "f1_score_weighted": 0.9332863578432629,
  "recall_score_binary": NaN,
  "matthews_correlation": 0.9126364916099332,
  "log_loss": 0.22058146509919424,
  "accuracy": 0.934,
  "precision_score_binary": NaN,
  "balanced_accuracy": 0.8795230156675254,
  "AUC_weighted": 0.9959533664256791,
  "f1_score_macro": 0.8893683288869473,
  "r

## Register the fine tuned model with the workspace

Register the model from the output of the fine tuning job. There are several benefits to register a fine tuned model to the Azure Machine Learning platform.
 
- **Versioning & Traceability**: Tracks lineage between the fine tuned model and the fine tuning job. The fine tuning job, further, tracks lineage to the foundation model, data and training code.

- **Reusability**: Once a model is registered, it can be reused across different experiments, pipelines, and deployments. This eliminates the need to recreate the model each time and saves time and effort.

- **Collaboration**: Registered models can be easily shared with other team members, making it easier to collaborate on machine learning projects. This enables team members to work together on the same model and share their insights and feedback. 

- **Deployment**: Registered models can be easily deployed to production environments, making it easier to integrate machine learning models into business applications. Azure Machine Learning provides several deployment options, including Azure Kubernetes Service, Azure Container Instances, and Azure Functions. 

- **Monitoring**: Registered models can be monitored and evaluated over time to ensure that they continue to perform well in production environments. This enables you to detect and address issues early on and maintain the performance of your machine learning models.

Use the following code to register the fine tuned model. Once registered, you can find the model under the Models tab of Azure Machine Learning studio.

In [28]:
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# check if the `trained_model` output is available
print("pipeline job outputs: ", workspace_ml_client.jobs.get(pipeline_job.name).outputs)

# fetch the model from pipeline job output - not working, hence fetching from fine tune child job
model_path_from_job = "azureml://jobs/{0}/outputs/{1}".format(
    pipeline_job.name, "trained_model"
)

finetuned_model_name = model_name + "-emotion-detection"
finetuned_model_name = finetuned_model_name.replace("/", "-")
print("path to register model: ", model_path_from_job)
prepare_to_register_model = Model(
    path=model_path_from_job,
    type=AssetTypes.MLFLOW_MODEL,
    name=finetuned_model_name,
    version=timestamp,  # use timestamp as version to avoid version conflict
    description=model_name + " fine tuned model for emotion detection",
)
print("prepare to register model: \n", prepare_to_register_model)
# register the model from pipeline job output
registered_model = workspace_ml_client.models.create_or_update(
    prepare_to_register_model
)
print("registered model: \n", registered_model)

pipeline job outputs:  {'trained_model': <azure.ai.ml.entities._job.pipeline._io.base.PipelineOutput object at 0x7f9b152547f0>}
path to register model:  azureml://jobs/clever_roti_y385lbt48h/outputs/trained_model
prepare to register model: 
 description: bert-base-uncased fine tuned model for emotion detection
name: bert-base-uncased-emotion-detection
path: azureml://jobs/clever_roti_y385lbt48h/outputs/trained_model
properties: {}
tags: {}
type: mlflow_model
version: '1686256902'

registered model: 
 creation_context:
  created_at: '2023-06-08T21:09:14.899595+00:00'
  created_by: Manoj Bableshwar
  created_by_type: User
  last_modified_at: '2023-06-08T21:09:14.899595+00:00'
  last_modified_by: Manoj Bableshwar
  last_modified_by_type: User
description: bert-base-uncased fine tuned model for emotion detection
flavors:
  hftransformersv2:
    code: ''
    hf_config_class: BertConfig
    hf_pretrained_class: BertForSequenceClassification
    hf_tokenizer_class: BertTokenizerFast
    huggi

## Deploy the fine tuned model to an online endpoint
Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model. In this tutorial, you're going to use Managed Online Endpoint API, which handles many backend configurations for you.

Let's start by creating an online endpoint.

In [29]:
import time, sys
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name

online_endpoint_name = "emotion-" + timestamp
# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for "
    + registered_model.name
    + ", fine tuned model for emotion detection",
    auth_mode="key",
)
workspace_ml_client.begin_create_or_update(endpoint).wait()

Deploying a model requires a compute resource. In this tutorial, you're going to use `Standard_DS3_v2` which takes about @@HOW LONG@@ minutes to complete the deployment. 

You can also read about [the list of other SKUs supported for deployment](https://learn.microsoft.com/en-us/azure/machine-learning/reference-managed-online-endpoints-vm-sku-list).

In [34]:
# create a deployment
demo_deployment = ManagedOnlineDeployment(
    name="demo",
    endpoint_name=online_endpoint_name,
    model=registered_model.id,
    instance_type="Standard_DS2_v2",
    instance_count=1,
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
endpoint.traffic = {"demo": 100}
workspace_ml_client.begin_create_or_update(endpoint).result()

Instance type Standard_DS2_v2 may be too small for compute resources. Minimum recommended compute SKU is Standard_DS3_v2 for general purpose endpoints. Learn more about SKUs here: https://learn.microsoft.com/en-us/azure/machine-learning/referencemanaged-online-endpoints-vm-sku-list
Check: endpoint emotion-1686256902 exists
data_collector is not a known attribute of class <class 'azure.ai.ml._restclient.v2022_02_01_preview.models._models_py3.ManagedOnlineDeployment'> and will be ignored


.........................................................................................................................................

HttpResponseError: (None) ResourceNotReady: User container has crashed or terminated: Liveness probe failed: HTTP probe failed with statuscode: 502. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-resourcenotready
Code: None
Message: ResourceNotReady: User container has crashed or terminated: Liveness probe failed: HTTP probe failed with statuscode: 502. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-resourcenotready
Exception Details:	(None) ResourceNotReady: User container has crashed or terminated: Liveness probe failed: HTTP probe failed with statuscode: 502. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-resourcenotready
	The build log is available in the workspace blob store "swatigws8304006853" under the path "/azureml/ImageLogs/84cb0c35-04d6-4959-8bd1-233100f976c6/build.log"
	Code: None
	Message: ResourceNotReady: User container has crashed or terminated: Liveness probe failed: HTTP probe failed with statuscode: 502. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-resourcenotready
	The build log is available in the workspace blob store "swatigws8304006853" under the path "/azureml/ImageLogs/84cb0c35-04d6-4959-8bd1-233100f976c6/build.log"

## Test the endpoint with sample data

Now the fine tuned model is deployed, we need to test if the model is working properly. You'll first fetch some sample data from the test dataset, and save as a JSON file.

In [None]:
# read ./emotion-dataset/small_test.jsonl into a pandas dataframe
test_df = pd.read_json("./emotion-dataset/small_test.jsonl", lines=True)
# take 10 random samples
test_df = test_df.sample(n=10)
# rebuild index
test_df.reset_index(drop=True, inplace=True)
# rename the label_string column to ground_truth_label
test_df = test_df.rename(columns={"label_string": "ground_truth_label"})
test_df.head(10)

In [None]:
# create a json object with the key as "inputs" and value as a list of values from the text column of the test dataframe
test_df_copy = test_df[["text"]]
test_json = {"input_data": test_df_copy.to_dict("split")}
# save the json object to a file named sample_score.json in the ./emotion-dataset folder
with open("./emotion-dataset/sample_score.json", "w") as f:
    json.dump(test_json, f)

Now we have a sample data, let's test the online endpoint.

In [None]:
# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method
response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="demo",
    request_file="./emotion-dataset/sample_score.json",
)
print("raw response: \n", response, "\n")
# convert the response to a pandas dataframe and rename the label column as scored_label
response_df = pd.read_json(response)
response_df = response_df.rename(columns={0: "scored_label"})
response_df.head(10)

In [None]:
# merge the test dataframe and the response dataframe on the index
merged_df = pd.merge(test_df, response_df, left_index=True, right_index=True)
merged_df.head(10)

## Delete the online endpoint
Congratulation! You have completed the foundational model fine tuning tutorial.

Don't forget to delete the online endpoint, else you'll leave the billing meter running for the compute used by the endpoint.

In [None]:
workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()