## Fine-tuning the T5 model with Azure ML, using Azure Container for PyTorch and state-of-the-art acceleration technologies, to generate news-headline-style summaries

This sample shows how to fine-tune the T5 model to generate a summary of a news article and then deploy it to an online endpoint for real-time inference. The model is trained on a tiny sample of the dataset for a small number of epochs to illustrate the fine-tuning approach.

### Requirements/Prerequisites
- An Azure account with an active subscription [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- Azure Machine Learning workspace [Configure workspace](../../../configuration.ipynb) 
- Python Environment
- Install Azure ML Python SDK Version 2

### Learning Objectives
- Fine-tune the T5 small model for the summarization task with Azure ML
- Use the state-of-the-art ACPT environment with accelerators
- Increase training efficiency using DeepSpeed and ONNX Runtime
- Evaluate the model
- Register the model with Azure ML
- Deploy and run inference using managed online endpoints (MIR) and ONNX Runtime


### 1. Set up prerequisites
* Install dependencies
* Connect to the Azure ML workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.
* Connect to the `azureml` system registry
* Set an optional experiment name
* Check or create compute. A single GPU node can have multiple GPU cards. For example, one `Standard_ND40rs_v2` node has 8 NVIDIA V100 GPUs, while one `Standard_NC12s_v3` node has 2 NVIDIA V100 GPUs. Refer to the [docs](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-gpu) for this information. The number of GPU cards per node is set through `process_count_per_instance` in the `distribution` setting of the training job below. Setting this value correctly ensures that all GPUs in the node are used. The recommended GPU compute SKUs are the [NCv3-series](https://learn.microsoft.com/en-us/azure/virtual-machines/ncv3-series) and the [NDv2-series](https://learn.microsoft.com/en-us/azure/virtual-machines/ndv2-series).
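
As a rough sketch of how the GPU count per node maps to the `process_count_per_instance` value used in the training job later in this notebook, the snippet below derives the distribution setting and the total number of training processes. The helper `distribution_for` and the `GPUS_PER_NODE` table are hypothetical names introduced for illustration; only the two SKUs mentioned above are covered.

```python
# GPU counts per node for the two SKUs named in this notebook.
GPUS_PER_NODE = {
    "Standard_ND40rs_v2": 8,
    "Standard_NC12s_v3": 2,
}


def distribution_for(vm_size: str, instance_count: int = 1) -> dict:
    """Return a PyTorch distribution setting and the resulting world size.

    Hypothetical helper: one training process is started per GPU.
    """
    gpus = GPUS_PER_NODE[vm_size]
    return {
        "distribution": {"type": "PyTorch", "process_count_per_instance": gpus},
        "instance_count": instance_count,
        "world_size": gpus * instance_count,
    }


settings = distribution_for("Standard_ND40rs_v2", instance_count=2)
print(settings["world_size"])  # 16
```

With a different SKU, look up its GPU count in the linked docs and add it to the table before using the helper.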

##### 1.1 Install dependencies
* Run the cell below. This step is required when running in a new environment.

In [None]:
%pip install azure-ai-ml
%pip install azure-identity
%pip install datasets==2.9.0
%pip install mlflow
%pip install azureml-mlflow

##### 1.2 Connect to the workspace

Connect to your Azure Machine Learning workspace. The [Azure Machine Learning workspace](concept-workspace.md) is the top-level resource for the service. It provides you with a centralized place to work with all the artifacts you create when you use Azure Machine Learning.

We're using `DefaultAzureCredential` to get access to the workspace. This credential should be capable of handling most Azure SDK authentication scenarios.

If `DefaultAzureCredential` doesn't work for you, see [`azure-identity reference documentation`](/python/api/azure-identity/azure.identity) or [`Set up authentication`](how-to-setup-authentication.md?tabs=sdk) for more available credentials.

In [None]:
from azure.ai.ml import MLClient, Input
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
    ClientSecretCredential,
)
from azure.ai.ml.entities import AmlCompute
import time

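# Prefer DefaultAzureCredential; fall back to an interactive browser login
try:
    credential = DefaultAzureCredential()
    # Check that the credential can actually obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    credential = InteractiveBrowserCredential()

# Use a local config.json if one is found; otherwise use the explicit workspace details
try:
    ml_client = MLClient.from_config(credential=credential)
except Exception:
    ml_client = MLClient(
        credential,
        subscription_id="<SUBSCRIPTION_ID>",
        resource_group_name="<RESOURCE_GROUP>",
        workspace_name="<WORKSPACE_NAME>",
    )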



##### 1.3 Create a compute resource to run the job

Azure Machine Learning needs a compute resource to run a job. This resource can be a single-node or multi-node cluster running Linux or Windows.
The following example script reuses the compute cluster if it already exists; otherwise it provisions a `Standard_ND40rs_v2` Azure Machine Learning compute cluster.

In [None]:
experiment_name = "summarization-news-summary"

# If you already have a GPU cluster, put its name here. Otherwise a new cluster with this name is created.
compute_cluster = "T5trainingCompute"
try:
    compute = ml_client.compute.get(compute_cluster)
except Exception as ex:
    compute = AmlCompute(
        name=compute_cluster,
        size="Standard_ND40rs_v2",
        max_instances=2,  # For multi node training set this to an integer value more than 1
    )
    ml_client.compute.begin_create_or_update(compute).wait()


### 2. Pick the dataset for fine-tuning the model

> The [CNN DailyMail](https://huggingface.co/datasets/cnn_dailymail) dataset is larger than 1GB when uncompressed. The [download-dataset.py](./news-summary-dataset/download-dataset.py) script supports downloading a smaller fraction of the dataset.

We want this sample to run quickly, so only a fraction of the dataset is used for the fine-tuning job. This means the fine-tuned model will have lower accuracy, so it should not be put to real-world use.
* Visualize some data rows. 

In [None]:
from datasets import load_dataset

dataset_name = "cnn_dailymail"
dataset_config_name = "3.0.0"

raw_datasets = load_dataset(dataset_name, dataset_config_name)
# Show the column names, then preview the first rows of the training split
print(raw_datasets["train"].column_names)
raw_datasets["train"].to_pandas().head(10)
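
To illustrate how only a fraction of the dataset could be loaded, here is a minimal sketch that relies on the split-slicing syntax of the Hugging Face `datasets` library (for example `train[:1%]`). The helper names `train_fraction_split` and `load_fraction` are hypothetical, and the sample's own `download-dataset.py` may select its subset differently.

```python
def train_fraction_split(percent: int, split: str = "train") -> str:
    """Build a Hugging Face split expression such as 'train[:1%]'."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return f"{split}[:{percent}%]"


def load_fraction(percent: int):
    """Load only the first `percent` percent of the CNN/DailyMail training split."""
    # Imported lazily because the call needs the `datasets` package and network access.
    from datasets import load_dataset

    return load_dataset("cnn_dailymail", "3.0.0", split=train_fraction_split(percent))


print(train_fraction_split(1))  # train[:1%]
```

Calling `load_fraction(1)` would then return a single `Dataset` object rather than the dictionary of splits returned when no `split` argument is given.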

### 3. Create a custom environment using Azure Container for PyTorch

> We create a custom environment from a Docker build context that builds on the existing ACPT curated environment, which bundles state-of-the-art training technologies.

In [None]:
Env_Name = "ACPT-T5"
from azure.ai.ml.entities import Environment, BuildContext
env_docker_context = Environment(
    build=BuildContext(path="environment/context"),
    name=Env_Name,
    description="Environment created from a Docker context.",
)
ml_client.environments.create_or_update(env_docker_context)

### 4. Finetune T5 small model for Summarization task

Models that support `translation` tasks are good candidates to fine-tune for `summarization`. In this example, we use the `t5-small` model for the summarization task.

In [None]:
from azure.ai.ml import command, Input, Output
from azure.ai.ml.entities import Data, JobService
from azure.ai.ml.constants import AssetTypes

job = command(
    code=".",
    command="python train_summarization_deepspeed_optum.py --model_name_or_path t5-small --dataset_name cnn_dailymail --dataset_config '3.0.0' \
        --do_train \
        --num_train_epochs=1 \
        --per_device_train_batch_size=16 \
        --per_device_eval_batch_size=16  \
        --output_dir outputs \
        --overwrite_output_dir \
        --fp16 \
        --deepspeed ds_config.json \
        --max_train_samples=10 \
        --max_eval_samples=10 \
        --optim adamw_ort_fused",
    compute=compute_cluster,  # the cluster checked or created in step 1.3
    environment="acpt-t5@latest",  # the custom ACPT environment created above
    instance_count=1,
    distribution={
        "type": "PyTorch",
        "process_count_per_instance": 8,  # one process per GPU on Standard_ND40rs_v2
    },
)

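job = ml_client.jobs.create_or_update(job)
print(job.studio_url)
# Block until training finishes; the registration step reads this job's outputs
ml_client.jobs.stream(job.name)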
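
The training command above passes `--deepspeed ds_config.json`, but the contents of that file are not shown in this notebook. As a minimal sketch, assuming the Hugging Face Trainer integration with DeepSpeed, a configuration of the following shape could be used. The values are illustrative assumptions, not the sample's actual settings, and the file is written under a different name so that it does not overwrite the real `ds_config.json`.

```python
import json

# Illustrative DeepSpeed settings (assumed, not taken from the sample).
# "auto" lets the Hugging Face Trainer fill in values from its own arguments.
ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {"stage": 1},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
}

# Written to an example file name to avoid clobbering the real config.
with open("ds_config.example.json", "w", encoding="utf-8") as f:
    json.dump(ds_config, f, indent=2)

with open("ds_config.example.json", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded["zero_optimization"]["stage"])  # 1
```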

### 5. Register the fine-tuned model with the workspace

We register the model from the output of the fine-tuning job. This tracks lineage between the fine-tuned model and the fine-tuning job. The fine-tuning job, in turn, tracks lineage to the foundation model, data, and training code. The code cell after these steps registers the ONNX and MLflow outputs directly from the job.

To register a model from local files through the studio instead, follow these steps:

1. Go to the [Azure Machine Learning studio](https://ml.azure.com).
1. In the left navigation bar, select the **Models** page.
1. Select **Register**, and then choose **From local files**.
1. Select __Unspecified type__ for the __Model type__.
1. Select __Browse__, choose __Browse folder__, and point to the folder containing the model and weights.
1. Select __Next__ after the folder upload is completed.
1. Enter a friendly __Name__ for the model, such as `model-1`. The cells below instead look up the models registered by the code, `T5Model_onnx` and `T5Model_mlflow`.
1. Select __Next__, and then __Register__ to complete registration.

In [None]:
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

timestamp = str(int(time.time()))
model_name = "T5Model"

#Onnx model registration
modelpath = "azureml://jobs/{jobname}/outputs/artifacts/outputs/onnx".format(jobname = job.name)
cloud_model = Model(
    path=modelpath,
    name=model_name+"_onnx",
    type=AssetTypes.CUSTOM_MODEL,
    description="Model created from cloud path.",
    version=timestamp,
)
ml_client.models.create_or_update(cloud_model)

#MLFlow model registration
mlflow_modelpath = "azureml://jobs/{jobname}/outputs/artifacts/outputs/mlflow".format(jobname = job.name)
cloud_model = Model(
    path=mlflow_modelpath,
    name=model_name+"_mlflow",
    type=AssetTypes.MLFLOW_MODEL,
    description="Model created from cloud path.",
    version=timestamp,
)
ml_client.models.create_or_update(cloud_model)
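
The two registrations above differ only in the subfolder of the job output and in the model type. As a small sketch of the path convention used in this cell, the hypothetical helper `job_output_uri` below builds the same `azureml://` URI from a job name; the job name in the example is a made-up placeholder.

```python
def job_output_uri(job_name: str, subfolder: str) -> str:
    """Build the URI of a subfolder under the job's `outputs` directory.

    Hypothetical helper mirroring the path format used in the cell above.
    """
    return f"azureml://jobs/{job_name}/outputs/artifacts/outputs/{subfolder}"


# "example_job_123" is a placeholder, not a real job name.
print(job_output_uri("example_job_123", "onnx"))
# azureml://jobs/example_job_123/outputs/artifacts/outputs/onnx
```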

### 6. Model Evaluation

In [None]:
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

test_data = "small_test-inference.jsonl"

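# fetch the pipeline component from the `azureml` system registry
registry = "azureml"
subscription_id = ml_client.subscription_id
resource_group = ml_client.resource_group_name

# client scoped to the registry (reuses the credential from step 1.2)
registry_ml_client = MLClient(credential, registry_name=registry)

pipeline_component_func = registry_ml_client.components.get(
    name="model_evaluation_pipeline", label="latest"
)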

model = ml_client.models.get(name=model_name+"_mlflow", version = timestamp)

# define the pipeline job
@pipeline()
def evaluation_pipeline(mlflow_model):
    evaluation_job = pipeline_component_func(
        # specify the foundation model available in the azureml system registry or a model from the workspace
        # mlflow_model = Input(type=AssetTypes.MLFLOW_MODEL, path=f"{mlflow_model_path}"),
        mlflow_model=mlflow_model,
        # test data
        test_data=Input(type=AssetTypes.URI_FILE, path=test_data),
        # The following parameters map to the dataset fields
        input_column_names="article",
        label_column_name="highlights",
        # Evaluation settings
        task="text-summarization",
        # config file containing the details of evaluation metrics to calculate
        # evaluation_config=Input(type=AssetTypes.URI_FILE, path="eval-config.json"),
        # config cluster/device job is running on
        # set device to GPU/CPU on basis if GPU count was found
        device="gpu",
    )
    return {"evaluation_result": evaluation_job.outputs.evaluation_result}
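
The pipeline above reads `small_test-inference.jsonl` and maps the `article` column to the input and the `highlights` column to the label. As a minimal sketch of that layout, assuming one JSON object per line, the snippet below writes and reads back a tiny file. The records are made-up placeholders, and the file name is changed so the real test file is not overwritten.

```python
import json
from pathlib import Path

# Placeholder records; real rows would come from the CNN/DailyMail test split.
records = [
    {"article": "Placeholder article text one.", "highlights": "Placeholder summary one."},
    {"article": "Placeholder article text two.", "highlights": "Placeholder summary two."},
]

path = Path("example_test-inference.jsonl")
with path.open("w", encoding="utf-8") as f:
    for record in records:
        # JSON Lines: one JSON object per line
        f.write(json.dumps(record) + "\n")

loaded = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
print(len(loaded), sorted(loaded[0]))  # 2 ['article', 'highlights']
```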

In [None]:
# submit the pipeline job for each model that we want to evaluate
# you could consider submitting the pipeline jobs in parallel, provided your cluster has multiple nodes
import time
pipeline_jobs = []


pipeline_object = evaluation_pipeline(
    mlflow_model=Input(type=AssetTypes.MLFLOW_MODEL, path=f"{model.id}"),
)
# don't reuse cached results from previous jobs
pipeline_object.settings.force_rerun = True
pipeline_object.settings.default_compute = compute_cluster
pipeline_object.display_name = f"eval-{model.name}-{timestamp}"
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_object, experiment_name=experiment_name
)
# add model['name'] and pipeline_job.name as key value pairs to a dictionary
pipeline_jobs.append({"model_name": model.name, "job_name": pipeline_job.name})
# wait for the pipeline job to complete
ml_client.jobs.stream(pipeline_job.name)

In [None]:
import mlflow, json
import pandas as pd

mlflow_tracking_uri = ml_client.workspaces.get(
    ml_client.workspace_name
).mlflow_tracking_uri
mlflow.set_tracking_uri(mlflow_tracking_uri)

metrics_df = pd.DataFrame()
for job_info in pipeline_jobs:
    # select the runs whose root run is this pipeline job
    run_filter = "tags.mlflow.rootRunId='" + job_info["job_name"] + "'"
    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        filter_string=run_filter,
        output_format="list",
    )
    # get the compute_metrics runs.
    # using a hacky way till 'Bug 2320997: not able to show eval metrics in FT notebooks - mlflow client now showing display names' is fixed
    for run in runs:
        # keep only the runs that logged ROUGE metrics
        if "rouge1" in run.data.metrics:
            # get the metrics from the mlflow run
            run_metric = run.data.metrics
            # add the model name to the run_metric dictionary
            run_metric["model_name"] = job_info["model_name"]
            # convert the run_metric dictionary to a pandas dataframe
            temp_df = pd.DataFrame(run_metric, index=[0])
            # concat the temp_df to the metrics_df
            metrics_df = pd.concat([metrics_df, temp_df], ignore_index=True)

# move the model_name columns to the first column
cols = metrics_df.columns.tolist()
cols = cols[-1:] + cols[:-1]
metrics_df = metrics_df[cols]
metrics_df.head()
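
The cell above selects runs by the presence of a `rouge1` metric. To give an intuition for what that number measures, here is a simplified sketch of ROUGE-1 F1 as unigram overlap between a reference summary and a generated one. It uses plain whitespace tokenization and is not the implementation used by the evaluation component, so its values may differ from the reported metrics.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))  # 0.833
```

Five of the six words match in each direction here, so precision, recall, and F1 all equal 5/6.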

### 7. Deploy the fine tuned model to an online endpoint
Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model.

In [None]:
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)
# Define a unique endpoint name by appending a timestamp to a fixed prefix
import datetime

endpoint_name = "endpt-" + datetime.datetime.now().strftime("%m%d%H%M%f")

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="this is an endpoint for the T5 summarization model",
    auth_mode="key",
)

env = Environment(
    image="mcr.microsoft.com/azureml/curated/acpt-t5:latest",
)

model = ml_client.models.get(name=model_name+"_onnx", version = timestamp)

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code=".", scoring_script="score_onnx.py"
    ),
    instance_type="Standard_F16s_v2",
    instance_count=1,
)
ml_client.online_endpoints.begin_create_or_update(endpoint).wait()


In [None]:
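ml_client.online_deployments.begin_create_or_update(blue_deployment).wait()
# route all traffic to the blue deployment, then update the endpoint
endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()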

### 8. Invoke the endpoint to score data by using your model

In [None]:
# test the blue deployment with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name="blue",
    request_file="payload.json",
)
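
The request body in `payload.json` must match whatever `score_onnx.py` expects, and that script is not shown in this notebook. The sketch below therefore only illustrates how such a file could be produced; the `inputs` and `article` keys are purely hypothetical, the article text is a placeholder, and the file is written under a different name so the real payload is left untouched.

```python
import json

# Hypothetical request schema; adjust the keys to what score_onnx.py actually reads.
payload = {
    "inputs": {
        "article": ["Placeholder news article text to summarize."],
    }
}

with open("payload.example.json", "w", encoding="utf-8") as f:
    json.dump(payload, f, indent=2)

with open("payload.example.json", encoding="utf-8") as f:
    round_tripped = json.load(f)

print(list(round_tripped["inputs"]))  # ['article']
```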

### 9. Delete the online endpoint
Don't forget to delete the online endpoint; otherwise the compute used by the endpoint keeps incurring charges.

In [None]:
ml_client.online_endpoints.begin_delete(name=endpoint_name).wait()