## Translation - Translate English to Romanian

This sample shows how to use `translation` components from the `azureml` system registry to fine tune a model that translates English to Romanian. We then deploy the model to an online endpoint for real time inference. The model is trained on a tiny sample of the dataset for a small number of epochs, which is enough to illustrate the fine tuning approach.

### Training data
We will use the [wmt16 (ro-en)](https://huggingface.co/datasets/wmt16) dataset. A copy of this dataset is available in the [wmt16-en-ro-dataset](./wmt16-en-ro-dataset/) folder for easy access. 

### Model
Models that can perform the `translation` task are used here. We will use the `t5-small` model in this notebook. If you opened this notebook from a specific model card, remember to replace the model name with that model's name. Optionally, if you need to fine tune a model that is available on HuggingFace but not in the `azureml` system registry, you can either [import](https://github.com/Azure/azureml-examples) the model or use the `huggingface_id` parameter to instruct the components to pull the model directly from HuggingFace.

### Outline
* Setup pre-requisites such as compute.
* Pick a model to fine tune.
* Pick and explore training data.
* Configure the fine tuning job.
* Run the fine tuning job.
* Register the fine tuned model. 
* Deploy the fine tuned model for real time inference.
* Clean up resources.

### 1. Setup pre-requisites
* Install dependencies
* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace  `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.
* Connect to `azureml` system registry
* Set an optional experiment name
* Check or create compute. A single GPU node can have multiple GPU cards. For example, in one node of `Standard_ND40rs_v2` there are 8 NVIDIA V100 GPUs while in `Standard_NC12s_v3`, there are 2 NVIDIA V100 GPUs. Refer to the [docs](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-gpu) for this information. The number of GPU cards per node is set in the param `gpus_per_node` below. Setting this value correctly will ensure utilization of all GPUs in the node. The recommended GPU compute SKUs can be found [here](https://learn.microsoft.com/en-us/azure/virtual-machines/ncv3-series) and [here](https://learn.microsoft.com/en-us/azure/virtual-machines/ndv2-series).
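
As a minimal sketch of how `gpus_per_node` could be derived from the chosen SKU rather than typed by hand, the lookup below uses only the two GPU counts quoted above. The table name and the helper `lookup_gpus_per_node` are hypothetical and not part of the AzureML SDK; other SKUs would need to be added after checking the linked docs.

```python
# GPU counts per node, limited to the two SKUs mentioned in the text above.
GPUS_PER_NODE_BY_SKU = {
    "Standard_ND40rs_v2": 8,
    "Standard_NC12s_v3": 2,
}


def lookup_gpus_per_node(vm_size: str) -> int:
    """Return the GPU count for a known SKU, or raise for an unknown one."""
    try:
        return GPUS_PER_NODE_BY_SKU[vm_size]
    except KeyError:
        raise ValueError(
            f"Unknown SKU {vm_size!r}; look it up in the GPU sizes docs and add it to the table."
        ) from None


print(lookup_gpus_per_node("Standard_ND40rs_v2"))
```

Failing loudly on an unknown SKU is a deliberate choice here: a silently wrong value would either leave GPUs idle or cause the job to error out later.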

In [None]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential, ClientSecretCredential
from azure.ai.ml.entities import AmlCompute
import time

try:
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    credential = InteractiveBrowserCredential()

workspace_ml_client = MLClient(
        credential,
        subscription_id = "<SUBSCRIPTION_ID>",
        resource_group_name = "<RESOURCE_GROUP>",
        workspace_name =  "<WORKSPACE_NAME>"
)

# the models, fine tuning pipelines and environments are available in the AzureML system registry, "azureml-preview"
registry_ml_client = MLClient(credential, registry_name="azureml-preview")

experiment_name = "translation-wmt16-en-ro"

# If you already have a GPU cluster, set its name here. Otherwise a new cluster named 'gpu-cluster-big' is created.
compute_cluster = "gpu-cluster-big"
try:
    workspace_ml_client.compute.get(compute_cluster)
except Exception as ex:
    compute = AmlCompute(
        name = compute_cluster, 
        size= "Standard_ND40rs_v2",
        max_instances= 2 # For multi node training set this to an integer value more than 1
    )
    workspace_ml_client.compute.begin_create_or_update(compute).wait()

# This is the number of GPUs in a single node of the selected 'vm_size' compute. 
# Setting this to less than the number of GPUs will result in underutilized GPUs, taking longer to train.
# Setting this to more than the number of GPUs will result in an error.
gpus_per_node = 8

# generate a unique timestamp that can be used for names and versions that need to be unique
timestamp = str(int(time.time())) 

### 2. Pick a foundation model to fine tune

Models that support `translation` tasks are picked to fine tune. You can browse these models in the Model Catalog in the AzureML Studio, filtering by the `translation` task. In this example, we use the `t5-small` model. If you have opened this notebook for a different model, replace the model name and version accordingly. 

Note the model id property of the model. This will be passed as input to the fine tuning job. This is also available as the `Asset ID` field in model details page in AzureML Studio Model Catalog. 

In [None]:
model_name = "t5-small"
model_version = "4"
foundation_model=registry_ml_client.models.get(model_name, model_version)
print ("\n\nUsing model name: {0}, version: {1}, id: {2} for fine tuning".format(foundation_model.name, foundation_model.version, foundation_model.id))

### 3. Pick the dataset for fine-tuning the model 

A copy of the dataset is available in the [wmt16-en-ro-dataset](./wmt16-en-ro-dataset/) folder. 
* Visualize some data rows. 
* We want this sample to run quickly, so save smaller `train`, `validation` and `test` files containing 20% of the already trimmed rows. This means the fine tuned model will have lower accuracy, hence it should not be put to real-world use. 

> The [download-dataset.py](./wmt16-en-ro-dataset/download-dataset.py) script is used to download the wmt16 (ro-en) dataset and transform it into a format that the fine tune pipeline component can consume. Because the dataset is large, only part of it is included here.

> **Note** : Some language models have different language codes and hence the column names in the dataset should reflect the same.
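
To make the note above concrete, here is a small sketch of renaming the dataset columns so that they match the language codes a given model expects. The helper `rename_language_columns` is hypothetical, and the target codes `en_XX` and `ro_RO` are only an illustrative assumption; check the model card of the model you pick for the codes it actually uses.

```python
import pandas as pd


def rename_language_columns(df: pd.DataFrame, code_map: dict) -> pd.DataFrame:
    """Return a copy of df with language columns renamed according to code_map."""
    missing = [col for col in code_map if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in the dataset: {missing}")
    return df.rename(columns=code_map)


# Toy rows in the same shape as the wmt16 files used in this notebook.
sample = pd.DataFrame({"en": ["Hello"], "ro": ["Salut"]})

# Illustrative mapping only; the codes depend on the model being fine tuned.
renamed = rename_language_columns(sample, {"en": "en_XX", "ro": "ro_RO"})
print(list(renamed.columns))
```

If the columns are renamed this way, the `source_lang` and `target_lang` parameters of the fine tuning job would presumably need to use the same codes.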

In [None]:
import pandas as pd
pd.set_option('display.max_colwidth', 0) # set the max column width to 0 to display the full text
# load the train.jsonl, validation.jsonl and test.jsonl files from the ./wmt16-en-ro-dataset/ folder
train_df = pd.read_json("./wmt16-en-ro-dataset/train.jsonl", lines=True)
validation_df = pd.read_json("./wmt16-en-ro-dataset/validation.jsonl", lines=True)
test_df = pd.read_json("./wmt16-en-ro-dataset/test.jsonl", lines=True)
# show the first 5 rows of the training split
train_df.head(5)

In [None]:
# save 20% of the rows from the dataframes into files with small_ prefix in the ./wmt16-en-ro-dataset folder
train_df.sample(frac=0.2).to_json("./wmt16-en-ro-dataset/small_train.jsonl", orient='records', lines=True)
validation_df.sample(frac=0.2).to_json("./wmt16-en-ro-dataset/small_validation.jsonl", orient='records', lines=True)
test_df.sample(frac=0.2).to_json("./wmt16-en-ro-dataset/small_test.jsonl", orient='records', lines=True)
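
The sampling above draws a different 20% on every run. If repeatable splits are preferred, one option is to pass a fixed `random_state` to `DataFrame.sample`. The sketch below shows this on toy data; the helper `sample_fraction` and the seed value are illustrative assumptions, not part of the original sample.

```python
import pandas as pd


def sample_fraction(df: pd.DataFrame, frac: float = 0.2, seed: int = 42) -> pd.DataFrame:
    """Return a reproducible random subset of df with a fresh index."""
    return df.sample(frac=frac, random_state=seed).reset_index(drop=True)


# Toy data standing in for one of the dataset splits.
toy = pd.DataFrame(
    {
        "en": [f"sentence {i}" for i in range(10)],
        "ro": [f"propozitie {i}" for i in range(10)],
    }
)

first = sample_fraction(toy)
second = sample_fraction(toy)
# Same seed, same rows: the two samples are identical.
print(len(first), first.equals(second))
```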

### 4. Submit the fine tuning job using the model and data as inputs
 
Create the job that uses the `translation` pipeline component. [Learn more]() about all the parameters supported for fine tuning.

In [None]:
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import CommandComponent, PipelineComponent, Job, Component
from azure.ai.ml import PyTorchDistribution, Input

# fetch the pipeline component
pipeline_component_func = registry_ml_client.components.get(name="translation_pipeline", label="latest")

# define the pipeline job
@pipeline()
def create_pipeline():
    finetuning_job = pipeline_component_func( 

        # specify the id of the foundation model from the azureml system registry, obtained in step #2
        mlflow_model_path = foundation_model.id,
        # huggingface_id = 't5-small', # if you want to use a huggingface model, uncomment this line and comment the above line
        
        compute_model_selector = compute_cluster,
        compute_preprocess = compute_cluster,
        compute_finetune = compute_cluster,
        compute_model_evaluation = compute_cluster,
        # map the dataset splits to parameters
        train_file_path = Input(type="uri_file", path="./wmt16-en-ro-dataset/small_train.jsonl"),
        validation_file_path = Input(type="uri_file", path="./wmt16-en-ro-dataset/small_validation.jsonl"),
        test_file_path = Input(type="uri_file", path="./wmt16-en-ro-dataset/small_test.jsonl"),
        # The following parameters map to the dataset fields
        # source_lang parameter maps to the "en" field in the wmt16 dataset
        source_lang = "en",
        # target_lang parameter maps to the "ro" field in the wmt16 dataset
        target_lang = "ro",
        # training settings
        number_of_gpu_to_use_finetuning = gpus_per_node, # set to the number of GPUs available in the compute
        num_train_epochs = 3,
        learning_rate = 2e-5, 
    )
    return {
        # map the output of the fine tuning job to the output of the pipeline job so that we can easily register the fine tuned model
        # registering the model is required to deploy the model to an online or batch endpoint
        "trained_model": finetuning_job.outputs.mlflow_model_folder
    }

pipeline_object = create_pipeline()

# don't use cached results from previous jobs
pipeline_object.settings.force_rerun = True

Submit the job

In [None]:
# submit the pipeline job
pipeline_job = workspace_ml_client.jobs.create_or_update(pipeline_object, experiment_name=experiment_name)
# wait for the pipeline job to complete
workspace_ml_client.jobs.stream(pipeline_job.name)

### 5. Register the fine tuned model with the workspace

We will register the model from the output of the fine tuning job. This will track lineage between the fine tuned model and the fine tuning job. The fine tuning job, further, tracks lineage to the foundation model, data and training code.

In [None]:
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes
# check if the `trained_model` output is available
print ("pipeline job outputs: ", workspace_ml_client.jobs.get(pipeline_job.name).outputs)

# build the path to the `trained_model` output of the pipeline job
model_path_from_job = ("azureml://jobs/{0}/outputs/{1}".format(pipeline_job.name, "trained_model"))

finetuned_model_name = model_name + "-wmt16-en-ro"
print("path to register model: ", model_path_from_job)
prepare_to_register_model = Model(
    path=model_path_from_job,
    type=AssetTypes.MLFLOW_MODEL,
    name=finetuned_model_name,
    version=timestamp, # use timestamp as version to avoid version conflict
    description=model_name + " fine tuned model for translation wmt16 en to ro"
)
print("prepare to register model: \n", prepare_to_register_model)
#register the model from pipeline job output 
registered_model = workspace_ml_client.models.create_or_update(prepare_to_register_model)
print ("registered model: \n", registered_model)


### 6. Deploy the fine tuned model to an online endpoint
Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model.

In [None]:
import time, sys
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name

online_endpoint_name = "translation-en-ro-src" + timestamp
# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for " + registered_model.name + ", fine tuned model for English to Romanian translation",
    auth_mode="key"
)
workspace_ml_client.begin_create_or_update(endpoint).wait()
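
Because the endpoint name is built by appending a timestamp, it can be worth checking its length before calling the service. The sketch below is a minimal example of that idea. The helper `make_endpoint_name` is hypothetical, and the 32-character limit is an assumption that should be confirmed against the current endpoint naming rules.

```python
import time


def make_endpoint_name(prefix: str, max_len: int = 32) -> str:
    """Build a timestamped endpoint name and check it against a length limit.

    max_len is an assumed limit; confirm it in the managed online endpoint docs.
    """
    suffix = str(int(time.time()))
    name = f"{prefix}-{suffix}"
    if len(name) > max_len:
        raise ValueError(
            f"Endpoint name {name!r} has {len(name)} characters, limit is {max_len}."
        )
    return name


print(make_endpoint_name("translation-en-ro"))
```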

You can find the list of SKUs supported for deployment here: [Managed online endpoints SKU list](https://learn.microsoft.com/en-us/azure/machine-learning/reference-managed-online-endpoints-vm-sku-list)

In [None]:
# create a deployment
demo_deployment = ManagedOnlineDeployment(
    name="demo",
    endpoint_name=online_endpoint_name,
    model=registered_model.id,
    instance_type="Standard_ND40rs_v2",
    instance_count=1,
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
endpoint.traffic = {"demo": 100}
workspace_ml_client.begin_create_or_update(endpoint).result()

### 7. Test the endpoint with sample data

We will fetch some sample data from the test dataset and submit it to the online endpoint for inference. We will then display the predicted translations alongside the ground truth translations.

In [None]:
# read ./wmt16-en-ro-dataset/test.jsonl into a pandas dataframe
import pandas as pd
import json
test_df = pd.read_json("./wmt16-en-ro-dataset/test.jsonl", orient='records', lines=True)
# take 1 random sample 
test_df = test_df.sample(n=1)
# rebuild index
test_df.reset_index(drop=True, inplace=True)
test_df.head(1)

In [None]:
# create a json object with the key as "inputs" and value as a list of values from the en column of the test dataframe
test_json = {"inputs": test_df["en"].tolist()}
# save the json object to a file named sample_score.json in the ./wmt16-en-ro-dataset folder
with open("./wmt16-en-ro-dataset/sample_score.json", "w") as f:
    json.dump(test_json, f)

> If the input data is long or there are too many records, you may run into the following error: "Failed to test real-time endpoint upstream request timeout Please check this guide to understand why this error code might have been returned [https://docs.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-online-endpoints#http-status-codes]". Try submitting smaller and fewer inputs.
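
One way to follow the advice above is to cap both the number of records and the length of each record before building the request. The sketch below assumes the `{"inputs": [...]}` payload shape used in this notebook. The helper `build_payload` and its default limits are hypothetical and would need tuning for your model and deployment.

```python
import json


def build_payload(sentences, max_records=5, max_chars=500):
    """Keep at most max_records inputs and cut each one to max_chars characters."""
    trimmed = [text[:max_chars] for text in sentences[:max_records]]
    return {"inputs": trimmed}


# Toy inputs: one overly long sentence and two short ones.
payload = build_payload(
    ["x" * 1000, "A short sentence.", "Another one."],
    max_records=2,
    max_chars=20,
)
print(json.dumps(payload))
```

Cutting by character count is crude and can split a sentence mid-word, so splitting on sentence boundaries may give better translations when inputs are long.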

In [None]:
# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method
response=workspace_ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="demo",
    request_file="./wmt16-en-ro-dataset/sample_score.json"
)
print("raw response: \n", response, "\n")
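# convert the response (a JSON string) to a pandas dataframe
# wrap it in StringIO because newer pandas versions deprecate passing literal JSON strings to read_json
from io import StringIO
response_df = pd.read_json(StringIO(response))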
response_df.head(1)

In [None]:
# merge the test dataframe and the response dataframe on the index
merged_df = pd.merge(test_df, response_df, left_index=True, right_index=True)
merged_df.head(1)
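
The index-based merge above pairs each test row with the prediction at the same position. The toy sketch below shows the mechanics in isolation. The column name `translation` for the endpoint output is an assumption made for illustration; the real response columns depend on the deployed model.

```python
import pandas as pd

# Toy test rows in the same shape as the wmt16 files.
test_rows = pd.DataFrame(
    {
        "en": ["Good morning.", "Thank you."],
        "ro": ["Bună dimineața.", "Mulțumesc."],
    }
)

# Stand-in for the endpoint response; the column name is hypothetical.
predictions = pd.DataFrame({"translation": ["Bună dimineața.", "Mulțumesc."]})

# Rows are matched by index, so both frames must be in the same order.
merged = pd.merge(test_rows, predictions, left_index=True, right_index=True)
print(merged)
```

This is why the test dataframe is re-indexed after sampling: without `reset_index`, the sampled row would keep its original index and would not line up with the response, which starts at index 0.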

### 8. Delete the online endpoint
Don't forget to delete the online endpoint, else you will leave the billing meter running for the compute used by the endpoint

In [None]:
workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()