## Text Generation P-tuning Task Sample

This sample shows how to use the `nemo-ptune`, `nemo-model-prediction` and `nemo-compute-metrics` components from the `nvidia-ai` system registry to p-tune a NeMo model that summarizes a dialog between two people, using the samsum dataset.

### Training data
We will use the [samsum](https://huggingface.co/datasets/samsum) dataset, which contains dialogues between two people together with their summaries. In this notebook we summarize the dialogues and calculate BLEU and ROUGE scores for the generated summaries against the provided ground-truth summaries.
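
To give a feel for what a ROUGE score measures, here is a minimal sketch of a simplified ROUGE-1 F1 (unigram overlap on lowercased, whitespace-split tokens). The function name `rouge1_f1` and the tokenization are assumptions made for illustration; the compute-metrics component in the pipeline may tokenize and aggregate differently, so treat the numbers as indicative only.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Ground truth taken from the samsum example shown later in this notebook;
# the prediction is a made-up model output.
reference = "Emma will be home soon and she will let Will know."
prediction = "Emma will be home soon."
score = rouge1_f1(prediction, reference)
print(round(score, 3))
```

Note that punctuation is not stripped in this sketch, so `soon.` and `soon` count as different tokens.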

### Model
We will use the `Nemotron-3-8B-4k` model to show how to p-tune a model for the text-generation task. If you opened this notebook from a specific model card, remember to replace the model name accordingly.

### Outline
* Setup pre-requisites such as compute.
* Pick a model to p-tune.
* Pick and explore training data.
* Configure the p-tuning job.
* Run the pipeline job, which performs p-tuning and evaluation and computes the metrics.
* Register the p-tuned model. 


### 1. Setup pre-requisites
* Install dependencies
* Connect to the AzureML workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.
* Connect to the `nvidia-ai` system registry
* Set an optional experiment name
* Check or create compute. Supported SKUs for p-tuning: `Standard_ND96asr_v4`, `Standard_ND96amsr_A100_v4`, `Standard_ND96amsr_v4`, `Standard_NC24ads_A100_v4`

Install dependencies by running the cell below. This step is required when running in a new environment.

In [None]:
%pip install azure-ai-ml
%pip install azure-identity
%pip install datasets==2.9.0
%pip install py7zr

In [None]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)
from azure.ai.ml.entities import AmlCompute
import time

try:
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    credential = InteractiveBrowserCredential()

try:
    workspace_ml_client = MLClient.from_config(credential=credential)
except Exception:
    # no config.json was found: fall back to explicit workspace details
    workspace_ml_client = MLClient(
        credential,
        subscription_id="<SUBSCRIPTION_ID>",
        resource_group_name="<RESOURCE_GROUP>",
        workspace_name="<WORKSPACE_NAME>",
    )

# the models, p-tuning pipelines and environments are available in the AzureML system registry, "nvidia-ai"
registry_ml_client = MLClient(credential, registry_name="nvidia-ai")

experiment_name = "text-generation-samsum"

# generating a unique timestamp that can be used for names and versions that need to be unique
timestamp = str(int(time.time()))

### 2. Pick a Nemotron-3 model to p-tune

We need to p-tune the model for our specific task before using it. You can browse these models in the Model Catalog in AzureML Studio, filtering by the `text-generation` task. In this example, we use the `Nemotron-3-8B-4k` model. If you have opened this notebook for a different model, replace the model name and version accordingly.

Note the model id property of the model. This will be passed as input to the p-tuning job. It is also available as the `Asset ID` field on the model details page in the AzureML Studio Model Catalog.

In [None]:
model_name = "Nemotron-3-8B-4k"
foundation_model = registry_ml_client.models.get(model_name, label="latest")
print(
    "\n\nUsing model name: {0}, version: {1}, id: {2} for p-tuning".format(
        foundation_model.name, foundation_model.version, foundation_model.id
    )
)
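
The model id printed above is a single URI string. As a rough illustration, the sketch below splits such an id into its parts. It assumes the registry asset id follows the pattern `azureml://registries/<registry>/models/<name>/versions/<version>`; the helper `parse_model_asset_id` and the example version number are hypothetical, so compare against the `Asset ID` shown in the Studio before relying on it.

```python
import re

# Assumed layout of a registry model asset id (not taken from this notebook's output).
ASSET_ID_PATTERN = re.compile(
    r"^azureml://registries/(?P<registry>[^/]+)"
    r"/models/(?P<name>[^/]+)"
    r"/versions/(?P<version>[^/]+)$"
)


def parse_model_asset_id(asset_id: str) -> dict:
    """Split a registry model asset id into registry, name and version."""
    match = ASSET_ID_PATTERN.match(asset_id)
    if match is None:
        raise ValueError(f"Unexpected asset id format: {asset_id}")
    return match.groupdict()


# Hypothetical id for illustration; the version number is made up.
example_id = "azureml://registries/nvidia-ai/models/Nemotron-3-8B-4k/versions/1"
parsed = parse_model_asset_id(example_id)
print(parsed)
```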

### 3. Create a compute to be used with the job

The p-tune job works only with GPU compute. The right compute size depends on how big the model is, and in most cases it is tricky to identify. The cells below guide you in selecting the right compute for the job.

`NOTE1` The computes listed below work with the most optimized configuration. Any change to the configuration might lead to a CUDA out-of-memory error. In such cases, try upgrading to a bigger compute size.

`NOTE2` While selecting the compute_cluster_size below, make sure the compute is available in your resource group. If a particular compute is not available, you can request access to the compute resources.

Supported SKUs for pTuning: `Standard_ND96asr_v4`, `Standard_ND96amsr_A100_v4`, `Standard_ND96amsr_v4`, `Standard_NC24ads_A100_v4`


In [None]:
import ast

# check for the same key that is read below
# (assumes the property value is a string holding a Python list literal)
if "finetune-recommended-sku" in foundation_model.properties:
    computes_allow_list = ast.literal_eval(
        foundation_model.properties["finetune-recommended-sku"]
    )  # convert string to python list
    print(f"Please create a compute from the following list - {computes_allow_list}")
else:
    computes_allow_list = None
    print("Computes allow list is not part of model properties")

In [None]:
# If you have a specific compute size to work with, change it here. By default we use the 8 x A100 compute from the above list
compute_cluster_size = "Standard_ND96amsr_A100_v4"

# If you already have a GPU cluster, set its name here. Otherwise a new cluster with this name is created
compute_cluster = "ghyadav-westus-a100"

try:
    compute = workspace_ml_client.compute.get(compute_cluster)
    print("The compute cluster already exists! Reusing it for the current run")
except Exception as ex:
    print(
        f"Looks like the compute cluster doesn't exist. Creating a new one with compute size {compute_cluster_size}!"
    )
    try:
        print("Attempt #1 - Trying to create a dedicated compute")
        compute = AmlCompute(
            name=compute_cluster,
            size=compute_cluster_size,
            tier="Dedicated",
            max_instances=2,  # For multi node training set this to an integer value more than 1
        )
        workspace_ml_client.compute.begin_create_or_update(compute).wait()
    except Exception as e:
        try:
            print(
                "Attempt #2 - Trying to create a low priority compute. Since this is a low priority compute, the job could get pre-empted before completion."
            )
            compute = AmlCompute(
                name=compute_cluster,
                size=compute_cluster_size,
                tier="LowPriority",
                max_instances=2,  # For multi node training set this to an integer value more than 1
            )
            workspace_ml_client.compute.begin_create_or_update(compute).wait()
        except Exception as e:
            print(e)
            raise ValueError(
                f"WARNING! Compute size {compute_cluster_size} not available in workspace"
            )


# Sanity check on the created compute
compute = workspace_ml_client.compute.get(compute_cluster)
if compute.provisioning_state.lower() == "failed":
    raise ValueError(
        f"Provisioning failed, Compute '{compute_cluster}' is in failed state. "
        f"please try creating a different compute"
    )

if computes_allow_list is not None:
    computes_allow_list_lower_case = [x.lower() for x in computes_allow_list]
    if compute.size.lower() not in computes_allow_list_lower_case:
        raise ValueError(
            f"VM size {compute.size} is not in the allow-listed computes for finetuning"
        )
else:
    # Computes with K80 GPUs are not supported
    unsupported_gpu_vm_list = [
        "standard_nc6",
        "standard_nc12",
        "standard_nc24",
        "standard_nc24r",
    ]
    if compute.size.lower() in unsupported_gpu_vm_list:
        raise ValueError(
            f"VM size {compute.size} is currently not supported for finetuning"
        )


# This is the number of GPUs in a single node of the selected 'vm_size' compute.
# Setting this to less than the number of GPUs will result in underutilized GPUs, taking longer to train.
# Setting this to more than the number of GPUs will result in an error.
gpu_count_found = False
workspace_compute_sku_list = workspace_ml_client.compute.list_sizes()
available_sku_sizes = []
for compute_sku in workspace_compute_sku_list:
    available_sku_sizes.append(compute_sku.name)
    if compute_sku.name.lower() == compute.size.lower():
        gpus_per_node = compute_sku.gpus
        gpu_count_found = True
# if the GPU count was not found, raise an error
if gpu_count_found:
    print(f"Number of GPUs in compute {compute.size}: {gpus_per_node}")
else:
    raise ValueError(
        f"Number of GPUs in compute {compute.size} not found. Available skus are: {available_sku_sizes}. "
        f"This should not happen. Please check the selected compute cluster: {compute_cluster} and try again."
    )
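
The size check in the cell above can be tried out without a workspace. The following is a minimal sketch of the same decision logic as a pure function: if an allow list is available it wins, otherwise the K80-based sizes are rejected. The function name `is_supported_vm_size` is hypothetical and not part of the SDK.

```python
from typing import Optional, Sequence

# K80-based sizes that the sanity check in this notebook rejects.
UNSUPPORTED_K80_SIZES = {
    "standard_nc6",
    "standard_nc12",
    "standard_nc24",
    "standard_nc24r",
}


def is_supported_vm_size(
    vm_size: str, allow_list: Optional[Sequence[str]] = None
) -> bool:
    """Mirror the notebook's check: allow list first, K80 deny list otherwise."""
    size = vm_size.lower()
    if allow_list is not None:
        # Case-insensitive comparison, as in the cell above.
        return size in {item.lower() for item in allow_list}
    return size not in UNSUPPORTED_K80_SIZES


print(is_supported_vm_size("Standard_ND96amsr_A100_v4"))
print(is_supported_vm_size("Standard_NC6"))
```

Running this before creating a cluster is a cheap way to catch an unsupported size early, since provisioning a large GPU cluster can take a while.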

### 4. Pick the dataset for p-tuning the model

We use the [samsum](https://huggingface.co/datasets/samsum) dataset. The next few cells show basic data preparation for p-tuning:
* Visualize some data rows.
* Preprocess the data into the required format. This is an important step for text generation because we add the required sequences/separators to the data. This is how we repurpose the text-generation task for a specific task such as summarization, translation or text completion.
* For p-tuning, the input data must have two columns: the first contains the training text and the second contains the ground truth.
* The p-tuning pipeline adds the bos and eos tokens to the data, so you do not need to add them explicitly.
* We want this sample to run quickly, so we save smaller `train`, `validation` and `test` files containing 10% of the original rows. This means the p-tuned model will have lower accuracy, so it should not be put to real-world use.

##### Here is an example of what the data should look like

Text generation requires the training data to include at least two fields, one for ‘text’ and one for ‘ground_truth’, as in this example. The examples below are from the samsum dataset.

Original dataset:

| dialogue (text) | summary (ground_truth) |
| :- | :- |
| Eric: MACHINE!\r\nRob: That's so gr8!\r\nEric: I know! And shows how Americans see Russian ;)\r\nRob: And it's really funny!\r\nEric: I know! I especially like the train part!\r\nRob: Hahaha! No one talks to the machine like that!\r\nEric: Is this his only stand-up?\r\nRob: Idk. I'll check.\r\nEric: Sure.\r\nRob: Turns out no! There are some of his stand-ups on youtube.\r\nEric: Gr8! I'll watch them now!\r\nRob: Me too!\r\nEric: MACHINE!\r\nRob: MACHINE!\r\nEric: TTYL?\r\nRob: Sure :) | Eric and Rob are going to watch a stand-up on youtube. | 
| Will: hey babe, what do you want for dinner tonight?\r\nEmma:  gah, don't even worry about it tonight\r\nWill: what do you mean? everything ok?\r\nEmma: not really, but it's ok, don't worry about cooking though, I'm not hungry\r\nWill: Well what time will you be home?\r\nEmma: soon, hopefully\r\nWill: you sure? Maybe you want me to pick you up?\r\nEmma: no no it's alright. I'll be home soon, i'll tell you when I get home. \r\nWill: Alright, love you. \r\nEmma: love you too. | Emma will be home soon and she will let Will know. | 

Formatted dataset the user might pass:

| text (text) | summary (ground_truth) |
| :- | :- |
| Summarize this dialog:\nEric: MACHINE!\r\nRob: That's so gr8!\r\nEric: I know! And shows how Americans see Russian ;)\r\nRob: And it's really funny!\r\nEric: I know! I especially like the train part!\r\nRob: Hahaha! No one talks to the machine like that!\r\nEric: Is this his only stand-up?\r\nRob: Idk. I'll check.\r\nEric: Sure.\r\nRob: Turns out no! There are some of his stand-ups on youtube.\r\nEric: Gr8! I'll watch them now!\r\nRob: Me too!\r\nEric: MACHINE!\r\nRob: MACHINE!\r\nEric: TTYL?\r\nRob: Sure :)\n---\nSummary:\n | Eric and Rob are going to watch a stand-up on youtube. | 
| Summarize this dialog:\nWill: hey babe, what do you want for dinner tonight?\r\nEmma:  gah, don't even worry about it tonight\r\nWill: what do you mean? everything ok?\r\nEmma: not really, but it's ok, don't worry about cooking though, I'm not hungry\r\nWill: Well what time will you be home?\r\nEmma: soon, hopefully\r\nWill: you sure? Maybe you want me to pick you up?\r\nEmma: no no it's alright. I'll be home soon, i'll tell you when I get home. \r\nWill: Alright, love you. \r\nEmma: love you too. \n---\nSummary:\n | Emma will be home soon and she will let Will know. | 
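
As a small illustration of the formatted table above, the sketch below turns one dialogue and summary pair into a single JSON Lines record with the `text` and `summary` fields. The helper name `to_record` is hypothetical; the prompt wording matches the one used in this notebook, and the dialogue is shortened from the second example row.

```python
import json

# Prompt wrapper used in this notebook: instruction, dialogue, separator, cue.
PROMPT_TEMPLATE = "Summarize this dialog:\n{}\n---\nSummary:\n"


def to_record(dialogue: str, summary: str) -> dict:
    """Build one two-field training record from a raw samsum row."""
    return {"text": PROMPT_TEMPLATE.format(dialogue), "summary": summary}


record = to_record(
    "Will: Alright, love you. \r\nEmma: love you too. ",
    "Emma will be home soon and she will let Will know.",
)
# json.dumps escapes the newlines, so each record stays on one physical line.
line = json.dumps(record)
print(line)
```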
 

In [None]:
# download the dataset using the Hugging Face datasets library: https://pypi.org/project/datasets/
import os
from datasets import load_dataset, get_dataset_split_names

dataset_dir = "samsum-dataset"
dataset_name = "samsum"
# create the download directory if it does not exist
os.makedirs(dataset_dir, exist_ok=True)

for split in get_dataset_split_names(dataset_name):
    # load the split of the dataset
    dataset = load_dataset(dataset_name, split=split)
    # save the split of the dataset to the download directory as a JSON Lines file
    dataset.to_json(os.path.join(dataset_dir, f"{split}.jsonl"))

In [None]:
# load the ./samsum-dataset/train.jsonl file into a pandas dataframe and show the first 5 rows
import pandas as pd

pd.set_option(
    "display.max_colwidth", 0
)  # set the max column width to 0 to display the full text
df = pd.read_json("./samsum-dataset/train.jsonl", lines=True)
df.head()

In [None]:
# create a function to preprocess the dataset into the desired format


def get_preprocessed_samsum(df):
    # wrap each dialogue in the summarization prompt
    prompt = "Summarize this dialog:\n{}\n---\nSummary:\n"

    df["text"] = df["dialogue"].map(prompt.format)
    # keep only the two columns the p-tuning pipeline expects: input text and ground truth
    df = df[["text", "summary"]]

    return df

In [None]:
# load test.jsonl, train.jsonl and validation.jsonl from the ./samsum-dataset folder into pandas dataframes
test_df = pd.read_json("./samsum-dataset/test.jsonl", lines=True)
train_df = pd.read_json("./samsum-dataset/train.jsonl", lines=True)
validation_df = pd.read_json("./samsum-dataset/validation.jsonl", lines=True)
# map the train, validation and test dataframes to preprocess function
train_df = get_preprocessed_samsum(train_df)
validation_df = get_preprocessed_samsum(validation_df)
test_df = get_preprocessed_samsum(test_df)
# show the first 5 rows of the train dataframe
train_df.head()

In [None]:
# save 10% of the rows from the train, validation and test dataframes into files with small_ prefix in subfolders of ./samsum-dataset
frac = 0.1
os.makedirs("./samsum-dataset/train", exist_ok=True)
os.makedirs("./samsum-dataset/val", exist_ok=True)
os.makedirs("./samsum-dataset/test", exist_ok=True)
train_df.sample(frac=frac).to_json(
    "./samsum-dataset/train/small_train.json", orient="records", lines=True
)
validation_df.sample(frac=frac).to_json(
    "./samsum-dataset/val/small_validation.json", orient="records", lines=True
)
test_df.sample(frac=frac).to_json(
    "./samsum-dataset/test/small_test.jsonl", orient="records", lines=True
)
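
Before submitting the job, it can help to confirm that the saved files really contain the two expected fields on every line. The following is a minimal sketch using only the standard library; `check_jsonl_columns` is a hypothetical helper, and the demo runs on a throwaway file rather than the real dataset.

```python
import json
import tempfile
from pathlib import Path


def check_jsonl_columns(path, required=("text", "summary")):
    """Return the number of records; raise if any record lacks a required key."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [key for key in required if key not in record]
            if missing:
                raise ValueError(f"{path}:{line_number} is missing {missing}")
            count += 1
    return count


# Demo on a temporary file; in the notebook you would pass, for example,
# "./samsum-dataset/train/small_train.json" instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.jsonl"
    sample = {
        "text": "Summarize this dialog:\nA: hi\n---\nSummary:\n",
        "summary": "A says hi.",
    }
    sample_path.write_text(json.dumps(sample) + "\n", encoding="utf-8")
    record_count = check_jsonl_columns(sample_path)
    print(record_count)
```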

### 5. Submit the p-tuning and evaluation job using the model and data as inputs
 
Create the job that uses the `nemo_peft_text_generation_evaluation` pipeline component from the `nvidia-ai` registry, which runs the p-tuning, model prediction and compute metrics steps.

Define the p-tuning parameters

In [None]:
# Training parameters
training_parameters = dict(
    compute=compute_cluster,  # name of the compute cluster
    ptuned_model_name="Nemotron-3-8B-4k-ft",  # name of the ptuned model
    input_column_name="text",  # name of the input column in the dataset
    target_column_name="summary",  # name of the target column in the dataset
    max_steps=50,  # max number of steps to train for
    num_nodes=1,  # number of nodes to train on
    learning_rate=1e-5,  # learning rate for the optimizer
    concat_sampling_probs=1.0,  # probability of sampling from the concatenated dataset
    eval_dataset_input_column_name="text",  # name of the input column in the eval dataset
)
print(f"The following training parameters are enabled - {training_parameters}")

In [None]:
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import Input

# fetch the pipeline component
component = registry_ml_client.components.get(
    name="nemo_peft_text_generation_evaluation", label="latest"
)


# define the pipeline job
@pipeline()
def create_pipeline():
    text_generation_pipeline = component(
        # specify the id of the foundation model from the azureml system registry, as identified in step 2
        model_path=foundation_model.id,
        # map the dataset splits to parameters
        train_dataset_path=Input(  # path to the train dataset
            type="uri_folder", path="./samsum-dataset/train/"
        ),
        valid_dataset_path=Input(  # path to the validation dataset
            type="uri_folder", path="./samsum-dataset/val/"
        ),
        dataset_path=Input(  # path to the test dataset
            type="uri_folder", path="./samsum-dataset/test/"
        ),
        **training_parameters,
    )
    return {
        # map the output of the p-tuning job to the output of pipeline job so that we can easily register the p-tuned model
        # registering the model is required to deploy the model to an online or batch endpoint
        "ptuned_model": text_generation_pipeline.outputs.ptuned_model,
        "predicted_output": text_generation_pipeline.outputs.predicted_output,
        "evaluation_result": text_generation_pipeline.outputs.evaluation_result,
    }


pipeline_object = create_pipeline()

# don't use cached results from previous jobs
pipeline_object.settings.force_rerun = True

# set continue on step failure to False
pipeline_object.settings.continue_on_step_failure = False

pipeline_object.default_compute = compute_cluster

Submit the job

In [None]:
# submit the pipeline job
pipeline_job = workspace_ml_client.jobs.create_or_update(
    pipeline_object, experiment_name=experiment_name
)
# wait for the pipeline job to complete
workspace_ml_client.jobs.stream(pipeline_job.name)

### 6. Register the p-tuned model with the workspace

We will register the model from the output of the p-tuning job. This tracks lineage between the p-tuned model and the p-tuning job. The p-tuning job, in turn, tracks lineage to the foundation model, data and training code.

In [None]:
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# check if the `ptuned_model` output is available
print("pipeline job outputs: ", workspace_ml_client.jobs.get(pipeline_job.name).outputs)

# build the path to the `ptuned_model` output of the pipeline job
model_path_from_job = "azureml://jobs/{0}/outputs/{1}".format(
    pipeline_job.name, "ptuned_model"
)

finetuned_model_name = training_parameters["ptuned_model_name"]

print("path to register model: ", model_path_from_job)
prepare_to_register_model = Model(
    path=model_path_from_job,
    type=AssetTypes.TRITON_MODEL,
    name=finetuned_model_name,
    version=timestamp,  # use timestamp as version to avoid version conflict
    description=model_name + " p-tuned model for samsum textgen",
)
print("prepare to register model: \n", prepare_to_register_model)
# register the model from pipeline job output
registered_model = workspace_ml_client.models.create_or_update(
    prepare_to_register_model
)
print("registered model: \n", registered_model)
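
The registration path used above follows the `azureml://jobs/<job name>/outputs/<output name>` pattern. As a minimal sketch, the hypothetical helper `job_output_uri` below builds that string for any of the pipeline outputs; the job name in the example is made up.

```python
def job_output_uri(job_name: str, output_name: str) -> str:
    """Build the URI of a named job output, in the format used in this notebook."""
    if not job_name or not output_name:
        raise ValueError("job_name and output_name must be non-empty")
    return f"azureml://jobs/{job_name}/outputs/{output_name}"


# "example_job_name" is a placeholder; in the notebook use pipeline_job.name.
uri = job_output_uri("example_job_name", "ptuned_model")
print(uri)
```

The same helper could point at `predicted_output` or `evaluation_result`, the other two outputs returned by the pipeline definition.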