# RAFT

In [4]:
! pip install -r ../requirements.txt


## Synthetic data generation phase

### Generate Q/A/CoT fine-tuning dataset using RAFT from the domain specific documents

In [3]:
ds_name = "vampire-bats"
doc_path = "../sample_data/vampire-bats/"
ds_path = "dataset/vampire-bats_test"
print("Creating dataset: " + ds_name)

Creating dataset: vampire-bats


In [5]:
! python3 ../raft.py \
    --datapath $doc_path \
    --output $ds_path \
    --distractors 3 \
    --doctype pdf \
    --chunk_size 512 \
    --questions 5 \
    --checkpoint-size 1 \
    --system-prompt-key llama \
    --completion_model Meta-Llama-3-70B-Instruct \
    --embedding_model text-embedding-ada-002

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2024-05-17 18:18:23  INFO [  0%] raft Using checkpoint chunks /workspaces/gorilla/raft/azure-ai-studio-ft/dataset/vampire-bats_test-checkpoints/chunks
2024-05-17 18:18:23  INFO [  0%] raft Retrieving chunks from ../sample_data/vampire-bats of type pdf
2024-05-17 18:18:24  INFO [  0%] raft Retrieving chunks from ../sample_data/vampire-bats/bats/Giant golden-crowned flying fox - Wikipedia.pdf using the text-embedding-ada-002 model.
2024-05-17 18:18:25  INFO [  0%] raft Splitting text into 51 chunks.
2024-05-17 18:18:27  INFO [  0%] raft Retrieving chunks from ../sample_data/vampire-bats/bats/Desmodus draculae - Wikipedia.pdf using the text-embedding-ada-002 model.
2024-05-17 18:18:30
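
The `--distractors` flag controls how many irrelevant chunks are mixed into each example's context. Judging from the sample record displayed further down, the `instruction` column wraps each context chunk in `<DOCUMENT>` tags and appends the question; in that record the oracle chunk (`oracle_context`) is not among the documents, consistent with RAFT leaving the oracle out of a fraction of examples. The helper below is a hypothetical sketch of that assembly, not the code in `raft.py`:

```python
import random


def build_raft_instruction(question, oracle_chunk, distractor_chunks,
                           include_oracle=True, seed=None):
    """Hypothetical sketch: assemble a RAFT-style prompt from document chunks.

    Each chunk is wrapped in <DOCUMENT> tags, the chunks are shuffled so the
    oracle's position carries no signal, and the question is appended last.
    """
    rng = random.Random(seed)
    docs = list(distractor_chunks)
    if include_oracle:
        docs.append(oracle_chunk)
    rng.shuffle(docs)
    context = "".join(f"<DOCUMENT>{doc}</DOCUMENT>\n" for doc in docs)
    return context + question


prompt = build_raft_instruction(
    "When did Darren Naish publish the article?",
    oracle_chunk="Naish, Darren (July 14, 2013) ...",
    distractor_chunks=["chunk A", "chunk B", "chunk C"],
    seed=0,
)
print(prompt)
```

Passing `include_oracle=False` produces the distractor-only variant that teaches the model to cope when retrieval misses the relevant chunk.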

### Convert generated HuggingFace arrow dataset to JSONL format suitable for fine-tuning

In [158]:
raft_arrow_file = f"{ds_path}/data-00000-of-00001.arrow"
dataset_path = f"{ds_path}-files/{ds_name}-full.jsonl"
dataset_path_hf = f"{ds_path}-files/{ds_name}-hf.full.jsonl"

dataset_path_hf_train = f"{ds_path}-files/{ds_name}-hf.train.jsonl"
dataset_path_hf_valid = f"{ds_path}-files/{ds_name}-hf.valid.jsonl"
dataset_path_hf_eval = f"{ds_path}-files/{ds_name}-hf.eval.jsonl"

dataset_path_ft_train = f"{ds_path}-files/{ds_name}-ft.train.jsonl"
dataset_path_ft_valid = f"{ds_path}-files/{ds_name}-ft.valid.jsonl"
dataset_path_ft_eval = f"{ds_path}-files/{ds_name}-ft.eval.jsonl"


print(f"Reading arrow file {raft_arrow_file}")

Reading arrow file dataset/vampire-bats/data-00000-of-00001.arrow


In [159]:
! python ../format.py \
    --input $raft_arrow_file \
    --output $dataset_path_hf \
    --output-format hf

Generating train split: 1461 examples [00:00, 14295.26 examples/s]
2024-05-16 04:50:35  INFO [    ] raft Converting arrow file dataset/vampire-bats/data-00000-of-00001.arrow to jsonl hf file dataset/vampire-bats-files/vampire-bats-hf.full.jsonl
Creating json from Arrow format: 100%|████████████| 2/2 [00:00<00:00,  5.54ba/s]


In [160]:
import pandas as pd
pd.set_option("display.max_colwidth", 0)
hf_full_df = pd.read_json(dataset_path_hf, lines=True)
hf_full_df.head(1)

Unnamed: 0,id,type,question,context,oracle_context,cot_answer,instruction
0,seed_task_0,general,"When did Darren Naish publish ""What did giant extinct vampire bats eat?""?","{'sentences': [['(2003). ""Late quaternary bats from Cebada Cave, Chiquibul cave system, Belize"". Caribbean Journal of Science . 39 (1): 23–33. 5. Pardiñas, U. F . J.; Tonni, E. P .', 'Czaplewski, N. J.; Krejca, J.; Miller , T. E.', '(2000). ""A giant vampire (Mammalia, Chiroptera) in the Late Holocene from the Argentinean pampas: paleoenvironmental signiﬁcance"". Palaeogeography , Palaeoclimatology , Palaeoecology .', '(2005). ""Taxonomic status of the cuban vampire bat (Chiroptera: Phyllostomidae: Desmodontinae: Desmodus)"" (http://www .redciencia.cu/geobiblio/paper/2005_Suarez,%20W ._Tax onomic%20Status%20of%20the%20Cuban%20V ampire%20Bat.pdf) (PDF) . Caribbean Journal of Science .']], 'title': [['placeholder_title', 'placeholder_title', 'placeholder_title', 'placeholder_title']]}","101 (4):\n912–928. 7. Naish, Darren (July 14, 2013). ""What did giant extinct vampire bats eat?"" (https://blogs.scientiﬁca\nmerican.com/tetrapod-zoology/what-did-giant-extinct-vampire-bats-eat/) . ScientiﬁcAmerican.com .","To answer the question, we need to identify when Darren Naish published ""What did giant extinct vampire bats eat?"".\n\nThe context provided gives us the necessary information to determine this.\n\nFirst, we look for the part of the context that directly mentions Darren Naish's publication. \nWe find it in the sentence: ##begin_quote##7. Naish, Darren (July 14, 2013). ""What did giant extinct vampire bats eat?"" (https://blogs.scientificamerican.com/tetrapod-zoology/what-did-giant-extinct-vampire-bats-eat/) . 
ScientiﬁcAmerican.com .##end_quote##.\n\nFrom this sentence, we understand that the publication ""What did giant extinct vampire bats eat?"" was written by Darren Naish, and the date of publication is explicitly mentioned as July 14, 2013.\n\nTherefore, based on the context provided, we can conclude that Darren Naish published ""What did giant extinct vampire bats eat?"" on July 14, 2013.\n\n<ANSWER>: July 14, 2013","<DOCUMENT>(2003). ""Late quaternary bats from Cebada Cave,\nChiquibul cave system, Belize"". Caribbean Journal of Science . 39 (1): 23–33. 5. Pardiñas, U. F . J.; Tonni, E. P .</DOCUMENT>\n<DOCUMENT>Czaplewski, N. J.; Krejca, J.; Miller , T. E.</DOCUMENT>\n<DOCUMENT>(2000). ""A giant vampire (Mammalia, Chiroptera) in the Late\nHolocene from the Argentinean pampas: paleoenvironmental signiﬁcance"". Palaeogeography ,\nPalaeoclimatology , Palaeoecology .</DOCUMENT>\n<DOCUMENT>(2005). ""Taxonomic status of the cuban vampire bat (Chiroptera: Phyllostomidae:\nDesmodontinae: Desmodus)"" (http://www .redciencia.cu/geobiblio/paper/2005_Suarez,%20W ._Tax\nonomic%20Status%20of%20the%20Cuban%20V ampire%20Bat.pdf) (PDF) . Caribbean Journal of\nScience .</DOCUMENT>\nWhen did Darren Naish publish ""What did giant extinct vampire bats eat?""?"
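
Each `cot_answer` ends with an `<ANSWER>:` marker followed by the final answer, as in the record above. When comparing model outputs against the ground truth later, it can help to pull out just that final answer. A small sketch (the helper is ours, not part of `raft.py` or `format.py`):

```python
def extract_final_answer(cot_answer: str, marker: str = "<ANSWER>:") -> str:
    """Return the text after the last answer marker (whole string if absent)."""
    idx = cot_answer.rfind(marker)
    if idx == -1:
        return cot_answer.strip()
    return cot_answer[idx + len(marker):].strip()


example = "...we can conclude that ... on July 14, 2013.\n\n<ANSWER>: July 14, 2013"
print(extract_final_answer(example))  # July 14, 2013
```

Applied to the dataframe, `hf_full_df["cot_answer"].map(extract_final_answer)` yields a column of short answers.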


In [161]:
# split dataset into 80% train / 10% validation / 10% eval
# (positional slicing with iloc avoids the FutureWarning np.split triggers on DataFrames)
samples_count = len(hf_full_df)
train_end = int(0.8 * samples_count)
valid_end = int(0.9 * samples_count)
hf_train_df = hf_full_df.iloc[:train_end]
hf_valid_df = hf_full_df.iloc[train_end:valid_end]
hf_eval_df = hf_full_df.iloc[valid_end:]
hf_train_df.to_json(dataset_path_hf_train, orient="records", lines=True)
hf_valid_df.to_json(dataset_path_hf_valid, orient="records", lines=True)
hf_eval_df.to_json(dataset_path_hf_eval, orient="records", lines=True)

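The cut points can be checked by hand: with the 1461 examples reported by `format.py` above, `int(0.8 * 1461) = 1168` and `int(0.9 * 1461) = 1314`, giving 1168 train, 146 validation and 147 eval rows. The first two match the conversion logs below. A tiny helper that reproduces the arithmetic:

```python
def split_sizes(n, train_end_frac=0.8, valid_end_frac=0.9):
    """Sizes of the train/valid/eval slices produced by cutting at int(frac * n)."""
    train_end = int(train_end_frac * n)
    valid_end = int(valid_end_frac * n)
    return train_end, valid_end - train_end, n - valid_end


print(split_sizes(1461))  # (1168, 146, 147)
```

Because `int()` truncates, any rounding remainder lands in the last (eval) slice.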


In [162]:
! python ../format.py \
    --input $dataset_path_hf_train \
    --input-type jsonl \
    --output $dataset_path_ft_train \
    --output-format completion \
    --output-completion-prompt-column text \
    --output-completion-completion-column ground_truth

Generating train split: 1168 examples [00:00, 13747.96 examples/s]
2024-05-16 04:57:04  INFO [    ] raft Converting jsonl file dataset/vampire-bats-files/vampire-bats-hf.train.jsonl to jsonl completion file dataset/vampire-bats-files/vampire-bats-ft.train.jsonl
Map: 100%|████████████████████████| 1168/1168 [00:00<00:00, 13310.48 examples/s]
Creating json from Arrow format: 100%|████████████| 2/2 [00:00<00:00,  8.74ba/s]


In [163]:
! python ../format.py \
    --input $dataset_path_hf_valid \
    --input-type jsonl \
    --output $dataset_path_ft_valid \
    --output-format completion \
    --output-completion-prompt-column text \
    --output-completion-completion-column ground_truth

Generating train split: 146 examples [00:00, 9187.13 examples/s]
2024-05-16 04:57:15  INFO [    ] raft Converting jsonl file dataset/vampire-bats-files/vampire-bats-hf.valid.jsonl to jsonl completion file dataset/vampire-bats-files/vampire-bats-ft.valid.jsonl
Map: 100%|██████████████████████████| 146/146 [00:00<00:00, 20351.90 examples/s]
Creating json from Arrow format: 100%|████████████| 1/1 [00:00<00:00, 82.84ba/s]
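
The completion files are expected to carry a `text` prompt column and a `ground_truth` completion column, as named by the flags above. Note that `dataset_path_ft_eval` is defined earlier but never written. The sketch below performs the same kind of conversion in plain Python, assuming (not verified against `format.py`) that the prompt comes from the RAFT `instruction` column and the completion from `cot_answer`:

```python
import json


def to_completion_records(hf_records, prompt_column="text", completion_column="ground_truth"):
    """Map RAFT rows to prompt/completion pairs.

    Assumption: prompt = `instruction`, completion = `cot_answer`.
    """
    return [
        {prompt_column: r["instruction"], completion_column: r["cot_answer"]}
        for r in hf_records
    ]


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")


# usage for the eval split:
# write_jsonl(to_completion_records(hf_eval_df.to_dict("records")), dataset_path_ft_eval)
```

Running `format.py` on `dataset_path_hf_eval` with the same flags as the cells above is the more faithful option; the sketch only illustrates the shape of the output.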


## Fine-tuning phase

### Loading the model to fine-tune

We will use the `llama-2-7b` model to show how a user can fine-tune a model for a text-completion task. If you opened this notebook from a specific model card, remember to replace the model name. If you need to fine-tune a model that is available on Hugging Face but not in the `azureml` system registry, first [import](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/import/import_model_into_registry.ipynb) the model.

### Outline
* Pick a model to fine-tune.
* Pick and explore training data.
* Configure the fine-tuning job.
* Run the fine-tuning job.
* Review training metrics.
* Deploy the fine-tuned model for real-time inference. [TODO]
* Clean up resources. [TODO]

### 1. Set up prerequisites
* Install dependencies
* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.
* Connect to `azureml` system registry
* Set an optional experiment name

Install the dependencies by running the cell below. This step is required when running in a new environment.

In [None]:
%pip install azure-storage-file-datalake==12.14.0
%pip install azure-ai-ml
%pip install azure-identity

%pip install mlflow
%pip install azureml-mlflow

Install dependencies for downloading Hugging Face datasets.

In [None]:
%pip install datasets
%pip install py7zr

In [None]:
#!az login --use-device-code

In [None]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)

try:
    credential = DefaultAzureCredential()
    # check that the default credential can actually obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # fall back to interactive browser login
    credential = InteractiveBrowserCredential()

try:
    workspace_ml_client = MLClient.from_config(credential=credential)
    print("Loaded ML Client configuration from config.json")
except Exception:
    print("Loading ML Client configuration directly")
    workspace_ml_client = MLClient(
        credential,
        subscription_id="<SUBSCRIPTION_ID>",
        resource_group_name="<RESOURCE_GROUP>",
        workspace_name="<WORKSPACE_NAME>",
    )

# the models, fine-tuning pipelines and environments are available in the AzureML system registry, "azureml";
# Meta models such as Llama are published in the "azureml-meta" registry
registry_ml_client = MLClient(credential, registry_name="azureml")
registry_ml_client_meta = MLClient(credential, registry_name="azureml-meta")
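
`MLClient.from_config` reads the workspace coordinates from a `config.json` file, searched for in the current directory and its parents. Writing that file once lets the first branch above succeed instead of the placeholder fallback. A minimal sketch; the three values are placeholders you must replace:

```python
import json
from pathlib import Path


def workspace_config(subscription_id, resource_group, workspace_name):
    """Return the keys MLClient.from_config expects in config.json."""
    return {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }


def write_workspace_config(config, path="config.json"):
    """Write the workspace config as pretty-printed JSON."""
    Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")


# replace the placeholders before running, then re-run the client cell
# write_workspace_config(workspace_config("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<WORKSPACE_NAME>"))
```

Alternatively, download `config.json` from the workspace overview page in the Azure portal.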

### 2. Pick a foundation model to fine-tune

Decoder-based LLMs like `llama` perform well on `text-completion` tasks, but they must be fine-tuned before they can serve our specific purpose. You can browse these models in the Model Catalog in AzureML Studio by filtering on the `text-completion` task. In this example, we use the `Llama-2-7b` model from the `azureml-meta` registry. If you have opened this notebook for a different model, replace the model name and version accordingly.

Note the model's `id` property: it is passed as input to the fine-tuning job. It is also shown in the `Asset ID` field on the model details page in the AzureML Studio Model Catalog.
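
Registry asset IDs follow the pattern `azureml://registries/<registry>/models/<name>`, optionally followed by `/versions/<version>`. Note that `model_id.split("/")[-1]`, used later to derive a subscription name, returns the version rather than the name when the ID is versioned. A small parser (a sketch, not an SDK function):

```python
def parse_registry_model_id(asset_id):
    """Split 'azureml://registries/<registry>/models/<name>[/versions/<version>]'."""
    prefix = "azureml://registries/"
    if not asset_id.startswith(prefix):
        raise ValueError(f"not a registry asset id: {asset_id}")
    parts = asset_id[len(prefix):].split("/")
    versioned = len(parts) == 5 and parts[3] == "versions"
    if parts[1:2] != ["models"] or len(parts) not in (3, 5) or (len(parts) == 5 and not versioned):
        raise ValueError(f"unexpected asset id layout: {asset_id}")
    return {
        "registry": parts[0],
        "name": parts[2],
        "version": parts[4] if versioned else None,
    }


print(parse_registry_model_id("azureml://registries/azureml-meta/models/Meta-Llama-3-8B-Instruct"))
```

For example, `parse_registry_model_id(foundation_model.id)["name"]` gives the model name regardless of whether the ID carries a version.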

In [None]:
list(registry_ml_client_meta.models.list())

In [None]:
model_name = "Llama-2-7b"
#model_name = "Meta-Llama-3-8B"
foundation_model = registry_ml_client_meta.models.get(model_name, label="latest")
print(f"Using model name: {foundation_model.name}, version: {foundation_model.version}, id: {foundation_model.id} for fine tuning")

In [None]:
foundation_model.properties

In [None]:
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# reference the foundation model by its registry asset ID
mlflow_model_llama = Input(type=AssetTypes.MLFLOW_MODEL, path=foundation_model.id)

### 3. Pick the dataset for fine-tuning the model

This notebook fine-tunes on the RAFT dataset generated above; the data format requirements are illustrated with examples from the [samsum](https://huggingface.co/datasets/samsum) dataset. The next few cells show basic data preparation for fine-tuning:
* Visualize some data rows.
* Preprocess the data into the required format. This is an important step for text completion, because it is where the required sequences/separators are added to the data. This is how the text-completion task is repurposed for a specific task like summarization, translation, text completion, etc.
* During fine-tuning, the `text` column is concatenated with the `ground_truth` column to produce the fine-tuning input. Hence, prepare the data so that `text + ground_truth` is your actual fine-tuning data.
* BOS and EOS tokens are added to the data by the fine-tuning pipeline; you do not need to add them explicitly.
* To keep this sample quick to run, save smaller `train`, `validation` and `test` files containing 10% of the original data. The resulting fine-tuned model will have lower accuracy and should not be put to real-world use.

##### Here is an example of what the data should look like

Text completion requires the training data to include at least two fields, one for `text` and one for `ground_truth`, as in this example. The examples below are from the samsum dataset.

Original dataset:

| dialogue (text) | summary (ground_truth) |
| :- | :- |
| Eric: MACHINE!\r\nRob: That's so gr8!\r\nEric: I know! And shows how Americans see Russian ;)\r\nRob: And it's really funny!\r\nEric: I know! I especially like the train part!\r\nRob: Hahaha! No one talks to the machine like that!\r\nEric: Is this his only stand-up?\r\nRob: Idk. I'll check.\r\nEric: Sure.\r\nRob: Turns out no! There are some of his stand-ups on youtube.\r\nEric: Gr8! I'll watch them now!\r\nRob: Me too!\r\nEric: MACHINE!\r\nRob: MACHINE!\r\nEric: TTYL?\r\nRob: Sure :) | Eric and Rob are going to watch a stand-up on youtube. | 
| Will: hey babe, what do you want for dinner tonight?\r\nEmma:  gah, don't even worry about it tonight\r\nWill: what do you mean? everything ok?\r\nEmma: not really, but it's ok, don't worry about cooking though, I'm not hungry\r\nWill: Well what time will you be home?\r\nEmma: soon, hopefully\r\nWill: you sure? Maybe you want me to pick you up?\r\nEmma: no no it's alright. I'll be home soon, i'll tell you when I get home. \r\nWill: Alright, love you. \r\nEmma: love you too. | Emma will be home soon and she will let Will know. | 

Formatted dataset the user might pass:

| text (text) | summary (ground_truth) |
| :- | :- |
| Summarize this dialog:\nEric: MACHINE!\r\nRob: That's so gr8!\r\nEric: I know! And shows how Americans see Russian ;)\r\nRob: And it's really funny!\r\nEric: I know! I especially like the train part!\r\nRob: Hahaha! No one talks to the machine like that!\r\nEric: Is this his only stand-up?\r\nRob: Idk. I'll check.\r\nEric: Sure.\r\nRob: Turns out no! There are some of his stand-ups on youtube.\r\nEric: Gr8! I'll watch them now!\r\nRob: Me too!\r\nEric: MACHINE!\r\nRob: MACHINE!\r\nEric: TTYL?\r\nRob: Sure :)\n---\nSummary:\n | Eric and Rob are going to watch a stand-up on youtube. | 
| Summarize this dialog:\nWill: hey babe, what do you want for dinner tonight?\r\nEmma:  gah, don't even worry about it tonight\r\nWill: what do you mean? everything ok?\r\nEmma: not really, but it's ok, don't worry about cooking though, I'm not hungry\r\nWill: Well what time will you be home?\r\nEmma: soon, hopefully\r\nWill: you sure? Maybe you want me to pick you up?\r\nEmma: no no it's alright. I'll be home soon, i'll tell you when I get home. \r\nWill: Alright, love you. \r\nEmma: love you too. \n---\nSummary:\n | Emma will be home soon and she will let Will know. | 
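
The formatted rows above differ from the originals only by a fixed prefix and suffix around the dialogue. A sketch of that transformation (the helper name is ours), which also builds the `text + ground_truth` string the pipeline trains on:

```python
def format_samsum_row(dialogue, summary):
    """Wrap a dialogue in the prompt template shown in the formatted table."""
    return {
        "text": f"Summarize this dialog:\n{dialogue}\n---\nSummary:\n",
        "ground_truth": summary,
    }


row = format_samsum_row(
    "Eric: MACHINE!\r\nRob: Sure :)",
    "Eric and Rob are going to watch a stand-up on youtube.",
)
# the fine-tuning pipeline concatenates the two columns
training_text = row["text"] + row["ground_truth"]
print(repr(training_text))
```

Ending the prompt with a fixed `Summary:\n` separator gives the model an unambiguous cue for where the completion starts, which is what lets a plain text-completion model learn a task.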
 

In [None]:
# load the fine-tuning train and validation JSONL files into pandas dataframes and show the first 2 rows
import pandas as pd

pd.set_option(
    "display.max_colwidth", 0
)  # set the max column width to 0 to display the full text
train_df = pd.read_json(dataset_path_ft_train, lines=True)
valid_df = pd.read_json(dataset_path_ft_valid, lines=True)
train_df.head(2)
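
Before uploading, it is worth confirming that every row has non-empty `text` and `ground_truth` values, since the pipeline concatenates the two. A minimal check over plain dicts (for instance `train_df.to_dict("records")`); the helper is hypothetical:

```python
def find_invalid_rows(records, required=("text", "ground_truth")):
    """Return indices of rows missing a required column or holding an empty/non-string value."""
    invalid = []
    for i, record in enumerate(records):
        for column in required:
            value = record.get(column)
            if not isinstance(value, str) or not value.strip():
                invalid.append(i)
                break
    return invalid


# usage in this notebook:
# bad_train = find_invalid_rows(train_df.to_dict("records"))
# bad_valid = find_invalid_rows(valid_df.to_dict("records"))
```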

In [None]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Supported paths include:
# local: './<path>/<file>'
# blob:  'https://<account_name>.blob.core.windows.net/<container_name>/<path>/<file>'
# ADLS gen2: 'abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file>'
# Datastore: 'azureml://datastores/<data_store_name>/paths/<path>/<file>'
#my_path = './dataset/intents-pc-16k-test-1999.jsonl'

#my_data = Data(path=my_path, type=AssetTypes.URI_FILE, name="intents-pc-16k-test-1999")

#workspace_ml_client.data.create_or_update(my_data)

### 4. Submit the fine-tuning job using the model and data as inputs

Create the job that uses the `text-generation` pipeline component. [Learn more](https://github.com/Azure/azureml-assets/blob/main/assets/training/finetune_acft_hf_nlp/components/pipeline_components/text_generation/README.md) about all the parameters supported for fine tuning.

Define fine-tuning parameters

Fine-tuning parameters fall into two categories: training parameters and optimization parameters.

Training parameters define training aspects such as:
1. the optimizer and scheduler to use
2. the metric to optimize during fine-tuning
3. the number of training steps and the batch size

and so on.

Optimization parameters help optimize GPU memory usage and make effective use of the compute resources. A few of the parameters in this category are listed below. _The optimization parameters differ for each model and are packaged with the model to handle these variations._
1. enable DeepSpeed, ORT and LoRA
2. enable mixed precision training
3. enable multi-node training
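
Batch size and epochs together determine how many optimizer updates a run performs. A back-of-the-envelope sketch, assuming every device processes `per_device_train_batch_size` examples per step and gradients are accumulated over `grad_accum_steps` steps (the exact count can differ slightly depending on how the trainer handles the last partial batch):

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, num_devices=1,
                    grad_accum_steps=1, epochs=1):
    """Approximate number of optimizer updates for a fine-tuning run."""
    effective_batch = per_device_batch_size * num_devices * grad_accum_steps
    return math.ceil(num_examples / effective_batch) * epochs


# the job below uses per_device_train_batch_size=1 and num_train_epochs=1;
# with the 1168 training rows logged above on a single device:
print(optimizer_steps(1168, per_device_batch_size=1))  # 1168
```

Raising the effective batch (more devices or accumulation) lowers the step count proportionally, which is one reason the learning rate is often retuned alongside it.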

#### Create data inputs

In [None]:
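from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# wrap the generated completion-format JSONL files as job inputs
training_data = Input(type=AssetTypes.URI_FILE, path=dataset_path_ft_train)
validation_data = Input(type=AssetTypes.URI_FILE, path=dataset_path_ft_valid)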

Create the fine-tuning job object

In [None]:
import uuid
guid = uuid.uuid4()
short_guid = str(guid)[:8]
experiment_name = f"raft-{ds_name}"
registered_model_name = f"{experiment_name}-{short_guid}"
print("experiment_name = " + experiment_name)
print("registered_model_name = " + registered_model_name)

In [None]:
from azure.ai.ml.entities._job.finetuning.custom_model_finetuning_job import CustomModelFineTuningJob
from azure.ai.ml._restclient.v2024_01_01_preview.models import FineTuningTaskType
from azure.ai.ml.entities._inputs_outputs import Output

custom_model_finetuning_job = CustomModelFineTuningJob(
    task=FineTuningTaskType.TEXT_COMPLETION,
    training_data=training_data,
    validation_data=validation_data,
    hyperparameters={
        "per_device_train_batch_size": "1",
        "learning_rate": "0.0002",
        "num_train_epochs": "1",
    },
    model=mlflow_model_llama,
    display_name=registered_model_name,
    name=registered_model_name,
    experiment_name=experiment_name,
    tags={"agent": "gorilla-raft-notebook"},
    properties={},
    outputs={"registered_model": Output(type="mlflow_model", name=registered_model_name)},
)

Submit the fine-tuning job

In [None]:
created_job = workspace_ml_client.jobs.create_or_update(custom_model_finetuning_job)
created_job.studio_url

### 5. Review training and evaluation metrics
Viewing the job in AzureML Studio is the best way to analyze the logs, metrics and outputs of jobs. You can create custom charts and compare metrics across different jobs. See https://learn.microsoft.com/en-us/azure/machine-learning/how-to-log-view-metrics?tabs=interactive#view-jobsruns-information-in-the-studio to learn more.

However, to access and review metrics programmatically we use MLflow, the recommended client for logging and querying metrics.

In [None]:
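# set this to the name of an earlier job (e.g. "llama-762084ae") to inspect it instead of the one just submitted
job_id_override = None
if job_id_override:
    job_id = job_id_override
else:
    # MLflow's root run id is the job name, not the job's full resource id
    job_id = created_job.name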

In [None]:
import mlflow, json

mlflow_tracking_uri = workspace_ml_client.workspaces.get(
    workspace_ml_client.workspace_name
).mlflow_tracking_uri
mlflow.set_tracking_uri(mlflow_tracking_uri)
# filter on the root run id: 'tags.mlflow.rootRunId=' followed by the job id in single quotes
filter_string = "tags.mlflow.rootRunId='" + job_id + "'"
runs = mlflow.search_runs(
    experiment_names=[experiment_name], filter_string=filter_string, output_format="list"
)
training_run = None
evaluation_run = None
# get the training and evaluation runs.
# workaround until 'Bug 2320997: not able to show eval metrics in FT notebooks - mlflow client now showing display names' is fixed:
# the training run is the one that logs an "epoch" metric
for run in runs:
    if "epoch" in run.data.metrics:
        training_run = run

In [None]:
if training_run:
    print("Training metrics:\n\n")
    print(json.dumps(training_run.data.metrics, indent=2))
else:
    print("No Training job found")

In [None]:
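# look up the model registered by the fine-tuning job by name instead of relying on list order
registered_model = workspace_ml_client.models.get(registered_model_name, label="latest")
registered_model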

### 6. Serverless deployment

#### Set Marketplace Sub Name, Serverless Endpoint Name, and Model ID

**Note**: Make sure your `serverless_endpoint_name` is unique!
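
One way to keep the name unique is to append a short random or timestamp suffix, as the online-endpoint cell further down does with a timestamp. The sketch assumes the naming rules documented for Azure ML online endpoints (start with a letter; letters, digits and hyphens only; at most 32 characters); check the current rules for serverless endpoints before relying on it. The helper is hypothetical:

```python
import re
import uuid


def make_endpoint_name(base, suffix=None, max_len=32):
    """Build a unique-ish endpoint name from a model name plus a suffix."""
    suffix = suffix or uuid.uuid4().hex[:8]
    # replace anything that is not a letter, digit or hyphen
    cleaned = re.sub(r"[^A-Za-z0-9-]", "-", base)
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "ep-" + cleaned
    tail = f"-{suffix}"
    # truncate the base so base + tail fits, without leaving a dangling hyphen
    return cleaned[: max_len - len(tail)].rstrip("-") + tail


print(make_endpoint_name("Meta-Llama-3-8B-Instruct", "a1b2c3d4"))
```

Truncating the base rather than the suffix keeps the unique part intact when model names are long.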

You can use any of these model ids for your endpoint:

In [None]:
serverless_model_ids = [
    "azureml://registries/azureml-mistral/models/Mistral-large",
    "azureml://registries/azureml-meta/models/Meta-Llama-3-8B-Instruct",
    "azureml://registries/azureml-meta/models/Meta-Llama-3-70B-Instruct",
    "azureml://registries/azureml-cohere/models/Cohere-embed-v3-multilingual",
    "azureml://registries/azureml-cohere/models/Cohere-embed-v3-english",
    "azureml://registries/azureml-cohere/models/Cohere-command-r",
    "azureml://registries/azureml-cohere/models/Cohere-command-r-plus",
]

In [None]:
registered_model._to_dict()

In [None]:
for model in registry_ml_client_meta.models.list():
    print(model.id)

In [None]:
#model_name = "Llama-2-7b"
model_name = "Meta-Llama-3-8B-Instruct"
deployment_model = registry_ml_client_meta.models.get(model_name, label="latest")
print(f"Using model name: {deployment_model.name}, version: {deployment_model.version}, id: {deployment_model.id}")

In [None]:
deployment_model._to_dict()

In [None]:
#model_id = "azureml://locations/westus3/workspaces/24827e2c-b602-428c-943b-e9c0204b82cf/models/default-registered-model-name/versions/1"
#model_id = "azureml://registries/azureml-meta/models/Llama-2-7b"
model_id = "azureml://registries/azureml-meta/models/Meta-Llama-3-8B"
model_id

In [None]:
def get_marketplace_sub_info(model):
    return (f"mrkt-sub-{model.name}", f"{model.name}-endpoint"[:32], model.id)

In [None]:
deployment_model = registered_model

In [None]:
! pip install azure-ai-ml==1.16.0a20240501006 --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/

In [None]:
from azure.ai.ml.entities import MarketplaceSubscription

def subscribe_model_id(model_id):
    # derive the subscription name from the last path segment of the model id
    model_name = model_id.split("/")[-1]
    marketplace_sub_name = f"mrkt-sub-{model_name}"
    marketplace_subscription = MarketplaceSubscription(
        name=marketplace_sub_name,
        model_id=model_id,
    )
    # subscribe the workspace to the model's marketplace offer and wait for completion
    return workspace_ml_client.marketplace_subscriptions.begin_create_or_update(
        marketplace_subscription
    ).result()

In [None]:
from azure.ai.ml.entities import MarketplaceSubscription, ServerlessEndpoint

model_id = "azureml://registries/azureml-meta/models/Meta-Llama-3-8B-Instruct"
marketplace_subscription = subscribe_model_id(model_id)
print(marketplace_subscription.as_dict())


In [None]:
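# derive an endpoint name (max 32 characters) from the model name; change it if it is already taken
serverless_endpoint_name = f"{model_id.split('/')[-1]}-endpoint"[:32]
print(f"Deploying model id {model_id} to serverless endpoint {serverless_endpoint_name}")

serverless_endpoint = ServerlessEndpoint(
    name=serverless_endpoint_name,
    model_id=model_id,
)

created_endpoint = workspace_ml_client.serverless_endpoints.begin_create_or_update(serverless_endpoint).result()

print(created_endpoint.as_dict())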

### 7. Deploy the fine-tuned model to an online endpoint [TODO: Need some work]
Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model.
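
Any HTTP client can call the endpoint once you have its scoring URI and key, for example from `workspace_ml_client.online_endpoints.get(name).scoring_uri` and `workspace_ml_client.online_endpoints.get_keys(name).primary_key`. A sketch with the standard library, assuming key-based auth (`auth_mode="key"`, as configured in the next cells) sent as a bearer token, plus the optional `azureml-model-deployment` header to target a specific deployment:

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload, deployment_name=None):
    """Prepare (but do not send) a POST request to an online endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment_name:
        # route the call to one deployment instead of following the traffic split
        headers["azureml-model-deployment"] = deployment_name
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# usage (sends the request):
# req = build_scoring_request(uri, key, {"input_data": {"text": ["..."]}}, "demo")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```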

In [None]:
import time


timestamp = str(int(time.time()))

online_endpoint_name = "samsum-textgen-" + timestamp
online_endpoint_name

In [None]:
import time, sys
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    ProbeSettings,
    OnlineRequestSettings,
)

# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for "
    + registered_model.name
    + ", fine tuned model for samsum textgen",
    auth_mode="key",
)
workspace_ml_client.begin_create_or_update(endpoint).wait()

The list of SKUs supported for deployment is available at [Managed online endpoints SKU list](https://learn.microsoft.com/en-us/azure/machine-learning/reference-managed-online-endpoints-vm-sku-list).

In [None]:
# create a deployment
demo_deployment = ManagedOnlineDeployment(
    name="demo",
    endpoint_name=online_endpoint_name,
    model=registered_model.id,
    instance_type="Standard_E64s_v3",
    instance_count=1,
    liveness_probe=ProbeSettings(initial_delay=600),
    request_settings=OnlineRequestSettings(request_timeout_ms=90000),
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
endpoint.traffic = {"demo": 100}
workspace_ml_client.begin_create_or_update(endpoint).result()

### 8. Test the endpoint with sample data

We will fetch some sample data from the test dataset and submit it to the online endpoint for inference. We will then display the scored labels alongside the ground truth labels.

In [None]:
# read ./samsum-dataset/small_test.jsonl into a pandas dataframe
test_df = pd.read_json("./samsum-dataset/small_test.jsonl", lines=True)
# take 2 random samples
test_df = test_df.sample(n=2)
# rebuild index
test_df.reset_index(drop=True, inplace=True)
# rename the label_string column to ground_truth_label
test_df = test_df.rename(columns={"label_string": "ground_truth_label"})
test_df.head(2)

In [None]:
# create a json object with the key as "input_data" and value as a list of values from the text column of the test dataframe
test_json = {"input_data": {"text": list(test_df["text"])}}
# save the json object to a file named sample_score.json in the ./samsum-dataset folder
with open("./samsum-dataset/sample_score.json", "w") as f:
    json.dump(test_json, f)

In [None]:
# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method
response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="demo",
    request_file="./samsum-dataset/sample_score.json",
)
print("raw response: \n", response, "\n")
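# convert the response to a pandas dataframe and rename the label column as scored_label
# (wrap the JSON string in StringIO: passing literal JSON to read_json is deprecated in recent pandas)
import io
response_df = pd.read_json(io.StringIO(response))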
response_df = response_df.rename(columns={0: "scored_label"})
response_df.head(2)

In [None]:
# merge the test dataframe and the response dataframe on the index
merged_df = pd.merge(test_df, response_df, left_index=True, right_index=True)
merged_df.head(2)
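
To turn the side-by-side comparison into a single number, a simple option is normalized exact match: lowercase, strip punctuation and collapse whitespace before comparing. This is a rough metric (it ignores paraphrases), and the column names in the usage comment are taken from the cells above:

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation and collapse runs of whitespace."""
    text = str(text).lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match_rate(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    predictions, references = list(predictions), list(references)
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


# usage: exact_match_rate(merged_df["scored_label"], merged_df["ground_truth_label"])
print(exact_match_rate(["July 14, 2013", "B"], ["july 14 2013", "c"]))  # 0.5
```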

### 9. Delete the online endpoint
Don't forget to delete the online endpoint; otherwise the billing meter keeps running for the compute used by the endpoint.

In [None]:
workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()