## FineTuning LLM with Model-As-Service

This sample shows how to create a standalone FineTuning job that fine-tunes a model for multi-turn chat completion using the ultrachat_200k dataset.

#### Training data
We use the [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset. The dataset has four splits, suitable for:
* Supervised fine-tuning (sft).
* Generation ranking (gen).

#### Model
We use the Phi-3-mini-4k-instruct model to show how to fine-tune a model for the chat-completion task. If you opened this notebook from a specific model card, remember to replace the model name accordingly.

#### Outline
1. Set up pre-requisites.
2. Pick a model to fine-tune.
3. Create training and validation datasets.
4. Configure the fine tuning job.
5. Submit the fine tuning job.
6. Create a serverless deployment using the finetuned model and run sample inference.

### 1. Setup pre-requisites
* Install dependencies
* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<workspace-name/project-name>`, `<resource-group-name>` and `<subscription-id>` below.
* Connect to `azureml` system registry
* Set an optional experiment name

**Install dependencies by running the cell below. This step is required when running in a new environment.**

In [None]:
%pip install azure-ai-ml
%pip install azure-identity

%pip install mlflow
%pip install azureml-mlflow

### Create AzureML Workspace connections

In [None]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)

try:
    credential = DefaultAzureCredential()
    # Check that the default credential can actually fetch a token.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to an interactive browser login if the default credential fails.
    credential = InteractiveBrowserCredential(tenant_id="<tenant-id>")

try:
    # Uses a config.json in the working directory (or a parent), if one exists.
    workspace_ml_client = MLClient.from_config(credential=credential)
except Exception:
    workspace_ml_client = MLClient(
        credential,
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group-name>",
        workspace_name="<workspace-name/project-name>",
    )

# the models, fine tuning pipelines and environments are available in various AzureML system registries,
# Example: Phi family of models are in "azureml", Llama family of models are in "azureml-meta" registry.
# Phi-3-mini-4k-instruct is in the "azureml" registry; switch to "azureml-meta" for Llama models.
registry_ml_client = MLClient(credential, registry_name="azureml")

# Get AzureML workspace object.
workspace = workspace_ml_client._workspaces.get(workspace_ml_client.workspace_name)
workspace._workspace_id

### 2. Pick a foundation model to fine tune

`Phi-3-mini-4k-instruct` is a lightweight, state-of-the-art open model with 3.8B parameters, built upon the datasets used for Phi-2. It belongs to the Phi-3 model family; the Mini version comes in two variants, 4K and 128K, named for the context length (in tokens) each supports. You can browse these models in the Model Catalog in Azure AI Studio by filtering on the `chat-completion` task. In this example, we use the `Phi-3-mini-4k-instruct` model. If you opened this notebook for a different model, replace the model name and version accordingly.

Note the model `id` property. It is passed as input to the fine tuning job and is also available as the `Asset ID` field on the model details page in the Azure AI Studio Model Catalog.

In [None]:
model_name = "Phi-3-mini-4k-instruct"  # "Meta-Llama-3.1-8B-Instruct"
foundation_model = registry_ml_client.models.get(model_name, label="latest")
print(
    "\n\nUsing model name: {0}, version: {1}, id: {2} for fine tuning".format(
        foundation_model.name, foundation_model.version, foundation_model.id
    )
)

In [None]:
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

model_to_finetune = Input(type=AssetTypes.MLFLOW_MODEL, path=foundation_model.id)

### 3. Prepare data

- The [download-dataset.py](./download-dataset.py) script downloads the ultrachat_200k dataset and transforms it into the format expected by the model.
- Because the dataset is very large, the script below downloads only 1% of it. To use more, set the `dataset_split_pc` parameter to the desired percentage.
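As a rough illustration of what percentage-based subsetting involves, the sketch below keeps the first N percent of an in-memory list of records and serializes them as JSON Lines. It is not the code in `download-dataset.py`; the function name `subset_to_jsonl_lines` and its rounding rule are assumptions made for this example.

```python
import json


def subset_to_jsonl_lines(records, split_pc):
    """Keep the first split_pc percent of records and return them as JSONL lines.

    Hypothetical helper: the real download script may sample differently.
    """
    if not 0 < split_pc <= 100:
        raise ValueError("split_pc must be in the range (0, 100]")
    if not records:
        return []
    # Round down, but always keep at least one record.
    keep = max(1, int(len(records) * split_pc / 100))
    return [json.dumps(record) for record in records[:keep]]


# Toy records standing in for downloaded dataset rows.
records = [{"prompt_id": str(i), "messages": []} for i in range(200)]
lines = subset_to_jsonl_lines(records, split_pc=1)
print(len(lines))  # 2
```

Each returned string is one line of a `.jsonl` file, which is the file type the later cells register as data assets.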

**Note** : Some language models have different language codes and hence the column names in the dataset should reflect the same.

The chat-completion dataset is stored in parquet format with each entry using the following schema:


    {
        "prompt": "Create a fully-developed protagonist who is challenged to survive within a dystopian society under the rule of a tyrant. ...",
        "messages": [
            {
                "content": "Create a fully-developed protagonist who is challenged to survive within a dystopian society under the rule of a tyrant. ...",
                "role": "user"
            },
            {
                "content": "Name: Ava\n Ava was just 16 years old when the world as she knew it came crashing down. The government had collapsed, leaving behind a chaotic and lawless society. ...",
                "role": "assistant"
            },
            {
                "content": "Wow, Ava's story is so intense and inspiring! Can you provide me with more details.  ...",
                "role": "user"
            },
            {
                "content": "Certainly! ....",
                "role": "assistant"
            }
        ],
        "prompt_id": "d938b65dfe31f05f80eb8572964c6673eddbd68eff3db6bd234d7f1e3b86c2af"
    }
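
Before uploading data, it can help to check that each record matches the schema above. The following is a minimal sketch, not part of the sample: `validate_chat_record` is a hypothetical helper, and the set of allowed roles (including `system`) is an assumption, since the example record only shows `user` and `assistant`.

```python
# Assumption: "system" is accepted in addition to the roles seen in the example.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_record(record):
    """Return a list of problems; an empty list means the record looks fine."""
    problems = []
    for key in ("prompt", "messages", "prompt_id"):
        if key not in record:
            problems.append(f"missing key: {key}")
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
        return problems
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"messages[{index}] is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(
                f"messages[{index}] has unexpected role: {message.get('role')!r}"
            )
        if not isinstance(message.get("content"), str):
            problems.append(f"messages[{index}] content must be a string")
    return problems


good = {
    "prompt": "Hi",
    "messages": [
        {"content": "Hi", "role": "user"},
        {"content": "Hello!", "role": "assistant"},
    ],
    "prompt_id": "abc",
}
print(validate_chat_record(good))  # []
```

Running the check over every line of `train.jsonl` and `validation.jsonl` before creating the data assets would surface malformed rows early.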

In [None]:
# Install dependencies for downloading datasets from huggingface

%pip install datasets --upgrade
%pip install py7zr

In [None]:
# download the dataset using the helper script. This needs datasets library: https://pypi.org/project/datasets/
# For demo purposes, we are downloading only 1% of the dataset and creating train and validation splits.
import os
import shutil

dataset_dir = "ultrachat_200k_dataset"
shutil.rmtree(dataset_dir, ignore_errors=True)
exit_status = os.system(
    f"python ./download-dataset.py --dataset HuggingFaceH4/ultrachat_200k --download_dir {dataset_dir} --dataset_split_pc 1"
)
if exit_status != 0:
    raise Exception("Error downloading dataset")

os.rename(f"./{dataset_dir}/train_sft.jsonl", f"./{dataset_dir}/train.jsonl")
os.rename(f"./{dataset_dir}/test_sft.jsonl", f"./{dataset_dir}/validation.jsonl")

#### Create data inputs

In [None]:
from azure.ai.ml.entities import Data

dataset_version = "1"
train_dataset_name = f"{dataset_dir}_train"
try:
    train_data_asset = workspace_ml_client.data.get(
        train_dataset_name, version=dataset_version
    )
    print(f"Dataset {train_dataset_name} already exists")
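except Exception:
    # Lookup failed (typically because the asset does not exist yet), so register it.
    print("creating dataset")
    train_data = Data(
        path=f"./{dataset_dir}/train.jsonl",
        type=AssetTypes.URI_FILE,
        description="Training dataset",
        name=train_dataset_name,
        version=dataset_version,
    )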
    train_data_asset = workspace_ml_client.data.create_or_update(train_data)

In [None]:
from azure.ai.ml.entities import Data

dataset_version = "1"
validation_dataset_name = f"{dataset_dir}_validation"
try:
    validation_data_asset = workspace_ml_client.data.get(
        validation_dataset_name, version=dataset_version
    )
    print(f"Dataset {validation_dataset_name} already exists")
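except Exception:
    # Lookup failed (typically because the asset does not exist yet), so register it.
    print("creating dataset")
    validation_data = Data(
        path=f"./{dataset_dir}/validation.jsonl",
        type=AssetTypes.URI_FILE,
        description="Validation dataset",
        name=validation_dataset_name,
        version=dataset_version,
    )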
    validation_data_asset = workspace_ml_client.data.create_or_update(validation_data)

In [None]:
from azure.ai.ml.entities._inputs_outputs import Input

training_data = Input(
    type=train_data_asset.type,
    path=f"azureml://locations/{workspace.location}/workspaces/{workspace._workspace_id}/data/{train_data_asset.name}/versions/{train_data_asset.version}",
)
validation_data = Input(
    type=validation_data_asset.type,
    path=f"azureml://locations/{workspace.location}/workspaces/{workspace._workspace_id}/data/{validation_data_asset.name}/versions/{validation_data_asset.version}",
)
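
The two `path` values above follow the same URI pattern, which reappears later for the finetuned model. A small sketch of that pattern as a function is shown here; `workspace_asset_uri` is a hypothetical helper name, and the pattern itself is copied from the cell above rather than taken from any API reference.

```python
def workspace_asset_uri(location, workspace_id, kind, name, version):
    """Build an azureml:// URI for a workspace asset.

    kind is the asset collection segment, e.g. "data" or "models".
    """
    return (
        f"azureml://locations/{location}/workspaces/{workspace_id}"
        f"/{kind}/{name}/versions/{version}"
    )


# Placeholder values; in the notebook these come from the workspace and data asset objects.
uri = workspace_asset_uri("eastus2", "0000-1111", "data", "ultrachat_200k_dataset_train", "1")
print(uri)
```

In the notebook, the arguments would be `workspace.location`, `workspace._workspace_id`, and the `name` and `version` of the registered asset.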

### 4. Submit the fine tuning job using the model and data as inputs
 
Create FineTuning job using all the data that we have so far.

##### Create marketplace subscription for 3P models
Note: Skip this step for first-party (1P, Microsoft) models offered on Azure, for example the Phi family of models. It applies only to third-party (3P) models.

In [None]:
model_id_to_subscribe = "/".join(foundation_model.id.split("/")[:-2])
print(model_id_to_subscribe)

normalized_model_name = model_name.replace(".", "-")
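
To make the two string operations above concrete, here is a sketch applied to a made-up asset ID. The ID format and the version number are assumptions for illustration; the real value comes from `foundation_model.id`.

```python
def strip_version(asset_id):
    """Drop the trailing 'versions/<n>' segments, mirroring split('/')[:-2]."""
    return "/".join(asset_id.split("/")[:-2])


def normalize_name(model_name):
    """Replace dots with hyphens, as done for normalized_model_name."""
    return model_name.replace(".", "-")


# Hypothetical asset ID used only for this example.
example_id = (
    "azureml://registries/azureml-meta/models/Meta-Llama-3.1-8B-Instruct/versions/1"
)
print(strip_version(example_id))
print(normalize_name("Meta-Llama-3.1-8B-Instruct"))
```

The first call yields the version-less model ID used for the marketplace subscription, and the second yields the dot-free name used to build the subscription and endpoint names.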

In [None]:
from azure.ai.ml.entities import MarketplaceSubscription


subscription_name = f"{normalized_model_name}-sub"

marketplace_subscription = MarketplaceSubscription(
    model_id=model_id_to_subscribe,
    name=subscription_name,
)

# note: this will throw exception if the subscription already exists or subscription is not required (for example, if the model is not in the marketplace like Phi family)
try:
    marketplace_subscription = (
        workspace_ml_client.marketplace_subscriptions.begin_create_or_update(
            marketplace_subscription
        ).result()
    )
except Exception as ex:
    print(ex)

#### Define finetune parameters

##### The following parameters are required:

1. `model` - Base model to finetune.
2. `training_data` - Training data for finetuning the base model.
3. `validation_data` - Validation data for finetuning the base model.
4. `task` - FineTuning task to perform, e.g. TEXT_COMPLETION for text-generation finetuning jobs. This sample uses CHAT_COMPLETION.
5. `outputs` - Output registered model name.

##### The following parameters are optional:

1. `hyperparameters` - Parameters that control the FineTuning behavior at runtime.
2. `name` - FineTuning job name.
3. `experiment_name` - Experiment name for FineTuning job.
4. `display_name` - FineTuning job display name.
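
The cell below passes every hyperparameter value as a string. If you keep hyperparameters as numbers elsewhere in your code, a conversion step like the following sketch may be convenient. `to_hyperparameter_strings` is a hypothetical helper, and whether the service strictly requires string values is an assumption based only on how this sample writes them.

```python
def to_hyperparameter_strings(values):
    """Convert numeric hyperparameters to plain decimal strings."""
    converted = {}
    for name, value in values.items():
        if isinstance(value, float):
            # Fixed-point formatting avoids scientific notation such as '2e-05'.
            text = f"{value:.10f}".rstrip("0").rstrip(".")
        else:
            text = str(value)
        converted[name] = text
    return converted


hyperparameters = to_hyperparameter_strings(
    {
        "per_device_train_batch_size": 1,
        "learning_rate": 2e-5,
        "num_train_epochs": 1,
    }
)
print(hyperparameters)
```

The fixed-point format keeps ten decimal places before trimming zeros, so learning rates smaller than that would need a wider format.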

In [None]:
from azure.ai.ml.entities._job.finetuning.custom_model_finetuning_job import (
    CustomModelFineTuningJob,
)
import uuid
from azure.ai.ml._restclient.v2024_01_01_preview.models import (
    FineTuningTaskType,
)
from azure.ai.ml.entities._inputs_outputs import Output

guid = uuid.uuid4()
short_guid = str(guid)[:8]

finetuning_job = CustomModelFineTuningJob(
    task=FineTuningTaskType.CHAT_COMPLETION,
    training_data=training_data,
    validation_data=validation_data,
    hyperparameters={
        "per_device_train_batch_size": "1",
        "learning_rate": "0.00002",
        "num_train_epochs": "1",
    },
    model=model_to_finetune,
    display_name=f"ft-job-display-name-{short_guid}",
    name=f"ft-job-{short_guid}",
    experiment_name="ft-job-finetuning-experiment",
    outputs={
        "registered_model": Output(
            type="mlflow_model", name=f"ft-job-finetune-registered-{short_guid}"
        )
    },
)

In [None]:
created_job = workspace_ml_client.jobs.create_or_update(finetuning_job)
workspace_ml_client.jobs.get(created_job.name)

#### Wait for the above job to complete successfully

In [None]:
import time

# Poll until the job reaches a terminal state.
while True:
    status = workspace_ml_client.jobs.get(created_job.name).status
    print(f"Current job status: {status}")
    if status in ["Failed", "Completed", "Canceled"]:
        print(f"Job has finished with status: {status}")
        break
    print("Job is still running. Checking again in 30 seconds.")
    time.sleep(30)
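
The polling loop above runs until the job finishes, however long that takes. A variant with an upper time limit is sketched below. `wait_for_terminal_status` and its parameters are hypothetical names for this example; the terminal status strings are the ones used in the loop above.

```python
import time

TERMINAL_STATUSES = {"Failed", "Completed", "Canceled"}


def wait_for_terminal_status(get_status, poll_seconds=30, timeout_seconds=3600):
    """Call get_status() repeatedly until it returns a terminal status.

    Raises TimeoutError if no terminal status is seen within timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in status {status!r} after timeout")
        time.sleep(poll_seconds)


# Simulated status sequence standing in for calls to the jobs API.
fake_statuses = iter(["Running", "Running", "Completed"])
print(wait_for_terminal_status(lambda: next(fake_statuses), poll_seconds=0))
```

In the notebook, the callable would be something like `lambda: workspace_ml_client.jobs.get(created_job.name).status`.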

In [None]:
finetune_model_name = created_job.outputs["registered_model"]["name"]
finetune_model_name

In [None]:
# Deploy the model as a serverless endpoint

endpoint_name = f"{normalized_model_name}-ft-{short_guid}"  # Name must be unique
model_id = f"azureml://locations/{workspace.location}/workspaces/{workspace._workspace_id}/models/{finetune_model_name}/versions/1"

#### 5. Create serverless endpoint using the finetuned model

In [None]:
from azure.ai.ml.entities import ServerlessEndpoint

serverless_endpoint = ServerlessEndpoint(name=endpoint_name, model_id=model_id)

created_endpoint = workspace_ml_client.serverless_endpoints.begin_create_or_update(
    serverless_endpoint
).result()

In [None]:
endpoint = workspace_ml_client.serverless_endpoints.get(endpoint_name)
endpoint_keys = workspace_ml_client.serverless_endpoints.get_keys(endpoint_name)
auth_key = endpoint_keys.primary_key

In [None]:
import requests

url = f"{endpoint.scoring_uri}/v1/chat/completions"

payload = {
    "max_tokens": 1024,
    "messages": [
        {
            "content": "This script is great so far. Can you add more dialogue between Amanda and Thierry to build up their chemistry and connection?",
            "role": "user",
        }
    ],
}
headers = {"Content-Type": "application/json", "Authorization": f"{auth_key}"}

response = requests.post(url, json=payload, headers=headers)

response.json()
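
To pull just the generated text out of the response, something like the sketch below may work. It assumes the endpoint returns an OpenAI-style chat-completions body with a `choices` list whose entries contain `message.content`; that shape is an assumption based on the `/v1/chat/completions` path and is not confirmed by this sample. `extract_reply` is a hypothetical helper.

```python
def extract_reply(response_body):
    """Return the first assistant message from a chat-completions style body."""
    choices = response_body.get("choices") or []
    if not choices:
        raise ValueError(f"No choices in response: {response_body}")
    return choices[0]["message"]["content"]


# Hand-written example body, not real endpoint output.
sample_body = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Sure, here is more dialogue."}}
    ]
}
print(extract_reply(sample_body))
```

With the cells above, the call would be `extract_reply(response.json())`. Checking `response.status_code` first is worthwhile, because an error body will not contain `choices`.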