# Distillation Math with Large Language Models
 
### Notebook details
 
This sample demonstrates how to train a student model on the outputs of a teacher model, producing a distilled model.
 
We will use the Meta Llama 3.1 405B Instruct as the teacher model and the Meta Llama 3.1 8B Instruct as the student model.
 
**Note:**

- The distillation offering is available only in the **West US 3** region.
- Distillation should be used only with the single-turn chat completion format.
- The Meta Llama 3.1 405B Instruct model can be used only as a teacher model.
- The Meta Llama 3.1 8B Instruct model can be used only as a student (target) model.
- Distillation is currently supported only for the Natural Language Inference (NLI) task, a standard benchmark task for Natural Language Understanding.

**Prerequisites:**
- Subscribe to Meta Llama 3.1 405B Instruct and Meta Llama 3.1 8B Instruct. See [how to subscribe your project to the model offering in MS Learn](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio#subscribe-your-project-to-the-model-offering).

## Install the SDK v2

In [None]:
%pip install azure-ai-ml
%pip install azure-identity

%pip install mlflow
%pip install azureml-mlflow
%pip install datasets

## Import the required libraries

In [None]:
# import required libraries

import base64
import json

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import Data

## Prerequisites

An AI Studio project in **West US 3** is required. Follow the [prerequisites in the Llama fine-tuning guide](https://learn.microsoft.com/azure/ai-studio/how-to/fine-tune-model-llama?tabs=llama-two%2Cchatcompletion#prerequisites) to set up your AI Studio project.

## AI Studio project settings

Update the following cell with the details of the AI Studio project you just created.

In [None]:
SUBSCRIPTION_ID = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WORKSPACE_NAME = "<AML_WORKSPACE_NAME>"

## Configure credential

We use `DefaultAzureCredential` to access the workspace.
`DefaultAzureCredential` handles most Azure SDK authentication scenarios. If it cannot obtain a token, the cell falls back to `InteractiveBrowserCredential`.

In [None]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

## Get handle to AI Studio project

In [None]:
ml_client = MLClient(credential, SUBSCRIPTION_ID, RESOURCE_GROUP, WORKSPACE_NAME)

ai_project = ml_client._workspaces.get(ml_client.workspace_name)
ai_project._workspace_id

## Pick a teacher model

We support **Meta-Llama-3.1-405B-Instruct** as the teacher model.

### First deploy the teacher model in Azure AI Studio
* Go to [Azure AI Studio](https://ai.azure.com).
* Select the Meta-Llama-3.1-405B-Instruct model from the Model catalog.
* Deploy it with "Pay-as-you-go".
* Once the deployment succeeds, you should receive an API endpoint and a security key for inference.

Update the following cell with the information of the deployment you just created.

In [None]:
# Llama 3.1 405B teacher model name
# The serverless model name is the name found in ML Studio > Endpoints > Serverless endpoints > Model column
TEACHER_MODEL_NAME = "Meta-Llama-3.1-405B-Instruct"

# The serverless model endpoint name is the name found in ML Studio > Endpoints > Serverless endpoints > Name column
# The endpoint URL will be resolved from this name by the MLflow component
TEACHER_MODEL_ENDPOINT_NAME = "Meta-Llama-3-1-405B-Instruct-vum"

## Pick a student model

We will use **Meta-Llama-3.1-8B-Instruct** as the student model. Only chat completion models that are available for PayGo fine-tuning in Azure AI Studio are supported.

In [None]:
STUDENT_MODEL_NAME = "Meta-Llama-3.1-8B-Instruct"
STUDENT_MODEL_VERSION = 1

# retrieve student model from model registry
mlclient_azureml_meta = MLClient(credential, registry_name="azureml-meta")
student_model = mlclient_azureml_meta.models.get(
    STUDENT_MODEL_NAME, version=STUDENT_MODEL_VERSION
)

print(
    "\n\nUsing model name: {0}, version: {1}, id: {2} for fine tuning".format(
        student_model.name, student_model.version, student_model.id
    )
)

## Download the dataset from HuggingFace repo

For our example, we download and use the MultiArith dataset (https://huggingface.co/datasets/ChilleD/MultiArith) from HuggingFace.

In [None]:
from datasets import load_dataset

from abc import ABC


class InputDataset(ABC):
    def __init__(self):
        super().__init__()
        (
            self.train_data_file_name,
            self.test_data_file_name,
            self.eval_data_file_name,
        ) = (None, None, None)


class NLIHuggingFaceInputDataset(InputDataset):
    """
    Loads the HuggingFace dataset
    """

    def __init__(self):
        super().__init__()

    def load_hf_dataset(
        self,
        dataset_name,
        train_sample_size=10,
        val_sample_size=10,
        test_sample_size=10,
        train_split_name="train",
        val_split_name="validation",
        test_split_name="test",
    ):
        full_dataset = load_dataset(dataset_name)

        if val_split_name is not None:
            train_data = full_dataset[train_split_name].select(range(train_sample_size))
            val_data = full_dataset[val_split_name].select(range(val_sample_size))
            test_data = full_dataset[test_split_name].select(range(test_sample_size))
        else:
            train_val_data = full_dataset[train_split_name].select(
                range(train_sample_size + val_sample_size)
            )
            train_data = train_val_data.select(range(train_sample_size))
            val_data = train_val_data.select(
                range(train_sample_size, train_sample_size + val_sample_size)
            )
            test_data = full_dataset[test_split_name].select(range(test_sample_size))

        return train_data, val_data, test_data

In [None]:
# Define the train and validation sample sizes here. The dataset has no validation split,
# so we carve one out of the training split using a 90-10 ratio.
# Note: for the math task, the training and validation sets must each contain at least 40 entries.
train_sample_size = 378
val_sample_size = 42

# Sample notebook using the dataset: https://huggingface.co/datasets/ChilleD/MultiArith
dataset_name = "ChilleD/MultiArith"
input_dataset = NLIHuggingFaceInputDataset()

# Note: train_split_name and test_split_name can vary by dataset. They are passed as arguments to load_hf_dataset.
# If val_split_name is None, the function below splits the train set to create a validation set of the specified size.
train, val, _ = input_dataset.load_hf_dataset(
    dataset_name=dataset_name,
    train_sample_size=train_sample_size,
    val_sample_size=val_sample_size,
    train_split_name="train",
    val_split_name=None,
)

print("Len of train data sample is " + str(len(train)))
print("Len of validation data sample is " + str(len(val)))
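
The split arithmetic used when `val_split_name` is `None` can be illustrated without downloading anything. The following is a minimal sketch that uses a plain list in place of a HuggingFace dataset; `MIN_SAMPLES` and `split_train_val` are hypothetical names introduced here, not part of the notebook or of any library.

```python
# Stand-in illustration of the train/validation split; not the notebook's loader.
MIN_SAMPLES = 40  # minimum size noted above for math-task training or validation data


def split_train_val(rows, train_size, val_size):
    """Take the first train_size rows for training and the next val_size rows for validation."""
    if train_size < MIN_SAMPLES or val_size < MIN_SAMPLES:
        raise ValueError(f"Each split needs at least {MIN_SAMPLES} rows")
    if train_size + val_size > len(rows):
        raise ValueError("Not enough rows for the requested split sizes")
    return rows[:train_size], rows[train_size : train_size + val_size]


rows = list(range(420))  # placeholder rows: 378 + 42
train_rows, val_rows = split_train_val(rows, train_size=378, val_size=42)
print(len(train_rows), len(val_rows))  # 378 42
```

Because the validation rows are taken from the slice immediately after the training rows, the two sets never overlap.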

In [None]:
! mkdir -p data

In [None]:
train_data_path = "data/train_multiarith_378.jsonl"
valid_data_path = "data/valid_multiarith_42.jsonl"

system_prompt = "You are an AI assistant that only provides numerical answer to the given math question. \
Do not include reasoning, calculations, answer unit, mathematical operators (+, -, *, /, =), or any other extra words \
in your response. Please ensure your response is solely an integer that answers the question. If the answer is negative, \
include the negative sign; otherwise, do not use any sign."

user_prompt_template = "Question: {question}"

def write_chat_jsonl(rows, path):
    # Open the file once in write mode so that re-running this cell overwrites
    # the file instead of appending duplicate rows.
    with open(path, "w") as f:
        for row in rows:
            data = {
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {
                        "role": "user",
                        "content": user_prompt_template.format(
                            question=row["question"]
                        ),
                    },
                ]
            }
            f.write(json.dumps(data) + "\n")


write_chat_jsonl(train, train_data_path)
write_chat_jsonl(val, valid_data_path)
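
Before uploading, it can help to confirm that every line of a JSONL file is a single-turn record with one system message followed by one user message, and that the record count matches the intended sample size. The check below is a sketch only: `validate_single_turn_jsonl` is a hypothetical helper, and the demo writes a throwaway file rather than reading the notebook's data.

```python
import json
import os
import tempfile


def validate_single_turn_jsonl(path):
    """Return the number of records, raising ValueError on an unexpected message layout."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            record = json.loads(line)
            roles = [message["role"] for message in record["messages"]]
            if roles != ["system", "user"]:
                raise ValueError(f"line {line_number}: unexpected roles {roles}")
            count += 1
    return count


# Demo record with made-up content; in the notebook you would pass train_data_path instead.
sample = {
    "messages": [
        {"role": "system", "content": "Answer with an integer."},
        {"role": "user", "content": "Question: What is 2 + 3?"},
    ]
}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(json.dumps(sample) + "\n")
    record_count = validate_single_turn_jsonl(demo_path)

print(record_count)  # 1
```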

## Prepare data inputs

In [None]:
train_data = None
train_data_name = "math_train_multi_arith"

train_data = ml_client.data.create_or_update(
    Data(
        path=train_data_path,
        type=AssetTypes.URI_FILE,
        description="Training dataset",
        name=train_data_name,
    )
)

train_data_asset_id = f"azureml://locations/{ai_project.location}/workspaces/{ai_project._workspace_id}/data/{train_data.name}/versions/{train_data.version}"
train_data_asset_id

In [None]:
valid_data = None
valid_data_name = "math_valid_multi_arith"

valid_data = ml_client.data.create_or_update(
    Data(
        path=valid_data_path,
        type=AssetTypes.URI_FILE,
        description="validation dataset",
        name=valid_data_name,
    )
)

valid_data_asset_id = f"azureml://locations/{ai_project.location}/workspaces/{ai_project._workspace_id}/data/{valid_data.name}/versions/{valid_data.version}"
valid_data_asset_id
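
The two asset IDs above follow the same URI pattern, so the formatting could be factored into one function. This is a small sketch; `build_data_asset_id` is a hypothetical helper, and the location, workspace ID, and version passed to it are placeholder values.

```python
def build_data_asset_id(location, workspace_id, name, version):
    """Format a data asset ID using the same pattern as the cells above."""
    return (
        f"azureml://locations/{location}/workspaces/{workspace_id}"
        f"/data/{name}/versions/{version}"
    )


# Placeholder values for illustration only.
asset_id = build_data_asset_id(
    location="westus3",
    workspace_id="00000000-0000-0000-0000-000000000000",
    name="math_train_multi_arith",
    version="1",
)
print(asset_id)
```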

## Distillation strategy settings

We provide the option to use Chain of Thought (CoT) reasoning for distillation. CoT uses the teacher model's step-by-step reasoning ability to generate more accurate labels. The setting is passed to the pipeline as the string `"true"` or `"false"`.

In [None]:
ENABLE_CHAIN_OF_THOUGHT = "true"

## Configure distillation

In [None]:
mlclient_azureml = MLClient(credential, registry_name="azureml")

In [None]:
distillation_pipeline_name = "oss_distillation_pipeline"
distillation_pipeline_component = mlclient_azureml.components.get(
    name=distillation_pipeline_name
)

### Select Task Type

For math datasets, such as MultiArith (current dataset), where the answer is numeric, select `MATH` as the `data_generation_task_type`. 

Some math datasets expect a letter as the answer. For these datasets, use the [Math Q&A Notebook](../nlu_qa/distillation_qa_math.ipynb) instead.

In [None]:
@pipeline
def distillation_pipeline(
    teacher_model_endpoint_name: str,
    enable_chain_of_thought: str,
    system_properties: str,
    input_finetune_model: Input,
    train_file_path: Input,
    validation_file_path: Input = None,
):
    oss_distillation = distillation_pipeline_component(
        teacher_model_endpoint_name=teacher_model_endpoint_name,
        enable_chain_of_thought=enable_chain_of_thought,
        train_file_path=train_file_path,
        validation_file_path=validation_file_path,
        data_generation_task_type="MATH",
        # Finetune
        mlflow_model_path=input_finetune_model,
        model_asset_id=student_model.id,
        system_properties=system_properties,
        # Hyperparameters
        learning_rate=0.00002,
        per_device_train_batch_size=1,
        num_train_epochs=3,
    )

    return {"output_model": oss_distillation.outputs.output_model}

In [None]:
system_properties = {
    "finetune_oss": "True",
    "model_asset_id": student_model.id,
    "PipelineType": "Finetune",
    "azureml.PipelineType": "Finetune",
    "azureml.ModelName": student_model.name,
    "azureml.original_model_id": student_model.id,
    "azureml.trainingData.assetId": train_data_asset_id,
}

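# Compact separators remove the whitespace between tokens without touching spaces inside values
json_str = json.dumps(system_properties, separators=(",", ":"))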

system_properties_b64_encoded = base64.b64encode(json_str.encode("utf-8")).decode(
    "utf-8"
)
print(f"System properties => {system_properties_b64_encoded}")
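
The encoded string can be checked by decoding it back into a dictionary. The sketch below uses made-up property values (not the notebook's) to show the round trip, and to show why compact JSON separators are a safer way to drop whitespace than removing every space character: the two approaches give identical output only when no key or value contains a space.

```python
import base64
import json

# Hypothetical properties; one value deliberately contains a space.
props = {"azureml.ModelName": "demo model", "finetune_oss": "True"}

# Compact encoding: no whitespace between tokens, values left intact.
compact_json = json.dumps(props, separators=(",", ":"))
encoded = base64.b64encode(compact_json.encode("utf-8")).decode("utf-8")

# Round trip: decode the base64 text and parse the JSON again.
decoded = json.loads(base64.b64decode(encoded).decode("utf-8"))
print(decoded == props)  # True

# Removing every space also alters values that contain spaces.
stripped = json.loads(json.dumps(props).replace(" ", ""))
print(stripped["azureml.ModelName"])  # demomodel
```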

In [None]:
train_file_path_input = Input(type="uri_file", path=train_data.path)
validation_file_path_input = Input(type="uri_file", path=valid_data.path)
input_finetune_model = Input(type="mlflow_model", path=student_model.id)
experiment_name = f"distillation-{TEACHER_MODEL_NAME}".replace(".", "-")

finetuning_job = distillation_pipeline(
    teacher_model_endpoint_name=TEACHER_MODEL_ENDPOINT_NAME,
    enable_chain_of_thought=ENABLE_CHAIN_OF_THOUGHT,
    system_properties=system_properties_b64_encoded,
    input_finetune_model=input_finetune_model,
    train_file_path=train_file_path_input,
    validation_file_path=validation_file_path_input,
)

finetuning_job.properties.update(system_properties)
print(f"job property: {finetuning_job.properties}")

finetuning_job.display_name = f"finetune-{student_model.name}"
finetuning_job.experiment_name = experiment_name
finetuning_job.settings.default_compute_type = "serverless"
finetuning_job.continue_on_step_failure = False

## Submit pipeline job

In [None]:
# Submit pipeline job to workspace
ft_job = ml_client.jobs.create_or_update(finetuning_job)
print(f"Submitted job, progress available at {ft_job.studio_url}")

## Consuming the distilled model

Once the above job completes, you should be able to deploy the model and use it for inferencing. To deploy this model, do the following:

* Go to AI Studio
* Navigate to the Fine-tuning tab on the left menu
* In the list of models, select the model created by the distillation job
* This opens the details page, where you can see the model attributes and other details
* Click the Deploy button at the top of the page
* Follow the steps to deploy the model
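
Once the model is deployed, prompts would presumably follow the same single-turn layout as the training data, and the system prompt asks for a response that is solely an integer. The sketch below builds such a message list and checks a response string; it makes no network call, and `build_messages`, `parse_integer_answer`, and the sample question are hypothetical illustrations rather than part of the deployment API.

```python
import re


def build_messages(question, system_prompt):
    """Build a single-turn chat message list in the same layout as the training data."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Question: {question}"},
    ]


def parse_integer_answer(text):
    """Return the integer if the response is solely an integer, otherwise None."""
    match = re.fullmatch(r"\s*(-?\d+)\s*", text)
    return int(match.group(1)) if match else None


# Made-up question and shortened system prompt, for illustration only.
messages = build_messages(
    "Tom had 3 apples and bought 4 more. How many apples does he have?",
    "Respond with a single integer.",
)
print(messages[1]["content"])

print(parse_integer_answer("7"))  # 7
print(parse_integer_answer("-12\n"))  # -12
print(parse_integer_answer("7 apples"))  # None
```

Returning `None` for anything other than a bare integer makes it easy to count responses that ignore the formatting instruction.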