## Lab 04. Fine-Tune an LLM on a Custom Dataset Using AWS Trainium (`trn1`/`trn1n`) and SageMaker Studio

____
In this demo notebook, we show how to use the SageMaker Python SDK to fine-tune a pre-trained Llama 2 model on your own dataset, in either domain adaptation or instruction tuning format, and then deploy it on [AWS Trainium](https://aws.amazon.com/ec2/instance-types/trn1/) and [AWS Inferentia](https://aws.amazon.com/ec2/instance-types/inf2/) based instances.

AWS Neuron is an SDK with a compiler, runtime, and profiling tools that enables high-performance, cost-effective deep learning (DL) acceleration. It supports high-performance training on AWS Trainium-based Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instances. For model deployment, it supports high-performance, low-latency inference on AWS Inferentia-based Amazon EC2 Inf1 instances and AWS Inferentia2-based Amazon EC2 Inf2 instances. For details, see the [official Neuron documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html).

____

In [None]:
# The license string should read: accept_eula=true
with open("../studio-local-ui/custom_attribute.txt", "r") as f:
    custom_attribute = f.read().strip()
print(f"Your license condition is set to ---> {custom_attribute}")

In [None]:
%pip install --upgrade sagemaker datasets -q

### Temporary Workaround (until re:Invent 2023)

In [None]:
%pip install ./sagemaker-2.297.1.dev0-py2.py3-none-any.whl

In [None]:
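import os

# Temporary workaround: override the S3 buckets from which JumpStart reads
# model artifacts and gated (license-protected) content.
os.environ.update({
    "AWS_JUMPSTART_CONTENT_BUCKET_OVERRIDE": "jumpstart-cache-alpha-us-west-2",
    "AWS_JUMPSTART_GATED_CONTENT_BUCKET_OVERRIDE": "jumpstart-private-cache-prod-us-west-2",
})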

## Dataset Preparation for Fine-Tuning

---

You can fine-tune on a dataset in either domain adaptation format or instruction tuning format. The training data must be formatted as follows:

- **Input:** A train directory containing either a JSON Lines (`.jsonl`) or a text (`.txt`) file.
  - In a JSON Lines file, each line is a JSON object (a dictionary) that represents one training example. Each object must have the key `text`.
  - The train directory must contain exactly one file.
- **Output:** A trained model that can be deployed for inference.
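
Before uploading, it can help to check a local train directory against these rules. The sketch below is a hypothetical helper (`validate_train_dir` is not part of the SageMaker SDK) that enforces only the constraints listed above: exactly one file, a `.jsonl` or `.txt` extension, and a `text` key on every JSON Lines record. It does not inspect the contents of `.txt` files.

```python
import json
import tempfile
from pathlib import Path


def validate_train_dir(train_dir):
    """Check a local train directory against the input format described above.

    Returns the path of the single training file, or raises ValueError.
    """
    files = [p for p in Path(train_dir).iterdir() if p.is_file()]
    # The train directory must contain exactly one file.
    if len(files) != 1:
        raise ValueError(f"expected exactly one file, found {len(files)}")
    path = files[0]
    if path.suffix not in (".jsonl", ".txt"):
        raise ValueError(f"unsupported file extension: {path.suffix!r}")
    if path.suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                # json.loads raises JSONDecodeError (a ValueError) on malformed lines.
                record = json.loads(line)
                # Each line must be a JSON object with a "text" key.
                if not isinstance(record, dict) or "text" not in record:
                    raise ValueError(f"line {lineno}: expected an object with a 'text' key")
    return path


# Usage: build a one-file train directory and validate it.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train.jsonl").write_text(
        '{"text": "first example"}\n{"text": "second example"}\n', encoding="utf-8"
    )
    print(validate_train_dir(tmp).name)  # train.jsonl
```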

In this demo, we use a subset of the [Dolly dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) in instruction tuning format. Dolly contains roughly 15,000 instruction-following records in categories such as question answering, summarization, and information extraction. It is available under the Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0) license. We select the information extraction examples (set by the `task` variable below) for fine-tuning.

For a demonstration of using a text file as input, see [Appendix 2](#2.-Use-text-file-as-input-to-fine-tune-LLaMA-2).


---

<div class="alert alert-warning">
    We use only a small subset (the first 10%) of the original dataset. To train on the full dataset, change `train[:10%]` to `train`.
</div>

In [None]:
from datasets import load_dataset

dolly_dataset = load_dataset(
    "databricks/databricks-dolly-15k", 
    split="train[:10%]"
)

task = "information_extraction"
# To train on summarization or closed question answering instead, set task to "summarization" or "closed_qa".
task_dataset = dolly_dataset.filter(
    lambda example: example["category"] == task
)
task_dataset = task_dataset.remove_columns("category")

# Split the dataset; the test split is used for evaluation at the end.
train_and_test_dataset = task_dataset.train_test_split(test_size=0.1)

# Write the raw training split to a local JSON Lines file.
train_and_test_dataset["train"].to_json("train.jsonl")
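
A misspelled `task` (for example `"sumarization"`) matches no rows, so the filter silently returns an empty dataset. A small guard run before filtering catches this early. The set below lists the category labels used in databricks-dolly-15k; `check_task` is a hypothetical helper, not part of the `datasets` library.

```python
# Category labels used in databricks-dolly-15k.
DOLLY_CATEGORIES = frozenset({
    "brainstorming",
    "classification",
    "closed_qa",
    "creative_writing",
    "general_qa",
    "information_extraction",
    "open_qa",
    "summarization",
})


def check_task(task):
    """Return task unchanged if it is a Dolly category, otherwise raise ValueError."""
    if task not in DOLLY_CATEGORIES:
        raise ValueError(
            f"unknown Dolly category {task!r}; expected one of {sorted(DOLLY_CATEGORIES)}"
        )
    return task


print(check_task("information_extraction"))  # information_extraction
```

Calling `check_task(task)` right before the `filter` call turns a silent empty dataset into an immediate, readable error.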

In [None]:
train_and_test_dataset["train"][-1]

---
Next, we use a prompt template to format each record into an instruction / input / response layout for the training job. The same layout, without the response, is used to query the deployed endpoint.

---

In [None]:
prompt = (
    """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Response:\n{response}\n\n<s>"""
)

In [None]:
def apply_prompt_template(sample):
    return {
        "text": prompt.format(
            instruction=sample["instruction"], 
            context=sample["context"], 
            response=sample["response"]
        )
    }
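
At inference time, the evaluation code sends the prompt followed by `\n\n### Response:\n` and lets the model continue. For the model to see the same prefix it was trained on, the rendered training text should start with exactly that string. The sketch below checks this, assuming that is the intent; `TRAIN_TEMPLATE`, `INFERENCE_TEMPLATE`, and `prefixes_match` are hypothetical stand-alone copies and helpers, so paste your own `prompt` and `prompt_inference` strings in to check them.

```python
# Hypothetical stand-alone copies of the notebook's templates.
TRAIN_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}"
    "\n\n### Response:\n{response}\n\n<s>"
)
INFERENCE_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}"
)
RESPONSE_KEY = "\n\n### Response:\n"


def prefixes_match(train_template, sample):
    """True if the rendered training text begins with the exact inference prompt."""
    train_text = train_template.format(**sample)
    inference_prompt = INFERENCE_TEMPLATE.format(
        instruction=sample["instruction"], context=sample["context"]
    ) + RESPONSE_KEY
    return train_text.startswith(inference_prompt)


sample = {
    "instruction": "Which color is mentioned?",
    "context": "The sky was blue.",
    "response": "Blue.",
}
print(prefixes_match(TRAIN_TEMPLATE, sample))  # True
# Dropping the blank line before "### Response:" breaks the match.
print(prefixes_match(TRAIN_TEMPLATE.replace("{context}\n\n###", "{context}###"), sample))  # False
```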

Apply the prompt template to every record in the dataset. `remove_columns` drops the original columns, so each processed record contains only the `text` field.

In [None]:
dataset_processed = train_and_test_dataset.map(
    apply_prompt_template, 
    remove_columns=list(train_and_test_dataset["train"].features)
)

In [None]:
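import os

# Make sure the output directory exists before writing.
os.makedirs("dolly", exist_ok=True)
dataset_processed["train"].to_json(f"dolly/processed-train-{task}.jsonl")
dataset_processed["test"].to_json(f"dolly/processed-test-{task}.jsonl")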

### Upload Fine-Tuning Dataset to S3

In [None]:
from sagemaker.s3 import S3Uploader
import sagemaker

output_bucket = sagemaker.Session().default_bucket()
local_data_file = f"dolly/processed-train-{task}.jsonl"
train_data_location = f"s3://{output_bucket}/trn1_13b/dolly_dataset"
S3Uploader.upload(local_data_file, train_data_location)
print(f"Training data ---> {train_data_location}")
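
`S3Uploader.upload` treats `desired_s3_uri` as a prefix and appends the local file name, so the object lands one level below `train_data_location`. Because the training channel points at that prefix, uploading a second file there (for example after changing `task`) leaves two files in the channel and breaks the one-file rule. The hypothetical helper below predicts the object URI so you can compare it against the `aws s3 ls` listing; `my-bucket` is a placeholder.

```python
import posixpath
from pathlib import PurePath


def expected_s3_object_uri(local_file, desired_s3_uri):
    """Predict the object URI for a single-file upload (hypothetical helper).

    S3Uploader.upload treats desired_s3_uri as a prefix and appends the local
    file name to it.
    """
    return posixpath.join(desired_s3_uri.rstrip("/"), PurePath(local_file).name)


print(expected_s3_object_uri(
    "dolly/processed-train-information_extraction.jsonl",
    "s3://my-bucket/trn1_13b/dolly_dataset",
))
# s3://my-bucket/trn1_13b/dolly_dataset/processed-train-information_extraction.jsonl
```

Including `task` in the prefix is one way to keep each channel to a single file when experimenting with several categories.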

Quickly check that the dataset exists under the S3 prefix:

In [None]:
!aws s3 ls {train_data_location}/

## Fine-Tune Llama 2 Model 

---
Next, we fine-tune the Llama 2 13B model on the Dolly subset prepared above using an [AWS Trainium](https://aws.amazon.com/ec2/instance-types/trn1/) instance. You can choose between two instance types: `ml.trn1.32xlarge` (default) and `ml.trn1n.32xlarge`. The fine-tuning scripts are based on those provided by [Neuronx-Nemo-Megatron](https://github.com/aws-neuron/neuronx-nemo-megatron). For a list of supported hyperparameters and their default values, see [supported hyperparameters for fine-tuning](#3.-Supported-Hyper-parameters-for-fine-tuning).

---

In [None]:
model_id = "meta-textgenerationneuron-llama-2-13b"
model_version = "1.*"

In [None]:
from sagemaker import hyperparameters

my_hyperparameters = hyperparameters.retrieve_default(
    model_id=model_id, 
    model_version=model_version
)

print(my_hyperparameters)

In [None]:
# Uncomment to increase the sequence length (up to 4096).
# my_hyperparameters["max_input_length"] = "4096"

# Hyperparameter values are passed as strings.
my_hyperparameters["max_steps"] = "25"
my_hyperparameters["learning_rate"] = "0.0001"
my_hyperparameters["global_train_batch_size"] = "1000"
print(my_hyperparameters)
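
To reason about `global_train_batch_size`, it helps to see how a Megatron-style trainer splits it. Assuming the scripts follow the usual Megatron convention (presumably inherited through Neuronx-Nemo-Megatron, but not confirmed here), global batch = micro batch × data-parallel size × gradient-accumulation steps, and data-parallel size = number of workers ÷ (tensor-parallel degree × pipeline-parallel degree). The sketch assumes 32 NeuronCores (one `ml.trn1.32xlarge`) and hypothetical degrees `tp=8`, `pp=1` and micro batch 1; substitute the values printed above. `batch_plan` is a hypothetical helper.

```python
def batch_plan(global_batch, micro_batch, num_cores, tp, pp):
    """Split a Megatron-style global batch into data-parallel and accumulation parts.

    Assumes global_batch = micro_batch * data_parallel * grad_accum_steps and
    data_parallel = num_cores // (tp * pp).
    """
    model_parallel = tp * pp
    if num_cores % model_parallel:
        raise ValueError("num_cores must be divisible by tp * pp")
    data_parallel = num_cores // model_parallel
    per_pass = micro_batch * data_parallel  # sequences in one forward/backward pass
    if global_batch % per_pass:
        raise ValueError("global_batch must be divisible by micro_batch * data_parallel")
    return {"data_parallel": data_parallel, "grad_accum_steps": global_batch // per_pass}


# Hypothetical degrees: 32 NeuronCores, tp=8, pp=1, micro batch 1.
plan = batch_plan(global_batch=1000, micro_batch=1, num_cores=32, tp=8, pp=1)
print(plan)  # {'data_parallel': 4, 'grad_accum_steps': 250}

max_steps = 25
print(max_steps * 1000)  # 25000 sequences over the whole run
```

Under this convention, comparing `max_steps × global_train_batch_size` with the number of training rows gives a rough sense of how many passes the run makes over the data.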

Validate that the hyperparameters are compliant with the Llama 2 model:

In [None]:
hyperparameters.validate(
    model_id=model_id, model_version=model_version, hyperparameters=my_hyperparameters
)

In [None]:
from sagemaker.jumpstart.estimator import JumpStartEstimator

In [None]:
from sagemaker import get_execution_role

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    hyperparameters=my_hyperparameters,
    environment={"accept_eula": "true"},  # accept the Llama 2 EULA
    role=get_execution_role(),  # IAM role of this Studio session instead of a hard-coded ARN
)

In [None]:
estimator.fit(
    {"train": train_data_location}
)

## Deploy Fine-Tuned Model

---
Next, we deploy the fine-tuned model to a SageMaker endpoint and evaluate it on examples from the held-out test split.

---

In [None]:
finetuned_predictor = estimator.deploy()

## Evaluate Model

In [None]:
from IPython.display import display, Markdown

In [None]:
prompt_inference = (
    """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}"""
)

In [None]:
test_dataset = train_and_test_dataset["test"]

In [None]:
# For instruction fine-tuning, a special key separates the input from the output
input_output_demarcation_key = "\n\n### Response:\n"

for i, datapoint in enumerate(test_dataset.select(range(2))):

    payload = {
        "inputs": prompt_inference.format(
            instruction=datapoint["instruction"],
            context=datapoint["context"]
        )
        + input_output_demarcation_key,
        "parameters": {"max_new_tokens": 100},
    }
    finetuned_response = finetuned_predictor.predict(payload)
    # Show the prompt, the model's continuation, and the reference answer.
    display(Markdown(
        f"**Row: {i}**\n---\n{payload['inputs']} {finetuned_response['generated_text']}\n\n"
        f"**Ground truth:** {datapoint['response']}\n---\n"
    ))
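
To evaluate more than two rows, it is convenient to factor request construction into a function and to post-process the output. Because the training template appends `<s>` after each response, the model may keep generating past its answer until `max_new_tokens` is reached. The sketch below uses hypothetical helpers (`build_payload` and `trim_generation` are not SageMaker APIs), and `trim_generation` assumes that `<s>` or a new `### ` section marks the end of the answer.

```python
PROMPT_INFERENCE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}"
)
RESPONSE_KEY = "\n\n### Response:\n"


def build_payload(instruction, context, max_new_tokens=100):
    """Build the same request body as the evaluation loop above."""
    return {
        "inputs": PROMPT_INFERENCE.format(instruction=instruction, context=context)
        + RESPONSE_KEY,
        "parameters": {"max_new_tokens": max_new_tokens},
    }


def trim_generation(text, stops=("<s>", "\n\n### ")):
    """Cut generated text at the earliest stop marker (hypothetical post-processing)."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()


example = build_payload("Name the city.", "Paris is the capital of France.", max_new_tokens=50)
print(example["parameters"])  # {'max_new_tokens': 50}
print(trim_generation(" Paris\n\n<s>Below is an instruction"))  # Paris
```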


## Clean Up

In [None]:
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()