# Chapter 13 - SageMaker JumpStart: Fine-tuning Llama 3 for Summarization

## Overview
This notebook demonstrates how to fine-tune Meta's Llama 3 8B model for text summarization using Amazon SageMaker JumpStart. We use the summarization subset of the Databricks Dolly 15k dataset to improve the model's summarization behavior, deploy the fine-tuned model as a SageMaker endpoint, and compare its outputs with those of the original model.

## Prerequisites

- An AWS account with SageMaker access
- IAM permissions to use SageMaker JumpStart models
- A SageMaker execution role with S3 access
- Sufficient service quota for G5 instances in your AWS account (the fine-tuning job below uses `ml.g5.24xlarge`)


## Setup

### Install Required Dependencies

```python
!pip install --upgrade sagemaker datasets
```

### Initialize Model Parameters

```python
model_id, model_version = "meta-textgeneration-llama-3-8b", "2.*"
```
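The version string `"2.*"` is a wildcard: JumpStart resolves it to the newest available 2.x release of the model, so the notebook picks up minor and patch updates without code changes. The sketch below mimics that kind of resolution over a hypothetical list of version strings; it illustrates wildcard matching with numeric version ordering and is not the SageMaker SDK's implementation.

```python
from fnmatch import fnmatch


def resolve_version(pattern: str, available: list[str]) -> str:
    """Return the highest version in `available` matching `pattern`.

    Illustrative stand-in for wildcard version resolution, not the SDK's code.
    """
    matches = [v for v in available if fnmatch(v, pattern)]
    if not matches:
        raise ValueError(f"No version matches {pattern!r}")
    # Compare numerically so "2.10.0" ranks above "2.9.1"
    return max(matches, key=lambda v: tuple(int(part) for part in v.split(".")))


# Hypothetical version list, not the real JumpStart catalog
available_versions = ["1.0.0", "1.2.3", "2.0.0", "2.9.1", "2.10.0", "3.0.0"]
print(resolve_version("2.*", available_versions))  # 2.10.0
```

Sorting on integer tuples rather than raw strings matters here: a plain string comparison would rank `"2.9.1"` above `"2.10.0"`.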

## Deploy Base Model

### Initialize and Deploy Model

```python
from sagemaker.jumpstart.model import JumpStartModel

pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version)
# accept_eula=True accepts Meta's Llama 3 end-user license agreement (EULA).
# Review the EULA before deploying.
pretrained_predictor = pretrained_model.deploy(accept_eula=True)
```

```python
pretrained_predictor.endpoint_name
```

### Test Base Model

```python
def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response.get('generated_text')}")
    print("\n==================================\n")
```

```python
payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False,
    },
}
try:
    # Llama endpoints may require EULA acceptance in the request's custom attributes;
    # "accept_eula=true" confirms you have reviewed and accepted the Llama 3 EULA.
    response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=true"
    )
    print_response(payload, response)
except Exception as e:
    print(e)
```
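The `temperature` and `top_p` parameters in the payload control sampling. Temperature divides the logits before the softmax (values below 1.0 sharpen the distribution), and top-p (nucleus) sampling keeps only the smallest set of most-likely tokens whose cumulative probability reaches `top_p`, then samples from that set. The sketch below applies both to a hypothetical five-token vocabulary; the serving container implements this on the GPU and may differ in details such as tie handling.

```python
import math
import random


def top_p_nucleus(logits, temperature, top_p):
    """Return (token, probability) pairs kept after temperature + top-p filtering."""
    # Temperature scaling: values below 1.0 sharpen the distribution
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # Numerically stable softmax over the scaled logits
    max_logit = max(scaled.values())
    exps = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(logits, temperature, top_p, rng):
    kept = top_p_nucleus(logits, temperature, top_p)
    # random.choices treats weights as relative, which renormalizes them within the nucleus
    return rng.choices([tok for tok, _ in kept], weights=[p for _, p in kept], k=1)[0]


# Hypothetical next-token logits after "I believe the meaning of life is"
logits = {"to": 2.0, "a": 1.0, "the": 0.5, "42": 0.0, "love": -1.0}
print([tok for tok, _ in top_p_nucleus(logits, temperature=0.6, top_p=0.9)])  # ['to', 'a']
print([tok for tok, _ in top_p_nucleus(logits, temperature=1.0, top_p=0.9)])  # ['to', 'a', 'the', '42']
print(sample_next_token(logits, temperature=0.6, top_p=0.9, rng=random.Random(0)))
```

With the notebook's settings (`temperature=0.6`, `top_p=0.9`) only two candidates survive in this toy example; at `temperature=1.0` the flatter distribution keeps four.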

## Data Preparation

### Load and Process Dataset

```python
from datasets import load_dataset

dolly_dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

# To train for question answering or information extraction instead, change the filter
# condition below to example["category"] == "closed_qa" or "information_extraction".
summarization_dataset = dolly_dataset.filter(
    lambda example: example["category"] == "summarization"
)
summarization_dataset = summarization_dataset.remove_columns("category")

# Hold out 10% of the examples as a test set for the evaluation at the end.
train_and_test_dataset = summarization_dataset.train_test_split(test_size=0.1)

# Write the training split to a local JSON Lines file for upload to S3.
train_and_test_dataset["train"].to_json("train.jsonl")
```
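Conceptually, the `filter`, `remove_columns`, `train_test_split`, and `to_json` calls above amount to the pure-Python steps below. The records are hypothetical stand-ins for Dolly rows (which carry `instruction`, `context`, `response`, and `category` fields), and the shuffle-then-slice split is a simplified approximation of `train_test_split`, not its exact algorithm.

```python
import json
import random

# Hypothetical Dolly-style records: even indices are summarization examples
records = [
    {
        "instruction": f"Summarize passage {i}.",
        "context": f"Passage {i} text.",
        "response": f"Summary {i}.",
        "category": "summarization" if i % 2 == 0 else "open_qa",
    }
    for i in range(20)
]


def prepare_split(records, category, test_size, seed):
    # Keep only the requested category, then drop the now-redundant "category" field
    subset = [
        {k: v for k, v in r.items() if k != "category"}
        for r in records
        if r["category"] == category
    ]
    # Shuffle deterministically and hold out the last test_size fraction
    rng = random.Random(seed)
    rng.shuffle(subset)
    n_test = max(1, round(len(subset) * test_size))
    return subset[:-n_test], subset[-n_test:]


train, test = prepare_split(records, "summarization", test_size=0.1, seed=42)

# JSON Lines: one JSON object per line, the layout to_json("train.jsonl") produces
jsonl_text = "".join(json.dumps(row) + "\n" for row in train)
print(len(train), len(test))  # 9 1
```

Fixing the shuffle seed makes the held-out set reproducible across runs; `train_test_split` accepts a `seed` argument for the same purpose.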

```python
train_and_test_dataset["train"][0]
```

### Create Prompt Template

```python
import json

# Placeholders in braces must match the column names of the training records.
template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}
with open("template.json", "w") as f:
    json.dump(template, f)
```
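To see what a single training example looks like, the sketch below fills the template's placeholders for one hypothetical record with `str.format`. It assumes, as the template's `prompt`/`completion` structure suggests, that the training job joins the rendered prompt and completion into one sequence; any tokenization or special tokens added by the JumpStart training script are not shown.

```python
template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}

# Hypothetical record with the same fields as the Dolly training rows
record = {
    "instruction": "Summarize the passage in one sentence.",
    "context": "SageMaker JumpStart provides pretrained models that can be fine-tuned and deployed.",
    "response": "JumpStart offers pretrained models you can fine-tune and deploy.",
}

# str.format ignores unused keyword arguments, so passing the whole record is safe
prompt = template["prompt"].format(**record)
completion = template["completion"].format(**record)
print(prompt + completion)
```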

### Upload Training Data to S3

```python
from sagemaker.s3 import S3Uploader
import sagemaker

output_bucket = sagemaker.Session().default_bucket()
local_data_file = "train.jsonl"
train_data_location = f"s3://{output_bucket}/dolly_dataset"
# Upload the data and the template to the same S3 prefix used as the training channel.
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"Training data: {train_data_location}")
```
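Because each record's fields are substituted into the placeholders of `template.json`, a mismatch between placeholder names and column names only surfaces once the training job runs. A local pre-upload check such as the hypothetical helper below (not part of the SageMaker SDK) uses `string.Formatter` to list the placeholders and compares them with the keys of each JSON Lines record.

```python
import json
from string import Formatter


def template_fields(template):
    """Collect placeholder names such as {instruction} from every template string."""
    fields = set()
    for text in template.values():
        for _, field_name, _, _ in Formatter().parse(text):
            if field_name:  # None for literal-only chunks, "" for bare {}
                fields.add(field_name)
    return fields


def find_missing_fields(jsonl_text, template):
    """Return (line_number, missing_fields) for records lacking a placeholder field."""
    required = template_fields(template)
    problems = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue
        missing = required - set(json.loads(line))
        if missing:
            problems.append((line_no, sorted(missing)))
    return problems


template = {
    "prompt": "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}
sample_jsonl = (
    '{"instruction": "Summarize.", "context": "Some text.", "response": "Short."}\n'
    '{"instruction": "Summarize.", "response": "Short."}\n'
)
print(find_missing_fields(sample_jsonl, template))  # [(2, ['context'])]
# In the notebook: find_missing_fields(open("train.jsonl").read(), template)
```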

## Model Fine-tuning

### Initialize and Train the Model

```python
from sagemaker.jumpstart.estimator import JumpStartEstimator


estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},  # Accepts the Llama 3 EULA; review it first
    disable_output_compression=True,
    instance_type="ml.g5.24xlarge",  # For Llama-3-70b, use instance_type="ml.g5.48xlarge"
)
# Instruction tuning is disabled by default. Enable it so the job treats the uploaded
# records as an instruction-tuning dataset formatted with template.json.
estimator.set_hyperparameters(
    instruction_tuned="True", epoch="3", max_input_length="1024"
)
estimator.fit({"training": train_data_location})
```
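The hyperparameter values above are quoted because SageMaker's `CreateTrainingJob` API takes hyperparameters as a map of strings to strings. If you prefer to keep typed values in your own code, a small hypothetical converter like the one below produces the same strings the notebook passes; it is an illustration, not an SDK function.

```python
def to_hyperparameter_strings(params):
    """Convert typed values into the string-to-string map training jobs expect."""
    converted = {}
    for name, value in params.items():
        # bool is a subclass of int; str(True) == "True", matching the notebook's spelling
        if isinstance(value, (bool, int, float, str)):
            converted[name] = str(value)
        else:
            raise TypeError(
                f"Unsupported hyperparameter type for {name!r}: {type(value).__name__}"
            )
    return converted


hyperparameters = to_hyperparameter_strings(
    {"instruction_tuned": True, "epoch": 3, "max_input_length": 1024}
)
print(hyperparameters)  # {'instruction_tuned': 'True', 'epoch': '3', 'max_input_length': '1024'}
# estimator.set_hyperparameters(**hyperparameters) is then equivalent to the call above
```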

### Deploy Fine-tuned Model

```python
finetuned_predictor = estimator.deploy()
```

## Evaluation

### Compare Original and Fine-tuned Models

```python
import pandas as pd
from IPython.display import display, HTML

test_dataset = train_and_test_dataset["test"]

inputs = []
ground_truth_responses = []
responses_before_finetuning = []
responses_after_finetuning = []


def predict_and_record(datapoint):
    # For instruction fine-tuning, a special key separates the input from the response
    input_output_demarcation_key = "\n\n### Response:\n"

    payload = {
        "inputs": template["prompt"].format(
            instruction=datapoint["instruction"], context=datapoint["context"]
        )
        + input_output_demarcation_key,
        "parameters": {"max_new_tokens": 100},
    }
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])
    # The base Llama 3 endpoint may require EULA acceptance in the custom attributes
    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=true"
    )
    responses_before_finetuning.append(pretrained_response.get("generated_text"))
    # The fine-tuned Llama 3 endpoint does not require "accept_eula=true"
    finetuned_response = finetuned_predictor.predict(payload)
    responses_after_finetuning.append(finetuned_response.get("generated_text"))


try:
    # Query both endpoints on the first five held-out examples
    for datapoint in test_dataset.select(range(5)):
        predict_and_record(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from non-finetuned model": responses_before_finetuning,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
    display(HTML(df.to_html()))
except Exception as e:
    print(e)
```
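Reading five rows side by side is a quick sanity check; to put a number on the comparison, you could score each response against the ground truth. The sketch below computes ROUGE-1 F1 (clipped unigram overlap) in pure Python with simple lowercase tokenization. This metric is a swapped-in addition, not part of the workflow above, and libraries such as `rouge-score` add stemming and more careful tokenization, so their scores will differ somewhat.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep ASCII alphanumeric runs; a simplification of ROUGE tokenization.
    # `text or ""` tolerates None from response.get("generated_text").
    return re.findall(r"[a-z0-9]+", (text or "").lower())


def rouge1_f1(candidate, reference):
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # each unigram counted at most min(count) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mean_rouge1(candidates, references):
    scores = [rouge1_f1(c, r) for c, r in zip(candidates, references)]
    return sum(scores) / len(scores) if scores else 0.0


reference = "The cat sat on the mat."
print(round(rouge1_f1("The cat sat on the mat.", reference), 3))  # 1.0
print(round(rouge1_f1("A cat was on a mat.", reference), 3))  # 0.5
# In the notebook, compare:
#   mean_rouge1(responses_before_finetuning, ground_truth_responses)
#   mean_rouge1(responses_after_finetuning, ground_truth_responses)
```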

## Conclusion

In this notebook, we've successfully fine-tuned Llama 3 8B on summarization tasks using SageMaker JumpStart. The process involved:

1. Deploying a pre-trained Llama 3 model as a baseline
2. Preparing the Dolly dataset focused on summarization tasks
3. Creating appropriate prompt templates for instruction tuning
4. Configuring and executing the fine-tuning job
5. Deploying the fine-tuned model as an endpoint
6. Comparing the performance between the original and fine-tuned models

The side-by-side comparison lets you check whether fine-tuning moves the model toward more concise summaries that match the style and format of the training data. Five examples are illustrative only; scoring a larger test sample with an automatic metric such as ROUGE gives a more reliable measure of improvement.

This approach can be extended to other tasks like question answering, information extraction, or creative writing by selecting the appropriate subset of the Dolly dataset and adjusting the prompt template accordingly.
