### Fine-tune LLaMA 3 models on SageMaker JumpStart


---
In this demo notebook, we demonstrate how to use the SageMaker Python SDK to deploy a pre-trained Llama 3 model and to fine-tune it on your own dataset in either domain adaptation or instruction tuning format.

Note: This notebook is inspired by the Llama 2 fine-tuning notebook available here: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/llama-2-finetuning.ipynb

---

### Model License information
---
To perform inference on these models, you need to pass custom_attributes='accept_eula=true' as part of the request header. Doing so means you have read and accepted the end-user license agreement (EULA) of the model. The EULA can be found in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. **By default, this notebook sets custom_attributes='accept_eula=false', so all inference requests will fail until you explicitly change this custom attribute.**

Note: The custom_attributes used to pass the EULA are key/value pairs. Keys and values are separated by '=', and pairs are separated by ';'. If the same key is passed more than once, the last value is kept and passed to the script handler (in this case, it is used for conditional logic). For example, if 'accept_eula=false; accept_eula=true' is passed to the server, 'accept_eula=true' is kept and passed to the script handler.
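The "last value wins" rule can be illustrated with a small parser. `parse_custom_attributes` is a hypothetical helper written only for illustration; it is not the parsing code that runs on the SageMaker endpoint, and it assumes that values never contain ';' themselves.

```python
def parse_custom_attributes(header: str) -> dict:
    """Hypothetical helper: parse 'k1=v1; k2=v2' into a dict.

    Later occurrences of a key overwrite earlier ones, mirroring the
    "last value is kept" behavior described above.
    """
    attributes = {}
    for pair in header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing ';'
        key, _, value = pair.partition("=")
        attributes[key.strip()] = value.strip()
    return attributes


print(parse_custom_attributes("accept_eula=false; accept_eula=true"))
# {'accept_eula': 'true'}
```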

---

### Set up

---
We begin by installing and upgrading necessary packages. Restart the kernel after executing the cell below for the first time.

---

In [1]:
!pip install --upgrade sagemaker datasets

Collecting sagemaker
  Downloading sagemaker-2.221.1-py3-none-any.whl.metadata (14 kB)
Collecting datasets
  Downloading datasets-2.19.1-py3-none-any.whl.metadata (19 kB)
Downloading sagemaker-2.221.1-py3-none-any.whl (1.5 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.5/1.5 MB[0m [31m28.4 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hDownloading datasets-2.19.1-py3-none-any.whl (542 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m542.0/542.0 kB[0m [31m41.1 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: datasets, sagemaker
  Attempting uninstall: datasets
    Found existing installation: datasets 2.18.0
    Uninstalling datasets-2.18.0:
      Successfully uninstalled datasets-2.18.0
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 2.214.3
    Uninstalling sagemaker-2.214.3:
      Successfully uninstalled sagemaker-2.214.3
Successfully installed datasets-2.19.1 sagemaker-2.221.1


## Deploy Pre-trained Model

---

First, we deploy the Llama 3 8B Instruct model as a SageMaker endpoint. To train or deploy a 70B model, change model_id to "meta-textgeneration-llama-3-70b-instruct" or "meta-textgeneration-llama-3-70b".

---

In [2]:
model_id, model_version = "meta-textgeneration-llama-3-8b-instruct", "2.*"

In [6]:
from sagemaker.jumpstart.model import JumpStartModel

pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version)
accept_eula = False  # Change to True to accept the EULA and deploy the endpoint
pretrained_predictor = pretrained_model.deploy(accept_eula=accept_eula)

---------------!

## Invoke the endpoint

---
Next, we invoke the endpoint with some sample queries. Later in this notebook, we will fine-tune this model on a custom dataset and run inference with the fine-tuned model. We will also compare the results obtained from the pre-trained and fine-tuned models.

---

In [12]:
def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response['generated_text']}")
    print("\n==================================\n")

In [13]:
payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False,
        "stop": ["<|eot_id|>"]
    },
}
try:
    response = pretrained_predictor.predict(payload)
    print_response(payload, response)
except Exception as e:
    print(e)

I believe the meaning of life is
>  to find your purpose and pursue it with passion and dedication. It's to make a positive impact on the world and leave it a better place than when you entered it. It's to learn, grow, and evolve as a person, and to help others do the same. It's to find joy and fulfillment in the


{'generated_text': " to find your purpose and pursue it with passion and dedication. It's to make a positive impact on the world and leave it a better place than when you entered it. It's to learn, grow, and evolve as a person, and to help others do the same. It's to find joy and fulfillment in the"}
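The `stop` value `<|eot_id|>` in the payload above is the end-of-turn token used by Llama 3 Instruct models. The sample query is sent as plain text, but the instruct model is trained on a chat format built from special header tokens. The sketch below assembles a single-turn prompt in that format; it follows Meta's published Llama 3 prompt format, and `format_llama3_chat` is a hypothetical helper, so check the model card before relying on the exact token layout.

```python
def format_llama3_chat(user_message: str, system_prompt: str = "") -> str:
    """Hypothetical helper: build a single-turn Llama 3 Instruct chat prompt."""
    prompt = "<|begin_of_text|>"
    if system_prompt:
        prompt += f"<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
    prompt += f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
    # The model writes the assistant turn and ends it with <|eot_id|>,
    # which is why that token is passed as the stop sequence.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


chat_payload = {
    "inputs": format_llama3_chat("What is the meaning of life?"),
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False,
        "stop": ["<|eot_id|>"],
    },
}
print(chat_payload["inputs"])
```

With a deployed endpoint, `chat_payload` could then be sent the same way as the payload above, for example with `pretrained_predictor.predict(chat_payload)`.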


---
To learn about additional use cases of the pre-trained model, please check out the notebooks from other sessions: [RAG with Llama 3](https://github.com/aws-samples/Meta-Llama-on-AWS/tree/main/RAG-recipes) and, for more general text generation examples, [Text completion: Run Llama 2 models in SageMaker JumpStart](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/llama-2-text-completion.ipynb).

---

## Dataset preparation for fine-tuning

---

You can fine-tune on a dataset in either domain adaptation format or instruction tuning format. Please find more details in the section [Dataset instruction](#Dataset-instruction). In this demo, we use a subset of the [Dolly dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) in instruction tuning format. The Dolly dataset contains roughly 15,000 instruction-following records in various categories such as question answering, summarization, and information extraction. It is available under the Apache 2.0 license. We select the summarization examples for fine-tuning.


Training data is formatted in JSON Lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, but it can be split across multiple .jsonl files. The training folder can also contain a template.json file describing the input and output formats.
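The expected folder layout can be illustrated with the standard library alone. This is a minimal sketch under assumptions: the file names `part-0.jsonl` and `part-1.jsonl` and the sample records are made up for illustration. It writes one JSON object per line, spreads the samples across two files in the same folder, and reads them all back.

```python
import json
import tempfile
from pathlib import Path

# Made-up samples with the same keys as the Dolly records used in this notebook.
records = [
    {"instruction": "Summarize the text.", "context": "First passage.", "response": "First summary."},
    {"instruction": "Summarize the text.", "context": "Second passage.", "response": "Second summary."},
]

# A single training folder; the file names below are hypothetical.
train_dir = Path(tempfile.mkdtemp()) / "training"
train_dir.mkdir()

# Write each sample as one JSON object per line, split across two .jsonl files.
for i, record in enumerate(records):
    with open(train_dir / f"part-{i}.jsonl", "w") as f:
        f.write(json.dumps(record) + "\n")

# Read every sample back from all .jsonl files in the folder.
loaded = []
for path in sorted(train_dir.glob("*.jsonl")):
    with open(path) as f:
        loaded.extend(json.loads(line) for line in f if line.strip())

print(len(loaded))  # 2
```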

To train your model on an unstructured dataset (a collection of text files), please see the section [Example fine-tuning with Domain-Adaptation dataset format](#Example-fine-tuning-with-Domain-Adaptation-dataset-format) in the Appendix.

---

In [14]:
from datasets import load_dataset

dolly_dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

# To train for question answering or information extraction, change the filter condition in the next line to example["category"] == "closed_qa" or example["category"] == "information_extraction".
summarization_dataset = dolly_dataset.filter(lambda example: example["category"] == "summarization")
summarization_dataset = summarization_dataset.remove_columns("category")

# We split the dataset into two where test data is used to evaluate at the end.
train_and_test_dataset = summarization_dataset.train_test_split(test_size=0.1)

# Dumping the training data to a local file to be used for training.
train_and_test_dataset["train"].to_json("train.jsonl")

Downloading readme:   0%|          | 0.00/8.20k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/13.1M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/15011 [00:00<?, ? examples/s]

Filter:   0%|          | 0/15011 [00:00<?, ? examples/s]

Creating json from Arrow format:   0%|          | 0/2 [00:00<?, ?ba/s]

2049530

In [15]:
train_and_test_dataset["train"][0]

{'instruction': 'What products and services are offered by Falco electronics?',
 'context': "Falco's main business activities are the design and manufacture of power magnetics, semiconductors and circuitboards. In addition the company designs and manufactures common mode chokes, current sensors, gate drives, power inductors, line transformers, THT inductors, watt hour meters, lighting systems, printed computer boards, mechanical assembly systems, and also provides plastic molding, metal stamping and electronic manufacturing, OEM design and testing services. Falco is a major supplier to international OEMs and brand name electronics manufacturers alike. Falco has regionalized branches in Los Angeles and Miami in the United States; Munich, Germany; Milan, Desenzano, and Bologna, Italy; Manila, The Philippines, Bangalore, India; Xiamen, China and Hong Kong. Falco has manufacturing plants in Mexico, China and India.",
 'response': 'Falco designs and manufactures components used by Original 

---
Next, we create a prompt template that formats each record in an instruction/input format. The template is used by the training job (since we are instruction fine-tuning the model in this example) and also when running inference against the deployed endpoint.

---

In [16]:
import json

template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}
with open("template.json", "w") as f:
    json.dump(template, f)
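To check that the template placeholders line up with the dataset's column names, the template can be rendered for a single record in plain Python. This is a minimal sketch with a made-up record; it only shows how the {instruction}, {context}, and {response} placeholders are filled, and it does not claim to reproduce how the JumpStart training script internally combines the prompt and completion.

```python
# Same template as the cell above, repeated so this block runs on its own.
template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}

# A made-up record with the same keys as the Dolly samples.
sample = {
    "instruction": "Summarize the passage.",
    "context": "Falco designs and manufactures power magnetics and semiconductors.",
    "response": "Falco makes electronic components.",
}

# Fill the placeholders from the record's fields.
prompt = template["prompt"].format(instruction=sample["instruction"], context=sample["context"])
completion = template["completion"].format(response=sample["response"])
print(prompt + completion)
```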

### Upload dataset to S3
---

We upload the prepared dataset and the template file to S3, where they will be used for fine-tuning.

---

In [17]:
from sagemaker.s3 import S3Uploader
import sagemaker

output_bucket = sagemaker.Session().default_bucket()
local_data_file = "train.jsonl"
train_data_location = f"s3://{output_bucket}/dolly_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"Training data: {train_data_location}")

Training data: s3://sagemaker-us-west-2-975049888767/dolly_dataset


## Train the model
---
Next, we fine-tune the Llama 3 8B model on the summarization subset of Dolly. The fine-tuning scripts are based on the scripts provided by [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). To learn more about the fine-tuning scripts, please check out section [5. Few notes about the fine-tuning method](#5.-Few-notes-about-the-fine-tuning-method). For a list of supported hyperparameters and their default values, please see section [3. Supported Hyper-parameters for fine-tuning](#3.-Supported-Hyper-parameters-for-fine-tuning).

---
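The model choice and instance sizing can be collected in one place. The following is a minimal sketch: `build_estimator_kwargs` and `HYPERPARAMETERS` are hypothetical names (not part of the SageMaker SDK) that reproduce the arguments used in the training cell below, adding instance_type="ml.g5.48xlarge" for 70B model IDs as that cell's comment suggests. Hyperparameter values are kept as strings, because the SageMaker training API passes hyperparameters to the job as string key/value pairs.

```python
def build_estimator_kwargs(model_id: str, model_version: str = "2.*") -> dict:
    """Hypothetical helper: keyword arguments for JumpStartEstimator."""
    kwargs = {
        "model_id": model_id,
        "model_version": model_version,
        "environment": {"accept_eula": "true"},
        "disable_output_compression": True,
    }
    # Taken from the training cell's comment: 70B variants use ml.g5.48xlarge.
    # Without instance_type, JumpStart picks a default instance for the model.
    if "llama-3-70b" in model_id:
        kwargs["instance_type"] = "ml.g5.48xlarge"
    return kwargs


# Hyperparameters used in this notebook, all expressed as strings.
HYPERPARAMETERS = {
    "instruction_tuned": "True",
    "chat_dataset": "False",
    "epoch": "5",
    "max_input_length": "1024",
    "enable_fsdp": "True",
}

kwargs_8b = build_estimator_kwargs("meta-textgeneration-llama-3-8b-instruct")
kwargs_70b = build_estimator_kwargs("meta-textgeneration-llama-3-70b-instruct")
print("instance_type" in kwargs_8b, kwargs_70b["instance_type"])
# False ml.g5.48xlarge
```

With the SageMaker SDK installed, these could be applied as `JumpStartEstimator(**build_estimator_kwargs(model_id))` followed by `estimator.set_hyperparameters(**HYPERPARAMETERS)`.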

In [19]:
from sagemaker.jumpstart.estimator import JumpStartEstimator


estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},
    disable_output_compression=True,  # For Llama-3-70b, add instance_type="ml.g5.48xlarge"
)
# Instruction tuning is disabled by default; set instruction_tuned="True" to train on an instruction tuning dataset.
estimator.set_hyperparameters(
    instruction_tuned="True",
    chat_dataset="False",
    epoch="5",
    max_input_length="1024",
    enable_fsdp="True",
)
estimator.fit({"training": train_data_location})

No instance type selected for training job. Defaulting to ml.g5.12xlarge.
INFO:sagemaker.jumpstart:No instance type selected for training job. Defaulting to ml.g5.12xlarge.
INFO:sagemaker:Creating training-job with name: meta-textgeneration-llama-3-8b-instruct-2024-05-28-22-06-18-864


2024-05-28 22:06:19 Starting - Starting the training job...
2024-05-28 22:06:41 Pending - Training job waiting for capacity...
2024-05-28 22:07:05 Pending - Preparing the instances for training...
2024-05-28 22:07:39 Downloading - Downloading input data...........................
2024-05-28 22:12:20 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-05-28 22:12:22,120 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-05-28 22:12:22,156 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-05-28 22:12:22,166 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-05-28 22:12:22,168 sagemaker_pytorch_container.training INFO     Invoking user training script.
2024-05-28 22:12:3

**Studio kernel dying issue:** If your Studio kernel dies and you lose the reference to the estimator object, please see section [6. Studio Kernel Dead/Creating JumpStart Model from the training Job](#6.-Studio-Kernel-Dead/Creating-JumpStart-Model-from-the-training-Job) for how to deploy an endpoint using the training job name and the model ID.


### Deploy the fine-tuned model
---
Next, we deploy the fine-tuned model. We will then compare the performance of the fine-tuned and pre-trained models.

---

In [22]:
finetuned_predictor = estimator.deploy()

No instance type selected for inference hosting endpoint. Defaulting to ml.g5.12xlarge.
INFO:sagemaker.jumpstart:No instance type selected for inference hosting endpoint. Defaulting to ml.g5.12xlarge.
INFO:sagemaker:Creating model with name: meta-textgeneration-llama-3-8b-instruct-2024-05-28-23-44-10-997
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-3-8b-instruct-2024-05-28-23-44-10-994
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-3-8b-instruct-2024-05-28-23-44-10-994


-------------!

### Evaluate the pre-trained and fine-tuned model
---
Next, we use the test data to evaluate the performance of the fine-tuned model and compare it with the pre-trained model. 

---
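The evaluation cell that follows is qualitative: it displays the responses side by side. If a rough numeric signal is also useful, one option (not used in the original notebook) is a unigram-overlap F1 score between each ground-truth response and each model response, a simplified form of ROUGE-1. `unigram_f1` is a hypothetical helper under simplifying assumptions: whitespace tokenization, lowercasing, and no stemming or punctuation handling.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Hypothetical helper: unigram-overlap F1 (a simplified ROUGE-1)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(round(unigram_f1("the cat sat", "the cat ran"), 3))  # 0.667
```

After the evaluation cell has filled `ground_truth_responses` and `responses_after_finetuning`, per-example scores could be computed with `[unigram_f1(gt, r) for gt, r in zip(ground_truth_responses, responses_after_finetuning)]`, and likewise for `responses_before_finetuning`.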

In [24]:
import pandas as pd
from IPython.display import display, HTML

test_dataset = train_and_test_dataset["test"]

inputs, ground_truth_responses, responses_before_finetuning, responses_after_finetuning = (
    [],
    [],
    [],
    [],
)


def predict_and_print(datapoint):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarkation_key = "\n\n### Response:\n"

    payload = {
        "inputs": template["prompt"].format(
            instruction=datapoint["instruction"], context=datapoint["context"]
        )
        + input_output_demarkation_key,
        "parameters": {"max_new_tokens": 100},
    }
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])
    # To accept the EULA, change the custom_attributes value below to "accept_eula=true"
    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=false"
    )
    responses_before_finetuning.append(pretrained_response["generated_text"])
    # To accept the EULA, change the custom_attributes value below to "accept_eula=true"
    finetuned_response = finetuned_predictor.predict(payload, custom_attributes="accept_eula=false")
    responses_after_finetuning.append(finetuned_response["generated_text"])


try:
    for i, datapoint in enumerate(test_dataset.select(range(5))):
        predict_and_print(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from non-finetuned model": responses_before_finetuning,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
    display(HTML(df.to_html()))
except Exception as e:
    print(e)

#,Inputs,Ground Truth,Response from non-finetuned model,Response from fine-tuned model
0,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhich famous musicians played a Fender Stratocaster?\n\n### Input:\nA–E\nBillie Joe Armstrong (born 1972), lead singer and guitarist of Green Day, uses a heavily stickered Fernandes Stratocaster copy nicknamed ""Blue"". Armstrong modified this guitar with a Bill Lawrence humbucking pickup on the bridge position. After sustaining damage from mud during their performance in Woodstock '94, the bridge pickup was replaced with a Seymour Duncan JB. Blue was used on the recording of every Green Day album until Warning, and during live performances of Green Day's early work, such as their songs from Dookie. Armstrong also used a Fender Stratocaster from the Fender Custom Shop while recording Nimrod.\nRandy Bachman (born 1943), a founding member of both The Guess Who and Bachman–Turner Overdrive (BTO) who recently fronted the project ""Randy Bachman's Jazz Thing."" After a visit to a chiropractor, Bachman was persuaded to switch from a Gibson Les Paul to a lighter Stratocaster. He modified the pickups on his first Strat, putting a Gibson pickup at the neck and a Telecaster pickup at the bridge, while leaving the Stratocaster pickup in the middle. Randy favored Stratocasters and custom Strat-style guitars throughout his years with BTO. Though his bands are mostly known for their simplistic rock-radio anthems, Bachman's soloing often revealed complex melodies and jazz-inflected phrasing. Among his Stratocasters used are a '63 standard and a '71 four-bolt hardtail. He has listed guitar influences as varied as Lenny Breau, Leslie West, Wes Montgomery and Hank Marvin.\n\nJeff Beck in Amsterdam, 1979.\nJeff Beck (born 1944-2023) - a Grammy award-winning rock guitarist, Beck was known for playing for various bands such as the Yardbirds and his own group The Jeff Beck Group. 
Beck primarily played a Stratocaster and also has a signature Strat. He was noted for his innovative use of the Stratocaster's vibrato system. Up to 1975 Beck had been, primarily, a Les Paul player. In an interview with Jas Obrecht about switching to the Stratocaster, Beck stated, ""With a Les Paul you just wind up sounding like someone else. With the Strat I finally sound like me.""\nAdrian Belew (born 1949), is an American guitarist, singer, songwriter, multi-instrumentalist and record producer. He is perhaps best known for his work as a member of the progressive rock group King Crimson. He has also worked extensively as a session and touring musician, most famously with Talking Heads, David Bowie, Frank Zappa, and Nine Inch Nails. During much of his career, Belew made extensive use of a weathered-looking Stratocaster, later memorialized in song as ""The Battered Strat."" This guitar was relic'ed by Seymour Duncan.\n\nRitchie Blackmore in 1977.\nRitchie Blackmore (born 1945), a founding member of both Deep Purple and Rainbow, and currently a member of the band Blackmore's Night. After starting his career using various Höfner and Gibson guitars, Blackmore switched to a Stratocaster in the late 1960s after seeing Jimi Hendrix perform with one. Blackmore's Stratocasters are modified; the middle pickup is lowered and not used (sometimes disconnected completely) and his Stratocaster fingerboards are all scalloped from the 10th fret up. Through the early/mid 1970s Blackmore was notorious for onstage abuse of his guitars, sometimes destroying them completely. By the late 1970s the guitarist had found a Stratocaster model he was content with and it remained his main stage and studio guitar up until it had to be refretted.\nTommy Bolin (1951-1976), a versatile guitarist who is noted for his influence in genres ranging from acoustic blues to hard rock and jazz fusion. He was the lead guitarist for Zephyr, James Gang and Deep Purple. 
He also had a successful solo career, and collaborated with artists like Billy Cobham, Alphonse Mouzon and The Good Rats. Bolin played by ear and was known for his improvisational skill. His primary guitar was a stock 1963 Stratocaster.\n\nJoe Bonamassa in 2016.\nJoe Bonamassa (born 1977), a blues rock guitarist, has used Stratocasters throughout his career. When he was 12 years old, Bonamassa played a crimson 1972 Fender Stratocaster. Bonamassa is known for his extensive collection of vintage amplifiers and guitars. In 2018, Bonamassa has said that he has more than 1000 guitars, a large fraction of which are Fender Stratocasters.\nBill Carson (1926–2007), a country and western guitarist credited by Fender as ""the man for whom the Stratocaster was designed.""\nEric Clapton (born 1945), an English rock guitarist, originally played Gibson guitars early in his career. While he was still a member of Cream, Clapton bought his first Stratocaster, Brownie, in 1969, which was later used on ""Layla"". Blackie, a composite of three different guitars, went into service in 1970 and was regularly played until its retirement in 1985. It was sold at charity auction for $959,500 in 2004. In 1988, Fender introduced the Eric Clapton Stratocaster, the first model in their Signature series. Clapton has been a long-standing client of the Fender Custom Shop.[citation needed]\nKurt Cobain (1967–1994), lead singer and guitarist of grunge band Nirvana, used Fender Stratocasters throughout his career, using the guitar in the music video for ""Smells Like Teen Spirit"" and in the band's famous performance at the 1992 Reading Festival. 
Cobain's most well-known Stratocaster has a sticker on the body with the text ""VANDALISM: BEAUTIFUL AS A ROCK IN A COP'S FACE.""\n\nEric Clapton in a Switzerland concert on June 19, 1977.\nRy Cooder (born 1947), a guitarist, singer and composer who is well known for his interest in American folk music, his collaborations with other notable musicians, and his work on many film soundtracks. Cooder's bottleneck slide guitar playing, heard on such works as the soundtrack to the 1984 film Paris, Texas, influenced other guitarists such as Bonnie Raitt and Chris Rea and contributed to the popularity of the Stratocaster as a slide guitar. He uses a '60s Stratocaster for such playing.\nRobert Cray (born 1953), a long-time blues guitarist and singer, Cray plays a '64 Strat and had his own Signature model made in 1990. The signature model, manufactured by the Fender Custom Shop, combines aspects of Cray's '59 Strat and the '64, omits the standard Stratocaster whammy bar, and includes custom pickups.\nDick Dale (1937–2019), considered a pioneer of surf rock, was one of the first owners of a Stratocaster; his was given to him personally by Leo Fender in 1955. He has been revolutionary in experimenting with the sound of the guitar by using heavy reverb and a unique fast-picking style as heard on ""Misirlou"".\nThe Edge (born 1961), lead guitarist of U2, known for his percussive, melodic playing and use of delay, has used the Stratocaster as one of his main guitars throughout his career.\nF–J\n\nJohn Frusciante in 2006.\nJohn Frusciante (born 1970), the current guitarist of Red Hot Chili Peppers, Frusciante used many pre-70s Strats, with the most notable being his worn 1962 Stratocaster. 
Frusciante used Stratocasters in every Red Hot Chili Peppers album he was involved with, including Mother's Milk, Blood Sugar Sex Magik,and Californication.\n\nRory Gallagher in 1987\nRory Gallagher (1948–1995), an Irish blues rock guitarist, often credited as one of the most influential rock and blues guitarists of all time. Gallagher is well known for his worn 1961 sunburst Stratocaster. He described his battered Stratocaster as ""a part of my psychic makeup"". When asked about its importance, Gallagher said, ""B.B. King has owned over 100 Lucilles, but I only own one Strat, and it hasn't got a name."" Gallagher's Stratocaster has also been reproduced by the Fender Custom shop, to the exact specs of the original one.\nLowell George (1945–1979), primary guitarist and singer of Little Feat. Lowell was proficient on slide guitar employing his trademark tone which he achieved through use of compression and open tunings helping to define his soulful sound as well as giving him the means to play his extended melodic lines. Additionally, he used to swap the bridge pickups of his Stratocasters for Telecaster bridge pickups.\n\nDavid Gilmour in 2006.\nDavid Gilmour (born 1946), as a solo artist and guitar player for Pink Floyd, Gilmour is credited for his unique, blues-based compositional approach and expressive soloing. Author Tony Bacon stated ""his solo on 'Comfortably Numb' remains for many a definitive Strat moment."" Gilmour's guitar of choice is a custom modified Fender Stratocaster. He is the owner of Strat #0001, which was manufactured in 1954 but was not the first Stratocaster made since Fender does not use sequential serial numbers. Gilmour is considered to be one of the more influential Stratocaster players since the instrument's invention. 
David's signature black Stratocaster, used frequently in 1970s concerts and on the blockbuster albums The Dark Side of the Moon, Wish You Were Here, Animals and The Wall, is featured in a recent book by his long-time guitar tech Phil Taylor, titled Pink Floyd, The Black Strat—A History of David Gilmour's Black Stratocaster. The ""Black Strat"" was retired in the 1980s in favour of a Candy Apple Red American Vintage Stratocaster fitted with EMG noiseless single-coil pickups as seen on the Delicate Sound of Thunder and Pulse tours. The Black Strat was briefly used on the documentary Classic Albums: Dark Side of the Moon before being put on display at the Hard Rock Cafe in Miami, Florida. It was finally brought out of retirement by David in 2005 and fitted with a '83 Fender Stratocaster neck for the Pink Floyd reunion at the Live 8 concert. David subsequently used it again for his ""On An Island"" album and tour in 2006 and when he played ""Comfortably Numb"" with Roger Waters on his tour of ""The Wall"" on May 12, 2011, in London and also played most of the leads on the final Pink Floyd album The Endless River and his 2015 solo album Rattle That Lock and its tour.\n\nBuddy Guy in 1992.\nBuddy Guy (born 1936), an American blues guitarist and singer, Guy is well known for playing the Stratocaster throughout his long career. He is also known for his wild showmanship; Jimi Hendrix and Stevie Ray Vaughan both pointed to Guy as an influence on both their playing and their stage shows. Fender has issued several different variations of a Buddy Guy Signature Stratocaster since the early 1990s; the guitars generally have gold Lace Sensor pickups and modified circuitry.\nAlbert Hammond Jr. (born 1980), guitarist for The Strokes, uses a white Fender Stratocaster as his main guitar for recording and live use. Hammond bought the guitar in 1999 for $400, and used it to record albums such as Is This It and Room on Fire. 
In 2018, Fender released a signature model of Hammond's guitar, featuring a larger headstock and a modified pickup wiring scheme.\nGeorge Harrison (1943–2001), lead guitarist for the Beatles. Harrison and John Lennon obtained matching Sonic Blue Stratocasters in 1965. Unlike Lennon, Harrison employed his Stratocaster more often, using it as his main guitar during the recording sessions for Rubber Soul, Sgt. Pepper's Lonely Hearts Club Band, and the White Album. In 1967, Harrison hand-painted his Stratocaster with a psychedelic paint job, using Day-Glo paint on the body and his wife Pattie Boyd's nail polish on the headstock. The guitar's nickname, ""Rocky"", is painted on the headstock. Harrison can be seen playing Rocky in the Magical Mystery Tour film as well as The Concert for Bangla Desh.\n\nJimi Hendrix in 1967.\nJimi Hendrix (1942–1970), known for developing blues in a modern context, Hendrix's main stage guitar through most of his short career was a Fender Stratocaster. Although Hendrix played left-handed, he played a conventional right-handed Stratocaster flipped upside down, because he preferred to have the control knobs in the top position. Hendrix was responsible for a large increase in the Stratocaster's popularity during his career. In reference to his famed on-stage Stratocaster burning on the Monterey Pop Festival, Hendrix is quoted as saying, ""The time I burned my guitar it was like a sacrifice. You sacrifice the things you love. I love my guitar."" In 1990, the white Stratocaster used by Hendrix at the 1969 Woodstock Festival sold in a Sotheby's auction for $270,000, a record price at the time. In 1997 Fender produced a limited edition Hendrix tribute model Stratocaster.\nBuddy Holly (1936–1959), identified as ""the first Strat hero."" A statue of Holly in his home town of Lubbock, Texas, portrays him playing his Stratocaster, and the guitar is also engraved on his tombstone. 
Although the initial release of the Stratocaster came in 1954, the guitar did not begin to achieve popularity until Holly appeared on The Ed Sullivan Show in 1957 playing a maple-neck Strat. Holly was also pictured on the cover of The Crickets' 1957 album The ""Chirping"" Crickets with a sunburst Stratocaster, inspiring The Shadows' Hank Marvin to adopt the guitar.\nErnie Isley (born 1952), member of the American musical ensemble The Isley Brothers has developed three custom Zeal Stratocasters from Fender Custom Shop, using his personal design.\nEric Johnson (born 1954), a Grammy Award-winning guitarist from Austin, Texas, Johnson has played Stratocasters regularly during his career and has played many different types of music. He has participated in developing an Eric Johnson signature Stratocaster model with Fender, which can be bought with both maple and rosewood necks.\nK–P\n\nMark Knopfler in a Hamburg concert on May 28, 2006\n\nRocky Kramer performing live in 2018\n\nYngwie Malmsteen in Barcelona in 2008 concert\nEd King (1949–2018) is known for his work with the southern rock band Lynyrd Skynyrd from 1972 to 1975. He used a 1959 model with a black refinish and tortoise pickguard for most recordings and live performances at that time, and also a 1973 model which he used when writing the hit ""Sweet Home Alabama"".\nMark Knopfler (born 1949), known for his work with British rock band Dire Straits. Knopfler is known for his very particular and unique fingerstyle playing. The song ""Sultans of Swing"", from Dire Straits' debut album in 1978, was a huge hit that showed the characteristic tone and technique displayed on Knopfler's red Stratocaster. He used the Fender Stratocaster throughout his entire career, as a member of Dire Straits and his solo career. Fender now produces his Signature Stratocaster.\nGreg Koch (born 1966), known for his incendiary guitar work. Koch was a Fender clinician and ambassador. 
He played the Stratocaster for many years and even recorded an album called Strat's Got Your Tongue. He is known for his love of Fender guitars.\nRocky Kramer (born 1990) is known for being a Norwegian ""Master Guitarist,"" now living in the United States. Kramer has been described as a guitar virtuoso ""setting fire to the atmosphere with incandescent licks,"" as well as ""ne of the strongest and most poignant guitarists since Hendrix."" Kramer plays and endorses Fender Stratocaster guitars.\nBruce Kulick (born 1953), long-time member and lead guitarist of Kiss and Grand Funk Railroad. Kulick stated on his personal website that he used a Fender Power Stratocaster, a model with a humbucking pickup in place of the single-coil bridge pickup, to add a harmony solo line to his song, ""What Love's All About."" Kulick used a 1989 yellow Fender Strat Plus, during the recording of the 1992 Kiss Revenge album, including for the hit single, ""God Gave Rock 'n Roll to You II."" Revenge reached the Top 20 in several countries.\nMichael Landau (born 1958), friend of Steve Lukather and prolific session guitarist of the 1980s, has used many Stratocasters in his career and is working with Fender as of 2016 to create a Michael Landau Signature Stratocaster.\nJohn Lennon (1940–1980), the Beatles' rhythm guitarist, acquired matching Stratocasters with bandmate George Harrison during the 1965 sessions for Help!. However, Lennon rarely used his Stratocaster, which was notably played on ""Nowhere Man"" and during the Sgt. Pepper sessions. A different Strat was used on the Imagine album. John Lennon acquired a candy apple red ""Strat"" with 22 carat gold electroplated brass hardware around 1980. 
A photo of him playing this guitar in bed one morning in late 1980, shortly before his death, was used an inner sleeve of the album The John Lennon Collection.\nAlex Lifeson (born 1953), the guitarist for Rush since 1968, first recorded with a black Stratocaster on the Rush 1977 album A Farewell to Kings. In 1979, he modified the '77 Strat with a '57 classic humbucker, a Floyd Rose tremolo unit (first ever made), a Gibson toggle switch on the lower bout, and rewired with master volume/tone. He used that same guitar for the leads and direct recording for 1979's ""Permanent Waves."" In late 1980, Alex Lifeson acquired two more Strats in red and white, modifying them exactly the same as the former.\nYngwie Malmsteen (born 1963), known for his work in the neo-classical metal genre. Influenced by an array of musicians, Malmsteen is regarded as highly influential for his use of heavy classical-style chord progressions, interesting phrases and arpeggio sweeps. He is known for playing Stratocasters with scalloped fretboards.\nHank Marvin (born 1941), the lead guitarist of The Shadows, Marvin is reputed to be the owner of the first Fender Stratocaster in the UK (given to him by Cliff Richard). The guitar was finished in a shade of Fiesta Red, sometimes referred to as 'Salmon Pink'. This guitar, with its tremolo arm, contributed to the Shadows' distinctive sound. Guitarists such as David Gilmour and Mark Knopfler credit Marvin and The Shadows, who had ""the first Strat that came to England"", with influencing their own decisions to buy Stratocasters.\nJohn Mayer (born 1977), a Grammy Award-winning singer/songwriter, has played Stratocasters throughout his career and has had a Fender Artist Series Stratocaster made in both standard and limited edition form. Mayer's use of the Stratocaster in a wide range of musical genres is noted as a testament to the guitar's versatility. 
After tensions with Fender, he partnered with PRS Guitars to develop the PRS Silver Sky, a guitar heavily based on the Fender Stratocaster.\nMike Oldfield (born 1953), a British guitarist who plays a wide range of guitars and instruments. His ""Salmon-pink"" strat, bought at the time of his hit Moonlight Shadow, is his favorite guitar.\nQ–Z\n\nStevie Ray Vaughan performing in 1983\nTrevor Rabin (born 1954), a South African (now has American citizenship) rock guitarist and film score composer. Most well known for his time with Yes (1982-1995; 2015–present), Rabin owns and plays several Stratocasters, and considers it his go-to instrument.\nBonnie Raitt (born 1949), an American blues/R&B guitarist, singer, and songwriter, plays a 1965 Stratocaster nicknamed brownie, a 1963 sunburst Strat that used to be owned by Robin Trower as well as her signature Strat.\nRobbie Robertson (born 1943), guitarist and principal songwriter for The Band. Robertson's main guitar choice was a Stratocaster, despite using a Telecaster early in his career. For The Last Waltz Robertson had a Stratocaster bronzed especially for his use in the film. More recently Robertson made a very rare live appearance at Eric Clapton's 2007 Crossroads Guitar Festival using a Stratocaster.\nNile Rodgers (born 1952), an American musician known for his contributions with Chic and unique playing style that makes extensive use of the chop chord, has a 1960 Stratocaster affectionately dubbed as ""The Hitmaker"" for its presence on many hit singles.\nKenny Wayne Shepherd (born 1977 Kenneth Wayne Brobst), lead guitarist and lead/backup vocalist for The Kenny Wayne Shepherd Band. 
Born in Shreveport, Louisiana, Kenny started his playing career at age 16, while attending Caddo Magnet High School, and has performed internationally with many of the great blues legends.\nRichard Thompson (born 1949), an English musician best known for his finger-style guitar playing and songwriting, was a founding member of Fairport Convention before becoming a solo artist. For many years Thompson played a '59 Sunburst Stratocaster, with a maple '55 neck. That guitar is currently unserviceable and Thompson now uses a '64 sunburst Stratocaster with a rosewood fingerboard.\nPete Townshend (born 1945), the guitarist for The Who, used a Fender Stratocaster during the recording sessions for ""I Can See for Miles"" and The Who Sell Out. During the Monterey Pop Festival in 1967, Townshend smashed a Stratocaster after the Who's set, which was immediately followed by the Jimi Hendrix Experience's performance where Hendrix also destroys a Stratocaster. Townshend has exclusively used a modified version of the Fender Eric Clapton's Signature Stratocaster since 1989.\nRobin Trower (born 1945), a British rock guitarist known for his work in the band Procol Harum and his successful solo career, has his own Signature Stratocaster made by Fender. ""The sight of him onstage with his signature Stratocaster is as characteristic to his fans as his classic songs.""\n\nIke Turner in 1997.\nIke Turner (1931-2007), an American guitarist, musician, songwriter and record producer known for his work with the Ike & Tina Turner Revue and the Kings of Rhythm. Turner was an early adopter of the Stratocaster, buying one on its release in 1954. Unaware that the guitar's tremolo arm could be used to subtle effect, Turner used it to play screaming, swooping and diving solos. Turner explained his technique by saying: ""I thought it was to make the guitar scream—people got so excited when I used that thing."" Turner was also known to play Telecasters and Jaguars. 
In 2004 Fender Custom Shop produced an Ike Turner Signature Stratocaster, limited to 100.\nRitchie Valens (1941–1959), a pioneer of rock and roll mostly famous for his Latin Rock song ""La Bamba"", played with a sunburst Strat.\nEddie Van Halen (1955–2020), guitarist of hard rock band Van Halen, is notable for his ""Frankenstrat"", a crudely modified Stratocaster copy with the single-coil bridge pickup replaced with a PAF humbucker. This modification made the Frankenstrat one of the earliest Superstrats, which are guitars with a Stratocaster body but different features. A replica of his guitar is located in the National Museum of American History part of the Smithsonian Institution in Washington, D.C.\nStevie Ray Vaughan (1954–1990), known for his Texas blues style guitar playing, was an American blues guitarist who is considered highly influential for his driving guitar style soaked with rhythmic style playing. Vaughan was noted for playing a Stratocaster equipped with a left-handed vibrato system. He was also known for tuning his guitar down a half-step and using heavy .13 gauge strings.\nCory Wong (born 1985/86), rhythm guitarist for Vulfpeck, and known for his clean funk 'lead rhythm' guitar style, used Stratocasters throughout his career. In 2021, Fender released the Cory Wong Signature Stratocaster. A unique feature of this model is the 'fourth position panic button', which uses one of the tone controls to ""lock"" the guitar in fourth position, and disables the selector switch.\nRonnie Wood (born 1947), guitarist playing with the Rolling Stones since 1975. Wood's main stage instruments are 1954 and 1955 Sunburst Strats. He also uses a 52 Telecaster.\nJames ""J.Y."" Young (born 1949), guitarist of multi-platinum rock band Styx, has been using a Stratocaster as his principal guitar almost exclusively since 1967.\n\n\n\n### Response:\n","There are few guitars as iconic as the Fender Stratocaster. 
This guitar was originally developed in the early 1950's and has taken the world by storm. To name a few very notable Stratocaster players the list would have to include Jimi Hendrix, Stevie Ray Vaughn, Eric Clapton, John Mayer, Yngwie Malmsteen, Buddy Guy and John Frusciante. As you can see this list spans many genres and speaks to the flexibility of this instrument.","The famous musicians who played a Fender Stratocaster are:\n\n1. Billie Joe Armstrong (Green Day)\n2. Randy Bachman (The Guess Who, Bachman–Turner Overdrive)\n3. Jeff Beck (The Yardbirds, The Jeff Beck Group)\n4. Adrian Belew (King Crimson, Talking Heads, David Bowie, Frank Zappa, Nine Inch Nails)\n5. Ritchie Blackmore (Deep Purple, Rainbow, Blackmore's Night)\n6. Tommy Bol","The Fender Stratocaster is a versatile guitar that has been used by many famous musicians across a wide range of musical genres. Some notable musicians who have used the Stratocaster include Jimi Hendrix, Eric Clapton, David Gilmour, Mark Knopfler, Stevie Ray Vaughan, Eddie Van Halen, and Ronnie Wood. The Stratocaster has been used in many different musical genres, including rock, blues, jazz, and country."
1,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nPlease provide details on Brisbane International tournament with given text as a reference\n\n### Input:\nThe Brisbane International established in 2009 is a professional tennis tournament played on outdoor hardcourts in Brisbane, Queensland in Australia. It is a WTA Premier tournament of the WTA Tour and was part of the ATP World Tour 250 series of the ATP World Tour until 2019, when it was dropped from the ATP Tour.\n\nThe tournament is held annually in January at the Queensland Tennis Centre just before the first Grand Slam tournament of the season, the Australian Open (part of the Australian Open Series). It is owned by Tennis Australia.\n\n\n\n### Response:\n","1. The Brisbane International, which began in 2009, is an outdoor hardcourt professional tennis competition held in Brisbane, Queensland, Australia.\n2. It is a WTA Premier competition on the WTA Tour, and up until 2019, it was a part of the ATP World Tour 250 series on the ATP World Tour.\n3. The competition is conducted each year in January at the Queensland Tennis Center, right before the season's first Grand Slam event, the Australian Open. (part of the Australian Open Series).\n4. Tennis Australia is the owner.","The Brisbane International is a professional tennis tournament that has been held annually in Brisbane, Queensland, Australia since 2009. The tournament is played on outdoor hardcourts at the Queensland Tennis Centre and is a WTA Premier event on the WTA Tour. Prior to 2019, it was also an ATP World Tour 250 event on the ATP World Tour. The tournament is held in January, just before the Australian Open, and is owned by Tennis Australia. 
It is part of the Australian Open","The Brisbane International established in 2009 is a professional tennis tournament played on outdoor hardcourts in Brisbane, Queensland in Australia. It is a WTA Premier tournament of the WTA Tour and was part of the ATP World Tour 250 series of the ATP World Tour until 2019, when it was dropped from the ATP Tour.\n\nThe tournament is held annually in January at the Queensland Tennis Centre just before the first Grand Slam tournament of the season, the Australian Open (part of the Australian Open"
2,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat treatments is Iodine-125 used for?\n\n### Input:\nIodine-125 (125I) is a radioisotope of iodine which has uses in biological assays, nuclear medicine imaging and in radiation therapy as brachytherapy to treat a number of conditions, including prostate cancer, uveal melanomas, and brain tumors. It is the second longest-lived radioisotope of iodine, after iodine-129.\nIts half-life is 59.49 days and it decays by electron capture to an excited state of tellurium-125. This state is not the metastable 125mTe, but rather a lower energy state that decays immediately by gamma decay with a maximum energy of 35 keV. Some of the excess energy of the excited 125Te may be internally converted ejected electrons (also at 35 keV), or to x-rays (from electron bremsstrahlung), and also a total of 21 Auger electrons, which are produced at the low energies of 50 to 500 electron volts. Eventually, stable ground state 125Te is produced as the final decay product.\nIn medical applications, the internal conversion and Auger electrons cause little damage outside the cell which contains the isotope atom. The X-rays and gamma rays are of low enough energy to deliver a higher radiation dose selectively to nearby tissues, in ""permanent"" brachytherapy where the isotope capsules are left in place (125I competes with palladium-103 in such uses)\nBecause of its relatively long half-life and emission of low-energy photons which can be detected by gamma-counter crystal detectors, 125I is a preferred isotope for tagging antibodies in radioimmunoassay and other gamma-counting procedures involving proteins outside the body. 
The same properties of the isotope make it useful for brachytherapy, and for certain nuclear medicine scanning procedures, in which it is attached to proteins (albumin or fibrinogen), and where a half-life longer than that provided by 123I is required for diagnostic or lab tests lasting several days.\nIodine-125 can be used in scanning/imaging the thyroid, but iodine-123 is preferred for this purpose, due to better radiation penetration and shorter half-life (13 hours). 125I is useful for glomerular filtration rate (GFR) testing in the diagnosis or monitoring of patients with kidney disease. Iodine-125 is used therapeutically in brachytherapy treatments of tumors. For radiotherapy ablation of tissues that absorb iodine (such as the thyroid), or that absorb an iodine-containing radiopharmaceutical, the beta-emitter iodine-131 is the preferred isotope.\n\n\n\n### Response:\n","Iodine-125 is used in biological assays, nuclear medicine imaging and in radiation therapy as brachytherapy to treat a number of conditions, including prostate cancer, uveal melanomas, and brain tumors.","Iodine-125 is used to treat a number of conditions, including prostate cancer, uveal melanomas, and brain tumors, as part of brachytherapy. It is also used in biological assays, nuclear medicine imaging, and in radiation therapy to deliver a higher radiation dose selectively to nearby tissues. Additionally, it is used in gamma-counting procedures involving proteins outside the body, and in certain nuclear medicine scanning procedures. Furthermore, it is used in glomerular filtration rate (GFR","Iodine-125 is used in biological assays, nuclear medicine imaging and in radiation therapy as brachytherapy to treat a number of conditions, including prostate cancer, uveal melanomas, and brain tumors."
3,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is radioactive decay?\n\n### Input:\nRadioactive decay (also known as nuclear decay, radioactivity, radioactive disintegration, or nuclear disintegration) is the process by which an unstable atomic nucleus loses energy by radiation. A material containing unstable nuclei is considered radioactive. Three of the most common types of decay are alpha decay (α-decay), beta decay (β-decay), and gamma decay (γ-decay), all of which involve emitting one or more particles. The weak force is the mechanism that is responsible for beta decay, while the other two are governed by the electromagnetism and nuclear force. A fourth type of common decay is electron capture, in which an unstable nucleus captures an inner electron from one of the electron shells. The loss of that electron from the shell results in a cascade of electrons dropping down to that lower shell resulting in emission of discrete X-rays from the transitions. A common example is iodine-125 commonly used in medical settings.\n\n\n\n### Response:\n","Radioactive decay is the process by which an unstable atomic nucleus loses energy by radiation. It is also known by terms such as nuclear decay, radioactivity, radioactive disintegration, or nuclear disintegration.","Radioactive decay is the process by which an unstable atomic nucleus loses energy by radiation, resulting in the emission of particles such as alpha, beta, and gamma rays. This process occurs when an atom contains unstable nuclei, which are considered radioactive. There are four main types of radioactive decay: alpha decay, beta decay, gamma decay, and electron capture. These types of decay are governed by different forces, including the weak force, electromagnetism, and nuclear force. 
Radioactive decay is an important concept","Radioactive decay is the process by which an unstable atomic nucleus loses energy by radiation. A material containing unstable nuclei is considered radioactive. Three of the most common types of decay are alpha decay (α-decay), beta decay (β-decay), and gamma decay (γ-decay), all of which involve emitting one or more particles. The weak force is the mechanism that is responsible for beta decay, while the other two are governed by the electromagnetism and nuclear force. A fourth"
4,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat are the main characteristics of a Haiku poem?\n\n### Input:\nHaiku (俳句, listen (help·info)) is a type of short form poetry originally from Japan. Traditional Japanese haiku consist of three phrases composed of 17 phonetic units (called on in Japanese, which are similar to syllables) in a 5, 7, 5 pattern; that include a kireji, or ""cutting word""; and a kigo, or seasonal reference. Similar poems that do not adhere to these rules are generally classified as senryū.\n\nHaiku originated as an opening part of a larger Japanese poem called renga. These haiku written as an opening stanza were known as hokku and over time they began to be written as stand-alone poems. Haiku was given its current name by the Japanese writer Masaoka Shiki at the end of the 19th century.\n\nOriginally from Japan, haiku today are written by authors worldwide. Haiku in English and haiku in other languages have different styles and traditions while still incorporating aspects of the traditional haiku form. Non-Japanese haiku vary widely on how closely they follow traditional elements. Additionally, a minority movement within modern Japanese haiku (現代俳句, gendai-haiku), supported by Ogiwara Seisensui and his disciples, has varied from the tradition of 17 on as well as taking nature as their subject.\n\nIn Japanese, haiku are traditionally printed as a single line, while haiku in English often appear as three lines, although variations exist. There are several other forms of Japanese poetry related to haiku, such as tanka, as well as other art forms that incorporate haiku, such as haibun and haiga.\n\n\n\n### Response:\n",Haiku is a form of poetry which is composed of 17 syllables arranged in a 5 - 7 - 5 pattern. A Haiku includes a seasonal reference.,"The main characteristics of a Haiku poem are:\n\n1. 
**Syllable count:** 5 syllables in the first line, 7 syllables in the second line, and 5 syllables in the third line, for a total of 17 syllables.\n2. **Kireji (cutting word):** A word that divides the poem into two parts, often used to create a sense of tension or contrast.\n3. **Kigo (seasonal reference):** A","Haiku is a type of short form poetry originally from Japan. Traditional Japanese haiku consist of three phrases composed of 17 phonetic units (called on in Japanese, which are similar to syllables) in a 5, 7, 5 pattern; that include a kireji, or ""cutting word""; and a kigo, or seasonal reference. Similar poems that do not adhere to these rules are generally classified as senryū."


### Clean up resources

In [25]:
# Delete resources
pretrained_predictor.delete_model()
pretrained_predictor.delete_endpoint()
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()

INFO:sagemaker:Deleting model with name: meta-textgeneration-llama-3-8b-instruct-2024-05-28-21-34-49-698
INFO:sagemaker:Deleting endpoint configuration with name: meta-textgeneration-llama-3-8b-instruct-2024-05-28-21-34-49-701
INFO:sagemaker:Deleting endpoint with name: meta-textgeneration-llama-3-8b-instruct-2024-05-28-21-34-49-701
INFO:sagemaker:Deleting model with name: meta-textgeneration-llama-3-8b-instruct-2024-05-28-23-44-10-997
INFO:sagemaker:Deleting endpoint configuration with name: meta-textgeneration-llama-3-8b-instruct-2024-05-28-23-44-10-994
INFO:sagemaker:Deleting endpoint with name: meta-textgeneration-llama-3-8b-instruct-2024-05-28-23-44-10-994


# Appendix

### 1. Supported Inference Parameters

---
This model supports the following inference payload parameters:

* **max_new_tokens:** The model generates text until the output length (excluding the input context length) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **temperature:** Controls the randomness in the output. A higher temperature makes low-probability words more likely to be sampled, while a lower temperature favors high-probability words. As `temperature` approaches 0, decoding becomes greedy. If specified, it must be a positive float.
* **top_p:** At each step of text generation, sample from the smallest possible set of words whose cumulative probability reaches `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, the input text is included in the generated output. If specified, it must be a boolean. Default: False.

You may specify any subset of the parameters mentioned above while invoking an endpoint. 
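To make the constraints above concrete, here is a minimal sketch of a helper that builds a request body and checks each parameter before it is sent. It assumes the endpoint accepts a JSON body of the form `{"inputs": ..., "parameters": {...}}`; the helper name `build_payload` is hypothetical and not part of the SageMaker SDK.

```python
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    return_full_text: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build an inference payload, validating each parameter against the rules listed above.

    Assumption: the endpoint accepts {"inputs": <prompt>, "parameters": {...}}.
    Parameters left as None are omitted, so any subset can be sent.
    """
    params: Dict[str, Any] = {}
    if max_new_tokens is not None:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int) or max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be a positive integer")
        params["max_new_tokens"] = max_new_tokens
    if temperature is not None:
        if isinstance(temperature, bool) or not isinstance(temperature, (int, float)) or temperature <= 0:
            raise ValueError("temperature must be a positive float")
        params["temperature"] = float(temperature)
    if top_p is not None:
        if isinstance(top_p, bool) or not isinstance(top_p, (int, float)) or not 0 <= top_p <= 1:
            raise ValueError("top_p must be a float between 0 and 1")
        params["top_p"] = float(top_p)
    if return_full_text is not None:
        if not isinstance(return_full_text, bool):
            raise ValueError("return_full_text must be a boolean")
        params["return_full_text"] = return_full_text
    return {"inputs": prompt, "parameters": params}


payload = build_payload("What is radioactive decay?", max_new_tokens=256, temperature=0.6, top_p=0.9)
print(payload)
```

With a deployed predictor, a dict like this would be passed to `predict` together with the EULA custom attribute described at the start of the notebook, for example `predictor.predict(payload, custom_attributes="accept_eula=true")`.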


### Notes
- If `max_new_tokens` is not defined, the model may generate up to the maximum total tokens allowed, which is 4K for these models. This may result in endpoint query timeout errors, so we recommend setting `max_new_tokens` whenever possible. For 7B, 13B, and 70B models, we recommend setting `max_new_tokens` to no more than 1500, 1000, and 500 respectively, while keeping the total number of tokens below 4K.
- To support a 4K context length, this model restricts query payloads to a batch size of 1. Payloads with larger batch sizes receive an endpoint error before inference.
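The token budget in these notes is simple arithmetic: the generated tokens must fit in whatever the prompt leaves of the total limit, and should also stay under the per-size recommendation. The sketch below assumes "4K" means 4,096 tokens and that you have already counted the prompt tokens with the model's tokenizer; `safe_max_new_tokens` is a hypothetical helper, and only the 7B/13B/70B recommendations listed above are encoded.

```python
# Values taken from the notes above; "4K" is assumed to mean 4,096 tokens,
# treated here as a limit on input tokens plus generated tokens.
MAX_TOTAL_TOKENS = 4096
RECOMMENDED_MAX_NEW_TOKENS = {"7b": 1500, "13b": 1000, "70b": 500}


def safe_max_new_tokens(model_size: str, input_tokens: int) -> int:
    """Return the largest max_new_tokens that respects both the per-size
    recommendation and the total token budget."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    remaining = MAX_TOTAL_TOKENS - input_tokens
    if remaining <= 0:
        raise ValueError("the prompt alone already uses the full token budget")
    return min(RECOMMENDED_MAX_NEW_TOKENS[model_size], remaining)


print(safe_max_new_tokens("7b", 1000))  # 1500: the recommendation is the binding limit
print(safe_max_new_tokens("7b", 3000))  # 1096: the 4K budget is the binding limit
```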

---

### 2. Dataset formatting instruction for training

---

####  Fine-tune the Model on a New Dataset
We currently offer two types of fine-tuning: instruction fine-tuning and domain adaptation fine-tuning. You can switch between the two training methods by setting the `instruction_tuned` hyperparameter to 'True' or 'False'.


#### 2.1. Domain adaptation fine-tuning
The Text Generation model can also be fine-tuned on any domain-specific dataset. After being fine-tuned on the domain-specific dataset, the model
is expected to generate domain-specific text and solve various NLP tasks in that domain with **few-shot prompting**.
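Few-shot prompting means placing a handful of solved examples in the prompt and ending with a new, unsolved one, so the model continues the pattern. The sketch below only builds such a prompt string; the `Input:`/`Output:` labels, the helper name `few_shot_prompt`, and the example sentences are hypothetical choices for illustration, not a format required by the model.

```python
from typing import List, Tuple


def few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Join solved (input, output) pairs, then the new input with an empty output
    slot for the model to complete. The "Input:"/"Output:" labels are arbitrary."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


# Hypothetical sentiment-labelling examples in the style of financial filings
examples = [
    ("Net sales increased compared with the prior year.", "positive"),
    ("Operating income decreased compared with the prior year.", "negative"),
]
prompt = few_shot_prompt(examples, "Free cash flow improved compared with the prior year.")
print(prompt)
```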

Below are the instructions for how the training data should be formatted for input to the model.

- **Input:** A train and an optional validation directory. Each directory contains a CSV/JSON/TXT file. 
  - For CSV/JSON files, the train or validation data is used from the column called 'text' or the first column if no column called 'text' is found.
  - The train directory and the validation directory (if provided) should each contain exactly one file.
- **Output:** A trained model that can be deployed for inference. 
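Before uploading data, it can help to check a local directory against these rules. This is a minimal sketch under the rules stated above (exactly one CSV/JSON/TXT file per directory; use the `text` column if present, else the first column); the helper names are hypothetical, and the actual training container may apply additional checks.

```python
import csv
import os

ALLOWED_EXTENSIONS = {".csv", ".json", ".txt"}


def find_channel_file(channel_dir: str) -> str:
    """Return the single data file in a train/validation directory, raising if the
    directory does not contain exactly one CSV/JSON/TXT file."""
    files = [f for f in os.listdir(channel_dir) if os.path.isfile(os.path.join(channel_dir, f))]
    if len(files) != 1:
        raise ValueError(f"{channel_dir} must contain exactly one file, found {len(files)}")
    ext = os.path.splitext(files[0])[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type {ext!r}")
    return os.path.join(channel_dir, files[0])


def csv_text_column(csv_path: str) -> str:
    """Mirror the documented rule: use the 'text' column if present, else the first column."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    return "text" if "text" in header else header[0]
```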

Below is an example of a TXT file for fine-tuning the Text Generation model. The TXT file contains Amazon's SEC filings from 2021 to 2022.

```
Note About Forward-Looking Statements
This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions. Forward-looking
statements are based on current expectations and assumptions that are subject
to risks and uncertainties that may cause actual results to differ materially.
We describe risks and uncertainties that could cause actual results and events
to differ materially in “Risk Factors,” “Management’s Discussion and Analysis
of Financial Condition and Results of Operations,” and “Quantitative and
Qualitative Disclosures about Market Risk” (Part II, Item 7A of this Form
10-K). Readers are cautioned not to place undue reliance on forward-looking
statements, which speak only as of the date they are made. We undertake no
obligation to update or revise publicly any forward-looking statements,
whether because of new information, future events, or otherwise.
GENERAL
Embracing Our Future ...
```


#### 2.2. Instruction fine-tuning
The Text Generation model can be instruction-tuned on any text data provided that the data 
is in the expected format. The instruction-tuned model can then be deployed for inference. 

Below are the instructions for how the training data should be formatted for input to the model.

- **Input:** A train and an optional validation directory. Each directory should contain one or more JSON lines (`.jsonl`) files. The train directory can also contain an optional `*.json` file describing the input and output formats. 
  - The best model is selected according to the validation loss, calculated at the end of each epoch. If a validation set is not given, an (adjustable) percentage of the training data is automatically split and used for validation.
  - The training data must be in JSON lines (`.jsonl`) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, but it can be split across multiple `.jsonl` files. The `.jsonl` file extension is mandatory. The training folder can also contain a `template.json` file describing the input and output formats. If no template file is given, the following template is used:
  ```json
  {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}",
    "completion": "{response}"
  }
  ```
  - In this case, the data in the JSON lines entries must include `instruction`, `context`, and `response` fields. If a custom template is provided, it must also use the `prompt` and `completion` keys to define the input and output templates.
  Below is a sample custom template:

  ```json
  {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
  }
  ```
  Here, the data in the JSON lines entries must include `question`, `context`, and `answer` fields. 
- **Output:** A trained model that can be deployed for inference. 
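The template mechanism above amounts to filling `{field}` placeholders with values from each JSON lines record. The sketch below reproduces that with Python's `str.format` so you can preview how a record will be rendered and write the `.jsonl` and `template.json` files. The helper names are hypothetical, and the training container's own rendering code may differ in details; this is only a local preview under that assumption.

```python
import json
from typing import Dict, Iterable

# Default template from the section above; "\n" here is a real newline,
# matching the "\n" escape inside the JSON string.
DEFAULT_TEMPLATE = {
    "prompt": (
        "Below is an instruction that describes a task, paired with an input that provides "
        "further context. Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{context}"
    ),
    "completion": "{response}",
}


def render(template: Dict[str, str], record: Dict[str, str]) -> Dict[str, str]:
    """Fill one JSON lines record into a prompt/completion template.
    A missing field raises KeyError, which flags records that do not match the template."""
    return {key: template[key].format(**record) for key in ("prompt", "completion")}


def write_jsonl(records: Iterable[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line, as required for the .jsonl training files."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def write_template(template: Dict[str, str], path: str) -> None:
    """Save a custom template as template.json next to the training files."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(template, f, indent=2)


custom = {"prompt": "question: {question} context: {context}", "completion": "{answer}"}
print(render(custom, {"question": "Q", "context": "C", "answer": "A"}))
```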

---

#### 2.3. Example fine-tuning with Domain-Adaptation dataset format
---
We provide a subset of Amazon's SEC filings data in domain adaptation dataset format. It is downloaded from the publicly available [EDGAR](https://www.sec.gov/edgar/searchedgar/companysearch) database. Instructions for accessing the data are available [here](https://www.sec.gov/os/accessing-edgar-data).

License: [Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/legalcode).

Uncomment the following code to fine-tune the model on a dataset in domain adaptation format.

---

In [None]:
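# import boto3
# from sagemaker.jumpstart.estimator import JumpStartEstimator
# model_id = "meta-textgeneration-llama-2-7b"

# # instruction_tuned="False" selects domain adaptation fine-tuning
# estimator = JumpStartEstimator(model_id=model_id, environment={"accept_eula": "true"}, instance_type="ml.g5.24xlarge")
# estimator.set_hyperparameters(instruction_tuned="False", epoch="5")
# # Train on the public SEC filings subset hosted in the regional JumpStart bucket
# estimator.fit({"training": f"s3://jumpstart-cache-prod-{boto3.Session().region_name}/training-datasets/sec_amazon"})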

### 3. Supported Hyper-parameters for fine-tuning
---
- epoch: The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default: 5
- learning_rate: The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float. Default: 1e-4.
- instruction_tuned: Whether to instruction-train the model or not. Must be 'True' or 'False'. Default: 'False'
- per_device_train_batch_size: The batch size per GPU core/CPU for training. Must be a positive integer. Default: 4.
- per_device_eval_batch_size: The batch size per GPU core/CPU for evaluation. Must be a positive integer. Default: 1
- max_train_samples: For debugging purposes or quicker training, truncate the number of training examples to this value. A value of -1 means all training samples are used. Must be a positive integer or -1. Default: -1. 
- max_val_samples: For debugging purposes or quicker training, truncate the number of validation examples to this value. A value of -1 means all validation samples are used. Must be a positive integer or -1. Default: -1. 
- max_input_length: Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default: -1. 
- validation_split_ratio: If no validation channel is provided, the fraction of the training data that is split off for validation. Must be between 0 and 1. Default: 0.2. 
- train_data_split_seed: If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default: 0.
- preprocessing_num_workers: The number of processes to use for preprocessing. If None, the main process is used for preprocessing. Default: "None"
- lora_r: LoRA rank (r). Must be a positive integer. Default: 8.
- lora_alpha: LoRA alpha (scaling factor). Must be a positive integer. Default: 32.
- lora_dropout: LoRA dropout rate. Must be a positive float between 0 and 1. Default: 0.05. 
- int8_quantization: If True, model is loaded with 8 bit precision for training. Default for 7B/13B: False. Default for 70B: True.
- enable_fsdp: If True, training uses Fully Sharded Data Parallelism. Default for 7B/13B: True. Default for 70B: False.

Note 1: int8_quantization is not supported with FSDP. Also, setting both int8_quantization = 'False' and enable_fsdp = 'False' is not supported on any of the g5 family instances because of CUDA memory issues. We therefore recommend setting exactly one of int8_quantization or enable_fsdp to 'True'.

Note 2: Due to its size, the 70B model cannot be fine-tuned with enable_fsdp = 'True' on any of the supported instance types.
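Notes 1 and 2 can be encoded as a quick pre-flight check so that an unsupported combination fails locally instead of after the training job starts. This is a hypothetical helper written from the rules stated above, not something the fine-tuning scripts expose; it assumes the hyperparameters are passed as the strings 'True'/'False', as in the list above.

```python
# Documented defaults from the list above, as the string values SageMaker hyperparameters use.
DEFAULT_MEMORY_SETTINGS = {
    "7b": {"int8_quantization": "False", "enable_fsdp": "True"},
    "13b": {"int8_quantization": "False", "enable_fsdp": "True"},
    "70b": {"int8_quantization": "True", "enable_fsdp": "False"},
}


def _as_bool(name: str, value: str) -> bool:
    """Accept only the exact strings 'True' and 'False'."""
    if value not in ("True", "False"):
        raise ValueError(f"{name} must be 'True' or 'False', got {value!r}")
    return value == "True"


def check_memory_settings(model_size: str, int8_quantization: str, enable_fsdp: str) -> None:
    """Raise ValueError for combinations that Notes 1 and 2 describe as unsupported."""
    int8 = _as_bool("int8_quantization", int8_quantization)
    fsdp = _as_bool("enable_fsdp", enable_fsdp)
    if int8 and fsdp:
        raise ValueError("int8_quantization is not supported with FSDP")
    if not int8 and not fsdp:
        raise ValueError("disabling both is expected to run out of CUDA memory on g5 instances")
    if model_size == "70b" and fsdp:
        raise ValueError("the 70B model cannot be fine-tuned with enable_fsdp='True'")


for size, settings in DEFAULT_MEMORY_SETTINGS.items():
    check_memory_settings(size, **settings)  # the documented defaults all pass
```

Once validated, the same strings would be passed to `estimator.set_hyperparameters(...)`, as in the domain adaptation cell in section 2.3.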

---

### 4. Supported Instance types

---
The SageMaker team tested the scripts on the following instance types during Llama 2 fine-tuning. We expect Llama 3 fine-tuning to work on the same instance types; more thorough testing is planned.

- 8B: ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge
- 70B: ml.g5.48xlarge

Other instance types may also work for fine-tuning. Note: On p3 instances, training is done in 32-bit precision because bfloat16 is not supported on these instances. As a result, a training job consumes about twice as much CUDA memory on p3 instances as on g5 instances.
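The "about twice as much memory" figure follows from bytes per parameter: float32 uses 4 bytes and bfloat16 uses 2. The sketch below computes only the memory needed to hold the weights, using a nominal 8 billion parameters for the 8B model; activations, gradients, and optimizer state add to this, so it is an illustration of the ratio rather than a full memory estimate.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed just to hold the model weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


n = 8e9  # nominal parameter count of the 8B model
print(weight_memory_gb(n, "float32"))   # 32.0 (p3: 32-bit training)
print(weight_memory_gb(n, "bfloat16"))  # 16.0 (g5: bfloat16 training)
```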

---

### 5. A few notes about the fine-tuning method

---
- Fine-tuning scripts are based on [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). 
- Instruction tuning dataset is first converted into domain adaptation dataset format before fine-tuning. 
- Fine-tuning scripts use Fully Sharded Data Parallel (FSDP) as well as the Low-Rank Adaptation (LoRA) method to fine-tune the models.
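The `lora_r` and `lora_alpha` hyperparameters from section 3 come from the standard LoRA formulation: a weight matrix W (d x k) is frozen, and only a low-rank update B (d x r) times A (r x k), scaled by alpha / r, is trained. The plain-Python sketch below illustrates that formula and the resulting parameter savings. Which layers the fine-tuning scripts actually adapt is not stated here, and the 4096 x 4096 shape is a hypothetical example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def lora_merged_weight(w: Matrix, b: Matrix, a: Matrix, alpha: float, r: int) -> Matrix:
    """Effective weight W + (alpha / r) * B @ A, where W is d x k (frozen),
    B is d x r and A is r x k (the only trained parameters)."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Number of trainable entries in B (d x r) plus A (r x k)."""
    return r * (d + k)


# Defaults from section 3: lora_r=8, lora_alpha=32, so the update is scaled by 32 / 8 = 4.0
print(lora_trainable_params(4096, 4096, 8))  # 65536, versus 4096 * 4096 = 16777216 frozen weights
```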

---

### 6. Studio Kernel Dead/Creating JumpStart Model from the training Job
---
Due to the size of the Llama 70B model, the training job may take several hours, and the Studio kernel may die during the training phase. However, the training job keeps running in SageMaker during this time. If this happens, you can still deploy the endpoint using the training job name with the following code.

To find the training job name, go to Console -> SageMaker -> Training -> Training Jobs, identify the training job name, and substitute it in the following cell. 

---

In [None]:
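# from sagemaker.jumpstart.estimator import JumpStartEstimator
# training_job_name = "<<training_job_name>>"  # replace with the name shown in the SageMaker console
# # Set model_id to the same JumpStart model ID that was used for training

# attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
# attached_estimator.logs()  # stream the training job logs
# attached_estimator.deploy()  # deploy the fine-tuned model to an endpoint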