# Fine-tune Llama-2 models on SageMaker JumpStart

---
In this demo notebook, we demonstrate how to use the SageMaker Python SDK to deploy a pre-trained Llama 2 model and to fine-tune it on your own dataset in either domain adaptation or instruction tuning format.

---

## Set up

---
We begin by installing and upgrading necessary packages. Restart the kernel after executing the cell below for the first time.

---

In [2]:
%pip install --upgrade sagemaker datasets

Collecting sagemaker
  Downloading sagemaker-2.207.1-py3-none-any.whl.metadata (13 kB)
Downloading sagemaker-2.207.1-py3-none-any.whl (1.4 MB)
Installing collected packages: sagemaker
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 2.207.0
    Uninstalling sagemaker-2.207.0:
      Successfully uninstalled sagemaker-2.207.0
Successfully installed sagemaker-2.207.1
Note: you may need to restart the kernel to use updated packages.


## Deploy Pre-trained Model

---

First, we deploy the Llama-2 model as a SageMaker endpoint. This notebook uses the 13B model; to train or deploy the 7B or 70B model instead, change `model_id` to "meta-textgeneration-llama-2-7b" or "meta-textgeneration-llama-2-70b", respectively.

For successful deployment, you must manually set the `accept_eula` argument in the model's `deploy` method to `True`.

---
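
The mapping from model size to `model_id` can be kept in one place. The sketch below is only an illustration: `llama2_model_id` is a hypothetical helper, not part of the SageMaker SDK, and it builds nothing more than the identifier strings used in this section.

```python
# Hypothetical helper (not part of the SageMaker SDK): map a Llama 2 size
# to the JumpStart model identifier strings used in this section.
SUPPORTED_SIZES = ("7b", "13b", "70b")


def llama2_model_id(size: str) -> str:
    """Return the JumpStart text-generation model ID for a Llama 2 size."""
    normalized = size.strip().lower()
    if normalized not in SUPPORTED_SIZES:
        raise ValueError(f"Unsupported size {size!r}; expected one of {SUPPORTED_SIZES}")
    return f"meta-textgeneration-llama-2-{normalized}"


print(llama2_model_id("13B"))
```

The returned string can be assigned to `model_id` in the cell below.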

In [3]:
model_id = "meta-textgeneration-llama-2-13b"

In [4]:
model_version = "3.*"

In [35]:
from sagemaker.jumpstart.model import JumpStartModel

pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version)

No instance type selected for inference hosting endpoint. Defaulting to ml.g5.12xlarge.
INFO:sagemaker.jumpstart:No instance type selected for inference hosting endpoint. Defaulting to ml.g5.12xlarge.


In [36]:
pretrained_predictor = pretrained_model.deploy(accept_eula=True, instance_type='ml.g5.24xlarge')

INFO:sagemaker:Creating model with name: meta-textgeneration-llama-2-13b-2024-02-09-00-34-14-600
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-2-13b-2024-02-09-00-34-36-474
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-2-13b-2024-02-09-00-34-36-474


-----------!

In [7]:
pretrained_predictor

<sagemaker.base_predictor.Predictor at 0x7f045bb2db40>

In [38]:
# new
pretrained_predictor

<sagemaker.base_predictor.Predictor at 0x7f03f46f0550>

## Invoke the endpoint

---
Next, we invoke the endpoint with some sample queries. Later in this notebook, we fine-tune this model on a custom dataset and run inference with the fine-tuned model. We also compare the results from the pre-trained and fine-tuned models.

---

In [8]:
example_payloads = pretrained_model.retrieve_all_examples()
example_payloads

[JumpStartSerializablePayload at 0x7f0410549620: {'content_type': 'application/json', 'accept': 'application/json', 'body': {'inputs': 'I believe the meaning of life is', 'parameters': {'max_new_tokens': 64, 'top_p': 0.9, 'temperature': 0.6, 'decoder_input_details': True, 'details': True}}},
 JumpStartSerializablePayload at 0x7f04105491c0: {'content_type': 'application/json', 'accept': 'application/json', 'body': {'inputs': 'Simply put, the theory of relativity states that ', 'parameters': {'max_new_tokens': 64, 'top_p': 0.9, 'temperature': 0.6}}},
 JumpStartSerializablePayload at 0x7f04105484a0: {'content_type': 'application/json', 'accept': 'application/json', 'body': {'inputs': 'A brief message congratulating the team on the launch:\n\nHi everyone,\n\nI just ', 'parameters': {'max_new_tokens': 64, 'top_p': 0.9, 'temperature': 0.6}}},
 JumpStartSerializablePayload at 0x7f0410549e40: {'content_type': 'application/json', 'accept': 'application/json', 'body': {'inputs': 'Translate Engli

In [9]:
for payload in example_payloads:
    response = pretrained_predictor.predict(payload.body)
    print("\nInput\n", payload.body, "\n\nOutput\n", response[0]["generated_text"], "\n\n===============")


Input
 {'inputs': 'I believe the meaning of life is', 'parameters': {'max_new_tokens': 64, 'top_p': 0.9, 'temperature': 0.6, 'decoder_input_details': True, 'details': True}} 

Output
  to live life to the fullest.
I believe in the power of love, the power of friendship, the power of family, the power of hard work, the power of goodness, the power of hope, the power of faith, and the power of prayer.
I believe in the power of dreams 


Input
 {'inputs': 'Simply put, the theory of relativity states that ', 'parameters': {'max_new_tokens': 64, 'top_p': 0.9, 'temperature': 0.6}} 

Output
 1) nothing can travel faster than the speed of light, and 2) the speed of light is constant. These two statements are not contradictory.
The speed of light is constant because it is the maximum speed that anything can travel. This is because, as we’ve learned in earlier sections, mass and 


Input
 {'inputs': 'A brief message congratulating the team on the launch:\n\nHi everyone,\n\nI just ', 'parameters

---
To learn about additional use cases of the pre-trained model, please check out the notebook [Text completion: Run Llama 2 models in SageMaker JumpStart](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/llama-2-text-completion.ipynb).

---

## Dataset preparation for fine-tuning

---

You can fine-tune on a dataset in either domain adaptation format or instruction tuning format. Please find more details in the section [Dataset instruction](#Dataset-instruction). In this demo, we use a subset of the [Dolly dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) in instruction tuning format. The Dolly dataset contains roughly 15,000 instruction-following records across categories such as question answering, summarization, and information extraction. It is available under the Apache 2.0 license. We select the summarization examples for fine-tuning.


Training data is formatted in JSON Lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder; however, it can be split across multiple .jsonl files. The training folder can also contain a template.json file describing the input and output formats.

To train your model on a collection of unstructured data (text files), please see the section [Example fine-tuning with Domain-Adaptation dataset format](#Example-fine-tuning-with-Domain-Adaptation-dataset-format) in the Appendix.

---
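
To make the JSON Lines layout concrete, here is a minimal sketch that writes two made-up records to a `.jsonl` file and checks that every line parses as a dictionary with the expected keys. The helper names and the sample records are assumptions for illustration; the real training data comes from the Dolly dataset loaded below.

```python
import json
import os
import tempfile

# Two made-up records using the same fields as the Dolly examples.
records = [
    {"instruction": "Summarize the text.", "context": "A long passage.", "response": "A summary."},
    {"instruction": "Name the city.", "context": "Busan is a city in Korea.", "response": "Busan."},
]
required_keys = {"instruction", "context", "response"}


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_and_check_jsonl(path, keys):
    """Read a .jsonl file and verify every line is a dict containing `keys`."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            if not isinstance(row, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            missing = keys - set(row)
            if missing:
                raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
            rows.append(row)
    return rows


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "train.jsonl")
    write_jsonl(path, records)
    checked = read_and_check_jsonl(path, required_keys)

print(len(checked))
```

Running a check like this before uploading can catch malformed lines early, before a training job is started.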

In [10]:
from datasets import load_dataset

dolly_dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

In [11]:
dolly_dataset

Dataset({
    features: ['instruction', 'context', 'response', 'category'],
    num_rows: 15011
})

In [12]:
# To train for question answering or information extraction instead, change the filter condition on the next line to example["category"] == "closed_qa" or example["category"] == "information_extraction".
summarization_dataset = dolly_dataset.filter(lambda example: example["category"] == "summarization")
summarization_dataset = summarization_dataset.remove_columns("category")

In [13]:
summarization_dataset

Dataset({
    features: ['instruction', 'context', 'response'],
    num_rows: 1188
})

In [14]:
# Split the dataset into train and test sets; the test set is used for evaluation at the end.
train_and_test_dataset = summarization_dataset.train_test_split(test_size=0.1)

In [15]:

# Dumping the training data to a local file to be used for training.
train_and_test_dataset["train"].to_json("train.jsonl")

Creating json from Arrow format:   0%|          | 0/2 [00:00<?, ?ba/s]

2122360

In [16]:
train_and_test_dataset["train"][0]

{'instruction': 'Where does the name Busan (city in Korea) come from?',
 'context': 'The name "Busan" is the Revised Romanization of the city\'s Korean name since the late 15th century. It officially replaced the earlier McCune-Reischauer romanization Pusan in 2000. During the Japanese period it was spelled "Fuzan".  The name 釜山 (now written 부산 using the Korean alphabet) is Sino-Korean for "Cauldron Mountain", believed to be a former name of Mt Hwangryeong (황령산, 荒嶺山, Hwangryeong-san) west of the city center. The area\'s ancient state Mt Geochil (거칠산국, 居柒山國, Geochilsan-guk, "Rough-Mountain Land") is similarly thought to refer to the same mountain, which towers over the town\'s harbor on the Suyeong. (The later Silla district of Geochilsan-gun was renamed Dongnae in 757.)',
 'response': '"Busan" is the romanization of the city\'s Korean name - 부산.  Previously, the name was romanized as "Pusan" until it was officially replaced in 2000.  The meaning of the name in Sino-Korean is "Cauldron 

---
Next, we create a prompt template that puts the data into an instruction/input format, both for the training job (since this example uses instruction fine-tuning) and for inference against the deployed endpoint.

---

In [17]:
import json

template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}
with open("template.json", "w") as f:
    json.dump(template, f)
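
To see what the placeholders expand to, the sketch below renders one made-up record with the same template strings. It only illustrates the string substitution; how the training container applies `template.json` internally is not shown in this notebook, and `render_example` is a hypothetical helper.

```python
# Same template strings as in the cell above, repeated so this sketch runs on its own.
template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}

# A made-up record with the same fields as the Dolly summarization subset.
example = {
    "instruction": "Summarize the text.",
    "context": "A long passage.",
    "response": "A summary.",
}


def render_example(template, record):
    """Fill the prompt and completion placeholders from one record."""
    prompt = template["prompt"].format(**record)
    completion = template["completion"].format(**record)
    return prompt, completion


prompt, completion = render_example(template, example)
print(prompt)
print(repr(completion))
```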

### Upload dataset to S3
---

We upload the prepared dataset to S3, where the fine-tuning job will read it.

---

In [18]:
from sagemaker.s3 import S3Uploader
import sagemaker


output_bucket = sagemaker.Session().default_bucket()
local_data_file = "train.jsonl"
train_data_location = f"s3://{output_bucket}/dolly_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"Training data: {train_data_location}")

Training data: s3://sagemaker-us-east-1-732939366832/dolly_dataset


## Train the model
---
Next, we fine-tune the Llama 2 13B model on the summarization dataset from Dolly. The fine-tuning scripts are based on the scripts provided by [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). To learn more about the fine-tuning scripts, please check out section [5. Few notes about the fine-tuning method](#5.-Few-notes-about-the-fine-tuning-method). For a list of supported hyper-parameters and their default values, please see section [3. Supported Hyper-parameters for fine-tuning](#3.-Supported-Hyper-parameters-for-fine-tuning).

---

In [23]:
from sagemaker.jumpstart.estimator import JumpStartEstimator


estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},
    disable_output_compression=True,  # For Llama-2-70b, add instance_type = "ml.g5.48xlarge"
)
# By default, instruction tuning is set to false. To train on an instruction tuning dataset, set instruction_tuned="True".
estimator.set_hyperparameters(instruction_tuned="True", epoch="5", max_input_length="1024")

No instance type selected for training job. Defaulting to ml.g5.24xlarge.
INFO:sagemaker.jumpstart:No instance type selected for training job. Defaulting to ml.g5.24xlarge.


In [24]:
estimator.fit({"training": train_data_location})

INFO:sagemaker:Creating training-job with name: meta-textgeneration-llama-2-13b-2024-02-08-22-47-42-159


2024-02-08 22:47:42 Starting - Starting the training job
2024-02-08 22:47:42 Pending - Training job waiting for capacity......
2024-02-08 22:48:15 Pending - Preparing the instances for training.......................................
2024-02-08 22:54:53 Downloading - Downloading input data....................................
2024-02-08 23:01:10 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-02-08 23:01:11,393 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-02-08 23:01:11,459 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-02-08 23:01:11,468 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-02-08 23:01:11,470 sagemaker_pytorch_container.training INFO     Invoking user

Studio kernel dying issue: If your Studio kernel dies and you lose the reference to the estimator object, please see section [6. Studio Kernel Dead/Creating JumpStart Model from the training Job](#6.-Studio-Kernel-Dead/Creating-JumpStart-Model-from-the-training-Job) on how to deploy an endpoint using the training job name and the model ID.


In [None]:
# took almost 70 min 
# billable : 3600 seconds

### Deploy the fine-tuned model
---
Next, we deploy the fine-tuned model. We will then compare the performance of the fine-tuned and pre-trained models.

---

In [25]:
estimator

<sagemaker.jumpstart.estimator.JumpStartEstimator at 0x7f03fc661810>

In [32]:
finetuned_predictor = estimator.deploy(instance_type='ml.g5.12xlarge')

INFO:sagemaker:Creating model with name: meta-textgeneration-llama-2-13b-2024-02-09-00-26-25-383
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-2-13b-2024-02-09-00-26-25-383
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-2-13b-2024-02-09-00-26-25-383


----------!

### Evaluate the pre-trained and fine-tuned model
---
Next, we use the test data to evaluate the performance of the fine-tuned model and compare it with the pre-trained model. 

---

In [39]:
import pandas as pd
from IPython.display import display, HTML

test_dataset = train_and_test_dataset["test"]

inputs, ground_truth_responses, responses_before_finetuning, responses_after_finetuning = (
    [],
    [],
    [],
    [],
)


def predict_and_print(datapoint):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarkation_key = "\n\n### Response:\n"

    payload = {
        "inputs": template["prompt"].format(
            instruction=datapoint["instruction"], context=datapoint["context"]
        )
        + input_output_demarkation_key,
        "parameters": {"max_new_tokens": 100},
    }
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])

    pretrained_response = pretrained_predictor.predict(payload)
    responses_before_finetuning.append(pretrained_response[0]["generated_text"])

    finetuned_response = finetuned_predictor.predict(payload)
    responses_after_finetuning.append(finetuned_response[0]["generated_text"])


try:
    for i, datapoint in enumerate(test_dataset.select(range(5))):
        predict_and_print(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from non-finetuned model": responses_before_finetuning,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
    display(HTML(df.to_html()))
except Exception as e:
    print(e)

Unnamed: 0,Inputs,Ground Truth,Response from non-finetuned model,Response from fine-tuned model
0,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nPlease tell me who Bishop Patrick MacMullan was and when he died.\n\n### Input:\nBishop Patrick MacMullan (17 March 1752 – 25 October 1824) was an Irish Roman Catholic Prelate and 20th Bishop of Down and Connor.\n\nHe was a native of mid Down and details of his early life in the latter half of the eighteenth century are sketchy. It is believed he was ordained to the priesthood in 1775.\n\nHe received episcopal consecration on 2 September 1793, and the following year succeeded his distant cousin Hugh as Bishop of Down and Connor.\n\nIn 1814 he made a report to Rome on the state of his diocese (served by around 35 parish priests and a few curates) which although vague gives some indication of the state of the diocese.\n\nHe died on 25 October 1824 in the house of his nephew in Loughinisland and is buried at Loughinisland Graveyard.\n\nA notice of his death, circulated in many Irish newspapers noted that ""the Catholic Clergy of that diocese [Down and Connor] have been under the scriptural jurisdiction of this amiable Prelate for 31 years, during which he has presided over them with the politeness of a Gentleman, the abilities of a Theologian, and the meekness of a humble and exemplary Christian.""\n\n\n\n### Response:\n","Patrick MacMullan was an Irish Roman Catholic Bishop. He passed away on October 25th, 1824.",\nBishop Patrick MacMullan (17 March 1752 – 25 October 1824) was an Irish Roman Catholic Prelate and 20th Bishop of Down and Connor.\n\nHe was a native of mid Down and details of his early life in the latter half of the eighteenth century are sketchy. It is believed he was ordained to the priesthood in 1775.\n\nHe received episcop,"Bishop Patrick MacMullan was an Irish Roman Catholic Prelate and 20th Bishop of Down and Connor. 
He was a native of mid Down and details of his early life in the latter half of the eighteenth century are sketchy. It is believed he was ordained to the priesthood in 1775. He received episcopal consecration on 2 September 1793, and the following year succeeded his distant cousin Hugh as Bishop of"
1,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nHow many bowl games have the University of Georgia football team won?\n\n### Input:\nThe Georgia Bulldogs football program represents the University of Georgia in the sport of American football. The Bulldogs compete in the Football Bowl Subdivision (FBS) of the National Collegiate Athletic Association (NCAA) and the Eastern Division of the Southeastern Conference (SEC). They play their home games at historic Sanford Stadium on the university's Athens, Georgia, campus. Georgia claims four consensus national championships (1942, 1980, 2021, and 2022); while the AP and Coaches Polls have each voted the Bulldogs the national champion three times (1980, 2021, and 2022). Georgia has also been named the National Champion by at least one polling authority in four other seasons (1920, 1927, 1946 and 1968).\n\nThe Bulldogs' other accomplishments include 16 conference championships, of which 14 are SEC championships, second-most in conference history, and appearances in 61 bowl games, second-most all-time.\n\n\n\n### Response:\n","The University of Georgia football team is defined by greatness. Known as the Georgia Bulldogs (Dawgs) and compete in the Division 1 Southeastern Conference (SEC). They play in the historic Sanford Stadium in Athens, Georgia and have appeared in 61 bowl games, second-most all time. In 2022 and 2023 the Georgia Bulldogs won 2 consecutive National Championships.","\n```\nThe Georgia Bulldogs football program represents the University of Georgia in the sport of American football. The Bulldogs compete in the Football Bowl Subdivision (FBS) of the National Collegiate Athletic Association (NCAA) and the Eastern Division of the Southeastern Conference (SEC). They play their home games at historic Sanford Stadium on the university's Athens, Georgia, campus. 
Georgia claims four consensus national championships (194","The Georgia Bulldogs football program represents the University of Georgia in the sport of American football. The Bulldogs compete in the Football Bowl Subdivision (FBS) of the National Collegiate Athletic Association (NCAA) and the Eastern Division of the Southeastern Conference (SEC). They play their home games at historic Sanford Stadium on the university's Athens, Georgia, campus. Georgia claims four consensus national championships (1942,"
2,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nFrom the given text, Provide some info regarding consistory\n\n### Input:\nIn the Roman Catholic Church a consistory is a formal meeting of the College of Cardinals called by the pope. There are two kinds of consistories, extraordinary and ordinary. An ""extraordinary"" consistory is held to allow the pope to consult with the entire membership of the College of Cardinals. An ""ordinary"" consistory is ceremonial in nature and attended by cardinals resident in Rome. For example, the pope elevates new cardinals to the College at a consistory; Pope Francis has called consistories for ceremonies of canonization.\n\n\n\n### Response:\n","1. A consistory is a formal gathering of the College of Cardinals in the Roman Catholic Church that is summoned by the pope.\n2. Consistories come in two varieties: remarkable and ordinary.\n3. The pope can consult with the full College of Cardinals by calling a ""extraordinary"" consistory.\n4. Cardinals who live in Rome attend ""ordinary"" consistories, which are ceremonial in nature.\n5. Pope Francis has convened consistories for canonization ceremonies, for instance, where he elevates new cardinals to the College of Cardinals.",\n\n\n### Hint:\n\n\n\n### Solution:\n\n\n\n### Submission:\n\n\n\n### Testing:\n\n\n\n### Notes:\n\n\n\n### References:\n\n\n\n### Further Reading:\n\n\n\n### Tags:\n\n\n,"A consistory is a formal meeting of the College of Cardinals called by the pope. There are two kinds of consistories, extraordinary and ordinary. An ""extraordinary"" consistory is held to allow the pope to consult with the entire membership of the College of Cardinals. An ""ordinary"" consistory is ceremonial in nature and attended by cardinals resident in Rome. For example, the pope elevates new cardinals to the College at a consistory; Pope"
3,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nList down some information about Ahilya Bai from given passage\n\n### Input:\nAfter the demise of her husband Khande Rao Holkar and father-in-law Malhar Rao Holkar, Ahilya Bai herself undertook the affairs of Holkar dynasty. She defended the Malwa state against intruders and personally led armies into battle, with Tukoji Rao Holkar as her military commander.\n\nAhilya Bai was a great pioneer and builder of Hindu temples who constructed hundreds of temples and Dharmashalas throughout India. She is specially renowned for refurbishing & reconsecrating some of the most sacred sites of Hindu pilgrimage that had been desecrated & demolished in the previous century by the Mughal Emperor Aurangzeb\n\n\n\n### Response:\n","1.Ahilya bai Holkar under took the affairs of Holkar Dynasty after demise of her husband khande Rao Holkar and father in law Malhar Rao Holkar.\n2.Ahilya Bai defended the Malwa state against intruders and personally led armies into battle, with Tukoji Rao Holkar as her milatary commander. \n3.Ahilya Bai was a great pioneer and builder of Hindu temples and Dharmshalas through out India.\n4.Ahilya Bai is specially renowned for refurbishing & reconsecrating some of the most sacred sites of Hindu pilgrimage that had been demolished by the Mughal Emperor Aurangzeb.",\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n,1. Ahilya Bai was a great pioneer and builder of Hindu temples who constructed hundreds of temples and Dharmashalas throughout India.\n2. 
She is specially renowned for refurbishing & reconsecrating some of the most sacred sites of Hindu pilgrimage that had been desecrated & demolished in the previous century by the Mughal Emperor Aurangzeb.\n3. After the demise of her husband Kh
4,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nUsing the text provided, give me the type of engineers that are involved in fabrication of a solar power plant\n\n### Input:\nSolar power plants derive their energy from sunlight, which is made accessible via photovoltaics (PV's). Photovoltaic panels, or solar panels, are constructed using photovoltaic cells which are made of silica materials that release electrons when they are warmed by the thermal energy of the sun. The new flow of electrons generates electricity within the cell. While PV's are an efficient method of producing electricity, they do burn out after a decade and thus, must be replaced; however, their efficiency, cost of operation, and lack of noise/physical pollutants make them one of the cleanest and least expensive forms of energy. Solar power plants require the work of many facets of engineering; electrical engineers are especially crucial in constructing the solar panels and connecting them into a grid, and computer engineers code the cells themselves so that electricity can be effectively and efficiently produced, and civil engineers play the very important role of identifying areas where solar plants are able to collect the most energy.\n\n\n\n### Response:\n","Although solar power plants require the work of many aspects of engineering, the three main types of engineers involved in fabrication of solar power plants are:\n1. Electrical engineers who build solar panels and link them to the electrical grid\n2. Computer engineers who program the cells themselves so that power can be produced effectively and efficiently\n3. 
Civil engineer who play a critical role in determining sites where the solar plants can capture the most energy.",\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n,"The type of engineers that are involved in fabrication of a solar power plant are electrical engineers, computer engineers and civil engineers."


### Clean up resources

In [40]:
pretrained_predictor.delete_model()
pretrained_predictor.delete_endpoint()

INFO:sagemaker:Deleting model with name: meta-textgeneration-llama-2-13b-2024-02-09-00-34-14-600
INFO:sagemaker:Deleting endpoint configuration with name: meta-textgeneration-llama-2-13b-2024-02-09-00-34-36-474
INFO:sagemaker:Deleting endpoint with name: meta-textgeneration-llama-2-13b-2024-02-09-00-34-36-474


In [None]:

finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()

# Appendix

### 1. Supported Inference Parameters

---
This model supports many parameters while performing inference. They include:

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in the beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **early_stopping:** If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word as per the likelihood. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.
* **stop:** If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.
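
The interaction between `temperature`, `top_k`, and `top_p` can be illustrated with a toy example. The sketch below is a simplified pure-Python illustration of the ideas described above, not the serving container's implementation; the function names and the sample numbers are assumptions.

```python
import math


def apply_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Return indices of the tokens that survive top-k and top-p filtering."""
    # Rank token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]  # keep only the k most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:  # smallest set reaching the threshold
                break
        order = kept
    return order


logits = [2.0, 1.0, 0.0]
print(apply_temperature(logits, 0.5))  # lower temperature: sharper distribution
print(apply_temperature(logits, 2.0))  # higher temperature: flatter distribution
print(filter_top_k_top_p([0.5, 0.3, 0.15, 0.05], top_p=0.9))
```

Sampling then draws the next token only from the surviving indices, which is why smaller `top_k` or `top_p` values make the output less varied.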

You may specify any subset of the parameters mentioned above while invoking an endpoint. Next, we show an example of how to invoke an endpoint with these arguments.

**NOTE**: If `max_new_tokens` is not defined, the model may generate up to the maximum total tokens allowed, which is 4K for these models. This may result in endpoint query timeout errors, so it is recommended to set `max_new_tokens` when possible. For the 7B, 13B, and 70B models, we recommend setting `max_new_tokens` no greater than 1500, 1000, and 500 respectively, while keeping the total number of tokens less than 4K.
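
A minimal sketch of building such a payload follows. `build_payload` is a hypothetical helper, not an SDK function; it applies the per-size `max_new_tokens` caps from the note above and a few of the constraints listed for the parameters. It does not count input tokens, so staying under the 4K total is still up to the caller.

```python
# Caps taken from the note above; the helper itself is hypothetical.
RECOMMENDED_MAX_NEW_TOKENS = {"7b": 1500, "13b": 1000, "70b": 500}


def build_payload(prompt, model_size="13b", max_new_tokens=64, **parameters):
    """Build an inference payload and check a few documented constraints."""
    cap = RECOMMENDED_MAX_NEW_TOKENS[model_size]
    if not isinstance(max_new_tokens, int) or max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    if max_new_tokens > cap:
        raise ValueError(f"max_new_tokens={max_new_tokens} exceeds the recommended cap {cap}")
    top_p = parameters.get("top_p")
    # The list above says "between 0 and 1"; whether the bounds are inclusive is not stated.
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0 and 1")
    temperature = parameters.get("temperature")
    if temperature is not None and temperature <= 0:
        raise ValueError("temperature must be a positive float")
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens, **parameters}}


payload = build_payload("I believe the meaning of life is", top_p=0.9, temperature=0.6)
print(payload)
# With a deployed endpoint, the payload would be sent as in the cells above:
# response = pretrained_predictor.predict(payload)
```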

---

### 2. Dataset formatting instruction for training

---

####  Fine-tune the Model on a New Dataset
We currently offer two types of fine-tuning: instruction fine-tuning and domain adaptation fine-tuning. You can switch between the two training
methods by setting the `instruction_tuned` parameter to 'True' or 'False'.


#### 2.1. Domain adaptation fine-tuning
The Text Generation model can also be fine-tuned on any domain-specific dataset. After fine-tuning on the domain-specific dataset, the model
is expected to generate domain-specific text and solve various NLP tasks in that domain with **few-shot prompting**.

Below are the instructions for how the training data should be formatted for input to the model.

- **Input:** A train and an optional validation directory. Each directory contains a CSV/JSON/TXT file. 
  - For CSV/JSON files, the train or validation data is used from the column called 'text' or the first column if no column called 'text' is found.
  - The train directory and the validation directory (if provided) must each contain exactly one file. 
- **Output:** A trained model that can be deployed for inference. 

Below is an example of a TXT file for fine-tuning the Text Generation model. The TXT file contains SEC filings of Amazon from 2021 to 2022.

```
Note About Forward-Looking Statements
This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions. Forward-looking
statements are based on current expectations and assumptions that are subject
to risks and uncertainties that may cause actual results to differ materially.
We describe risks and uncertainties that could cause actual results and events
to differ materially in “Risk Factors,” “Management’s Discussion and Analysis
of Financial Condition and Results of Operations,” and “Quantitative and
Qualitative Disclosures about Market Risk” (Part II, Item 7A of this Form
10-K). Readers are cautioned not to place undue reliance on forward-looking
statements, which speak only as of the date they are made. We undertake no
obligation to update or revise publicly any forward-looking statements,
whether because of new information, future events, or otherwise.
GENERAL
Embracing Our Future ...
```
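
The input rules above (exactly one CSV/JSON/TXT file per directory, and the `text` column or else the first column for CSV data) can be checked locally before uploading. This is a minimal sketch; the helpers `check_domain_adaptation_dir` and `text_column` are hypothetical, and the checks performed by the actual training script may differ.

```python
import csv
import tempfile
from pathlib import Path

ALLOWED_SUFFIXES = {".csv", ".json", ".txt"}


def check_domain_adaptation_dir(directory):
    """Return the single data file in `directory`, or raise ValueError."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    if len(files) != 1:
        raise ValueError(f"expected exactly one file, found {len(files)}")
    if files[0].suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {files[0].suffix}")
    return files[0]


def text_column(csv_path):
    """Pick the 'text' column if present, else fall back to the first column."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    return "text" if "text" in header else header[0]


# Build a throwaway train directory with one small CSV file.
train_dir = Path(tempfile.mkdtemp()) / "train"
train_dir.mkdir()
with open(train_dir / "filings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "text"])
    writer.writerow([1, "Forward-looking statements may appear throughout this report."])

data_file = check_domain_adaptation_dir(train_dir)
print(data_file.name, text_column(data_file))
```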


#### 2.2. Instruction fine-tuning
The Text Generation model can be instruction-tuned on any text data, provided that the data 
is in the expected format. The instruction-tuned model can then be deployed for inference. 
Below are the instructions for how the training data should be formatted for input to the 
model.

- **Input:** A train and an optional validation directory. The train and validation directories should each contain one or more JSON Lines (`.jsonl`) files. The train directory can also contain an optional `*.json` file describing the input and output formats. 
  - The best model is selected according to the validation loss, calculated at the end of each epoch.
  If a validation set is not given, an (adjustable) percentage of the training data is
  automatically split and used for validation.
  - The training data must be in JSON Lines (`.jsonl`) format, where each line is a dictionary
  representing a single data sample. All training data must be in a single folder, but
  it can be split across multiple `.jsonl` files. The `.jsonl` file extension is mandatory. The training
  folder can also contain a `template.json` file describing the input and output formats. If no
  template file is given, the following template will be used:
  ```json
  {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}",
    "completion": "{response}"
  }
  ```
  - In this case, the data in the JSON Lines entries must include `instruction`, `context`, and `response` fields. If a custom template is provided, it must also use `prompt` and `completion` keys to define
  the input and output templates.
  Below is a sample custom template:

  ```json
  {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
  }
  ```
  Here, the data in the JSON Lines entries must include `question`, `context`, and `answer` fields. 
- **Output:** A trained model that can be deployed for inference. 
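
To make the format concrete, the sketch below writes a small `train.jsonl` file and a `template.json` file using the sample custom template above, then fills the template with one record. The sample records are made up for illustration, and the `render` helper uses Python's `str.format` as an assumption about how placeholders are substituted; the training script's own logic may differ.

```python
import json
import tempfile
from pathlib import Path

# Sample custom template from the text above: `prompt` and `completion` keys.
template = {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}",
}

# Made-up records; each must provide every field the template references.
samples = [
    {"question": "Who filed the report?", "context": "Amazon filed a 10-K.", "answer": "Amazon"},
    {"question": "Which form was filed?", "context": "Amazon filed a 10-K.", "answer": "10-K"},
]

train_dir = Path(tempfile.mkdtemp())
with open(train_dir / "train.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")  # one JSON object per line
with open(train_dir / "template.json", "w", encoding="utf-8") as f:
    json.dump(template, f)


def render(sample, template):
    """Fill the prompt and completion templates with one sample's fields."""
    return template["prompt"].format(**sample), template["completion"].format(**sample)


with open(train_dir / "train.jsonl", encoding="utf-8") as f:
    first = json.loads(f.readline())
prompt, completion = render(first, template)
print(prompt)
print(completion)
```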

---

#### 2.3. Example fine-tuning with Domain-Adaptation dataset format
---
We provide a subset of Amazon's SEC filings in domain adaptation dataset format. It was downloaded from the publicly available [EDGAR](https://www.sec.gov/edgar/searchedgar/companysearch) database. Instructions for accessing the data are available [here](https://www.sec.gov/os/accessing-edgar-data).

License: [Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/legalcode).

Please uncomment the following code to fine-tune the model on a dataset in domain adaptation format.

---

In [None]:
# import boto3
# from sagemaker.jumpstart.estimator import JumpStartEstimator

# model_id = "meta-textgeneration-llama-2-7b"

# estimator = JumpStartEstimator(
#     model_id=model_id, environment={"accept_eula": "true"}, instance_type="ml.g5.24xlarge"
# )
# # instruction_tuned="False" selects domain adaptation fine-tuning.
# estimator.set_hyperparameters(instruction_tuned="False", epoch="5")
# estimator.fit({"training": f"s3://jumpstart-cache-prod-{boto3.Session().region_name}/training-datasets/sec_amazon"})

### 3. Supported Hyper-parameters for fine-tuning
---
- epoch: The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default: 5.
- learning_rate: The rate at which the model weights are updated after working through each batch of training examples. Must be a float greater than 0. Default: 1e-4.
- instruction_tuned: Whether to instruction-tune the model or not. Must be 'True' or 'False'. Default: 'False'.
- per_device_train_batch_size: The batch size per GPU core/CPU for training. Must be a positive integer. Default: 4.
- per_device_eval_batch_size: The batch size per GPU core/CPU for evaluation. Must be a positive integer. Default: 1.
- max_train_samples: For debugging purposes or quicker training, truncate the number of training examples to this value. A value of -1 means using all training samples. Must be a positive integer or -1. Default: -1. 
- max_val_samples: For debugging purposes or quicker training, truncate the number of validation examples to this value. A value of -1 means using all validation samples. Must be a positive integer or -1. Default: -1. 
- max_input_length: Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default: -1. 
- validation_split_ratio: If no validation channel is provided, the ratio of the train-validation split of the training data. Must be between 0 and 1. Default: 0.2. 
- train_data_split_seed: If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default: 0.
- preprocessing_num_workers: The number of processes to use for the preprocessing. If None, main process is used for preprocessing. Default: "None"
- lora_r: LoRA R. Must be a positive integer. Default: 8.
- lora_alpha: LoRA Alpha. Must be a positive integer. Default: 32.
- lora_dropout: LoRA Dropout. Must be a positive float between 0 and 1. Default: 0.05. 
- int8_quantization: If True, model is loaded with 8 bit precision for training. Default for 7B/13B: False. Default for 70B: True.
- enable_fsdp: If True, training uses Fully Sharded Data Parallelism. Default for 7B/13B: True. Default for 70B: False.

Note 1: int8_quantization is not supported with FSDP. Also, int8_quantization = 'False' together with enable_fsdp = 'False' is not supported on any of the g5 family instances due to CUDA memory issues. Thus, we recommend setting exactly one of int8_quantization or enable_fsdp to 'True'.

Note 2: Due to the size of the model, the 70B model cannot be fine-tuned with enable_fsdp = 'True' on any of the supported instance types.
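
The two notes above amount to a small set of rules that can be checked before a training job is launched. The following is a minimal sketch; `check_parallelism_settings` is a hypothetical helper written for this notebook, not an SDK function, and it returns the string values used for hyperparameters elsewhere in this notebook.

```python
def check_parallelism_settings(model_size, int8_quantization, enable_fsdp):
    """Validate int8_quantization / enable_fsdp per Notes 1 and 2 (hypothetical helper)."""
    # Note 1: both True is unsupported, and both False runs out of CUDA memory.
    if int8_quantization == enable_fsdp:
        raise ValueError("set exactly one of int8_quantization or enable_fsdp to 'True'")
    # Note 2: the 70B model cannot be fine-tuned with FSDP.
    if model_size == "70b" and enable_fsdp:
        raise ValueError("the 70B model cannot be fine-tuned with enable_fsdp='True'")
    return {
        "int8_quantization": str(int8_quantization),
        "enable_fsdp": str(enable_fsdp),
    }


hyperparameters = check_parallelism_settings("70b", int8_quantization=True, enable_fsdp=False)
print(hyperparameters)
```

The returned dictionary could then be passed on, for example as `estimator.set_hyperparameters(**hyperparameters)`.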

---

### 4. Supported Instance types

---
We have tested our scripts on the following instance types:

- 7B: ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge
- 13B: ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge
- 70B: ml.g5.48xlarge

Other instance types may also work for fine-tuning. Note: When using p3 instances, training is done with 32-bit precision because bfloat16 is not supported on these instances. As a result, a training job consumes double the amount of CUDA memory on p3 instances compared to g5 instances.
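
The doubling follows directly from the bytes used per parameter: 2 for bfloat16 versus 4 for 32-bit floats. The sketch below illustrates the arithmetic for the model weights only, using nominal parameter counts (7 and 13 billion) as an assumption. Real training memory also includes gradients, optimizer state, and activations, which this estimate ignores.

```python
BYTES_PER_PARAM = {"bfloat16": 2, "float32": 4}


def weight_memory_gb(num_params, dtype):
    """Memory needed to hold the model weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal parameter counts, used for illustration only.
for name, num_params in [("7B", 7_000_000_000), ("13B", 13_000_000_000)]:
    bf16 = weight_memory_gb(num_params, "bfloat16")
    fp32 = weight_memory_gb(num_params, "float32")
    print(f"{name}: {bf16:.0f} GB in bfloat16, {fp32:.0f} GB in float32")
```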

---

### 5. A few notes about the fine-tuning method

---
- Fine-tuning scripts are based on [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). 
- Instruction tuning dataset is first converted into domain adaptation dataset format before fine-tuning. 
- Fine-tuning scripts use Fully Sharded Data Parallel (FSDP) as well as the Low Rank Adaptation (LoRA) method to fine-tune the models.
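
To give a sense of why LoRA keeps fine-tuning affordable, the sketch below counts trainable parameters for a single weight matrix under the standard LoRA formulation, in which a frozen `d_out x d_in` matrix is augmented by two low-rank factors of rank `r`, scaled by `lora_alpha / lora_r`. The 4096 x 4096 shape is an illustrative assumption, and which layers the fine-tuning scripts actually adapt is not stated here.

```python
def lora_trainable_params(d_out, d_in, r):
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


d = 4096           # illustrative square projection size (assumption)
r, alpha = 8, 32   # defaults for lora_r and lora_alpha listed in section 3

full = d * d                           # parameters in the frozen weight matrix
lora = lora_trainable_params(d, d, r)  # parameters actually trained
scaling = alpha / r                    # factor applied to the low-rank update

print(f"full: {full}, LoRA: {lora} ({lora / full:.2%}), scaling: {scaling}")
```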

---

### 6. Studio Kernel Dead/Creating JumpStart Model from the training Job
---
Due to the size of the Llama 2 70B model, the training job may take several hours, and the Studio kernel may die during the training phase. The training job nevertheless keeps running in SageMaker. If this happens, you can still deploy the endpoint using the training job name with the code below.

To find the training job name, go to Console -> SageMaker -> Training -> Training Jobs, identify the training job name, and substitute it in the following cell. 

---

In [None]:
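# from sagemaker.jumpstart.estimator import JumpStartEstimator
# training_job_name = "<<training_job_name>>"  # replace with the name of your training job

# # Re-attach to the training job, stream its logs, then deploy the fine-tuned model.
# attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
# attached_estimator.logs()
# attached_estimator.deploy()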