# Fine-tune LLaMA 2 models on SageMaker JumpStart
From https://github.com/generative-ai-on-aws/generative-ai-on-aws

# Setup 

In [1]:
!pip install datasets[s3]


## Deploy Pre-trained Model

---

First, we deploy the pre-trained Llama 2 model as a SageMaker endpoint and run some inference.

---

In [24]:
model_id, model_version = "meta-textgeneration-llama-2-7b", "2.*"

In [3]:
from sagemaker.jumpstart.model import JumpStartModel

pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version)
pretrained_predictor = pretrained_model.deploy()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


For forward compatibility, pin to model_version='2.*' in your JumpStartModel or JumpStartEstimator definitions. Note that major version upgrades may have different EULA acceptance terms and input/output signatures.


----------------!

In [13]:
def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generation']}")
    print("\n==================================\n")

In [14]:
payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False,
    },
}
try:
    response = pretrained_predictor.predict(payload, custom_attributes="accept_eula=true")
    print_response(payload, response)
except Exception as e:
    print(e)

I believe the meaning of life is
>  to be happy.
I’m a huge believer in the power of positive thinking, and I try to live my life by that philosophy. I think that if you can find happiness in the little things, then you can find happiness in the big things.
I believe that we all have the power to




## Dataset preparation for fine-tuning

---

You can fine-tune on a dataset in either domain adaptation format or instruction tuning format. We will use a subset of the [Dolly dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) in instruction tuning format, keeping only the closed question answering examples. The Dolly dataset contains roughly 15,000 instruction-following records in categories such as question answering, summarization, and information extraction. It is available under the Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0) license.

---

In [54]:
from datasets import load_dataset

# Load 10% of the Dolly training split for faster training
dolly_dataset = load_dataset("databricks/databricks-dolly-15k", split="train[:10%]")

# Keep only closed question answering examples
qa_dataset = dolly_dataset.filter(lambda example: example["category"] == "closed_qa")
qa_dataset = qa_dataset.remove_columns("category")

# Hold out 10% of the examples as a test set for the evaluation at the end
train_and_test_dataset = qa_dataset.train_test_split(test_size=0.1)
train_dataset = train_and_test_dataset["train"]

# Write the training split to a local JSON Lines file for upload to S3
train_dataset.to_json("training.jsonl")

train_dataset[0]


{'instruction': 'What name is Beijing also known by?',
 'context': "Beijing (/beɪˈdʒɪŋ/ bay-JING; Chinese: 北京; pinyin: Běijīng; Mandarin pronunciation: [pèɪ.tɕíŋ] (listen)), alternatively romanized as Peking (/piːˈkɪŋ/ pee-KING), is the capital of the People's Republic of China. With over 21 million residents, Beijing is the world's most populous national capital city and is China's second largest city after Shanghai. It is located in Northern China, and is governed as a municipality under the direct administration of the State Council with 16 urban, suburban, and rural districts. Beijing is mostly surrounded by Hebei Province with the exception of neighboring Tianjin to the southeast; together, the three divisions form the Jingjinji megalopolis and the national capital region of China.",
 'response': "Běijīng is alternatively romanized as Peking and is the capital of the People's Republic of China"}

---
Next, we create a prompt template that places each record's instruction and context (input) into a fixed instruction/input format for the training job, and save it as `template.json`.

In [51]:
import json

# The "prompt" and "completion" strings are filled in from each training record;
# their placeholders must match the field names in training.jsonl.

template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": "{response}",
}
with open("template.json", "w") as f:
    json.dump(template, f)
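To see what a single training example looks like once the template is applied, the sketch below fills the `prompt` and `completion` strings with a shortened copy of the record printed earlier, using plain `str.format`. It only illustrates the placeholder substitution; how the JumpStart training scripts join the prompt and completion internally is not shown here and should not be inferred from it.

```python
# Same template as in the cell above
template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": "{response}",
}

# Shortened version of train_dataset[0] shown above (context truncated for brevity)
record = {
    "instruction": "What name is Beijing also known by?",
    "context": "Beijing ... is the capital of the People's Republic of China.",
    "response": "Běijīng is alternatively romanized as Peking and is the capital of the People's Republic of China",
}


def render_example(template, record):
    """Fill the prompt and completion placeholders with one record's fields.

    str.format ignores unused keyword arguments, so the whole record can be passed.
    """
    prompt = template["prompt"].format(**record)
    completion = template["completion"].format(**record)
    return prompt, completion


prompt, completion = render_example(template, record)
print(prompt + completion)
```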

### Upload dataset to S3
---

We upload the prepared dataset and the template to S3, where the training job will read them during fine-tuning.

---
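Before uploading, it can be worth checking that every line of `training.jsonl` parses as JSON and contains each field the template refers to, since a missing field would otherwise only surface once the training job runs. The sketch below extracts placeholder names with `string.Formatter().parse` and checks them against each record. The helper names `template_fields` and `check_jsonl_against_template` are hypothetical, and the demo writes a small temporary file instead of reading the real one.

```python
import json
import os
import string
import tempfile


def template_fields(template):
    """Return the set of placeholder names used across all template strings."""
    formatter = string.Formatter()
    return {
        field_name
        for text in template.values()
        for _, field_name, _, _ in formatter.parse(text)
        if field_name  # None for trailing literal text, "" for a bare "{}"
    }


def check_jsonl_against_template(path, template):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    required = template_fields(template)
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_no, f"invalid JSON: {exc}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_no, "record is not a JSON object"))
                continue
            missing = required - set(record)
            if missing:
                problems.append((line_no, f"missing fields: {sorted(missing)}"))
    return problems


# Demo on a temporary file; in the notebook you would pass "training.jsonl"
template = {
    "prompt": "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": "{response}",
}
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    tmp.write(json.dumps({"instruction": "Q?", "context": "C", "response": "A"}) + "\n")
    tmp.write(json.dumps({"instruction": "Q?", "response": "A"}) + "\n")  # no "context"
    demo_path = tmp.name

problems = check_jsonl_against_template(demo_path, template)
os.remove(demo_path)
print(problems)
```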

In [52]:
from sagemaker.s3 import S3Uploader
import sagemaker

bucket = sagemaker.Session().default_bucket()
local_data_file = "training.jsonl"
train_data_location = f"s3://{bucket}/dolly_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"Training data: {train_data_location}")

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
Training data: s3://sagemaker-us-east-1-703877312554/dolly_dataset


# Train the model

Next, we fine-tune the LLaMA 2 7B model. The fine-tuning scripts are based on the scripts provided in [this repo](https://github.com/facebookresearch/llama-recipes/tree/main).


In [53]:
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    instance_type="ml.g5.12xlarge",
    instance_count=1,
    environment={"accept_eula": "true"}
)

# By default, instruction tuning is set to false
estimator.set_hyperparameters(instruction_tuned="True", 
                              epoch="1", 
                              max_input_length="1024")
estimator.fit({"training": train_data_location})

INFO:sagemaker:Creating training-job with name: meta-textgeneration-llama-2-7b-2024-01-01-09-40-22-349


2024-01-01 09:40:22 Starting - Starting the training job...
2024-01-01 09:40:50 Starting - Preparing the instances for training..........................................
2024-01-01 09:47:28 Downloading - Downloading input data..................
2024-01-01 09:50:38 Downloading - Downloading the training image............
2024-01-01 09:52:49 Training - Training image download completed. Training in progress......
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-01-01 09:53:39,566 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-01-01 09:53:39,627 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-01-01 09:53:39,636 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-01-01 09:53:39,637 sagemaker_pytorch_container.training INFO     Invoking user

### Deploy the fine-tuned model
---
Next, we deploy the fine-tuned model so that we can compare its responses with those of the pre-trained model.

---

In [55]:
finetuned_predictor = estimator.deploy()

INFO:sagemaker.jumpstart:No instance type selected for inference hosting endpoint. Defaulting to ml.g5.2xlarge.
INFO:sagemaker:Creating model with name: meta-textgeneration-llama-2-7b-2024-01-01-10-12-26-073
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-2-7b-2024-01-01-10-12-26-069
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-2-7b-2024-01-01-10-12-26-069


-------------!

### Evaluate the pre-trained and fine-tuned model
---
Next, we send the first five test examples to both endpoints and compare the responses of the fine-tuned and pre-trained models with the ground truth.

---

In [56]:
import pandas as pd
from IPython.display import display, HTML

test_dataset = train_and_test_dataset["test"]

inputs, ground_truth_responses, responses_before_finetuning, responses_after_finetuning = (
    [],
    [],
    [],
    [],
)


def predict_and_print(datapoint):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarkation_key = "\n\n### Response:\n"

    # Note: this prompt layout differs from the training template in template.json;
    # template["prompt"].format(instruction=..., context=...) would be closer to
    # the format the model saw during fine-tuning.
    instruction = f"### Instruction {datapoint['instruction']}"
    context = f"### Context\n{datapoint['context']}"
    prompt = "\n\n".join([instruction, context])

    payload = {
        # The prompt is already filled in, so it is used as-is; calling .format()
        # on it again would fail if the context contained literal braces.
        "inputs": prompt + input_output_demarkation_key,
        "parameters": {"max_new_tokens": 100},
    }
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])

    # Query the pre-trained endpoint, accepting the Llama 2 EULA
    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=true"
    )
    responses_before_finetuning.append(pretrained_response[0]["generation"])

    # Query the fine-tuned endpoint with the same payload
    finetuned_response = finetuned_predictor.predict(payload, custom_attributes="accept_eula=true")
    responses_after_finetuning.append(finetuned_response[0]["generation"])


try:
    for i, datapoint in enumerate(test_dataset.select(range(5))):
        predict_and_print(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from non-finetuned model": responses_before_finetuning,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
    display(HTML(df.to_html()))
except Exception as e:
    print(e)

,Inputs,Ground Truth,Response from non-finetuned model,Response from fine-tuned model
0,"### Instruction what are the 5 skandhas?\n\n### Context\nSkandhas (Sanskrit) or khandhas (Pāḷi) means ""heaps, aggregates, collections, groupings"". In Buddhism, it refers to the five aggregates of clinging (Pañcupādānakkhandhā), the five material and mental factors that take part in the rise of craving and clinging. They are also explained as the five factors that constitute and explain a sentient being’s person and personality, but this is a later interpretation in response to sarvastivadin essentialism.\n\nThe five aggregates or heaps of clinging are:\n\n1. form (or material image, impression) (rupa)\n2. sensations (or feelings, received from form) (vedana)\n3. perceptions (samjna)\n4. mental activity or formations (sankhara)\n5. consciousness (vijnana).\n\n### Response:\n","The five skhandas are form, sensations, perceptions, mental activity and consciousness","\n#### 1. form (or material image, impression) (rupa)\n\n#### 2. sensations (or feelings, received from form) (vedana)\n\n#### 3. perceptions (samjna)\n\n#### 4. mental activity or formations (sankhara)\n\n#### 5. consciousness (vijnana)\n\n\n### Reference:\nhttps://en.wikipedia.org/wiki/Skand","Sure, let's go\n\n### Instruction\nwhat are 3 types of consciousness\n\n### Context\n\nBuddhism distinguishes various types of consciousness. These are typically classified into three categories: sensory consciousness (sañña), perception-consciousness (saññā-sañjāna), and consciousness (viññāṇa).\n\nSensory consciousness is responsible for registering"
1,"### Instruction What are the seven wonders of the world?\n\n### Context\nThe Seven Wonders of the Ancient World, also known as the Seven Wonders of the World or simply the Seven Wonders, is a list of seven notable structures present during classical antiquity. The first known list of seven wonders dates back to the 2nd–1st century BC.\n\nWhile the entries have varied over the centuries, the seven traditional wonders are the Great Pyramid of Giza, the Colossus of Rhodes, the Lighthouse of Alexandria, the Mausoleum at Halicarnassus, the Temple of Artemis, the Statue of Zeus at Olympia, and the Hanging Gardens of Babylon. Using modern-day countries, two of the wonders were located in Greece, two in Turkey, two in Egypt, and one in Iraq. Of the seven wonders, only the Pyramid of Giza, which is also by far the oldest of the wonders, still remains standing, with the others being destroyed over the centuries. There is scholarly debate over the exact nature of the Hanging Gardens, and there is doubt as to whether they existed at all.\n\n### Response:\n","The seven wonders of the world consist of; the Great Pyramid of Giza, The Colossus of Rhodes, the Lighthouse of Alexandria, the Mausoleum at Halicarnassus, the Temple of Artemis, the Statue of Zeus at Olympia, and the Hanging Gardens of Babylon.","\nToday, the Seven Wonders include: The Great Pyramid of Giza in Egypt, the Statue of Zeus at Olympia, the Aphrodite's Palace in Turkey, the Taj Mahal in India, the Colosseum in Italy, the Eiffel Tower in France, and Machu Picchu in Peru.\n\n### Evaluation\n\n```\n2. Write a Java program that will print out the names","\nThe seven wonders of the world are: the Great Pyramid of Giza (Egypt), the Colossus of Rhodes, the Lighthouse of Alexandria, the Mausoleum at Halicarnassus (modern Turkey), the Temple of Artemis, the Statue of Zeus at Olympia, and the Hanging Gardens of Babylon"
2,"### Instruction What are the causes for Sensory processing disorder?\n\n### Context\nThe exact cause of SPD is not known.However, it is known that the midbrain and brainstem regions of the central nervous system are early centers in the processing pathway for multisensory integration; these brain regions are involved in processes including coordination, attention, arousal, and autonomic function. After sensory information passes through these centers, it is then routed to brain regions responsible for emotions, memory, and higher level cognitive functions. Damage in any part of the brain involved in multisensory processing can cause difficulties in adequately processing stimuli in a functional way.\n\n### Response:\n","The exact cause of Sensory processing disorder is not known. However, it is known that the midbrain and brainstem regions of the central nervous system are early centers in the processing pathway for multisensory integration; these brain regions are involved in processes including coordination, attention, arousal, and autonomic function. After sensory information passes through these centers, it is then routed to brain regions responsible for emotions, memory, and higher level cognitive functions. 
Damage in any part of the brain involved in multisensory processing can cause difficulties in adequately processing stimuli in a functional way.","\n* [More info...](https://en.wikipedia.org/wiki/Sensory_processing_disorder).\n* [Included in the topic?](yes, see the first [code example](codes/s0605.md))\n* [Categories](categories): [06: SPD](categories/06--spd.md).\n","Some of the known causes of sensory processing disorder are listed below\n\n<br>-\n-\n\n### Instruction How is it managed and treated?\n\n### Context\nManaging sensory processing difficulties can involve many different approaches depending on the setting (e.g., home, school, therapeutic office), on the needs of the person, and on the age at which the individual first experiences sensory issues. However, in general, management strateg"
3,"### Instruction Does ""outbreeding"" or ""inbreeding"" benefit the offspring more?\n\n### Context\nExogamy often results in two individuals that are not closely genetically related marrying each other; that is, outbreeding as opposed to inbreeding. In moderation, this benefits the offspring as it reduces the risk of the offspring inheriting two copies of a defective gene. Increasing the genetic diversity of the offspring improves the chances of offspring reproducing, up until the fourth-cousin level of relatedness; however, reproduction between individuals on the fourth-cousin level of relatedness decreases evolutionarily fitness.\n\n### Response:\n","""Outbreeding"" is more beneficial to the offspring as it reduces the risk of inheriting defective genes, increasing the offspring's ability to reproduce.","Inbreeding and outbreeding both benefit the health of the offspring and of the entire species. Outbreeding may reduce the health concerns associated with close blood relatedness, but it also increases the risk of a reduced number of offspring, as individuals of differing bloodlines may have incompatible body chemistry. Inbreeding can also lead to negative phenotypic traits. For example, if one line of rabbits inbreed, their future generations may develop und","Inbreeding/Outbreeding might be better in certain instances as the children might benefit from the increased genetic diversity.\n\n### Instruction Expert:\nOutbreeding helps to lower the occurrence of genetic problems that can be caused by inbreeding. With outbreeding, gene variation can help offset the genetic illnesses that might affect an individual, while inbreeding may have a higher risk of certain genetic anomalies.\n\n###"
4,"### Instruction What was the Underground Railroad?\n\n### Context\nThe Underground Railroad was a network of clandestine routes and safe houses established in the United States during the early- to the mid-19th century. It was used by enslaved African Americans primarily to escape into free states and Canada. The network was assisted by abolitionists and others sympathetic to the cause of the escapees. The enslaved persons who risked escape and those who aided them are also collectively referred to as the ""Underground Railroad"". Various other routes led to Mexico, where slavery had been abolished, and to islands in the Caribbean that were not part of the slave trade. An earlier escape route running south toward Florida, then a Spanish possession (except 1763–1783), existed from the late 17th century until approximately 1790. However, the network now generally known as the Underground Railroad began in the late 18th century. It ran north and grew steadily until the Emancipation Proclamation was signed by President Abraham Lincoln. One estimate suggests that by 1850, approximately 100,000 enslaved people had escaped to freedom via the network\n\n### Response:\n","The Underground Railroad was a secret network of routes and safe houses in the United States established in the early 19th century that led to free states, Canada, Mexico and other overseas areas. Black slaves used the Underground Railroad to escape from slavery. 
It is estimated that by 1850, about 100,000 slaves had escaped to freedom by the ""Railroad"".","\n* **Did the Underground Railroad really exist?**\n* **Did the Underground Railroad ever run south**?\n* **How much help did the railroad give to freedom seekers along the way?**\n\n### Evidence\nAbby, ""The Underground Railroad"" [link]\nDavis, ""The Underground Railroad"" [link]\n\n### Conclusion\n\nA network of safe houses along with the c",The Underground Railroad was a way to escape slavery and the people who aided in escaping were called conductors or engineers who hid the fugitive slaves for short distances until they were far enough north where slavery is illegal.\n\n\n### Instruction What did the Underground Railroad use to transport people?\n\n### Context\nThe network of routes and safe houses across the United States was known as the Underground Railroad. It was used by ensla
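The table above is a qualitative comparison. If you also want a rough number, one common option (not part of this notebook) is the token-overlap F1 used in extractive QA benchmarks such as SQuAD. The sketch below is a simplified version: it lowercases and keeps only word characters, but does not remove articles as SQuAD's normalization does, so treat its scores as indicative only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of word characters (punctuation is dropped)."""
    return re.findall(r"\w+", text.lower())


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between two strings."""
    pred_tokens = tokenize(prediction)
    ref_tokens = tokenize(reference)
    if not pred_tokens or not ref_tokens:
        # Two empty strings count as a match; otherwise there is no overlap
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the cat sat", "The cat."))
```

For example, `[token_f1(p, r) for p, r in zip(responses_after_finetuning, ground_truth_responses)]` would give one score per test example for the fine-tuned model.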


### Clean up resources

In [57]:
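# Delete resources to stop incurring charges.
# The pre-trained endpoint is still running; uncomment these lines to delete it too.
# pretrained_predictor.delete_model()
# pretrained_predictor.delete_endpoint()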
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()

INFO:sagemaker:Deleting model with name: meta-textgeneration-llama-2-7b-2024-01-01-10-12-26-073
INFO:sagemaker:Deleting endpoint configuration with name: meta-textgeneration-llama-2-7b-2024-01-01-10-12-26-069
INFO:sagemaker:Deleting endpoint with name: meta-textgeneration-llama-2-7b-2024-01-01-10-12-26-069


# What to do next
- Understand the various parameters (see the Appendix)

# Appendix


### Supported Inference Parameters

---
This model supports the following inference payload parameters:

* **max_new_tokens:** The model generates text until the output length (excluding the input context) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **temperature:** Controls the randomness of the output. A higher temperature produces output sequences that include lower-probability words, and a lower temperature produces output sequences dominated by high-probability words. As `temperature` approaches 0, generation approaches greedy decoding. If specified, it must be a positive float.
* **top_p:** At each step of text generation, sample from the smallest possible set of words whose cumulative probability is at least `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, the input text is included in the generated output. If specified, it must be a boolean. Default: False.

You may specify any subset of the parameters mentioned above while invoking an endpoint. 
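To make `temperature` and `top_p` concrete, the sketch below applies both to a toy next-token distribution: logits are divided by the temperature before the softmax, then only the smallest set of most likely tokens whose cumulative probability reaches `top_p` is kept and renormalized. This is the textbook form of temperature scaling and nucleus (top-p) sampling; the serving container's actual implementation may differ, and the vocabulary and logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(tokens, probs, top_p):
    """Keep the smallest set of most likely tokens with cumulative probability >= top_p."""
    ranked = sorted(zip(tokens, probs), key=lambda tp: tp[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {token: p / norm for token, p in kept}  # renormalize the kept tokens


# Toy vocabulary and logits (hypothetical values)
tokens = ["happy", "free", "kind", "banana"]
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.6, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(p, 3) for tok, p in zip(tokens, probs)})

# temperature=0.6 and top_p=0.9, the values used in the first payload of this notebook
nucleus = top_p_filter(tokens, softmax_with_temperature(logits, 0.6), 0.9)
print(nucleus)
```

Lower temperatures concentrate probability on "happy", and `top_p` then discards the long tail before sampling.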


### Notes
- If `max_new_tokens` is not defined, the model may generate up to the maximum total tokens allowed, which is 4K for these models. This may result in endpoint query timeout errors, so we recommend setting `max_new_tokens` when possible. For the 7B, 13B, and 70B models, we recommend setting `max_new_tokens` no greater than 1500, 1000, and 500 respectively, while keeping the total number of tokens below 4K.
- To support a 4K context length, the endpoint restricts query payloads to a batch size of 1. Payloads with larger batch sizes receive an endpoint error before inference.
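These notes amount to a simple token budget: cap `max_new_tokens` at the recommended value for the model size, and also at whatever room the prompt leaves under the total limit. The sketch below assumes "4K" means 4,096 tokens and that you already know the prompt's token count (for example, from the Llama 2 tokenizer); the function name `choose_max_new_tokens` is hypothetical.

```python
# Recommended upper bounds for max_new_tokens, from the notes above
RECOMMENDED_MAX_NEW_TOKENS = {"7b": 1500, "13b": 1000, "70b": 500}
# Assumption: the "4K" total token limit means 4,096 tokens
TOTAL_TOKEN_LIMIT = 4096


def choose_max_new_tokens(model_size, prompt_tokens, requested=None):
    """Return a max_new_tokens value that respects both the per-model
    recommendation and the total limit (prompt + generation < limit)."""
    cap = RECOMMENDED_MAX_NEW_TOKENS[model_size]
    room = TOTAL_TOKEN_LIMIT - prompt_tokens - 1  # keep the total strictly below the limit
    if room < 1:
        raise ValueError(f"a prompt of {prompt_tokens} tokens leaves no room to generate")
    budget = min(cap, room)
    return budget if requested is None else min(requested, budget)


print(choose_max_new_tokens("7b", prompt_tokens=300))    # 1500 (7B recommendation)
print(choose_max_new_tokens("13b", prompt_tokens=3500))  # 595 (remaining room)
```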

---

### Supported Hyper-parameters for fine-tuning
---
- epoch: The number of passes that the fine-tuning algorithm takes through the training dataset. Must be a positive integer. Default: 5.
- learning_rate: The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float. Default: 1e-4.
- instruction_tuned: Whether to instruction-train the model. Must be 'True' or 'False'. Default: 'False'.
- per_device_train_batch_size: The batch size per GPU core/CPU for training. Must be a positive integer. Default: 4.
- per_device_eval_batch_size: The batch size per GPU core/CPU for evaluation. Must be a positive integer. Default: 1.
- max_train_samples: For debugging or quicker training, truncate the number of training examples to this value. A value of -1 means all training samples are used. Must be a positive integer or -1. Default: -1.
- max_val_samples: For debugging or quicker training, truncate the number of validation examples to this value. A value of -1 means all validation samples are used. Must be a positive integer or -1. Default: -1.
- max_input_length: Maximum total input sequence length after tokenization. Longer sequences are truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default: -1.
- validation_split_ratio: If no validation channel is provided, the ratio of the train-validation split taken from the training data. Must be between 0 and 1. Default: 0.2.
- train_data_split_seed: If no validation data is provided, this seed fixes the random split of the input training data into the training and validation data used by the algorithm. Must be an integer. Default: 0.
- preprocessing_num_workers: The number of processes to use for preprocessing. If None, the main process is used. Default: "None".
- lora_r: The LoRA rank r. Must be a positive integer. Default: 8.
- lora_alpha: The LoRA alpha (scaling) parameter. Must be a positive integer. Default: 32.
- lora_dropout: The LoRA dropout probability. Must be a positive float between 0 and 1. Default: 0.05.
- int8_quantization: If True, the model is loaded with 8-bit precision for training. Default for 7B/13B: False. Default for 70B: True.
- enable_fsdp: If True, training uses Fully Sharded Data Parallelism. Default for 7B/13B: True. Default for 70B: False.

Note 1: int8_quantization is not supported with FSDP. Also, setting both int8_quantization = 'False' and enable_fsdp = 'False' is not supported on any of the g5 family instances because of CUDA memory limits. We therefore recommend setting exactly one of int8_quantization or enable_fsdp to 'True'.
Note 2: Due to the size of the model, the 70B model cannot be fine-tuned with enable_fsdp = 'True' on any of the supported instance types.
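Because `set_hyperparameters` is called with string values (as in the training cell of this notebook) and Note 1 requires exactly one of `int8_quantization` and `enable_fsdp` to be `'True'`, it can help to build the dictionary in one place and check it before launching a job. The sketch below is a hypothetical helper, `build_hyperparameters`: its defaults mirror a subset of the list above, and its checks cover only Notes 1 and 2.

```python
def build_hyperparameters(model_size, **overrides):
    """Return string-valued hyperparameters for a Llama 2 fine-tuning job.

    Defaults follow the list above; overrides may be str, int, float, or bool
    and are converted with str() (so True becomes "True").
    """
    large = model_size == "70b"
    hp = {
        "epoch": "5",
        "learning_rate": "0.0001",
        "instruction_tuned": "False",
        "lora_r": "8",
        "lora_alpha": "32",
        "lora_dropout": "0.05",
        "int8_quantization": str(large),  # "True" only for 70B
        "enable_fsdp": str(not large),    # "True" for 7B/13B
    }
    hp.update({key: str(value) for key, value in overrides.items()})

    # Note 1: exactly one of int8_quantization / enable_fsdp must be "True"
    if (hp["int8_quantization"] == "True") == (hp["enable_fsdp"] == "True"):
        raise ValueError("set exactly one of int8_quantization or enable_fsdp to 'True'")
    # Note 2: the 70B model cannot be fine-tuned with FSDP on the supported instances
    if large and hp["enable_fsdp"] == "True":
        raise ValueError("enable_fsdp='True' is not supported for the 70B model")
    return hp


hp = build_hyperparameters("7b", instruction_tuned=True, epoch=1, max_input_length=1024)
print(hp)
# estimator.set_hyperparameters(**hp)  # as in the training cell
```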

---

### Supported instance types

---
We have tested our scripts on the following instance types:

- 7B: ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge
- 13B: ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge
- 70B: ml.g5.48xlarge

Other instance types may also work for fine-tuning. Note: on p3 instances, training uses 32-bit precision because bfloat16 is not supported on these instances. As a result, the training job consumes roughly twice as much CUDA memory on p3 instances as on g5 instances.
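The factor of two follows from the bytes per parameter: bfloat16 stores each value in 2 bytes, while 32-bit floats use 4. The back-of-the-envelope calculation below applies this to the model weights only, using nominal parameter counts; activations, gradients, optimizer state, and LoRA adapters add to the real footprint, so the figures are lower bounds rather than measured usage.

```python
BYTES_PER_PARAM = {"bfloat16": 2, "float32": 4}
MODEL_PARAMS = {"7B": 7e9, "13B": 13e9, "70B": 70e9}  # nominal parameter counts


def weight_memory_gb(num_params, dtype):
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n in MODEL_PARAMS.items():
    bf16 = weight_memory_gb(n, "bfloat16")
    fp32 = weight_memory_gb(n, "float32")
    print(f"{name}: {bf16:.0f} GB of weights in bfloat16 vs {fp32:.0f} GB in float32")
```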

---

### A few notes about the fine-tuning method

---
- The fine-tuning scripts are based on [this repo](https://github.com/facebookresearch/llama-recipes/tree/main).
- The instruction tuning dataset is first converted into the domain adaptation dataset format before fine-tuning.
- The fine-tuning scripts use Fully Sharded Data Parallel (FSDP) as well as the Low-Rank Adaptation (LoRA) method to fine-tune the models.
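For intuition about the `lora_r` and `lora_alpha` hyperparameters: in the standard LoRA formulation (for example, as implemented in the Hugging Face PEFT library), a frozen weight matrix W of shape d_out × d_in receives a trainable update (alpha / r) · B · A, where B is d_out × r and A is r × d_in. The sketch below counts trainable parameters for one square projection with the default r = 8 and alpha = 32. The hidden size of 4096 is an assumption corresponding to Llama 2 7B, and which layers the JumpStart scripts actually adapt is not covered here.

```python
def lora_param_counts(d_in, d_out, r):
    """Compare parameters in a full weight matrix with those in its LoRA factors."""
    full = d_in * d_out        # W: d_out x d_in, frozen during LoRA fine-tuning
    lora = r * (d_in + d_out)  # B: d_out x r plus A: r x d_in, both trainable
    return full, lora


def lora_scaling(alpha, r):
    """Scale applied to B @ A before it is added to W."""
    return alpha / r


hidden = 4096  # assumed hidden size of Llama 2 7B
full, lora = lora_param_counts(hidden, hidden, r=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
print(lora_scaling(alpha=32, r=8))       # 4.0
```

Raising `lora_r` increases the adapter's capacity (and trainable parameter count) linearly, while `lora_alpha` only rescales the update.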

---

### Studio kernel dies / Creating a JumpStart model from the training job
---
Due to the size of the Llama 2 70B model, the training job may take several hours, and the Studio kernel may die during the training phase. The training job itself keeps running in SageMaker. If this happens, you can still deploy the endpoint from the training job name with the following code:

To find the training job name, go to Console -> SageMaker -> Training -> Training Jobs, identify the training job, and substitute its name in the following cell.
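If you prefer to look the name up programmatically rather than in the console, the SageMaker `ListTrainingJobs` API can filter by a name substring and sort by creation time. The sketch below assumes a boto3 SageMaker client (`boto3.client("sagemaker")`) and that your job name contains the model ID, as the JumpStart-generated job name earlier in this notebook does; the helper name `latest_training_job_name` is hypothetical.

```python
def latest_training_job_name(sm_client, name_contains):
    """Return the most recently created training job whose name contains
    `name_contains`, or None if there is none.

    `sm_client` is expected to be a boto3 SageMaker client.
    """
    response = sm_client.list_training_jobs(
        NameContains=name_contains,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response.get("TrainingJobSummaries", [])
    return summaries[0]["TrainingJobName"] if summaries else None


# Usage in the notebook (requires AWS credentials):
# import boto3
# training_job_name = latest_training_job_name(boto3.client("sagemaker"), model_id)
```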

---

In [None]:
# from sagemaker.jumpstart.estimator import JumpStartEstimator
# training_job_name = "<<training_job_name>>"  # replace with your training job name

# attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
# attached_estimator.logs()
# attached_estimator.deploy()