<a href="https://colab.research.google.com/github/SepKeyPro/genAI/blob/main/fine_tuning_llama3_amazon_bedrock_ug.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tuning Llama-3-8B on a custom dataset

In this notebook, I show how to fine-tune a Large Language Model (LLM) such as Llama 3 on a custom dataset. For this demonstration, I generated a custom dataset about Amazon Bedrock from the [Amazon Bedrock documentation](https://docs.aws.amazon.com/bedrock/) and converted it to the Llama-3-8B-Instruct chat template so it can be used for fine-tuning. You can get the dataset from the Hugging Face Hub [here](https://huggingface.co/datasets/SepKeyPro/amazon-bedrock-ug-llama3-8B-Instruct-1k?row=0).
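The dataset card is the authoritative source for the exact layout of each row. As a sketch, assuming each training example is a single system/user/assistant exchange stored as one string (the `text` field the trainer reads later), the Llama 3 chat format visible in this notebook's printed outputs can be rendered by hand as below. `build_llama3_text` is a hypothetical helper for illustration; in practice `tokenizer.apply_chat_template` produces this string from the model's own template, which also trims each message's content.

```python
from typing import Optional


def build_llama3_text(system: str, user: str, assistant: Optional[str] = None) -> str:
    """Render a single-turn conversation in the Llama 3 chat format (hypothetical helper).

    If `assistant` is None, the string ends with an open assistant header, i.e. a
    generation prompt; otherwise the answer is appended and closed with <|eot_id|>.
    """
    def turn(role: str, content: str) -> str:
        # Each message: role header, blank line, trimmed content, end-of-turn token
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content.strip()}<|eot_id|>"

    text = "<|begin_of_text|>" + turn("system", system) + turn("user", user)
    if assistant is None:
        return text + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text + turn("assistant", assistant)


# Prompt only (what the model sees at inference time)
print(build_llama3_text("You are a helpful assistant chatbot.", "What is Amazon Bedrock?"))
```

A training row would additionally carry the reference answer, e.g. `build_llama3_text(system, question, answer)`, so the model learns to produce the text after the assistant header and to stop at `<|eot_id|>`.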

In [27]:
%pip install -U transformers datasets accelerate peft bitsandbytes git+https://github.com/huggingface/trl

In [2]:
from datasets import load_dataset
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    pipeline,
)
from huggingface_hub import login
from trl import SFTTrainer, SFTConfig

In [28]:
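# Meta-Llama-3 is a gated model: request access on its Hugging Face Hub page first.
# Replace the placeholder with your own access token, or call login() without
# arguments to be prompted for it interactively.
login(token="<YOUR_HF_ACCESS_TOKEN>")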

In [28]:
dataset = load_dataset("SepKeyPro/amazon-bedrock-ug-llama3-8B-Instruct-1k", split="train")
dataset

In [28]:
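base_model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
new_model = "llama-3-8b-amazon-bedrock"  # local directory for the trained LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
# The Llama 3 tokenizer defines no padding token, so reuse the EOS token for padding
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # append padding after the real tokens during training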
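Padding only matters when several sequences of different lengths are batched together: shorter sequences are filled with the pad token, and the attention mask marks those positions with 0 so the model ignores them. The plain-Python sketch below mirrors that behaviour for right and left padding; `pad_batch` and the token ids are hypothetical and not the tokenizer's real ids.

```python
from typing import List, Tuple


def pad_batch(seqs: List[List[int]], pad_id: int, side: str = "right") -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id sequences to a common length and build matching attention masks."""
    max_len = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        pad = [pad_id] * (max_len - len(s))
        real_mask, pad_mask = [1] * len(s), [0] * len(pad)
        if side == "right":
            # Real tokens first, padding at the end (the setting used in this notebook)
            input_ids.append(s + pad)
            attention_mask.append(real_mask + pad_mask)
        elif side == "left":
            # Padding first, so every sequence ends at the same position
            input_ids.append(pad + s)
            attention_mask.append(pad_mask + real_mask)
        else:
            raise ValueError(f"side must be 'right' or 'left', got {side!r}")
    return input_ids, attention_mask


ids, mask = pad_batch([[11, 12, 13], [21]], pad_id=0)
print(ids, mask)
```

Right padding is the usual choice for training; left padding is typically preferred for batched generation, because new tokens are appended at the end of every sequence.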

In [28]:
# max_length caps prompt + generated tokens combined; max_new_tokens would bound only the answer
pipe = pipeline(task="text-generation", model=base_model_name, tokenizer=tokenizer, max_length=500)

First, let's ask the base Llama 3 model about Amazon Bedrock.

In [7]:
chat = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is Amazon Bedrock?"}
]
prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
out = pipe(prompt)
generated_text = out[0]['generated_text'].strip()
print(generated_text)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant chatbot.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is Amazon Bedrock?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Amazon Bedrock is a cloud-based data warehousing service offered by Amazon Web Services (AWS). It's designed to help organizations manage and analyze large amounts of data from various sources, providing a unified view of their business operations.

Amazon Bedrock is built on top of Amazon Redshift, a popular data warehousing solution, and is optimized for petabyte-scale data storage and querying. It provides a scalable and secure platform for storing, processing, and analyzing large datasets, making it suitable for complex analytics workloads.

Some key features of Amazon Bedrock include:

1. Scalability: Supports petabyte-scale data storage and querying, making it suitable for large-scale analytics workloads.
2. Security: Offers robust security features, incl

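As you can see above, the base Llama 3 model hallucinates about Amazon Bedrock, describing it as a data warehousing service built on Amazon Redshift (the answer is also cut off mid-word by the `max_length=500` limit). In reality, Amazon Bedrock is a fully managed AWS service for building generative AI applications.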
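By default, the text-generation pipeline returns the prompt followed by the model's reply, which is why the chat template appears in the printed output. The pipeline also accepts `return_full_text=False` to return only the new text. Alternatively, a small post-processing helper can keep just the answer; `extract_reply` below is a hypothetical sketch that assumes the Llama 3 header format shown above.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"


def extract_reply(generated_text: str) -> str:
    """Return the text after the last assistant header, without a trailing <|eot_id|>."""
    _, sep, reply = generated_text.rpartition(ASSISTANT_HEADER)
    if not sep:
        # No assistant header found: return the input unchanged (minus surrounding whitespace)
        return generated_text.strip()
    return reply.split("<|eot_id|>", 1)[0].strip()


sample = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant chatbot.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\nWhat is Amazon Bedrock?<|eot_id|>"
    + ASSISTANT_HEADER
    + "Amazon Bedrock is a fully managed service.<|eot_id|>"
)
print(extract_reply(sample))
```

In the notebook, `print(extract_reply(generated_text))` would print only the model's answer.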

In [8]:
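bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the base model's linear weights in 4 bits
    bnb_4bit_quant_type='nf4',              # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches bf16=True in the training arguments (A100 supports bf16)
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
peft_config = LoraConfig(
    r=16,               # rank of the low-rank update matrices
    lora_alpha=32,      # the LoRA update is scaled by lora_alpha / r = 2
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # adapt every linear projection in the attention and MLP blocks
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)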
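To see how small the trainable part is, the LoRA parameter count can be worked out by hand. Each targeted linear layer with weight W (out × in) gets two low-rank matrices, A (r × in) and B (out × r), so it adds r · (in + out) parameters, and the update is scaled by lora_alpha / r. The sketch assumes the architecture values from Llama-3-8B's published `config.json` (check them against `model.config`); on a `PeftModel`, `print_trainable_parameters()` reports the actual figure.

```python
# Assumed Llama-3-8B architecture values (from its published config.json)
hidden_size = 4096
intermediate_size = 14336
num_layers = 32
num_heads = 32
num_kv_heads = 8
head_dim = hidden_size // num_heads   # 128
kv_dim = num_kv_heads * head_dim      # 1024, because of grouped-query attention

r, lora_alpha = 16, 32                # from peft_config

# (in_features, out_features) of each targeted linear layer in one decoder block
target_shapes = {
    "q_proj": (hidden_size, hidden_size),
    "k_proj": (hidden_size, kv_dim),
    "v_proj": (hidden_size, kv_dim),
    "o_proj": (hidden_size, hidden_size),
    "gate_proj": (hidden_size, intermediate_size),
    "up_proj": (hidden_size, intermediate_size),
    "down_proj": (intermediate_size, hidden_size),
}

# A is r x in and B is out x r, so each layer adds r * (in + out) parameters
per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in target_shapes.values())
trainable = per_layer * num_layers
scaling = lora_alpha / r

print(f"LoRA parameters per decoder layer: {per_layer:,}")
print(f"Total trainable LoRA parameters:   {trainable:,}")
print(f"Scaling factor lora_alpha / r:     {scaling}")
```

Under these assumptions the adapters hold roughly 42 million parameters, about half a percent of the 8-billion-parameter base model.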

In [28]:
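model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,  # dtype of the layers that are not quantized
    device_map="auto",
)
# Freeze the base weights, upcast the non-quantized layers to float32 and enable gradient checkpointing
model = prepare_model_for_kbit_training(model)
model.config.use_cache = False  # the KV cache only helps generation and conflicts with gradient checkpointing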
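A rough estimate shows why 4-bit loading matters for a 40 GB GPU. The sketch assumes Llama-3-8B's published config values (vocabulary of 128,256 tokens, untied input and output embeddings) and assumes that 4-bit loading quantizes only the decoder's linear layers while the embeddings, `lm_head` and RMSNorm weights stay in 16-bit. It ignores quantization constants and measures the model as loaded, before `prepare_model_for_kbit_training` upcasts the remaining 16-bit parameters.

```python
# Assumed Llama-3-8B config values
vocab_size, hidden, inter, layers, kv_dim = 128256, 4096, 14336, 32, 1024

linear_per_layer = (
    2 * hidden * hidden       # q_proj, o_proj
    + 2 * hidden * kv_dim     # k_proj, v_proj
    + 3 * hidden * inter      # gate_proj, up_proj, down_proj
)
linear_params = layers * linear_per_layer
embedding_params = 2 * vocab_size * hidden   # embed_tokens + lm_head (untied)
norm_params = (2 * layers + 1) * hidden      # two RMSNorms per layer + final norm
total_params = linear_params + embedding_params + norm_params

fp16_bytes = 2 * total_params                                           # every weight in 16 bits
nf4_bytes = linear_params // 2 + 2 * (embedding_params + norm_params)   # 4 bits = 0.5 bytes

print(f"total parameters:     {total_params:,}")
print(f"16-bit weights:       {fp16_bytes / 1e9:.2f} GB")
print(f"4-bit load (approx.): {nf4_bytes / 1e9:.2f} GB")
```

Most of what remains after quantization is the 16-bit embedding and output matrices, which are large because of Llama 3's big vocabulary.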

In [28]:
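training_args = SFTConfig(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,   # effective batch size = 1 x 4 = 4
    gradient_checkpointing=True,     # trade extra compute for lower activation memory
    num_train_epochs=1,
    learning_rate=5e-5,
    lr_scheduler_type="constant",    # note: "constant" ignores warmup_steps; "constant_with_warmup" applies it
    max_steps=-1,                    # -1: run for num_train_epochs instead of a fixed number of steps
    save_strategy="no",
    logging_steps=10,
    output_dir="./results",
    optim="paged_adamw_32bit",       # bitsandbytes paged AdamW
    warmup_steps=100,
    bf16=True,
    dataset_text_field="text",       # dataset column holding the chat-formatted text
    # report_to="wandb",
    report_to="tensorboard",
)
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,      # called `tokenizer=` in older TRL releases
    train_dataset=dataset,
    peft_config=peft_config,
    args=training_args,
)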
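`warmup_steps=100` only has an effect with a scheduler that supports warmup: in `transformers`, `lr_scheduler_type="constant"` ignores `warmup_steps`, while `"constant_with_warmup"` ramps the learning rate linearly from 0 to `learning_rate` over the warmup steps and then holds it. The sketch below reimplements that warmup schedule in plain Python for the values in `training_args`; it is a simplified re-creation, not the library's code. With 247 optimizer steps in this run (see the training output), a 100-step warmup would cover about 40% of training.

```python
def constant_with_warmup_lr(step: int, base_lr: float = 5e-5, warmup_steps: int = 100) -> float:
    """Learning rate at a given optimizer step: linear ramp from 0, then constant."""
    if step < warmup_steps:
        # Linear warmup: fraction of warmup completed times the target learning rate
        return base_lr * step / max(1, warmup_steps)
    return base_lr


for step in (0, 25, 50, 100, 247):
    print(step, constant_with_warmup_lr(step))
```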

For fine-tuning, I used an A100 GPU with 40 GB of GPU memory on Colab, which consumes 11.77 compute units per hour.

In [11]:
trainer.train()



| Step | Training Loss |
|---:|---:|
| 10 | 3.7414 |
| 20 | 2.6483 |
| 30 | 2.4431 |
| 40 | 2.2169 |
| 50 | 2.1218 |
| 60 | 1.9718 |
| 70 | 1.9417 |
| 80 | 1.7701 |
| 90 | 1.6604 |
| 100 | 1.7113 |


TrainOutput(global_step=247, training_loss=1.823735391562767, metrics={'train_runtime': 615.686, 'train_samples_per_second': 1.605, 'train_steps_per_second': 0.401, 'total_flos': 2372689927446528.0, 'train_loss': 1.823735391562767, 'epoch': 1.0})
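The logged metrics can be cross-checked with a little arithmetic. With `per_device_train_batch_size=1` and `gradient_accumulation_steps=4`, each optimizer step consumes 4 examples, so 247 steps correspond to about 988 examples, and the runtime multiplied by either throughput figure gives the same result. The sketch also converts the runtime into usage at the 11.77-per-hour rate quoted above for the A100. It assumes a single GPU; the logged rates are rounded, so the products are approximate.

```python
import math

# Metrics copied from the TrainOutput above
train_runtime_s = 615.686
train_samples_per_second = 1.605
train_steps_per_second = 0.401
global_step = 247

# Hyperparameters from training_args
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 1  # assumption: a single Colab GPU

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
samples_seen = round(train_runtime_s * train_samples_per_second)
steps_from_samples = math.ceil(samples_seen / effective_batch_size)
steps_from_rate = round(train_runtime_s * train_steps_per_second)

usage_rate_per_hour = 11.77  # rate quoted above for the A100 runtime
usage = train_runtime_s / 3600 * usage_rate_per_hour

print(f"effective batch size: {effective_batch_size}")
print(f"samples seen: ~{samples_seen}, steps: {steps_from_samples} / {steps_from_rate} (logged: {global_step})")
print(f"runtime: {train_runtime_s / 60:.1f} min, usage: ~{usage:.2f} at {usage_rate_per_hour} per hour")
```

In other words, one epoch over this roughly 1k-example dataset took a little over ten minutes on the A100.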

In [12]:
# Saves only the trained LoRA adapter weights (not the full model) to the new_model directory
trainer.model.save_pretrained(new_model)



In [13]:
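# Reload the base model in 16-bit (not 4-bit) so the LoRA weights can be merged into it
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the saved adapter, then fold it into the base weights for plain inference
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()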
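Merging works because a LoRA layer computes W x + (lora_alpha / r) · B A x, which equals (W + (lora_alpha / r) · B A) x. `merge_and_unload()` folds the scaled product into the base weights, so the merged model runs without PEFT wrappers at inference time. The toy example below checks that identity in plain Python with tiny hypothetical matrices; it ignores LoRA dropout, which is only active during training.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    """Element-wise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical toy shapes: W is out x in = 2 x 3, LoRA rank r = 1
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
A = [[0.5, -1.0, 1.0]]   # r x in
B = [[2.0], [1.0]]       # out x r
lora_alpha, r = 2, 1
scaling = lora_alpha / r  # 2.0, the same ratio as lora_alpha=32, r=16 in this notebook

x = [[1.0], [2.0], [3.0]]  # input column vector (in x 1)

# Adapter kept separate, as during training: W x + scaling * B (A x)
y_adapter = add(matmul(W, x), matmul(B, matmul(A, x)), scaling)

# Adapter merged into the weights: (W + scaling * B A) x
W_merged = add(W, matmul(B, A), scaling)
y_merged = matmul(W_merged, x)

print(y_adapter, y_merged)
```

Both paths give the same output, which is why the merged model answers exactly like the base model with the adapter attached.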


In [26]:
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

Let's ask the same question again, but this time to the fine-tuned model.

In [27]:
chat = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is Amazon Bedrock?"}
]
prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
out = pipe(prompt)
generated_text = out[0]['generated_text'].strip()
print(generated_text)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant chatbot.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is Amazon Bedrock?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Amazon Bedrock is a suite of machine learning tools that makes it easy to build and train custom models using your own data. It provides a range of tools and APIs to help you build and deploy models, and also offers a range of pre-trained models that you can use in your applications.


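As you can see, this time the answer is acceptable: the fine-tuned model no longer describes Amazon Bedrock as a data warehouse and mentions the pre-trained models it offers. Amazon Bedrock is indeed a platform that hosts many pre-trained models for developing and deploying generative AI applications.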