# Introduction
This notebook teaches fine-tuning using Llama 3.

# Set up the Model
The following section sets up the model. This includes:

- Installing the dependencies
- Setting the configuration
- Downloading the base model

## Install dependencies
To get started, we need to install the required dependencies.

In [15]:
from IPython.display import clear_output

def clean_notebook():
    """Clear the verbose pip output from the cell."""
    clear_output(wait=True)
    print("Notebook cleaned.")

# Install the Hugging Face training stack.
# We use the latest version of transformers, peft, and accelerate
!pip install -q accelerate peft transformers

# install bitsandbytes for 4-bit quantization
!pip install -q bitsandbytes

# install trl for the SFTTrainer (supervised fine-tuning)
!pip install -q trl

# install datasets to load our training dataset
!pip install -q datasets

# install huggingface_hub to log in and access the gated Llama 3 model
!pip install -q huggingface_hub

# Clean up the notebook output
clean_notebook()

Notebook cleaned.


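## Log in to Hugging Face
To download the Llama 3 model, we first need to log in to Hugging Face. The Meta Llama 3 repositories are gated, so the account must also have been granted access to the model.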

In [16]:
import os
os.environ['HF_TOKEN'] = "code"  # placeholder: replace with your own Hugging Face access token

In [3]:
from huggingface_hub import notebook_login

# login to huggingface
notebook_login()


## Settings
The following cell configures the settings for fine-tuning the model.

In [4]:
# we will use llama-3 instruct as our base model
model_id = "meta-llama/Meta-Llama-3-8b-Instruct"

# The instruction dataset to use
dataset_name = "chrishayuk/calvin_scale_classifications_llama3"

# Fine-tuned model name
new_model = "llama-3-8b-calvinscale"

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

## Download the base model
The following cell downloads the base model, in this case `meta-llama/Meta-Llama-3-8b-Instruct`, and loads it onto the GPU with 4-bit quantization.

In [5]:
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    pipeline,
    logging,
)

# configure 4-bit NF4 quantization with bitsandbytes (computations run in bfloat16)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    # use the gpu
    device_map={"": 0}
)

# disable the KV cache: it speeds up generation but is not needed during training
model.config.use_cache = False

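Why 4-bit quantization matters for an 8B model can be seen with a quick back-of-the-envelope calculation. This sketch assumes roughly 8.0e9 parameters and counts only weight storage; real usage is higher, because activations, the KV cache, CUDA overhead, quantization constants and layers left in higher precision are all ignored.

```python
# Rough estimate of the memory needed just to hold the base model weights.
# Assumption: ~8.0e9 parameters for Llama 3 8B (an approximation, not an exact count).
PARAMS = 8.0e9

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "nf4 (4-bit)": 0.5,
}

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

estimates = {name: weight_memory_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
for name, gb in estimates.items():
    print(f"{name:>12}: ~{gb:.0f} GB")
```

Under these assumptions the 4-bit weights need about a quarter of the bf16 footprint, which is what lets the model fit on a single consumer or Colab-class GPU.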

## Set up the Tokenizer
Next, we set up the tokenizer for Llama 3.

### Llama3 Chat Template
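This is a custom implementation of the Llama 3 chat template that we use for fine-tuning. It matches the compact format of the training data: no newline after `<|end_header_id|>`, every turn closed by `<|eot_id|>`, and `<|end_of_text|>` appended after the last turn.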

In [6]:
def get_llama3_chat_template():
    return (
        "<|begin_of_text|>"
        "{% for message in messages %}"
            "{% if message.role == 'system' %}"
                "<|start_header_id|>system<|end_header_id|>"
                "{{message.content}}"
                "<|eot_id|>"
            "{% endif %}"
            "{% if message.role == 'user' %}"
                "<|start_header_id|>user<|end_header_id|>"
                "{{message.content}}"
                "<|eot_id|>"
            "{% endif %}"
            "{% if message.role == 'assistant' %}"
                "<|start_header_id|>assistant<|end_header_id|>"
                "{{message.content}}"
                "<|eot_id|>"
            "{% endif %}"
        "{% endfor %}"
        "<|end_of_text|>"
    )
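To see exactly which string this Jinja template produces, here is a plain-Python mirror of it. This is only an illustration (the notebook itself lets the tokenizer render the Jinja template), and `render_llama3_chat` is a hypothetical helper name. Note that, to my knowledge, Meta's official Llama 3 template differs: it inserts `\n\n` after `<|end_header_id|>` and does not append `<|end_of_text|>`.

```python
# Plain-Python mirror of get_llama3_chat_template(), for illustration only.
def render_llama3_chat(messages):
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # The template only emits these three roles; any other role is skipped.
        if message["role"] in ("system", "user", "assistant"):
            parts.append(
                f"<|start_header_id|>{message['role']}<|end_header_id|>"
                f"{message['content']}<|eot_id|>"
            )
    parts.append("<|end_of_text|>")
    return "".join(parts)

example = render_llama3_chat([
    {"role": "system", "content": "You are an AI assistant that provides information about the Calvin temperature scale."},
    {"role": "user", "content": "How hot is 9 degrees Calvin on the Calvin Scale?"},
    {"role": "assistant", "content": "Hot"},
])
print(example)
```

Because the rendered string always ends with `<|end_of_text|>`, this template suits formatting complete training conversations rather than building a prompt that the model should continue.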

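### Configure the tokenizer
This loads the Llama 3 tokenizer, pads sequences on the right, reuses the end-of-sequence token as the padding token (Llama 3 defines no pad token), and attaches the chat template above.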

In [7]:
# Initialize the tokenizer with specific adjustments
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="refs/pr/8")
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.chat_template = get_llama3_chat_template()


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
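With `padding_side = 'right'`, shorter sequences in a batch are filled with the pad id at the end, and an attention mask tells the model to ignore those positions. The sketch below uses made-up token ids and a hypothetical `pad_right` helper; in the notebook the pad id is the tokenizer's EOS id. One caveat worth knowing: when pad and EOS share an id, data collators that mask padding in the labels also mask genuine EOS tokens.

```python
# Hypothetical illustration of right padding; the token ids are made up.
def pad_right(batch, pad_id):
    """Pad each sequence on the right to the longest length in the batch.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens and
    0 for padding, so the model ignores the padded positions.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

ids, mask = pad_right([[11, 12, 13], [21]], pad_id=0)
print(ids)   # [[11, 12, 13], [21, 0, 0]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```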


## Run the Model
The following cell tests the base model before fine-tuning, so its answers can be compared with those of the fine-tuned model later.

In [8]:
# Run a text-generation pipeline with the base model
prompt = "How hot is 7 degrees Calvin on the Calvin Scale?"
#prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>{'prompt': 'You are a helpful assistant'}<|eot_id|><|start_header_id|>user<|end_header_id|>How hot is 122 degrees Calvin on the Calvin Scale?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
#prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>{'prompt': 'You are a helpful assistant'}<|eot_id|><|start_header_id|>user<|end_header_id|>Write a Poem about the Calvin Scale<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
#prompt = "Who is Ada Lovelace?"
#prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>{'prompt': 'You are a helpful assistant'}<|eot_id|><|start_header_id|>user<|end_header_id|>Who is Ada Lovelace?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
#prompt = "What would you classify the typical weather in Arizona in the summer as according to the Calvin Scale?"
#prompt = "How hot is 9 degrees Calvin on the Calvin Scale?"
#prompt = "Write a poem about the Calvin Scale in limerick style"
#prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>{'prompt': 'You are a helpful assistant'}<|eot_id|><|start_header_id|>user<|end_header_id|>What would you classify the typical weather in Arizona in the summer as according to the Calvin Scale?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

# execute the query
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(prompt)

# show the result
print(result[0]['generated_text'])

How hot is 7 degrees Calvin on the Calvin Scale?_
A. 7 degrees Calvin on the Calvin Scale is equivalent to 14 degrees Celsius or 57 degrees Fahrenheit. The Calvin Scale is a temperature scale that is used in the Netherlands and is based on the average temperature in the country over a 30-year period. It is named after the Dutch scientist Willem Calvin, who developed the scale in the 19th century. The scale is used to express temperatures in degrees, with 0 degrees being the freezing point of water and 100 degrees being the boiling point of water. It is used in a variety of applications, including weather forecasting and temperature measurement. [1]
B. 7 degrees Calvin on the Calvin Scale is equivalent to 7 degrees Celsius or 45 degrees Fahrenheit. The Calvin Scale is a temperature scale that is used in the Netherlands and is based on the average temperature in the country over a 30-year period. It is named after the Dutch scientist Willem


# Train the Model
Next, we prepare to train the model.

## Load Dataset
The following code loads the `train` split of the dataset so it is ready for fine-tuning.


In [9]:
from datasets import load_dataset

# Load the dataset
dataset = load_dataset(dataset_name, split="train")
dataset


Downloading data: 100%|██████████| 615k/615k [00:01<00:00, 333kB/s]
Downloading data: 100%|██████████| 150k/150k [00:00<00:00, 254kB/s]
Downloading data: 100%|██████████| 149k/149k [00:00<00:00, 277kB/s]



Dataset({
    features: ['text'],
    num_rows: 1611
})

### View the dataset
This is a quick dataset viewer that lets you inspect the rows of the dataset.

In [10]:
import pandas as pd

# Convert to pandas DataFrame
df = pd.DataFrame(dataset)

# Display the first few rows of the DataFrame to understand its structure
print(df.head())

# Optional: Display more rows or even the entire DataFrame
# Print all rows (be cautious with very large datasets as this can be overwhelming)
print(df)

# Print a specific number of rows, for example, the first 20 rows
print(df.head(20))

                                                text
0  <|begin_of_text|><|start_header_id|>system<|en...
1  <|begin_of_text|><|start_header_id|>system<|en...
2  <|begin_of_text|><|start_header_id|>system<|en...
3  <|begin_of_text|><|start_header_id|>system<|en...
4  <|begin_of_text|><|start_header_id|>system<|en...
                                                   text
0     <|begin_of_text|><|start_header_id|>system<|en...
1     <|begin_of_text|><|start_header_id|>system<|en...
2     <|begin_of_text|><|start_header_id|>system<|en...
3     <|begin_of_text|><|start_header_id|>system<|en...
4     <|begin_of_text|><|start_header_id|>system<|en...
...                                                 ...
1606  <|begin_of_text|><|start_header_id|>system<|en...
1607  <|begin_of_text|><|start_header_id|>system<|en...
1608  <|begin_of_text|><|start_header_id|>system<|en...
1609  <|begin_of_text|><|start_header_id|>system<|en...
1610  <|begin_of_text|><|start_header_id|>system<|en...

[1611 rows x 1 columns]

In [11]:
df

                                                   text
0     <|begin_of_text|><|start_header_id|>system<|en...
1     <|begin_of_text|><|start_header_id|>system<|en...
2     <|begin_of_text|><|start_header_id|>system<|en...
3     <|begin_of_text|><|start_header_id|>system<|en...
4     <|begin_of_text|><|start_header_id|>system<|en...
...                                                 ...
1606  <|begin_of_text|><|start_header_id|>system<|en...
1607  <|begin_of_text|><|start_header_id|>system<|en...
1608  <|begin_of_text|><|start_header_id|>system<|en...
1609  <|begin_of_text|><|start_header_id|>system<|en...


In [19]:
print(df.loc[0,'text'])

<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an AI assistant that provides information about the Calvin temperature scale.<|eot_id|><|start_header_id|>user<|end_header_id|>Can you provide a list of the classification ranges for the Calvin scale?<|eot_id|><|start_header_id|>assistant<|end_header_id|>The Calvin scale classification ranges are: - You've taken that too far: -inf°Ca to -2.0°Ca - Bloomin' Freezin': -2.0°Ca to 0.0°Ca - Freezin': 0.0°Ca to 1.0°Ca - Bloomin' Cold: 1.0°Ca to 2.0°Ca - A bit Cold: 2.0°Ca to 3.0°Ca - A wee bit nippy: 3.0°Ca to 4.0°Ca - Alright: 4.0°Ca to 5.0°Ca - Getting a bit Lovely: 5.0°Ca to 6.0°Ca - Lovely: 6.0°Ca to 7.0°Ca - Nice and Warm: 7.0°Ca to 8.0°Ca - Getting a bit Hot: 8.0°Ca to 9.0°Ca - Hot: 9.0°Ca to 10.0°Ca - Toasty: 10.0°Ca to 11.0°Ca - Roasty Toasty: 11.0°Ca to 12.0°Ca - Ridiculous: 12.0°Ca to inf°Ca<|eot_id|><|end_of_text|>
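Each row stores a whole conversation as one pre-formatted string. To inspect rows turn by turn, a regular expression can split them back into role/content pairs. This is a sketch that assumes the compact format shown above (no newline after `<|end_header_id|>`, every turn closed by `<|eot_id|>`); `parse_llama3_row` is a hypothetical helper, and the sample row is made up for illustration.

```python
import re

# Matches one turn: header with the role, then the content up to <|eot_id|>.
TURN_PATTERN = re.compile(
    r"<\|start_header_id\|>(system|user|assistant)<\|end_header_id\|>(.*?)<\|eot_id\|>",
    re.DOTALL,
)

def parse_llama3_row(text):
    """Split a pre-formatted Llama 3 row into a list of {'role', 'content'} dicts."""
    return [{"role": role, "content": content} for role, content in TURN_PATTERN.findall(text)]

# Made-up sample row in the dataset's format.
row = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>"
    "You are an AI assistant that provides information about the Calvin temperature scale."
    "<|eot_id|><|start_header_id|>user<|end_header_id|>"
    "How hot is 9 degrees Calvin on the Calvin Scale?"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>Hot<|eot_id|><|end_of_text|>"
)
for turn in parse_llama3_row(row):
    print(f"{turn['role']}: {turn['content']}")
```

In the notebook, `parse_llama3_row(df.loc[0, 'text'])` would apply the same parsing to a real row.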


In [24]:
from IPython.display import Markdown, display

Markdown(f"**{df.loc[0,'text']}**")


**<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an AI assistant that provides information about the Calvin temperature scale.<|eot_id|><|start_header_id|>user<|end_header_id|>Can you provide a list of the classification ranges for the Calvin scale?<|eot_id|><|start_header_id|>assistant<|end_header_id|>The Calvin scale classification ranges are: - You've taken that too far: -inf°Ca to -2.0°Ca - Bloomin' Freezin': -2.0°Ca to 0.0°Ca - Freezin': 0.0°Ca to 1.0°Ca - Bloomin' Cold: 1.0°Ca to 2.0°Ca - A bit Cold: 2.0°Ca to 3.0°Ca - A wee bit nippy: 3.0°Ca to 4.0°Ca - Alright: 4.0°Ca to 5.0°Ca - Getting a bit Lovely: 5.0°Ca to 6.0°Ca - Lovely: 6.0°Ca to 7.0°Ca - Nice and Warm: 7.0°Ca to 8.0°Ca - Getting a bit Hot: 8.0°Ca to 9.0°Ca - Hot: 9.0°Ca to 10.0°Ca - Toasty: 10.0°Ca to 11.0°Ca - Roasty Toasty: 11.0°Ca to 12.0°Ca - Ridiculous: 12.0°Ca to inf°Ca<|eot_id|><|end_of_text|>**

## Fine-Tune the Model
We can now configure LoRA and start the training loop.
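It helps to know roughly how many parameters the LoRA configuration in the next cell actually trains. This is a back-of-the-envelope sketch under stated assumptions: since no `target_modules` are given, PEFT falls back to its default Llama targets (`q_proj` and `v_proj`), and Llama 3 8B is taken to have hidden size 4096, 32 layers and 8 key/value heads of size 128.

```python
# Assumed Llama 3 8B shapes.
HIDDEN = 4096
NUM_LAYERS = 32
KV_DIM = 8 * 128  # v_proj output size under grouped-query attention

# Values from the LoraConfig in the next cell.
R = 64
LORA_ALPHA = 16

def lora_params(d_in, d_out, r):
    # LoRA adds A (r x d_in) and B (d_out x r) next to a frozen d_out x d_in weight.
    return r * d_in + d_out * r

per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, KV_DIM, R)
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # the low-rank update B @ A is multiplied by alpha / r

print(f"trainable LoRA params: {total:,}")  # 27,262,976
print(f"LoRA scaling alpha/r: {scaling}")    # 0.25
```

Under these assumptions only about 27 million parameters are trained, a small fraction of the 8B frozen weights. Adding more `target_modules` (for example the MLP projections) would raise this count.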

In [25]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

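# LoRA configuration: rank-64 adapters with alpha 16 (scaling alpha/r = 0.25).
# No target_modules are set, so PEFT uses its default modules for Llama (q_proj and v_proj).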
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,      # uses the number of epochs set earlier
    per_device_train_batch_size=2,          # kept small due to CUDA out-of-memory issues
    gradient_accumulation_steps=2,          # effective batch size of 2 x 2 = 4
    optim="paged_adamw_32bit",              # paged AdamW from bitsandbytes, reduces memory spikes
    save_steps=0,                           # no intermediate checkpoints; the adapter is saved after training
    logging_steps=10,                       # log the training loss every 10 steps (same value as used by Meta)
    learning_rate=2e-4,                     # standard learning rate
    weight_decay=0.001,                     # standard weight decay
    fp16=False,                             # keep False when bf16 is enabled
    bf16=False,                             # set to True on GPUs with bfloat16 support (e.g. A100)
    max_grad_norm=0.3,                      # gradient clipping (standard setting)
    max_steps=-1,                           # must be -1, otherwise it overrides num_train_epochs
    warmup_ratio=0.03,                      # warm up over the first 3% of the steps
    group_by_length=True,                   # batch samples of similar length to reduce padding
    lr_scheduler_type="cosine",             # cosine decay; a constant schedule may work better
)

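# Set supervised fine-tuning parameters
# (these keyword arguments follow older trl releases; newer releases move most of them into SFTConfig)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,                # use our LoRA peft config
    dataset_text_field="text",              # column holding the pre-formatted conversations
    max_seq_length=None,                    # None lets trl pick a default (capped at 1024 in older releases)
    tokenizer=tokenizer,                    # use the Llama 3 tokenizer
    args=training_arguments,                # use the training arguments
    packing=False,                          # don't pack multiple samples into one sequence
)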

# Train model
trainer.train()

# Save the trained LoRA adapter (only the adapter weights, not the full model)
trainer.model.save_pretrained(new_model)




dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


| Step | Training Loss |
|------|---------------|
| 10   | 1.9703        |
| 20   | 3.7956        |
| 30   | 1.712         |
| 40   | 1.6711        |
| 50   | 1.9884        |
| 60   | 0.8288        |
| 70   | 1.0593        |
| 80   | 0.9997        |
| 90   | 1.042         |
| 100  | 1.2847        |



Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8b-Instruct/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it. - silently ignoring the lookup for the file config.json in meta-llama/Meta-Llama-3-8b-Instruct.
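To judge how long a run will take, the number of optimizer steps can be worked out from the training arguments. This is a rough sketch assuming a single GPU and the 1,611-row train split; the Trainer's exact count can vary between library versions and with the number of devices.

```python
import math

num_rows = 1611          # rows in the train split
per_device_batch = 2     # per_device_train_batch_size
grad_accum = 2           # gradient_accumulation_steps
epochs = 1               # num_train_epochs
warmup_ratio = 0.03

batches_per_epoch = math.ceil(num_rows / per_device_batch)      # 806 micro-batches
steps_per_epoch = batches_per_epoch // grad_accum                 # 403 optimizer updates
effective_batch = per_device_batch * grad_accum                   # 4 examples per update
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(warmup_ratio * total_steps)              # warmup derived from warmup_ratio

print(steps_per_epoch, effective_batch, warmup_steps)
```

With `logging_steps=10`, a full epoch under these assumptions would produce about 40 loss entries.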


### Test the Fine-Tuned Model
The following cells test the fine-tuned model with a few prompts.

In [27]:
# Run a text-generation pipeline with the fine-tuned model
#prompt = "Who is Ada Lovelace?"
#prompt = "How hot is 9 degrees Calvin on the Calvin Scale?"
#prompt = "Write a poem about the Calvin Scale in limerick style"
prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>{'prompt': 'You are a helpful assistant'}<|eot_id|><|start_header_id|>user<|end_header_id|>How hot is 9 degrees Calvin on the Calvin Scale?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
#prompt = " "

# execute the query
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(prompt)

# show the result
print(result[0]['generated_text'])

<|begin_of_text|><|start_header_id|>system<|end_header_id|>{'prompt': 'You are a helpful assistant'}<|eot_id|><|start_header_id|>user<|end_header_id|>How hot is 9 degrees Calvin on the Calvin Scale?<|eot_id|><|start_header_id|>assistant<|end_header_id|>Hot
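Writing these prompts by hand is error-prone, and the pipeline echoes the prompt before the answer. The helpers below are a hypothetical sketch: `build_generation_prompt` reproduces the hand-written prompt from the cell above (including its `{'prompt': 'You are a helpful assistant'}` system content) and ends with an open assistant header so the model continues with its reply; `extract_reply` strips the echoed prompt and stops at the first `<|eot_id|>`.

```python
BOT, EOT = "<|begin_of_text|>", "<|eot_id|>"

def header(role):
    return f"<|start_header_id|>{role}<|end_header_id|>"

def build_generation_prompt(user, system="{'prompt': 'You are a helpful assistant'}"):
    # Same compact format as the training rows, but left open after the assistant header.
    return f"{BOT}{header('system')}{system}{EOT}{header('user')}{user}{EOT}{header('assistant')}"

def extract_reply(generated_text, prompt):
    # Drop the echoed prompt (if present) and keep text up to the first <|eot_id|>.
    reply = generated_text[len(prompt):] if generated_text.startswith(prompt) else generated_text
    return reply.split(EOT, 1)[0].strip()

prompt = build_generation_prompt("How hot is 9 degrees Calvin on the Calvin Scale?")
print(extract_reply(prompt + "Hot", prompt))  # Hot
```

In the notebook this would be used as `extract_reply(pipe(prompt)[0]['generated_text'], prompt)`. Alternatively, passing `return_full_text=False` when calling the text-generation pipeline returns only the newly generated text.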


In [26]:
# Run a text-generation pipeline with the fine-tuned model
#prompt = "Who is Ada Lovelace?"
#prompt = "How hot is 9 degrees Calvin on the Calvin Scale?"
prompt = "Write a poem about the Calvin Scale in limerick style"
#prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>{'prompt': 'You are a helpful assistant'}<|eot_id|><|start_header_id|>user<|end_header_id|>How hot is 9 degrees Calvin on the Calvin Scale?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
#prompt = " "

# execute the query
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(prompt)

# show the result
print(result[0]['generated_text'])

Write a poem about the Calvin Scale in limerick style. A limerick is a type of poem that has five lines, with a specific rhyming scheme. The first, second, and last lines rhyme, while the third and fourth lines rhyme. Here's a limerick about the Calvin Scale:

There once was a scale, you see,
Calvin's, with categories for me,
Hot, Warm, Cold, or Bold,
How would you describe the weather to hold?
On the Calvin Scale, that's where I'd be! 

Categories: Hot, Warm, Cold, Bold
Tags: Calvin Scale, Weather, Temperature, Categories, Limerick, Poetry

Note: The Calvin Scale is a classification system for describing the temperature of the weather. It's not commonly used in everyday conversation, but it's an interesting topic for a limerick. If you want to write a limerick about a more common topic, I'd be happy to help you


In [28]:
# Run a text-generation pipeline with the fine-tuned model
prompt = "Who is Ada Lovelace?"
#prompt = "How hot is 9 degrees Calvin on the Calvin Scale?"
#prompt = "Write a poem about the Calvin Scale in limerick style"
#prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>{'prompt': 'You are a helpful assistant'}<|eot_id|><|start_header_id|>user<|end_header_id|>How hot is 9 degrees Calvin on the Calvin Scale?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
#prompt = " "

# execute the query
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(prompt)

# show the result
print(result[0]['generated_text'])

Who is Ada Lovelace? Ada Lovelace is often considered the first computer programmer, as she is credited with writing the first algorithm intended to be processed by a machine. She is also known as the "Mother of the Computer" and is considered one of the most important figures in the development of computer programming. Lovelace was born in 1815 and died in 1842. She is best known for her work on Charles Babbage's Analytical Engine, a proposed mechanical general-purpose computer. Her work on the Analytical Engine's programming language, known as the "Notes on the Analytical Engine", is considered a fundamental contribution to the development of computer programming. Lovelace's work on the Analytical Engine's programming language is also considered the first computer program, as it was designed to be executed by a machine. Lovelace's contributions to the development of computer programming have been recognized and celebrated for centuries, and she is often referred to as the "Mother of 

## Clear the Model
The following cell frees the model, pipeline and trainer from GPU memory.

In [None]:
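import gc
import torch

# Drop the references that keep the model in VRAM
del model
del pipe
del trainer

# Garbage-collect first so the tensors are actually released,
# then return PyTorch's cached GPU blocks to the driver
gc.collect()
torch.cuda.empty_cache()
gc.collect()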

0

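## Merge the Model
The following cell reloads the base model in half precision (float16), merges the trained LoRA adapter into its weights, and reloads the tokenizer so both can be pushed to the Hugging Face Hub.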

In [None]:
# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
# Initialize the tokenizer with specific adjustments
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="refs/pr/8")
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.chat_template = get_llama3_chat_template()


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
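Conceptually, `merge_and_unload()` folds each LoRA update into the frozen weight, W_merged = W + (alpha / r) * B @ A, so the merged layer produces the same output as the base layer plus the adapter and no PEFT code is needed at inference time. The toy example below checks that identity on hand-written 2x2 matrices in plain Python; it is an illustration of the idea, not PEFT's implementation. Merging into half-precision weights rather than 4-bit ones is presumably why the cell above reloads the base model in float16.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    return [[s * a for a in row] for row in X]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (d_out x d_in)
A = [[1.0, 2.0]]               # LoRA A: r x d_in, with r = 1
B = [[0.5], [-1.0]]            # LoRA B: d_out x r
alpha, r = 2.0, 1
x = [[3.0], [4.0]]             # input as a column vector

# Base layer plus adapter: W x + (alpha / r) * B (A x)
adapter_out = add(matmul(W, x), scale(matmul(B, matmul(A, x)), alpha / r))

# Merged layer: (W + (alpha / r) * B A) x
W_merged = add(W, scale(matmul(B, A), alpha / r))
merged_out = matmul(W_merged, x)

print(adapter_out, merged_out)  # [[14.0], [-18.0]] [[14.0], [-18.0]]
```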


In [None]:
model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)


CommitInfo(commit_url='https://huggingface.co/chrishayuk/llama-3-8b-calvinscale/commit/e8681eec7be2eb6fd236e4dfc5809752550189d7', commit_message='Upload tokenizer', commit_description='', oid='e8681eec7be2eb6fd236e4dfc5809752550189d7', pr_url=None, pr_revision=None, pr_num=None)