# **QLoRA Implementation**

Important notes about this implementation: We first tried to run this notebook locally, but all of us use MacBooks and the BitsAndBytes library was not yet supported on Macs. We therefore used Colab to check that the notebook worked before Max ran it on his desktop. During one of the Colab attempts I reached the usage quota for the T4 GPU, which is the reason for the error shown during training. The notebook ran successfully on the desktop, and on Colab when the T4 limit had not been reached.

This was our initial test of a QLoRA implementation. As a result, the model used here, Llama-2-7B, does not match our final project model, SmolLM2-135M.

## **Installations and Imports**

In [None]:
# from transformers import AutoModel
import os

# Hugging Face access token for the gated meta-llama/Llama-2-7b-chat-hf repo.
# Read it from the HF_TOKEN environment variable rather than hardcoding it in the notebook.
access_token = os.environ.get("HF_TOKEN")

In [None]:
%%capture
%pip install accelerate peft bitsandbytes transformers trl

In [None]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig
from trl import SFTTrainer

## **Quantization**

In [None]:
compute_dtype = torch.float16

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)
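
As a rough sanity check on why 4-bit loading matters here, the sketch below estimates the memory taken by the weights alone. It assumes about 6.7 billion parameters for Llama-2-7B and about 0.5 extra bits per parameter for the NF4 quantization constants when double quantization is off (one 32-bit constant per block of 64 weights). Both figures are assumptions for illustration, and the estimate ignores layers that stay unquantized, activations, and optimizer state.

```python
# Back-of-the-envelope weight memory for a ~7B model (illustrative assumptions).
N_PARAMS = 6.7e9            # assumed parameter count for Llama-2-7B
FP16_BITS = 16              # half-precision weights
NF4_BITS = 4 + 32 / 64      # 4-bit weights + one fp32 constant per 64-weight block (assumed)


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Return the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(N_PARAMS, FP16_BITS)
nf4_gb = weight_memory_gb(N_PARAMS, NF4_BITS)
print(f"fp16: {fp16_gb:.1f} GB, nf4: {nf4_gb:.1f} GB")
```

Under these assumptions the weights shrink from roughly 13.4 GB to under 4 GB, which is what makes a 16 GB GPU a plausible target at all.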

In [None]:
# base_model = "NousResearch/Llama-2-7b-chat-hf"
base_model = "meta-llama/Llama-2-7b-chat-hf"

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map={"": 0},
    token=access_token,
)
model.config.use_cache = False
model.config.pretraining_tp = 1

In [None]:
tokenizer = AutoTokenizer.from_pretrained(base_model, token=access_token, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

## **LoRA**

In [None]:
peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
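
To give a sense of what these settings imply, the sketch below counts the trainable LoRA parameters and computes the scaling factor `lora_alpha / r`. It assumes the adapters are attached only to the `q_proj` and `v_proj` projections, and that Llama-2-7B has 32 decoder layers with hidden size 4096. These values are assumptions for illustration, not read from the loaded model.

```python
# Assumed architecture figures for Llama-2-7B (not read from the loaded model).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
TARGETS_PER_LAYER = 2       # assumed: q_proj and v_proj, both HIDDEN_SIZE x HIDDEN_SIZE

R = 64                      # LoRA rank, as in peft_params
LORA_ALPHA = 16             # as in peft_params


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, R)
trainable_params = per_module * TARGETS_PER_LAYER * NUM_LAYERS
scaling = LORA_ALPHA / R    # factor applied to the low-rank update

print(f"trainable LoRA parameters: {trainable_params:,}")
print(f"scaling factor: {scaling}")
```

Under these assumptions only about 33.6 million parameters are trained, a small fraction of the base model, and the low-rank update is scaled by 0.25.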

## **Import and Format Dataset**

In [None]:
from datasets import load_dataset

ds = load_dataset("yahma/alpaca-cleaned")

In [None]:
type(ds)

datasets.dataset_dict.DatasetDict

In [None]:
ds

DatasetDict({
    train: Dataset({
        features: ['output', 'input', 'instruction'],
        num_rows: 51760
    })
})

In [None]:
type(ds["train"][1])

dict

Format the dataset into a single `text` field so that it is in an acceptable format for fine-tuning the Llama-2-7B model.

In [None]:
ds_train = ds["train"]
ds_train_con = [
    {"text": f"<s> Instruction: {instruction} </s> " +
              (f"Input: {input_text} </s> " if input_text else "") +
              f"Output: {output_text}"}
     for instruction, input_text, output_text in zip(ds_train['instruction'], ds_train['input'], ds_train['output'])
]
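
The same template can be written as a small standalone function, which makes it easier to check on a single record before applying it to all 51,760 rows. This is a sketch only; the name `format_example` is hypothetical and is not used elsewhere in the notebook.

```python
def format_example(instruction: str, input_text: str, output_text: str) -> dict:
    """Build one training record using the same template as the list comprehension."""
    text = f"<s> Instruction: {instruction} </s> "
    if input_text:
        # The Input segment is only included when the record has a non-empty input.
        text += f"Input: {input_text} </s> "
    text += f"Output: {output_text}"
    return {"text": text}


# One record without an input field and one with an input field.
no_input = format_example("What are the three primary colors?", "", "Red, blue, and yellow.")
with_input = format_example("Add the numbers.", "2 + 2", "4")
print(no_input["text"])
print(with_input["text"])
```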

In [None]:
ds_train_con[5:10]

[{'text': '<s> Instruction: Write a concise summary of the following:\n"Commodore 64 (commonly known as the C64 or CBM 64) was manufactured by Commodore Business Machine (CBM) in August 1982 with a starting price of $595. It was an 8-bit home computer with remarkable market success. Between 1983-1986, C64 sales amounted to about 17 million units sold, making them the best-selling single personal computer model of all time in 1983-1986. \n\nAdditionally, the Commodore 64 dominated the market with between 30% and 40% share and 2 million units sold per year, outselling the IBM PC clones, Apple Computers, and Atari computers. Adding to their success, Sam Tramiel (former Atari president), during an interview in 1989, said they were building 400,000 C64s a month for a couple of years. " </s> Output: The Commodore 64 was a highly successful 8-bit home computer manufactured by Commodore Business Machine (CBM) in 1982, with sales amounting to approximately 17 million units sold between 1983-198

In [None]:
# Convert the list of dictionaries to a Hugging Face Dataset
from datasets import Dataset
hf_dataset = Dataset.from_list(ds_train_con)
print(hf_dataset)

Dataset({
    features: ['text'],
    num_rows: 51760
})


In [None]:
hf_dataset[:5]

{'text': ['<s> Instruction: Give three tips for staying healthy. </s> Output: 1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.',
  '<s> Instruction: What are the three primary colors? </s> Output: The three primary colors are red, blue, and yellow. These colors are called primary because t

## **Training**

To check that the implementation worked, we reused the training parameters from the demo we were following.

In [None]:
training_params = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard"
)
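
For a sense of scale, the sketch below works out how many optimizer steps one epoch takes with these settings and roughly how many checkpoints `save_steps=25` produces. It assumes a single GPU and the 51,760 training rows of the dataset; the variable names are illustrative.

```python
import math

NUM_EXAMPLES = 51760        # rows in the train split
PER_DEVICE_BATCH = 4        # per_device_train_batch_size
GRAD_ACCUM = 1              # gradient_accumulation_steps
NUM_DEVICES = 1             # assumed: a single GPU
SAVE_STEPS = 25             # save_steps

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
checkpoints_per_epoch = steps_per_epoch // SAVE_STEPS

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"checkpoints written per epoch: {checkpoints_per_epoch}")
```

One epoch is therefore about 12,940 steps, and saving every 25 steps writes several hundred checkpoints to `./results` unless something like `save_total_limit` is set.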

In [None]:
trainer = SFTTrainer(
    model=model,
    train_dataset=hf_dataset,
    peft_config=peft_params,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params,
    packing=False,
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


In [None]:
trainer.train()

OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 169.06 MiB is free. Process 27978 has 14.58 GiB memory in use. Of the allocated memory 13.85 GiB is allocated by PyTorch, and 618.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
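
The traceback above reports that the GPU ran out of memory. A minimal sketch of settings that commonly lower peak memory is shown below as a plain dictionary of keyword overrides. The dictionary name and the specific values are illustrative assumptions; the three keys are existing `TrainingArguments` parameters.

```python
# Hypothetical overrides to reduce peak GPU memory (values are illustrative).
low_memory_overrides = {
    "per_device_train_batch_size": 1,   # fewer sequences held in memory at once
    "gradient_accumulation_steps": 4,   # keeps the effective batch size at 4
    "gradient_checkpointing": True,     # recompute activations instead of storing them
}

effective_batch = (
    low_memory_overrides["per_device_train_batch_size"]
    * low_memory_overrides["gradient_accumulation_steps"]
)
print(f"effective batch size: {effective_batch}")
```

These could be merged into the existing arguments, for example `TrainingArguments(output_dir="./results", **low_memory_overrides)`, at the cost of slower training steps.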

In [None]:
# trainer.model.save_pretrained('Llama-2-7b-chat-alpaca-vwkd', token=access_token)
# trainer.tokenizer.save_pretrained('Llama-2-7b-chat-alpaca-vwkd', token=access_token)


Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json.
Access to model meta-llama/Llama-2-7b-chat-hf is restricted. You must have access to it and be authenticated to access it. Please log in. - silently ignoring the lookup for the file config.json in meta-llama/Llama-2-7b-chat-hf.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.


('Llama-2-7b-chat-alpaca-vwkd/tokenizer_config.json',
 'Llama-2-7b-chat-alpaca-vwkd/special_tokens_map.json',
 'Llama-2-7b-chat-alpaca-vwkd/tokenizer.model',
 'Llama-2-7b-chat-alpaca-vwkd/added_tokens.json',
 'Llama-2-7b-chat-alpaca-vwkd/tokenizer.json')

## **Testing**

In [None]:
logging.set_verbosity(logging.CRITICAL)

prompt = "Who is Leonardo Da Vinci?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

<s>[INST] Who is Leonardo Da Vinci? [/INST]  Leonardo da Vinci (1452-1519) was a true Renaissance man - an Italian polymath, artist, inventor, engineer, and scientist. Unterscheidung. Leonardo was born in the hilltown of Vinci, in the province of Florence, Italy. He is widely considered one of the greatest painters of all time, and his inventions and designs were centuries ahead of his time.

Leonardo da Vinci was a painter, sculptor, architect, engineer, and scientist, and his work spanned many fields. He is perhaps best known for his art, particularly his famous painting, the Mona Lisa, which is widely considered one of the greatest paintings of all time. However, his work in engineering, anatomy, and mathematics is also highly regarded.

Leonardo da Vinci was born in the hill
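
The test prompt above uses the `[INST]` chat format, while the fine-tuning data used the `Instruction:` / `Output:` template. To probe what the adapter learned, it may help to prompt in the fine-tuning template instead. The sketch below builds such a prompt; the function name `build_inference_prompt` is hypothetical, and the prompt deliberately ends at `Output:` so that the model generates the answer.

```python
def build_inference_prompt(instruction: str, input_text: str = "") -> str:
    """Build a prompt in the fine-tuning template, stopping where the answer begins."""
    prompt = f"<s> Instruction: {instruction} </s> "
    if input_text:
        prompt += f"Input: {input_text} </s> "
    return prompt + "Output:"


prompt_text = build_inference_prompt("Who is Leonardo Da Vinci?")
print(prompt_text)
```

The resulting string could be passed to the existing `pipe(...)` call in place of the `[INST]` prompt.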


## **References**
https://www.datacamp.com/tutorial/fine-tuning-llama-2

https://huggingface.co/docs/hub/security-tokens

https://github.com/meta-llama/llama-models?tab=readme-ov-file#download

