# QLoRA-Based Fine-Tuning with Gemma 7B

In [1]:
import os
import torch
import transformers
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

In [2]:
### Downloading the model through the transformers library (both the traditional and Xet APIs) took far too long, so I downloaded it from Hugging Face and ran it on my local machine.
### Google Colab also has Python set to 3.12/3.14, which causes problems for this setup, so I decided to run locally and maybe deploy later on
### Hugging Face Spaces.

In [3]:
### If you're running on a local machine, make sure performance mode is enabled, and only attempt this if you have a capable GPU, to avoid complications.

In [4]:
if not torch.cuda.is_available():
    raise RuntimeError("GPU not available! A CUDA-capable GPU is required (on Colab: Runtime -> Change runtime type -> T4 GPU)")

print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"CUDA Version: {torch.version.cuda}")

GPU: NVIDIA GeForce RTX 4060 Laptop GPU
CUDA Version: 12.1


In [5]:
# Will remove secrets later when uploading notebook

In [None]:
os.environ["WANDB_DISABLED"] = "true"  # an empty string does not disable W&B logging
os.environ["HF_TOKEN"] = ""

In [7]:
model_id = "google/gemma-7b"

## 4 Bit Quantization

In [8]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)
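The configuration above can be put in rough numbers. The following is a minimal sketch of the per-weight storage cost of block-wise 4-bit quantization with and without double quantization. The block sizes (64 weights per block, 256 blocks per second-level group) follow the QLoRA paper's description and are assumptions here, as is the parameter count `ASSUMED_PARAMS`. The estimate ignores layers that stay in higher precision, such as embeddings, so treat it as a lower bound rather than a measurement.

```python
def bits_per_param(block_size=64, double_quant=True, second_block_size=256):
    """Approximate storage per weight, in bits, for 4-bit block-wise quantization."""
    weight_bits = 4.0
    if double_quant:
        # One 8-bit quantized scale per block, plus one fp32 scale
        # shared by each group of `second_block_size` blocks.
        scale_bits = 8.0 / block_size + 32.0 / (block_size * second_block_size)
    else:
        # One fp32 scale per block.
        scale_bits = 32.0 / block_size
    return weight_bits + scale_bits


def quantized_weight_gib(num_params, bits):
    """Convert a parameter count and bits-per-parameter into GiB."""
    return num_params * bits / 8 / 2**30


ASSUMED_PARAMS = 8.5e9  # assumption: approximate parameter count, not read from the model

with_dq = bits_per_param(double_quant=True)
without_dq = bits_per_param(double_quant=False)
print(f"bits/param with double quant:    {with_dq:.4f}")
print(f"bits/param without double quant: {without_dq:.4f}")
print(f"estimated 4-bit weights: {quantized_weight_gib(ASSUMED_PARAMS, with_dq):.2f} GiB")
```

Double quantization only shrinks the scale overhead, which is why the saving per weight is a fraction of a bit.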

In [9]:
model_path = "/mnt/d/LangChainProjects/QLoRAWithGemma7B/models/gemma-7b"

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
    local_files_only=True
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    local_files_only=True
)


`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.



In [10]:
model.gradient_checkpointing_enable()
model.config.use_cache = False

## Establishing Pre-Fine-Tuning Performance

In [12]:
def generate_text(prompt, max_new_tokens=100):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

In [13]:
print("Base model output:")
print(generate_text("Quote: Imagination is more"))
print("\n")
print("=" * 50)
print("\n")
print(generate_text("Quote: A woman is like a tea bag;"))

Base model output:
Quote: Imagination is more important than knowledge.

Imagine a place where you can create whatever you want.  Imagination is such a powerful force.  It is the only place that has no rules, no boundaries.  It is a place like no other.  A place where anything can happen.  This is what makes imagination so special.  This is why imagination is more important than knowledge.

<strong>Knowledge</strong>

Knowledge is information.  It is what you learn by reading books, by listening to experts,




Quote: A woman is like a tea bag; she steeps in hot water and develops into something beautiful.

We, as a community, have a responsibility to uplift and motivate our women. We must show appreciation for the tremendous amount of work they do on a daily basis. This is the reason why this year, we decided to focus our attention on the women in our lives, and we chose to do this through a special event, The Woman of the Year Awards.

The Woman of the Year Awards are a platform that

In [14]:
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj"
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
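To get a feel for how many weights this LoRA configuration trains, here is a rough count. Each adapted linear layer gains two low-rank matrices, so it adds `r * (in_features + out_features)` parameters. The projection shapes and layer count below are assumptions about the Gemma 7B architecture and are not read from the loaded model; on a real run, `model.print_trainable_parameters()` on the PEFT-wrapped model is the authoritative number.

```python
R = 16
LORA_ALPHA = 32

# Assumed (in_features, out_features) for each targeted projection.
ASSUMED_SHAPES = {
    "q_proj": (3072, 4096),
    "k_proj": (3072, 4096),
    "v_proj": (3072, 4096),
    "o_proj": (4096, 3072),
    "gate_proj": (3072, 24576),
    "up_proj": (3072, 24576),
    "down_proj": (24576, 3072),
}
ASSUMED_NUM_LAYERS = 28  # assumption: number of decoder layers


def lora_param_count(in_features, out_features, r):
    # A has shape (r, in_features); B has shape (out_features, r).
    return r * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, R) for i, o in ASSUMED_SHAPES.values())
total = per_layer * ASSUMED_NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,}")
print(f"update scaling (alpha / r): {scaling}")
```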

In [15]:
data = load_dataset("Abirate/english_quotes")

def formatting_func(examples):
    texts = []
    for quote, author in zip(examples["quote"], examples["author"]):
        text = f"Quote: {quote}\nAuthor: {author}"
        texts.append(text)
    return texts
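A quick way to sanity-check the prompt template before training is to run the same logic on a tiny hand-made batch. The sketch below mirrors `formatting_func` under a different name (`format_quotes`, hypothetical) so it can run without downloading the dataset; the two rows are placeholder strings, not real dataset entries.

```python
def format_quotes(examples):
    # Same template as formatting_func: one "Quote/Author" string per row.
    return [
        f"Quote: {quote}\nAuthor: {author}"
        for quote, author in zip(examples["quote"], examples["author"])
    ]


# Hypothetical batch in the column-oriented layout used for batched examples.
toy_batch = {
    "quote": ["First example quote.", "Second example quote."],
    "author": ["Author One", "Author Two"],
}

formatted = format_quotes(toy_batch)
for text in formatted:
    print(text)
    print("-" * 20)
```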

In [16]:
training_args = TrainingArguments(
    output_dir="./gemma-quotes-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    max_steps=100,
    warmup_steps=5,
    logging_steps=10,
    save_steps=50,
    save_total_limit=1,
    optim="adamw_hf",
    lr_scheduler_type="cosine",
    fp16=True,
    gradient_checkpointing=True,
    max_grad_norm=0.3
)

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
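The `warmup_steps=5`, `max_steps=100`, and `lr_scheduler_type="cosine"` settings above combine into a learning-rate curve that ramps up linearly and then decays along half a cosine wave. The sketch below reimplements that shape in plain Python for illustration; it assumes the usual linear-warmup, half-cosine formula and is not a call into the library's scheduler.

```python
import math


def lr_at_step(step, base_lr=2e-4, warmup_steps=5, max_steps=100):
    """Learning rate under linear warmup followed by half-cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 5, 10, 50, 100):
    print(f"step {step:3d}: lr = {lr_at_step(step):.2e}")
```

With only 100 steps, the rate peaks at step 5 and is already falling for the rest of the run.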


In [17]:
trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=training_args,
    peft_config=lora_config,
    formatting_func=formatting_func,
    max_seq_length=512
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


In [18]:
trainer.train()


| Step | Training Loss |
|------|---------------|
| 10   | 15.0189       |
| 20   | 12.3461       |
| 30   | 9.9086        |
| 40   | 10.1948       |
| 50   | 11.2226       |
| 60   | 11.3782       |
| 70   | 9.7539        |
| 80   | 10.3499       |
| 90   | 8.7171        |
| 100  | 8.2062        |



TrainOutput(global_step=100, training_loss=10.7096435546875, metrics={'train_runtime': 2049.7885, 'train_samples_per_second': 0.39, 'train_steps_per_second': 0.049, 'total_flos': 1845011904841728.0, 'train_loss': 10.7096435546875, 'epoch': 0.3189792663476874})
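The reported metrics can be cross-checked with a little arithmetic. This sketch takes the batch settings from the training arguments and the `epoch` and `train_runtime` values from the `TrainOutput` above, then derives how many samples were seen and what dataset size that implies. The single-GPU value for `num_devices` is an assumption based on the one device printed earlier.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
max_steps = 100
num_devices = 1  # assumption: single-GPU run

reported_epoch = 0.3189792663476874  # from TrainOutput
train_runtime_s = 2049.7885          # from TrainOutput

# Samples contributing to each optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
samples_seen = effective_batch * max_steps

# If samples_seen covers `reported_epoch` of the data, the full set is this large.
implied_dataset_size = samples_seen / reported_epoch
samples_per_second = samples_seen / train_runtime_s

print(f"effective batch size: {effective_batch}")
print(f"samples seen:         {samples_seen}")
print(f"implied dataset size: {implied_dataset_size:.0f}")
print(f"samples per second:   {samples_per_second:.2f}")
```

The run therefore covers roughly a third of one pass over the training split, which is worth keeping in mind when reading the loss values.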

In [19]:
print("Fine-tuned model output:")
print(generate_text("Quote: Imagination is more"))
print("\n" + "="*50 + "\n")
print(generate_text("Quote: A woman is like a tea bag;"))
print("\n" + "="*50 + "\n")
print(generate_text("Quote: Outside of a dog, a book is man's"))

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Fine-tuned model output:



Quote: Imagination is more important than knowledge.”
Author: Albert Einstein, German-born theoretical physicist, author, and recipient of the 1921 Nobel Prize in Physics for his contributions to the field of theoretical physics, particularly his work on the special and general theories of relativity.

Author: Albert Einstein, German-born theoretical physicist, author, and recipient of the 1921 Nobel Prize in Physics for his contributions to the field of theoretical physics, particularly his work on the special and general theories of


Quote: A woman is like a tea bag; you never know how strong it is until it is in hot water."
Author: Eleanor Roosevelt, 1935, <em>The American Women’s Committee of the Second Pan-American Union</em>

Author: Eleanor Roosevelt, <em>The American Women’s Committee of the Second Pan-American Union</em>
Author: Eleanor Roosevelt, <em>The American Women’s Committee of the Second Pan-American Union</em>
Author: John Stuart Mill, <em>The Sub


Quote: Outs

In [20]:
# Save the LoRA adapter and tokenizer under a name that matches the Gemma base model
trainer.save_model("./gemma-quotes-finetuned-final")
tokenizer.save_pretrained("./gemma-quotes-finetuned-final")

print("Model saved successfully!")

Model saved successfully!
