<a href="https://colab.research.google.com/github/EddyEjembi/Gemma-Fine-tuning/blob/main/Copy_of_Gemma_Fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### LoRA Fine-tuning Gemma-2B

#### `Disclaimer`: This notebook is a copy of an original authored by [Merve](https://x.com/mervenoyann).

This notebook walks through LoRA fine-tuning of Gemma-2B. LoRA is a parameter-efficient fine-tuning technique: instead of updating every weight as in full fine-tuning, it trains only a small number of added parameters, which makes training faster. We will use the [VMware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct) dataset, which contains instructions paired with responses. To apply LoRA we'll use the [PEFT](https://huggingface.co/docs/peft/index) library, and for supervised instruction tuning we'll use `SFTTrainer` from [TRL](https://huggingface.co/docs/trl/en/index).
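To give a feel for why LoRA trains so few parameters, here is a minimal arithmetic sketch. It assumes a single square 2048 x 2048 linear layer, which is an illustrative size and not a figure taken from this notebook, and counts the parameters in the two low-rank factors that LoRA adds for a rank `r`.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Number of weights in a dense d_out x d_in linear layer (bias ignored)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Weights in the two low-rank factors: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


# Illustrative layer size (an assumption, not read from the model config).
d_in = d_out = 2048
r = 8  # same rank as the LoraConfig used in this notebook

full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, r)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.2%}")
```

Under these assumptions the LoRA factors hold well under 1% of the parameters of the dense layer they adapt.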

In [3]:
!pip install -q -U transformers peft accelerate datasets trl bitsandbytes

In [2]:
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
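The configuration above asks PEFT to attach rank-8 adapters to the listed projection layers. As a rough illustration of what one such adapter computes, the pure-Python sketch below adds a scaled low-rank update `B @ A` on top of a frozen weight `W`. The tiny matrices, the `alpha` value, and the helper names are all hypothetical and chosen for readability; this is not PEFT's implementation.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x): frozen output plus low-rank update."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 2.0], [3.0, 4.0]]   # frozen 2 x 2 weight
A = [[1.0, 0.0]]               # trainable, shape r x d_in with r = 1
B_zero = [[0.0], [0.0]]        # B starts at zero, so the adapter is a no-op
B_trained = [[1.0], [2.0]]     # hypothetical values after some training
x = [1.0, 1.0]

y_initial = lora_forward(W, A, B_zero, x, alpha=1, r=1)
y_trained = lora_forward(W, A, B_trained, x, alpha=1, r=1)
print(y_initial, y_trained)
```

With `B` at zero the output equals the frozen layer's output, which is why adding an adapter does not change the model before training begins.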


Log in to the Hugging Face Hub. Gemma-2B is a gated model, so logging in confirms that your account has access to it. If you don't have access yet, request it from the [model repository](https://huggingface.co/google/gemma-2b); your request should be accepted shortly.

In [3]:
from huggingface_hub import notebook_login
notebook_login()

We'll shrink the model's memory footprint even further by loading it in 4-bit precision with `bitsandbytes`. We then initialize the model with a causal language modeling head and load the tokenizer.
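As a back-of-the-envelope sketch of what 4-bit loading buys, the snippet below estimates weight storage at 16 and 4 bits per parameter. The parameter count of 2.5 billion is an approximation assumed for illustration, and the estimate ignores quantization constants, layers that may stay in higher precision, activations, and optimizer state.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Storage for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 2_500_000_000  # assumption: rough parameter count for illustration

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)
print(f"16-bit: {fp16_gib:.2f} GiB, 4-bit: {nf4_gib:.2f} GiB")
```

The 4-bit figure is a quarter of the 16-bit one by construction; real memory use will be somewhat higher for the reasons listed above.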

In [4]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import os

model_id = "google/gemma-2b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.


Load the dataset.

In [5]:
from datasets import load_dataset

data = load_dataset("VMware/open-instruct", split="train")


Concatenate each Alpaca prompt with its response.

In [6]:
data

Dataset({
    features: ['alpaca_prompt', 'response', 'instruction', 'source', 'task_name', 'template_type'],
    num_rows: 142622
})

In [7]:
# Join each Alpaca-style prompt with its response to form one training string.
texts = [
    prompt + response
    for prompt, response in zip(data["alpaca_prompt"], data["response"])
]

texts

['Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nCan you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research.\n\n### Response:"Monopsony" refers to a market structure where there is only one buyer for a particular good or service. In economics, this term is particularly relevant in the labor market, where a monopsony employer has significant power over the wages and working conditions of their employees. The presence of a monopsony can result in lower wages and reduced employment opportunities for workers, as the employer has little incentive to increase wages or provide better working conditions.\n\nRecent research has identified potential monopsonies in industries such as retail and fast food, where a few large companies control a significant portion of the market (Bivens &

Add the concatenated column.

In [8]:
data = data.add_column("text_column", texts)
data

Dataset({
    features: ['alpaca_prompt', 'response', 'instruction', 'source', 'task_name', 'template_type', 'text_column'],
    num_rows: 142622
})

Remove unnecessary columns.

In [9]:
data = data.remove_columns(["source", "alpaca_prompt", "response", "task_name", "template_type", "instruction"])
data

Dataset({
    features: ['text_column'],
    num_rows: 142622
})

Depending on how long your prompts are, you may want to handle tokens that overflow the maximum length instead of simply truncating them. As written, the cell below cuts every example off at 512 tokens, so longer prompts lose their endings and your results will suffer. 😔 Adjust the cell to fit your data.

In [10]:
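def tokenize_dataset(ds):
    # Tokenize the concatenated text, cutting each example off at 512 tokens.
    result = tokenizer(ds["text_column"], truncation=True, max_length=512)
    # To keep overflowing tokens instead of dropping them, pass
    # return_overflowing_tokens=True above, map with batched=True, and
    # re-align the remaining columns with the new rows:
    # sample_map = result.pop("overflow_to_sample_mapping")
    # for key, values in ds.items():
    #     result[key] = [values[i] for i in sample_map]
    return result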
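To make the overflow idea concrete, here is a small pure-Python sketch that splits a long list of token ids into windows of at most `max_length`, optionally overlapping by `stride` tokens. The function name and its exact windowing rule are illustrative assumptions; it mimics the general idea behind overflow handling and is not the tokenizer's own implementation.

```python
def chunk_token_ids(token_ids, max_length, stride=0):
    """Split token_ids into windows of at most max_length tokens.

    Consecutive windows share `stride` tokens, so no text is dropped.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")

    step = max_length - stride
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window reached the end of the sequence
        start += step
    return chunks


ids = list(range(10))  # stand-in for real token ids
overlapping = chunk_token_ids(ids, max_length=4, stride=1)
disjoint = chunk_token_ids(ids, max_length=4)
print(overlapping)
print(disjoint)
```

Each window would then become its own training row, which is why the other columns need to be re-aligned when overflow handling is switched on.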

In [11]:
ds = data.map(tokenize_dataset)

In [12]:
ds

Dataset({
    features: ['text_column', 'input_ids', 'attention_mask'],
    num_rows: 142622
})

With the data prepared, initializing `SFTTrainer` from TRL is all you need!

A small note: pass either `formatting_func` or `dataset_text_field`. If your dataset needs formatting, write a formatting function and pass it as `formatting_func`. If you already did the preprocessing and the text field needs no further formatting, pass its name as `dataset_text_field`.

Then simply call `train`. This notebook is built for educational purposes, so you will likely need to adjust the hyperparameters for your own use case.
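For the formatting-function route mentioned above, a minimal sketch might look like the following. The template string is copied from the sample prompt printed earlier in this notebook, `format_example` is a hypothetical name, and the exact input that `SFTTrainer` hands to `formatting_func` (a single example or a batch) depends on the TRL version, so treat this as a starting point only.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def format_example(example):
    """Build one training string from a dict with 'instruction' and 'response'."""
    prompt = ALPACA_TEMPLATE.format(instruction=example["instruction"])
    # The response is appended directly after "### Response:", matching the
    # prompt + response concatenation used earlier in this notebook.
    return prompt + example["response"]


example = {"instruction": "Say hi.", "response": "Hi!"}
text = format_example(example)
print(text)
```

This produces the same kind of string as the precomputed `text_column`, so only one of the two approaches is needed.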

In [13]:
import transformers
from trl import SFTTrainer

new_model = "Gemma_fine-tunned"

trainer = SFTTrainer(
    model=model,
    train_dataset=ds,
    dataset_text_field="text_column",
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=30,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    #formatting_func=formatting_func,
)
trainer.train()

# Save the fine-tuned adapter and tokenizer
trainer.model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)

max_steps is given, it will override any value given in num_train_epochs


| Step | Training Loss |
|------|---------------|
| 1 | 2.982 |
| 2 | 2.9131 |
| 3 | 3.1114 |
| 4 | 2.8768 |
| 5 | 2.5528 |
| 6 | 2.7869 |
| 7 | 2.8327 |
| 8 | 2.9049 |
| 9 | 3.0946 |
| 10 | 2.7818 |


('Gemma_fine-tunned/tokenizer_config.json',
 'Gemma_fine-tunned/special_tokens_map.json',
 'Gemma_fine-tunned/tokenizer.model',
 'Gemma_fine-tunned/added_tokens.json',
 'Gemma_fine-tunned/tokenizer.json')
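As a rough check on how much data this short run actually touches, the arithmetic below combines the training arguments from the cell above with the dataset size reported earlier. It assumes a single GPU (`num_devices = 1`), which the `device_map` setting suggests but which is still an assumption.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 30
num_devices = 1          # assumption: training on one GPU
dataset_size = 142_622   # rows in the training split

# Examples that contribute to each optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Examples consumed over the whole run.
examples_seen = effective_batch * max_steps
fraction_seen = examples_seen / dataset_size

print(effective_batch, examples_seen, f"{fraction_seen:.4%}")
```

Under these settings the run sees only a tiny slice of the dataset, which is consistent with the note that the hyperparameters are meant for demonstration and should be adjusted for real use.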

In [14]:
text = "Write a news style post about a fake event, like aliens from Mars landing on Earth. It is meant to be funny but also be written in the authoritative style of a news report, kind of like The Onion. ### Response:"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)

In [15]:
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Write a news style post about a fake event, like aliens from Mars landing on Earth. It is meant to be funny but also be written in the authoritative style of a news report, kind of like The Onion. ### Response: The aliens from Mars landed on Earth. They were very friendly and wanted to help us. They showed


Merge the base model with the trained LoRA adapter.
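Conceptually, merging folds the low-rank update into the base weight so that no separate adapter is needed at inference time. The pure-Python sketch below illustrates that idea on a toy 2 x 2 weight. The matrices, the `alpha` value, and the helper names are hypothetical, and this is not how PEFT implements the merge internally.

```python
def matmul(X, Y):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, x):
    """Multiply matrix M by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A) as a single dense matrix."""
    scale = alpha / r
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight
A = [[1.0, 0.0]]              # adapter factor, shape r x d_in with r = 1
B = [[1.0], [2.0]]            # adapter factor, shape d_out x r
x = [1.0, 1.0]

merged = merge_lora(W, A, B, alpha=1, r=1)

# Output with a separate adapter versus output with the merged weight.
base_out = matvec(W, x)
adapter_out = [b + u for b, u in zip(base_out, matvec(B, matvec(A, x)))]
merged_out = matvec(merged, x)
print(merged, adapter_out, merged_out)
```

Both paths give the same output, so the merged model behaves like base plus adapter while being a single set of dense weights.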

In [18]:
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Push the merged model and tokenizer to the Hugging Face Hub.

In [19]:
model.push_to_hub(new_model)
tokenizer.push_to_hub(new_model)

CommitInfo(commit_url='https://huggingface.co/eddyejembi/Gemma_fine-tunned/commit/30cb19fa97a2decaacb990ab9440a34990d9bb6c', commit_message='Upload tokenizer', commit_description='', oid='30cb19fa97a2decaacb990ab9440a34990d9bb6c', pr_url=None, pr_revision=None, pr_num=None)

Load the saved model from the Hub and use it.

In [4]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch
import os

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained("eddyejembi/Gemma_fine-tunned")
model = AutoModelForCausalLM.from_pretrained("eddyejembi/Gemma_fine-tunned", quantization_config=bnb_config, device_map={"":0})


text = "Write a news style post about a fake event, like minions going on strike. It is meant to be funny but also be written in the authoritative style of a news report, kind of like The Onion. ### Response:"

device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=30)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Write a news style post about a fake event, like minions going on strike. It is meant to be funny but also be written in the authoritative style of a news report, kind of like The Onion. ### Response: The minions have been on strike for the past few days, demanding better pay and working conditions. The company has been trying to negotiate with the union,
