<a href="https://colab.research.google.com/github/Chandramadi/gemma-2b-fine-tuning/blob/main/Fine_Tuning_Gemma_2b.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Install the libraries
### bitsandbytes is used for quantization

In [1]:
!pip install bitsandbytes peft trl accelerate datasets transformers



## Import the modules

In [3]:
import os
import transformers
import torch
from datasets import load_dataset
from google.colab import userdata
from trl import SFTTrainer  # SFTTrainer runs supervised fine-tuning (SFT)
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, GemmaTokenizer

In [5]:
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')
# The token grants access to gated models (such as Gemma) whose terms your Hugging Face account has accepted
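
Outside Colab, `google.colab.userdata` is not available. A minimal sketch of a fallback, assuming the token has been exported as an `HF_TOKEN` environment variable; the helper name `get_hf_token` is hypothetical and not part of any library:

```python
import os

def get_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise a clear error."""
    # `env` defaults to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before loading gated models.")
    return token
```

Failing early with an explicit message is usually easier to debug than an authorization error raised later during the model download.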

## Quantization to 4-bit NF4
The model weights are loaded in NF4 (4-bit NormalFloat) format.

In [6]:
model_id = "google/gemma-2b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
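
To make the idea behind `bnb_4bit_quant_type="nf4"` concrete, here is a minimal pure-Python sketch of codebook quantization with an absmax scale. It is an illustration only, not the bitsandbytes implementation: the 16 code values are rounded approximations of the NF4 levels, and the helper names `quantize_block` and `dequantize_block` are hypothetical.

```python
# Approximate NF4 code values (rounded), normalised to the range [-1, 1].
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]

def quantize_block(weights):
    """Return (absmax scale, list of 4-bit codebook indices) for one block."""
    # Scale by the largest magnitude so every weight falls in [-1, 1].
    absmax = max(abs(w) for w in weights) or 1.0
    # Store, for each weight, the index of the nearest code value.
    indices = [
        min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / absmax))
        for w in weights
    ]
    return absmax, indices

def dequantize_block(absmax, indices):
    """Map indices back to approximate weights using the stored scale."""
    return [NF4_LEVELS[i] * absmax for i in indices]

block = [0.02, -0.15, 0.31, -0.07, 0.0, 0.12]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
print(codes)
print(restored)
```

Each weight is stored as a 4-bit index plus one shared scale per block, which is where the memory saving comes from; the restored values are close to, but not identical to, the originals.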

In [9]:
tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN'])
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"":0},
    token=os.environ['HF_TOKEN']
)
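
A rough back-of-the-envelope estimate shows why 4-bit loading helps on a small GPU. The parameter count below is an assumption for illustration, and the estimate ignores quantization constants, activations, and any layers kept in higher precision:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

ASSUMED_PARAMS = 2_500_000_000  # assumed parameter count, for illustration only

fp16_gb = weight_memory_gb(ASSUMED_PARAMS, 16)
nf4_gb = weight_memory_gb(ASSUMED_PARAMS, 4)
print(f"16-bit weights: ~{fp16_gb:.2f} GB, 4-bit weights: ~{nf4_gb:.2f} GB")
```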


## Test the loaded model

In [12]:
text = "Quote : Imagination is more,"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Quote : Imagination is more, than knowledge.

I am a self-taught artist, born in 1977 in the heart of the French Alps.

I have always been fascinated by the beauty of the mountains and the surrounding nature.

I have always been attracted


In [13]:
text = "Quote : Imagination is more"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Quote : Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.

- Albert Einstein

The <strong><em>Imagination</em></strong> is the most important part of the human being.

The <strong><em>Imagination</em></strong> is the most important


In [28]:
text = "Quote : Be yourself; everyone else is already taken."
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Quote : Be yourself; everyone else is already taken.

The quote is a very famous one and it is very popular. It is a very good quote and it is very famous. It is a very good quote and it is very famous. It is a very good quote and it is very famous.


## Fine-tuning

In [47]:
os.environ["WANDB_DISABLED"] = "false"  # "false" leaves Weights & Biases logging enabled; set to "true" to turn it off

In [17]:
lora_config = LoraConfig(
    r = 8,
    target_modules = ["q_proj", "o_proj", "k_proj",
                       "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type = "CAUSAL_LM"
)
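
LoRA freezes each targeted weight matrix and trains a pair of small low-rank matrices, A (r x d_in) and B (d_out x r), in its place. The sketch below counts the trainable parameters this adds for a single layer. The 2048 x 2048 layer shape is an assumption for illustration and is not read from the model; the helper name `lora_trainable_params` is hypothetical.

```python
def lora_trainable_params(d_in, d_out, r):
    """Parameters in the low-rank pair A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

d_in, d_out, r = 2048, 2048, 8  # hypothetical layer shape; r matches lora_config
full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

With a small rank, the adapter is a tiny fraction of the frozen matrix, which is what makes fine-tuning feasible on top of a 4-bit base model.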

In [22]:
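from datasets import load_dataset
data = load_dataset("Abirate/english_quotes")  # quotes dataset hosted on the Hugging Face Hub
# Pre-tokenize the "quote" column. Some TRL versions ignore formatting_func when the
# dataset already contains an input_ids column, so check the trainer warnings.
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)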

In [31]:
def formatting_func(example):
    # Build one training string per record: the quote followed by its author.
    text = f"Quote: {example['quote']}\nAuthor: {example['author']}\n"
    return text
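
To see what the trainer would receive, the formatting function can be applied to a single record. The sample below is a hypothetical record containing only the two fields the function reads:

```python
def formatting_func(example):
    # Same template as the notebook cell: quote first, then author.
    return f"Quote: {example['quote']}\nAuthor: {example['author']}\n"

# Hypothetical record with the two fields the function reads.
sample = {"quote": "Be yourself; everyone else is already taken.", "author": "Oscar Wilde"}
formatted = formatting_func(sample)
print(formatted)
```

Note that this template writes `Quote:` without a space before the colon, while the test prompts in this notebook use `Quote :`. Matching the prompt to the training template may give more consistent completions.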

In [37]:
trainer = SFTTrainer(
    model = model,
    train_dataset = data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=100,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        label_names=["labels"]
    ),
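    peft_config=lora_config,
    formatting_func=formatting_func
)

In [ ]:
# Start the fine-tuning run (this call is not shown in the exported notebook).
trainer.train()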
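
The batch settings in the training arguments above determine how much data the run actually sees. A small worked calculation, assuming a single GPU (the `num_devices` value is an assumption, not taken from the notebook):

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 100
num_devices = 1  # assumption: a single Colab GPU

# Gradients are accumulated over several small batches before each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
examples_seen = effective_batch_size * max_steps
print(effective_batch_size, examples_seen)
```

Under these settings, each optimizer step averages gradients over 4 examples and the whole run covers 400 examples, so this is a short demonstration run rather than a full pass over a large dataset.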



In [48]:
text = "Quote : Two things are infinite: the universe and human stupidity;"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Quote : Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.
- Albert Einstein

Albert Einstein was a German-born theoretical physicist who is widely regarded as one of the most influential scientists of all time. He is best known for his theory of relativity, which
