# Gemma Model Fine-tuning

This notebook demonstrates how to fine-tune Google DeepMind's Gemma 2B instruction-tuned model (`google/gemma-2b-it`) using 4-bit quantization and LoRA adapters. We'll cover:
1. Setting up the environment
2. Loading and preparing the dataset
3. Fine-tuning the model
4. Evaluating the results

> **Note**: This notebook requires a GPU runtime. Make sure to select "GPU" in Colab's Runtime settings.

## Environment Setup

First, we'll install the required packages. We'll use specific versions to ensure compatibility:

In [None]:
# transformers >= 4.38.0 is required: earlier releases do not include the Gemma architecture
!pip install -q torch==2.1.2 transformers==4.38.2 datasets==2.16.1 accelerate==0.26.1 peft==0.8.2
!pip install -q bitsandbytes==0.41.3 trl==0.7.10 sentencepiece==0.1.99
!pip install -q google-cloud-aiplatform==1.38.1
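
Pinned versions only help if the runtime actually picked them up, and Colab preinstalls some of these packages. The following is a minimal sketch of a sanity check built on the standard library's `importlib.metadata`. The helper name `find_version_mismatches` and its `lookup` parameter are illustrative and not part of any library installed above.

```python
from importlib import metadata


def find_version_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed)} for every pin that is not satisfied.

    `lookup` maps a package name to its installed version string. It defaults to
    importlib.metadata.version and is injectable so the check is easy to test.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


# Hypothetical subset of the pins above; extend with the rest as needed.
PINS = {"datasets": "2.16.1", "peft": "0.8.2", "trl": "0.7.10"}
for name, (wanted, installed) in find_version_mismatches(PINS).items():
    print(f"{name}: expected {wanted}, found {installed}")
```

An exact string comparison is deliberately strict. Packages such as `torch` may report a local build suffix in their version string, so they would need a looser comparison.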

## Import Dependencies

Now let's import all the necessary libraries:

In [None]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

# Set warning levels
logging.set_verbosity_error()
os.environ["TOKENIZERS_PARALLELISM"] = "false"

## Model Configuration

Let's define the training hyperparameters, the 4-bit quantization settings, and the LoRA adapter configuration:

In [None]:
# Model configuration
MODEL_NAME = "google/gemma-2b-it"
DATASET_NAME = "databricks/databricks-dolly-15k"  # Example dataset
MAX_SEQ_LENGTH = 1024
LEARNING_RATE = 2e-4
BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
OUTPUT_DIR = "./gemma-finetuned"

# Quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# LoRA configuration
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)
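
To make the LoRA settings above concrete: each adapted linear layer with input size `d_in` and output size `d_out` gains two low-rank matrices holding `r * (d_in + d_out)` trainable parameters, and the adapter update is scaled by `lora_alpha / r`. The sketch below works through that arithmetic. The layer shapes are assumptions based on the published Gemma 2B architecture (hidden size 2048, 18 layers, 8 query heads of dimension 256, a single key/value head). Check them against `model.config` before relying on the numbers.

```python
def lora_param_count(r, d_in, d_out):
    """Trainable parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Values from lora_config above
R, LORA_ALPHA = 64, 16

# Assumed Gemma 2B shapes (not read from the model here)
HIDDEN, N_LAYERS, N_HEADS, N_KV_HEADS, HEAD_DIM = 2048, 18, 8, 1, 256

q_params = lora_param_count(R, HIDDEN, N_HEADS * HEAD_DIM)     # q_proj
v_params = lora_param_count(R, HIDDEN, N_KV_HEADS * HEAD_DIM)  # v_proj
total = N_LAYERS * (q_params + v_params)

scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapters add a few million trainable parameters, a small fraction of the base model. Raising `lora_alpha` or lowering `r` increases the scaling factor and therefore the adapter's contribution.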

## Load Model and Tokenizer

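Now we'll load the Gemma model and tokenizer with our configurations. Gemma is a gated model on the Hugging Face Hub, so accept its license on the model page and authenticate (for example with `huggingface_hub.notebook_login()`) before running this cell: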

In [None]:
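# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Gemma's tokenizer already defines a dedicated <pad> token, so EOS is not reused
# for padding (doing so can cause EOS to be masked out of the training labels).
tokenizer.padding_side = "right"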

# Load model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
)
model.config.use_cache = False  # The KV cache only helps generation; disable it during training
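
As a rough guide to why 4-bit loading matters on a Colab GPU, the sketch below estimates the memory taken by the weights alone at different precisions. The parameter count of about 2.5 billion is an assumption for Gemma 2B. The estimate ignores quantization constants, layers that bitsandbytes leaves unquantized (such as embeddings), activations, and optimizer state, so for the 4-bit case treat it as a lower bound, not a measurement.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 2.5e9  # assumed parameter count for Gemma 2B

for label, bits in [("float32", 32), ("float16", 16), ("4-bit (nf4)", 4)]:
    print(f"{label:>12}: ~{weight_memory_gib(N_PARAMS, bits):.2f} GiB")
```

Once the model is loaded, `model.get_memory_footprint()` reports the actual size in bytes and is the better number to trust.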

## Prepare Dataset

Let's load and prepare the dataset for fine-tuning. We'll use the Dolly dataset as an example:

In [None]:
def format_instruction(example):
    """Format the instruction-response pair."""
    return f"""### Instruction: {example['instruction']}

### Input: {example['context']}

### Response: {example['response']}"""

# Load and preprocess the dataset
dataset = load_dataset(DATASET_NAME, split="train")

# Format the dataset
formatted_dataset = dataset.map(
    lambda x: {"text": format_instruction(x)},
    remove_columns=dataset.column_names
)

# Show an example
print("Sample formatted data:")
print(formatted_dataset[0]['text'])
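
Many Dolly records have an empty `context` field, in which case the template above produces a blank `### Input:` section. One possible variant, sketched here as a hypothetical helper (`format_instruction_skip_empty` is not part of the original notebook), drops that section when there is nothing to put in it.

```python
def format_instruction_skip_empty(example):
    """Format one record, omitting the Input section when `context` is empty."""
    parts = [f"### Instruction: {example['instruction']}"]
    context = (example.get("context") or "").strip()
    if context:
        parts.append(f"### Input: {context}")
    parts.append(f"### Response: {example['response']}")
    return "\n\n".join(parts)


sample = {"instruction": "Name a primary color.", "context": "", "response": "Red."}
print(format_instruction_skip_empty(sample))
```

If you adopt a variant like this, build inference prompts the same way so they match what the model saw during training.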

## Training Setup

Configure the training arguments and initialize the trainer:

In [None]:
# Training arguments
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=3,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    learning_rate=LEARNING_RATE,
    weight_decay=0.01,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    save_total_limit=2,
    remove_unused_columns=False,
)

# Initialize trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=formatted_dataset,
    peft_config=lora_config,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LENGTH,
    tokenizer=tokenizer,
    args=training_args,
)
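
The batch settings above interact: gradients from `GRADIENT_ACCUMULATION_STEPS` micro-batches are accumulated before each optimizer update, so the effective batch size is that value multiplied by the per-device batch size. A small sketch of the arithmetic follows. The dataset size of roughly 15,000 examples is an assumption taken from the dataset's name, and the step count approximates what the trainer will report without reproducing its exact bookkeeping.

```python
import math


def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return per_device_batch * accumulation_steps * num_devices


def approx_optimizer_steps(num_examples, per_device_batch, accumulation_steps,
                           epochs, num_devices=1):
    """Approximate total optimizer updates over the whole run."""
    per_update = effective_batch_size(per_device_batch, accumulation_steps, num_devices)
    return epochs * math.ceil(num_examples / per_update)


NUM_EXAMPLES = 15_000  # assumed, approximate size of the training split

print("effective batch size:", effective_batch_size(2, 4))
print("approx. optimizer steps:", approx_optimizer_steps(NUM_EXAMPLES, 2, 4, epochs=3))
```

On a single GPU, `logging_steps=10` therefore corresponds to one logged loss value for roughly every 80 training examples.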

## Start Training

Now we can start the fine-tuning process:

In [None]:
# Start the training
trainer.train()

## Test the Fine-tuned Model

Let's test our fine-tuned model with some example prompts:

In [None]:
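# Save the LoRA adapter first: training only writes checkpoint-* subdirectories,
# so OUTPUT_DIR has nothing to load until the adapter is saved explicitly.
trainer.model.save_pretrained(OUTPUT_DIR)

# Load the fine-tuned model (base weights plus the saved LoRA adapter)
fine_tuned_model = AutoModelForCausalLM.from_pretrained(
    OUTPUT_DIR,
    quantization_config=bnb_config,
    device_map="auto",
)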

# Create a pipeline for text generation
pipe = pipeline(
    "text-generation",
    model=fine_tuned_model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,  # Required for temperature and top_p to take effect
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

# Test prompt
test_prompt = """### Instruction: Explain the concept of machine learning in simple terms.

### Input: None

### Response:"""

# Generate response
response = pipe(test_prompt)[0]['generated_text']
print("Model response:")
print(response.split("### Response:")[-1].strip())
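
The `split(...)[-1]` above returns the text after the last `### Response:` marker, so if the model keeps going and writes another instruction/response pair, the wrong segment is printed. A slightly more defensive post-processing step, sketched here as a hypothetical helper (`extract_response` is illustrative and not part of `transformers`), takes the text after the first marker and cuts it at the next section marker.

```python
def extract_response(generated_text, marker="### Response:", stop="###"):
    """Return the text after the first `marker`, truncated at the next `stop` sequence."""
    _, found, tail = generated_text.partition(marker)
    if not found:
        return generated_text.strip()  # marker missing: return the text as-is
    answer, _, _ = tail.partition(stop)
    return answer.strip()


text = (
    "### Instruction: Say hi.\n\n### Response: Hi there!\n\n"
    "### Instruction: Say bye.\n\n### Response: Bye!"
)
print(extract_response(text))
```

One trade-off: a legitimate answer that itself contains `###` (for example a markdown heading) would be truncated, so pick a `stop` sequence that suits your prompt template.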

## Save and Share the Model

Save the LoRA adapter weights and the tokenizer locally. If you also want to share the fine-tuned model, you can upload it to the Hugging Face Hub:

In [None]:
# Save the model locally
trainer.model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

# Optional: Upload to Hugging Face Hub
# from huggingface_hub import notebook_login
# notebook_login()
# trainer.model.push_to_hub("your-username/your-model-name")