# Notebook 2: Fine-Tuning Gemma with PEFT/LoRA

This notebook performs the core fine-tuning process. We will:
1. Load the base `gemma-2b-it` model in 4-bit precision.
2. Load our custom dataset.
3. Configure LoRA to create trainable adapters.
4. Run the training using the `SFTTrainer` from the TRL library.
5. Save the resulting adapters for later use.

**Note:** This requires a GPU with sufficient VRAM (e.g., NVIDIA T4, V100, or A100).

### Step 1: Install and Import Dependencies

First, install the required libraries. A minimal `requirements.txt` looks like this:
```
torch
transformers
bitsandbytes
peft
trl
datasets
accelerate
```

In [None]:
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

### Step 2: Load Model and Tokenizer

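We load the model in 4-bit NF4 precision using `BitsAndBytesConfig` so that it fits in GPU memory. We then call `prepare_model_for_kbit_training`, which readies the quantized model for training: it freezes the base weights, upcasts the remaining non-quantized parameters (such as layer norms) to float32 for numerical stability, and enables gradient checkpointing by default.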

In [None]:
model_name = "google/gemma-2b-it"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # Use torch.float16 on GPUs without native bfloat16 support (e.g., T4, V100)
)

model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config, device_map="auto")
model.config.use_cache = False # The KV cache only helps generation and is incompatible with gradient checkpointing
model = prepare_model_for_kbit_training(model)

tokenizer = AutoTokenizer.from_pretrained(model_name)
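if tokenizer.pad_token is None:  # Gemma's tokenizer already defines a pad token, so this is only a fallback
    tokenizer.pad_token = tokenizer.eos_token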
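
To see why 4-bit loading matters, here is a back-of-the-envelope estimate of weight memory at different precisions. It assumes roughly 2.5 billion parameters for `gemma-2b-it` (an approximation, not a value read from the checkpoint) and ignores activations, gradients, optimizer state, and quantization metadata, so the numbers are best read as lower bounds. The helper name `weight_memory_gib` is illustrative.

```python
# Rough weight-memory estimate; ignores activations, gradients, and quantization overhead.
APPROX_PARAMS = 2.5e9  # Assumption: approximate parameter count for gemma-2b-it


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit NF4", 4)]:
    print(f"{label:>10}: {weight_memory_gib(APPROX_PARAMS, bits):.2f} GiB")
```

Under these assumptions the 4-bit weights need a little over 1 GiB, compared with more than 9 GiB in float32, which leaves room for activations and adapter gradients on a 16 GB card.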

### Step 3: Configure LoRA

Here, we define the LoRA configuration, which specifies the layers that receive trainable adapters. For Gemma, targeting all linear projection layers in the attention blocks (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the MLP blocks (`gate_proj`, `up_proj`, `down_proj`) is a good starting point.

In [None]:
lora_config = LoraConfig(
    r=8, # Rank of the update matrices. Lower rank means fewer trainable parameters.
    lora_alpha=32, # Scaling factor; the adapter update is scaled by lora_alpha / r (here 32 / 8 = 4).
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"] # Target all linear layers
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters() # This will show how few parameters we are actually training!
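
The small trainable-parameter count follows directly from the LoRA construction: each adapted linear layer mapping `d_in` to `d_out` gains two low-rank matrices with `r * (d_in + d_out)` weights in total. The sketch below applies that formula to the seven target modules. The layer shapes and depth are assumptions based on the published `gemma-2b` architecture (hidden size 2048, MLP size 16384, 18 layers, one 256-dimensional key/value head), not values read from the loaded model.

```python
# Count LoRA parameters: each adapted linear layer (d_in -> d_out) gains
# matrices A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) weights.
R = 8
NUM_LAYERS = 18  # Assumption: gemma-2b decoder depth

# Assumption: (d_in, d_out) shapes of the targeted projections in gemma-2b.
PROJECTION_SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 256),
    "v_proj": (2048, 256),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 16384),
    "up_proj": (2048, 16384),
    "down_proj": (16384, 2048),
}


def lora_param_count(shapes, r, num_layers):
    """Total LoRA weights across all decoder layers."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


total = lora_param_count(PROJECTION_SHAPES, R, NUM_LAYERS)
print(f"Trainable LoRA parameters: {total:,}")
```

If the assumed shapes are right, the total is about 9.8 million, well under 1% of the base model, and it should be close to what `print_trainable_parameters()` reports.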

### Step 4: Load Dataset and Set Up Trainer
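
The code in this step assumes that each line of `dataset.jsonl` is a JSON object with `prompt` and `response` keys. As a minimal sketch, the snippet below builds one such record (its content is made up for illustration) and prints the Gemma chat-formatted string that training would see. `format_record` is a hypothetical helper, not part of TRL.

```python
import json

# Hypothetical example record; real data would come from dataset.jsonl.
record = {
    "prompt": "How do I select rows where column 'age' is over 30?",
    "response": "Use boolean indexing: df[df['age'] > 30]",
}

# One line of a JSON Lines file is a single JSON object.
jsonl_line = json.dumps(record)


def format_record(prompt: str, response: str) -> str:
    """Wrap a prompt/response pair in Gemma's user/model turn markers."""
    return (
        f"<start_of_turn>user\n{prompt}<end_of_turn>\n"
        f"<start_of_turn>model\n{response}<end_of_turn>"
    )


parsed = json.loads(jsonl_line)
print(format_record(parsed["prompt"], parsed["response"]))
```

Printing a formatted example like this before training is a cheap way to catch missing keys or malformed turn markers.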

In [None]:
dataset = load_dataset('json', data_files='dataset.jsonl', split='train')

# In TRL releases that accept the `tokenizer` and `max_seq_length` arguments used below,
# SFTTrainer calls this function on a batch (a dict of lists), so it must return one string per row.
def formatting_func(examples):
    # Gemma-IT expects its chat template: a user turn followed by a model turn.
    texts = []
    for prompt, response in zip(examples["prompt"], examples["response"]):
        texts.append(
            f"<start_of_turn>user\n{prompt}<end_of_turn>\n"
            f"<start_of_turn>model\n{response}<end_of_turn>"
        )
    return texts

training_args = TrainingArguments(
    output_dir="./gemma-pandas-expert",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=3, # Tune for your dataset; too many epochs on a small dataset can overfit
    logging_steps=10,
    save_strategy="epoch",
    fp16=True, # Mixed-precision training; on GPUs with bfloat16 support (e.g., A100), set bf16=True instead
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=training_args,
    # peft_config is not passed here because the model was already wrapped by get_peft_model in Step 3.
    formatting_func=formatting_func,
    max_seq_length=1024, # Longer examples are truncated; lower this if you run out of VRAM
    tokenizer=tokenizer,
)
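
With `per_device_train_batch_size=1` and `gradient_accumulation_steps=4`, the optimizer updates once every four examples, so the effective batch size is 4. The sketch below estimates how many optimizer steps a run will take. The dataset size of 1,000 examples and the single GPU are assumptions for illustration, and the exact rounding that `Trainer` applies may differ slightly between versions.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps,
                    num_epochs, num_devices=1):
    """Estimate the effective batch size and total number of optimizer steps."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * num_epochs


# Assumption: a hypothetical dataset of 1,000 examples on a single GPU.
effective_batch, total_steps = optimizer_steps(1000, 1, 4, 3)
print(f"Effective batch size: {effective_batch}, optimizer steps: {total_steps}")
```

Knowing the approximate step count helps when choosing `logging_steps`: with `logging_steps=10`, a run of a few hundred steps produces a readable loss curve, while a run of only a few dozen steps yields just a handful of log lines.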

### Step 5: Start Training

In [None]:
trainer.train()

print("Training complete!")

# Save the LoRA adapters
trainer.save_model("./gemma-pandas-expert-adapters")
print("Model adapters saved.")
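
To use the saved adapters later, they need to be attached to a freshly loaded copy of the base model. The following is a minimal sketch of that step using `PeftModel.from_pretrained`. The helper name `load_finetuned_model` and the choice of `bfloat16` are assumptions for illustration; the adapter path matches the one used above.

```python
def load_finetuned_model(base_name="google/gemma-2b-it",
                         adapter_dir="./gemma-pandas-expert-adapters"):
    """Load the base model and attach the saved LoRA adapters for inference."""
    # Imported lazily so that defining this helper has no side effects.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the GPU supports bfloat16; use torch.float16 otherwise.
    base_model = AutoModelForCausalLM.from_pretrained(
        base_name, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Wrap the base model with the trained adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_dir)
    model.eval()  # Disable dropout for inference
    tokenizer = AutoTokenizer.from_pretrained(base_name)
    return model, tokenizer
```

Calling `model, tokenizer = load_finetuned_model()` downloads the base weights if they are not cached, so it needs network access and a GPU with enough memory for the unquantized model.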