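# Fine-tune Gemma 2 2B for Code Audit

**Goal:** Fine-tune `gemma-2-2b-it` on an audit dataset for specialized code review. Gemini 2.5 Flash weights are not publicly released, so this notebook trains an open-weights Gemma model with Unsloth as a stand-in.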

**Dataset:** 100 examples covering 13 audit tools

**Output:** Hugging Face model `amitrosen/audit-multi-v1`

In [None]:
# Check GPU availability
import torch
try:
    if not torch.cuda.is_available():
        print("❌ WARNING: No GPU detected! Unsloth requires a GPU.")
        print("👉 Please enable 'GPU T4 x2' in the Kaggle Notebook settings (right sidebar).")
        raise RuntimeError("No GPU detected")
    print(f"✅ GPU detected: {torch.cuda.get_device_name(0)}")
    print(f"📦 Current PyTorch: {torch.__version__}")
except Exception as e:
    if "No GPU detected" in str(e):
        raise  # re-raise the missing-GPU error unchanged
    print(f"⚠️ GPU Check Error: {e}")

# Install compatible PyTorch + dependencies
print("\n⏳ Step 1/2: Installing PyTorch 2.5.1 (Unsloth-compatible version)...")
!pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121

print("\n⏳ Step 2/2: Installing Unsloth + dependencies (3-4 minutes)...")
!pip install "unsloth[kaggle-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install xformers --upgrade

print("\n✅ Installation complete! Please RESTART THE KERNEL, then run Cell 3.")
print("⚠️ IMPORTANT: Click 'Kernel' -> 'Restart' before continuing!")

In [None]:
# Import libraries (run AFTER kernel restart)
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
import torch

print("✅ Unsloth loaded successfully!")
print(f"📦 PyTorch: {torch.__version__}")
print("✅ Cell 3 finished! Please continue to the next cell (4).")

In [None]:
# Configuration
max_seq_length = 2048
dtype = None  # Auto-detect
load_in_4bit = True  # Use 4bit quantization

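# Model selection: Gemma 2 2B (instruction-tuned, pre-quantized to 4-bit).
# Gemini weights are not available for fine-tuning, so Gemma serves as an open-weights stand-in.
model_name = "unsloth/gemma-2-2b-it-bnb-4bit"  # Small model for fast training

print(f"⏳ Loading model: {model_name}...")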
# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print(f"✅ Loaded model: {model_name}")
print("✅ Cell 4 finished! Please continue to the next cell (5).")

In [None]:
# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

print("✅ LoRA adapters added")
print("✅ Cell 5 finished! Please continue to the next cell (6).")
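
To get a feel for what `r=16` costs, the adapter size can be estimated by hand: a LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds two low-rank matrices, for `r * (d_in + d_out)` extra parameters, and the update is scaled by `lora_alpha / r`. The sketch below is only an illustration; the helper names and the layer shapes are hypothetical placeholders, not values read from the Gemma model.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return lora_alpha / r


# Hypothetical layer shapes for illustration only (not read from the model).
example_layers = {
    "q_proj": (2048, 2048),
    "up_proj": (2048, 8192),
}

for name, (d_in, d_out) in example_layers.items():
    print(f"{name}: {lora_param_count(d_in, d_out, r=16):,} added parameters")

print("scaling:", lora_scaling(lora_alpha=16, r=16))
```

With `lora_alpha` equal to `r`, as in the cell above, the scaling factor is 1, so the low-rank update is applied unscaled.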

In [None]:
# Load dataset
dataset = load_dataset("json", data_files="../data/audit_dataset.jsonl", split="train")

print(f"📊 Dataset loaded: {len(dataset)} examples")
print(f"Sample: {dataset[0]}")
print("✅ Cell 6 finished! Please continue to the next cell (7).")
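
Before loading, it can help to check that every JSONL record carries the `instruction` and `output` fields the formatting step expects. The following is a minimal sketch under that assumption; `validate_jsonl` is a hypothetical helper, and the sample records are made up for illustration rather than taken from the audit dataset.

```python
import io
import json

REQUIRED_FIELDS = ("instruction", "output")


def validate_jsonl(lines):
    """Return (valid_records, errors) for an iterable of JSONL lines."""
    records, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((lineno, "record is not a JSON object"))
            continue
        # A field counts as missing if it is absent, not a string, or blank.
        missing = [
            field for field in REQUIRED_FIELDS
            if not isinstance(record.get(field), str) or not record[field].strip()
        ]
        if missing:
            errors.append((lineno, f"missing or empty fields: {missing}"))
            continue
        records.append(record)
    return records, errors


# Made-up sample data; in the notebook this would be an open file handle.
sample = io.StringIO(
    '{"instruction": "Analyze test coverage", "output": "Coverage is 0%."}\n'
    '{"instruction": "Check docs"}\n'
    'not json\n'
)
records, errors = validate_jsonl(sample)
print(f"{len(records)} valid, {len(errors)} problems")
for lineno, message in errors:
    print(f"  line {lineno}: {message}")
```

Passing `open("../data/audit_dataset.jsonl", encoding="utf-8")` instead of the in-memory sample would run the same check against the real file.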

In [None]:
# Format dataset for Alpaca
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    outputs = examples["output"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)

print("✅ Dataset formatted for training")
print("✅ Cell 7 finished! Please continue to the next cell (8).")
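
The formatting step can be tried without a GPU or tokenizer. The sketch below repeats the same template and batching logic on a made-up batch; `format_batch` is a hypothetical name, and the `"<eos>"` string is a placeholder assumption standing in for `tokenizer.eos_token`.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""


def format_batch(examples, eos_token):
    """Fill the template for each (instruction, output) pair and append EOS."""
    texts = [
        ALPACA_PROMPT.format(instruction, output) + eos_token
        for instruction, output in zip(examples["instruction"], examples["output"])
    ]
    return {"text": texts}


# Made-up batch; the second instruction contains braces, as code snippets often do.
batch = {
    "instruction": ["Analyze test coverage", "Review: def f(): return {x: 1}"],
    "output": ["Coverage is 0%.", "The name x is undefined."],
}
formatted = format_batch(batch, eos_token="<eos>")  # placeholder EOS string
print(formatted["text"][0])
```

Because `str.format` only interprets braces in the template itself, instructions or outputs that contain `{` or `}` pass through unchanged, which matters for a dataset made of code.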

In [None]:
# Training configuration
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,  # 100 optimizer steps; at 2 x 4 = 8 examples per step, roughly 8 passes over 100 examples
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

print("✅ Trainer configured")
print("✅ Cell 8 finished! Please continue to the next cell (9).")
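
The relationship between batch size, gradient accumulation, and `max_steps` is easy to misjudge on a small dataset, so here is a rough worked calculation. It is a sketch, not part of the training pipeline: `training_budget` is a hypothetical helper, a single training device is assumed, and the epoch count is approximate because the last batch of an epoch can be smaller.

```python
import math


def training_budget(num_examples, per_device_batch_size, grad_accum_steps,
                    max_steps, num_devices=1):
    """Estimate effective batch size and how many passes max_steps covers."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    examples_seen = effective_batch * max_steps
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "approx_epochs": examples_seen / num_examples,
    }


# Values taken from the trainer configuration above; one device is assumed.
budget = training_budget(
    num_examples=100,
    per_device_batch_size=2,
    grad_accum_steps=4,
    max_steps=100,
)
print(budget)
```

If about 8 passes over 100 examples turns out to overfit, lowering `max_steps` is the simplest knob to turn.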

In [None]:
# Train!
trainer_stats = trainer.train()

print("🎉 Training complete!")
print(f"Training time: {trainer_stats.metrics['train_runtime']:.2f}s")
print(f"Samples/second: {trainer_stats.metrics['train_samples_per_second']:.2f}")
print("✅ Cell 9 finished! Please continue to the next cell (10).")

In [None]:
# Test inference
FastLanguageModel.for_inference(model)

test_instruction = "Analyze test coverage: 330 files found, 5 executable, 0% coverage"
inputs = tokenizer(
    [
        alpaca_prompt.format(test_instruction, "")
    ], return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)
result = tokenizer.batch_decode(outputs)

print("📝 Test Inference:")
print(result[0])
print("✅ Cell 10 finished! Please continue to the next cell (11).")
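
The decoded text includes the full prompt as well as the generated answer. A small post-processing step can isolate the answer; the sketch below assumes the Alpaca `### Response:` marker used above and treats `"<eos>"` as a placeholder for the tokenizer's end-of-sequence string. `extract_response` and the sample text are hypothetical, not actual model output.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, eos_token: str = "<eos>") -> str:
    """Return only the text generated after the response marker."""
    if RESPONSE_MARKER in decoded:
        # Keep everything after the first marker, dropping the echoed prompt.
        decoded = decoded.split(RESPONSE_MARKER, 1)[1]
    if eos_token and eos_token in decoded:
        # Cut at the first end-of-sequence string, if one was emitted.
        decoded = decoded.split(eos_token, 1)[0]
    return decoded.strip()


# Made-up decoded string shaped like the prompt template plus an answer.
sample = (
    "<bos>Below is an instruction that describes a task.\n\n"
    "### Instruction:\nAnalyze test coverage\n\n"
    "### Response:\nCoverage is 0%; add tests.<eos>"
)
print(extract_response(sample))
```

In the notebook this would be called as `extract_response(result[0], tokenizer.eos_token)`.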

In [None]:
# Save model
model.save_pretrained("audit-multi-v1")
tokenizer.save_pretrained("audit-multi-v1")

print("✅ Model saved locally")
print("✅ Cell 11 finished! Please continue to the next cell (12).")

In [None]:
# Push to HuggingFace (optional)
# Uncomment and add your HF token

# from huggingface_hub import login
# login(token="YOUR_HF_TOKEN")

# model.push_to_hub("amitrosen/audit-multi-v1", token="YOUR_HF_TOKEN")
# tokenizer.push_to_hub("amitrosen/audit-multi-v1", token="YOUR_HF_TOKEN")

# print("🚀 Model pushed to HuggingFace!")
print("✅ Cell 12 finished! All done!")
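
Pasting a token into a notebook makes it easy to leak when the notebook is shared. One alternative is to read it from an environment variable; the sketch below assumes the variable is called `HF_TOKEN`, and `get_hf_token` is a hypothetical helper rather than part of any library.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN"):
    """Return the token stored in env_var, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


token = get_hf_token()
if token is None:
    print("HF_TOKEN is not set; skipping the push.")
else:
    print("Token found; it can be passed as token=... when pushing.")
```

The returned value would replace the `"YOUR_HF_TOKEN"` placeholder in the commented-out calls above.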

## Summary

✅ **Model:** gemma-2-2b-it (open-weights stand-in for Gemini)

✅ **Dataset:** 100 audit examples

✅ **Training:** ~1 hour on T4 GPU

✅ **Output:** audit-multi-v1 (LoRA adapters)

**Next Steps:**
1. Test on real audit scenarios
2. Integrate into MCP server
3. Compare with base model performance