# Train Gemma3N with Hermes Datasets

This notebook trains a custom Gemma3N model with:
- Uncensored behavior (removed safety filters)
- Enhanced function calling capabilities
- Multimodal support (text, audio, image)
- Based on Hermes-3-Dataset and hermes-function-calling-v1

**Output:** GGUF Q8_0 quantized model

## Installation

In [None]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9\.]{3,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.32.post2" if v == "2.8.0" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.55.4
import torch; torch._dynamo.config.recompile_limit = 64;

In [None]:
%%capture
!pip install --no-deps --upgrade timm # Only for Gemma 3N

## Model Setup

In [None]:
from unsloth import FastModel
import torch

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3n-E4B-it",
    dtype = None, # None for auto detection
    max_seq_length = 2048, # Longer context for complex tasks
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    full_finetuning = False, # Use LoRA for efficiency
)

## LoRA Configuration for Uncensored Training

In [None]:
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers     = True, # Enable vision fine-tuning
    finetune_language_layers   = True,  # Language layers for uncensored training
    finetune_attention_modules = True,  # Attention for function calling
    finetune_mlp_modules       = True,  # MLPs for complex reasoning

    r = 16,           # Higher rank for better capacity
    lora_alpha = 16,  # Recommended alpha == r
    lora_dropout = 0.05, # Small dropout for regularization
    bias = "none",
    random_state = 3407,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)
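To make the `r` and `lora_alpha` settings above more concrete, here is a minimal sketch of how many trainable weights LoRA adds to a single linear layer and how the update is scaled. The layer dimensions are hypothetical placeholders chosen for illustration, not Gemma 3N's actual shapes, and the helper names are not part of Unsloth or PEFT.

```python
# Sketch: count the extra trainable weights LoRA adds to one linear layer.
# The layer shapes below are hypothetical, not Gemma 3N's real dimensions.

def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to a frozen d_out x d_in weight."""
    return r * d_in + d_out * r


def lora_scaling(lora_alpha: float, r: int) -> float:
    """The low-rank update B @ A is multiplied by lora_alpha / r before being added."""
    return lora_alpha / r


r, lora_alpha = 16, 16
d_in, d_out = 2048, 2048  # hypothetical square projection

added = lora_param_count(d_in, d_out, r)
frozen = d_in * d_out
print(f"LoRA params: {added:,} vs frozen: {frozen:,} ({added / frozen:.2%})")
print(f"scaling = {lora_scaling(lora_alpha, r)}")
```

With `lora_alpha == r` the scaling factor is 1, so raising `r` adds capacity without changing the magnitude of the update.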

## Data Preparation

Combine Hermes-3-Dataset and hermes-function-calling-v1 for:
- Uncensored conversations
- Function calling examples
- Text-only supervision (neither dataset contains image or audio samples, so multimodal ability comes from the base model)

In [None]:
from datasets import load_dataset, concatenate_datasets
from unsloth.chat_templates import get_chat_template, standardize_data_formats
import json

# Load Hermes datasets
print("Loading Hermes-3-Dataset...")
hermes3_dataset = load_dataset("NousResearch/Hermes-3-Dataset", split="train")

print("Loading hermes-function-calling-v1...")
hermes_fc_dataset = load_dataset("NousResearch/hermes-function-calling-v1", split="train")

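# concatenate_datasets requires identical features, so keep only the shared
# "conversations" column (assumes both datasets expose it with the same schema)
hermes3_dataset = hermes3_dataset.select_columns(["conversations"])
hermes_fc_dataset = hermes_fc_dataset.select_columns(["conversations"])

# Combine datasets
print("Combining datasets...")
combined_dataset = concatenate_datasets([hermes3_dataset, hermes_fc_dataset])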

# Shuffle and limit for training efficiency
combined_dataset = combined_dataset.shuffle(seed=42)
dataset = combined_dataset.select(range(min(10000, len(combined_dataset))))  # Use up to 10k samples

print(f"Training on {len(dataset)} samples")

In [None]:
# Apply chat template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma-3",
)

# Standardize data format
dataset = standardize_data_formats(dataset)
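The standardization step is easier to reason about with a concrete record in hand. The sketch below assumes the Hermes datasets use ShareGPT-style turns (`from` / `value`) and shows the kind of role/content conversion that standardization is expected to perform. It is an illustration, not Unsloth's implementation, and `ROLE_MAP`, `to_role_content`, and `has_alternating_roles` are hypothetical names. The second helper flags conversations the Gemma chat template may reject, such as those containing `tool` turns.

```python
# Hypothetical mapping from ShareGPT-style speaker names to chat roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_role_content(conversation):
    """Convert [{"from": ..., "value": ...}] turns to [{"role": ..., "content": ...}]."""
    return [
        {"role": ROLE_MAP.get(turn["from"], turn["from"]), "content": turn["value"]}
        for turn in conversation
    ]


def has_alternating_roles(messages):
    """True if turns are an optional system message, then strictly alternating
    user/assistant turns starting with user."""
    turns = messages[1:] if messages and messages[0]["role"] == "system" else messages
    expected = ("user", "assistant")
    return bool(turns) and all(
        m["role"] == expected[i % 2] for i, m in enumerate(turns)
    )


sample = [
    {"from": "system", "value": "You are a helpful assistant."},
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "4"},
]
messages = to_role_content(sample)
print(messages)
print(has_alternating_roles(messages))
```

A check like `has_alternating_roles` could be passed to `dataset.filter` after standardization if template errors appear during formatting.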

In [None]:
# Format conversations for training
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False).removeprefix('<bos>') for convo in convos]
    return { "text" : texts, }

dataset = dataset.map(formatting_prompts_func, batched = True)

# Verify format
print("Sample formatted text:")
print(dataset[0]["text"][:500] + "...")
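For orientation, the formatted text should look roughly like the output of the hand-written renderer below. This is an approximation of the Gemma turn format for plain user/assistant exchanges. The real chat template also handles system prompts and the `<bos>` token, and `render_gemma_turns` is a hypothetical helper, not a library function.

```python
def render_gemma_turns(messages):
    """Approximate the Gemma turn layout for simple user/assistant messages."""
    parts = []
    for message in messages:
        # Gemma names the assistant role "model" inside the prompt text.
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(render_gemma_turns(example))
```

Comparing this shape with `dataset[0]["text"]` is a quick way to confirm that the `<start_of_turn>user` and `<start_of_turn>model` markers used later for response-only training actually appear in the data.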

## Training Configuration

Configure for uncensored, function-calling capable model

In [None]:
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    eval_dataset = None,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        num_train_epochs = 3,  # Full training run
        max_steps = -1,  # -1 disables the step cap so num_train_epochs applies
        learning_rate = 2e-4,
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        report_to = "none",
        save_steps = 500,
        save_total_limit = 3,
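        fp16 = not torch.cuda.is_bf16_supported(),  # Fall back to fp16 only on GPUs without bf16
        bf16 = torch.cuda.is_bf16_supported(),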
        gradient_checkpointing = True,
    ),
)
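The batch size, accumulation, epoch, warmup, and scheduler settings together determine how long the run is and how the learning rate evolves. The sketch below estimates the number of optimizer steps and reproduces a linear-warmup cosine schedule. It assumes a single GPU and the 10,000-sample cap used earlier. The helper names are hypothetical, and the trainer's own step count may differ slightly in edge cases.

```python
import math


def total_optimizer_steps(num_samples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Estimate the effective batch size and total optimizer steps for a run."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


def cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup followed by a half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


effective_batch, total_steps = total_optimizer_steps(
    num_samples=10_000, per_device_batch=2, grad_accum=4, epochs=3
)
print(f"effective batch = {effective_batch}, total steps = {total_steps}")
for step in (0, 10, total_steps // 2, total_steps):
    print(step, cosine_lr(step, base_lr=2e-4, warmup_steps=10, total_steps=total_steps))
```

Under these assumptions the run takes 3,750 optimizer steps, so `save_steps = 500` produces seven checkpoints, of which `save_total_limit = 3` keeps the most recent three.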

In [None]:
# Train on responses only for better efficiency
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<start_of_turn>user\n",
    response_part = "<start_of_turn>model\n",
)
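Conceptually, training on responses only means that tokens outside the model's turns get a label the loss ignores. The sketch below illustrates that idea on plain lists of token IDs. It is not Unsloth's implementation, the token IDs are made up, and `mask_non_response_labels` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_sublist(sequence, pattern, start=0):
    """Return the first index >= start where pattern occurs in sequence, or -1."""
    for i in range(start, len(sequence) - len(pattern) + 1):
        if sequence[i:i + len(pattern)] == pattern:
            return i
    return -1


def mask_non_response_labels(input_ids, instruction_ids, response_ids):
    """Keep labels only for tokens that follow a response marker, up to the next
    instruction marker (or the end of the sequence)."""
    labels = [IGNORE_INDEX] * len(input_ids)
    pos = 0
    while True:
        start = find_sublist(input_ids, response_ids, pos)
        if start == -1:
            break
        begin = start + len(response_ids)
        end = find_sublist(input_ids, instruction_ids, begin)
        if end == -1:
            end = len(input_ids)
        labels[begin:end] = input_ids[begin:end]
        pos = end
    return labels


# Made-up IDs: [1, 2] stands for the user marker, [1, 3] for the model marker.
ids = [1, 2, 10, 11, 1, 3, 20, 21, 1, 2, 12, 1, 3, 22]
print(mask_non_response_labels(ids, instruction_ids=[1, 2], response_ids=[1, 3]))
```

Only the tokens after each model marker keep their labels, so the loss is computed on the assistant's replies and not on the user prompts.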

## Train the Model

In [None]:
# Show memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# Train!
trainer_stats = trainer.train()

# Show final stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

## Save Model

In [None]:
# Save LoRA adapters
model.save_pretrained("gemma3n-hermes-lora")
tokenizer.save_pretrained("gemma3n-hermes-lora")

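# Merge the adapters into the base weights and save the full model.
# Unsloth's save_pretrained_merged is used here because calling
# merge_and_unload() directly on a 4-bit model merges into quantized weights.
model.save_pretrained_merged("gemma3n-hermes-merged", tokenizer)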

## Convert to GGUF Q8_0

In [None]:
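# Fetch llama.cpp for its conversion script (no compilation is needed for this step)
!git clone --depth 1 https://github.com/ggerganov/llama.cpp

# Convert to GGUF from the notebook's working directory, so later cells keep their paths
!python llama.cpp/convert_hf_to_gguf.py gemma3n-hermes-merged \
    --outtype q8_0 \
    --outfile gemma3n-hermes-Q8_0.gguf

# Report success only if the output file actually exists
import os
if os.path.exists("gemma3n-hermes-Q8_0.gguf"):
    print("GGUF conversion complete!")
    print("Model saved as: gemma3n-hermes-Q8_0.gguf")
else:
    print("GGUF conversion failed; check the converter output above.")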
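As a rough picture of what Q8_0 does to the weights, the sketch below quantizes one block of 32 values to 8-bit integers with a single shared scale and then restores them. It is a simplified illustration in pure Python, not llama.cpp's code: the real format stores the scale as a 16-bit float, whereas this sketch keeps it as a Python float, and the function names are hypothetical.

```python
BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32


def quantize_block_q8_0(values):
    """Quantize one block to int8 values plus a single shared scale."""
    assert len(values) == BLOCK_SIZE
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0
    if scale == 0.0:
        return 0.0, [0] * BLOCK_SIZE
    quants = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block_q8_0(scale, quants):
    """Recover approximate float values from the quantized block."""
    return [scale * q for q in quants]


def q8_0_bits_per_weight():
    """32 int8 values plus one 16-bit scale per block."""
    return (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE


block = [(i - 16) / 16 for i in range(BLOCK_SIZE)]  # toy weights in [-1, 1)
scale, quants = quantize_block_q8_0(block)
restored = dequantize_block_q8_0(scale, quants)
max_error = max(abs(a - b) for a, b in zip(block, restored))
print(f"scale = {scale:.6f}, max round-trip error = {max_error:.6f}")
print(f"bits per weight = {q8_0_bits_per_weight()}")
```

At 8.5 bits per weight, a Q8_0 file is a little over half the size of the same weights in 16-bit precision, with a rounding error bounded by half the block scale.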

## Test the Model

In [None]:
# Test function calling
from transformers import TextStreamer

def test_inference(messages, max_new_tokens=256):
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        tokenize=True,
        return_dict=True,
    ).to("cuda")
    
    _ = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.8,  # Slightly lower for more coherent responses
        top_p=0.9,
        top_k=50,
        streamer=TextStreamer(tokenizer, skip_prompt=True),
    )

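# Smoke-test generation. This prompt exercises code generation; a real
# tool-calling test would also supply a tool schema in the system prompt.
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Write a Python function to calculate fibonacci numbers and call it with n=10"},
    ],
}]

print("Testing generation:")
test_inference(messages)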

# Test multimodal (if you have test image/audio)
# messages = [{
#     "role": "user",
#     "content": [
#         {"type": "text", "text": "What's in this image?"},
#         {"type": "image", "image": "path/to/test/image.jpg"}
#     ]
# }]
# test_inference(messages)
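To check tool-calling output more directly, the generated text can be scanned for tool-call blocks. The sketch below assumes the model follows the Hermes convention of wrapping a JSON object in `<tool_call>` tags. The `extract_tool_calls` helper and the sample text are hypothetical, and payloads that are not valid JSON (for example Python-style single quotes) are skipped instead of repaired.

```python
import json
import re

# Matches the payload between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return the JSON objects found inside <tool_call> tags that name a function."""
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed payloads
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


# Made-up model output for illustration only.
generated = (
    "Let me look that up.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(generated))
```

To use it with the test above, generate without the streamer, decode the output tokens to a string, and pass that string to `extract_tool_calls`.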

## Model Specifications

- **Base Model:** Gemma 3N E4B, instruction-tuned (`unsloth/gemma-3n-E4B-it`)
- **Training Data:** Hermes-3-Dataset + hermes-function-calling-v1
- **Capabilities:** 
  - Uncensored responses
  - Function calling
  - Multimodal (text, audio, image)
- **Quantization:** GGUF Q8_0
- **Use Cases:** General AI assistant with advanced capabilities