# FunctionGemma Fine-tuning for Flutter

Fine-tuning notebook for the [flutter_gemma](https://github.com/AdenisovAV/flutter_gemma) plugin.

Trains FunctionGemma 270M for custom function calling:
- `change_background_color` - Changes app background color
- `change_app_title` - Changes app title
- `show_alert` - Shows alert dialog

**Pipeline:**
1. **This notebook** - Fine-tune model
2. `functiongemma_to_tflite.ipynb` - Convert to TFLite
3. `functiongemma_tflite_to_task.ipynb` - Bundle as .task for Flutter

**Requirements:**
- A100 GPU runtime (Runtime → Change runtime type → A100)
- HuggingFace account with accepted [Gemma license](https://huggingface.co/google/functiongemma-270m-it)
- HuggingFace token (read access is enough to download the model; write access is needed only for the optional Hub upload)

## 1. Install Dependencies

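**What we're installing:**
- `transformers` - HuggingFace library for working with LLMs
- `trl` - Transformer Reinforcement Learning, used here for SFT (Supervised Fine-Tuning)
- `datasets` - dataset loading and handling
- `accelerate` - device placement and distributed training
- `evaluate` - evaluation metrics
- `sentencepiece`, `protobuf` - needed by the Gemma tokenizer
- `huggingface_hub` - authentication and model download
- `tensorboard` - training logs

`torch` (PyTorch, the core ML framework) is not reinstalled; Colab's pre-installed build is used.

**Pinned versions:** only `transformers` and `trl` are pinned. The other packages install at their latest release, so runs are not fully reproducible.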
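
One way to confirm that the pinned versions are the ones actually loaded at runtime is to compare them with the installed package metadata. The sketch below uses only the standard library; `EXPECTED_VERSIONS` and `find_version_mismatches` are illustrative names, and the pins are copied from the install cell.

```python
from importlib import metadata

# Pins copied from the install cell; extend this dict as needed.
EXPECTED_VERSIONS = {"transformers": "4.57.3", "trl": "0.26.2"}

def find_version_mismatches(expected):
    """Return {package: (expected, installed)} for every package whose installed
    version differs from the pin; installed is None when the package is missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches

print(find_version_mismatches(EXPECTED_VERSIONS) or "All pinned packages match")
```

Running this after the install cell (and after any runtime restart) shows whether Colab silently kept a different version.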

In [None]:
# =============================================================================
# Install dependencies (use Colab's pre-installed torch)
# =============================================================================
# Don't reinstall torch - Colab has optimized version pre-installed

!pip install -q transformers==4.57.3 datasets accelerate evaluate trl==0.26.2 protobuf sentencepiece
!pip install -q huggingface_hub tensorboard

print("\nDependencies installed!")

## 2. HuggingFace Authentication

**Setup Colab Secret:**
1. Click the 🔑 key icon in the left panel
2. Add new secret: `HF_TOKEN`
3. Paste your HuggingFace token (from https://huggingface.co/settings/tokens)
4. Toggle "Notebook access" ON

**Don't forget:** Accept the Gemma license at https://huggingface.co/google/functiongemma-270m-it

In [None]:
# =============================================================================
# Authenticate with HuggingFace using Colab Secrets
# =============================================================================
from google.colab import userdata
from huggingface_hub import login

# Read token from Colab Secrets (key icon in left panel)
HF_TOKEN = userdata.get('HF_TOKEN')

if not HF_TOKEN:
    raise ValueError("HF_TOKEN not found in Colab Secrets. Add it via the 🔑 key icon.")

login(token=HF_TOKEN)
print("Logged in to HuggingFace!")

## 3. Upload Training Data

**File:** `training_data.jsonl` (284 examples)

**Format per line:**
```json
{"user_content": "make it red", "tool_name": "change_background_color", "tool_arguments": "{\"color\": \"red\"}"}
```
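
Because `tool_arguments` is itself a JSON-encoded string inside each JSON line, malformed records are easy to miss. The following is a minimal sketch of a per-line check, assuming only the three keys shown above; `REQUIRED_KEYS` and `validate_jsonl_line` are hypothetical names, not part of the notebook.

```python
import json

REQUIRED_KEYS = ("user_content", "tool_name", "tool_arguments")

def validate_jsonl_line(line):
    """Return a list of problems found in one JSONL line (empty list = valid)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"line is not valid JSON: {err.msg}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in record]
    if "tool_arguments" in record:
        try:
            arguments = json.loads(record["tool_arguments"])
        except (TypeError, json.JSONDecodeError):
            problems.append("tool_arguments is not a JSON-encoded string")
        else:
            if not isinstance(arguments, dict):
                problems.append("tool_arguments does not decode to an object")
    return problems

good = '{"user_content": "make it red", "tool_name": "change_background_color", "tool_arguments": "{\\"color\\": \\"red\\"}"}'
bad = '{"user_content": "make it red", "tool_arguments": "{color: red}"}'
print(validate_jsonl_line(good))
print(validate_jsonl_line(bad))
```

Applying the check to every line of `training_data.jsonl` before training surfaces bad records with a readable message instead of a stack trace during conversion.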

**How to upload:**
1. Drag and drop the file into the left panel (Files)
2. Or run this cell - an Upload button will appear

In [None]:
from google.colab import files
import os

if not os.path.exists('training_data.jsonl'):
    print("Please upload training_data.jsonl:")
    uploaded = files.upload()
else:
    print("✅ training_data.jsonl already exists")

# Verify file
!wc -l training_data.jsonl
!head -1 training_data.jsonl

## 4. Define Tools and Prepare Dataset

### 4.1 Define Functions (Tools)

We define Python functions with type hints and docstrings. The `get_json_schema()` utility from HuggingFace transformers automatically generates JSON Schema for each function.

**Why this approach:**
- Type hints → parameter types in schema
- Docstrings → function descriptions
- No manual JSON writing needed
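
For orientation, the schema generated for `change_background_color` should look roughly like the hand-written dictionary below. This is an illustration of the expected shape, not captured output, and the real result of `get_json_schema()` may differ in detail. `check_arguments` is a hypothetical helper showing how such a schema can be used to sanity-check a tool call.

```python
# Hand-written illustration of the expected shape; the real value comes from
# get_json_schema(change_background_color) and may differ in detail.
EXAMPLE_SCHEMA = {
    "type": "function",
    "function": {
        "name": "change_background_color",
        "description": "Changes the app background color to specified color.",
        "parameters": {
            "type": "object",
            "properties": {
                "color": {
                    "type": "string",
                    "description": "The color name (red, green, blue, yellow, purple, orange)",
                }
            },
            "required": ["color"],
        },
    },
}

def check_arguments(schema, arguments):
    """Hypothetical helper: list missing required and unknown argument names."""
    parameters = schema["function"]["parameters"]
    missing = [name for name in parameters.get("required", []) if name not in arguments]
    unknown = [name for name in arguments if name not in parameters["properties"]]
    return missing, unknown

print(check_arguments(EXAMPLE_SCHEMA, {"color": "red"}))
print(check_arguments(EXAMPLE_SCHEMA, {"colour": "red"}))
```

The same check can be run over the training data so that every `tool_arguments` value matches the parameters the model will see in its prompt.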

In [None]:
import json
from datasets import Dataset
from transformers.utils import get_json_schema

# =============================================================================
# STEP 4.1: Define Python functions for JSON Schema generation
# =============================================================================
# These functions are NOT executed - they're only used for JSON Schema generation.
# get_json_schema() reads the function name, docstring, and type hints,
# and creates a JSON Schema in OpenAI function calling format.

def change_background_color(color: str) -> str:
    """Changes the app background color to specified color.

    Args:
        color: The color name (red, green, blue, yellow, purple, orange)
    """
    return f"Changed to {color}"

def change_app_title(title: str) -> str:
    """Changes the application title text in the AppBar.

    Args:
        title: The new title text to display
    """
    return f"Title set to {title}"

def show_alert(title: str, message: str) -> str:
    """Shows an alert dialog with a custom message and title.

    Args:
        title: The title of the alert dialog
        message: The message content of the alert dialog
    """
    return f"Alert shown: {title}"

# =============================================================================
# Generate JSON Schemas from Python functions
# =============================================================================
# get_json_schema() creates a structure like:
# {"type": "function", "function": {"name": "...", "description": "...", "parameters": {...}}}

TOOLS = [
    get_json_schema(change_background_color),
    get_json_schema(change_app_title),
    get_json_schema(show_alert),
]

print("Tools defined:")
for tool in TOOLS:
    print(f"   - {tool['function']['name']}: {tool['function']['description'][:50]}...")

### 4.2 Convert Dataset

Our simple training format is converted to FunctionGemma conversation format:

```
Before (our format):
{"user_content": "make it red", "tool_name": "change_background_color", "tool_arguments": "{\"color\": \"red\"}"}

After (FunctionGemma format):
{
  "messages": [
    {"role": "developer", "content": "You are a model that can do function calling with the following functions"},
    {"role": "user", "content": "make it red"},
    {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "...", "arguments": {...}}}]}
  ],
  "tools": [... JSON schemas ...]
}
```

**Key points:**
- `developer` role is required (not `system`) - this activates function calling mode
- `tools` array contains JSON schemas for available functions
- `tool_calls` in assistant response shows expected output

In [None]:
# =============================================================================
# STEP 4.2: Load and convert dataset to FunctionGemma format
# =============================================================================

# System message for developer role (official Google format)
# Source: https://huggingface.co/google/functiongemma-270m-it
DEFAULT_SYSTEM_MSG = "You are a model that can do function calling with the following functions"

def create_conversation(sample):
    """
    Converts simple format to full FunctionGemma conversation format.
    
    Input (our simple format):
        {"user_content": "make it red", "tool_name": "change_background_color", "tool_arguments": "{\"color\": \"red\"}"}
    
    Output (FunctionGemma format):
        {
            "messages": [
                {"role": "developer", "content": "You are a model..."},  # System prompt
                {"role": "user", "content": "make it red"},              # User request
                {"role": "assistant", "tool_calls": [                    # Expected response
                    {"type": "function", "function": {"name": "change_background_color", "arguments": {"color": "red"}}}
                ]}
            ],
            "tools": [... JSON Schemas ...]  # Available tools
        }
    """
    return {
        "messages": [
            {"role": "developer", "content": DEFAULT_SYSTEM_MSG},
            {"role": "user", "content": sample["user_content"]},
            {
                "role": "assistant",
                "tool_calls": [
                    {
                        "type": "function",
                        "function": {
                            "name": sample["tool_name"],
                            "arguments": json.loads(sample["tool_arguments"])  # Parse JSON string to dict
                        }
                    }
                ]
            },
        ],
        "tools": TOOLS
    }

# =============================================================================
# Load raw JSONL data
# =============================================================================
raw_data = []
with open('training_data.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:  # Skip blank lines (e.g. a trailing newline at end of file)
            raw_data.append(json.loads(line))

print(f"Loaded {len(raw_data)} raw examples")

# =============================================================================
# Convert to HuggingFace Dataset and apply transformation
# =============================================================================
dataset = Dataset.from_list(raw_data)
dataset = dataset.map(create_conversation, remove_columns=dataset.features)

# Split into train/test (90%/10%)
dataset = dataset.train_test_split(test_size=0.1, shuffle=True, seed=42)

print(f"Dataset prepared:")
print(f"   Train: {len(dataset['train'])} examples")
print(f"   Test:  {len(dataset['test'])} examples")
print(f"\nSample conversation (first example):")
print(json.dumps(dataset['train'][0], indent=2)[:500] + "...")

## 5. Load Base Model

**Model:** `google/functiongemma-270m-it`
- 270M parameters (compact, designed for on-device)
- Instruction-tuned (it) - already trained to follow instructions
- Specialized for function calling

**Loading parameters:**
- `torch_dtype=bfloat16` - 16-bit weights to save memory (~540MB instead of ~1GB)
- `device_map="auto"` - automatically load to GPU
- `attn_implementation="eager"` - without FlashAttention (for compatibility)
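
The memory figures follow from the parameter count times the bytes per parameter. A small sketch of that arithmetic, assuming the nominal 270M count from the model name (the exact count reported by `model.num_parameters()` may differ):

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

# Nominal size taken from the model name; the exact count may differ.
NOMINAL_PARAMS = 270_000_000

print(f"bfloat16: {weight_memory_gb(NOMINAL_PARAMS, 2):.2f} GB")
print(f"float32:  {weight_memory_gb(NOMINAL_PARAMS, 4):.2f} GB")
```

These numbers cover the weights only. Training also needs room for gradients, optimizer state, and activations, so peak GPU memory is higher.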

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# =============================================================================
# Load FunctionGemma base model
# =============================================================================
BASE_MODEL = "google/functiongemma-270m-it"

print(f"Loading {BASE_MODEL}...")
print("   (Downloads ~540MB on first run, then uses cache)")

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,      # 16-bit to save VRAM
    device_map="auto",                # Automatically load to GPU
    attn_implementation="eager"       # Without FlashAttention for compatibility
)

# Tokenizer converts text to tokens and back
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

print(f"\nModel loaded!")
print(f"   Parameters: {model.num_parameters():,}")
print(f"   Memory: ~{model.num_parameters() * 2 / 1e9:.1f} GB (bfloat16)")
print(f"   Device: {model.device}")

## 6. Configure Training

**Hyperparameters based on the official Google FunctionGemma cookbook** (epochs raised from 2 to 3 for this small dataset):

| Parameter | Value | Explanation |
|-----------|-------|-------------|
| `num_train_epochs` | 3 | How many times to iterate through the entire dataset |
| `learning_rate` | 1e-5 | Learning rate (conservative for fine-tuning) |
| `lr_scheduler_type` | cosine | Smoothly decreases LR towards end of training |
| `gradient_accumulation_steps` | 8 | Gradient accumulation (effective batch = 32) |
| `max_length` | 1024 | Maximum sequence length in tokens |
| `bf16` | True | 16-bit training to save memory |

**Why these values:**
- LR 1e-5 (not 5e-5) - prevents "forgetting" base knowledge
- Cosine scheduler - smooth LR decay improves convergence
- Gradient accumulation 8 - simulates large batch without running out of memory
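
To see what these settings mean in optimizer steps, the arithmetic can be sketched as follows. The train-split size of 255 is an assumption (284 examples minus a 10% test split), and `training_schedule` is an illustrative helper for a single-GPU run, not part of TRL.

```python
import math

def training_schedule(n_train, per_device_batch, grad_accum, epochs, warmup_ratio):
    """Estimate optimizer-step counts for a single-GPU run."""
    batches_per_epoch = math.ceil(n_train / per_device_batch)
    updates_per_epoch = max(math.ceil(batches_per_epoch / grad_accum), 1)
    total_updates = updates_per_epoch * epochs
    return {
        "effective_batch": per_device_batch * grad_accum,
        "batches_per_epoch": batches_per_epoch,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "warmup_updates": math.ceil(total_updates * warmup_ratio),
    }

# 255 is an assumed train-split size (284 examples minus a 10% test split).
schedule = training_schedule(n_train=255, per_device_batch=4, grad_accum=8,
                             epochs=3, warmup_ratio=0.1)
print(schedule)
```

Under these assumptions the run has only a couple of dozen optimizer steps in total, so logging every 10 steps would record the training loss about twice. A smaller `logging_steps` value may give a more useful loss curve.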

In [None]:
from trl import SFTConfig, SFTTrainer

# Output directory
OUTPUT_DIR = "functiongemma-flutter-demo"

# =============================================================================
# Training configuration (based on official Google FunctionGemma cookbook)
# https://github.com/google-gemini/gemma-cookbook/blob/main/FunctionGemma/
# =============================================================================
training_args = SFTConfig(
    output_dir=OUTPUT_DIR,
    
    # Training params (Google official uses 2 epochs, we use 3 for small dataset)
    max_length=1024,                    # Max sequence length in tokens
    packing=False,                      # Don't pack multiple examples into one sequence
    num_train_epochs=3,                 # Google uses 2, we add 1 for small dataset (284 examples)
    per_device_train_batch_size=4,      # Batch size per GPU
    per_device_eval_batch_size=4,       # Eval batch size
    gradient_accumulation_steps=8,      # Effective batch size: 4 * 8 = 32
    
    # Optimizer (Google official params)
    learning_rate=1e-5,                 # Google official: 1e-5 (more conservative than 5e-5)
    lr_scheduler_type="cosine",         # Google official: cosine decay
    optim="adamw_torch_fused",          # Fused AdamW for faster training
    warmup_ratio=0.1,                   # 10% warmup steps
    
    # Logging and checkpoints
    logging_steps=10,                   # Log every 10 steps
    eval_strategy="epoch",              # Evaluate after each epoch
    save_strategy="epoch",              # Save checkpoint after each epoch
    
    # Memory optimization
    gradient_checkpointing=False,       # Trade compute for memory (enable if OOM)
    bf16=True,                          # Use bfloat16 for training
    
    # Output
    report_to="tensorboard",            # Log to TensorBoard
    push_to_hub=False,                  # Set to True to upload to HuggingFace
)

print("Training configuration (Google official params):")
print(f"   Epochs: {training_args.num_train_epochs}")
print(f"   Batch size: {training_args.per_device_train_batch_size}")
print(f"   Gradient accumulation: {training_args.gradient_accumulation_steps}")
print(f"   Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"   Learning rate: {training_args.learning_rate}")
print(f"   LR scheduler: {training_args.lr_scheduler_type}")
print(f"   Max length: {training_args.max_length}")

## 7. Start Training

**What happens:**
1. `SFTTrainer` applies `tokenizer.apply_chat_template()` to each example
2. Model learns to predict the correct `tool_call` for each `user_content`
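3. Loss is computed on the rendered conversation. With the `messages` format used here, TRL's default covers every token, prompt included; restricting it to assistant tokens requires `assistant_only_loss=True` in `SFTConfig` and a chat template that marks assistant spans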

**Training time:** ~5 minutes on A100 GPU (for ~300 examples)

**Monitoring:**
- `loss` should decrease
- `eval_loss` should decrease or plateau; if it rises while `loss` keeps falling, the model is overfitting
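
The `eval_loss` check can be automated after training. The sketch below assumes a list of log dictionaries shaped like `trainer.state.log_history`, where evaluation entries carry an `eval_loss` key; the values shown are made up for illustration and are not results from this notebook.

```python
def eval_loss_increases(log_history):
    """Return (previous, current) eval_loss pairs where the loss went up."""
    eval_losses = [entry["eval_loss"] for entry in log_history if "eval_loss" in entry]
    return [(prev, cur) for prev, cur in zip(eval_losses, eval_losses[1:]) if cur > prev]

# Made-up values for illustration only, not results from this notebook.
example_history = [
    {"loss": 1.20, "step": 10},
    {"eval_loss": 0.90, "epoch": 1.0},
    {"loss": 0.60, "step": 20},
    {"eval_loss": 0.70, "epoch": 2.0},
    {"eval_loss": 0.75, "epoch": 3.0},
]
print(eval_loss_increases(example_history))
```

An empty result means `eval_loss` never rose between evaluations. Any returned pair marks an epoch worth inspecting, for example by keeping the earlier checkpoint.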

In [None]:
# =============================================================================
# Create SFTTrainer and start training
# =============================================================================
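# SFTTrainer (Supervised Fine-Tuning Trainer) from TRL library
# automatically:
# - Applies chat template to conversations
# - Computes the language-modeling loss (over the full sequence by default;
#   set assistant_only_loss=True in SFTConfig to train on responses only,
#   which needs a chat template that marks assistant spans)
# - Manages gradients and optimization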

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    processing_class=tokenizer,  # For apply_chat_template
)

print("Starting training...")
print(f"   Train examples: {len(dataset['train'])}")
print(f"   Eval examples: {len(dataset['test'])}")
print(f"   Estimated time: ~5 minutes on A100")
print("-" * 50)

# Train! This is the main training process
train_result = trainer.train()

print("\n" + "=" * 50)
print("Training complete!")
print(f"   Final loss: {train_result.training_loss:.4f}")

## 8. Save Model

**Files saved:**
- `model.safetensors` - model weights (~540MB)
- `config.json` - architecture configuration
- `tokenizer.json`, `tokenizer_config.json` - tokenizer
- `special_tokens_map.json` - special tokens

**Format:** SafeTensors (safe, no pickle)
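
Before moving on to conversion, it can help to confirm that the expected files were written. A minimal sketch, assuming the file names listed above and the local output directory used by this notebook; `missing_model_files` is a hypothetical helper, and the exact file set may vary with the `transformers` version.

```python
from pathlib import Path

# File names taken from the list above; the exact set may vary by library version.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]

def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    directory = Path(model_dir)
    return [name for name in expected if not (directory / name).is_file()]

print(missing_model_files("functiongemma-flutter-demo-final"))
```

An empty list means every expected file is present. Anything else names the files to investigate before running the TFLite conversion notebook.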

In [None]:
# =============================================================================
# Save the fine-tuned model to Google Drive
# =============================================================================
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

FINAL_MODEL_DIR = f"{OUTPUT_DIR}-final"
DRIVE_MODEL_DIR = f"/content/drive/MyDrive/{FINAL_MODEL_DIR}"

# Save model weights and config
trainer.save_model(FINAL_MODEL_DIR)

# Save tokenizer (needed for inference)
tokenizer.save_pretrained(FINAL_MODEL_DIR)

print(f"Model saved locally to {FINAL_MODEL_DIR}/")

# Copy to Google Drive
!cp -r {FINAL_MODEL_DIR} /content/drive/MyDrive/

print(f"\nModel copied to Google Drive: {DRIVE_MODEL_DIR}/")
print("You can now use this in the conversion notebook!")
!ls -la {DRIVE_MODEL_DIR}/

## 9. Test Fine-tuned Model

**Important:** We test on NEW prompts that were NOT in the training dataset!

We check:
- Does the model choose the correct function?
- Are the arguments correct?
- Does the model hallucinate functions or arguments that do not exist?
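
The first check can be scripted without depending on the exact output syntax. The sketch below simply looks for known tool names in the raw response text; `called_tools` and `is_correct_call` are hypothetical helpers, the sample strings are placeholders rather than real model output, and argument values still need to be inspected separately.

```python
TOOL_NAMES = ["change_background_color", "change_app_title", "show_alert"]

def called_tools(response, tool_names=TOOL_NAMES):
    """Return the known tool names that appear in the raw model response."""
    return [name for name in tool_names if name in response]

def is_correct_call(response, expected_tool, tool_names=TOOL_NAMES):
    """True when the expected tool is the only known tool mentioned."""
    return called_tools(response, tool_names) == [expected_tool]

# Placeholder strings, not real model output.
print(is_correct_call("... change_background_color ... red ...", "change_background_color"))
print(is_correct_call("... show_alert ...", "change_app_title"))
```

Pairing each test prompt with its expected tool name turns the manual inspection into a simple pass/fail count.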

In [None]:
# =============================================================================
# Test the fine-tuned model on new prompts
# =============================================================================
# These prompts were NOT in training data - we're testing generalization

test_prompts = [
    "make the background red",       # change_background_color
    "rename the app to Hello World", # change_app_title  
    "show an alert saying welcome",  # show_alert
    "I want a purple background",    # Variation for color
    "set title to My App",           # Variation for title
]

print("Testing fine-tuned model:")
print("=" * 60)

for prompt in test_prompts:
    # Create conversation in FunctionGemma format
    messages = [
        {"role": "developer", "content": DEFAULT_SYSTEM_MSG},
        {"role": "user", "content": prompt}
    ]
    
    # apply_chat_template renders messages into prompt text with the correct control tokens
    input_text = tokenizer.apply_chat_template(
        messages,
        tools=TOOLS,                    # Pass available functions
        tokenize=False,                 # Return text, not tokens
        add_generation_prompt=True      # Add marker for assistant response start
    )
    
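    # Tokenize and send to GPU. The rendered template is assumed to already
    # start with <bos>, so skip special tokens here to avoid a duplicate.
    inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)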
    
    # Generate response
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,  # Max tokens in response
        do_sample=False      # Greedy decoding (deterministic)
    )
    
    # Decode only new tokens (without prompt)
    response = tokenizer.decode(
        outputs[0][inputs['input_ids'].shape[1]:], 
        skip_special_tokens=False  # Keep special tokens for debugging
    )
    
    print(f"\nUser: {prompt}")
    print(f"Model: {response.strip()}")
    print("-" * 60)

## 10. Download Model

**Download ZIP archive** with all model files.

**Next steps after downloading:**
1. **Convert to TFLite** - using `ai-edge-torch`
2. **Bundle as .task** - using MediaPipe Model Bundler
3. **Integrate into Flutter** - put in assets or download from server

In [None]:
# =============================================================================
# Zip and download the model
# =============================================================================
!zip -r {FINAL_MODEL_DIR}.zip {FINAL_MODEL_DIR}/

from google.colab import files
files.download(f"{FINAL_MODEL_DIR}.zip")

print(f"\nDownload started: {FINAL_MODEL_DIR}.zip")
print("\nNext steps:")
print("1. Convert to TFLite using ai-edge-torch")
print("2. Bundle as .task using MediaPipe bundler")
print("3. Add to flutter_gemma example app")

## Optional: Push to HuggingFace Hub

In [None]:
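# Uncomment to push to HuggingFace Hub (requires a token with write access)
# Replace 'your-username' with your HuggingFace username

# HUB_MODEL_ID = "your-username/functiongemma-flutter-demo"
# trainer.model.push_to_hub(HUB_MODEL_ID)  # Trainer.push_to_hub() takes a commit message, not a repo id
# tokenizer.push_to_hub(HUB_MODEL_ID)
# print(f"✅ Model pushed to https://huggingface.co/{HUB_MODEL_ID}")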