# TFLite → .litertlm Conversion

Converts a fine-tuned FunctionGemma model to the `.litertlm` format for the LiteRT-LM runtime.

**Pipeline:**
1. [functiongemma_finetuning.ipynb](https://colab.research.google.com/github/DenisovAV/flutter_gemma/blob/main/colabs/functiongemma_finetuning.ipynb) - Fine-tune model ✅
2. **This notebook** - Convert to .litertlm for LiteRT-LM

**Requirements:**
- A100 GPU runtime
- Fine-tuned model on Google Drive (folder or ZIP)

**⚠️ CRITICAL Loading Parameters:**
- `torch_dtype=torch.bfloat16` (NOT float16!)
- `attn_implementation="eager"`
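
The notebook does not explain why float16 fails. One likely factor, stated here as an assumption rather than a confirmed cause, is dynamic range: float16 tops out at 65504, while bfloat16 keeps float32's 8-bit exponent and gives up mantissa precision instead. The sketch below illustrates the difference using only the standard library. `to_bfloat16` is a hypothetical helper that emulates bfloat16 by truncating a float32 to its upper 16 bits.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip x through IEEE half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping only the upper 16 bits of a float32.

    Real hardware usually rounds to nearest; truncation keeps the sketch short.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


value = 70000.0  # just above float16's largest finite value (65504)

try:
    print("float16 :", to_float16(value))
except OverflowError as err:
    # struct refuses to pack values that do not fit in half precision
    print("float16 : overflow ->", err)

# bfloat16 represents the magnitude, but only coarsely (7 mantissa bits)
print("bfloat16:", to_bfloat16(value))
```

The value survives in bfloat16 with reduced precision, whereas float16 cannot represent it at all.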

**Note:** This notebook relies on nightly builds, whose API may change. For stable production use on iOS/Web, use the `.task` format instead.

## Step 1: Install Dependencies

Install ai-edge-torch-nightly for model conversion to .litertlm format.

**Important:**
- We use nightly builds (API may change)
- numpy<2.1 is required for compatibility
- **RESTART RUNTIME** after this step!

In [None]:
# =============================================================================
# Step 1: Install ai-edge-torch-nightly
# =============================================================================
!pip uninstall -y tensorflow 2>/dev/null || true
!pip cache purge

# Install ai-edge-torch packages
!pip install ai-edge-torch-nightly --force-reinstall --no-cache-dir -q
!pip install ai-edge-litert-nightly --no-cache-dir -q

# CRITICAL: Install numpy<2.1 AFTER ai-edge-torch (it may override)
!pip install "numpy<2.1" --force-reinstall -q

# Install transformers with pinned version
!pip install transformers==4.57.3 huggingface_hub sentencepiece -q

# Restore Colab's native Pillow
!pip install Pillow --force-reinstall -q

print("\nInstalled:")
!pip show ai-edge-torch-nightly | grep Version
!pip show transformers | grep Version
!pip show numpy | grep Version
!pip show Pillow | grep Version

print("\n⚠️  RESTART RUNTIME after this step! (Runtime → Restart session)")
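
Once the runtime has restarted, a quick check can confirm that the pins survived the reinstall. The following is a minimal sketch using only `importlib.metadata`. The helper names `installed_version` and `satisfies_numpy_pin` are hypothetical, and the version parsing only looks at the major and minor components.

```python
import re
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def satisfies_numpy_pin(version: str) -> bool:
    """True if `version` is below 2.1, matching the notebook's numpy<2.1 pin."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (2, 1)


# Report the packages the install cell cares about
for name in ("ai-edge-torch-nightly", "transformers", "numpy"):
    print(f"{name}: {installed_version(name) or 'not installed'}")

numpy_version = installed_version("numpy")
if numpy_version is not None and not satisfies_numpy_pin(numpy_version):
    print(f"numpy {numpy_version} violates the numpy<2.1 pin; rerun the install cell.")
```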

## Step 2: Load Model from Google Drive

Load the fine-tuned model produced by the previous notebook, `functiongemma_finetuning.ipynb`. Either form works:
- **Model folder** - contains weights, config, tokenizer
- **ZIP archive** - the same folder, compressed

Upload it to Google Drive before running this cell.

In [None]:
# =============================================================================
# Step 2: Load fine-tuned model from Google Drive
# =============================================================================
from google.colab import drive
import os

drive.mount('/content/drive')

MODEL_NAME = "functiongemma-flutter-demo-final"
MODEL_DIR = MODEL_NAME
DRIVE_MODEL_DIR = f"/content/drive/MyDrive/{MODEL_NAME}"
DRIVE_ZIP = f"/content/drive/MyDrive/{MODEL_NAME}.zip"

if os.path.exists(DRIVE_MODEL_DIR):
    print(f"Found folder: {DRIVE_MODEL_DIR}")
    !cp -r "{DRIVE_MODEL_DIR}" .
elif os.path.exists(DRIVE_ZIP):
    print(f"Found ZIP: {DRIVE_ZIP}")
    !unzip -q "{DRIVE_ZIP}"
else:
    raise FileNotFoundError(f"Model not found!\nUpload to: {DRIVE_MODEL_DIR}/ or {DRIVE_ZIP}")

print(f"\nModel ready:")
!ls -la "{MODEL_DIR}/"
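
The folder-or-ZIP lookup can also be written without shell magics, which makes it easier to test outside Colab. This is a sketch under the same assumption the notebook makes, namely that the ZIP contains a top-level folder named after the model. `stage_model` is a hypothetical helper, not part of the original notebook.

```python
import shutil
import zipfile
from pathlib import Path


def stage_model(model_name: str, drive_root: str, dest_root: str = ".") -> Path:
    """Copy or unzip the model from `drive_root` into `dest_root`.

    Prefers the plain folder and falls back to `<model_name>.zip`.
    """
    drive = Path(drive_root)
    folder = drive / model_name
    archive = drive / f"{model_name}.zip"
    dest = Path(dest_root) / model_name

    if folder.is_dir():
        shutil.copytree(folder, dest, dirs_exist_ok=True)
    elif archive.is_file():
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest_root)
    else:
        raise FileNotFoundError(
            f"Model not found! Upload to: {folder}/ or {archive}"
        )

    # Assumption: the archive unpacks into a folder named after the model
    if not dest.is_dir():
        raise FileNotFoundError(f"Expected {dest} after staging; check the ZIP layout")
    return dest
```

In the notebook this could be called as `MODEL_DIR = str(stage_model(MODEL_NAME, "/content/drive/MyDrive"))` after the Drive mount.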

## Step 3: Test Model Before Conversion

**CRITICAL**: Verify that the model works BEFORE converting it to `.litertlm`.
If it outputs garbage here, the problem lies in the fine-tuned weights or the loading parameters, not in the conversion.
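
The test prompt used in this step can be assembled from a plain dictionary instead of being typed by hand. The sketch below is inferred from that single example prompt only: strings are wrapped in `<escape>` markers and dictionary keys are emitted in alphabetical order. `encode_value` and `build_prompt` are hypothetical helpers, and the model's own chat template remains the authoritative source for the format.

```python
def encode_value(value) -> str:
    """Serialize a declaration value in the style of the notebook's test prompt."""
    if isinstance(value, str):
        return f"<escape>{value}<escape>"
    if isinstance(value, dict):
        # Keys appear in alphabetical order in the example prompt
        items = ",".join(f"{key}:{encode_value(value[key])}" for key in sorted(value))
        return "{" + items + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(encode_value(item) for item in value) + "]"
    raise TypeError(f"Unsupported value type: {type(value).__name__}")


def build_prompt(name: str, schema: dict, user_message: str) -> str:
    """Build a developer turn with one function declaration, then a user turn."""
    declaration = (
        "<start_function_declaration>"
        f"declaration:{name}{encode_value(schema)}"
        "<end_function_declaration>"
    )
    return (
        "<start_of_turn>developer\n"
        "You are a model that can do function calling with the following functions\n"
        f"{declaration}\n"
        "<end_of_turn>\n"
        "<start_of_turn>user\n"
        f"{user_message}\n"
        "<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


schema = {
    "description": "Changes the app background color",
    "parameters": {
        "type": "OBJECT",
        "properties": {
            "color": {
                "type": "STRING",
                "description": "The color name (red, green, blue, yellow, purple, orange)",
            }
        },
        "required": ["color"],
    },
}

prompt = build_prompt("change_background_color", schema, "make it red")
print(prompt)
```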

In [None]:
# =============================================================================
# Step 3: Test model BEFORE conversion (using HuggingFace transformers)
# =============================================================================
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

print(f"Loading model from {MODEL_DIR} via HuggingFace transformers...")

# CRITICAL: Must use same parameters as training!
# - bfloat16 (NOT float16!)
# - attn_implementation="eager"
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.bfloat16,           # CRITICAL: same as training!
    device_map="auto",
    attn_implementation="eager"            # CRITICAL: same as training!
)
hf_model.eval()
print(f"HuggingFace model loaded on {hf_model.device}, dtype={hf_model.dtype}")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)

# FunctionGemma test prompt
test_prompt = """<start_of_turn>developer
You are a model that can do function calling with the following functions
<start_function_declaration>declaration:change_background_color{description:<escape>Changes the app background color<escape>,parameters:{properties:{color:{description:<escape>The color name (red, green, blue, yellow, purple, orange)<escape>,type:<escape>STRING<escape>}},required:[<escape>color<escape>],type:<escape>OBJECT<escape>}}<end_function_declaration>
<end_of_turn>
<start_of_turn>user
make it red
<end_of_turn>
<start_of_turn>model
"""

print("\n" + "=" * 50)
print("TESTING FINE-TUNED MODEL (HuggingFace)")
print("=" * 50)
print(f"Input: 'make it red'")

inputs = tokenizer(test_prompt, return_tensors="pt").to(hf_model.device)

with torch.no_grad():
    outputs = hf_model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(f"\nModel output:")
print(response)
print("=" * 50)

# Check if output looks valid
if "change_background_color" in response or "call:" in response:
    print("✅ Fine-tuned model outputs function call - GOOD!")
    print("   Proceeding with conversion...")
elif "<pad>" in response[:50]:
    print("❌ Model outputs <pad> - wrong loading parameters!")
    print("   Make sure: torch_dtype=bfloat16, attn_implementation='eager'")
    raise ValueError("STOP: Wrong model loading parameters")
elif "apologize" in response.lower() or "sorry" in response.lower():
    print("❌ Model refuses to call function - fine-tuning didn't work!")
    raise ValueError("STOP: Model not fine-tuned correctly")
elif any(c in response for c in "为足球收消气"):
    print("❌ Model outputs garbage - fine-tuning is broken!")
    raise ValueError("STOP: Model outputs garbage")
else:
    print("⚠️ Unexpected output - review manually")

# Clean up HF model to free memory before conversion
del hf_model
torch.cuda.empty_cache()
print("\nHuggingFace model unloaded, ready for ai-edge-torch conversion.")
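
The output checks in this step can be factored into a pure function, so they can be unit-tested without a GPU or a loaded model. This is a sketch that mirrors the order of the checks above. `classify_response` and its labels are hypothetical, and the sample strings are illustrative rather than real model output.

```python
# Sample CJK characters used as a rough indicator of garbage output
GARBAGE_CHARS = "为足球收消气"


def classify_response(response: str) -> str:
    """Label a decoded model response; the first matching check wins."""
    if "change_background_color" in response or "call:" in response:
        return "function_call"
    if "<pad>" in response[:50]:
        return "pad_tokens"
    lowered = response.lower()
    if "apologize" in lowered or "sorry" in lowered:
        return "refusal"
    if any(ch in response for ch in GARBAGE_CHARS):
        return "garbage"
    return "unexpected"


# Illustrative inputs only, not captured from a real run
samples = [
    "call:change_background_color{color:<escape>red<escape>}",
    "<pad><pad><pad>",
    "I'm sorry, I cannot help with that.",
    "hello there",
]
for text in samples:
    print(f"{classify_response(text):>13}  <-  {text!r}")
```

Only the `pad_tokens`, `refusal`, and `garbage` labels correspond to the cases where the notebook stops with a `ValueError`.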

## Step 4: Convert to .litertlm

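If the test above shows garbage output, **STOP HERE** and fix the model or its loading parameters first. If the test passed but the converted model still produces garbage, the likely cause is `gemma3.build_model_270m()` not loading the weights correctly.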

In [None]:
# =============================================================================
# Step 4: Convert to .litertlm format (using official Google parameters)
# Source: https://github.com/google-gemini/gemma-cookbook/blob/main/FunctionGemma/
# =============================================================================
from ai_edge_torch.generative.examples.gemma3 import gemma3
from ai_edge_torch.generative.utilities import converter
from ai_edge_torch.generative.utilities.export_config import ExportConfig
from ai_edge_torch.generative.layers import kv_cache

# Load model using ai-edge-torch (required for conversion)
print(f"Loading model from {MODEL_DIR} via ai-edge-torch...")
pytorch_model = gemma3.build_model_270m(MODEL_DIR)
pytorch_model.eval()
print("Model loaded!")

LITERTLM_OUTPUT_DIR = "litertlm_output"
os.makedirs(LITERTLM_OUTPUT_DIR, exist_ok=True)

export_config = ExportConfig()
export_config.kvcache_layout = kv_cache.KV_LAYOUT_TRANSPOSED
export_config.mask_as_input = True

# Find tokenizer
TOKENIZER_PATH = f"{MODEL_DIR}/tokenizer.model"
if not os.path.exists(TOKENIZER_PATH):
    from huggingface_hub import hf_hub_download
    TOKENIZER_PATH = hf_hub_download(
        repo_id="google/functiongemma-270m-it",
        filename="tokenizer.model"
    )
print(f"Tokenizer: {TOKENIZER_PATH}")

# =============================================================================
# Create FunctionGemma metadata (OFFICIAL Google format)
# Only 2 stop tokens as per official cookbook
# =============================================================================
METADATA_PATH = f"{LITERTLM_OUTPUT_DIR}/base_llm_metadata.textproto"

metadata_content = r"""start_token: {
    token_ids: {
        ids: [ 2 ]
    }
}
stop_tokens: {
    token_str: "<end_of_turn>"
}
stop_tokens: {
    token_str: "<start_function_response>"
}
llm_model_type: {
    function_gemma: {}
}
"""

with open(METADATA_PATH, 'w') as f:
    f.write(metadata_content)
print(f"Metadata created: {METADATA_PATH}")

print("\n" + "=" * 50)
print("Converting to .litertlm...")
print("Time: ~5-15 min (A100)")
print("=" * 50)

# Convert with OFFICIAL Google parameters
# Source: gemma-cookbook/FunctionGemma/Finetune_FunctionGemma_270M_for_Mobile_Actions
try:
    converter.convert_to_litert(
        pytorch_model,
        output_path=LITERTLM_OUTPUT_DIR,
        output_name_prefix="functiongemma-flutter",
        prefill_seq_len=256,           # Official: 256 (NOT 2048!)
        kv_cache_max_len=1024,         # Official: 1024 (NOT 4096!)
        quantize="dynamic_int8",
        export_config=export_config,
        output_format="litertlm",
        tokenizer_model_path=TOKENIZER_PATH,
        base_llm_metadata_path=METADATA_PATH,  # CRITICAL: base_llm_metadata_path, NOT llm_metadata_path!
    )
    print("\n.litertlm conversion complete!")
except (TypeError, AttributeError) as e:
    print(f"\nlitertlm not supported in this version: {e}")
    print("Falling back to .tflite...")
    converter.convert_to_tflite(
        pytorch_model,
        output_path=LITERTLM_OUTPUT_DIR,
        output_name_prefix="functiongemma-flutter",
        prefill_seq_len=256,
        kv_cache_max_len=1024,
        quantize="dynamic_int8",
        export_config=export_config,
    )
    print("\n.tflite conversion complete")

print("\nGenerated files:")
!ls -lah {LITERTLM_OUTPUT_DIR}/
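
The metadata textproto written in this step can be generated from a list of stop tokens instead of a hand-edited string, which helps when experimenting with different stop tokens. The sketch below reproduces the exact text used above. `build_metadata` is a hypothetical helper, and any field values other than those shown in the notebook are untested.

```python
def build_metadata(stop_tokens, start_token_id: int = 2) -> str:
    """Render the base LLM metadata textproto for a FunctionGemma bundle."""
    lines = [
        "start_token: {",
        "    token_ids: {",
        f"        ids: [ {start_token_id} ]",
        "    }",
        "}",
    ]
    # One stop_tokens block per token string, in the order given
    for token in stop_tokens:
        lines += ["stop_tokens: {", f'    token_str: "{token}"', "}"]
    lines += ["llm_model_type: {", "    function_gemma: {}", "}"]
    return "\n".join(lines) + "\n"


metadata_text = build_metadata(["<end_of_turn>", "<start_function_response>"])
print(metadata_text)
```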

## Step 5: Save and Download

Save the converted `.litertlm` file:
1. To Google Drive - for future use
2. Download locally - to use with the LiteRT-LM runtime

After downloading, you can use the model with:
- [CLI tool `lit`](https://github.com/google-ai-edge/LiteRT-LM/releases)
- Kotlin API for Android/JVM
- C++ API for native integration

In [None]:
# =============================================================================
# Step 5: Save to Google Drive and download
# =============================================================================
import glob
import shutil
from google.colab import files

# Find output files
output_files = glob.glob(f"{LITERTLM_OUTPUT_DIR}/*.litertlm")
if not output_files:
    output_files = glob.glob(f"{LITERTLM_OUTPUT_DIR}/*.tflite")

if not output_files:
    raise FileNotFoundError("No output files found!")

DRIVE_OUTPUT_DIR = "/content/drive/MyDrive/flutter_gemma_models"
os.makedirs(DRIVE_OUTPUT_DIR, exist_ok=True)

print("Saving to Google Drive:")
for f in output_files:
    size = os.path.getsize(f) / 1e6
    filename = os.path.basename(f)
    drive_path = f"{DRIVE_OUTPUT_DIR}/{filename}"
    shutil.copy(f, drive_path)
    print(f"  {filename} ({size:.1f} MB) -> {drive_path}")

print("\nDownloading:")
for f in output_files:
    files.download(f)

print("\n" + "=" * 50)
print("DONE!")
print("=" * 50)
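
As an optional extra that is not part of the original notebook, the copy can be verified with a checksum before it is relied upon. The sketch below keeps the same `.litertlm`-then-`.tflite` fallback and compares SHA-256 digests of source and destination. `find_outputs`, `sha256_of`, and `copy_verified` are hypothetical helper names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_outputs(output_dir):
    """Return .litertlm files, or .tflite files if no .litertlm exists."""
    out = Path(output_dir)
    return sorted(out.glob("*.litertlm")) or sorted(out.glob("*.tflite"))


def copy_verified(src, dest_dir) -> Path:
    """Copy `src` into `dest_dir` and fail loudly if the digests differ."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(src).name
    shutil.copy(src, dest)
    if sha256_of(src) != sha256_of(dest):
        raise IOError(f"Checksum mismatch after copying to {dest}")
    return dest
```

In the notebook, `copy_verified(f, DRIVE_OUTPUT_DIR)` could stand in for the `shutil.copy` call inside the save loop.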

## Optional: Upload to HuggingFace Hub

Upload the `.litertlm` file to HuggingFace for easy sharing and distribution.

**Setup:**
1. Create a new model repository on [huggingface.co](https://huggingface.co/new)
2. Add `HF_TOKEN` to Colab Secrets (🔑 icon in the left panel)
3. Change `HUB_REPO_ID` to your repository
4. Uncomment and run the code below
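
Step 3 is easy to forget, so a small guard can refuse to upload while the repository ID is still the template value. This is a simplified sketch: `check_repo_id` is a hypothetical helper, and its pattern is a loose approximation of `owner/name`, not the Hub's full naming rules.

```python
import re

PLACEHOLDER_OWNER = "your-username"  # owner part of the notebook's template value


def check_repo_id(repo_id: str) -> str:
    """Return `repo_id` if it looks like 'owner/name' and is not the placeholder."""
    if not re.fullmatch(r"[A-Za-z0-9._-]+/[A-Za-z0-9._-]+", repo_id):
        raise ValueError(f"Expected 'owner/name', got: {repo_id!r}")
    owner, _name = repo_id.split("/")
    if owner == PLACEHOLDER_OWNER:
        raise ValueError("HUB_REPO_ID still uses the placeholder owner; change it first")
    return repo_id


for candidate in ("alice/functiongemma-flutter-litertlm",
                  "your-username/functiongemma-flutter-litertlm"):
    try:
        print("ok:", check_repo_id(candidate))
    except ValueError as err:
        print("rejected:", err)
```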

In [None]:
# =============================================================================
# Optional: Upload to HuggingFace Hub
# =============================================================================
# Uncomment the code below to upload

# from huggingface_hub import login, HfApi
# from google.colab import userdata
# import glob
# import os
#
# # Login (uses token from Colab Secrets)
# HF_TOKEN = userdata.get('HF_TOKEN')
# login(token=HF_TOKEN)
#
# # Upload to HuggingFace
# HUB_REPO_ID = "your-username/functiongemma-flutter-litertlm"  # Change this!
#
# api = HfApi()
# api.create_repo(repo_id=HUB_REPO_ID, exist_ok=True)
#
# # Find output file
# output_files = glob.glob(f"{LITERTLM_OUTPUT_DIR}/*.litertlm")
# if not output_files:
#     output_files = glob.glob(f"{LITERTLM_OUTPUT_DIR}/*.tflite")
# OUTPUT_FILE = output_files[0]
# output_filename = os.path.basename(OUTPUT_FILE)
# output_size = os.path.getsize(OUTPUT_FILE) / 1e6
#
# # Create README with license information (required by Gemma Terms)
# README_CONTENT = f"""# FunctionGemma (.litertlm format)
#
# Converted version of [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it) for LiteRT-LM runtime.
#
# ## Usage
#
# This model is designed for the [flutter_gemma](https://github.com/DenisovAV/flutter_gemma) plugin and LiteRT-LM runtime.
#
# ## Modifications
#
# - Converted from SafeTensors to .litertlm format using ai-edge-torch-nightly
# - Quantized to int8 (dynamic quantization)
# - Bundled with tokenizer and FunctionGemma metadata
#
# ## Files
#
# - `{output_filename}` - LiteRT-LM bundle ({output_size:.0f} MB)
#
# ## License
#
# Gemma is provided under and subject to the Gemma Terms of Use found at https://ai.google.dev/gemma/terms
#
# ## Original Model
#
# - Source: [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)
# - License: [Gemma License](https://ai.google.dev/gemma/terms)
# - Prohibited Use Policy: [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy)
# """
#
# # Upload README
# api.upload_file(
#     path_or_fileobj=README_CONTENT.encode(),
#     path_in_repo="README.md",
#     repo_id=HUB_REPO_ID,
# )
# print("✅ Uploaded: README.md")
#
# # Upload .litertlm file
# api.upload_file(
#     path_or_fileobj=OUTPUT_FILE,
#     path_in_repo=output_filename,
#     repo_id=HUB_REPO_ID,
# )
# print(f"✅ Uploaded: {output_filename}")
#
# print(f"\n🎉 Model uploaded to: https://huggingface.co/{HUB_REPO_ID}")