On-device translation, transcription, and speech Q&A for ten Ugandan languages (Luganda, Acholi, Ateso, Lugbara, Runyankole, Lusoga, Lumasaba, Swahili, Kinyarwanda, English). Runs a fine-tuned Gemma 4 E2B (Sunbird AI) — and optionally a Whisper-based ASR cascade — under Cactus's ARM-optimized engine. Android-first; offline.
For the architecture, threading model, and Cactus FFI contract, see design-spec.md. For card-stack UI behaviour, see card_spec.md.
Requirements: flagship Android device (Pixel 7+ / S22+), Flutter 3.11+, Android NDK r27.x, USB debugging.
```shell
git clone git@github.com:SunbirdAI/sunflower-app.git
cd sunflower-app
flutter pub get
flutter run
```
The model isn't bundled (~4.7 GB INT4). On first launch the app shows "Get the model": open the gear icon and tap Download. The tarball streams from HuggingFace into the app's private documents directory.
For dev pushes from Mac (faster than the in-app download):
```shell
hf download ak3ra/sunflower-qa-cactus-int4 model.tar.gz --local-dir /tmp
mkdir -p /tmp/sunflower_model
tar -xzf /tmp/model.tar.gz -C /tmp/sunflower_model
# Quote the whole remote command: adb joins its arguments with spaces, so an
# unquoted sh -c '...' would reach the device as `sh -c mkdir -p ...`.
adb shell "run-as ai.sunbird.sunflower_app sh -c 'mkdir -p app_flutter/model'"
tar -cf - -C /tmp/sunflower_model . | \
  adb shell "run-as ai.sunbird.sunflower_app sh -c \
    'cd app_flutter/model && tar -xf - && touch .copy_complete'"
```
Substitute any other tarball from the Available models table below. The default Speech-Q&A bundle is shown here, but the on-disk layout is identical across all flat bundles.
Cactus's native mmap only works on files under getApplicationDocumentsDirectory(). On /sdcard (FUSE) it silently returns NULL.
The settings sheet ships nine downloadable variants across three architectural families. Pick by phone RAM, language coverage, latency budget, and what you want the audio path to do:
| Family | Quant | Size | Tarball | Notes |
|---|---|---|---|---|
| Sunflower Speech Q&A | INT4 | ~3.8 GB | sunflower-qa-cactus-int4 | Default. Gemma 4 E2B fine-tuned with a speech-QA task on 7 langs (eng, lug, xog, nyn, nyo, ach, teo). On-distribution for voice questions; weaker as a pure transcriber (Lug WER ~0.53). Pinned system prompt + blank Answer-mode user content. |
| Sunflower Speech Q&A (Luganda) | INT4 | ~3.8 GB | sunflower-qa-lug-cactus-int4 | Pending: jq's source checkpoint upload was all-zero on 2026-05-13. Catalogue row kept; awaiting a valid re-export to swap the tarball. Once a real checkpoint lands, this will become the Luganda-first default. |
| Sunflower Multilingual | INT4 | ~4.4 GB | sunflower-uga-cactus-int4 | Gemma 4 E2B trained on 10 Ugandan languages. Vision stripped. Audio Transcribe + Answer. Broadest language coverage. |
| Sunflower Multilingual | INT8 | ~5.2 GB | sunflower-uga-cactus-int8 | Higher quality. Needs 12 GB+ RAM. |
| Sunflower (Lug/Ach specialist) | INT4 | ~4.7 GB | sunflower-cactus-int4 | E2B trained narrowly on Lug + Ach + En. Strongest on those two. |
| Sunflower (Lug/Ach specialist) | INT8 | ~5.5 GB | sunflower-cactus-int8 | Higher quality specialist. Needs 12 GB+ RAM. |
| Sunflower E4B | INT4 | ~5.5 GB | sunflower-e4b-cactus-int4 | Bigger model · stronger Lug/Ach · 30 latent langs · ~4 tok/s on Pixel 10. Vision stripped. |
| Sunflower E4B | INT8 | ~7.3 GB | sunflower-e4b-cactus-int8 | Highest quality. Needs 12 GB+ RAM. Vision stripped. |

| Family | Quant | Size | Tarball | Notes |
|---|---|---|---|---|
| Sunflower Stitched (experimental) | INT4 | ~3.7 GB | sunflower-stitched-cactus-int4 | Off-the-shelf Whisper-tiny multilingual (English-biased) → Gemma 4 E2B text-only. ~2-3 s voice TTFT on Pixel 10. Smoke-test bundle for the stitched architecture; ship a Luganda-fine-tuned Whisper variant in this slot for production. |
Flat bundles ship with the vision tower stripped (scripts/convert_and_push.sh <src> <dst> INT4 strip-vision). The stitched bundle ships with both the vision and audio towers stripped from Gemma (Whisper handles ASR) via scripts/export_stitched.sh; see that script for the dual-convert + tar layout (whisper/ + gemma/ subdirs).
Each flat-bundle entry in lib/model_settings_sheet.dart (knownModels) declares an audioModes list — currently ['answer', 'transcribe']. When a voice turn fires on a flat bundle, the app uses the verbatim SFT-trained user prompt for the selected mode:
| Mode | Default user prompt sent to Gemma |
|---|---|
| Answer | Please answer this spoken question. |
| Transcribe | Please transcribe this {source_language} audio. |
Defaults are paired with the trained system message "You are an assistant that transcribes speech and translates Ugandan languages." Mode and source language are user-selectable per turn via the two pills above the input bar.
Bundles can override either side. KnownModel.systemPrompt swaps the system message; KnownModel.audioPromptOverrides is a {mode: user_content} map that replaces the user prompt verbatim (the empty string is honoured). The Speech Q&A bundle uses both: its retrained SFT corpus saw "You are an educational assistant that can give explanations, transcriptions and translations in Ugandan languages." as the system message, and a blank user prompt in Answer mode (the audio is the whole prompt).
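The override rule can be sketched in a few lines of shell. This is only an illustration of the semantics described above, not the app's code (the real logic lives in the Dart sources); the function names `default_user_prompt` and `resolve_user_prompt` are hypothetical. The point it shows is that an override that is present but empty still wins over the default.

```shell
# Hypothetical helper names; prompt strings are taken from the table above.

# default_user_prompt MODE SOURCE_LANGUAGE
# Prints the default SFT user prompt for a mode.
default_user_prompt() {
  case "$1" in
    answer)     printf '%s' 'Please answer this spoken question.' ;;
    transcribe) printf 'Please transcribe this %s audio.' "$2" ;;
    *)          return 1 ;;  # unknown mode
  esac
}

# resolve_user_prompt MODE SOURCE_LANGUAGE [OVERRIDE]
# A third argument replaces the default verbatim, even when it is the empty
# string. Only a missing third argument falls back to the default.
resolve_user_prompt() {
  if [ "$#" -ge 3 ]; then
    printf '%s' "$3"
  else
    default_user_prompt "$1" "$2"
  fi
}
```

Calling `resolve_user_prompt answer Luganda ''` prints nothing, which mirrors the Speech Q&A bundle's blank Answer-mode prompt, while `resolve_user_prompt transcribe Luganda` falls through to the default template.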
The stitched bundle's audioModes is intentionally empty — the cascade has its own transcript-review intent picker (Translate / Explain) that surfaces once Whisper finishes.
Add a new variant by appending an entry to knownModels in lib/model_settings_sheet.dart.
```
lib/
├── main.dart                       page state + UI wiring
├── cactus.dart                     raw FFI shim (post-v1.14 main; DO NOT EDIT)
├── classroom_prompt.dart           runtime + trained system prompts
├── audio_trim.dart                 RIFF/PCM slice for VAD-trimmed WAVs
├── languages.dart                  language registry, prompt strings
├── model_manager.dart              download + install + delete
├── model_settings_sheet.dart       gear-icon bottom sheet, knownModels
├── completion/
│   ├── cactus_runner.dart          worker isolate, streaming, stop, forceReset
│   └── download_foreground.dart    Android foreground-service for downloads
├── conversation/
│   ├── cards.dart                  ConversationCard, chipsFor, types
│   ├── labels.dart                 chip + header strings
│   ├── state.dart                  ChangeNotifier card stack
│   └── instruction_detection.dart  trained-prompt prefix detection
├── theme/sunflower_tokens.dart     M3 ColorScheme, textTheme, markdown style
└── widgets/                        slot_card, card_stack, chip_row,
                                    timeline_scrubber, morphing_input_bar
scripts/
├── convert_and_push.sh             cactus convert + optional vision strip + HF upload
├── strip_modality.py               stand-alone vision / audio tower stripper
└── export_stitched.sh              Whisper + Gemma dual-convert for the stitched bundle
test/                               chips, conversation_state, classroom_prompt
android/app/src/main/jniLibs/arm64-v8a/libcactus.so
```
One entry in lib/languages.dart:
```dart
const ankole = Language(
  code: 'nyn', name: 'Runyankole', abbreviation: 'Nyn', hasASRTraining: false,
);
```
Append to supportedLanguages. The name field is interpolated into the SFT prompt template, so it must match the form the corpus saw. hasASRTraining flips on only when the active bundle's checkpoint actually trained on that language's audio (the Multilingual and Speech-Q&A bundles each cover a different subset; the chip should reflect what the user-selected bundle supports).
A "Cactus model file" here is the directory of ~1959 .weights files plus tokenizer metadata that cactus convert produces from a HuggingFace fine-tune. Tarball that directory and the app downloads it on first launch.
Prereqs: Linux box with ~16 GB RAM and ~30 GB disk free, OR Mac with 32+ GB RAM. Cactus core at or after v1.14 — anything older won't read Gemma 4's audio_tower. The shipped libcactus.so is built from a post-v1.14 main commit that includes the fused INT4 MLP op; converted models must come from a compatible cactus build.
The fastest path is the bundled wrapper, which does convert + optional vision strip + tarball + HF upload in one shot:
```shell
./scripts/convert_and_push.sh <hf_source_repo> <hf_target_repo> [INT4|INT8] [strip-vision]
# example:
./scripts/convert_and_push.sh \
  jq/gemma-4-e2b-questionanswering-eng-lug-xog-nyn-nyo-ach-teo \
  ak3ra/sunflower-qa-cactus-int4 \
  INT4 \
  strip-vision
```
Run by hand if you need the intermediate artifacts or want to deviate from the recipe:
```shell
git clone https://github.com/cactus-compute/cactus.git
cd cactus                          # stay on main, or `git checkout <tag>` once a post-v1.14 release is cut
source ./setup                     # creates venv, installs python tools
cactus convert <hf_source_repo> ./out --precision INT4
```
Sanity-check the convert log (do not ship a model that fails these):
```
Normalized gemma4 audio tower key naming for conversion   ← critical for audio
Warning: Unsaved tensors: set()                            ← must be empty
CosSim - Mean: 0.999932 ...                                ← INT8 ≈ 0.9999, INT4 ≈ 0.994
```
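If the convert output has been saved to a file, the three checks above could be automated along these lines. This is a sketch under assumptions: it matches only the three log lines exactly as quoted above, the function name `check_convert_log` is hypothetical, and the CosSim thresholds (0.999 for INT8, 0.99 for INT4) are illustrative values set a little below the typical figures quoted, not limits published by Cactus.

```shell
# check_convert_log LOG_FILE PRECISION
# Returns 0 only if all three markers look healthy.
# Thresholds are assumptions; adjust them to taste.
check_convert_log() {
  log="$1"
  case "$2" in
    INT8) min_cossim=0.999 ;;
    INT4) min_cossim=0.99 ;;
    *) echo "unknown precision: $2" >&2; return 2 ;;
  esac

  # 1. The audio tower keys must have been normalized.
  grep -qF 'Normalized gemma4 audio tower key naming' "$log" || {
    echo 'audio tower normalization line missing' >&2; return 1; }

  # 2. The unsaved-tensor set must be printed as empty.
  grep -qF 'Unsaved tensors: set()' "$log" || {
    echo 'unsaved tensors reported, or line missing' >&2; return 1; }

  # 3. The mean cosine similarity must clear the threshold.
  awk -v min="$min_cossim" '
    /CosSim - Mean:/ {
      for (i = 1; i <= NF; i++) if ($i == "Mean:") mean = $(i + 1)
      found = 1
    }
    END { exit !(found && mean + 0 >= min + 0) }
  ' "$log" || { echo 'CosSim mean missing or below threshold' >&2; return 1; }
}
```

A typical call would be `check_convert_log convert.log INT4 && echo ok`, where `convert.log` is whatever file the convert output was redirected to.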
Tarball + upload to HF (~1959 files trips per-file rate limits, hence the tar):
```shell
tar -czf model.tar.gz -C ./out .
hf repo create <org>/<name>-cactus-int4 --repo-type model
hf upload <org>/<name>-cactus-int4 model.tar.gz model.tar.gz
```
Smoke-test on Mac before going to the phone; if it's gibberish here, it's gibberish on device:
```shell
mkdir -p /tmp/test_model && tar -xzf model.tar.gz -C /tmp/test_model
cactus run /tmp/test_model
> Translate to Luganda: Photosynthesis is how plants make food.
```
Then add (or update) the bundle in the catalogue: edit knownModels in lib/model_settings_sheet.dart with a new KnownModel(slug: …, url: …, audioModes: […], systemPrompt: …, audioPromptOverrides: {…}). See the existing entries for the field semantics and Per-bundle prompt routing above for which prompts to pin.
The chain is version-locked end to end: libcactus.so ↔ cactus.dart ↔ converted model. All three must come from the same Cactus tag (or, for a post-v1.14 main build like the shipped one, the same commit).
```shell
cd /path/to/cactus && git checkout vX.Y && cactus build --android
cp android/libcactus.so /path/to/sunflower-app/android/app/src/main/jniLibs/arm64-v8a/libcactus.so
cp flutter/cactus.dart /path/to/sunflower-app/lib/cactus.dart
# then re-convert the model with the same vX.Y and re-upload to HF
```

```shell
flutter build apk --release
# build/app/outputs/flutter-apk/app-release.apk
```
The APK is signed with the Flutter debug key by default. Set up release signing in android/app/build.gradle.kts before distributing.
```shell
flutter test
flutter analyze --no-pub   # cactus.dart info-level lints are upstream; ignore
```

| Path | Decode | TTFT | Notes |
|---|---|---|---|
| Text, first turn (prefix-cache prime) | ~8.8 tok/s | ~2 s | cactusPrefill at init seeds the system-prompt KV; first user turn prefills only the user-content delta (~20 tokens vs ~310). |
| Text, subsequent turns | ~8.8 tok/s | ~3-4 s | Full re-prefill of system + user content. |
| Audio (flat, audio_tower) | ~3.5-5 tok/s | ~20-25 s for a 5-7 s clip | Encoder is linear in clip length and stays FP16 in INT quants — no software lever. 15 s recording cap + VAD trim shrinks the input. |
| Audio (stitched Whisper-tiny → text) | ~8.5 tok/s | ~2-3 s for a 5 s clip | Whisper-tiny encode ~1 s, Gemma text TTFT ~700 ms on the trained-prompt path. ~6-8× felt-speed improvement over the flat audio path. |
E4B INT4 decodes at ~4 tok/s on the same device — memory-bandwidth bound, not a fixable bug. Use it when output quality matters more than latency.
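To get a feel for what these numbers mean for a whole reply, a rough estimate is TTFT plus output tokens divided by decode rate. The helper below is a back-of-the-envelope sketch: the function name `response_seconds` is hypothetical, the inputs are midpoints of the ranges in the table above, and the 60-token answer length is an arbitrary illustration rather than a measured figure.

```shell
# response_seconds TTFT_SECONDS DECODE_TOK_PER_S OUTPUT_TOKENS
# Prints TTFT + tokens / decode rate, rounded to one decimal place.
response_seconds() {
  awk -v ttft="$1" -v rate="$2" -v n="$3" \
    'BEGIN { printf "%.1f\n", ttft + n / rate }'
}

# Midpoints of the table ranges; 60 output tokens is an assumed answer length.
response_seconds 22.5 4.25 60   # flat audio path          -> prints 36.6
response_seconds 2.5  8.5  60   # stitched Whisper-tiny path -> prints 9.6
```

Under these assumptions the flat audio path spends most of its time before the first token, so trimming the input clip matters more there than decode speed does.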
The audio paths benefit further from on-device VAD trim, end-pointing, and a no-speech-detected reject; the recording cap, cap-with-end-pointing combination, and per-bundle audio mode picker are all in lib/main.dart.
| Component | Version | Why |
|---|---|---|
| Cactus core | post-v1.14 main | The shipped libcactus.so is built from a commit after v1.14 that includes the fused INT4 MLP op (~22–24% decode speedup on Gemma 4 E2B). |
| Android NDK | r27.x | What we tested |
| Flutter | 3.11+ | Dart 3.0 minimum |
| transformers | 5.5.0 | Pinned by Cactus's converter |
Do not use cactus-flutter plugin v1.3 — it bundles a pre-Gemma-4 engine and can't read v1.14 models.
Built for the Kaggle Gemma 4 Good Hackathon. Use case: translanguage pedagogy for primary-school science in Ugandan classrooms with no internet. Training pipeline: SunbirdAI/sunflower.
- Sunflower training pipeline: SunbirdAI/sunflower
- Cactus engine: cactus-compute/cactus
- Base model: google/gemma-4-E2B-it
- Speech-Q&A fine-tune: jq/gemma-4-e2b-questionanswering-eng-lug-xog-nyn-nyo-ach-teo
- Fine-tune data: Sunbird/ug40-instructions