Sunflower

On-device translation, transcription, and speech Q&A for ten Ugandan languages (Luganda, Acholi, Ateso, Lugbara, Runyankole, Lusoga, Lumasaba, Swahili, Kinyarwanda, English). Runs a fine-tuned Gemma 4 E2B (Sunbird AI) — and optionally a Whisper-based ASR cascade — under Cactus's ARM-optimized engine. Android-first; offline.

For the architecture, threading model, and Cactus FFI contract, see design-spec.md. For card-stack UI behaviour, see card_spec.md.

Run

Requirements: flagship Android device (Pixel 7+ / S22+), Flutter 3.11+, Android NDK r27.x, USB debugging.

git clone git@github.com:SunbirdAI/sunflower-app.git
cd sunflower-app
flutter pub get
flutter run

The model isn't bundled (the default INT4 bundle is ~3.8 GB; sizes for the other variants are listed under Available models). On first launch the app shows "Get the model": open the gear icon and tap Download. The tarball streams from HuggingFace into the app's private documents directory.

For dev pushes from Mac (faster than the in-app download):

hf download ak3ra/sunflower-qa-cactus-int4 model.tar.gz --local-dir /tmp
mkdir -p /tmp/sunflower_model
tar -xzf /tmp/model.tar.gz -C /tmp/sunflower_model
adb shell "run-as ai.sunbird.sunflower_app sh -c 'mkdir -p app_flutter/model'"
tar -cf - -C /tmp/sunflower_model . | \
  adb shell "run-as ai.sunbird.sunflower_app sh -c \
    'cd app_flutter/model && tar -xf - && touch .copy_complete'"

Substitute any other tarball from the Available models table below — the default Speech-Q&A bundle is shown here, but the on-disk layout is identical across all flat bundles.
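To confirm that a dev push landed, one option is to look for the .copy_complete marker that the commands above create. This is a minimal sketch, assuming a reasonably recent adb that passes the device-side exit status back to the host; the function name model_pushed and the ADB override variable are illustrative and not part of the repo.

```shell
# Hypothetical helper: succeeds if the marker written by the push exists.
# ADB may be overridden (for example to point at another adb binary);
# it defaults to plain `adb`.
PKG="ai.sunbird.sunflower_app"

model_pushed() {
  # Double quotes keep the whole command as one string for the device shell.
  ${ADB:-adb} shell "run-as $PKG sh -c 'test -f app_flutter/model/.copy_complete'"
}
```

A typical call would be `model_pushed && echo "model in place" || echo "marker missing, re-run the push"`.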

Native Cactus mmap only works from getApplicationDocumentsDirectory(); on /sdcard (FUSE) it silently returns NULL.

Available models

The settings sheet ships nine downloadable variants across three architectural families. Pick by phone RAM, language coverage, latency budget, and what you want the audio path to do:

Flat-bundle Gemma 4 (audio goes directly through Gemma's audio_tower)

| Family | Quant | Size | Tarball | Notes |
| --- | --- | --- | --- | --- |
| Sunflower Speech Q&A | INT4 | ~3.8 GB | sunflower-qa-cactus-int4 | Default. Gemma 4 E2B fine-tuned with a speech-QA task on 7 langs (eng, lug, xog, nyn, nyo, ach, teo). On-distribution for voice questions; weaker as a pure transcriber (Lug WER ~0.53). Pinned system prompt + blank Answer-mode user content. |
| Sunflower Speech Q&A (Luganda) | INT4 | ~3.8 GB | sunflower-qa-lug-cactus-int4 | Pending: jq's source checkpoint upload was all-zero on 2026-05-13. Catalogue row kept; awaiting a valid re-export to swap the tarball. Once a real checkpoint lands, this will become the Luganda-first default. |
| Sunflower Multilingual | INT4 | ~4.4 GB | sunflower-uga-cactus-int4 | Gemma 4 E2B trained on 10 Ugandan languages. Vision stripped. Audio Transcribe + Answer. Broadest language coverage. |
| Sunflower Multilingual | INT8 | ~5.2 GB | sunflower-uga-cactus-int8 | Higher quality. Needs 12 GB+ RAM. |
| Sunflower (Lug/Ach specialist) | INT4 | ~4.7 GB | sunflower-cactus-int4 | E2B trained narrowly on Lug + Ach + En. Strongest on those two. |
| Sunflower (Lug/Ach specialist) | INT8 | ~5.5 GB | sunflower-cactus-int8 | Higher quality specialist. Needs 12 GB+ RAM. |
| Sunflower E4B | INT4 | ~5.5 GB | sunflower-e4b-cactus-int4 | Bigger model · stronger Lug/Ach · 30 latent langs · ~4 tok/s on Pixel 10. Vision stripped. |
| Sunflower E4B | INT8 | ~7.3 GB | sunflower-e4b-cactus-int8 | Highest quality. Needs 12 GB+ RAM. Vision stripped. |

Stitched cascade (Whisper ASR → Gemma text)

| Family | Quant | Size | Tarball | Notes |
| --- | --- | --- | --- | --- |
| Sunflower Stitched (experimental) | INT4 | ~3.7 GB | sunflower-stitched-cactus-int4 | Off-the-shelf Whisper-tiny multilingual (English-biased) → Gemma 4 E2B text-only. ~2-3 s voice TTFT on Pixel 10. Smoke-test bundle for the stitched architecture; ship a Luganda-fine-tuned Whisper variant in this slot for production. |

Flat bundles ship with vision tower stripped (scripts/convert_and_push.sh <src> <dst> INT4 strip-vision). The stitched bundle ships with both vision and audio towers stripped from Gemma (Whisper handles ASR) via scripts/export_stitched.sh — see that script for the dual-convert + tar layout (whisper/ + gemma/ subdirs).

Per-bundle prompt routing

Each flat-bundle entry in lib/model_settings_sheet.dart (knownModels) declares an audioModes list — currently ['answer', 'transcribe']. When a voice turn fires on a flat bundle, the app uses the verbatim SFT-trained user prompt for the selected mode:

| Mode | Default user prompt sent to Gemma |
| --- | --- |
| Answer | Please answer this spoken question. |
| Transcribe | Please transcribe this {source_language} audio. |

Defaults are paired with the trained system message "You are an assistant that transcribes speech and translates Ugandan languages." Mode and source language are user-selectable per turn via the two pills above the input bar.

Bundles can override either side. KnownModel.systemPrompt swaps the system message; KnownModel.audioPromptOverrides is a {mode: user_content} map that replaces the user prompt verbatim (the empty string is honoured). The Speech Q&A bundle uses both: its retrained SFT corpus saw "You are an educational assistant that can give explanations, transcriptions and translations in Ugandan languages." as the system message, and a blank user prompt in Answer mode (the audio is the whole prompt).
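The precedence described above (a per-mode override wins, even when it is the empty string, otherwise the default prompt is used) can be illustrated outside the app. The sketch below is shell only for consistency with the other commands in this README; the real lookup lives in the Dart code, and the function name resolve_user_prompt and its argument order are hypothetical.

```shell
# Illustrative only: the app implements this lookup in Dart.
# Usage: resolve_user_prompt MODE LANGUAGE_NAME [OVERRIDE]
# Passing a third argument, even "", counts as an override for that mode.
resolve_user_prompt() {
  mode="$1"; language="$2"
  if [ "$#" -ge 3 ]; then
    # Override present: use it verbatim, empty string included.
    printf '%s' "$3"
    return 0
  fi
  case "$mode" in
    answer)     printf '%s' "Please answer this spoken question." ;;
    transcribe) printf '%s' "Please transcribe this $language audio." ;;
    *)          echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For example, `resolve_user_prompt transcribe Luganda` yields the default Transcribe prompt with the language name filled in, while `resolve_user_prompt answer Luganda ""` yields an empty prompt, which mirrors the Speech Q&A bundle's blank Answer-mode user content.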

The stitched bundle's audioModes is intentionally empty — the cascade has its own transcript-review intent picker (Translate / Explain) that surfaces once Whisper finishes.

Add a new variant by appending an entry to knownModels in lib/model_settings_sheet.dart.

Project layout

lib/
├── main.dart                          page state + UI wiring
├── cactus.dart                        raw FFI shim (post-v1.14 main; DO NOT EDIT)
├── classroom_prompt.dart              runtime + trained system prompts
├── audio_trim.dart                    RIFF/PCM slice for VAD-trimmed WAVs
├── languages.dart                     language registry, prompt strings
├── model_manager.dart                 download + install + delete
├── model_settings_sheet.dart          gear-icon bottom sheet, knownModels
├── completion/
│   ├── cactus_runner.dart             worker isolate, streaming, stop, forceReset
│   └── download_foreground.dart       Android foreground-service for downloads
├── conversation/
│   ├── cards.dart                     ConversationCard, chipsFor, types
│   ├── labels.dart                    chip + header strings
│   ├── state.dart                     ChangeNotifier card stack
│   └── instruction_detection.dart     trained-prompt prefix detection
├── theme/sunflower_tokens.dart        M3 ColorScheme, textTheme, markdown style
└── widgets/                           slot_card, card_stack, chip_row,
                                       timeline_scrubber, morphing_input_bar
scripts/
├── convert_and_push.sh                cactus convert + optional vision strip + HF upload
├── strip_modality.py                  stand-alone vision / audio tower stripper
└── export_stitched.sh                 Whisper + Gemma dual-convert for the stitched bundle
test/                                  chips, conversation_state, classroom_prompt
android/app/src/main/jniLibs/arm64-v8a/libcactus.so

How to

Add a translation language

One entry in lib/languages.dart:

const ankole = Language(
  code: 'nyn', name: 'Runyankole', abbreviation: 'Nyn', hasASRTraining: false,
);

Append to supportedLanguages. The name field is interpolated into the SFT prompt template — it must match the form the corpus saw. hasASRTraining flips on only when the active bundle's checkpoint actually trained on that language's audio (the Multilingual and Speech-Q&A bundles each cover a different subset; the chip should reflect what the user-selected bundle supports).

Create a Cactus model file

A "Cactus model file" here is the directory of ~1959 .weights files plus tokenizer metadata that cactus convert produces from a HuggingFace fine-tune. Tarball that directory and the app downloads it on first launch.

Prereqs: Linux box with ~16 GB RAM and ~30 GB disk free, OR Mac with 32+ GB RAM. Cactus core at or after v1.14 — anything older won't read Gemma 4's audio_tower. The shipped libcactus.so is built from a post-v1.14 main commit that includes the fused INT4 MLP op; converted models must come from a compatible cactus build.

The fastest path is the bundled wrapper, which does convert + optional vision strip + tarball + HF upload in one shot:

./scripts/convert_and_push.sh <hf_source_repo> <hf_target_repo> [INT4|INT8] [strip-vision]
# example:
./scripts/convert_and_push.sh \
  jq/gemma-4-e2b-questionanswering-eng-lug-xog-nyn-nyo-ach-teo \
  ak3ra/sunflower-qa-cactus-int4 \
  INT4 \
  strip-vision

Run by hand if you need the intermediate artifacts or want to deviate from the recipe:

git clone https://github.com/cactus-compute/cactus.git
cd cactus      # stay on main, or `git checkout <tag>` once a post-v1.14 release is cut
source ./setup           # creates venv, installs python tools

cactus convert <hf_source_repo> ./out --precision INT4

Sanity-check the convert log (do not ship a model that fails these):

Normalized gemma4 audio tower key naming for conversion       ← critical for audio
Warning: Unsaved tensors: set()                               ← must be empty
CosSim - Mean: 0.999932 ...                                   ← INT8 ≈ 0.9999, INT4 ≈ 0.994
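If the convert output was saved to a file (for instance by piping it through tee), the three checks above can be scripted. This is a rough sketch: the function name check_convert_log and the default 0.99 threshold are assumptions chosen to sit just below the INT4 figure quoted above, and the patterns match only the log lines shown here.

```shell
# Hypothetical helper. Usage: check_convert_log LOGFILE [MIN_COSSIM]
# Returns 0 only if all three sanity lines look healthy.
check_convert_log() {
  local log="$1" min_cos="${2:-0.99}" ok=0 mean

  # 1. Audio-tower key normalization must have run.
  grep -q 'Normalized gemma4 audio tower key naming' "$log" \
    || { echo "FAIL: audio tower keys not normalized"; ok=1; }

  # 2. The unsaved-tensor set must be empty.
  grep -qF 'Unsaved tensors: set()' "$log" \
    || { echo "FAIL: unsaved tensors present (or line missing)"; ok=1; }

  # 3. Mean cosine similarity must meet the threshold.
  mean=$(sed -n 's/.*CosSim - Mean: \([0-9.]*\).*/\1/p' "$log" | head -n 1)
  if [ -z "$mean" ]; then
    echo "FAIL: no CosSim line found"; ok=1
  elif ! awk -v m="$mean" -v t="$min_cos" 'BEGIN { exit !(m >= t) }'; then
    echo "FAIL: CosSim mean $mean below $min_cos"; ok=1
  fi

  [ "$ok" -eq 0 ] && echo "OK: convert log passes sanity checks"
  return "$ok"
}
```

For an INT8 convert, a stricter call such as `check_convert_log convert.log 0.999` may be more appropriate; convert.log is a placeholder file name.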

Tarball + upload to HF (~1959 files trips per-file rate limits, hence the tar):

tar -czf model.tar.gz -C ./out .
hf repo create <org>/<name>-cactus-int4 --repo-type model
hf upload <org>/<name>-cactus-int4 model.tar.gz model.tar.gz
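Before uploading, it can be worth confirming that the tarball holds roughly the expected number of weight files. A small sketch, assuming nothing beyond standard tar and grep; the helper name count_weights is hypothetical.

```shell
# Hypothetical helper: print how many .weights entries a gzipped tarball holds.
# `|| true` keeps a zero count from being treated as a failure under `set -e`.
count_weights() {
  tar -tzf "$1" | grep -c '\.weights$' || true
}
```

Running `count_weights model.tar.gz` should print a number close to the ~1959 mentioned above; a count of 0 or a much smaller number suggests the wrong directory was archived.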

Smoke-test on Mac before going to phone — if it's gibberish here it's gibberish on device:

mkdir -p /tmp/test_model && tar -xzf model.tar.gz -C /tmp/test_model
cactus run /tmp/test_model
> Translate to Luganda: Photosynthesis is how plants make food.

Then add (or update) the bundle in the catalogue: edit knownModels in lib/model_settings_sheet.dart with a new KnownModel(slug: …, url: …, audioModes: […], systemPrompt: …, audioPromptOverrides: {…}). See the existing entries for the field semantics and Per-bundle prompt routing above for which prompts to pin.

Bump Cactus core to a newer version

The chain is version-locked end to end: libcactus.so ↔ cactus.dart ↔ converted model. All three must come from the same Cactus tag.

cd /path/to/cactus && git checkout vX.Y && cactus build --android
cp android/libcactus.so      /path/to/sunflower-app/android/app/src/main/jniLibs/arm64-v8a/libcactus.so
cp flutter/cactus.dart       /path/to/sunflower-app/lib/cactus.dart
# then re-convert the model with the same vX.Y and re-upload to HF
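After the copy, a byte-for-byte comparison can catch the case where only one of the two files was updated. This sketch uses the same source and destination paths as the commands above; the function name cactus_in_sync is hypothetical, and it cannot tell whether the converted model matches, so the re-convert step still applies.

```shell
# Hypothetical check: confirm the app carries the same binary and FFI shim
# that the Cactus checkout produced.
# Usage: cactus_in_sync /path/to/cactus /path/to/sunflower-app
cactus_in_sync() {
  local cactus="$1" app="$2"
  cmp -s "$cactus/android/libcactus.so" \
         "$app/android/app/src/main/jniLibs/arm64-v8a/libcactus.so" &&
  cmp -s "$cactus/flutter/cactus.dart" "$app/lib/cactus.dart"
}
```

It returns 0 only when both files are identical, so it can gate a build script: `cactus_in_sync /path/to/cactus /path/to/sunflower-app || echo "libcactus.so or cactus.dart is out of date"`.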

Build a release APK

flutter build apk --release
# build/app/outputs/flutter-apk/app-release.apk

The APK is signed with the Flutter debug key by default. Set up release signing in android/app/build.gradle.kts before distributing.

Run tests

flutter test
flutter analyze --no-pub   # cactus.dart info-level lints are upstream — ignore

Performance (Pixel 10 Tensor G5 CPU, INT4)

| Path | Decode | TTFT | Notes |
| --- | --- | --- | --- |
| Text, first turn (prefix-cache prime) | ~8.8 tok/s | ~2 s | cactusPrefill at init seeds the system-prompt KV; first user turn prefills only the user-content delta (~20 tokens vs ~310). |
| Text, subsequent turns | ~8.8 tok/s | ~3-4 s | Full re-prefill of system + user content. |
| Audio (flat, audio_tower) | ~3.5-5 tok/s | ~20-25 s for a 5-7 s clip | Encoder is linear in clip length and stays FP16 in INT quants, so there is no software lever. 15 s recording cap + VAD trim shrinks the input. |
| Audio (stitched Whisper-tiny → text) | ~8.5 tok/s | ~2-3 s for a 5 s clip | Whisper-tiny encode ~1 s, Gemma text TTFT ~700 ms on the trained-prompt path. ~6-8× felt-speed improvement over the flat audio path. |

E4B INT4 decodes at ~4 tok/s on the same device — memory-bandwidth bound, not a fixable bug. Use it when output quality matters more than latency.
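To turn these figures into a rough end-to-end wait, add the TTFT to the time needed to decode the reply. The helper below is a back-of-the-envelope sketch; the name reply_seconds is hypothetical, and the inputs used in the examples are midpoints picked from the ranges in the table, not measurements.

```shell
# Rough estimate: seconds until a reply of N tokens has fully streamed.
# Usage: reply_seconds TTFT_SECONDS TOKENS_PER_SECOND N_TOKENS
reply_seconds() {
  awk -v ttft="$1" -v rate="$2" -v n="$3" \
    'BEGIN { printf "%.1f\n", ttft + n / rate }'
}
```

For a 100-token reply, `reply_seconds 2.5 8.5 100` (stitched path) comes to about 14.3 s, while `reply_seconds 22.5 4 100` (flat audio path) comes to about 47.5 s. The gap is dominated by TTFT, which is why the stitched cascade feels so much faster for short answers.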

The audio paths benefit further from on-device VAD trim, end-pointing, and a no-speech-detected reject. The recording cap, its combination with end-pointing, and the per-bundle audio mode picker all live in lib/main.dart.

Version pinning

| Component | Version | Why |
| --- | --- | --- |
| Cactus core | post-v1.14 main | The shipped libcactus.so is built from a commit after v1.14 that includes the fused INT4 MLP op (~22–24% decode speedup on Gemma 4 E2B). |
| Android NDK | r27.x | What we tested |
| Flutter | 3.11+ | Dart 3.0 minimum |
| transformers | 5.5.0 | Pinned by Cactus's converter |

Do not use cactus-flutter plugin v1.3 — it bundles a pre-Gemma-4 engine and can't read v1.14 models.

Hackathon

Built for the Kaggle Gemma 4 Good Hackathon. Use case: translanguage pedagogy for primary-school science in Ugandan classrooms with no internet. Training pipeline: SunbirdAI/sunflower.
