Zero-trust on-device PII/PHI redaction for sensitive documents. Redacto runs a 4-layer LLM pipeline (Classify → Detect → Redact → Validate) entirely on-device using LiteRT-LM and Gemma 4 E2B — no cloud, no INTERNET permission, no BAA required. Supports 7 document categories: Medical, Financial, Legal, Tactical, Journalism, Field Service, and General.
Built for the Qualcomm × Google LiteRT Developer Hackathon 2026 by team Edge Artists.
- Android Studio Meerkat or later
- Android SDK 36, JDK 21 (bundled with Android Studio)
- Device: any arm64 Android 12+ device with 8+ GB RAM (tested on Galaxy S25 Ultra and S24)
Download from litert-community/gemma-4-E2B-it-litert-lm on HuggingFace (Apache 2.0 license):
| File | Size | Use |
|---|---|---|
| gemma-4-E2B-it.litertlm | ~2.4 GB | CPU/GPU inference (all devices) |
| gemma-4-E2B-it_qualcomm_sm8750.litertlm | ~2.8 GB | Snapdragon 8 Elite NPU (S25 Ultra only) |
The GPU/CPU model must be pushed manually. The NPU model is bundled in the APK and auto-extracts on first launch — but you can also push it manually if needed.
# GPU/CPU model (required — works on all devices)
adb push gemma-4-E2B-it.litertlm /sdcard/Android/data/com.example.redacto/files/gemma4.litertlm
# NPU model (optional push — auto-extracted from APK assets on first launch)
# Only works on S25 Ultra (Snapdragon 8 Elite / SM8750)
adb push gemma-4-E2B-it_qualcomm_sm8750.litertlm /sdcard/Android/data/com.example.redacto/files/gemma4_npu.litertlm

Verify models are in place:

adb shell ls -la /sdcard/Android/data/com.example.redacto/files/

export JAVA_HOME="/Applications/Android Studio.app/Contents/jbr/Contents/Home"
export ANDROID_HOME="$HOME/Library/Android/sdk"
./gradlew installDebug

Note: The JAVA_HOME path above is for macOS. On Linux/Windows, point it to your Android Studio bundled JDK location.

adb shell am start -n com.example.redacto/.MainActivity

The engine cold-starts on first run (~10s for GPU, ~14s for NPU). Subsequent launches use the AOT cache (~2s).
The app defaults to NPU, which will crash on devices without the Snapdragon 8 Elite. To force GPU mode before first launch:
echo '<?xml version="1.0" encoding="utf-8" standalone="yes" ?><map><string name="selected_variant">GPU</string></map>' > /tmp/redacto_prefs.xml
adb push /tmp/redacto_prefs.xml /data/local/tmp/redacto_prefs.xml
adb shell "run-as com.example.redacto mkdir -p /data/data/com.example.redacto/shared_prefs && run-as com.example.redacto cp /data/local/tmp/redacto_prefs.xml /data/data/com.example.redacto/shared_prefs/redacto_prefs.xml"

Or use CPU instead of GPU if your device lacks OpenCL support.
applicationId: com.example.redacto
compileSdk: 36
minSdk: 31
targetSdk: 36
abiFilters: arm64-v8a
largeHeap: true (required for 2.4 GB model)
permissions: No INTERNET
A single LLM call asked to simultaneously classify, detect, replace, and verify produces inconsistent results. Redacto separates concerns — each layer runs a fresh conversation with a purpose-built prompt.
Layer 1 (Classify): document type and category detection. The output determines which of the 7 specialist detectors runs in Layer 2.
DOCUMENT_TYPE: Medical Note
CATEGORY: Medical
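The two-line classifier output above is easy to parse defensively. The following is a minimal sketch, not the project's actual parser: the `Classification` type, the `parseClassification` function, and the fallback to "General" for an unrecognized category are assumptions made for illustration (the fallback mirrors the General category's role as the catch-all).

```kotlin
// Hypothetical result type; field names are illustrative, not taken from the Redacto source.
data class Classification(val documentType: String, val category: String)

// The seven categories listed in this README.
val KNOWN_CATEGORIES = setOf(
    "Medical", "Financial", "Legal", "Tactical",
    "Journalism", "Field Service", "General"
)

// Parses "KEY: value" lines. Unknown or missing categories fall back to "General" (an assumption).
fun parseClassification(raw: String): Classification {
    val fields = raw.lineSequence()
        .map { it.trim() }
        .filter { ':' in it }
        .associate { line ->
            line.substringBefore(':').trim().uppercase() to line.substringAfter(':').trim()
        }
    val category = KNOWN_CATEGORIES
        .firstOrNull { it.equals(fields["CATEGORY"], ignoreCase = true) }
        ?: "General"
    return Classification(fields["DOCUMENT_TYPE"] ?: "Unknown", category)
}
```

Matching against a fixed category set keeps a malformed model reply from selecting a detector that does not exist.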
Layer 2 (Detect): find every identifier, including relational ones. The LLM catches what regex cannot: in "the patient's daughter Lisa", the name "Lisa" in isolation isn't PHI; in context, it is.
NAME: Mrs. Chen
DATE: 3/15/48
MRN: 4471829
PHONE: 408-555-1234
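A sketch of how the `LABEL: value` detector output above could be turned into structured detections. The `Detection` type and `parseDetections` function are hypothetical names; the real parsers live in the project's `*Parser.kt` files and may differ.

```kotlin
// Hypothetical detection record: the label (NAME, DATE, ...) and the matched text.
data class Detection(val label: String, val text: String)

// Splits each line at the first colon; lines without a label or a value are skipped.
fun parseDetections(raw: String): List<Detection> =
    raw.lineSequence()
        .map { it.trim() }
        .filter { ':' in it }
        .map { Detection(it.substringBefore(':').trim().uppercase(), it.substringAfter(':').trim()) }
        .filter { it.label.isNotEmpty() && it.text.isNotEmpty() }
        .toList()
```

Splitting at the first colon only means values that themselves contain colons, such as timestamps, stay intact.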
For images, the LLM labels indices, not strings: 1:NAME 2:NAME 5:DATE.
Layer 3 (Redact): the LLM performs substitution directly, so OCR errors in detection never produce missed redactions. For images, detected index ranges map directly to bounding boxes, with no string matching involved.
Layer 4 (Validate): a fresh LLM conversation audits the redacted output with no memory of what Layers 1–3 decided. If it finds a miss, Layers 3 and 4 re-run, for a maximum of 3 rounds.
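The redact-then-audit loop can be sketched as below. This is an illustration under stated assumptions, not the project's code: `redact` and `findMisses` are hypothetical stand-ins for the Layer 3 and Layer 4 LLM calls, the reported misses are assumed to be fed back into the next redaction pass, and "3 rounds" is read as three Layer 3 + 4 passes in total.

```kotlin
// Outcome of the loop: final text, number of passes used, and whether the auditor found it clean.
data class ValidatedResult(val text: String, val rounds: Int, val clean: Boolean)

fun redactWithValidation(
    original: String,
    redact: (text: String, misses: List<String>) -> String,   // stand-in for Layer 3
    findMisses: (redacted: String) -> List<String>,            // stand-in for Layer 4
    maxRounds: Int = 3
): ValidatedResult {
    var current = original
    var misses = emptyList<String>()
    for (round in 1..maxRounds) {
        current = redact(current, misses)   // re-redact, hinting at what the auditor flagged
        misses = findMisses(current)        // independent audit of the new output
        if (misses.isEmpty()) return ValidatedResult(current, round, clean = true)
    }
    // Round budget exhausted: return the best effort and flag it as not clean.
    return ValidatedResult(current, maxRounds, clean = false)
}
```

Returning a `clean` flag instead of throwing lets the caller decide whether to surface a warning when the auditor still reports misses after the final round.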
| OCR (indexed) | LLM labels (Layer 2) | Layer 3 draws boxes |
|---|---|---|
| [0] Patient | 1:NAME | Patient ████ ████ |
| [1] Jane | 2:NAME | DOB: ████ |
| [2] Smith | 4:DATE | compound fracture |
| [3] DOB: | 0,3,5,6:preserve | |
| [4] 03/15/78 | | |
| [5] compound | | |
| [6] fracture | | |
The index → bounding-box mapping is lossless. No text matching means no OCR-error sensitivity.
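To make the index-based mapping concrete, here is a small sketch that parses labels such as `1:NAME 2:NAME 4:DATE 0,3,5,6:preserve` and selects the boxes to black out. The `Box` and `OcrElement` types and both function names are hypothetical, and treating unlabeled indices as preserved is an assumption; a stricter implementation might redact anything the model did not explicitly mark as preserved.

```kotlin
// Hypothetical types for illustration; ML Kit's own element and rectangle classes are not used here.
data class Box(val left: Int, val top: Int, val right: Int, val bottom: Int)
data class OcrElement(val index: Int, val text: String, val box: Box)

// Parses whitespace-separated tokens of the form "1:NAME" or "0,3,5,6:preserve" into index -> label.
fun parseIndexLabels(raw: String): Map<Int, String> {
    val labels = mutableMapOf<Int, String>()
    for (token in raw.trim().split(Regex("\\s+"))) {
        val indices = token.substringBefore(':', "")
        val label = token.substringAfter(':', "")
        if (indices.isEmpty() || label.isEmpty()) continue   // malformed token, skip it
        for (part in indices.split(',')) {
            val index = part.toIntOrNull() ?: continue
            labels[index] = label
        }
    }
    return labels
}

// Returns the boxes of every element whose label is something other than "preserve".
fun boxesToRedact(elements: List<OcrElement>, labels: Map<Int, String>): List<Box> =
    elements.filter { e ->
        val label = labels[e.index]
        label != null && !label.equals("preserve", ignoreCase = true)
    }.map { it.box }
```

Because the lookup is by index, a misread word such as "Sm1th" still gets its box drawn; the OCR text is never compared against the model output.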
| Step | Action |
|---|---|
| 01 · Capture | Tap + on the home vault. Choose camera, gallery, or paste text. |
| 02 · OCR + Index | ML Kit OCR extracts every word with its bounding box. |
| 03 · Redact | 4-layer pipeline: Classify → Detect → Redact → Validate. |
| 04 · Save or send | Redacted image + text + heatmap. Save to vault or share. |
Screenshots: Home (document vault) · Add (upload options) · Pipeline (Step 2/4, Detecting) · Result on GPU (2.0s, 24 tok/s) · Result on NPU (1.6s, 42 tok/s) · Toggle (Redacted ↔ Original) · Text, Journalism (category placeholders)
| Layer | Component | What it does |
|---|---|---|
| Pipeline · Layers 1–4 | LiteRT-LM + Gemma 4 E2B | On-device LLM inference via the compiled-model API. Catches relational identifiers that regex can't. |
| Silicon · NPU | Snapdragon Hexagon NPU | QNN delegates targeting Hexagon V79. 41.7 tok/s sustained on Gemma 4 E2B. |
| Vision · OCR | Google ML Kit OCR | On-device text recognition with bounding-box metadata. |
| UI · Android | Jetpack Compose | Document vault, camera scanner, heatmap overlay, share sheet. |
| Device · Target | Samsung S25 Ultra | Snapdragon 8 Elite (SM8750). GPU fallback works on any arm64 device. |
| # | Optimization | Detail |
|---|---|---|
| 01 | INT4-quantized Gemma 4 E2B via LiteRT-LM | Two compiled bundles: GPU/CPU (2.4 GB) and NPU (2.8 GB, QNN-prepared for Hexagon). |
| 02 | QNN delegate via Qualcomm dispatch library | NPU path through libLiteRtDispatch_Qualcomm.so. Targets Hexagon V79 directly. |
| 03 | AOT cache for cold-start | Compiled NPU graph cached. Second-launch init drops from ~14s to ~2s. |
| 04 | Backend cascade · NPU → GPU → CPU | Automatic fallback at init time. Each variant pairs with its own model file. |
| 05 | 4-pass pipeline · separation of concerns | Each LLM call is a fresh conversation — purpose-built prompt, no context pollution. |
| 06 | Indexed-element image redaction | OCR yields indexed tokens; LLM returns indices. Index → bounding-box is lossless. |
| 07 | Constrained sampling (GPU) · streaming (NPU) | GPU: topK=64, topP=0.95. NPU: QNN's native sampling (no constrained decoding support). |
| 08 | arm64-v8a single-ABI build | No fat APK. pickFirsts dedupes QNN .so files. |
| 09 | largeHeap + chunked OCR (150 elements) | Long documents chunked per detection pass to respect the ~4k context window. |
| 10 | No INTERNET permission · zero telemetry | The manifest cannot send a packet. Compliance is structural. |
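The backend cascade in row 04 can be sketched as a simple ordered fallback. This is an illustration, not the project's `RedactionViewModel` logic: `initWithCascade` and the `tryInit` hook are hypothetical, and the file names are taken from the adb push targets earlier in this README. Note that this pattern only covers failures surfaced as exceptions; a native crash inside a delegate would not be caught this way.

```kotlin
enum class Backend { NPU, GPU, CPU }

// Each variant pairs with its own model file; GPU and CPU share the same bundle.
val MODEL_FOR_BACKEND = mapOf(
    Backend.NPU to "gemma4_npu.litertlm",
    Backend.GPU to "gemma4.litertlm",
    Backend.CPU to "gemma4.litertlm"
)

// Tries the preferred backend first, then each slower one in NPU -> GPU -> CPU order.
// tryInit is a hypothetical hook that returns an engine handle or throws on failure.
fun <E> initWithCascade(
    preferred: Backend,
    tryInit: (Backend, String) -> E
): Pair<Backend, E> {
    val order = Backend.values().dropWhile { it != preferred }
    var lastError: Exception? = null
    for (backend in order) {
        try {
            return backend to tryInit(backend, MODEL_FOR_BACKEND.getValue(backend))
        } catch (e: Exception) {
            lastError = e   // remember why this backend failed and move on
        }
    }
    throw IllegalStateException("No backend could be initialized", lastError)
}
```

Starting the cascade at the preferred backend means a user who forces GPU mode never touches the NPU path at all.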
Benchmark set: 30 entries across 5 redaction modes, run through a 3-step pipeline (Classify → Detect → Redact).
| Metric | NPU | GPU |
|---|---|---|
| Time to first token | 92ms | 366ms |
| Decode throughput | 41.7 tok/s | 24.5 tok/s |
| Detect step latency | 624ms | 1,586ms |
| End-to-end (229-char note) | 2.78s | 5.65s |
| Step | GPU | NPU | Speedup |
|---|---|---|---|
| Step 1 · Classify | 773ms | 345ms | 2.2× |
| Step 2 · Detect | 1,586ms | 624ms | 2.5× |
| Step 3 · Redact | 2,475ms | 4,060ms | 0.6× (NPU verbose: 3.2× more tokens) |
QNN doesn't yet support constrained decoding — the NPU produces ~2× more tokens, which equalizes the 30-entry average at ~5s. For TTFT-critical UX and short inputs, NPU still wins decisively.
- Latency: System.currentTimeMillis() wall-clock around engine.infer().
- TTFT: first onMessage callback minus start time.
- Decode tok/s: (tokens − 1) × 1000 / (lastToken − firstToken).
- Peak RSS: /proc/self/status VmRSS line.
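The measurement formulas above translate directly into code. The sketch below uses hypothetical function names and takes timestamps in milliseconds; `parseVmRssKb` works on the text of `/proc/self/status` rather than reading the file itself, so it can be exercised off-device.

```kotlin
// Time to first token: first streamed callback minus the request start, in milliseconds.
fun ttftMs(startMs: Long, firstTokenMs: Long): Long = firstTokenMs - startMs

// Decode throughput in tokens per second. The first token is excluded because its
// latency belongs to prefill (TTFT), not to steady-state decoding.
fun decodeTokensPerSecond(tokens: Int, firstTokenMs: Long, lastTokenMs: Long): Double {
    val elapsedMs = lastTokenMs - firstTokenMs
    if (tokens < 2 || elapsedMs <= 0) return 0.0   // not enough data for a rate
    return (tokens - 1) * 1000.0 / elapsedMs
}

// Extracts the resident set size in kB from the contents of /proc/self/status,
// where the relevant line looks like "VmRSS:     123456 kB".
fun parseVmRssKb(status: String): Long? =
    status.lineSequence()
        .firstOrNull { it.startsWith("VmRSS:") }
        ?.substringAfter(':')
        ?.trim()
        ?.substringBefore(' ')
        ?.toLongOrNull()
```

For example, 101 tokens with 2,400 ms between the first and last token gives 100 × 1000 / 2400, roughly 41.7 tok/s.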
Full per-entry data in docs/benchmark-results.md.
| Category | Typical identifiers |
|---|---|
| Medical | names, DOB, MRN, diagnoses |
| Financial | accounts, routing, SSN, tax IDs |
| Legal | parties, case numbers, attorney contacts |
| Tactical | victims, witnesses, minors |
| Journalism | source identity, locations |
| Field Service | gate codes, PINs, customer PII |
| General | names, dates, contacts, addresses (fallback) |
| v1 (single-pass) | v2 (Redacto) |
|---|---|
| One LLM call: classify + detect + redact | Classify → Detect → Redact → Validate · each a fresh conversation |
| Deterministic string.replace() | LLM-driven substitution (handles OCR noise) |
| Regex fallback for SSN / phone / email | Indexed-element image redaction — zero string matching |
| No validator | Independent auditor with up to 3 retry rounds |
| Idea | Outcome | Lesson |
|---|---|---|
| Fine-tune Gemma 4 E2B | Cannot ship to NPU | Quality regressed (70.3% vs 80.5%). Can't compile fine-tuned model into QNN bundle without internal toolchain. |
| Deterministic string.replace() in Step 3 | Replaced | OCR errors caused 6 of 8 redactions to silently fail. LLM substitution fixed it. |
| Word-diff for image bounding boxes | Replaced | LLM rewrote text — diff broke. Indexed-element approach is lossless. |
| Battery / energy metrics | Dropped | BATTERY_PROPERTY_CURRENT_NOW is system-wide at 1Hz. Can't isolate per-process draw. |
We have a fine-tuned Gemma 4 E2B that runs on GPU. We cannot ship it to the Hexagon V79 NPU — compiling a fine-tuned .litertlm into a QNN-prepared bundle requires the AIMET + QNN-AOT toolchain, which is not part of the public LiteRT-LM SDK. This is a hardware-team integration boundary, not an engineering shortcut.
| # | Enhancement | Status |
|---|---|---|
| 01 | Fine-tuned NPU bundle | Needs Qualcomm hardware support for QNN compilation |
| 02 | Pipeline accuracy iteration | Prompt + validator tuning with wider eval set |
| 03 | PDF support | Code shipped, UI entry point not yet exposed |
| 04 | In-app benchmark dashboard | Spec written, not built |
| 05 | Per-detection confidence scores | Validator extension |
| 06 | Live transcription redaction | Streaming pipeline |
| 07 | iOS port | LiteRT-LM ships an iOS runtime |
| 08 | Audit log + MDM deployment | Enterprise compliance |
app/src/main/java/com/example/redacto/
├── RedactoApp.kt                          # Application: DSP paths, model extraction
├── MainActivity.kt                        # Entry point, ViewModel wiring
├── engine/
│   ├── LlmEngine.kt                       # Interface: initialize, redact, infer, close
│   ├── InferenceEngine.kt                 # LiteRT-LM wrapper: NPU/GPU/CPU, streaming
│   ├── OcrProcessor.kt                    # ML Kit OCR with bounding boxes
│   ├── PdfTextExtractor.kt                # PDF → OCR (implemented, UI not exposed)
│   ├── PlaceholderMapper.kt               # Maps [CATEGORY_N] back to original text
│   ├── RegexFallback.kt                   # Legacy regex patterns (not in pipeline)
│   └── pipeline/
│       ├── RedactionPipeline.kt           # 4-step text pipeline
│       ├── ImageRedactionPipeline.kt      # Indexed-element image pipeline
│       ├── PipelinePrompts.kt             # All prompt templates (7 categories)
│       └── *Parser.kt                     # Classification, detection, validation parsers
├── benchmark/
│   ├── TextBenchmarkRunner.kt             # Text benchmark (ADB-triggered)
│   └── BenchmarkRunner.kt                 # Image benchmark
├── data/                                  # Room database: documents, versions, categories
├── ui/
│   ├── RedactionViewModel.kt              # Model cascade, pipeline orchestration
│   ├── screens/                           # Home, TextInput, Scanner, Result, Setup
│   ├── components/                        # Heatmap, HUD bar, FAB, category cards
│   └── theme/                             # DM Sans typography, Navy/Teal palette
└── navigation/NavGraph.kt                 # 9 routes
See docs/ for detailed engineering documentation:
- npu-enablement.md — NPU setup on SM8750: 6 failure modes and fixes
- native-libs.md — .so file inventory and packaging
- backend-cascade.md — NPU → GPU → CPU fallback logic
- diagnostics.md — Error messages and logcat filters
- engineering-decisions.md — 27 technical decisions with rationale
- benchmark-results.md — Full per-entry benchmark data
- runbook.md — Operational recipes
- demo-code-walkthrough.md — Code tour

Hackathon: Qualcomm × Google LiteRT Developer Hackathon 2026

DevPost Submission: Redacto

Team: Edge Artists (Bhavik, Jaydeep, Riken, Tirth)







