Redacto — Privacy. Redacted.

Zero-trust on-device PII/PHI redaction for sensitive documents. Redacto runs a 4-layer LLM pipeline (Classify → Detect → Redact → Validate) entirely on-device using LiteRT-LM and Gemma 4 E2B: no cloud, no INTERNET permission, and no HIPAA Business Associate Agreement (BAA) required. It supports 7 document categories: Medical, Financial, Legal, Tactical, Journalism, Field Service, and General.

Built for the Qualcomm × Google LiteRT Developer Hackathon 2026 by team Edge Artists.

Quick start

Prerequisites

Android Studio Meerkat or later
Android SDK 36, JDK 21 (bundled with Android Studio)
Device: any arm64 Android 12+ device with 8+ GB RAM (tested on Galaxy S25 Ultra and S24)

1. Download the model

Download from litert-community/gemma-4-E2B-it-litert-lm on Hugging Face (Apache 2.0 license):

File	Size	Use
`gemma-4-E2B-it.litertlm`	~2.4 GB	CPU/GPU inference (all devices)
`gemma-4-E2B-it_qualcomm_sm8750.litertlm`	~2.8 GB	Snapdragon 8 Elite NPU (S25 Ultra only)

2. Push model to device

The GPU/CPU model must be pushed manually. The NPU model is bundled in the APK and auto-extracts on first launch — but you can also push it manually if needed.

# GPU/CPU model (required — works on all devices)
adb push gemma-4-E2B-it.litertlm /sdcard/Android/data/com.example.redacto/files/gemma4.litertlm

# NPU model (optional push — auto-extracted from APK assets on first launch)
# Only works on S25 Ultra (Snapdragon 8 Elite / SM8750)
adb push gemma-4-E2B-it_qualcomm_sm8750.litertlm /sdcard/Android/data/com.example.redacto/files/gemma4_npu.litertlm

Verify models are in place:

adb shell ls -la /sdcard/Android/data/com.example.redacto/files/

3. Build and install

export JAVA_HOME="/Applications/Android Studio.app/Contents/jbr/Contents/Home"
export ANDROID_HOME="$HOME/Library/Android/sdk"
./gradlew installDebug

Note: The JAVA_HOME and ANDROID_HOME paths above are macOS defaults. On Linux or Windows, point JAVA_HOME at the JDK bundled with your Android Studio install and ANDROID_HOME at your Android SDK directory.

4. Launch

adb shell am start -n com.example.redacto/.MainActivity

The engine cold-starts on first run (~10s for GPU, ~14s for NPU). Subsequent launches use the AOT cache (~2s).

Running on non-S25 Ultra devices

The app defaults to NPU, which will crash on devices without the Snapdragon 8 Elite. To force GPU mode before first launch:

echo '<?xml version="1.0" encoding="utf-8" standalone="yes" ?><map><string name="selected_variant">GPU</string></map>' > /tmp/redacto_prefs.xml
adb push /tmp/redacto_prefs.xml /data/local/tmp/redacto_prefs.xml
adb shell "run-as com.example.redacto mkdir -p /data/data/com.example.redacto/shared_prefs && run-as com.example.redacto cp /data/local/tmp/redacto_prefs.xml /data/data/com.example.redacto/shared_prefs/redacto_prefs.xml"

If your device lacks OpenCL support, use CPU instead: write CPU rather than GPU as the selected_variant value in the XML above.

Build config

applicationId:  com.example.redacto
compileSdk:     36
minSdk:         31
targetSdk:      36
abiFilters:     arm64-v8a
largeHeap:      true (required for 2.4 GB model)
permissions:    No INTERNET
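For orientation, the settings above would look roughly like this in a module-level `build.gradle.kts` using the Android Gradle Plugin 8.x DSL. This is a sketch, not the repository's actual build file: the `namespace` value and the second `pickFirsts` pattern (`libQnnHtp.so`) are assumptions, while `libLiteRtDispatch_Qualcomm.so` is the dispatch library named under Key optimizations. `largeHeap` and the absence of `INTERNET` are manifest properties, so they appear only as comments.

```kotlin
// app/build.gradle.kts (sketch; assumes the AGP 8.x `android {}` DSL)
android {
    namespace = "com.example.redacto" // assumed to match applicationId
    compileSdk = 36

    defaultConfig {
        applicationId = "com.example.redacto"
        minSdk = 31
        targetSdk = 36
        // Single-ABI build: only 64-bit ARM native libraries are packaged.
        ndk { abiFilters += "arm64-v8a" }
    }

    packaging {
        jniLibs {
            // Keep the first copy when several dependencies ship the same .so.
            pickFirsts += "**/libLiteRtDispatch_Qualcomm.so"
            pickFirsts += "**/libQnnHtp.so" // hypothetical example of a duplicated QNN library
        }
    }
}

// largeHeap is set in AndroidManifest.xml, not Gradle:
//   <application android:largeHeap="true" ...>
// and the manifest contains no android.permission.INTERNET entry.
```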

Architecture — 4-layer pipeline

A single LLM call asked to simultaneously classify, detect, replace, and verify produces inconsistent results. Redacto separates concerns — each layer runs a fresh conversation with a purpose-built prompt.

Layer 01 · Classify

Document type + category detection. Output drives which of the 7 specialist detectors runs in Layer 2.

DOCUMENT_TYPE: Medical Note
CATEGORY: Medical
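A minimal sketch of how output in this shape could be parsed and routed, assuming the model emits one `KEY: value` pair per line as in the example. `Category`, `Classification`, and `parseClassification` are hypothetical names (the project's real parsers live in `pipeline/*Parser.kt`); unrecognized categories fall back to General, mirroring its "fallback" role in the category table.

```kotlin
/** The seven document categories from the "7 supported categories" table. */
enum class Category(val label: String) {
    MEDICAL("Medical"),
    FINANCIAL("Financial"),
    LEGAL("Legal"),
    TACTICAL("Tactical"),
    JOURNALISM("Journalism"),
    FIELD_SERVICE("Field Service"),
    GENERAL("General");

    companion object {
        /** Case-insensitive match on the label; anything unrecognized falls back to GENERAL. */
        fun fromLabel(raw: String): Category =
            values().firstOrNull { it.label.equals(raw.trim(), ignoreCase = true) } ?: GENERAL
    }
}

/** Parsed Layer 1 result. */
data class Classification(val documentType: String, val category: Category)

/** Parses "KEY: value" lines such as "DOCUMENT_TYPE: Medical Note" and "CATEGORY: Medical". */
fun parseClassification(output: String): Classification {
    val fields = output.lineSequence()
        .mapNotNull { line ->
            val colon = line.indexOf(':')
            if (colon < 0) null
            else line.substring(0, colon).trim().uppercase() to line.substring(colon + 1).trim()
        }
        .toMap()
    return Classification(
        documentType = fields["DOCUMENT_TYPE"] ?: "Unknown",
        category = Category.fromLabel(fields["CATEGORY"] ?: ""),
    )
}
```

In this sketch, falling back instead of failing keeps Layer 2 running with the general-purpose detector when the classifier output is malformed.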

Layer 02 · Detect

Find every identifier, including relational ones. The LLM catches what regex cannot: in "the patient's daughter Lisa", the name "Lisa" in isolation isn't PHI, but in context it is.

NAME: Mrs. Chen
DATE: 3/15/48
MRN:  4471829
PHONE: 408-555-1234

For images, the LLM labels indices, not strings: 1:NAME 2:NAME 5:DATE.
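Both detection formats can be parsed with a few lines of Kotlin. This sketch assumes the formats are exactly as printed: one `TYPE: value` per line for text, and whitespace-separated `index:LABEL` tokens for images, where several indices may share one label (as in `0,3,5,6:preserve` in the image diagram). `Detection`, `parseTextDetections`, and `parseIndexLabels` are hypothetical names.

```kotlin
/** One text-mode finding, e.g. PHONE -> "408-555-1234". */
data class Detection(val type: String, val value: String)

/** An upper-case type, a colon, then the value; surrounding whitespace is ignored. */
val TEXT_DETECTION = Regex("""^\s*([A-Z_]+):\s*(.+?)\s*$""")

/** Parses text-mode output: one "TYPE: value" per line; other lines are skipped. */
fun parseTextDetections(output: String): List<Detection> =
    output.lines().mapNotNull { line ->
        TEXT_DETECTION.matchEntire(line)?.let { m -> Detection(m.groupValues[1], m.groupValues[2]) }
    }

/**
 * Parses image-mode output such as "1:NAME 2:NAME 5:DATE" or "0,3,5,6:preserve"
 * into a map from OCR element index to label.
 */
fun parseIndexLabels(output: String): Map<Int, String> {
    val labels = mutableMapOf<Int, String>()
    for (token in output.trim().split(Regex("""\s+"""))) {
        val colon = token.lastIndexOf(':')
        if (colon <= 0 || colon == token.length - 1) continue // not an "indices:LABEL" token
        val label = token.substring(colon + 1)
        token.substring(0, colon).split(',')
            .mapNotNull { it.toIntOrNull() }
            .forEach { labels[it] = label }
    }
    return labels
}
```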

Layer 03 · Redact

The LLM performs substitution directly, so OCR errors in detection never produce missed redactions. For images, detected index ranges map directly to bounding boxes — no string matching involved.

Layer 04 · Validate

A fresh LLM conversation audits the redacted output — no memory of what Layers 1–3 decided. If it finds a miss, Layers 3 + 4 re-run. Maximum 3 rounds.
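The control flow of the four layers, including the Validate → Redact retry, can be sketched as follows. Each layer is passed in as a function so that every call can open its own fresh conversation; the function signatures, `Verdict`, and `runPipeline` are hypothetical, and the sketch assumes "Maximum 3 rounds" caps the total number of Redact → Validate passes.

```kotlin
/** Layer 4 verdict: the identifiers the auditor still found in the redacted text. */
data class Verdict(val misses: List<String>) {
    val clean: Boolean get() = misses.isEmpty()
}

data class PipelineResult(val redacted: String, val rounds: Int, val clean: Boolean)

/**
 * Classify -> Detect -> Redact -> Validate, with Redact + Validate re-run on misses.
 * Each lambda stands in for one fresh LLM conversation; the signatures are hypothetical.
 */
fun runPipeline(
    input: String,
    classify: (text: String) -> String,
    detect: (text: String, category: String) -> List<String>,
    redact: (text: String, targets: List<String>) -> String,
    validate: (redacted: String) -> Verdict, // sees only the output, never earlier decisions
    maxRounds: Int = 3,
): PipelineResult {
    val category = classify(input)
    var targets = detect(input, category)
    var text = input
    for (round in 1..maxRounds) {
        text = redact(text, targets)
        val verdict = validate(text)
        if (verdict.clean) return PipelineResult(text, round, clean = true)
        targets = verdict.misses // next round redacts only what the auditor flagged
    }
    return PipelineResult(text, maxRounds, clean = false)
}
```

In a test harness the layers can be stubbed with plain string functions; in the app each would wrap a separate LLM conversation. The validator receives only the redacted text, which matches the "no memory" property described above.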

Image redaction · indexed-element approach

OCR (indexed)        →  LLM labels (Layer 2)  →  Layer 3 draws boxes
[0] Patient              1:NAME                   Patient ████ ████
[1] Jane                 2:NAME                   DOB: ████
[2] Smith                4:DATE                   compound fracture
[3] DOB:                 0,3,5,6:preserve
[4] 03/15/78
[5] compound
[6] fracture

The index → bounding-box mapping is lossless. No text matching means no OCR-error sensitivity.
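A sketch of the index → bounding-box step, using a plain `Box` class in place of `android.graphics.Rect` so it runs off-device. It also shows the 150-element chunking listed under Key optimizations: because each element keeps its global index, labels returned for any chunk map straight back to the full element list. All names here are hypothetical, and the `[i] word` prompt rendering is an assumption based on the diagram.

```kotlin
/** Axis-aligned box in image pixels; a stand-in for android.graphics.Rect. */
data class Box(val left: Int, val top: Int, val right: Int, val bottom: Int)

/** One OCR word with its global index and bounding box. */
data class OcrElement(val index: Int, val text: String, val box: Box)

/** Splits the element list into prompt-sized chunks; each element keeps its global index. */
fun chunkElements(elements: List<OcrElement>, size: Int = 150): List<List<OcrElement>> =
    elements.chunked(size)

/** Renders a chunk in the "[i] word" form shown in the diagram. */
fun renderForPrompt(chunk: List<OcrElement>): String =
    chunk.joinToString("\n") { "[${it.index}] ${it.text}" }

/** Index -> box lookup; "preserve" labels and out-of-range indices are ignored. */
fun boxesToRedact(elements: List<OcrElement>, labels: Map<Int, String>): List<Box> {
    val byIndex = elements.associateBy { it.index }
    return labels
        .filterValues { !it.equals("preserve", ignoreCase = true) }
        .keys
        .sorted()
        .mapNotNull { byIndex[it]?.box }
}
```

Since the lookup is a map access by integer key, an OCR misread such as "Srnith" for "Smith" cannot change which box gets drawn.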

App flow

Step	Action
01 · Capture	Tap + on the home vault. Choose camera, gallery, or paste text.
02 · OCR + Index	ML Kit OCR extracts every word with its bounding box.
03 · Redact	4-layer pipeline: Classify → Detect → Redact → Validate.
04 · Save or send	Redacted image + text + heatmap. Save to vault or share.

Screenshots

Home · Document vault
Add · Upload options
Pipeline · Step 2/4 · Detecting
Result · GPU 2.0s · 24 tok/s
Result · NPU 1.6s · 42 tok/s
Toggle · Redacted ↔ Original
Text · Journalism category placeholders

Technical stack

Layer	Component	What it does
Pipeline · Layers 1–4	LiteRT-LM + Gemma 4 E2B	On-device LLM inference via the compiled-model API. Catches relational identifiers that regex can't.
Silicon · NPU	Snapdragon Hexagon NPU	QNN delegates targeting Hexagon V79. 41.7 tok/s sustained on Gemma 4 E2B.
Vision · OCR	Google ML Kit OCR	On-device text recognition with bounding-box metadata.
UI · Android	Jetpack Compose	Document vault, camera scanner, heatmap overlay, share sheet.
Device · Target	Samsung S25 Ultra	Snapdragon 8 Elite (SM8750). GPU fallback works on any arm64 device.

Key optimizations

#	Optimization	Detail
01	INT4-quantized Gemma 4 E2B via LiteRT-LM	Two compiled bundles: GPU/CPU (2.4 GB) and NPU (2.8 GB, QNN-prepared for Hexagon).
02	QNN delegate via Qualcomm dispatch library	NPU path through `libLiteRtDispatch_Qualcomm.so`. Targets Hexagon V79 directly.
03	AOT cache for cold-start	Compiled NPU graph cached. Second-launch init drops from ~14s to ~2s.
04	Backend cascade · NPU → GPU → CPU	Automatic fallback at init time. Each variant pairs with its own model file.
05	4-pass pipeline · separation of concerns	Each LLM call is a fresh conversation — purpose-built prompt, no context pollution.
06	Indexed-element image redaction	OCR yields indexed tokens; LLM returns indices. Index → bounding-box is lossless.
07	Constrained sampling (GPU) · streaming (NPU)	GPU: `topK=64, topP=0.95`. NPU: QNN's native sampling (no constrained decoding support).
08	arm64-v8a single-ABI build	No fat APK. `pickFirsts` dedupes QNN .so files.
09	largeHeap + chunked OCR (150 elements)	Long documents chunked per detection pass to respect the ~4k context window.
10	No `INTERNET` permission · zero telemetry	Without the permission the app cannot open a network socket, so compliance is structural.
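The NPU → GPU → CPU cascade can be sketched as an ordered loop that returns the first backend whose engine initializes. The model file names come from the Quick start push commands; `initWithFallback` and its `tryInit` hook are hypothetical stand-ins for the real engine setup (the project structure places the model cascade in `RedactionViewModel.kt`). A `catch` block only sees failures that surface as JVM exceptions; a native crash inside a delegate library would bypass it.

```kotlin
enum class Backend { NPU, GPU, CPU }

/** Each backend variant pairs with its own model file (names from the adb push steps). */
val MODEL_FILES = mapOf(
    Backend.NPU to "gemma4_npu.litertlm",
    Backend.GPU to "gemma4.litertlm",
    Backend.CPU to "gemma4.litertlm",
)

/** Which backend came up, plus why earlier ones were skipped. */
data class CascadeResult<E>(val backend: Backend, val engine: E, val failures: List<String>)

/** Tries each backend in order; [tryInit] is a hypothetical engine-construction hook. */
fun <E> initWithFallback(
    order: List<Backend> = listOf(Backend.NPU, Backend.GPU, Backend.CPU),
    tryInit: (Backend, String) -> E,
): CascadeResult<E> {
    val failures = mutableListOf<String>()
    for (backend in order) {
        val modelFile = MODEL_FILES.getValue(backend)
        try {
            return CascadeResult(backend, tryInit(backend, modelFile), failures)
        } catch (e: Exception) {
            failures += "$backend ($modelFile): ${e.message}"
        }
    }
    throw IllegalStateException("No backend initialized: $failures")
}
```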

Benchmarks — measured on Galaxy S25 Ultra

30 entries across 5 redaction modes, run through a 3-step pipeline (Classify → Detect → Redact) without the Layer 4 validator.

Headline numbers · NPU vs GPU

Metric	NPU	GPU
Time to first token	92ms	366ms
Decode throughput	41.7 tok/s	24.5 tok/s
Detect step latency	624ms	1,586ms
End-to-end (229-char note)	2.78s	5.65s

Per-step latency · 30-entry average

Step	GPU	NPU	NPU speedup
Step 1 · Classify	773ms	345ms	2.2×
Step 2 · Detect	1,586ms	624ms	2.5×
Step 3 · Redact	2,475ms	4,060ms	0.6× (NPU emits 3.2× more tokens)

Why total wall-clock isn't 2×

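QNN doesn't yet support constrained decoding, so the NPU produces ~2× more tokens (3.2× in the Redact step). The extra decoding offsets its faster Classify and Detect steps, leaving both backends at about 5s on the 30-entry average. For TTFT-critical UX and short inputs, the NPU still wins decisively.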
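Summing the per-step averages from the table shows why the totals converge. The arithmetic below uses only the published numbers; these are sums of averages, not separately measured end-to-end times.

```kotlin
// Per-step averages in ms from the table above: Classify, Detect, Redact.
val gpuMs = listOf(773, 1_586, 2_475)
val npuMs = listOf(345, 624, 4_060)

val gpuTotal = gpuMs.sum() // 4,834 ms
val npuTotal = npuMs.sum() // 5,029 ms

// NPU speedup per step (GPU time / NPU time); values below 1.0 mean the NPU is slower.
val perStepSpeedup = gpuMs.zip(npuMs) { g, n -> g.toDouble() / n } // ≈ [2.24, 2.54, 0.61]
```

The NPU's ~1.4s lead on Classify and Detect (2,359ms vs 969ms) is slightly more than erased by the ~1.6s it loses on Redact, which is why both totals land near 5s.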

Methodology

Latency: System.currentTimeMillis() wall-clock around engine.infer().
TTFT: first onMessage callback minus start time.
Decode tok/s: (tokens − 1) × 1000 / (lastToken − firstToken), with token timestamps in milliseconds.
Peak RSS: /proc/self/status VmRSS line.
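The methodology formulas translate directly into code. In this sketch, `InferTiming` and its field names are hypothetical containers for the timestamps described above. The `(tokens − 1)` term appears because n tokens arriving between the first and last callbacks span n − 1 inter-token intervals. For memory, note that `VmRSS` in `/proc/self/status` reports the current resident set, while the kernel's own high-water mark is the `VmHWM` line in the same file.

```kotlin
/** Timestamps (ms) recorded around one engine.infer() call; names are illustrative. */
data class InferTiming(
    val startMs: Long,      // immediately before engine.infer()
    val firstTokenMs: Long, // first onMessage callback
    val lastTokenMs: Long,  // last onMessage callback
    val endMs: Long,        // after engine.infer() returns
    val tokens: Int,        // tokens streamed during the call
)

fun latencyMs(t: InferTiming): Long = t.endMs - t.startMs

fun ttftMs(t: InferTiming): Long = t.firstTokenMs - t.startMs

/** (tokens − 1) × 1000 / (lastToken − firstToken): n tokens span n − 1 intervals. */
fun decodeTokPerSec(t: InferTiming): Double {
    val windowMs = t.lastTokenMs - t.firstTokenMs
    require(t.tokens > 1 && windowMs > 0) { "need at least two tokens over a non-zero window" }
    return (t.tokens - 1) * 1000.0 / windowMs
}

/** Reads a "Key:   12345 kB" field (e.g. VmRSS or VmHWM) from /proc/self/status text. */
fun statusFieldKb(status: String, key: String = "VmRSS"): Long? =
    status.lineSequence()
        .firstOrNull { it.startsWith("$key:") }
        ?.substringAfter(':')
        ?.trim()
        ?.removeSuffix("kB")
        ?.trim()
        ?.toLongOrNull()
```

On-device, the status text can be obtained with `java.io.File("/proc/self/status").readText()`.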

Full per-entry data in docs/benchmark-results.md.

7 supported categories

Category	Typical identifiers
Medical	names, DOB, MRN, diagnoses
Financial	accounts, routing, SSN, tax IDs
Legal	parties, case numbers, attorney contacts
Tactical	victims, witnesses, minors
Journalism	source identity, locations
Field Service	gate codes, PINs, customer PII
General	names, dates, contacts, addresses (fallback)

Engineering journey

From single-pass to 4-layer

v1 (single-pass)	v2 (Redacto)
One LLM call: classify + detect + redact	Classify → Detect → Redact → Validate · each a fresh conversation
Deterministic `string.replace()`	LLM-driven substitution — handles OCR noise
Regex fallback for SSN / phone / email	Indexed-element image redaction — zero string matching
No validator	Independent auditor with up to 3 retry rounds

What we tried and what we learned

Idea	Outcome	Lesson
Fine-tune Gemma 4 E2B	Cannot ship to NPU	Quality regressed (70.3% vs 80.5%). Can't compile fine-tuned model into QNN bundle without internal toolchain.
Deterministic `string.replace()` in Step 3	Replaced	OCR errors caused 6 of 8 redactions to silently fail. LLM substitution fixed it.
Word-diff for image bounding boxes	Replaced	LLM rewrote text — diff broke. Indexed-element approach is lossless.
Battery / energy metrics	Dropped	`BATTERY_PROPERTY_CURRENT_NOW` is system-wide at 1Hz. Can't isolate per-process draw.

NPU fine-tuning blocker

We have a fine-tuned Gemma 4 E2B that runs on GPU. We cannot ship it to the Hexagon V79 NPU — compiling a fine-tuned .litertlm into a QNN-prepared bundle requires the AIMET + QNN-AOT toolchain, which is not part of the public LiteRT-LM SDK. This is a hardware-team integration boundary, not an engineering shortcut.

Future enhancements

#	Enhancement	Status
01	Fine-tuned NPU bundle	Needs Qualcomm hardware support for QNN compilation
02	Pipeline accuracy iteration	Prompt + validator tuning with wider eval set
03	PDF support	Code shipped, UI entry point not yet exposed
04	In-app benchmark dashboard	Spec written, not built
05	Per-detection confidence scores	Validator extension
06	Live transcription redaction	Streaming pipeline
07	iOS port	LiteRT-LM ships an iOS runtime
08	Audit log + MDM deployment	Enterprise compliance

Project structure (key files)

app/src/main/java/com/example/redacto/
├── RedactoApp.kt              # Application: DSP paths, model extraction
├── MainActivity.kt            # Entry point, ViewModel wiring
├── engine/
│   ├── LlmEngine.kt           # Interface: initialize, redact, infer, close
│   ├── InferenceEngine.kt     # LiteRT-LM wrapper: NPU/GPU/CPU, streaming
│   ├── OcrProcessor.kt        # ML Kit OCR with bounding boxes
│   ├── PdfTextExtractor.kt    # PDF → OCR (implemented, UI not exposed)
│   ├── PlaceholderMapper.kt   # Maps [CATEGORY_N] back to original text
│   ├── RegexFallback.kt       # Legacy regex patterns (not in pipeline)
│   └── pipeline/
│       ├── RedactionPipeline.kt       # 4-step text pipeline
│       ├── ImageRedactionPipeline.kt  # Indexed-element image pipeline
│       ├── PipelinePrompts.kt         # All prompt templates (7 categories)
│       └── *Parser.kt                # Classification, detection, validation parsers
├── benchmark/
│   ├── TextBenchmarkRunner.kt  # Text benchmark (ADB-triggered)
│   └── BenchmarkRunner.kt     # Image benchmark
├── data/                       # Room database: documents, versions, categories
├── ui/
│   ├── RedactionViewModel.kt  # Model cascade, pipeline orchestration
│   ├── screens/               # Home, TextInput, Scanner, Result, Setup
│   ├── components/            # Heatmap, HUD bar, FAB, category cards
│   └── theme/                 # DM Sans typography, Navy/Teal palette
└── navigation/NavGraph.kt     # 9 routes
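`PlaceholderMapper.kt` is described only as mapping `[CATEGORY_N]` tags back to original text. A minimal sketch of that idea follows; the `PlaceholderMap` class, the per-type counter, and the `[TYPE_N]` regex are assumptions, not the project's implementation.

```kotlin
/** Matches tags like [NAME_1], [PHONE_2] or [CASE_NUMBER_3]. */
val PLACEHOLDER = Regex("""\[([A-Z]+(?:_[A-Z]+)*)_(\d+)\]""")

/** Two-way map between original values and numbered placeholders (hypothetical design). */
class PlaceholderMap {
    private val toOriginal = mutableMapOf<String, String>()
    private val toPlaceholder = mutableMapOf<String, String>()
    private val counters = mutableMapOf<String, Int>()

    /** Returns a stable tag for [original]; repeated values reuse the same tag. */
    fun placeholderFor(type: String, original: String): String =
        toPlaceholder.getOrPut(original) {
            val n = (counters[type] ?: 0) + 1
            counters[type] = n
            "[${type}_$n]".also { toOriginal[it] = original }
        }

    /** Swaps every known tag in [text] back to its original value; unknown tags are left as-is. */
    fun restore(text: String): String =
        PLACEHOLDER.replace(text) { m -> toOriginal[m.value] ?: m.value }
}
```

Reusing one tag per distinct value keeps the redacted text readable ("[NAME_1] called [NAME_2]") while still letting the original be restored locally.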

Documentation

See docs/ for detailed engineering documentation:

npu-enablement.md — NPU setup on SM8750: 6 failure modes and fixes
native-libs.md — .so file inventory and packaging
backend-cascade.md — NPU → GPU → CPU fallback logic
diagnostics.md — Error messages and logcat filters
engineering-decisions.md — 27 technical decisions with rationale
benchmark-results.md — Full per-entry benchmark data
runbook.md — Operational recipes
demo-code-walkthrough.md — Code tour

Hackathon: Qualcomm × Google LiteRT Developer Hackathon 2026
Devpost submission: Redacto
Team: Edge Artists (Bhavik, Jaydeep, Riken, Tirth)
