A premium, high-performance offline Android client for running Large Language Models (LLMs) on-device with multimodal OCR and NPU acceleration.
Warning
Local LLM/AI runs AI models entirely on your mobile device. Running large models is highly resource-intensive and requires a modern processor and sufficient RAM (6 GB+). System stability, inference speed, and output quality depend on your hardware. Model weights (such as Qwen, DeepSeek, or Gemma) are not packaged inside the APK and must be downloaded or transferred manually due to their size (roughly 1.4 GB or more per model).
Additionally, this application executes all calculations offline. No internet connection is required after models are downloaded, and no conversational data ever leaves your device.
Local LLM/AI is a modern Android client designed to provide a completely private, offline, and secure conversational AI experience. By integrating Google's MediaPipe Tasks GenAI engine, the app loads and runs lightweight LLMs (like Qwen 2.5, DeepSeek-R1, Phi-2, and Gemma 2B) natively on mobile hardware.
The app routes inference to a different backend depending on the build flavor:
- Normal Flavor: Targets GPU acceleration (Vulkan) for responsive streaming generation with graceful CPU fallback.
- NPU Flavor: Configured to delegate inference directly to the device's system NPU/AI chip via NNAPI.
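
The routing described above can be pictured with a small, self-contained sketch. It is not the app's actual code: `Flavor`, `Backend`, `candidateBackends`, and `initWithFallback` are hypothetical names, and the `Backend` enum is a stand-in for whatever backend type the inference engine exposes. The sketch assumes the normal flavor tries the GPU first and falls back to the CPU when engine creation fails, and that the NPU flavor requests a single default backend.

```kotlin
// Hypothetical stand-ins for illustration only; not the app's real types.
enum class Flavor { NORMAL, NPU }
enum class Backend { CPU, GPU, DEFAULT }

/** Backends to try, in order of preference, for a given build flavor. */
fun candidateBackends(flavor: Flavor): List<Backend> = when (flavor) {
    Flavor.NORMAL -> listOf(Backend.GPU, Backend.CPU) // GPU first, then CPU fallback
    Flavor.NPU -> listOf(Backend.DEFAULT)             // assumption: one default backend
}

/** Tries each candidate and returns the first backend whose engine initialises. */
fun <T> initWithFallback(flavor: Flavor, create: (Backend) -> T): Pair<Backend, T> {
    var lastError: Exception? = null
    for (backend in candidateBackends(flavor)) {
        try {
            return backend to create(backend)
        } catch (e: Exception) {
            lastError = e // remember the failure and try the next candidate
        }
    }
    throw IllegalStateException("No backend could be initialised for $flavor", lastError)
}

fun main() {
    // Simulate a device on which the GPU path fails to initialise.
    val (backend, engine) = initWithFallback(Flavor.NORMAL) { b ->
        if (b == Backend.GPU) throw RuntimeException("GPU unavailable")
        "engine on $b"
    }
    println("$backend: $engine") // CPU: engine on CPU
}
```

Keeping the preference order in one function makes the fallback policy easy to test without touching real hardware.
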
The app wraps this powerful local engine in a premium, fluid Jetpack Compose (Material 3) user interface featuring offline OCR document parsing, video/file media integration, and background download handling.
The app includes built-in presets for several highly-capable, lightweight models optimized for mobile execution. Below are their approximate download sizes and memory requirements:
| Model | Developer | Parameters | Approx. Size | Min. RAM Requirement |
|---|---|---|---|---|
| Qwen 2.5 1.5B Instruct | Alibaba | 1.5B | ~1.6 GB | 6 GB+ |
| DeepSeek-R1 Distill Qwen 1.5B | DeepSeek | 1.5B | ~1.6 GB | 6 GB+ |
| Gemma 1.1 2B IT | Google | 2B | ~1.4 GB | 8 GB+ |
| Phi-2 2.7B | Microsoft | 2.7B | ~1.6 GB | 8 GB+ |
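
As a rough illustration of how the table could drive a preset picker, the sketch below filters the four presets by device RAM and free storage. The `ModelPreset` class and `presetsFor` function are hypothetical and not taken from the app; the sizes and RAM figures are the approximate values from the table.

```kotlin
// Hypothetical model of the preset table; values are the approximate figures listed above.
data class ModelPreset(
    val name: String,
    val developer: String,
    val approxSizeGb: Double,
    val minRamGb: Int,
)

val presets = listOf(
    ModelPreset("Qwen 2.5 1.5B Instruct", "Alibaba", 1.6, 6),
    ModelPreset("DeepSeek-R1 Distill Qwen 1.5B", "DeepSeek", 1.6, 6),
    ModelPreset("Gemma 1.1 2B IT", "Google", 1.4, 8),
    ModelPreset("Phi-2 2.7B", "Microsoft", 1.6, 8),
)

/** Returns the presets that fit both the device RAM and the free storage. */
fun presetsFor(deviceRamGb: Int, freeStorageGb: Double): List<ModelPreset> =
    presets.filter { it.minRamGb <= deviceRamGb && it.approxSizeGb <= freeStorageGb }

fun main() {
    // A 6 GB device with 4 GB of free storage: only the two 1.5B models qualify.
    presetsFor(deviceRamGb = 6, freeStorageGb = 4.0).forEach { println(it.name) }
}
```

On a real device the RAM figure could come from `ActivityManager.MemoryInfo.totalMem`; whether the app does this is not stated here.
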
| Inference | Multimodal & OCR (100% Offline) |
|---|---|
| High-performance offline LLM execution | Attach Images, Videos & Documents (PDF, Code, Text) |
| Dual-flavor release (normal Vulkan GPU & npu routing) | Offline image OCR text extraction using Google ML Kit |
| Graceful CPU fallback optimization | Offline page-by-page PDF rendering and text recognition |
| Streaming word-by-word responses | Play back attached videos natively and view documents via Intent |
| UI / Experience | Core Features |
|---|---|
| Premium Material 3 dynamic styling | Complete offline privacy (no logs or tracking) |
| Custom system instructions prompt | Large model memory size & RAM badges in-app |
| Interactive file attachments preview drawer | Multi-turn chat context memory (6-turn history) |
| Collapsible OCR logs under bubble cards | Quantized weights optimizations |
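
The 6-turn history and custom system instructions listed above could be combined roughly as follows. This is a minimal sketch under stated assumptions: `Turn` and `ChatHistory` are hypothetical names, and the plain `User:` / `Assistant:` prompt layout is illustrative only, since each model family normally expects its own chat template.

```kotlin
// Hypothetical types for illustration; the app's real prompt format is not documented here.
data class Turn(val user: String, val assistant: String)

class ChatHistory(private val maxTurns: Int = 6) {
    private val turns = ArrayDeque<Turn>()

    val size: Int get() = turns.size

    /** Appends a completed turn and drops the oldest ones beyond the limit. */
    fun add(turn: Turn) {
        turns.addLast(turn)
        while (turns.size > maxTurns) turns.removeFirst()
    }

    /** Builds a prompt from the system instructions, the retained turns, and the new message. */
    fun buildPrompt(systemInstructions: String, newMessage: String): String = buildString {
        if (systemInstructions.isNotBlank()) appendLine(systemInstructions)
        for (t in turns) {
            appendLine("User: ${t.user}")
            appendLine("Assistant: ${t.assistant}")
        }
        append("User: $newMessage")
    }
}

fun main() {
    val history = ChatHistory()
    for (i in 1..8) history.add(Turn("question $i", "answer $i"))
    println(history.size) // 6 (turns 3 to 8 are retained)
    println(history.buildPrompt("Be concise.", "question 9"))
}
```

Capping the history keeps the prompt within the small context budgets typical of on-device models, at the cost of forgetting older turns.
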
Grab the latest compiled APKs from the GitHub releases page.
We compile two separate releases for each update:
- Normal Release (app-normal-release-unsigned.apk): Optimized for general devices using mobile GPU (Vulkan) or CPU.
- NPU Release (app-npu-release-unsigned.apk): Designed for modern phones featuring specialized AI chips (NPU), utilizing neural network API routing (LlmInference.Backend.DEFAULT). Includes the .npu application suffix so you can install both releases side-by-side.
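
For readers curious how two flavors can be installed side-by-side, a Gradle Kotlin DSL fragment along these lines would produce that layout. It is a sketch, not the project's actual build file: the flavor names are inferred from the APK names above, and the dimension name `backend` is an assumption.

```kotlin
// app/build.gradle.kts (fragment). The dimension name "backend" is hypothetical.
android {
    flavorDimensions += "backend"
    productFlavors {
        create("normal") {
            dimension = "backend"
        }
        create("npu") {
            dimension = "backend"
            // A distinct application ID lets both APKs coexist on one device.
            applicationIdSuffix = ".npu"
        }
    }
}
```

Because Android identifies apps by application ID, the suffix makes the NPU build a separate install with its own data directory.
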
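To build the application yourself, make sure Java 17 and the Android SDK are installed. In PowerShell, point JAVA_HOME at your JDK 17 installation (the path shown is one example; substitute your own), then run the Gradle task for the flavor you want: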
Normal flavor:

    $env:JAVA_HOME = "C:\Users\Badsiwal\.gradle\jdks\eclipse_adoptium-17-amd64-windows.2"
    ./gradlew assembleNormalRelease

NPU flavor:

    $env:JAVA_HOME = "C:\Users\Badsiwal\.gradle\jdks\eclipse_adoptium-17-amd64-windows.2"
    ./gradlew assembleNpuRelease

Local LLM/AI is built on top of state-of-the-art on-device intelligence libraries and modern Android components.
Special thanks to:
- Google MediaPipe Tasks GenAI
- Google ML Kit Text Recognition
- Jetpack Compose & Material 3
- Coil Image Loading Library
- OkHttp
- Kotlin Coroutines Flow
Local LLM/AI is licensed under the MIT License. See LICENSE for details.
Local LLM/AI is an independent, unofficial project. It is not affiliated with, funded, authorized, endorsed by, or associated with Google LLC, MediaPipe, Gemma, or any of their affiliates.
All trademarks, service marks, catalogs, artwork, metadata, and model weights remain the property of their respective owners. Users are responsible for procuring and loading model files in compliance with the respective model's terms of use, license agreements, and regional requirements.

