Run ChatGPT-like features fully offline on Android in minutes. No APIs. No cloud. No data leakage.
Most AI SDKs come with:

- ❌ Cloud API dependencies
- ❌ Network latency
- ❌ Privacy tradeoffs
- ❌ Expensive scaling
Axiom AI flips that:
- ✅ 100% on-device inference (powered by llama.cpp)
- ✅ Offline-first (works in airplane mode)
- ✅ Kotlin-first clean API
- ✅ Play Store–safe model delivery
- ✅ Production-ready architecture
Core features:

- 🧠 Local LLM inference (GGUF models)
- ⚡ Streaming text generation
- 📥 Built-in model download manager
- 🧩 Modular architecture
- 📱 Android-optimized performance
Offline AI running on a real Android device (no internet).

*(Add demo GIF/video here)*
Add the dependencies to your module's `build.gradle.kts`:

```kotlin
dependencies {
    implementation("com.axiom:axiom-core:0.1.0")
    implementation("com.axiom:axiom-llama-cpp:0.1.0")
    implementation("com.axiom:axiom-models:0.1.0")
}
```

Then create the engine, grab a model, and initialize:

```kotlin
val engine = LlamaCppEngine()
val modelManager = DefaultModelManager(context)

// Get the recommended model for this device
val model = modelManager.recommend(context)

// Download it if needed
val modelPath = modelManager.download(model) { progress ->
    Log.d("Axiom", "Downloading: $progress")
}

// Initialize the engine
engine.init(LLMConfig(
    modelPath = modelPath,
    contextSize = 1024,
    temperature = 0.7f,
    topK = 40
))
```

Stream tokens as they are generated:

```kotlin
engine.stream("Explain Android in simple terms") { token ->
    print(token)
}
```
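Putting the quick-start pieces together, here is one way to wire the whole flow into a coroutine. The scope, dispatcher, and helper name are our choices for illustration, not SDK requirements:

```kotlin
import android.content.Context
import android.util.Log
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch

// End-to-end sketch: download -> init -> stream -> cleanup.
// Heavy work stays off the main thread; the engine is released in finally.
fun runAssistant(context: Context, scope: CoroutineScope, onToken: (String) -> Unit) {
    scope.launch(Dispatchers.Default) {
        val engine = LlamaCppEngine()
        val manager = DefaultModelManager(context)
        try {
            val model = manager.recommend(context)
            val path = manager.download(model) { p -> Log.d("Axiom", "progress: $p") }
            check(engine.init(LLMConfig(modelPath = path))) { "Engine init failed" }
            engine.stream("Explain Android in simple terms", onToken)
        } finally {
            engine.cleanup()  // release native resources
        }
    }
}
```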
For the sample app, you can manually download and add a model:

1. Download TinyLlama 1.1B (Q4_K_M):

   ```bash
   wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
   ```

2. Add it to the sample app assets:

   ```bash
   mkdir -p sample/src/main/assets/models
   cp tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf sample/src/main/assets/models/tinyllama.gguf
   ```

3. Rebuild the sample app:

   ```bash
   ./gradlew :sample:assembleDebug
   ```
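Assets are packed inside the APK, so a bundled GGUF has to be copied to app storage before the engine can open it by path. The sample app's actual loading code may differ; a minimal sketch of that copy step:

```kotlin
import android.content.Context
import java.io.File

// Copy the bundled model to app-private storage once, then pass the
// returned path to engine.init(). Hypothetical helper for illustration.
fun copyAssetModel(context: Context, assetName: String = "models/tinyllama.gguf"): String {
    val outFile = File(context.filesDir, assetName)
    if (!outFile.exists()) {
        outFile.parentFile?.mkdirs()
        context.assets.open(assetName).use { input ->
            outFile.outputStream().use { output -> input.copyTo(output) }
        }
    }
    return outFile.absolutePath
}
```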
The repo is split into focused modules:

- `axiom-core` → SDK interfaces + config
- `axiom-llama-cpp` → llama.cpp engine binding
- `axiom-models` → model manager + downloader
- `sample-app` → demo implementation
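Because `axiom-core` defines the interfaces and `axiom-llama-cpp` is just one binding, app code can target `LLMEngine` and stay engine-agnostic. A sketch with a hypothetical test double (`FakeEchoEngine` is not part of the SDK):

```kotlin
// A fake engine for tests and previews; implements the axiom-core interface.
class FakeEchoEngine : LLMEngine {
    override suspend fun init(config: LLMConfig) = true
    override suspend fun generate(prompt: String) = "echo: $prompt"
    override suspend fun stream(prompt: String, onToken: (String) -> Unit) {
        generate(prompt).split(" ").forEach { onToken("$it ") }
    }
    override fun cleanup() {}
}

// App code depends only on the interface, so the binding is swappable.
suspend fun summarize(engine: LLMEngine, text: String): String =
    engine.generate("Summarize: $text")
```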
Axiom does NOT bundle models in the APK. Instead:

- Models are fetched from a remote registry
- Downloads run through Android's `DownloadManager`
- Files are stored in app-private storage
- Integrity is verified via checksum

The result:

- ✅ No APK bloat
- ✅ No Play Store policy violations
- ✅ Resumable downloads
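The SDK's verification internals aren't shown here, but a minimal SHA-256 check over a downloaded file, assuming the registry publishes a hex digest, could look like this:

```kotlin
import java.io.File
import java.security.MessageDigest

// Sketch of a post-download integrity check. Assumes the registry entry
// carries a SHA-256 hex digest; the SDK's internal logic may differ.
fun verifyChecksum(file: File, expectedSha256: String): Boolean {
    val digest = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(8192)
        while (true) {
            val read = input.read(buffer)
            if (read == -1) break
            digest.update(buffer, 0, read)
        }
    }
    val actual = digest.digest().joinToString("") { "%02x".format(it) }
    return actual.equals(expectedSha256, ignoreCase = true)
}
```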
| Model | Download size | RAM needed | Speed |
|---|---|---|---|
| TinyLlama 1.1B | ~500MB | 3–4GB | ⚡ Fast |
| TinyMistral 0.2B | ~130MB | 2–3GB | ⚡⚡ Very fast |
| Mistral 7B | ~4GB | 6–8GB | 🐢 Slow but high quality |
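For illustration only, one plausible heuristic behind `modelManager.recommend(context)` is to pick by device RAM. The model IDs below are hypothetical, and the SDK's real selection logic is not shown in this README:

```kotlin
import android.app.ActivityManager
import android.content.Context

// Illustrative RAM-based model selection, mirroring the table above.
fun pickModelId(context: Context): String {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val memInfo = ActivityManager.MemoryInfo().also { am.getMemoryInfo(it) }
    val totalGb = memInfo.totalMem / (1024.0 * 1024.0 * 1024.0)
    return when {
        totalGb >= 8 -> "mistral-7b-q4_k_m"       // hypothetical registry IDs
        totalGb >= 4 -> "tinyllama-1.1b-q4_k_m"
        else -> "tinymistral-0.2b-q4_k_m"
    }
}
```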
The core engine interface:

```kotlin
interface LLMEngine {
    suspend fun init(config: LLMConfig): Boolean
    suspend fun generate(prompt: String): String
    suspend fun stream(prompt: String, onToken: (String) -> Unit)
    fun cleanup()
}
```

Configuration:

```kotlin
data class LLMConfig(
    val modelPath: String,
    val contextSize: Int = 1024,
    val temperature: Float = 0.7f,
    val topK: Int = 40,
    val threads: Int = Runtime.getRuntime().availableProcessors()
)
```

Model management:

```kotlin
ModelManager.fetchRegistry()
ModelManager.download(model) { progress -> }
ModelManager.getInstalledModels()
ModelManager.delete(model)
```
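A small housekeeping sketch built from these calls, assuming `getInstalledModels()` returns descriptors that `delete()` accepts:

```kotlin
import android.content.Context
import android.util.Log

// Sketch: inspect local models and reclaim storage.
suspend fun pruneOneModel(context: Context) {
    val manager = DefaultModelManager(context)
    val installed = manager.getInstalledModels()
    installed.forEach { model -> Log.d("Axiom", "Installed: $model") }
    if (installed.isNotEmpty()) {
        manager.delete(installed.first())  // free space by removing one model
    }
}
```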
Best practices for mobile:

- Use Q4_K_M quantization
- Keep context ≤ 1024 on phones
- Prefer small models for a responsive UX
- Avoid frequent init/cleanup cycles (model loads are expensive)
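A config that follows these guidelines might look like this; the thread cap is our assumption, not an SDK requirement:

```kotlin
// Mobile-tuned LLMConfig following the best practices above.
fun mobileConfig(modelPath: String) = LLMConfig(
    modelPath = modelPath,        // point at a Q4_K_M-quantized GGUF
    contextSize = 1024,           // keep context small on phones
    temperature = 0.7f,
    topK = 40,
    threads = Runtime.getRuntime().availableProcessors().coerceAtMost(4)
)
```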
On privacy:

- All inference runs locally
- No data leaves the device
- No API calls
On the roadmap:

- Streaming API improvements
- Model auto-selection
- GPU / NNAPI acceleration
- Background downloads
- Stable SDK
- Production benchmarks
- UI components (ChatKit)
We welcome contributions! Good places to start:
- Add new model configs
- Improve streaming API
- Optimize memory usage
- Enhance sample app UI
To build from source:

```bash
git clone https://github.com/av-feaster/axiom
cd axiom
./gradlew build
```

Released under the MIT License.
Axiom AI aims to become the "Firebase for on-device AI".
If this project helps you:
- ⭐ Star the repo
- 🧵 Share on Twitter / Reddit
- 🧪 Build something cool
The future of AI is not just in the cloud.
It's on your device.
Made with ❤️ for indie developers