Talk to your phone. It listens. It acts.
AndroMolt is the world's first fully native Android AI agent that autonomously controls your device using natural language — no scripting, no tapping, no limits.
AndroMolt redefines how humans interact with Android. Instead of tapping through menus, you just describe what you want — in plain English — and the AI agent does it for you. Autonomously. Intelligently. Instantly.
You say: "Open WhatsApp and send good morning to mom"
"Play the latest Bollywood playlist on YouTube"
"Open Settings and turn on Wi-Fi then open Chrome and search the weather"
AndroMolt: ✅ Done.
Unlike screen-recording or macro tools, AndroMolt uses live AI reasoning at every step — reading the actual UI tree, planning the next action, and adapting when things don't go as expected. It's not a script. It's an agent.
- The entire agent — observe, plan, act — runs in Kotlin on a background thread, and keeps working even when the app is backgrounded.
- Reads Android's Accessibility Tree in real time, understanding every button, text field, and scroll view — exactly like a human would.
- Seamlessly switches between OpenAI GPT-4o-mini and Google Gemini 2.0 Flash, falling back to rule-based heuristics if no key is provided.
- Detects when it's stuck (same screen 4+ times) and automatically retries by pressing Back and replanning.
- Clean, premium chat interface with live step-by-step logs, reasoning traces, and real-time action feedback.
- All UI interaction happens locally on your device. Only the UI snapshot and goal are sent to the LLM API — never screenshots or personal data.
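That "UI snapshot" is plain text, not pixels. A minimal sketch of the kind of compression `UiTreeBuilder` performs, assuming a simplified node type — the `UiNode` shape, field names, and output format here are illustrative, not the actual implementation; only the 150-element cap is documented:

```kotlin
// Hypothetical node type standing in for an Android AccessibilityNodeInfo.
data class UiNode(
    val className: String,
    val text: String = "",
    val contentDesc: String = "",
    val clickable: Boolean = false,
    val children: List<UiNode> = emptyList(),
)

// Flatten the tree depth-first, keeping at most `maxElements` rows that carry
// some signal (text, description, or clickability), indexed so the LLM can
// answer with click_by_index.
fun buildSnapshot(root: UiNode, maxElements: Int = 150): String {
    val rows = mutableListOf<String>()
    fun visit(node: UiNode) {
        if (rows.size >= maxElements) return
        if (node.text.isNotBlank() || node.contentDesc.isNotBlank() || node.clickable) {
            rows += buildString {
                append("[${rows.size}] ${node.className.substringAfterLast('.')}")
                if (node.text.isNotBlank()) append(" text=\"${node.text}\"")
                if (node.contentDesc.isNotBlank()) append(" desc=\"${node.contentDesc}\"")
                if (node.clickable) append(" clickable")
            }
        }
        node.children.forEach(::visit)
    }
    visit(root)
    return rows.joinToString("\n")
}
```

A snapshot like `[0] Button text="Send" clickable` gives the model something small and stable to reason over, instead of a raw screenshot.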
```
╔═══════════════════════════════════════════════════════════╗
║              React Native / Expo (UI Layer)               ║
║              components/ChatInterface.tsx                 ║
║   • Natural Language Input    • Live Agent Logs           ║
║   • Progress Indicators       • App Picker                ║
╚═══════════════════╦═══════════════════════════════════════╝
                    ║  NativeModules.AndroMoltCore
                    ║  .runNativeAgent(goal, apiKeys)
                    ▼
╔═══════════════════════════════════════════════════════════╗
║           Kotlin / Android Native (Agent Core)            ║
║                                                           ║
║   AndroMoltCoreModule.kt  ←── React Native Bridge         ║
║   └── NativeAgentLoop.kt  (Background Thread)             ║
║         ├── NativeLlmClient.kt                            ║
║         │     ├── OpenAI GPT-4o-mini                      ║
║         │     ├── Google Gemini 2.0 Flash                 ║
║         │     └── FallbackHeuristics.kt                   ║
║         └── AccessibilityController.kt                    ║
║               └── AndroMoltAccessibilityService.kt        ║
╚═══════════════════════════════════════════════════════════╝
```
The TypeScript layer is UI-only. All automation logic — the agent loop, LLM calls, and UI interaction — runs entirely in Kotlin on a native Android background thread.
Each step inside NativeAgentLoop.kt follows a clean OBSERVE → PLAN → ACT → SETTLE cycle:
| Phase | What happens |
|---|---|
| 👁️ OBSERVE | AccessibilityController.getUiSnapshot() reads the live screen. UiTreeBuilder compresses it to a max-150-element text snapshot the LLM can reason about. |
| 🧠 PLAN | NativeLlmClient.getNextAction() sends the goal + screen snapshot to the LLM, which responds with a single structured JSON action. |
| ⚡ ACT | AccessibilityController executes the action — click, type, scroll, launch, swipe — via the Android Accessibility API. |
| ⏳ SETTLE | Waits 2000ms for the UI to update, then loops back to OBSERVE. |
Stuck detection: if the same screen hash appears 4+ times with 2+ consecutive failures, the agent automatically presses Back and retries with a new plan.
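The cycle above, including the stuck check, can be sketched in plain Kotlin. In this sketch observe, plan, and act are injected as functions so the control flow is visible and testable without Android; every name is illustrative, and only the documented thresholds (4 repeated screens, 2 consecutive failures, 2000 ms settle, 20-step budget) come from the project itself:

```kotlin
data class StepResult(val success: Boolean, val done: Boolean = false)

// Illustrative skeleton of an OBSERVE → PLAN → ACT → SETTLE loop.
fun runAgentLoop(
    observe: () -> String,                        // OBSERVE: screen snapshot text
    plan: (snapshot: String) -> String,           // PLAN: next action from the LLM
    act: (action: String) -> StepResult,          // ACT: execute via accessibility
    settle: () -> Unit = { Thread.sleep(2000) },  // SETTLE: let the UI update
    maxSteps: Int = 20,
): Boolean {
    val screenCounts = mutableMapOf<Int, Int>()
    var consecutiveFailures = 0
    repeat(maxSteps) {
        val snapshot = observe()
        val hash = snapshot.hashCode()
        screenCounts[hash] = (screenCounts[hash] ?: 0) + 1
        // Stuck: same screen seen 4+ times with 2+ failures in a row.
        val action = if (screenCounts.getValue(hash) >= 4 && consecutiveFailures >= 2) {
            screenCounts.remove(hash)
            consecutiveFailures = 0
            "back" // retreat, then replan on the next iteration
        } else {
            plan(snapshot)
        }
        val result = act(action)
        if (result.done) return true
        consecutiveFailures = if (result.success) 0 else consecutiveFailures + 1
        settle()
    }
    return false // step budget exhausted
}
```

Hashing the snapshot text is one cheap way to detect "same screen"; the real loop may use a different fingerprint.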
NativeLlmClient.kt tries providers in this priority order:
1. OpenAI GPT-4o-mini (if `EXPO_PUBLIC_OPENAI_API_KEY` is set)
2. Google Gemini 2.0 Flash (if `EXPO_PUBLIC_GEMINI_API_KEY` is set)
3. FallbackHeuristics (rule-based, no API required; limited)
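The fallthrough is simple key-presence logic, which a few lines of Kotlin can capture (the function and enum names here are illustrative, not the actual `NativeLlmClient` API):

```kotlin
enum class Provider { OPENAI, GEMINI, HEURISTICS }

// Pick the highest-priority provider whose API key is configured,
// mirroring the documented order above.
fun selectProvider(openAiKey: String?, geminiKey: String?): Provider = when {
    !openAiKey.isNullOrBlank() -> Provider.OPENAI
    !geminiKey.isNullOrBlank() -> Provider.GEMINI
    else -> Provider.HEURISTICS
}
```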
AndroMolt is just getting started. Here's what's coming — and it's going to be massive.
- Model Context Protocol support — expose your Android device as an MCP resource. Control your phone from any MCP-compatible AI client, IDE, or custom tool.
- Control your Android device from anywhere — PC, browser, or another device. Trigger automations remotely via a secure web dashboard or REST API.
- A completely redesigned, ultra-premium interface — animated agent status, live visual overlays showing what the agent is clicking, and a beautiful command history timeline.
- Plug in any LLM — Anthropic Claude, Mistral, Cohere, DeepSeek, OpenRouter, and more. Choose your preferred AI brain per task or let AndroMolt auto-select the best one.
- Run the agent 100% offline with local models via Ollama, LM Studio, or llama.cpp. Full privacy, zero API costs, works without internet.
- Scheduled automations, multi-device orchestration, voice command mode, workflow templates, plugin SDK for developers, and a community marketplace.
- Android device or emulator (API 26+)
- Node.js 18+ and npm
- Android Studio with Android SDK
```bash
git clone https://github.com/JASIM0021/AndroMolt.git
cd AndroMolt
npm install
```

Create a `.env` file in the project root:

```bash
EXPO_PUBLIC_OPENAI_API_KEY=sk-...
EXPO_PUBLIC_GEMINI_API_KEY=AIza...
```

At least one key is required for full functionality. Without either key, the agent falls back to rule-based heuristics.

```bash
npx expo run:android
```

The app needs Accessibility Service permission to control other apps:
- Tap "Enable Accessibility" on the onboarding screen
- Android Settings opens — find AndroMolt in the Accessibility list
- Enable it and return to the app
- 🎉 Start automating!
See AndroMolt in action — zero tapping, zero scripting, just talk.
- `install_app_agent.mp4`
- `play_yt_songs.mp4`
💬 "Open YouTube and play a Hindi song"
💬 "Send good morning to didi on WhatsApp"
💬 "Open Chrome and search for today's weather"
💬 "Open Settings and turn on Wi-Fi"
💬 "Open Instagram and like the first post"
💬 "Set an alarm for 7 AM tomorrow"
💬 "Open Spotify, search for lo-fi music, and play the first result"
💬 "Go to Amazon, search for wireless earbuds under ₹2000"
| Action | Description |
|---|---|
| `open_app` | Launch any app by package name |
| `click_by_text` | Click the element with matching visible text |
| `click_by_content_desc` | Click an element by its accessibility description |
| `click_by_index` | Click an element by its index in the UI tree |
| `input_text` | Type text into the focused field |
| `press_enter` | Press Enter / the keyboard Search button |
| `scroll` | Scroll up or down |
| `back` | Press the system Back button |
| `wait` | Wait N milliseconds |
| `complete_task` | Mark the goal as achieved ✅ |
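The planner answers with one structured JSON action per step, and that vocabulary maps naturally onto a small sealed hierarchy. A sketch of how a decoded payload could be dispatched — the type, function, and parameter key names are illustrative, not the actual project types:

```kotlin
sealed class AgentAction {
    data class OpenApp(val packageName: String) : AgentAction()
    data class ClickByText(val text: String) : AgentAction()
    data class ClickByContentDesc(val desc: String) : AgentAction()
    data class ClickByIndex(val index: Int) : AgentAction()
    data class InputText(val text: String) : AgentAction()
    data class Scroll(val direction: String) : AgentAction()
    data class Wait(val millis: Long) : AgentAction()
    object PressEnter : AgentAction()
    object Back : AgentAction()
    object CompleteTask : AgentAction()
}

// Map a decoded {"action": ..., "params": {...}} pair onto a typed action.
// Unknown names fail fast so the loop can retry with a fresh plan.
fun parseAction(name: String, params: Map<String, String>): AgentAction = when (name) {
    "open_app" -> AgentAction.OpenApp(params.getValue("package"))
    "click_by_text" -> AgentAction.ClickByText(params.getValue("text"))
    "click_by_content_desc" -> AgentAction.ClickByContentDesc(params.getValue("desc"))
    "click_by_index" -> AgentAction.ClickByIndex(params.getValue("index").toInt())
    "input_text" -> AgentAction.InputText(params.getValue("text"))
    "press_enter" -> AgentAction.PressEnter
    "scroll" -> AgentAction.Scroll(params["direction"] ?: "down")
    "back" -> AgentAction.Back
    "wait" -> AgentAction.Wait(params["ms"]?.toLong() ?: 1000L)
    "complete_task" -> AgentAction.CompleteTask
    else -> error("Unknown action: $name")
}
```

Typing the action set this way keeps the executor exhaustive: a `when` over `AgentAction` will not compile if a new action is added without a handler.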
`EventBridge.kt` emits real-time events via `DeviceEventEmitter`. The chat UI listens and renders them as live, step-by-step logs:
| Event | Payload |
|---|---|
| `agentStart` | `{ goal }` |
| `agentStep` | `{ step, package, elementCount }` |
| `agentThink` | `{ message }` |
| `agentAction` | `{ action, params, reasoning }` |
| `actionResult` | `{ success, message }` |
| `agentComplete` | `{ steps, message }` |
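On the Kotlin side, that stream can be modeled as a typed hierarchy plus a fan-out bus. A minimal JVM-only sketch — the class names are illustrative, and the real `EventBridge.kt` serializes payloads through React Native's `DeviceEventEmitter` rather than in-process listeners:

```kotlin
// Illustrative typed events mirroring the table above.
sealed class AgentEvent {
    data class Start(val goal: String) : AgentEvent()
    data class Step(val step: Int, val packageName: String, val elementCount: Int) : AgentEvent()
    data class Think(val message: String) : AgentEvent()
    data class Action(val action: String, val reasoning: String) : AgentEvent()
    data class Result(val success: Boolean, val message: String) : AgentEvent()
    data class Complete(val steps: Int, val message: String) : AgentEvent()
}

// Minimal stand-in for the bridge: fan each event out to every subscriber,
// the way the chat UI subscribes to live agent logs.
class EventBus {
    private val listeners = mutableListOf<(AgentEvent) -> Unit>()
    fun subscribe(listener: (AgentEvent) -> Unit) { listeners += listener }
    fun emit(event: AgentEvent) = listeners.forEach { it(event) }
}
```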
```
AndroMolt/
├── app/
│   └── (tab)/index.tsx          # Entry point → renders ChatInterface
├── components/
│   ├── ChatInterface.tsx        # Main UI: chat, live logs, progress
│   └── OnboardingScreen.tsx     # Accessibility permission setup
├── lib/
│   ├── automation/
│   │   └── AgentEvents.ts       # AgentEvent type (used by chat UI)
│   └── stores/
│       └── automationStore.ts   # Zustand: chat history + state
├── types/
│   ├── agent.ts                 # AgentAction / AgentResult interfaces
│   └── automation.ts            # Shared automation types
├── constants/
│   └── theme.ts                 # UI colors and fonts
└── android/app/src/main/java/com/anonymous/androMolt/
    ├── agent/
    │   ├── NativeAgentLoop.kt            # 🔁 OBSERVE-PLAN-ACT loop
    │   ├── NativeLlmClient.kt            # 🧠 LLM API calls (OpenAI / Gemini)
    │   └── FallbackHeuristics.kt         # 🛡️ Rule-based fallback
    ├── accessibility/
    │   ├── AndroMoltAccessibilityService.kt  # Android Accessibility Service
    │   ├── AccessibilityController.kt        # click / type / scroll / launch
    │   └── UiTreeBuilder.kt                  # UI tree → LLM-readable text
    ├── service/
    │   └── AndroMoltForegroundService.kt     # Keeps agent alive in background
    ├── utils/
    │   ├── EventBridge.kt                    # Emits events to React Native JS
    │   └── PermissionHelper.kt               # Permission checking
    └── modules/
        └── AndroMoltCoreModule.kt            # @ReactMethod bridge exports
```
| Layer | Technology |
|---|---|
| 📱 UI Framework | React Native + Expo 54 |
| 🗺️ Navigation | Expo Router |
| 🗃️ State Management | Zustand |
| 🤖 AI / LLM | OpenAI GPT-4o-mini · Google Gemini 2.0 Flash |
| ⚙️ Native Automation | Kotlin + Android Accessibility Service |
| 🌐 Native HTTP | OkHttp3 |
| 🔨 Build System | Gradle + Expo Prebuild |
- Android only (Kotlin + Accessibility Service)
- Accessibility Service permission must be granted manually in Settings
- Screen reading is text-based — purely graphical elements with no text or description may require `click_by_index`
- Maximum 20 steps per task (configurable)
- LLM API keys required for best results; heuristics cover basic flows only
Contributions are what make the open-source community an incredible place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.