Voot, standing for "Voice On Top," is an intelligent simultaneous-interpretation & text translation app for HarmonyOS, powered by your own LLM / translation APIs.
It is designed with three core principles: security, privacy, and simplicity.
Note
Voot does not provide or resell any LLM/translation service.
You bring your own API keys (OpenAI, DeepL, Ollama, Doubao, etc.).
Voot has launched on Huawei AppGallery (Overseas). Note: you need an overseas internet environment to access it. Release updates will still be available on GitHub for sideloading, but we strongly recommend following the AppGallery listing for the latest version.
- Features
- Architecture
- Screenshots
- Getting Started
- Install Hap
- Configuration
- Usage
- Security & Privacy
- Roadmap
- Blueprints
- Contributing
- Model Performance
- Known Issues
- Acknowledgements
- License
- Disclaimer
- Secure by design
  - No built-in or hosted model; you must configure your own API keys.
  - API keys are stored only in the HarmonyOS sandbox, protected by face / biometric unlock.
  - No third-party analytics SDKs.
- Privacy-first
  - Audio is captured and pre-processed locally on-device.
  - Recorded audio for translation is not uploaded and is destroyed after processing.
  - Only the minimal text required for translation is sent directly to the provider you configure.
- Multi-provider support
  - OpenAI (GPT-style chat / translation)
  - DeepL
  - Ollama (local LLM gateway)
  - Doubao / other custom endpoints (via configurable URL & API key)
- Simultaneous interpretation
  - One-tap start/stop of "live" translation.
  - Clear split between original text and translated text.
- Device Continuation
  - Seamlessly transfer your active translation session to another HarmonyOS device (e.g., from Phone to Tablet).
  - Keeps your current transcription and translation context intact.
- Subtitles
  - Floating subtitle window that works over other apps.
  - Resizable and movable overlay for seamless multitasking.
- Desktop Widgets
  - Control Card: Start/stop subtitle and interpretation directly from the home screen.
  - Token Card: Monitor your API token usage without opening the app.
- Air Gestures
  - Control translation start/stop without touching the screen.
  - Ideal for hands-free operation during presentations or cooking.
- Text Polishing
  - Improve the quality and tone of translated text.
  - Refine rough translations into more natural and professional language.
- Scan & Translate
  - Scan text from physical documents or screens using the camera.
  - Instantly translate scanned text, with the option to save the result.
Voot/
├─ entry/
│  ├─ src/main/ets/
│  │  ├─ pages/              # ArkUI pages (Index, Configuration, Translation, Settings, etc.)
│  │  ├─ services/           # Mic + ASR services (SherpaWhisperMicService, PipSubtitleManager)
│  │  ├─ storage/            # Preference-backed stores (API config, TokenUsage, etc.)
│  │  ├─ components/         # Shared UI builders (PolicySheet, TokenUsageChart, etc.)
│  │  ├─ widget/             # Service Cards (Desktop Widgets)
│  │  ├─ entryformability/   # Widget lifecycle management
│  │  └─ workers/            # Background ASR workers for long-running capture
│  ├─ src/main/resources/    # Raw HTML, media assets, Sherpa models
│  ├─ oh-package*.json5      # Module package definitions
│  └─ build-profile.json5    # Entry module build settings
├─ AppScope/                 # Application-level configuration and assets
├─ hvigorfile.ts             # Workspace hvigor build script
└─ build-profile.json5       # Global build profile
- HarmonyOS toolchain:
  - DevEco Studio with ArkTS support
  - HarmonyOS SDK (version matching the project; currently 6.0.1(21))
- A HarmonyOS device or emulator
- One or more API keys, for example:
  - OpenAI API key
  - DeepL API key
  - Ollama endpoint running locally or on LAN
  - Doubao / other compatible HTTP API
git clone https://github.com/YANGZX22/Voot.git
cd Voot
- Open the project in DevEco Studio.
- Connect a HarmonyOS device or start an emulator.
- In DevEco Studio, select the run configuration corresponding to the app.
- Click Run to build and deploy.
Alternatively, you can install the app with Auto-installer or DevEco Testing.
Important
Huawei's signing servers block IP addresses outside mainland China. This affects sideloading software for HarmonyOS NEXT in countries/regions outside mainland China.
Note
Apps sideloaded via self-signing on HarmonyOS NEXT have a default validity period of 14 days. Completing Developer Real-Name Authentication extends this period to 180 days.
In the "Configure API" (配置 API) tab:
- Choose the current provider (e.g. OpenAI, DeepL, Ollama, Doubao).
- Tap "Configure API" (配置 API).
- For each provider, fill in:
  - API URL (e.g. https://api.openai.com/v1/chat/completions, https://api-free.deepl.com/v2/translate, or your Ollama endpoint)
  - API Key / Token
  - Optional: custom prompt / system message used for translation.
The configuration is stored locally in the sandbox, and face / biometric verification is required to access or modify it.
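As a rough illustration of how the fields above could map onto a provider request, here is a minimal TypeScript sketch. It is not Voot's actual code: `ProviderConfig`, `HttpRequest`, and `buildTranslationRequest` are hypothetical names, and the fallback system prompt is an assumption. The header and body shapes follow the public OpenAI Chat Completions and DeepL `/v2/translate` conventions; note that DeepL expects a language code such as `ZH`, while a chat-style prompt can take a language name.

```typescript
// Hypothetical types for illustration only; not taken from the Voot source.
type Provider = "openai" | "deepl";

interface ProviderConfig {
  provider: Provider;
  apiUrl: string;
  apiKey: string;
  model?: string;        // only meaningful for chat-style providers
  systemPrompt?: string; // optional custom prompt from the configuration form
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) the HTTP request for one translation call.
function buildTranslationRequest(cfg: ProviderConfig, text: string, targetLang: string): HttpRequest {
  if (cfg.provider === "deepl") {
    return {
      url: cfg.apiUrl,
      method: "POST",
      headers: {
        "Authorization": `DeepL-Auth-Key ${cfg.apiKey}`,
        "Content-Type": "application/json",
      },
      // DeepL takes an array of texts and a target language code (e.g. "ZH").
      body: JSON.stringify({ text: [text], target_lang: targetLang }),
    };
  }
  // Chat-style provider: the system message carries the translation instruction.
  const system = cfg.systemPrompt
    ?? `Translate the user's text into ${targetLang}. Reply with the translation only.`;
  return {
    url: cfg.apiUrl,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${cfg.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: cfg.model ?? "gpt-4o-mini",
      messages: [
        { role: "system", content: system },
        { role: "user", content: text },
      ],
    }),
  };
}
```

Keeping request construction separate from the network call makes it easy to check that the API key only ever appears in the request sent to the configured provider URL.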
In the "Target language" (目标语言) section:
- Select your default output language (e.g. Chinese, English).
- The chosen target language is used for all translation APIs by default.
In the "Glossary" (术语库) menu:
- Enter term pairs in the format Original = Translation (one per line).
- Example:
  HarmonyOS = 鸿蒙
  AI = 人工智能
- These terms are automatically appended to the system prompt, instructing the LLM to strictly follow your terminology.
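The glossary behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Voot's implementation: `parseGlossary`, `appendGlossary`, and the exact wording appended to the prompt are hypothetical.

```typescript
// Hypothetical helper types and names, for illustration only.
interface GlossaryEntry {
  source: string;
  target: string;
}

// Parses "Original = Translation" lines; blank or malformed lines are skipped.
function parseGlossary(raw: string): GlossaryEntry[] {
  const entries: GlossaryEntry[] = [];
  for (const line of raw.split(/\r?\n/)) {
    const idx = line.indexOf("=");
    if (idx < 0) continue; // no separator on this line
    const source = line.slice(0, idx).trim();
    const target = line.slice(idx + 1).trim();
    if (source && target) entries.push({ source, target });
  }
  return entries;
}

// Appends the term pairs to the system prompt (the wording is an assumption).
function appendGlossary(systemPrompt: string, entries: GlossaryEntry[]): string {
  if (entries.length === 0) return systemPrompt;
  const rules = entries.map((e) => `- "${e.source}" -> "${e.target}"`).join("\n");
  return `${systemPrompt}\n\nStrictly use the following terminology:\n${rules}`;
}
```

Splitting on the first `=` only means a translation may itself contain an equals sign.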
- Launch Voot on your HarmonyOS device.
- Configure API:
  - Go to the first tab, Configuration.
  - Select an API provider (OpenAI, DeepL, etc.) and enter your API Key/URL.
  - Set your Target Language.
- Live Translation (翻译):
  - Switch to the Translation tab.
  - Tap "Turn on microphone" (开启麦克风) to start capturing audio.
  - Speak in the source language; the app will transcribe and translate in real time.
  - Air Gestures: Wave your hand above the front camera to start/stop translation without touching the screen.
  - Device Continuation: Tap the Transfer (流转) icon to move the session to another HarmonyOS device.
- Text Polishing (润色):
  - Switch to the Polishing tab.
  - Input or paste text that needs refinement.
  - The AI will improve the tone, grammar, and clarity of the text.
- Scan & Translate (扫描):
  - Switch to the Scan tab.
  - Point the camera at a document or screen.
  - The app will recognize the text and provide an instant translation.
  - You can save the scanned results to History.
Short summary (see in-app privacy policy / privacy.html for details):
- Audio:
  - Recorded only on device for the current translation session.
  - Not uploaded to our servers (we have none).
  - Discarded after processing.
- API Keys:
  - Stored in the app sandbox.
  - Protected with HarmonyOS face/biometric mechanisms.
  - Never transmitted to any server except the provider you configured.
- Data Flow:
  - Text is sent only to your chosen provider (OpenAI / DeepL / etc.).
  - No central logging, analytics, or telemetry from the developer.
Finished / planned / possible steps:
- Subtitle (Implemented)
- Live Window on HarmonyOS (Implemented)
- Desktop Widgets (Implemented)
- Token usage analytics (Implemented)
- Glossary / Terminology Support (Implemented)
- Device Continuation (Implemented)
- History & Favorites (Implemented)
- Air Gestures (Implemented)
- Text Polishing (Implemented)
- Scan & Translate (Implemented)
- Pose Detection Button Dialog (Implemented)
- Support for more LLM / translation APIs (e.g. Google Translate)
- Enhanced ASR and cutoff logic
- More supported original languages
Feel free to open issues or PRs with feature requests.
Moving beyond lossy "Speech-to-Text-to-Translation" pipelines:
- Direct Audio Input: Send VAD-filtered audio segments directly to multimodal models (e.g., GPT-4o Audio, Gemini 1.5 Pro).
- Nuance Capture: Preserve tone, emotion (sarcasm, urgency), and speaker identity, which are often lost in ASR.
- Feedback Loop: Use the rich understanding from the multimodal engine to "feed back" into the frontend, correcting previous ASR errors or updating the context for the Fast Track.
- Implement a confidence scoring system that highlights translated text based on the model's certainty.
- Low-confidence segments could be visually marked (e.g., colored or underlined), prompting users to review or wait for the Slow Track refinement.
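One way the confidence-scoring idea could look is sketched below. Everything here is an assumption for illustration: the source does not say how confidence would be computed, so this sketch uses the geometric mean of per-token probabilities (derived from token log-probabilities, which some chat APIs can return), and the thresholds `0.5` and `0.8` and all names are hypothetical.

```typescript
// Hypothetical names and thresholds; a sketch of one possible scoring scheme.
type Highlight = "none" | "underline" | "warn";

interface Segment {
  text: string;
  confidence: number; // expected range: 0..1
}

// Geometric mean of token probabilities, computed from natural-log probabilities.
function confidenceFromLogprobs(logprobs: number[]): number {
  if (logprobs.length === 0) return 0; // no evidence: treat as lowest confidence
  const mean = logprobs.reduce((sum, lp) => sum + lp, 0) / logprobs.length;
  return Math.exp(mean);
}

// Maps a confidence value to a visual marker using two assumed thresholds.
function highlightFor(confidence: number, low = 0.5, medium = 0.8): Highlight {
  if (confidence < low) return "warn";
  if (confidence < medium) return "underline";
  return "none";
}

// Annotates each segment with the highlight the UI could apply.
function markSegments(segments: Segment[]): Array<Segment & { highlight: Highlight }> {
  return segments.map((s) => ({ ...s, highlight: highlightFor(s.confidence) }));
}
```

Segments marked `warn` would be the ones a user is prompted to review, or that wait for the Slow Track refinement.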
Contributions are welcome!
- Fork the repo.
- Create a new branch for your feature/bugfix.
- Make changes and add tests where appropriate.
- Submit a pull request with a clear description and screenshots if UI-related.
Before submitting, please:
- Do not commit any real API keys or secrets.
- Ensure the app builds and runs on the current HarmonyOS SDK version.
- Models that performed well in testing with this app:
  - OpenAI: gpt-4o-mini (preferred), gpt-4o
  - DeepL: Free and Pro both perform well (preferred)
  - Doubao (Volcano Engine): DeepSeek v3.2 (deepseek-v3-2-251201) (preferred)
- Models that performed poorly in testing with this app:
  - OpenAI: gpt-5 and all thinking-model series (for translation tasks)
  - Doubao (Volcano Engine): Doubao-Seed-1.6-lite (doubao-seed-1-6-lite-251015)
- ASR accuracy may vary based on background noise and microphone quality.
- Some API providers may have rate limits or usage costs; monitor your usage carefully.
- When tapping the Interpretation (传译) button in the desktop widget, the app may not open interpretation correctly due to HarmonyOS restrictions. [Fixed]
- The subtitle floating window may have layout issues on certain screen sizes, e.g. Pad. You can try adjusting the size of the floating window manually. An update may fix this in the future. [Fixed]
- The Doubao API consistently gives delayed responses when Lite models are used. This may be caused by Doubao's server side rather than the app itself. [To be confirmed]
- Inspiration: LiveCaptions Translator on Windows.
- This project is an independent implementation and does not reuse any code from that repository.
This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
Please see the LICENSE file in this repository for full license text.
Voot is an independent open-source project.
- It is not affiliated with or endorsed by OpenAI, DeepL, Ollama, Doubao, or any other API provider.
- You are responsible for:
  - Complying with the terms of the APIs you connect to.
  - Any costs generated by your API usage.
  - Ensuring that your use of Voot and third-party services complies with local laws and regulations.

