This app transcribes media files (audio or video) using AI. You can select, share, or open a media file, which is then re-encoded and sent to the transcription model you have chosen. The app supports the traditional Whisper-1 model as well as the newer GPT-4o Transcribe and GPT-4o-mini Transcribe models. Optional AI-powered post-processing with GPT-4o chat completions further improves transcript readability. All transcriptions and settings are stored locally, and your API key is kept in encrypted storage.
-
Media Handling:
- Select, share, or open media files directly from your device.
- Direct transcription of voice messages from messaging apps like WhatsApp, Telegram, or Signal.
- Automatic re-encoding using FFmpegKit to M4A/AAC format for all transcription models.
- File size check (max. 24MB after re-encoding), which keeps uploads under the 25MB limit of OpenAI’s transcription endpoint.
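
The size check above could look roughly like the following. This is a minimal sketch, not the app's actual code: the names `MAX_UPLOAD_BYTES`, `SizeCheck`, and `checkUploadSize` are hypothetical, and it assumes "24MB" means 24 × 1024 × 1024 bytes.

```kotlin
// Hypothetical names. The 24MB figure comes from the feature list;
// reading it as 24 * 1024 * 1024 bytes is an assumption.
const val MAX_UPLOAD_BYTES: Long = 24L * 1024 * 1024

// Result of validating the re-encoded file before upload.
sealed interface SizeCheck {
    data class Ok(val bytes: Long) : SizeCheck
    data class TooLarge(val bytes: Long, val limit: Long) : SizeCheck
}

// Accepts files up to and including the limit, rejects anything larger.
fun checkUploadSize(bytes: Long, limit: Long = MAX_UPLOAD_BYTES): SizeCheck =
    if (bytes <= limit) SizeCheck.Ok(bytes) else SizeCheck.TooLarge(bytes, limit)
```

Returning a typed result instead of a Boolean lets the UI show the actual size and the limit in its error message.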
-
Multiple Transcription Models:
- Whisper-1: Traditional audio transcription model.
- GPT-4o Transcribe: Specifically optimized for transcription tasks.
- GPT-4o-mini Transcribe: Efficient transcription model for cost-effective processing.
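
One way to represent the three models in code is an enum that maps each display name to the identifier sent to the API. The enum and its members are hypothetical. The identifier strings are the ones OpenAI documents for its transcription endpoint, but whether the app stores them this way is an assumption.

```kotlin
// Hypothetical enum; apiId is the model name sent in the transcription request.
enum class TranscriptionModel(val apiId: String, val label: String) {
    WHISPER_1("whisper-1", "Whisper-1"),
    GPT_4O_TRANSCRIBE("gpt-4o-transcribe", "GPT-4o Transcribe"),
    GPT_4O_MINI_TRANSCRIBE("gpt-4o-mini-transcribe", "GPT-4o-mini Transcribe");

    companion object {
        // Falls back to Whisper-1 when a stored setting holds an unknown id.
        fun fromApiId(id: String): TranscriptionModel =
            values().firstOrNull { it.apiId == id } ?: WHISPER_1
    }
}
```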
-
AI-Powered Cleanup:
- Enhance transcript readability with AI-driven cleanup using GPT-4o chat completions.
- Customizable cleanup prompts ensure the original content is preserved while improving clarity.
- Optional auto-format setting to automatically enhance readability after every transcription.
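
The cleanup step can be pictured as a chat completions request with the cleanup prompt as the system message and the raw transcript as the user message. The sketch below only builds that message list and makes no network call. `ChatMessage`, `DEFAULT_CLEANUP_PROMPT`, and `buildCleanupMessages` are hypothetical names, and the default prompt text is invented for illustration.

```kotlin
// Hypothetical message type mirroring the role/content pairs used by chat completions.
data class ChatMessage(val role: String, val content: String)

// Invented default; the app's real prompt is user-configurable and not shown in this README.
const val DEFAULT_CLEANUP_PROMPT =
    "Improve punctuation and paragraphing. Do not add, remove, or reinterpret content."

// Builds the messages for one cleanup request.
// A blank or missing custom prompt falls back to the default.
fun buildCleanupMessages(transcript: String, customPrompt: String? = null): List<ChatMessage> {
    val systemPrompt = customPrompt?.takeIf { it.isNotBlank() } ?: DEFAULT_CLEANUP_PROMPT
    return listOf(
        ChatMessage(role = "system", content = systemPrompt),
        ChatMessage(role = "user", content = transcript),
    )
}
```

Keeping the transcript in its own user message, separate from the instructions, makes it easier for the prompt to insist that the content itself stays unchanged.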
-
Local History & Settings:
- Maintain a local history of transcriptions with comprehensive statistics:
  - Transcript length (character count)
  - File sizes (original and uploaded)
  - Audio duration (when available)
  - Processing settings used (model, language, prompt)
- History view displays only relevant information for each entry, with expandable details.
- Secure API key storage using EncryptedSharedPreferences.
- In-app Settings allow you to:
  - Save and test your OpenAI API key.
  - Configure transcription models and language preferences.
  - Customize prompts for both transcription and AI cleanup.
  - Enable auto-format to automatically enhance transcriptions after processing.
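
A history entry with optional fields is one way to get the "only relevant information" behaviour described above: values that were not available or not used stay `null` and are simply skipped when the details are rendered. The following is a sketch under that assumption, and `HistoryEntry` and `detailLines` are hypothetical names.

```kotlin
// Hypothetical record for one stored transcription.
data class HistoryEntry(
    val transcript: String,
    val model: String,
    val originalBytes: Long,
    val uploadedBytes: Long,
    val durationSeconds: Int? = null, // only when the duration could be determined
    val language: String? = null,     // null when no language was set
    val prompt: String? = null,       // null when no custom prompt was used
)

// Produces the lines for the expandable details view, omitting unused settings.
fun detailLines(e: HistoryEntry): List<String> = buildList {
    add("Length: ${e.transcript.length} characters")
    add("Original size: ${e.originalBytes} bytes")
    add("Uploaded size: ${e.uploadedBytes} bytes")
    e.durationSeconds?.let { add("Duration: ${it}s") }
    add("Model: ${e.model}")
    e.language?.takeIf { it.isNotBlank() }?.let { add("Language: $it") }
    e.prompt?.takeIf { it.isNotBlank() }?.let { add("Prompt: $it") }
}
```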
-
User Experience Enhancements:
- Support for shared intents (from other apps) and direct file access.
- Retry functionality for reprocessing files with updated parameters.
- Clear, responsive UI built with Jetpack Compose and modern Android architecture practices.
-
Use Cases:
- Convert voice messages to text for easy reading and sharing
- Archive and search through voice message content
- Make voice messages accessible for hearing-impaired users
- Quick transcription of meeting recordings and lectures
-
Initialization & Setup:
- Enter and test your OpenAI API key in the Settings screen.
- Choose your transcription model and set any custom prompts or language preferences.
-
Media Processing:
- Select a media file, or share one to the app.
- The file is copied, re-encoded to AAC format, and its size is validated.
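
The re-encoding step can be sketched as building an FFmpeg argument list and estimating the resulting size up front. The flags used (`-y`, `-i`, `-vn`, `-c:a aac`, `-b:a`) are standard FFmpeg options. The 64 kbit/s default bitrate and both function names are assumptions, and passing the arguments to FFmpegKit is left out.

```kotlin
// Hypothetical helper: FFmpeg arguments for an audio-only AAC re-encode into an .m4a file.
fun buildReencodeArgs(inputPath: String, outputPath: String, bitrateKbps: Int = 64): List<String> =
    listOf(
        "-y",                       // overwrite the output file if it exists
        "-i", inputPath,            // input media (audio or video)
        "-vn",                      // drop any video stream
        "-c:a", "aac",              // encode audio as AAC
        "-b:a", "${bitrateKbps}k",  // target audio bitrate (assumed value)
        outputPath,
    )

// Rough size estimate for constant-bitrate audio: seconds * kbit/s * 1000 / 8 bytes.
// Container overhead is ignored, so the real file will be slightly larger.
fun estimateAudioBytes(durationSeconds: Long, bitrateKbps: Int = 64): Long =
    durationSeconds * bitrateKbps * 1000L / 8
```

At the assumed 64 kbit/s, one minute of audio comes to about 480,000 bytes, so a recording of roughly 50 minutes still fits under a 24MB limit.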
-
Transcription & Cleanup:
- The processed file is uploaded to the selected transcription API.
- Once transcribed, the text is optionally enhanced with an AI cleanup process.
- The final transcript is displayed and stored locally with comprehensive statistics:
  - File sizes (original and processed)
  - Transcript length
  - All processing parameters used
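
The steps above can be sketched as one function that takes the transcription and cleanup calls as parameters, which keeps it testable without network access. All names are hypothetical. Falling back to the raw transcript when cleanup fails is an assumption about the desired behaviour, not something this README states.

```kotlin
// Hypothetical result of one run through the pipeline.
data class PipelineResult(
    val transcript: String,
    val cleanedUp: Boolean,
    val originalBytes: Long,
    val processedBytes: Long,
)

// transcribe: uploads the processed file and returns the raw text.
// cleanup: optional AI post-processing of that text.
fun runPipeline(
    originalBytes: Long,
    processedBytes: Long,
    autoFormat: Boolean,
    transcribe: () -> String,
    cleanup: (String) -> String,
): Result<PipelineResult> = runCatching {
    val raw = transcribe() // a failure here fails the whole run
    // Assumed behaviour: a failed cleanup keeps the raw transcript instead of losing it.
    val cleaned = if (autoFormat) runCatching { cleanup(raw) }.getOrNull() else null
    PipelineResult(cleaned ?: raw, cleaned != null, originalBytes, processedBytes)
}
```

In the app itself the two lambdas would wrap the HTTP calls, and in a test they can be replaced with plain functions.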
-
History & Management:
- View past transcriptions with detailed statistics (file sizes, transcript length, duration).
- Each entry shows only relevant processing settings that were actually used.
- Copy transcriptions with or without detailed statistics.
- Delete individual entries or clear all history as needed.
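
The management actions above map onto a small store plus a formatting helper. This is an in-memory sketch with hypothetical names (`Entry`, `HistoryStore`, `copyText`), and the app's real persistence layer is not shown in this README.

```kotlin
// Hypothetical entry: stats holds the already formatted detail lines.
data class Entry(val id: Long, val transcript: String, val stats: List<String>)

class HistoryStore {
    private val entries = mutableListOf<Entry>()

    fun add(entry: Entry) { entries.add(0, entry) } // newest entry first
    fun all(): List<Entry> = entries.toList()       // defensive copy
    fun delete(id: Long): Boolean = entries.removeAll { it.id == id }
    fun clear() = entries.clear()
}

// Text placed on the clipboard, with or without the statistics block.
fun copyText(entry: Entry, includeStats: Boolean): String =
    if (includeStats && entry.stats.isNotEmpty())
        entry.transcript + "\n\n" + entry.stats.joinToString("\n")
    else
        entry.transcript
```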
graph TD
A[Start] --> B[Enter & Test API Key in Settings]
B --> C[Select Transcription Model & Set Prompts]
C --> D[Choose Media File / Share to App]
D --> E["Copy & Re-encode File (M4A/AAC)"]
E --> F["File Size Check (<= 24MB)"]
F --> G[Upload to Selected Transcription API]
G --> H{Response Successful?}
H -- Yes --> I[Display & Store Transcript]
I --> J[Optional: AI Cleanup for Readability]
J --> K[Update History]
H -- No --> L[Show Error Message]
K --> M[User Can Retry or View History]
M --> N[End]
-
Third-Party Processing:
Your media files are sent to OpenAI’s servers. Ensure you have permission to share and transcribe them.
-
Sensitive Data:
Do not transcribe content with sensitive personal or confidential information.
-
User Responsibility:
Use your own OpenAI API key. You are responsible for any costs incurred, and you must comply with OpenAI’s Terms of Service.
Developers and testers who want to build the app with a pre-configured OpenAI API key can follow the Testing Guide, which explains how to create builds with an embedded API key without committing secrets to the repository.
Future enhancements include:
- Refactoring UI state management into ViewModels.
- Consolidating duplicated code and further adopting Kotlin coroutines with Retrofit’s suspend functions.
- Enhancing the UI and user interaction flow.
