A Flutter mobile app for Sanskrit reading practice using Whisper.cpp for speech recognition.
- Offline Speech Recognition: Uses Whisper.cpp with ggml quantized models
- Sanskrit Support: Optimized for Sanskrit pronunciation using the Hindi language model
- Read-Along Experience: Interactive word-by-word reading practice
- Text-to-Speech: Tap words to hear correct pronunciation
- Fully Offline: All processing happens on-device
- Flutter: Cross-platform mobile framework
- Whisper.cpp: Fast inference of OpenAI's Whisper models
- ggml: Efficient machine learning inference library
- Flutter TTS: Text-to-speech for pronunciation guidance
lib/
├── main.dart                    # Main app entry point
└── services/
    └── whisper_service.dart     # Whisper.cpp integration service
assets/
└── model/
    └── ggml-sanskrit-q5_1.bin   # Quantized Whisper model for Sanskrit
- Flutter SDK (>=3.1.0)
- Android Studio / Xcode
- Whisper ggml model for Sanskrit
- Clone the repository:
git clone <your-repo-url>
cd read_story
- Install dependencies:
flutter pub get
- Ensure the ggml model is in the assets folder:
assets/model/ggml-sanskrit-q5_1.bin
- Run the app:
flutter run

- Model Loading: On startup, the app copies the ggml model from assets to app storage
- Audio Recording: Captures microphone input at 16 kHz (Whisper's native sample rate)
- Streaming Recognition: Accumulates 3-second audio chunks for processing
- Transcription: Whisper.cpp transcribes each audio chunk on-device in near real time
- Word Matching: Compares transcribed text with expected Sanskrit words
- Progress Tracking: Advances through the story as words are correctly spoken
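
The buffering, matching, and progress steps above can be sketched in plain Dart. This is a minimal illustration and not the code in `lib/services/whisper_service.dart`: the `ChunkAccumulator` and `ReadingProgress` classes are hypothetical names, 16-bit PCM samples are assumed, and matching is assumed to be an exact comparison after stripping punctuation.

```dart
import 'dart:typed_data';

// Whisper expects 16 kHz audio; the app accumulates 3-second chunks.
const int sampleRate = 16000;
const int bufferSizeSeconds = 3;
const int samplesPerChunk = sampleRate * bufferSizeSeconds; // 48,000 samples

/// Hypothetical helper: collects PCM samples and emits fixed-size chunks.
class ChunkAccumulator {
  final List<int> _pending = <int>[];

  /// Adds samples and returns every complete chunk now available.
  List<Int16List> add(List<int> samples) {
    _pending.addAll(samples);
    final chunks = <Int16List>[];
    while (_pending.length >= samplesPerChunk) {
      chunks.add(Int16List.fromList(_pending.sublist(0, samplesPerChunk)));
      _pending.removeRange(0, samplesPerChunk);
    }
    return chunks;
  }
}

/// Hypothetical helper: tracks the reader's position in the story.
class ReadingProgress {
  ReadingProgress(this.words);

  final List<String> words;
  int _index = 0;

  int get index => _index;
  bool get isFinished => _index >= words.length;

  /// Advances past each expected word found, in order, in the transcript.
  /// Returns the number of words matched by this call.
  int consume(String transcript) {
    final spoken = transcript
        .split(RegExp(r'\s+'))
        .map(_normalize)
        .where((w) => w.isNotEmpty);
    var matched = 0;
    for (final word in spoken) {
      if (isFinished) break;
      if (word == _normalize(words[_index])) {
        _index++;
        matched++;
      }
    }
    return matched;
  }

  /// Strips dandas and common punctuation so 'काकः।' matches 'काकः'.
  static String _normalize(String word) =>
      word.replaceAll(RegExp(r'[।॥.,!?]'), '').trim();
}

void main() {
  final accumulator = ChunkAccumulator();
  // Two 1.5-second buffers: the second call completes one 3-second chunk.
  print(accumulator.add(List<int>.filled(24000, 0)).length); // 0
  print(accumulator.add(List<int>.filled(24000, 0)).length); // 1

  final progress = ReadingProgress(['एकः', 'काकः', 'पिपासितः']);
  print(progress.consume('एकः काकः।')); // 2
  print(progress.index); // 2
}
```

Exact matching is the simplest choice. A real recognizer will often return near-miss spellings, so a tolerant comparison (for example, an edit-distance threshold) may be worth considering.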
- Format: GGML quantized (Q5_1)
- Language: Hindi (used for Sanskrit recognition)
- Size: Optimized for mobile deployment
- Inference: On-device using whisper_flutter_new package
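
Native inference code generally opens the model by file path, which is why a bundled asset is copied to app storage before use. The sketch below shows one way to make that copy idempotent. It is an assumption-laden illustration rather than the app's actual code: `ensureModelOnDisk` and `loadAssetBytes` are hypothetical names, and in a Flutter app the loader would typically wrap `rootBundle.load`.

```dart
import 'dart:io';

/// Hypothetical helper: writes the model to [targetPath] unless a non-empty
/// file already exists there. [loadAssetBytes] stands in for however the
/// app reads the bundled asset.
Future<File> ensureModelOnDisk(
  String targetPath,
  Future<List<int>> Function() loadAssetBytes,
) async {
  final file = File(targetPath);
  if (await file.exists() && await file.length() > 0) {
    return file; // Already copied on an earlier launch.
  }
  await file.parent.create(recursive: true);
  await file.writeAsBytes(await loadAssetBytes(), flush: true);
  return file;
}

Future<void> main() async {
  final dir = await Directory.systemTemp.createTemp('model_demo');
  final path = '${dir.path}/model/ggml-sanskrit-q5_1.bin';

  var loads = 0;
  Future<List<int>> fakeAsset() async {
    loads++;
    return <int>[1, 2, 3]; // Placeholder bytes, not a real model.
  }

  await ensureModelOnDisk(path, fakeAsset);
  await ensureModelOnDisk(path, fakeAsset); // Second call skips the copy.
  print(loads); // 1

  await dir.delete(recursive: true);
}
```

Skipping the copy when the file already exists keeps later launches from paying the asset-read cost again.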
Edit the _words list in lib/main.dart:
final List<String> _words = ['एकः', 'काकः', 'पिपासितः', 'आसीत्', 'सः', 'जलार्थम्'];

Modify the buffer duration in lib/services/whisper_service.dart:
static const int _bufferSizeSeconds = 3; // Seconds of audio to accumulate

- Model Load Time: ~2-5 seconds on modern devices
- Transcription Latency: ~1-3 seconds per 3-second audio chunk
- Memory Usage: ~200-300 MB with model loaded
- Battery Impact: Moderate during active recording
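
To put the latency figures in context, the arithmetic below computes the raw size of one audio chunk and the real-time factor implied by the quoted range. It is a back-of-the-envelope sketch: 16-bit mono PCM is an assumption, and the function names are hypothetical.

```dart
const int sampleRate = 16000; // Hz
const int bytesPerSample = 2; // Assumes 16-bit mono PCM.

/// Bytes of raw audio held for one chunk of [seconds] seconds.
int chunkBytes(int seconds) => sampleRate * bytesPerSample * seconds;

/// Real-time factor: processing time divided by audio duration.
/// Values at or below 1.0 mean transcription keeps up with speech.
double realTimeFactor(double latencySeconds, int chunkSeconds) =>
    latencySeconds / chunkSeconds;

void main() {
  print(chunkBytes(3)); // 96000
  print(realTimeFactor(1, 3).toStringAsFixed(2)); // 0.33
  print(realTimeFactor(3, 3).toStringAsFixed(2)); // 1.00
}
```

At the slow end of the range, a word spoken at the start of a chunk waits about 3 seconds for the buffer to fill plus up to 3 seconds for transcription. A shorter buffer reduces that delay but gives the model less context per chunk.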
The app requires microphone permission for audio recording.
This project is licensed under the MIT License.
- OpenAI for Whisper models
- ggerganov for whisper.cpp and ggml
- The Flutter community for the mobile development framework