A continuous audio transcription script that records audio from your device's microphone and transcribes it to text in real time using OpenAI's Whisper model.
- Continuous Recording: Records audio from your device's microphone in chunks
- Real-time Transcription: Transcribes speech to text using OpenAI's Whisper model
- Memory-Efficient: Processes audio in chunks with minimal memory overhead
- Silence Detection: Skips processing silent audio segments
- Configurable: Multiple command-line options for customization
- Debug Mode: Detailed logging for troubleshooting
- Audio is continuously recorded in small chunks
- Each chunk is analyzed for speech content
- Speech segments are processed through the Whisper model
- Transcribed text is printed to the console with timestamps
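The core loop can be pictured as a single pass of record → silence check → transcribe. The sketch below illustrates that flow with `pyaudio`, `numpy`, and `whisper`, using the script's default settings; it is a minimal single-threaded sketch, and the variable names are illustrative rather than taken from `record.py`:

```python
# Minimal single-threaded sketch of the chunk -> silence check -> transcribe flow.
# Parameter values mirror the script's defaults; names are illustrative only.
import time

import numpy as np
import pyaudio
import whisper

SAMPLE_RATE = 16000          # Hz, matches --sample-rate default
CHUNK_DURATION = 5           # seconds, matches --chunk-duration default
SILENCE_THRESHOLD = 0.01     # RMS threshold, matches --silence-threshold default

model = whisper.load_model("base")
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                 input=True, frames_per_buffer=1024)

try:
    while True:
        # Record one chunk of raw 16-bit PCM audio
        frames = [stream.read(1024) for _ in range(int(SAMPLE_RATE / 1024 * CHUNK_DURATION))]
        audio = np.frombuffer(b"".join(frames), dtype=np.int16).astype(np.float32) / 32768.0

        # Silence detection: skip chunks whose RMS falls below the threshold
        rms = np.sqrt(np.mean(audio ** 2))
        if rms <= SILENCE_THRESHOLD:
            continue

        # Transcribe the speech segment and print it with a timestamp
        result = model.transcribe(audio, fp16=False)
        text = result["text"].strip()
        if text:
            print(f"[{time.strftime('%H:%M:%S')}] {text}")
finally:
    stream.stop_stream()
    stream.close()
    pa.terminate()
```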
The following diagram illustrates the multi-threaded architecture and data flow of the transcription process:
sequenceDiagram
participant Main
participant RecordingThread
participant AudioQueue
participant TranscriptionThread
participant WhisperModel
participant Console
Note over Main: Initialize ContinuousTranscriber
Main->>Main: Load Whisper Model
Main->>RecordingThread: Start recording thread
Main->>TranscriptionThread: Start transcription thread
loop Recording Loop
RecordingThread->>RecordingThread: Record audio chunk (5s)
RecordingThread->>RecordingThread: Save overlap buffer (0.5s)
RecordingThread->>RecordingThread: Calculate RMS for silence detection
alt RMS > silence_threshold (Speech detected)
RecordingThread-->>AudioQueue: Push audio chunk + previous overlap
else RMS <= silence_threshold
RecordingThread->>RecordingThread: Skip silent audio (no processing)
end
end
loop Transcription Loop
AudioQueue-->>TranscriptionThread: Pop audio chunk when available
TranscriptionThread->>WhisperModel: Send audio for transcription
WhisperModel-->>TranscriptionThread: Return transcribed text
alt Transcription not empty
TranscriptionThread->>Console: Output text with timestamp
end
end
Note over Main: On Ctrl+C
Main->>RecordingThread: Stop recording
Main->>TranscriptionThread: Stop transcription
Main->>Main: Terminate PyAudio
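In code, the diagram above corresponds to a producer/consumer pair sharing a queue. The skeleton below outlines that structure; the class name echoes the `ContinuousTranscriber` in the diagram, but the method names and injected callables are illustrative placeholders, not the actual implementation in `record.py`:

```python
# Skeleton of the two-thread layout shown in the diagram; illustrative only.
import queue
import threading
import time

import numpy as np


class ContinuousTranscriber:
    """Two-thread pipeline: a recording producer feeding a transcription consumer."""

    def __init__(self, record_chunk, transcribe, silence_threshold=0.01):
        # record_chunk() -> float32 numpy array and transcribe(audio) -> str are
        # injected callables so this skeleton stays self-contained; the real
        # script wraps PyAudio and the Whisper model directly.
        self.record_chunk = record_chunk
        self.transcribe = transcribe
        self.silence_threshold = silence_threshold
        self.audio_queue = queue.Queue()
        self.running = threading.Event()
        self._threads = []

    def _recording_loop(self):
        # Producer: record a chunk, skip it if silent, otherwise queue it
        while self.running.is_set():
            chunk = self.record_chunk()
            rms = float(np.sqrt(np.mean(chunk ** 2)))
            if rms > self.silence_threshold:
                self.audio_queue.put(chunk)

    def _transcription_loop(self):
        # Consumer: pop queued chunks and print any transcribed text
        while self.running.is_set():
            try:
                chunk = self.audio_queue.get(timeout=1)
            except queue.Empty:
                continue
            text = self.transcribe(chunk).strip()
            if text:
                print(f"[{time.strftime('%H:%M:%S')}] {text}")

    def start(self):
        self.running.set()
        self._threads = [
            threading.Thread(target=self._recording_loop, daemon=True),
            threading.Thread(target=self._transcription_loop, daemon=True),
        ]
        for t in self._threads:
            t.start()

    def stop(self):
        # Called on Ctrl+C: both loops exit on their next iteration
        self.running.clear()
        for t in self._threads:
            t.join()
```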
- Python 3.7+
- PortAudio (for audio recording)
- FFmpeg (required by Whisper)
# Clone the repository (if not already done)
# git clone <repository-url>
# cd <repository-directory>
# Setup development environment (creates venv, installs dependencies)
make setup
# List available audio input devices
make list-devices
# Run with default settings
make run
# Run with debug logging enabled
make run-debug
The script supports various command-line arguments for customization:
--chunk-duration SEC Duration of each audio chunk in seconds (default: 5)
--sample-rate RATE Audio sample rate in Hz (default: 16000)
--channels N Number of audio channels (default: 1)
--overlap-duration SEC Overlap between chunks in seconds (default: 0.5)
--silence-threshold TH RMS threshold for silence detection (default: 0.01)
--device-index N Specific audio device index (default: system default)
--model SIZE Whisper model size (default: base)
Options: tiny, turbo, base, small, medium, large
--list-devices List available audio input devices and exit
--debug Enable debug mode for verbose logging
Examples:
# Using a specific device and the small model
venv/bin/python record.py --device-index 2 --model small
# With custom audio settings
venv/bin/python record.py --chunk-duration 3 --sample-rate 44100 --overlap-duration 0.2
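For reference, the flags above map onto a standard `argparse` setup along the following lines. This is a sketch of how such a parser could look, not the exact parser used in `record.py`:

```python
# Rough argparse layout for the flags listed above; defaults mirror the README,
# but help text and structure are paraphrased rather than copied from record.py.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Continuous Whisper transcription")
    parser.add_argument("--chunk-duration", type=float, default=5,
                        help="Duration of each audio chunk in seconds")
    parser.add_argument("--sample-rate", type=int, default=16000,
                        help="Audio sample rate in Hz")
    parser.add_argument("--channels", type=int, default=1,
                        help="Number of audio channels")
    parser.add_argument("--overlap-duration", type=float, default=0.5,
                        help="Overlap between chunks in seconds")
    parser.add_argument("--silence-threshold", type=float, default=0.01,
                        help="RMS threshold for silence detection")
    parser.add_argument("--device-index", type=int, default=None,
                        help="Specific audio device index (system default if omitted)")
    parser.add_argument("--model", default="base",
                        choices=["tiny", "turbo", "base", "small", "medium", "large"],
                        help="Whisper model size")
    parser.add_argument("--list-devices", action="store_true",
                        help="List available audio input devices and exit")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug mode for verbose logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```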
If you encounter issues:
- Use `--list-devices` to ensure the correct audio source is selected
- Enable debug mode with `--debug` for detailed logging
- Try different `--silence-threshold` values if speech is not being detected
- Check that system dependencies (PortAudio, FFmpeg) are properly installed
- Try a smaller model like "tiny" if transcription is slow
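A quick way to choose a sensible `--silence-threshold` is to measure the RMS level of your room's background noise and set the threshold slightly above it. The illustrative snippet below (not part of `record.py`) records a couple of seconds of ambient audio and prints its RMS:

```python
# Quick probe to help pick a --silence-threshold value: record a short sample of
# room noise and print its RMS. Illustrative snippet, not part of record.py.
import numpy as np
import pyaudio

RATE, SECONDS = 16000, 2
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=1024)
frames = [stream.read(1024) for _ in range(int(RATE / 1024 * SECONDS))]
stream.stop_stream()
stream.close()
pa.terminate()

audio = np.frombuffer(b"".join(frames), dtype=np.int16).astype(np.float32) / 32768.0
print(f"Ambient RMS: {np.sqrt(np.mean(audio ** 2)):.4f}")
# Set --silence-threshold a bit above this value so background noise is skipped.
```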
[Specify your license here]