# Jarvis Audio Transcription

A continuous audio transcription script that records audio from your device's microphone and transcribes it to text in real-time using OpenAI's Whisper model.

## Features

- **Continuous Recording**: Records audio from your device's microphone in chunks
- **Real-time Transcription**: Transcribes speech to text using OpenAI's Whisper model
- **Memory-efficient**: Processes audio in chunks with minimal memory overhead
- **Silence Detection**: Skips processing of silent audio segments
- **Configurable**: Multiple command-line options for customization
- **Debug Mode**: Detailed logging for troubleshooting

## How It Works

  1. Audio is continuously recorded in small chunks
  2. Each chunk is analyzed for speech content
  3. Speech segments are processed through the Whisper model
  4. Transcribed text is printed to the console with timestamps
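The speech check in step 2 is typically a root-mean-square (RMS) energy test against a silence threshold, matching the `--silence-threshold` option below. A minimal sketch (the function names here are illustrative, not taken from `record.py`):

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a chunk of float samples in [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples, silence_threshold=0.01):
    """A chunk is forwarded to the transcriber only when its RMS
    exceeds the silence threshold; otherwise it is skipped."""
    return rms(samples) > silence_threshold
```

Raising the threshold skips more borderline chunks; lowering it lets quieter speech through at the cost of transcribing more noise.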

## Sequence Diagram

The following diagram illustrates the multi-threaded architecture and data flow of the transcription process:

```mermaid
sequenceDiagram
    participant Main
    participant RecordingThread
    participant AudioQueue
    participant TranscriptionThread
    participant WhisperModel
    participant Console

    Note over Main: Initialize ContinuousTranscriber
    Main->>Main: Load Whisper Model
    Main->>RecordingThread: Start recording thread
    Main->>TranscriptionThread: Start transcription thread

    loop Recording Loop
        RecordingThread->>RecordingThread: Record audio chunk (5s)
        RecordingThread->>RecordingThread: Save overlap buffer (0.5s)
        RecordingThread->>RecordingThread: Calculate RMS for silence detection
        alt RMS > silence_threshold (Speech detected)
            RecordingThread-->>AudioQueue: Push audio chunk + previous overlap
        else RMS <= silence_threshold
            RecordingThread->>RecordingThread: Skip silent audio (no processing)
        end
    end

    loop Transcription Loop
        AudioQueue-->>TranscriptionThread: Pop audio chunk when available
        TranscriptionThread->>WhisperModel: Send audio for transcription
        WhisperModel-->>TranscriptionThread: Return transcribed text
        alt Transcription not empty
            TranscriptionThread->>Console: Output text with timestamp
        end
    end

    Note over Main: On Ctrl+C
    Main->>RecordingThread: Stop recording
    Main->>TranscriptionThread: Stop transcription
    Main->>Main: Terminate PyAudio
```
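The producer/consumer pattern in the diagram can be sketched with Python's `queue` and `threading` modules. This is a simplified model, not the actual code from `record.py`: the recorder pushes chunks onto a shared queue (with `None` as a shutdown sentinel), and the transcriber pops them and emits any non-empty text.

```python
import queue
import threading

def recording_loop(chunks, q):
    """Producer: push each audio chunk onto the queue.
    A None sentinel tells the consumer to shut down."""
    for chunk in chunks:
        q.put(chunk)
    q.put(None)

def transcription_loop(q, transcribe, out):
    """Consumer: pop chunks and transcribe until the sentinel arrives.
    Empty transcriptions are dropped, mirroring the diagram's
    'Transcription not empty' branch."""
    while True:
        chunk = q.get()
        if chunk is None:
            break
        text = transcribe(chunk)
        if text:
            out.append(text)
```

Decoupling the two loops through a queue lets recording continue at a fixed cadence even when a Whisper inference call takes longer than one chunk duration.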

## Requirements

- Python 3.7+
- PortAudio (for audio recording)
- FFmpeg (required by Whisper)

## Quick Start

### 1. Set Up the Environment

```bash
# Clone the repository (if not already done)
# git clone <repository-url>
# cd <repository-directory>

# Set up the development environment (creates venv, installs dependencies)
make setup
```

### 2. List Available Audio Devices

```bash
make list-devices
```

### 3. Run the Transcription

```bash
# Run with default settings
make run

# Run with debug logging enabled
make run-debug
```

## Command-Line Options

The script supports various command-line arguments for customization:

```
--chunk-duration SEC    Duration of each audio chunk in seconds (default: 5)
--sample-rate RATE      Audio sample rate in Hz (default: 16000)
--channels N            Number of audio channels (default: 1)
--overlap-duration SEC  Overlap between chunks in seconds (default: 0.5)
--silence-threshold TH  RMS threshold for silence detection (default: 0.01)
--device-index N        Specific audio device index (default: system default)
--model SIZE            Whisper model size (default: base)
                        Options: tiny, turbo, base, small, medium, large
--list-devices          List available audio input devices and exit
--debug                 Enable debug mode for verbose logging
```

Examples:

```bash
# Using a specific device and the small model
venv/bin/python record.py --device-index 2 --model small

# With custom audio settings
venv/bin/python record.py --chunk-duration 3 --sample-rate 44100 --overlap-duration 0.2
```
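A parser for the options above might look like the following `argparse` sketch. The defaults and choices mirror the table in this README; the actual parser in `record.py` may be structured differently:

```python
import argparse

def build_parser():
    """Build a command-line parser matching the documented options."""
    p = argparse.ArgumentParser(description="Continuous Whisper transcription")
    p.add_argument("--chunk-duration", type=float, default=5,
                   help="Duration of each audio chunk in seconds")
    p.add_argument("--sample-rate", type=int, default=16000,
                   help="Audio sample rate in Hz")
    p.add_argument("--channels", type=int, default=1,
                   help="Number of audio channels")
    p.add_argument("--overlap-duration", type=float, default=0.5,
                   help="Overlap between chunks in seconds")
    p.add_argument("--silence-threshold", type=float, default=0.01,
                   help="RMS threshold for silence detection")
    p.add_argument("--device-index", type=int, default=None,
                   help="Specific audio device index")
    p.add_argument("--model", default="base",
                   choices=["tiny", "turbo", "base", "small", "medium", "large"],
                   help="Whisper model size")
    p.add_argument("--list-devices", action="store_true",
                   help="List available audio input devices and exit")
    p.add_argument("--debug", action="store_true",
                   help="Enable verbose logging")
    return p
```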

## Troubleshooting

If you encounter issues:

  1. Use `--list-devices` to ensure the correct audio source is selected
  2. Enable debug mode with `--debug` for detailed logging
  3. Try different `--silence-threshold` values if speech is not being detected
  4. Check that system dependencies (PortAudio, FFmpeg) are properly installed
  5. Try a smaller model such as `tiny` if transcription is slow

## License

[Specify your license here]
