LiveVTT is a tool for live transcription of streaming audio/video content, providing real-time subtitles in WebVTT format.
LiveVTT transcribes or translates live audio/video streams and generates WebVTT subtitles for them. Its options cover Whisper model selection, beam size, CUDA acceleration, Silero VAD silence filtering, and baking the subtitles into the stream itself.
livevtt -u <URL> [-s] [-l <BIND_ADDRESS>] [-p <BIND_PORT>] [-m <MODEL>] [-b <BEAM_SIZE>] [-c <USE_CUDA>] [-t] [-vf <VAD_FILTER>] [-la <LANGUAGE>] [-ua <USER_AGENT>]
-u, --url
: [Required] URL of the live audio/video stream.

-s, --hard-subs
: Set if you want the subtitles to be baked into the stream itself.

-l, --bind-address
: The IP address to bind to (defaults to 127.0.0.1).

-p, --bind-port
: The port to bind to (defaults to 8000).

-m, --model
: Whisper model to use (defaults to large).

-b, --beam-size
: Beam size to use (defaults to 5).

-c, --use-cuda
: Use CUDA where available. Defaults to true.

-t, --transcribe
: If set, transcribes rather than translates the given stream.

-vf, --vad-filter
: Whether to utilize the Silero VAD model to try and filter out silences. Defaults to false.

-la, --language
: The original language of the stream, if known/not multilingual. Can be left unset.

-ua, --user-agent
: User agent to use to retrieve playlists/stream chunks (defaults to 'VLC/3.0.18 LibVLC/3.0.18').
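If you launch LiveVTT from a script, the options above can be assembled into an argument list. The following is a minimal sketch, not part of LiveVTT: `build_livevtt_args` is a hypothetical helper, and it assumes that `livevtt` is on your PATH, that `--hard-subs` and `--transcribe` are value-less flags, and that the other options take their value as the next argument. It leaves out `--use-cuda` and `--vad-filter` because the accepted value format is not documented here.

```python
from typing import List, Optional


def build_livevtt_args(
    url: str,
    hard_subs: bool = False,
    bind_address: Optional[str] = None,
    bind_port: Optional[int] = None,
    model: Optional[str] = None,
    beam_size: Optional[int] = None,
    transcribe: bool = False,
    language: Optional[str] = None,
    user_agent: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for the livevtt command (hypothetical helper)."""
    # --url is the only required option.
    args = ["livevtt", "--url", url]

    # Assumption: these two options are plain flags that take no value.
    if hard_subs:
        args.append("--hard-subs")
    if transcribe:
        args.append("--transcribe")

    # Valued options are passed only when set, so LiveVTT's own
    # defaults (127.0.0.1, 8000, large, 5, ...) apply otherwise.
    valued = [
        ("--bind-address", bind_address),
        ("--bind-port", bind_port),
        ("--model", model),
        ("--beam-size", beam_size),
        ("--language", language),
        ("--user-agent", user_agent),
    ]
    for flag, value in valued:
        if value is not None:
            args.extend([flag, str(value)])
    return args
```

The resulting list can be handed to `subprocess.run(args)`. Passing a list rather than a single string avoids shell quoting problems with URLs that contain `&` or `?`.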
Once the program is running, you can access the transcribed and/or translated stream at the following URL:
http://127.0.0.1:8000/playlist.m3u8
If you set a different bind address (-l) or port (-p), substitute those values for 127.0.0.1 and 8000 in the URL.
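As a small illustration of how the URL follows from the bind settings, here is a sketch that builds it from an address and port. `playlist_url` is a hypothetical helper, not something LiveVTT provides. The bracket handling for IPv6 literals is standard URL syntax, but whether LiveVTT accepts an IPv6 bind address is an assumption.

```python
def playlist_url(bind_address: str = "127.0.0.1", bind_port: int = 8000) -> str:
    """Build the playlist URL for the given bind address and port (hypothetical helper)."""
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{bind_address}]" if ":" in bind_address else bind_address
    return f"http://{host}:{bind_port}/playlist.m3u8"
```

Called with no arguments, this returns the default URL shown above.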
- Clone the repository:
    git clone https://github.com/Psychotropos/livevtt.git
- Navigate to the directory:
    cd livevtt
- Install dependencies:
  - For general installation:
      pip install -r requirements.txt
  - For CUDA support on Windows:
      pip install -r requirements-cuda-win.txt
- Transcribe a live audio/video stream with default settings:
    livevtt -u <URL>
- Transcribe a live audio/video stream and embed subtitles:
    livevtt -u <URL> -s
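With either example running, a client can check that the playlist is being served by downloading it and listing the entries it references. The sketch below uses only the Python standard library. `fetch_playlist` and `playlist_uris` are hypothetical helper names. It relies on the general HLS rule that lines beginning with `#` are tags or comments and every other non-blank line is a URI. What LiveVTT actually puts in its playlist is not described here, so treat the output as something to inspect rather than a fixed format.

```python
from typing import List
from urllib.request import urlopen


def fetch_playlist(url: str, timeout: float = 5.0) -> str:
    """Download the playlist text from a running LiveVTT instance."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def playlist_uris(playlist_text: str) -> List[str]:
    """Return the URI lines of an HLS playlist, skipping tags, comments, and blanks."""
    uris = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # In HLS playlists, lines starting with '#' are tags or comments.
        if line and not line.startswith("#"):
            uris.append(line)
    return uris
```

For example, `playlist_uris(fetch_playlist("http://127.0.0.1:8000/playlist.m3u8"))` lists whatever the default endpoint currently references. Most HLS-capable players can open the same URL directly.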
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.