This repository contains a collection of Python scripts with a graphical user interface (PyQt6) designed for a full cycle of video processing: from audio extraction to text summarization.
A utility for extracting audio tracks from video files. Prepares audio for further recognition.
- Input Formats:
.mp4,.mkv,.avi,.mov,.wmv - Output Formats:
.wav(PCM 16kHz Mono) — Recommended format for speech recognition..mp3— Compressed format to save space.
- Key Features: Automatic conversion of sampling rate and channels.
A tool for automatic speech recognition (Speech-to-Text) based on the Whisper model.
- Engine:
faster-whisper(optimized version of OpenAI Whisper). - Operation Mode: Local execution on CPU (with
int8quantization to reduce memory consumption). - Input Formats:
.wav,.mp3,.m4a,.flac. - Output: A
.txtfile containing the full recognized text. - Note: Currently configured to recognize Russian speech (
language="ru"). - Default model:
medium(good quality for 4 CPU / 8 GB RAM server profile).
A smart text summarizer using the YandexGPT API.
- Functionality: Creates a summary of large texts, lectures, or interviews.
- Key Features:
- Automatic slicing of long texts into chunks of 8000 characters.
- Support for custom system prompts (instructions for the neural network).
- Operates via the official Yandex Cloud API.
- Requirements: You must provide an API Key, Folder ID, and Model URI.
To run the scripts, you need Python 3.8+ installed.
Run the following command in your terminal:
pip install PyQt6 moviepy faster-whisper requestsNote: If you encounter errors with
moviepy, ensure you have a compatible version installed or check the script's built-in exception handling.
Execute the scripts sequentially depending on your task:
- Extract audio from video:
python extractor.py
- Recognize text from audio:
python recognizer.py
- Generate a summary:
python summarizer.py
- To use
summarizer.py, you need a Yandex Cloud account with a service account that has theai.languageModels.userrole. - The selected Whisper model will be downloaded automatically on first run. This may take some time.
- You can tune recognizer via environment variables:
WHISPER_MODEL_SIZE=medium
WHISPER_COMPUTE_TYPE=int8
WHISPER_CPU_THREADS=4
WHISPER_NUM_WORKERS=1
WHISPER_BEAM_SIZE=5
WHISPER_LANGUAGE=ru
WHISPER_VAD_FILTER=1
python recognizer.py- If recognition is too slow, switch to
WHISPER_MODEL_SIZE=small.