If you want to see a video about how to use this repo, check out this video.
Ensure that Python 3.x is installed:
python --version

FFmpeg is required for audio and video processing. Follow the steps below to install it.

On Windows:
- Download the FFmpeg executable from the official FFmpeg website.
- Extract the downloaded zip file to a folder (e.g., C:\ffmpeg).
- Add the bin directory to your system's PATH:
  - Open the Start Menu, search for "Environment Variables", and select "Edit the system environment variables".
  - Click "Environment Variables".
  - Under "System variables", find the Path variable and click "Edit".
  - Click "New" and add the path to the bin directory (e.g., C:\ffmpeg\bin).
  - Click "OK" to save the changes.
On Linux or macOS, install FFmpeg using a package manager:

# For Ubuntu/Debian
sudo apt update
sudo apt install ffmpeg

# For macOS using Homebrew
brew install ffmpeg
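To confirm that FFmpeg is reachable after installing it (and, on Windows, after editing PATH), a quick check like the one below can be run from Python. This is a sketch, not a script from this repository: it looks up ffmpeg on PATH with shutil.which and then runs ffmpeg -version, which prints build information and exits successfully when FFmpeg works.

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is on PATH and runs successfully."""
    path = shutil.which("ffmpeg")
    if path is None:
        # Not on PATH: on Windows, re-check the bin directory added above.
        return False
    try:
        # "ffmpeg -version" prints build info and exits with status 0.
        result = subprocess.run(
            [path, "-version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found on PATH")
    else:
        print("FFmpeg not found; check that its bin directory is on PATH")
```

Open a new terminal before running the check on Windows, since PATH changes only apply to newly started shells.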
- Create and activate a virtual environment:

py -m venv venv # Windows (on Unix/macOS: python3 -m venv venv)
source venv/Scripts/activate # Windows (Git Bash; in cmd, run venv\Scripts\activate)
source venv/bin/activate # Unix/macOS

- Install the dependencies:
pip install -r requirements.txt

- Duplicate .env.example to .env and add your credentials:

cp .env.example .env # on Windows cmd: copy .env.example .env

Place the videos in /videos and run:
python main.py --language [TARGET_LANGUAGE] --action [ACTION]

- --language or -l: Specifies the target language for the translation (default: English).
- --action or -a: Defines the last action to perform (options: extract, transcribe, translate, all).
- Extract: Extracts the audio from the source video
- Transcribe: Transcribes the source video into VTT or JSON format
- Translate: Translates the transcription into the target language
- All: Performs all the above actions and generates a new version of the video in the target language using ElevenLabs (requires an ElevenLabs API key)
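Because --action names the last step to run, each value implies every step before it. The sketch below illustrates that idea with argparse; it is not the repository's main.py. The default action "all" and the step name "generate" (standing in for the ElevenLabs dubbing step) are assumptions.

```python
from __future__ import annotations

import argparse

# Pipeline steps in execution order, as listed above.
STEPS = ["extract", "transcribe", "translate"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Automatic video translation (sketch)")
    parser.add_argument("-l", "--language", default="English",
                        help="target language for the translation")
    # Assumption: defaulting to "all"; the real script may differ.
    parser.add_argument("-a", "--action", default="all", choices=STEPS + ["all"],
                        help="last step of the pipeline to run")
    return parser


def steps_to_run(action: str) -> list[str]:
    """Return every step up to and including `action`."""
    if action == "all":
        # Hypothetical name for the final ElevenLabs dubbing step.
        return STEPS + ["generate"]
    return STEPS[: STEPS.index(action) + 1]
```

For example, build_parser().parse_args(["-l", "Spanish", "-a", "transcribe"]) yields language "Spanish", and steps_to_run("transcribe") returns ["extract", "transcribe"]: the audio is extracted and transcribed, but nothing is translated.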
record.py records, transcribes, and translates audio in real time:
- Choose the microphone: the application first lists the available devices; type the ID of the one you want to use.
- Set the target language: you will then be asked to type the language you want to translate into; it can be any language.
- After that, press Enter to start recording, and you're done!
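record.py stores its recordings as WAV files. How it captures microphone audio is not described here, so the sketch below covers only the saving step with Python's standard wave module: it encodes mono float samples as 16-bit PCM. The 16 kHz sample rate and the function names are assumptions, and a synthetic sine wave stands in for real microphone input.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # assumed rate; the real script may record at a different rate


def encode_wav(samples, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    # Clamp each sample, scale to the signed 16-bit range, pack little-endian.
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue()


def save_wav(path: str, samples, sample_rate: int = SAMPLE_RATE) -> None:
    """Write the encoded recording to disk (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(encode_wav(samples, sample_rate))


def sine_wave(freq: float, seconds: float, sample_rate: int = SAMPLE_RATE) -> list:
    """Synthetic stand-in for microphone input."""
    n = int(seconds * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]
```

Keeping encoding separate from file I/O makes it easy to hand the same bytes to a transcription API or write them under recording_sessions.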
voice_assistant.py acts as a voice assistant: it records audio, transcribes it, generates responses using OpenAI, and converts the responses into speech. Users can select the audio device, the voice, and the system prompt. The script also logs how long transcription, response generation, and speech generation take, and concatenates the audio files into a single conversation file.
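The two bookkeeping tasks described above, timing each stage and joining the audio into one conversation file, can be sketched with the standard library alone. This is an illustration rather than the script's actual code: timed and concat_wavs are hypothetical names, and the sketch assumes every clip shares the same channel count, sample width, and sample rate (WAV data can only be joined byte-for-byte under that condition).

```python
import io
import time
import wave
from contextlib import contextmanager


@contextmanager
def timed(label: str, log: dict):
    """Store the wall-clock duration of the wrapped block in log[label] (seconds)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[label] = time.perf_counter() - start


def concat_wavs(sources) -> bytes:
    """Join WAV files (paths or binary file objects) that share the same format."""
    params = None
    chunks = []
    for src in sources:
        with wave.open(src, "rb") as wav:
            current = wav.getparams()
            # Channels, sample width and frame rate must match across inputs.
            if params is None:
                params = current
            elif current[:3] != params[:3]:
                raise ValueError(f"incompatible WAV format: {src!r}")
            chunks.append(wav.readframes(wav.getnframes()))
    if params is None:
        raise ValueError("no input files")
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(params.nchannels)
        wav.setsampwidth(params.sampwidth)
        wav.setframerate(params.framerate)
        wav.writeframes(b"".join(chunks))
    return out.getvalue()
```

Usage would look like wrapping each stage in `with timed("transcription", log): ...` and, at the end of the session, writing concat_wavs([...]) for the user and assistant clips in turn order to the conversation file.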
notetaker.py records audio from a selected microphone and transcribes it. The transcription is then reformatted into more readable Markdown, and the result is saved to a Markdown file.
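The final step, saving the formatted transcription, might look like the sketch below. The reformatting itself is not shown because the section does not say how it is done; the function takes already-formatted Markdown. The timestamped file name note_YYYYMMDD_HHMMSS.md is a hypothetical naming scheme, not necessarily the one the script uses.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_note(markdown: str, notes_dir: str = "notes",
              now: Optional[datetime] = None) -> Path:
    """Write a formatted transcription to <notes_dir>/note_<timestamp>.md."""
    now = now or datetime.now()
    directory = Path(notes_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create the notes folder on first use
    # Hypothetical naming scheme: one file per session, sortable by time.
    path = directory / f"note_{now:%Y%m%d_%H%M%S}.md"
    path.write_text(markdown, encoding="utf-8")
    return path
```

Passing now explicitly makes the file name deterministic, which is convenient when testing; in normal use it defaults to the current time.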
Each script saves its files as follows:
- main.py: MP3 audio, transcription, translation, and translated audio in output, plus the translated video if you selected all.
- record.py: WAV recordings, transcriptions, translations, and translated audio in recording_sessions.
- voice_assistant.py: WAV recordings, transcriptions, responses, and concatenated audio files in notes.
- notetaker.py: WAV recordings and formatted transcriptions in notes.
- .gitignore excludes venv and output.
- Scripts ignore .gitkeep in /videos.
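Git does not track empty directories, so a placeholder .gitkeep file keeps /videos in the repository, and any code that scans the folder has to skip it. A minimal sketch of such a scan follows; find_videos is a hypothetical name, and it skips only .gitkeep (and subdirectories), as described above, without filtering by file extension.

```python
from pathlib import Path
from typing import List


def find_videos(videos_dir: str = "videos") -> List[Path]:
    """Return the files in videos_dir sorted by name, skipping the .gitkeep placeholder."""
    return sorted(
        p for p in Path(videos_dir).iterdir()
        if p.is_file() and p.name != ".gitkeep"
    )
```

Sorting by name gives a predictable processing order when several videos are placed in the folder at once.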
Follow this guide for efficient use of the Automatic Translations Project.