Welcome to HeyGenClone, an open-source analogue of the HeyGen system.
I am a developer from Moscow 🇷🇺 who devotes his free time to studying new technologies. The project is in an active development phase, but I hope it will help you achieve your goals!
Currently, only English 🇬🇧 is supported as the source language for translation!
- Clone this repo
- Install conda
- Create environment with Python 3.10 (for macOS refer to link)
- Activate environment
- Install requirements:
cd path_to_project
sh install.sh
- In the config.json file, set HF_TOKEN to your HuggingFace token. Visit the speaker-diarization and segmentation pages and accept the user conditions
- Download the weights from drive and unzip the downloaded file into the weights folder
- Install ffmpeg
Key | Description |
---|---|
DET_TRESH | Face detection threshold [0.0:1.0] |
DIST_TRESH | Face embeddings distance threshold [0.0:1.0] |
HF_TOKEN | Your HuggingFace token (see Installation) |
USE_ENHANCER | Whether to enhance faces using GFPGAN |
ADD_SUBTITLES | Subtitles in the output video |
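As a quick sanity check before running the scripts, the keys in the table above can be validated with a short helper. This is only a sketch: the key names come from the table, but the value types (floats for the thresholds, booleans for the two switches), the `validate_config` and `load_config` helpers, and the assumption that config.json is a flat JSON object are mine, not part of the project.

```python
import json

# Key names are taken from the table above; the value types are assumptions.
EXPECTED_KEYS = {
    "DET_TRESH": float,
    "DIST_TRESH": float,
    "HF_TOKEN": str,
    "USE_ENHANCER": bool,
    "ADD_SUBTITLES": bool,
}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means the config looks sane."""
    problems = []
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        value = config[key]
        if expected_type is float:
            # Accept ints for float fields (1 instead of 1.0), but never bools.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected_type)
        if not ok:
            problems.append(f"{key} should be {expected_type.__name__}")
        elif expected_type is float and not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be within [0.0, 1.0]")
    return problems


def load_config(path: str = "config.json") -> dict:
    """Hypothetical loader: read the JSON file and fail early on bad values."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    problems = validate_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Calling `load_config()` from the project root would then raise a `ValueError` listing every problem at once, for example a threshold outside [0.0, 1.0] or a missing HF_TOKEN.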
English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu) and Korean (ko)
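If you call the scripts from your own code, it may help to check the language code up front. The sketch below copies the codes from the list above into a dictionary; the `SUPPORTED_LANGUAGES` name and the `normalize_language` helper are hypothetical and do not mirror how the project stores this list internally.

```python
# Codes and names copied from the list above; the dictionary itself is illustrative.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian", "ko": "Korean",
}


def normalize_language(code: str) -> str:
    """Lowercase and trim a language code, rejecting anything not in the list."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {code!r}; expected one of: {known}")
    return normalized
```

For example, `normalize_language(" RU ")` returns `"ru"`, while an unknown code such as `"xx"` raises a `ValueError` that lists the accepted codes.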
- Activate your environment:
conda activate your_env_name
- Change to the project directory:
cd path_to_project
The translate.py script at the root of the project translates the video you specify.
- video_filename - the filename of your input video (.mp4)
- output_language - the code of the language to translate into; use one of the supported languages listed above (you can also find them in my code)
- output_filename - the filename of output video (.mp4)
python translate.py video_filename output_language -o output_filename
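To translate one video into several languages, the command above can be generated in a loop. The following is a minimal sketch that relies only on the command-line form shown above; the helper names and the `<stem>_<language>.mp4` output naming are my own assumptions, and the commands must be run from the project root.

```python
import subprocess
import sys
from pathlib import Path


def build_translate_command(video_filename: str, output_language: str,
                            output_filename: str) -> list[str]:
    """Build the argv list for one translate.py run, mirroring the command above."""
    for name in (video_filename, output_filename):
        if Path(name).suffix.lower() != ".mp4":
            raise ValueError(f"expected an .mp4 file, got {name!r}")
    return [sys.executable, "translate.py", video_filename,
            output_language, "-o", output_filename]


def plan_translations(video_filename: str, languages: list[str]) -> list[list[str]]:
    """One command per target language; the output naming scheme is an assumption."""
    stem = Path(video_filename).stem
    return [
        build_translate_command(video_filename, lang, f"{stem}_{lang}.mp4")
        for lang in languages
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For instance, `run_all(plan_translations("talk.mp4", ["ru", "de"]))` would invoke translate.py twice and write talk_ru.mp4 and talk_de.mp4 to the current directory. Passing the arguments as a list avoids shell quoting problems with filenames that contain spaces.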
I also added a script that overlays a voice on a video with lip sync, which lets you create a video of a person pronouncing your speech. Currently it works only for videos with one person.
- voice_filename - the filename of your speech (.wav)
- video_filename - the filename of your input video (.mp4)
- output_filename - the filename of output video (.mp4)
python speech_changer.py voice_filename video_filename -o output_filename
- Detecting scenes (PySceneDetect)
- Face detection (yolov8-face)
- Reidentification (deepface)
- Speech enhancement (MDXNet)
- Speaker transcription and diarization (whisperX)
- Text translation (googletrans)
- Voice cloning (TTS)
- Lip sync (lipsync)
- Face restoration (GFPGAN)
- [Needs fixing] Searching for talking faces and determining what each person is saying
Note that this example was created without GFPGAN usage!
Destination language | Source video | Output video |
---|---|---|
🇷🇺 (Russian) |
- Full GPU support
- Multithreading support (optimizations)
- Detecting talking faces (improvement)
- Tested on macOS
⚠️ The project is under development!