SheepYang1993/HeyGenClone

 
 

HeyGenClone

Welcome to HeyGenClone, an open-source analogue of the HeyGen system.

I am a developer from Moscow 🇷🇺 who devotes his free time to studying new technologies. The project is in an active development phase, but I hope it will help you achieve your goals!

Currently, English 🇬🇧 is the only supported source language for translation!

Installation 🥸

  • Clone this repo
  • Install conda
  • Create an environment with Python 3.10 (for macOS, refer to link)
  • Activate environment
  • Install requirements:
    cd path_to_project
    python install.py
    
  • In the config.json file, set HF_TOKEN to your HuggingFace token. Then visit the speaker-diarization and segmentation model pages and accept their user conditions
  • Download the weights from drive and unzip the downloaded file into the weights folder
  • Install ffmpeg
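
If you prefer not to edit config.json by hand, the token step above could be scripted. This is a minimal sketch, assuming config.json is a flat JSON object that contains an HF_TOKEN key; the helper name set_hf_token is hypothetical and not part of the project.

```python
import json
from pathlib import Path


def set_hf_token(config_path, token):
    """Store `token` under HF_TOKEN in a JSON config file, keeping all other keys.

    Assumption: the file is a flat JSON object, as the Configurations table suggests.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["HF_TOKEN"] = token
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


# Example call (run it from the project root, with your own token):
# set_hf_token("config.json", "hf_your_token_here")
```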

Configurations (config.json) 🧙‍♂️

Key Description Can modify
LANGUAGES_URL URL for getting available languages
DET_TRESH Face detection threshold [0.0:1.0]
DIST_TRESH Face embeddings distance threshold [0.0:1.0]
DB_NAME Name of the database for data storage
HF_TOKEN Your HuggingFace token (see Installation)
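
A quick sanity check of the values in the table can catch a mistyped threshold before a long run. The sketch below assumes config.json is a flat JSON object with the keys listed above; validate_config is a hypothetical helper, not something the project ships.

```python
import json

# Keys documented above as thresholds in the range [0.0:1.0].
THRESHOLD_KEYS = ("DET_TRESH", "DIST_TRESH")


def validate_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for key in THRESHOLD_KEYS:
        value = config.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be a number in [0.0, 1.0], got {value!r}")
    if not config.get("HF_TOKEN"):
        problems.append("HF_TOKEN is empty; set your HuggingFace token")
    return problems


# Example call (from the project root):
# with open("config.json", encoding="utf-8") as fh:
#     print(validate_config(json.load(fh)))
```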

Usage 🤩

  • Activate your environment:
  conda activate your_env_name
  • Change to the project directory:
  cd path_to_project

The translate.py script at the root of the project translates the video you specify. It takes the following arguments:

  • video_filename - the filename of your input video (.mp4)
  • output_language - the code of the language to be translated into (you can find it here)
  • output_filename - the filename of output video (.mp4)
python translate.py video_filename output_language -o output_filename
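
To translate one video into several languages, the command above can be driven from a small Python wrapper. This is only a sketch: the helper names are hypothetical, the language codes you pass (for example "ru") are assumed to match the codes from the languages list, and the wrapper must be run from the project root with the environment activated.

```python
import subprocess
import sys


def build_translate_command(video_filename, output_language, output_filename):
    """Build the argument list for the translate.py call shown above."""
    return [
        sys.executable,  # the Python of the currently active environment
        "translate.py",
        video_filename,
        output_language,
        "-o",
        output_filename,
    ]


def translate_many(video_filename, output_template, languages):
    """Run translate.py once per language; output_template must contain {lang}."""
    for lang in languages:
        output_filename = output_template.format(lang=lang)
        command = build_translate_command(video_filename, lang, output_filename)
        subprocess.run(command, check=True)  # raises CalledProcessError on failure


# Example call:
# translate_many("input.mp4", "output_{lang}.mp4", ["ru"])
```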

I also added a script that overlays a voice track on a video with lip sync, so you can create a video of a person speaking your recorded speech. Currently it works only for videos with one person.

  • voice_filename - the filename of your speech (.wav)
  • video_filename - the filename of your input video (.mp4)
  • output_filename - the filename of output video (.mp4)
python speech_changer.py voice_filename video_filename -o output_filename
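
Because the script expects a .wav voice file and .mp4 videos, it can help to check the file extensions before starting a long run. The sketch below is a hypothetical wrapper (build_speech_changer_command is not part of the project) that validates the extensions and builds the same command as above.

```python
import sys
from pathlib import Path


def build_speech_changer_command(voice_filename, video_filename, output_filename):
    """Validate file extensions, then build the speech_changer.py argument list."""
    expected = [
        (voice_filename, ".wav"),
        (video_filename, ".mp4"),
        (output_filename, ".mp4"),
    ]
    for filename, suffix in expected:
        if Path(filename).suffix.lower() != suffix:
            raise ValueError(f"{filename!r} should be a {suffix} file")
    return [
        sys.executable,
        "speech_changer.py",
        voice_filename,
        video_filename,
        "-o",
        output_filename,
    ]


# Example call (pass the result to subprocess.run from the project root):
# build_speech_changer_command("speech.wav", "input.mp4", "output.mp4")
```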

How it works 😱

  1. Detecting scenes (PySceneDetect)
  2. Face detection (yolov8-face)
  3. Reidentification (deepface)
  4. Speech enhancement (MDXNet)
  5. Speakers transcriptions and diarization (whisperX)
  6. Text translation (googletrans)
  7. Voice cloning (TTS)
  8. Lip sync (lipsync)
  9. Face restoration (GFPGAN)
  10. [Needs fixing] Search for talking faces and determine what each person is saying
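
The steps above can be pictured as a chain of stages, each taking the result of the previous one. The sketch below is illustrative only and does not reflect the project's actual code: the stage names are hypothetical labels for steps 1 to 9, and each stage is a stub that only records that it ran, where a real stage would call the library named in the list.

```python
# Hypothetical labels for steps 1-9 of the list above, in order.
STAGE_NAMES = [
    "scene_detection",
    "face_detection",
    "reidentification",
    "speech_enhancement",
    "transcription_and_diarization",
    "text_translation",
    "voice_cloning",
    "lip_sync",
    "face_restoration",
]


def make_stub(name):
    """Create a placeholder stage that appends its name to the context log."""
    def stage(context):
        updated = dict(context)  # copy, so the caller's context is not modified
        updated["log"] = context.get("log", []) + [name]
        return updated
    return stage


def run_pipeline(stages, context):
    """Feed the context through each stage in order and return the final context."""
    for stage in stages:
        context = stage(context)
    return context


stages = [make_stub(name) for name in STAGE_NAMES]
result = run_pipeline(stages, {"video": "input.mp4", "language": "ru"})
print(result["log"])
```

Keeping every stage behind the same context-in, context-out interface makes it easy to skip or replace a single step, for example running without face restoration.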

Translation results 🥺

Note that this example was created without GFPGAN usage!

Destination language Source video Output video
🇷🇺 (Russian) Watch the video Watch the video

Contributing 🫵🏻

Contributions are welcome! I am very glad that so many people are interested in my project, and I will be happy to review your pull requests. In the future, all contributors will be listed here!

To-Do List 🤷🏼‍♂️

  • Full GPU support
  • Multithreading support (optimizations)
  • Detecting talking faces (improvement)

Other 🤘🏻

  • Tested on macOS
  • ⚠️ The project is under development!
