A curated list of 'Talking Head Generation' resources. Features influential papers, groundbreaking algorithms, crucial GitHub repositories, insightful videos, and more. Ideal for AI enthusiasts, researchers, and graphics professionals

Curated-Awesome-Lists/awesome-ai-talking-heads

Awesome talking-head

Welcome to the Awesome List for Talking Head Generation! This curated collection of resources focuses on the intriguing domain of 'Talking Head Generation' - an area of computer graphics and artificial intelligence that strives to create lifelike digital recreations of human heads and faces. These 'talking heads' can be used in a variety of applications, from realistic video content and virtual reality, to advanced communication tools and beyond. This list aims to gather key research papers, state-of-the-art algorithms, seminal GitHub repositories, educational videos, inspiring blogs, and more. Whether you are an AI researcher, computer graphics professional, or an AI enthusiast, this list is your one-stop destination to dive into the world of Talking Head Generation. Happy exploring!

GitHub projects

  • AudioGPT : Understanding and Generating Speech, Music, Sound, and Talking Head. 🗣️🎵
  • SadTalker : Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. 🎭🎶
  • Thin-Plate-Spline-Motion-Model : Thin-Plate Spline Motion Model for Image Animation. 🖼️
  • GeneFace : Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code. 👤💬
  • CVPR2022-DaGAN : Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation. 👥📹
  • sd-wav2lip-uhq : Wav2Lip UHQ extension for Automatic1111. 👄
  • Text2Video : ICASSP 2022: "Text2Video: text-driven talking-head video synthesis with phonetic dictionary". 🔤🎞️
  • OTAvatar : Official repository for OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering (CVPR 2023). 👤🎭
  • Audio2Head : Code for the paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion" (IJCAI 2021). 🗣️👤
  • IP_LAP : CVPR2023 talking face implementation for Identity-Preserving Talking Face Generation With Landmark and Appearance Priors. 🔥🤖
  • Wunjo AI : Synthesize & clone voices in English, Russian & Chinese, real-time speech recognition, deepfake face & lips animation, face swap with one photo, change video by text prompts, segmentation, and retouching. Open-source, local & free. 🗣️👤💬
  • LIHQ : Long-Inference, High Quality Synthetic Speaker (AI avatar/ AI presenter). 🎙️👤
  • Co-Speech-Motion-Generation : Freeform Body Motion Generation from Speech. 🗣️🚶
  • Neural Head Reenactment with Latent Pose Descriptors : The authors' implementation of the "Neural Head Reenactment with Latent Pose Descriptors" (CVPR 2020) paper. 🤖👤
  • NED : PyTorch implementation for NED (CVPR 2022). It can be used to manipulate the facial emotions of actors in videos based on emotion labels or reference styles. 😃🎭🎥
  • WACV23_TSNet : PyTorch implementation of the WACV 2023 paper "Cross-identity Video Motion Retargeting with Joint Transformation and Synthesis". 🎬✨
  • ICCV2023-MCNET : Official code for the ICCV 2023 paper "Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation". 🎥🤖
  • Speech2Video : Code for ACCV 2020 "Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses". 🗣️🎥💃
  • StyleLipSync : Official PyTorch implementation of "StyleLipSync: Style-based Personalized Lip-sync Video Generation". 💋🎥

Articles & Blogs

  • How to Create Fake Talking Head Videos With Deep Learning (Code Tutorial): An article explaining the process of generating fake talking head videos using deep learning techniques.
  • AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head: A research paper introducing AudioGPT, a multi-modal AI system that processes complex audio information and can understand and generate speech, music, sound, and talking-head content.
  • Text-based Editing of Talking-head Video: An academic publication discussing the editing of talking-head videos using text-based instructions.
  • Few-Shot Adversarial Learning of Realistic Neural Talking Head Models: A research paper presenting a system that learns personalized talking head models from just a few image views of a person, using adversarial training.
  • DisCoHead: Audio-and-Video-Driven Talking Head Generation: A paper describing DisCoHead, a method that disentangles and controls head pose and facial expressions in talking head generation, without supervision.
  • Microsoft's 3D Photo Realistic Talking Head: A blog post showcasing Microsoft's 3D talking head technology, which combines photorealistic video with a 3D mesh model.
  • Depth-Aware Generative Adversarial Network for Talking Head Video Generation: A research paper proposing a GAN-based approach that leverages dense 3D facial geometry to generate realistic and accurate talking head videos.
  • Talking-head Generation with Rhythmic Head Motion: This article presents a method for generating lip-synced talking-head videos with natural head movements. The approach uses a 3D-aware generative network together with a hybrid embedding module and a non-linear composition module, producing controllable, photo-realistic results.
  • Learned Spatial Representations for Few-shot Talking-Head Synthesis: This article introduces a novel approach for few-shot talking-head synthesis by factorizing the representation of a subject into its spatial and style components. The proposed method predicts a dense spatial layout for the target image and utilizes it for synthesizing the target frame, achieving improved preservation of the subject's identity in the source images.
  • Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation: This article proposes the Emotional Adaptation for Audio-driven Talking-head (EAT) method, which transforms emotion-agnostic talking-head models into emotion-controllable ones in a cost-effective manner. It uses lightweight adaptations to enable precise and realistic emotion control, achieving state-of-the-art performance on widely used benchmarks.
  • High-Fidelity and Freely Controllable Talking Head Video Generation: This article addresses the challenges faced by current methods in generating high-quality and controllable talking-head videos. It introduces a novel model that leverages self-supervised learned landmarks and 3D face model-based landmarks to model the motion, along with a motion-aware multi-scale feature alignment module. The proposed method produces high-fidelity talking-head videos with free control over head pose and expression.
  • Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation: This article proposes a global facial representation space and a novel implicit identity representation conditioned memory compensation network for high-fidelity talking head generation. The network learns a unified spatial facial meta-memory bank, which compensates warped source facial features to overcome limitations caused by complex motions in the driving video, improving generation quality.
  • Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos: This article focuses on avatar fingerprinting, the task of verifying the trustworthiness of rendered talking-head videos. It proposes an embedding that groups the motion signatures of one identity together, making it possible to identify the individual driving the expressions in a synthetic video.
  • Style Transfer for 2D Talking Head Animation: This article presents a method for generating talking head animation with learnable style references. It reconstructs 2D talking head animation from a single input image and an audio stream, using facial landmark motion, style-pattern construction, and a style-aware image generator. The method is reported to outperform recent state-of-the-art methods in generating photo-realistic, high-fidelity 2D animation.
  • One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing: This article proposes a neural talking-head video synthesis model that learns to synthesize videos from a source image containing the target person's appearance and a driving video that supplies the motion. The model achieves high visual quality and bandwidth efficiency, outperforming competing methods on benchmark datasets.
  • Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis: This article presents a one-shot talking head synthesis method that achieves disentangled control over lip motion, eye gaze & blink, head pose, and emotional expression. It uses a progressive disentangled representation learning strategy to isolate each motion factor, allowing fine-grained control and high-quality synchronization between speech and lip motion.
  • VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild: This article introduces VideoReTalking, a system for editing real-world talking head videos according to input audio. It splits the editing task into face video generation, audio-driven lip-sync, and face enhancement, producing a high-quality, lip-synced output video. All three stages are learning-based and run as a sequential pipeline without user intervention.

Online Courses

Research Papers

Tools & Software

Slides & Presentations


This initial version of the Awesome List was generated with the help of the Awesome List Generator. It's an open-source Python package that uses the power of GPT models to automatically curate and generate starting points for resource lists related to a specific topic.
