[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)

### SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

[arXiv](https://arxiv.org/abs/2211.12194) | [project](https://sadtalker.github.io) | [GitHub](https://github.com/Winfredy/SadTalker)

Wenxuan Zhang, Xiaodong Cun, Xuan Wang, Yong Zhang, Xi Shen, Yu Guo, Ying Shan, Fei Wang.

Xi'an Jiaotong University, Tencent AI Lab, Ant Group

CVPR 2023

TL;DR: A method for generating realistic, stylized talking-head video from a single image and an audio clip.


Installation (around 5 mins)

In [None]:
### Make sure that CUDA is available: Edit -> Notebook settings -> GPU
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
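
The same check can be done from Python so that later cells can stop early when no GPU is attached. This is a minimal sketch: `parse_gpu_csv` and `query_gpus` are hypothetical helper names, not part of SadTalker, and the parser assumes the three-column `name, memory.total, memory.free` layout requested above.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip lines that do not match name,total,free
        name, total, free = parts
        gpus.append({"name": name, "memory_total": total, "memory_free": free})
    return gpus


def query_gpus():
    """Return parsed GPU info, or an empty list when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,memory.free",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


gpus = query_gpus()
print(gpus if gpus else "No GPU detected: select a GPU runtime before continuing.")
```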

In [None]:
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.8 2
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.9 1
!sudo apt install python3.8

!sudo apt-get install python3.8-distutils

!python --version

!apt-get update

!apt install software-properties-common

!sudo dpkg --remove --force-remove-reinstreq python3-pip python3-setuptools python3-wheel

!apt-get install python3-pip

print('Git clone project and install requirements...')
!git clone https://github.com/Winfredy/SadTalker &> /dev/null
%cd SadTalker
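# `!export` only affects a throwaway subshell, so set PYTHONPATH on the kernel's environment instead
import os; os.environ['PYTHONPATH'] = '/content/SadTalker:' + os.environ.get('PYTHONPATH', '')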
!python3.8 -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
!apt update
!apt install ffmpeg &> /dev/null
!python3.8 -m pip install -r requirements.txt
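
Because the packages above are installed for `python3.8` rather than for the notebook kernel, it can help to confirm which torch build that interpreter actually picked up. The sketch below is illustrative only: `split_torch_version` and `installed_torch_version` are hypothetical helpers, and the check assumes the interpreter is reachable as `python3.8` on the PATH.

```python
import shutil
import subprocess


def split_torch_version(version):
    """Split a string such as '1.12.1+cu113' into ('1.12.1', 'cu113')."""
    base, _, local = version.partition("+")
    return base, local or None


def installed_torch_version(python="python3.8"):
    """Ask the given interpreter for torch.__version__; None if unavailable."""
    if shutil.which(python) is None:
        return None
    result = subprocess.run(
        [python, "-c", "import torch; print(torch.__version__)"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None


version = installed_torch_version()
if version is None:
    print("torch could not be imported by python3.8")
else:
    base, build = split_torch_version(version)
    print(f"torch {base}, build tag: {build}")
```

A build tag other than `cu113` would suggest that pip resolved a different wheel than the one pinned in the install command.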

Download models (around 1 min)

In [None]:
print('Download pre-trained models...')
!rm -rf checkpoints
!bash scripts/download_models.sh

In [None]:
# borrowed from MakeItTalk
import ipywidgets as widgets
import glob
import matplotlib.pyplot as plt
print("Choose the image name to animate: (saved in folder 'examples/')")
img_list = glob.glob1('examples/source_image', '*.png')
img_list.sort()
img_list = [item.rsplit('.', 1)[0] for item in img_list]  # strip only the final extension
default_head_name = widgets.Dropdown(options=img_list, value='full3')
def on_change(change):
    if change['type'] == 'change' and change['name'] == 'value':
        plt.imshow(plt.imread('examples/source_image/{}.png'.format(default_head_name.value)))
        plt.axis('off')
        plt.show()
default_head_name.observe(on_change)
display(default_head_name)
plt.imshow(plt.imread('examples/source_image/{}.png'.format(default_head_name.value)))
plt.axis('off')
plt.show()

Animation

In [None]:
# selected audio from examples/driven_audio
img = 'examples/source_image/{}.png'.format(default_head_name.value)
print(img)
!python3.8 inference.py --driven_audio ./examples/driven_audio/RD_Radio31_000.wav \
           --source_image {img} \
           --result_dir ./results --still --preprocess full --enhancer gfpgan

In [None]:
# visualization code adapted from MakeItTalk
from IPython.display import HTML
from base64 import b64encode
import glob, sys

# get the most recent result: the final videos in ./results are named by timestamp,
# so the last path in sorted order is the newest one
mp4_name = sorted(glob.glob('./results/*.mp4'))[-1]

with open(mp4_name, 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

print('Display animation: {}'.format(mp4_name), file=sys.stderr)
display(HTML("""
  <video width=256 controls>
        <source src="%s" type="video/mp4">
  </video>
  """ % data_url))

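The inference logs later in this notebook show intermediate videos written into timestamped subfolders of `./results`, so a recursive lookup by modification time is a slightly more robust way to find the latest output. This is a sketch under that assumption; `latest_mp4` is a hypothetical helper, not a SadTalker function.

```python
from pathlib import Path


def latest_mp4(results_dir="./results"):
    """Return the most recently modified .mp4 under results_dir, or None."""
    root = Path(results_dir)
    if not root.is_dir():
        return None
    candidates = list(root.rglob("*.mp4"))  # also searches subfolders
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


print(latest_mp4())
```

The returned path can be passed to `open(..., 'rb')` in place of `mp4_name` in the display cell.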

In [2]:
# Install PyTorch using the cu113 extra index (versions are not pinned, so pip may resolve a newer CUDA 12 build, as the log below shows)
!pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

# Clone the SadTalker repository
!git clone https://github.com/OpenTalker/SadTalker.git
%cd SadTalker

# Install requirements
!pip install -r requirements.txt

# Fix NumPy warning issue
!sed -i 's/np.VisibleDeprecationWarning/DeprecationWarning/' src/face3d/util/preprocess.py

# Download pre-trained models
!bash scripts/download_models.sh

# Upload video file (will use the first frame as source image)
from google.colab import files
print("Upload your video file:")
video_upload = files.upload()
video_file = list(video_upload.keys())[0]

# Upload audio file
print("Upload your audio file:")
audio_upload = files.upload()
audio_file = list(audio_upload.keys())[0]

# Run the inference script
!python inference.py --driven_audio "{audio_file}" --source_image "{video_file}" --enhancer gfpgan

# List the results
!ls results

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting

--2025-04-29 12:51:22--  https://github.com/OpenTalker/SadTalker/releases/download/v0.0.2-rc/mapping_00109-model.pth.tar
Resolving github.com (github.com)... 140.82.116.3
Connecting to github.com (github.com)|140.82.116.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/569518584/ccc415aa-c6f4-47ee-8250-b10bf440ba62?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20250429%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250429T125122Z&X-Amz-Expires=300&X-Amz-Signature=f4c687a63423f5929c8068de95ef9bc994f301264af5ee3571f6286ae15936f5&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Dmapping_00109-model.pth.tar&response-content-type=application%2Foctet-stream [following]
--2025-04-29 12:51:22--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/569518584/ccc415aa-c6f4-47ee-8250-b10bf440ba62?X-Amz-Algorit

Saving video-shardul.mp4 to video-shardul.mp4
Upload your audio file:


Saving harvard.wav to harvard.wav
Traceback (most recent call last):
  File "/content/SadTalker/SadTalker/inference.py", line 10, in <module>
    from src.facerender.animate import AnimateFromCoeff
  File "/content/SadTalker/SadTalker/src/facerender/animate.py", line 23, in <module>
    from src.utils.face_enhancer import enhancer_generator_with_len, enhancer_list
  File "/content/SadTalker/SadTalker/src/utils/face_enhancer.py", line 4, in <module>
    from gfpgan import GFPGANer
  File "/usr/local/lib/python3.11/dist-packages/gfpgan/__init__.py", line 2, in <module>
    from .archs import *
  File "/usr/local/lib/python3.11/dist-packages/gfpgan/archs/__init__.py", line 2, in <module>
    from basicsr.utils import scandir
  File "/usr/local/lib/python3.11/dist-packages/basicsr/__init__.py", line 4, in <module>
    from .data import *
  File "/usr/local/lib/python3.11/dist-packages/basicsr/data/__init__.py", line 22, in <module>
    _dataset_modules = [importlib.import_module(f'basicsr.

In [8]:
# Upload video file (will use the first frame as source image)
from google.colab import files
print("Upload your video file:")
video_upload = files.upload()
video_file = list(video_upload.keys())[0]

# Upload audio file
print("Upload your audio file:")
audio_upload = files.upload()
audio_file = list(audio_upload.keys())[0]

# Run the inference script
!python inference.py --driven_audio "{audio_file}" --source_image "{video_file}" --preprocess full --still --enhancer gfpgan --size 512

# List the results
!ls results

Upload your video file:


Saving video-shardul.mp4 to video-shardul (4).mp4
Upload your audio file:


Saving harvard.wav to harvard (4).wav
using safetensor as default
3DMM Extraction for source image
landmark Det:: 100% 1/1 [00:00<00:00,  8.59it/s]
3DMM Extraction In Video:: 100% 1/1 [00:00<00:00, 25.23it/s]
mel:: 100% 458/458 [00:00<00:00, 46436.65it/s]
audio2exp:: 100% 46/46 [00:00<00:00, 268.86it/s]
Face Renderer:: 100% 229/229 [10:06<00:00,  2.65s/it]
The generated video is named ./results/2025_04_29_14.01.22/video-shardul (4)##harvard (4).mp4
OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
seamlessClone:: 100% 458/458 [00:22<00:00, 20.54it/s]
The generated video is named ./results/2025_04_29_14.01.22/video-shardul (4)##harvard (4)_full.mp4
face enhancer....
Face Enhancer:: 100% 458/458 [04:45<00:00,  1.61it/s]
The generated video is named ./results/2025_04_29_14.01.22/video-shardul (4)##harvard (4)_enhanced.mp4
The generated video is named: ./results/2025_04_29_14
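
Uploaded file names such as `video-shardul (4).mp4` contain spaces and parentheses, so building the call as an argument list avoids shell-quoting surprises. The sketch below only assembles the flags already used in this notebook; `build_inference_cmd` is a hypothetical helper and the default values are assumptions taken from the cells above.

```python
import shlex


def build_inference_cmd(audio_file, source_file, size=512, result_dir="./results"):
    """Assemble the inference.py call used above as an argument list."""
    return [
        "python", "inference.py",
        "--driven_audio", audio_file,
        "--source_image", source_file,
        "--result_dir", result_dir,
        "--preprocess", "full",
        "--still",
        "--enhancer", "gfpgan",
        "--size", str(size),
    ]


cmd = build_inference_cmd("harvard (4).wav", "video-shardul (4).mp4")
print(shlex.join(cmd))  # shell-safe rendering, useful for logging
# In the notebook, run it with: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` hands each file name to the script as a single argument, regardless of the characters it contains.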

In [5]:


# Install ffmpeg (fixes the video-saving issue)
!apt-get install -y ffmpeg

# Update imageio to the latest version
!pip install --upgrade imageio

# Patch basicsr to fix a known import issue (if still relevant)
!sed -i 's/from torchvision.transforms.functional_tensor import rgb_to_grayscale/from torchvision.transforms.functional import rgb_to_grayscale/' /usr/local/lib/python3.11/dist-packages/basicsr/data/degradations.py

# Fix NumPy warning (if present)
!sed -i 's/np.VisibleDeprecationWarning/DeprecationWarning/' src/face3d/util/preprocess.py



Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 34 not upgraded.
Collecting imageio
  Downloading imageio-2.37.0-py3-none-any.whl.metadata (5.2 kB)
Downloading imageio-2.37.0-py3-none-any.whl (315 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 315.8/315.8 kB 8.2 MB/s eta 0:00:00
Installing collected packages: imageio
  Attempting uninstall: imageio
    Found existing installation: imageio 2.19.3
    Uninstalling imageio-2.19.3:
      Successfully uninstalled imageio-2.19.3
Successfully installed imageio-2.37.0
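
The `sed` patch above hard-codes the Python 3.11 `dist-packages` path, which breaks if the runtime's Python version changes. A hedged alternative is to locate the installed `basicsr` package from Python and apply the same one-line replacement. `patch_import` and `find_degradations` are hypothetical helpers, and the sketch assumes `basicsr` is installed for the notebook kernel's interpreter.

```python
import importlib.util
from pathlib import Path

OLD_IMPORT = "from torchvision.transforms.functional_tensor import rgb_to_grayscale"
NEW_IMPORT = "from torchvision.transforms.functional import rgb_to_grayscale"


def patch_import(path):
    """Rewrite the stale import in `path`; return True if the file changed."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if OLD_IMPORT not in text:
        return False  # already patched, or a different basicsr version
    path.write_text(text.replace(OLD_IMPORT, NEW_IMPORT), encoding="utf-8")
    return True


def find_degradations():
    """Locate basicsr/data/degradations.py without importing basicsr."""
    spec = importlib.util.find_spec("basicsr")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    candidate = package_dir / "data" / "degradations.py"
    return candidate if candidate.is_file() else None


target = find_degradations()
print("basicsr not found" if target is None else f"patched: {patch_import(target)}")
```

Running the cell twice is harmless: the second run finds nothing to replace and reports `patched: False`.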


In [3]:
# Patch basicsr to fix the import statement
!sed -i 's/from torchvision.transforms.functional_tensor import rgb_to_grayscale/from torchvision.transforms.functional import rgb_to_grayscale/' /usr/local/lib/python3.11/dist-packages/basicsr/data/degradations.py

# Fix NumPy warning issue (if present in preprocess.py)
!sed -i 's/np.VisibleDeprecationWarning/DeprecationWarning/' src/face3d/util/preprocess.py

# Download pre-trained models
!bash scripts/download_models.sh

mkdir: cannot create directory ‘./checkpoints’: File exists
File ‘./checkpoints/mapping_00109-model.pth.tar’ already there; not retrieving.
File ‘./checkpoints/mapping_00229-model.pth.tar’ already there; not retrieving.
File ‘./checkpoints/SadTalker_V0.0.2_256.safetensors’ already there; not retrieving.
File ‘./checkpoints/SadTalker_V0.0.2_512.safetensors’ already there; not retrieving.
File ‘./gfpgan/weights/alignment_WFLW_4HG.pth’ already there; not retrieving.
File ‘./gfpgan/weights/detection_Resnet50_Final.pth’ already there; not retrieving.
File ‘./gfpgan/weights/GFPGANv1.4.pth’ already there; not retrieving.
File ‘./gfpgan/weights/parsing_parsenet.pth’ already there; not retrieving.
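
Before running inference, it can be useful to confirm that every file reported by the download script is really on disk. The sketch below checks the paths listed in the log above; `missing_files` is a hypothetical helper, and the file list is copied from that log rather than from any official manifest.

```python
from pathlib import Path

# Paths taken from the download log above (relative to the SadTalker folder).
EXPECTED_FILES = [
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/mapping_00229-model.pth.tar",
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/SadTalker_V0.0.2_512.safetensors",
    "gfpgan/weights/alignment_WFLW_4HG.pth",
    "gfpgan/weights/detection_Resnet50_Final.pth",
    "gfpgan/weights/GFPGANv1.4.pth",
    "gfpgan/weights/parsing_parsenet.pth",
]


def missing_files(root=".", expected=EXPECTED_FILES):
    """Return the expected relative paths that are not present under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


missing = missing_files()
print("All model files present" if not missing else f"Missing: {missing}")
```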
