Wav2Lip is a state-of-the-art lip-sync generation system that accurately synchronizes lip movements with audio in videos. This guide provides comprehensive installation and setup instructions for both the open-source version and the commercial API.
- GPU: NVIDIA GPU with 4GB+ VRAM (CUDA compatible)
- RAM: Minimum 8GB, recommended 16GB+
- Storage: At least 5GB free space for models and dependencies
- Python: 3.6-3.8 (3.6 recommended for compatibility)
- CUDA: 10.1+ (for GPU acceleration)
- FFmpeg: Essential for video/audio processing
- Git: For cloning the repository
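You can quickly confirm the command-line prerequisites are on your PATH before installing anything. A minimal sketch (the script name is illustrative):

# check_prereqs.py
import shutil
import sys

# FFmpeg and Git must be reachable on PATH
for tool in ("ffmpeg", "git"):
    print(f"{tool}: {shutil.which(tool) or 'NOT FOUND'}")

# Wav2Lip targets Python 3.6-3.8
print(f"python: {sys.version.split()[0]}")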
The commercial version offers higher quality and easier setup through the Sync.so API.
- Visit Sync.so Dashboard
- Create an account and generate your API key
- Note your API key for later use
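Rather than hard-coding the key into scripts, you can read it from an environment variable; a minimal sketch (the variable name SYNC_API_KEY is an arbitrary choice, not mandated by the SDK):

# Read the API key from the environment (run `export SYNC_API_KEY=...` first)
import os

api_key = os.environ.get("SYNC_API_KEY")
if not api_key:
    raise SystemExit("SYNC_API_KEY is not set")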
Python SDK:
pip install syncsdk

TypeScript SDK:
npm i @sync.so/sdk

Python Example:
# quickstart.py
import sys
import time

from sync import Sync
from sync.common import Audio, GenerationOptions, Video
from sync.core.api_error import ApiError

# Replace with your API key
api_key = "YOUR_API_KEY_HERE"

# Input URLs (or use local files)
video_url = "https://assets.sync.so/docs/example-video.mp4"
audio_url = "https://assets.sync.so/docs/example-audio.wav"

client = Sync(
    base_url="https://api.sync.so",
    api_key=api_key
).generations

print("Starting lip sync generation...")
try:
    response = client.create(
        input=[Video(url=video_url), Audio(url=audio_url)],
        model="lipsync-2",
        options=GenerationOptions(sync_mode="cut_off"),
        outputFileName="quickstart"
    )
except ApiError as e:
    print(f'Generation failed: {e.status_code} - {e.body}')
    sys.exit(1)

job_id = response.id
print(f"Job submitted: {job_id}")

# Poll for completion
generation = client.get(job_id)
while generation.status not in ['COMPLETED', 'FAILED']:
    print('Polling status...')
    time.sleep(10)
    generation = client.get(job_id)

if generation.status == 'COMPLETED':
    print(f'Success! Output: {generation.output_url}')
else:
    print('Generation failed')

Run the example:
python quickstart.py

# Clone the repository
git clone https://github.com/Rudrabha/Wav2Lip.git
cd Wav2Lip
# Create conda environment
conda create -n wav2lip python=3.6 -y
conda activate wav2lip
# Install PyTorch (adjust CUDA version as needed)
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 --extra-index-url https://download.pytorch.org/whl/cu101
# Install FFmpeg (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install ffmpeg
# Install Python dependencies
pip install -r requirements.txt
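Once the dependencies are installed, a quick import check catches broken packages before you download any models. A minimal sketch (module names taken from the install steps above):

# post_install_check.py
import importlib

for mod in ("torch", "torchvision", "cv2", "numpy", "librosa"):
    try:
        importlib.import_module(mod)
        print(f"{mod}: OK")
    except ImportError as err:
        print(f"{mod}: MISSING ({err})")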
Face Detection Model:

# Download face detection model
mkdir -p face_detection/detection/sfd/
wget -O face_detection/detection/sfd/s3fd.pth https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth
# Alternative link if above fails
wget -O face_detection/detection/sfd/s3fd.pth https://iiitaphyd-my.sharepoint.com/:u:/g/personal/prajwal_k_research_iiit_ac_in/EZsy6qWuivtDnANIG73iHjIBjMSoojcIV0NULXV-yiuiIg?e=qTasa8

Wav2Lip Models:
Choose one of the following:

- Wav2Lip (High Accuracy)

  # Download from Google Drive
  # Link: https://drive.google.com/drive/folders/153HLrqlBNxzZcHi17PEvP09kkAfzRshM?usp=share_link
  # Manual download and place in checkpoints/
  mkdir -p checkpoints
  # Download wav2lip.pth to checkpoints/

- Wav2Lip + GAN (Better Visual Quality)

  # Download from Google Drive
  # Link: https://drive.google.com/file/d/15G3U08c8xsCkOqQxE38Z2XXDnPcOptNk/view?usp=share_link
  # Manual download and place in checkpoints/
  mkdir -p checkpoints
  # Download wav2lip_gan.pth to checkpoints/
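With the weights in place, it is worth confirming they actually deserialize, since a truncated download is a common failure mode. A minimal sketch:

# check_checkpoints.py
import torch

for path in ("checkpoints/wav2lip_gan.pth",  # or checkpoints/wav2lip.pth
             "face_detection/detection/sfd/s3fd.pth"):
    ckpt = torch.load(path, map_location="cpu")
    print(f"{path}: loaded ({type(ckpt).__name__}, {len(ckpt)} top-level entries)")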
Create a test script:
# test_wav2lip.py
import os

import cv2
import torch

def test_installation():
    try:
        # Check PyTorch
        print(f"PyTorch version: {torch.__version__}")
        print(f"CUDA available: {torch.cuda.is_available()}")

        # Check OpenCV
        print(f"OpenCV version: {cv2.__version__}")

        # Check required files
        required_files = [
            'face_detection/detection/sfd/s3fd.pth',
            'checkpoints/wav2lip_gan.pth'  # or wav2lip.pth
        ]
        for file in required_files:
            if os.path.exists(file):
                print(f"✓ {file} exists")
            else:
                print(f"✗ {file} missing")

        # Test imports (repo-root modules; run this script from the Wav2Lip directory)
        import audio
        import face_detection
        print("✓ Wav2Lip imports successful")

        print("Installation test completed!")
    except Exception as e:
        print(f"Error during test: {e}")

if __name__ == "__main__":
    test_installation()

Run the test:
python test_wav2lip.py

python inference.py \
--checkpoint_path checkpoints/wav2lip_gan.pth \
--face input_video.mp4 \
--audio input_audio.wav \
--outfile result.mp4

python inference.py \
--checkpoint_path checkpoints/wav2lip_gan.pth \
--face input_video.mp4 \
--audio input_audio.wav \
--outfile result.mp4 \
--pads 0 20 0 0 \
--resize_factor 1 \
--nosmooth \
--wav2lip_batch_size 128

| Parameter | Description | Default |
|---|---|---|
| --checkpoint_path | Path to model checkpoint | Required |
| --face | Input video path | Required |
| --audio | Input audio path | Required |
| --outfile | Output video path | results/result_voice.mp4 |
| --pads | Face padding [top, bottom, left, right] | 0 10 0 0 |
| --resize_factor | Downsample factor | 1 |
| --nosmooth | Disable face detection smoothing | False |
| --wav2lip_batch_size | Batch size for processing | 128 |
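If you drive inference.py from other Python code, a thin subprocess wrapper keeps these parameters in one place. A sketch of a hypothetical helper (not part of the repository):

# run_wav2lip.py
import subprocess

def run_wav2lip(face, audio, outfile="results/result_voice.mp4",
                checkpoint="checkpoints/wav2lip_gan.pth",
                pads=(0, 10, 0, 0), nosmooth=False, batch_size=128):
    """Build the inference.py command line from the table above and run it."""
    cmd = ["python", "inference.py",
           "--checkpoint_path", checkpoint,
           "--face", face,
           "--audio", audio,
           "--outfile", outfile,
           "--pads", *map(str, pads),
           "--wav2lip_batch_size", str(batch_size)]
    if nosmooth:
        cmd.append("--nosmooth")
    subprocess.run(cmd, check=True)

run_wav2lip("input_video.mp4", "input_audio.wav")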
from sync import Sync
from sync.common import Audio, GenerationOptions, Video
client = Sync(api_key="YOUR_API_KEY").generations
# Upload local files
response = client.create(
    input=[
        Video(file_path="local_video.mp4"),
        Audio(file_path="local_audio.wav")
    ],
    model="lipsync-2",
    options=GenerationOptions(sync_mode="cut_off")
)

import asyncio
from sync import Sync
from sync.common import Audio, Video
async def batch_process(video_audio_pairs):
    client = Sync(api_key="YOUR_API_KEY").generations
    jobs = []
    for video, audio in video_audio_pairs:
        job = await client.create_async(
            input=[Video(file_path=video), Audio(file_path=audio)],
            model="lipsync-2"
        )
        jobs.append(job)

    # Fetch the latest status of every job (repeat until all reach COMPLETED)
    results = await asyncio.gather(*[client.get_async(job.id) for job in jobs])
    return results
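The batch helper can then be driven with asyncio.run; a usage sketch (the file paths are placeholders):

# Drive batch_process() defined above
import asyncio

pairs = [("clip1.mp4", "take1.wav"), ("clip2.mp4", "take2.wav")]
results = asyncio.run(batch_process(pairs))
for generation in results:
    print(generation.id, generation.status)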
# Select which GPU is visible to the process
export CUDA_VISIBLE_DEVICES=0
# Use mixed precision for faster processing
# (modify inference script if needed)

- Face Padding Adjustment

  # Increase bottom padding to include chin
  python inference.py --pads 0 20 0 0 ...

- Disable Smoothing for Artifacts

  # Use if you see multiple mouths or artifacts
  python inference.py --nosmooth ...

- Resize Factor for Performance

  # Lower resolution for faster processing
  python inference.py --resize_factor 2 ...
# Convert audio to required format
ffmpeg -i input.mp3 -ar 16000 -ac 1 input.wav
# Extract audio from video
ffmpeg -i video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav
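To confirm a converted file really is 16 kHz mono 16-bit PCM, the standard-library wave module is enough; a minimal sketch:

# verify_audio.py
import wave

with wave.open("input.wav", "rb") as wav:
    print("sample rate:", wav.getframerate())   # expect 16000
    print("channels:", wav.getnchannels())      # expect 1 (mono)
    print("sample width:", wav.getsampwidth())  # expect 2 bytes (pcm_s16le)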
- CUDA Out of Memory

  # Solution: Reduce batch size
  python inference.py --wav2lip_batch_size 64 ...
  # Solution: Use CPU for some steps
  # Solution: Reduce input video resolution

- Face Detection Issues

  # Solution: Adjust padding
  python inference.py --pads 10 30 10 10 ...
  # Solution: Disable smoothing
  python inference.py --nosmooth ...

- Audio Sync Problems

  # Solution: Check audio format
  ffmpeg -i audio.wav
  # Solution: Ensure correct sample rate
  ffmpeg -i input.mp3 -ar 16000 output.wav

- Model Loading Errors

  # Solution: Verify model paths
  ls -la checkpoints/
  ls -la face_detection/detection/sfd/
  # Solution: Check file permissions
  chmod 644 checkpoints/*.pth
# Enable torch.compile for PyTorch 2.0+ (assumes `model` is already loaded)
import torch
if hasattr(torch, 'compile'):
    model = torch.compile(model)

Video Input:
- Formats: MP4, AVI, MOV
- Codecs: H.264 recommended
- Resolution: Any resolution (will be resized internally)
- Frame Rate: 25-30 FPS recommended
Audio Input:
- Formats: WAV, MP3, M4A
- Sample Rate: 16kHz recommended
- Channels: Mono preferred
- Duration: Any length supported
Output:
- Format: MP4
- Codec: H.264
- Resolution: Matches input (or resized based on model)
- Frame Rate: Matches input
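To check an input file against these requirements, ffprobe (installed alongside FFmpeg) can dump stream metadata; a minimal sketch:

# probe_media.py
import json
import subprocess

result = subprocess.run(
    ["ffprobe", "-v", "quiet", "-print_format", "json",
     "-show_streams", "input_video.mp4"],
    capture_output=True, text=True, check=True)

for stream in json.loads(result.stdout)["streams"]:
    print(stream["codec_type"], stream.get("codec_name"), stream.get("r_frame_rate"))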
Wav2Lip/
├── checkpoints/ # Model weights
│ ├── wav2lip.pth # High accuracy model
│ └── wav2lip_gan.pth # GAN enhanced model
├── face_detection/ # Face detection module
│ └── detection/
│ └── sfd/
│ └── s3fd.pth # Face detection model
├── models/ # Model architecture definitions
├── evaluation/ # Evaluation scripts
├── filelists/ # Dataset filelists
├── inference.py # Main inference script
├── requirements.txt # Dependencies
└── README.md # Project documentation
# Dockerfile
FROM pytorch/pytorch:1.7.1-cuda10.1-cudnn7-runtime
RUN apt-get update && apt-get install -y ffmpeg git
RUN git clone https://github.com/Rudrabha/Wav2Lip.git
WORKDIR /Wav2Lip
RUN pip install -r requirements.txt
# Download models here...
CMD ["python", "inference.py", "--help"]For training on custom datasets:
# Preprocess dataset
python preprocess.py --data_root /path/to/data --preprocessed_root /path/to/preprocessed
# Train expert discriminator
python color_syncnet_train.py --data_root /path/to/preprocessed --checkpoint_dir /path/to/checkpoints
# Train Wav2Lip model
python wav2lip_train.py --data_root /path/to/preprocessed --checkpoint_dir /path/to/checkpoints --syncnet_checkpoint_path /path/to/expert/checkpoint

- Test with Examples: Try the provided example videos and audio
- Quality Tuning: Experiment with parameters for best results
- Batch Processing: Set up automated workflows
- API Integration: Use commercial API for production applications
- Commercial Support: contact@sync.so
- Open Source Issues: GitHub Issues
- Documentation: Repository Wiki
- Research Paper: ACM Multimedia 2020
- Open Source Version: Research/Personal use only
- Commercial Version: Full commercial license available
- Attribution: Cite the original paper if used in research