
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation

(CVPR 2024 Official Repo)

Junming Chen†1,2, Yunfei Liu2, Jianan Wang2, Ailing Zeng2, Yu Li*2, Qifeng Chen*1

1HKUST   2International Digital Economy Academy (IDEA)   
*Corresponding authors   †Work done during an internship at IDEA

DiffSHEG Teaser

Environment

We have tested this code on Ubuntu 18.04 and 20.04.

cd assets
  • Option 1: conda install
conda env create -f environment.yml
conda activate diffsheg
  • Option 2: pip install
conda create -n "diffsheg" python=3.9
conda activate diffsheg
pip install -r requirements.txt
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
  • Untar data.tar.gz for data statistics
tar zxvf data.tar.gz
mv data ../
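  • (Optional) Sanity-check the environment. A minimal check, assuming the diffsheg environment created above is active:
# Prints the installed PyTorch version and whether CUDA is visible (expect 1.13.1+cu117 and True).
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"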

Checkpoints

The pretrained checkpoints are available on Google Drive.
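If you prefer to download from the command line, gdown is one option (a hedged sketch; '<google-drive-folder-url>' is a placeholder for the Google Drive folder linked above):

pip install gdown
# '<google-drive-folder-url>' is a placeholder for the Google Drive folder linked above.
gdown --folder '<google-drive-folder-url>'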

Inference on a Custom Audio

First, set the '--test_audio_path' argument in the bash scripts mentioned below to the path of your test audio. Note that the audio must be a .wav file.
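If your audio is in another format, a tool such as ffmpeg can convert it to WAV first (a sketch; the 16 kHz mono settings are an assumption, so adjust them to match what your checkpoint expects):

# Convert any input audio to a mono 16 kHz WAV file (sample rate and channel count are assumptions).
ffmpeg -i input.mp3 -ar 16000 -ac 1 my_test_audio.wav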

  • Use model trained on BEAT dataset:
bash inference_custom_audio_beat.sh
  • Use model trained on SHOW dataset:
bash inference_custom_audio_talkshow.sh

Training

Train on BEAT dataset
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
OMP_NUM_THREADS=10 CUDA_VISIBLE_DEVICES=0,1,2,3,4 python -u runner.py \
    --dataset_name beat \
    --name beat_diffsheg \
    --batch_size 2500 \
    --num_epochs 1000 \
    --save_every_e 20 \
    --eval_every_e 40 \
    --n_poses 34 \
    --ddim \
    --multiprocessing-distributed \
    --dist-url 'tcp://127.0.0.1:6666'
Train on SHOW dataset
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
OMP_NUM_THREADS=10 CUDA_VISIBLE_DEVICES=0,1,2,3,4 python -u runner.py \
    --dataset_name talkshow \
    --name talkshow_diffsheg \
    --batch_size 950 \
    --num_epochs 4000 \
    --save_every_e 20 \
    --eval_every_e 40 \
    --n_poses 88 \
    --classifier_free \
    --multiprocessing-distributed \
    --dist-url 'tcp://127.0.0.1:6667' \
    --ddim \
    --max_eval_samples 200
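While a training run is going, standard tooling such as nvidia-smi (nothing repo-specific) can confirm that all GPUs listed in CUDA_VISIBLE_DEVICES are actually busy:

# Refresh GPU utilization every 2 seconds during training.
watch -n 2 nvidia-smi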

Testing

Test on BEAT dataset
# Set --name and --ckpt to the BEAT experiment and checkpoint you trained or downloaded (placeholders below).
# Classifier-free guidance flags are omitted here because the BEAT training command above does not use them.
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
OMP_NUM_THREADS=10 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -u runner.py \
    --dataset_name beat \
    --name <your_beat_experiment_name> \
    --PE pe_sinu \
    --n_poses 34 \
    --multiprocessing-distributed \
    --dist-url 'tcp://127.0.0.1:8889' \
    --ckpt <your_beat_checkpoint>.tar \
    --mode test_arbitrary_len \
    --ddim \
    --timestep_respacing ddim25 \
    --overlap_len 10
Test on SHOW dataset
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
OMP_NUM_THREADS=10 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -u runner.py \
    --dataset_name talkshow \
    --name talkshow_GesExpr_unify_addHubert_encodeHubert_mdlpIncludeX_condRes_LN_ClsFree \
    --PE pe_sinu \
    --n_poses 88 \
    --multiprocessing-distributed \
    --dist-url 'tcp://127.0.0.1:8889' \
    --classifier_free \
    --cond_scale 1.25 \
    --ckpt ckpt_e2599.tar \
    --mode test_arbitrary_len \
    --ddim \
    --timestep_respacing ddim25 \
    --overlap_len 10

Visualization

After running in the test or test-custom-audio mode, the generated gesture and expression results are saved in the ./results directory.
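To locate the generated files from a terminal (a small sketch; the exact layout under ./results is an assumption), the gesture BVH and expression JSON outputs can be listed with:

# List generated gesture (BVH) and expression (JSON) files; the layout under ./results is an assumption.
find ./results -name '*.bvh' -o -name '*.json'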

BEAT

  1. Open assets/beat_visualize.blend with the latest version of Blender on your local computer.
  2. Specify the audio, BVH (gesture), JSON (expression), and video saving paths in the script inside Blender.
  3. (Optional) Click Window --> Toggle System Console to check the visualization progress.
  4. Run the script in Blender.
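If you would rather render without opening the Blender UI, Blender can also run a script headlessly (a hedged sketch: beat_visualize.py is a hypothetical stand-alone copy of the script embedded in the .blend file, saved out with the paths from step 2 already filled in):

# -b runs Blender in background (headless) mode; -P executes the given Python script.
# beat_visualize.py is a hypothetical export of the embedded visualization script, edited with your paths.
blender -b assets/beat_visualize.blend -P beat_visualize.py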

SHOW

Please refer to the TalkSHOW code for the visualization of our generated motion.

Acknowledgement

Our implementation is partially based on BEAT, TalkSHOW, and MotionDiffuse.

Citation

If you use our code or find this repo useful, please consider citing our paper:

@inproceedings{chen2024diffsheg,
  title     = {DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation},
  author    = {Chen, Junming and Liu, Yunfei and Wang, Jianan and Zeng, Ailing and Li, Yu and Chen, Qifeng},
  booktitle = {CVPR},
  year      = {2024}
}
