DisCo: Disentangled Implicit Content and Rhythm Learning for Diverse Co-Speech Gestures Synthesis

Reproduction Based on Released Model

Train and test DisCo

  1. python == 3.7
  2. Build the following folder structure:
    audio2pose
    ├── codes
    │   └── audio2pose
    ├── datasets
    │   ├── trinity
    │   └── s2g
    └── outputs
        └── audio2pose
            ├── custom
            └── wandb
  3. Download the framework scripts from BEAT to codes/audio2pose/.
  4. Run pip install -r requirements.txt in ./codes/audio2pose/.
  5. Download the Trinity dataset to datasets/trinity.
  6. Build the data cache and calculate the mean and std for the given number of joints, FPS, and speakers using /dataloader/preprocessing.ipynb (a minimal sketch of the statistics step appears after this list).
  7. Put disco.py under ./audio2pose/model/ and customize disco_trainer.py for contrastive learning (see the loss sketch after this list).
  8. Run python train.py -c ./configs/disco_trinity_ae.yaml to train the pretrained_ae used for FID calculation.
  9. Run python train.py -c ./configs/disco_trinity.yaml for training.
  10. Run python test.py -c ./configs/disco_trinity.yaml for inference.
  11. Load ./outputs/audio2pose/custom/exp_name/epoch_number/xxx.bvh into Blender to visualize the test results (see the import snippet after this list).
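
Step 6's notebook is where the normalization statistics come from. As a rough illustration of what that computation looks like (the function and variable names here are assumptions for the sketch, not the notebook's actual code), the mean and std are taken per channel over all cached training frames:

    import numpy as np

    # Illustrative sketch only: compute per-channel mean/std over all training
    # clips. Assumes each clip is a (frames, joints * 3) array of joint
    # rotations, already resampled to the target FPS and filtered to the
    # chosen speakers.
    def compute_pose_stats(clips):
        frames = np.concatenate(clips, axis=0)  # (total_frames, joints * 3)
        mean = frames.mean(axis=0)
        std = frames.std(axis=0)
        std[std < 1e-6] = 1.0  # guard static channels against divide-by-zero
        return mean, std

The resulting statistics are typically saved next to the cache and used to z-normalize poses during training and de-normalize them at inference.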
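
Step 7 leaves the contrastive-learning part of disco_trainer.py to you. DisCo learns disentangled content and rhythm embeddings, and one standard way to write such a contrastive term is an InfoNCE-style loss over paired embeddings. The sketch below is a generic stand-in under that assumption, not the repository's actual trainer code:

    import torch
    import torch.nn.functional as F

    def info_nce_loss(anchor, positive, temperature=0.1):
        """Generic InfoNCE loss over a batch of paired embeddings.

        anchor, positive: (batch, dim) tensors. Row i of `positive` is the
        positive sample for row i of `anchor`; all other rows in the batch
        act as negatives.
        """
        anchor = F.normalize(anchor, dim=-1)
        positive = F.normalize(positive, dim=-1)
        logits = anchor @ positive.t() / temperature  # (batch, batch) similarities
        targets = torch.arange(anchor.size(0), device=anchor.device)
        return F.cross_entropy(logits, targets)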
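
For step 11, the .bvh files can be imported through Blender's GUI (File > Import > Motion Capture (.bvh)) or scripted from Blender's Python console; the path below keeps the placeholder names from the list:

    import bpy

    # Import a generated BVH clip into the current Blender scene.
    # Substitute your actual experiment name, epoch folder, and file name.
    bpy.ops.import_anim.bvh(
        filepath="./outputs/audio2pose/custom/exp_name/epoch_number/xxx.bvh"
    )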

From data in other datasets (e.g., S2G)

  • Build the bvh data cache as described in Train and test DisCo above.
  • Keep dataset: trinity in the .yaml config, since the cache is built in the Trinity format and reuses its loader.

Citation

DisCo was established for the following research project. If you use this code, please cite:

@inproceedings{liu2022disco,
    title={DisCo: Disentangled Implicit Content and Rhythm Learning for Diverse Co-Speech Gestures Synthesis},
    author={Liu, Haiyang and Iwamoto, Naoya and Zhu, Zihao and Li, Zhengqing and Zhou, You and Bozkurt, Elif and Zheng, Bo},
    booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
    pages={3764--3773},
    year={2022}
}
