🕺 Reproduced by Ling-Hao Chen and Shunlin Lu (credit also goes to TMR and SwanHub).
❗️[Highlight]: We provide a demo for OpenTMA in HumanTOMATO. The demo is supported by the SwanHub engineering team. Have a try!
OpenTMA is a project that aims to provide a simple and efficient way to align text and motion data. It is designed to be easy to use and flexible, allowing users to align text and motion data in the latent space.
In the HumanTOMATO (ICML 2024) project, we clarify, for the first time, the importance of how text and motion data are used to generate motions. We highlight two methods:
- Replace your CLIP text encoder with OpenTMA text encoder.
- Introduce the text-motion alignment supervision to your motion generation model during training.
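The second method can take several forms. As a minimal sketch, assuming a symmetric InfoNCE-style contrastive loss of the kind TMR uses (the exact loss, temperature, and weighting used in HumanTOMATO are not specified here), an alignment term over a batch of paired text and motion embeddings might look like the following. The function names and the temperature value are illustrative, and plain Python lists are used only so the example runs without dependencies; a real training loop would operate on tensors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def alignment_loss(text_emb, motion_emb, temperature=0.1):
    """Symmetric InfoNCE over paired (text_i, motion_i) embeddings."""
    logits = [[cosine(t, m) / temperature for m in motion_emb] for t in text_emb]
    logits_t = [list(col) for col in zip(*logits)]  # motion-to-text direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch: in `aligned`, motion i points in a similar direction to text i;
# in `shuffled`, the pairing is swapped.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = alignment_loss(text_emb, aligned)
loss_shuffled = alignment_loss(text_emb, shuffled)
print(loss_aligned < loss_shuffled)
```

In a generation model, a term like this would be added to the main training loss so that generated motions stay close to their text in the shared latent space.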
- [2024/05/12] We release the OpenTMA training and checkpoints.
- Release the OpenTMA training.
- Release the OpenTMA checkpoints.
- Support PyPI (pip install opentma).
pip install -r requirements.txt
We provide some pretrained checkpoints of OpenTMA for evaluation. There are two ways to download them: 1) from Google Drive, or 2) from Baidu Drive (pwd: evan).
# Load text and motion data
import torch
from transformers import AutoTokenizer, AutoModel
from tma.models.architectures.temos.textencoder.distillbert_actor import DistilbertActorAgnosticEncoder
from tma.models.architectures.temos.motionencoder.actor import ActorAgnosticEncoder
from collections import OrderedDict
modelpath = 'distilbert-base-uncased'
textencoder = DistilbertActorAgnosticEncoder(modelpath, num_layers=4)
motionencoder = ActorAgnosticEncoder(nfeats=126, vae = True, num_layers=4)
"""
load model here
You need to normalize the motion data with mean and std.
For motionx, they are stored in './deps/t2m/motionx/vector_623/Comp_v6_KLD01/meta/*.npy'
"""
motion = torch.randn(1, 64, 126) # B = 1, T = 64, D = 126; real motion data needs normalization
lengths = [64]
print(textencoder(["a man is running"]).loc)
print(motionencoder(motion, lengths).loc)
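The note in the snippet above says that motion features must be normalized with the dataset mean and std before they are passed to the motion encoder. A minimal sketch of that step is shown below, assuming per-feature z-score normalization. The function name and the small eps term are illustrative additions, not part of the repository, and plain lists are used so the example runs standalone. With tensors or arrays loaded from the .npy files, the same operation is the broadcasted (motion - mean) / std.

```python
def normalize_motion(motion, mean, std, eps=1e-8):
    """Z-score each frame feature-wise: (x - mean) / (std + eps).

    motion: list of T frames, each a list of D floats.
    mean, std: lists of D floats (for real data, loaded from the .npy files).
    eps guards against division by zero; it is an illustrative addition.
    """
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, mean, std)]
        for frame in motion
    ]


# Toy example with T = 2 frames and D = 3 features.
motion = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
mean = [2.0, 2.0, 2.0]
std = [1.0, 0.5, 2.0]
normalized = normalize_motion(motion, mean, std)
print(normalized)
```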
Our OpenTMA project supports three datasets: HumanML3D, Motion-X, and UniMoCap.
HumanML3D Data Preparation
Please follow the instructions in the HumanML3D repository to download and preprocess the data. The data should be stored in the ./datasets/humanml3d folder. The path tree should look like this:
./OpenTMR/datasets/humanml3d/
├── all.txt
├── Mean.npy
├── new_joints/
├── new_joint_vecs/
├── Std.npy
├── test.txt
├── texts/
├── train.txt
├── train_val.txt
└── val.txt
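Before training, it can help to confirm that the folder matches the tree above. The following is a small sketch of such a check. The helper name is hypothetical and the entry list is copied from the tree, so adjust the root path to your own setup.

```python
from pathlib import Path

# Entries taken from the tree above; names ending in "/" are directories.
HUMANML3D_ENTRIES = [
    "all.txt", "Mean.npy", "new_joints/", "new_joint_vecs/", "Std.npy",
    "test.txt", "texts/", "train.txt", "train_val.txt", "val.txt",
]


def missing_entries(root, expected=HUMANML3D_ENTRIES):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    missing = []
    for name in expected:
        path = root / name.rstrip("/")
        ok = path.is_dir() if name.endswith("/") else path.is_file()
        if not ok:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means every expected file and folder was found.
    print(missing_entries("./datasets/humanml3d"))
```

The same helper can be reused for the other datasets by passing a different `expected` list.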
Motion-X Data Preparation
Please follow the instructions in the Motion-X project first, and then follow the HumanTOMATO repository to preprocess the data into the tomato format. The data should be stored in the ./datasets/Motion-X folder. The path tree should look like this:
./OpenTMR/datasets/Motion-X
├── mean_std
│ └── vector_623
│ ├── mean.npy
│ └── std.npy
├── motion_data
│ └── vector_623
│ ├── aist/ (subset_*/*.npy)
│ ├── animation/
│ ├── dance/
│ ├── EgoBody/
│ ├── fitness/
│ ├── game_motion/
│ ├── GRAB/
│ ├── HAA500/
│ ├── humanml/
│ ├── humman/
│ ├── idea400/
│ ├── kungfu/
│ ├── music/
│ └── perform/
├── split
│ ├── all.txt
│ ├── test.txt
│ ├── train.txt
│ └── val.txt
└── texts
├── semantic_texts
│ ├── aist/ (subset_*/*.txt)
│ ├── animation/
│ ├── dance/
│ ├── EgoBody/
│ ├── fitness/
│ ├── game_motion/
│ ├── GRAB/
│ ├── HAA500/
│ ├── humanml/
│ ├── humman/
│ ├── idea400/
│ ├── kungfu/
│ ├── music/
│ └── perform/
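In the tree above, the motion features and the semantic texts mirror each other by subset. As a sketch, the split files could be resolved to (motion, text) file pairs as follows. This assumes each line of a split file is a relative ID such as "aist/subset_0000/clip" (a hypothetical example); the function name is illustrative and the repository's own data loader may work differently.

```python
from pathlib import Path


def resolve_pairs(root, split="train", feature="vector_623"):
    """Map each ID in split/<split>.txt to its (motion, text) file pair.

    Assumes IDs are relative paths without extension that are mirrored
    under motion_data/<feature>/ and texts/semantic_texts/.
    IDs with a missing motion or text file are skipped.
    """
    root = Path(root)
    lines = (root / "split" / f"{split}.txt").read_text().splitlines()
    ids = [line.strip() for line in lines if line.strip()]
    pairs = []
    for sample_id in ids:
        motion = root / "motion_data" / feature / f"{sample_id}.npy"
        text = root / "texts" / "semantic_texts" / f"{sample_id}.txt"
        if motion.is_file() and text.is_file():
            pairs.append((motion, text))
    return pairs
```

Comparing the number of returned pairs with the number of lines in the split file is a quick way to spot samples whose motion or text file is missing.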
UniMoCap Data Preparation
Please follow the instructions in the UniMoCap repository to download and preprocess the data (HumanML3D, BABEL, and KIT-ML). The data should be stored in the ./datasets/UniMocap folder. The path tree should look like this:
./OpenTMR/datasets/UniMocap
├── all.txt
├── Mean.npy
├── new_joints/ (*.npy)
├── new_joint_vecs/ (*.npy)
├── Std.npy
├── test.txt
├── texts/ (*.txt)
├── train.txt
├── train_val.txt
└── val.txt
Here, we provide some pre-trained checkpoints for evaluation. There are two ways to download them:
Google Drive
Download the checkpoints from Google Drive and put them in the ./deps folder. Then unzip them with the following command:
unzip '*.zip'
Finally, the path tree should look like this:
./deps
├── distilbert-base-uncased/
├── glove/
├── t2m/
└── transforms/
Baidu Drive
Download the checkpoints from Baidu Drive (pwd: evan) and put them in the ./deps folder. Then extract them with the following command:
tar -xvf deps.tar
Finally, the path tree should look like this:
./deps
├── distilbert-base-uncased/
├── glove/
├── t2m/
└── transforms/
- Training on HumanML3D:
python -m train --cfg configs/configs_temos/H3D-TMR.yaml --cfg_assets configs/assets.yaml --nodebug
- Training on Motion-X:
python -m train --cfg configs/configs_temos/MotionX-TMR.yaml --cfg_assets configs/assets.yaml --nodebug
- Training on UniMoCap:
python -m train --cfg configs/configs_temos/UniMoCap-TMR.yaml --cfg_assets configs/assets.yaml --nodebug
The checkpoints will be saved in the ./experiments/ folder. To run in debug mode, remove the --nodebug flag. The best checkpoints often appear between epochs 100 and 500.
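To train on several datasets in sequence, the three commands above can be wrapped in a small launcher. The sketch below only reuses the config paths and flags shown above; the helper names and dataset keys are hypothetical, and it assumes the script is run from the repository root.

```python
import subprocess
import sys

# Config paths copied from the training commands above.
CONFIGS = {
    "humanml3d": "configs/configs_temos/H3D-TMR.yaml",
    "motionx": "configs/configs_temos/MotionX-TMR.yaml",
    "unimocap": "configs/configs_temos/UniMoCap-TMR.yaml",
}


def train_command(dataset, debug=False):
    """Build the training command for one dataset key in CONFIGS."""
    cmd = [
        sys.executable, "-m", "train",
        "--cfg", CONFIGS[dataset],
        "--cfg_assets", "configs/assets.yaml",
    ]
    if not debug:
        cmd.append("--nodebug")  # omit this flag to run in debug mode
    return cmd


def train_all(datasets=("humanml3d", "motionx", "unimocap")):
    """Train on each dataset in turn, stopping at the first failure."""
    for name in datasets:
        subprocess.run(train_command(name), check=True)
```

Calling `train_all(["humanml3d"])` runs a single training job, while `train_command("motionx", debug=True)` returns the debug-mode command without running it.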
Before running the command below, edit the retreival.sh file (for example, the path1 variable) to set the correct path for the data. Run this command after training. It evaluates the model on the test set using text and motion embeddings.
bash retreival.sh
The result will be in a markdown table format.
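The exact metrics and table columns produced by retreival.sh are not listed here. As an illustration only, the sketch below computes text-to-motion Recall@K, a standard retrieval metric, from a similarity matrix between text and motion embeddings and formats it as a one-row markdown table. The function names and the table layout are assumptions, not the script's actual output.

```python
def recall_at_k(similarity, ks=(1, 5, 10)):
    """Text-to-motion Recall@K (in percent) from a square similarity matrix.

    similarity[i][j] scores text i against motion j; the ground-truth
    match for text i is assumed to be motion i.
    """
    n = len(similarity)
    hits = {k: 0 for k in ks}
    for i, row in enumerate(similarity):
        # Rank of the true match = number of motions scored strictly higher
        # (ties are therefore resolved in favor of the true match).
        rank = sum(1 for score in row if score > row[i])
        for k in ks:
            if rank < k:
                hits[k] += 1
    return {k: 100.0 * hits[k] / n for k in ks}


def to_markdown(results):
    """Format {k: recall} as a one-row markdown table."""
    header = "| " + " | ".join(f"R@{k}" for k in results) + " |"
    rule = "|" + "|".join(" --- " for _ in results) + "|"
    row = "| " + " | ".join(f"{v:.2f}" for v in results.values()) + " |"
    return "\n".join([header, rule, row])


# Toy similarity matrix for three text-motion pairs.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.4, 0.7],
]
results = recall_at_k(sim, ks=(1, 2))
print(to_markdown(results))
```

Motion-to-text recall can be obtained the same way by transposing the similarity matrix.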
If you use this repository for research, you need to cite:
@article{humantomato,
title={HumanTOMATO: Text-aligned Whole-body Motion Generation},
author={Lu, Shunlin and Chen, Ling-Hao and Zeng, Ailing and Lin, Jing and Zhang, Ruimao and Zhang, Lei and Shum, Heung-Yeung},
journal={arxiv:2310.12978},
year={2023}
}
@article{chen2023unimocap,
title={UniMocap: Unifier for BABEL, HumanML3D, and KIT},
author={Chen, Ling-Hao and UniMocap, Contributors},
journal={https://github.com/LinghaoChan/UniMoCap},
year={2023}
}
@inproceedings{petrovich23tmr,
title = {{TMR}: Text-to-Motion Retrieval Using Contrastive {3D} Human Motion Synthesis},
author = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l},
booktitle = {International Conference on Computer Vision ({ICCV})},
year = {2023}
}
@InProceedings{Guo_2022_CVPR,
author = {Guo, Chuan and Zou, Shihao and Zuo, Xinxin and Wang, Sen and Ji, Wei and Li, Xingyu and Cheng, Li},
title = {Generating Diverse and Natural 3D Human Motions From Text},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {5152-5161}
}
@conference{AMASS2019,
title = {AMASS: Archive of Motion Capture as Surface Shapes},
author = {Mahmood, Naureen and Ghorbani, Nima and Troje, Nikolaus F. and Pons-Moll, Gerard and Black, Michael J.},
booktitle = {International Conference on Computer Vision},
pages = {5442--5451},
month = oct,
year = {2019},
month_numeric = {10}
}
If you have any questions, please contact Ling-Hao Chen (thu [DOT] lhchen [AT] gmail [DOT] com) and Shunlin Lu (shunilnlu0803 [AT] gmail [DOT] com).