SOSControl: Enhancing Human Motion Generation Through Saliency-Aware Symbolic Orientation and Timing Control (AAAI 2026)
TL;DR
We present the SOS script and SOSControl framework for saliency-aware and precise control of body part orientation and motion timing in text-to-motion generation.
Abstract
Traditional text-to-motion frameworks often lack precise control, and existing approaches based on joint keyframe locations provide only positional guidance, making it challenging and unintuitive to specify body part orientations and motion timing. To address these limitations, we introduce the Salient Orientation Symbolic (SOS) script, a programmable symbolic framework for specifying body part orientations and motion timing at keyframes. We further propose an automatic SOS extraction pipeline that employs temporally-constrained agglomerative clustering for frame saliency detection and a Saliency-based Masking Scheme (SMS) to generate sparse, interpretable SOS scripts directly from motion data. Moreover, we present the SOSControl framework, which treats the available orientation symbols in the sparse SOS script as salient and prioritizes satisfying these constraints during motion generation. By incorporating SMS-based data augmentation and gradient-based iterative optimization, the framework enhances alignment with user-specified constraints. Additionally, it employs a ControlNet-based ACTOR-PAE Decoder to ensure smooth and natural motion outputs. Extensive experiments demonstrate that the SOS extraction pipeline generates human-interpretable scripts with symbolic annotations at salient keyframes, while the SOSControl framework outperforms existing baselines in motion quality, controllability, and generalizability with respect to motion timing and body part orientation control.
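For intuition only, below is a minimal sketch of temporally-constrained agglomerative clustering over motion frames, the kind of frame-saliency grouping the abstract refers to. The feature choice, Euclidean distance, and fixed cluster count are illustrative assumptions, not the released extraction pipeline.

```python
import numpy as np

def temporal_agglomerative_clustering(frames, n_clusters):
    """Greedily merge temporally adjacent segments of a (T, D) frame sequence."""
    segments = [(t, t + 1) for t in range(len(frames))]  # one segment per frame

    def centroid(seg):
        start, end = seg
        return frames[start:end].mean(axis=0)

    while len(segments) > n_clusters:
        # Temporal constraint: only neighbouring segments are allowed to merge.
        dists = [np.linalg.norm(centroid(segments[i]) - centroid(segments[i + 1]))
                 for i in range(len(segments) - 1)]
        i = int(np.argmin(dists))
        segments[i:i + 2] = [(segments[i][0], segments[i + 1][1])]

    return segments  # segment boundaries are candidate salient keyframes

motion = np.random.randn(120, 269)  # dummy 120-frame, 269-dim motion features
print(temporal_agglomerative_clustering(motion, n_clusters=8))
```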
- ✅ Released model and dataloader code
- ✅ Released model checkpoints and data processing scripts
- ✅ Released code for generating evaluation motion samples
- 🔄 Provide a demo script
- 🔄 Detailed instructions on running the text-to-motion evaluation scripts in the external repository
- Clone the repository

```bash
git clone https://github.com/asdryau/SOSControl.git
cd SOSControl
```

- Create a conda environment

```bash
conda create -n soscontrol python=3.9.13
conda activate soscontrol
```

- Install dependencies

```bash
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
pip install -r requirements.txt
```
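Optionally, you can sanity-check the environment before proceeding. This is a generic check, not a script shipped with the repository:

```python
import torch

print(torch.__version__)          # should report the pinned 1.13.1 build
print(torch.cuda.is_available())  # True if the CUDA 11.6 runtime is visible
```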
- Download

  Download `model_weights.zip` and `data.zip` from HERE.

- Repository Setup

  Extract both ZIP files and copy the contents into the `SOSControl/` directory of the current repository.
- File Structure

```
SOSControl
├── data
│   ├── hml3d_motion_data.pkl
│   ├── hml3d_split_data.pkl
│   └── hml3d_text_data.pkl
├── evaluation
│   ├── test_discLP_data.pkl
│   └── test_discLP_text.pkl
└── model
    ├── ControlDiffusion/lightning_logs/version_0/checkpoints/last.ckpt
    ├── ControlPAE/lightning_logs/version_0/checkpoints/last.ckpt
    ├── Diffusion/lightning_logs/version_0/checkpoints/last.ckpt
    └── PAE/lightning_logs/version_0/checkpoints/last.ckpt
```
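To verify that the downloaded data is readable, a quick structure-agnostic check such as the following can be used. It is illustrative only; the contents and key names of the pickle files are defined by the repository and are not assumed here:

```python
import pickle

# Load one of the downloaded data files and report only its top-level shape.
with open("data/hml3d_motion_data.pkl", "rb") as f:
    obj = pickle.load(f)

print(type(obj))
if isinstance(obj, dict):
    print(list(obj)[:5])  # first few keys, whatever they happen to be
```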
- Training Data Preprocessing

```bash
# process axis-angle and trans into 269-dim motion format
python -m processed_data.process_data_format
# extract SOS Scripts (before saliency thresholding)
python -m processed_data.process_contLP
python -m processed_data.process_discLP
# process text using CLIP
python -m processed_data.process_txtemb
```
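For reference, the axis-angle rotations mentioned in the comment above can be converted to rotation matrices with PyTorch3D as sketched below. The (frames, joints, 3) layout and the 22-joint count are illustrative assumptions; the repository's actual 269-dim feature construction lives in `processed_data.process_data_format`:

```python
import torch
from pytorch3d.transforms import axis_angle_to_matrix

# Dummy per-joint axis-angle pose: (frames, joints, 3); joint count is assumed.
axis_angle = torch.randn(1, 22, 3)
rotmats = axis_angle_to_matrix(axis_angle)  # -> (frames, joints, 3, 3)
print(rotmats.shape)
```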
- Model Training

```bash
python -m model.PAE.train
python -m processed_data.process_paecode
# train model one by one
python -m model.Diffusion.train
python -m model.ControlDiffusion.train
python -m model.ControlPAE.train
```

- Evaluation

To generate the evaluation output for our model, execute the following commands:
```bash
python -m evaluation.test_diffuse
python -m evaluation.test_opt
```

To run the evaluation for the motion inbetweening task, execute the following command:

```bash
python -m evaluation.evaluation_script
```

Note: Please refer to the T2M Repository for details on the text-to-motion evaluation.
- Visualization

We use the SMPL-X Blender add-on to visualize the generated .npz files.
Please register at https://smpl-x.is.tue.mpg.de, download the SMPL-X for Blender add-on, and follow the provided installation instructions.
Once installed, select Animation -> Add Animation within the SMPL-X sidebar tool, and navigate to the generated .npz file for visualization.
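Before loading a result in Blender, you can list what a generated file contains with a short script like the one below. The filename is a placeholder, and the key names are simply whatever the generation scripts wrote; none are assumed here:

```python
import numpy as np

# "output.npz" stands in for the path of a generated motion file.
data = np.load("output.npz")
for key in data.files:
    print(key, data[key].shape, data[key].dtype)
```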
- SMPL/SMPL-X: For human body modeling
- PyTorch3D: For rotation conversion utilities
- HumanML3D Dataset: For motion and text data
- OmniControl: For the HintBlock module in the ControlNet implementation
This project is licensed under the MIT License - see the LICENSE file for details.