SOSControl: Enhancing Human Motion Generation Through Saliency-Aware Symbolic Orientation and Timing Control (AAAI 2026)

License: MIT Python 3.9.13 PyTorch arXiv Supplementary

🎯 Abstract

TL;DR

We present the SOS script and SOSControl framework for saliency-aware and precise control of body part orientation and motion timing in text-to-motion generation.

Full abstract

Traditional text-to-motion frameworks often lack precise control, and existing approaches based on joint keyframe locations provide only positional guidance, making it challenging and unintuitive to specify body part orientations and motion timing. To address these limitations, we introduce the Salient Orientation Symbolic (SOS) script, a programmable symbolic framework for specifying body part orientations and motion timing at keyframes. We further propose an automatic SOS extraction pipeline that employs temporally-constrained agglomerative clustering for frame saliency detection and a Saliency-based Masking Scheme (SMS) to generate sparse, interpretable SOS scripts directly from motion data. Moreover, we present the SOSControl framework, which treats the available orientation symbols in the sparse SOS script as salient and prioritizes satisfying these constraints during motion generation. By incorporating SMS-based data augmentation and gradient-based iterative optimization, the framework enhances alignment with user-specified constraints. Additionally, it employs a ControlNet-based ACTOR-PAE Decoder to ensure smooth and natural motion outputs. Extensive experiments demonstrate that the SOS extraction pipeline generates human-interpretable scripts with symbolic annotations at salient keyframes, while the SOSControl framework outperforms existing baselines in motion quality, controllability, and generalizability with respect to motion timing and body part orientation control.
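
As a rough illustration of the saliency-detection step, the snippet below sketches temporally-constrained agglomerative clustering with scikit-learn: merges are restricted to adjacent frames, so every cluster is a contiguous segment and the segment boundaries become candidate salient keyframes. This is a minimal sketch with assumed feature dimensions and segment counts, not the repository's implementation, and the Saliency-based Masking Scheme that further sparsifies the script is not reproduced here.

# Minimal sketch (assumption, not the repository's code): temporally-constrained
# agglomerative clustering of per-frame pose features. The connectivity matrix
# links each frame only to its temporal neighbours, so every cluster stays a
# contiguous segment; segment boundaries act as candidate salient keyframes.
import numpy as np
from scipy.sparse import diags
from sklearn.cluster import AgglomerativeClustering

def temporal_segments(frame_feats, n_segments):
    """frame_feats: (T, D) per-frame features -> (T,) contiguous segment labels."""
    T = len(frame_feats)
    connectivity = diags([np.ones(T - 1), np.ones(T - 1)], offsets=[-1, 1])
    model = AgglomerativeClustering(
        n_clusters=n_segments, linkage="ward", connectivity=connectivity
    )
    return model.fit_predict(frame_feats)

# Placeholder example: 120 frames of 64-dim features split into 8 segments.
labels = temporal_segments(np.random.randn(120, 64), n_segments=8)
keyframes = np.where(np.diff(labels) != 0)[0] + 1  # candidate keyframe indices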

📋 TODO

  • ✅ Released model and dataloader code
  • ✅ Released model checkpoints and data processing scripts
  • ✅ Released code for generating evaluation motion samples
  • 🔄 Provide demo script
  • 🔄 Detailed instructions for running the text-to-motion evaluation scripts in the external repository

🔮 Setup

Environment Setup

  1. Clone the repository

    git clone https://github.com/asdryau/SOSControl.git
    cd SOSControl
  2. Create a conda environment

    conda create -n soscontrol python=3.9.13
    conda activate soscontrol
  3. Install dependencies

    conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
    pip install -r requirements.txt

Dataset and Pretrained Model

  1. Download

    • Download model_weights.zip and data.zip from HERE
  2. Repository Setup

    • Extract both ZIP files and copy the contents into the SOSControl/ directory of the current repository.
  3. File Structure (an optional sanity-check sketch for this layout appears after step 4)

    SOSControl
    ├── data
    │   ├──  hml3d_motion_data.pkl
    │   ├──  hml3d_split_data.pkl
    │   └──  hml3d_text_data.pkl
    ├── evaluation
    │   ├──  test_discLP_data.pkl
    │   └──  test_discLP_text.pkl
    └── model
        ├──  ControlDiffusion/lightning_logs/version_0/checkpoints/last.ckpt
        ├──  ControlPAE/lightning_logs/version_0/checkpoints/last.ckpt
        ├──  Diffusion/lightning_logs/version_0/checkpoints/last.ckpt
        └──  PAE/lightning_logs/version_0/checkpoints/last.ckpt
  4. Training Data Preprocessing

    # process axis-angle and trans into 269-dim motion format
    python -m processed_data.process_data_format
    
    # extract SOS Scripts (before saliency thresholding)
    python -m processed_data.process_contLP
    python -m processed_data.process_discLP
    
    # process text using CLIP
    python -m processed_data.process_txtemb
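
Before moving on to training, the optional snippet below (not part of the repository) checks that the extracted data and checkpoints match the layout shown in step 3; run it from the SOSControl/ root.

# Optional sanity check: confirm the downloaded files match the step-3 layout.
# Run from the SOSControl/ repository root.
from pathlib import Path

expected = [
    "data/hml3d_motion_data.pkl",
    "data/hml3d_split_data.pkl",
    "data/hml3d_text_data.pkl",
    "evaluation/test_discLP_data.pkl",
    "evaluation/test_discLP_text.pkl",
    "model/ControlDiffusion/lightning_logs/version_0/checkpoints/last.ckpt",
    "model/ControlPAE/lightning_logs/version_0/checkpoints/last.ckpt",
    "model/Diffusion/lightning_logs/version_0/checkpoints/last.ckpt",
    "model/PAE/lightning_logs/version_0/checkpoints/last.ckpt",
]
missing = [p for p in expected if not Path(p).exists()]
print("All expected files found." if not missing else f"Missing: {missing}")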

🔧 Training

1. Train ACTOR-PAE

python -m model.PAE.train

2. Encode Training Data into Periodic Latent

python -m processed_data.process_paecode

3. Train Diffusion Model and ControlNets

# train the models one by one
python -m model.Diffusion.train
python -m model.ControlDiffusion.train
python -m model.ControlPAE.train

📈 Evaluation

To generate the evaluation output for our model, execute the following commands:

python -m evaluation.test_diffuse
python -m evaluation.test_opt

To run the evaluation for the motion inbetweening task, execute the following command:

python -m evaluation.evaluation_script

Note: Please refer to the T2M Repository for details on the text-to-motion evaluation.

🖥️ Visualization

We use the SMPL-X Blender add-on to visualize the generated .npz file.

Please register at https://smpl-x.is.tue.mpg.de, download the SMPL-X for Blender add-on, and follow the provided installation instructions.

Once installed, select Animation -> Add Animation within the SMPL-X sidebar tool, and navigate to the generated .npz file for visualization.
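
If the add-on fails to load a file, it can help to inspect the archive first. The sketch below only lists the stored arrays; the file name is a placeholder, and key names such as 'poses', 'trans', or 'betas' (AMASS-style) are assumptions rather than the repository's documented output format.

# Minimal sketch: list the arrays stored in a generated .npz before loading it
# in Blender. "generated_motion.npz" is a placeholder file name; typical
# AMASS-style keys ('poses', 'trans', 'betas') are assumptions, not confirmed.
import numpy as np

with np.load("generated_motion.npz", allow_pickle=True) as data:
    for key in data.files:
        arr = data[key]
        print(f"{key}: shape={getattr(arr, 'shape', None)}, "
              f"dtype={getattr(arr, 'dtype', type(arr))}")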

🙏 Acknowledgments

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
