SOSControl: Enhancing Human Motion Generation Through Saliency-Aware Symbolic Orientation and Timing Control (AAAI 2026)

License: MIT Python 3.9.13 PyTorch arXiv Supplementary

🎯 Abstract

TL;DR

We present the SOS script and SOSControl framework for saliency-aware and precise control of body part orientation and motion timing in text-to-motion generation.

Full abstract

Traditional text-to-motion frameworks often lack precise control, and existing approaches based on joint keyframe locations provide only positional guidance, making it challenging and unintuitive to specify body part orientations and motion timing. To address these limitations, we introduce the Salient Orientation Symbolic (SOS) script, a programmable symbolic framework for specifying body part orientations and motion timing at keyframes. We further propose an automatic SOS extraction pipeline that employs temporally-constrained agglomerative clustering for frame saliency detection and a Saliency-based Masking Scheme (SMS) to generate sparse, interpretable SOS scripts directly from motion data. Moreover, we present the SOSControl framework, which treats the available orientation symbols in the sparse SOS script as salient and prioritizes satisfying these constraints during motion generation. By incorporating SMS-based data augmentation and gradient-based iterative optimization, the framework enhances alignment with user-specified constraints. Additionally, it employs a ControlNet-based ACTOR-PAE Decoder to ensure smooth and natural motion outputs. Extensive experiments demonstrate that the SOS extraction pipeline generates human-interpretable scripts with symbolic annotations at salient keyframes, while the SOSControl framework outperforms existing baselines in motion quality, controllability, and generalizability with respect to motion timing and body part orientation control.
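
As a rough illustration of the saliency-detection step, the snippet below sketches temporally-constrained agglomerative clustering with scikit-learn: merges are restricted to adjacent frames, so every cluster is a contiguous segment and the segment boundaries become candidate salient keyframes. This is a minimal sketch with assumed feature dimensions and segment counts, not the repository's implementation, and the Saliency-based Masking Scheme that further sparsifies the script is not reproduced here.

# Minimal sketch (assumption, not the repository's code): temporally-constrained
# agglomerative clustering of per-frame pose features. The connectivity matrix
# links each frame only to its temporal neighbours, so every cluster stays a
# contiguous segment; segment boundaries act as candidate salient keyframes.
import numpy as np
from scipy.sparse import diags
from sklearn.cluster import AgglomerativeClustering

def temporal_segments(frame_feats, n_segments):
    """frame_feats: (T, D) per-frame features -> (T,) contiguous segment labels."""
    T = len(frame_feats)
    connectivity = diags([np.ones(T - 1), np.ones(T - 1)], offsets=[-1, 1])
    model = AgglomerativeClustering(
        n_clusters=n_segments, linkage="ward", connectivity=connectivity
    )
    return model.fit_predict(frame_feats)

# Placeholder example: 120 frames of 64-dim features split into 8 segments.
labels = temporal_segments(np.random.randn(120, 64), n_segments=8)
keyframes = np.where(np.diff(labels) != 0)[0] + 1  # candidate keyframe indices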

📋 TODO

  • ✅ Released model and dataloader code
  • ✅ Released model checkpoints and data processing scripts
  • ✅ Released code for generating evaluation motion samples
  • 🔄 Provide demo script
  • 🔄 Detailed instructions for running the text-to-motion evaluation scripts in the external repository

🔮 Setup

Environment Setup

  1. Clone the repository

    git clone https://github.com/asdryau/SOSControl.git
    cd SOSControl
  2. Create a conda environment

    conda create -n soscontrol python=3.9.13
    conda activate soscontrol
  3. Install dependencies

    conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
    pip install -r requirements.txt

Dataset and Pretrained Model

  1. Download

    • Download model_weights.zip and data.zip from HERE
  2. Repository Setup

    • Extract both ZIP files and copy the contents into the SOSControl/ directory of the current repository.
  3. File Structure (an optional sanity-check sketch for this layout appears after step 4)

    SOSControl
    ├── data
    │   ├──  hml3d_motion_data.pkl
    │   ├──  hml3d_split_data.pkl
    │   └──  hml3d_text_data.pkl
    ├── evaluation
    │   ├──  test_discLP_data.pkl
    │   └──  test_discLP_text.pkl
    └── model
        ├──  ControlDiffusion/lightning_logs/version_0/checkpoints/last.ckpt
        ├──  ControlPAE/lightning_logs/version_0/checkpoints/last.ckpt
        ├──  Diffusion/lightning_logs/version_0/checkpoints/last.ckpt
        └──  PAE/lightning_logs/version_0/checkpoints/last.ckpt
  4. Training Data Preprocessing

    # process axis-angle and trans into 269-dim motion format
    python -m processed_data.process_data_format
    
    # extract SOS Scripts (before saliency thresholding)
    python -m processed_data.process_contLP
    python -m processed_data.process_discLP
    
    # process text using CLIP
    python -m processed_data.process_txtemb
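
Before moving on to training, the optional snippet below (not part of the repository) checks that the extracted data and checkpoints match the layout shown in step 3; run it from the SOSControl/ root.

# Optional sanity check: confirm the downloaded files match the step-3 layout.
# Run from the SOSControl/ repository root.
from pathlib import Path

expected = [
    "data/hml3d_motion_data.pkl",
    "data/hml3d_split_data.pkl",
    "data/hml3d_text_data.pkl",
    "evaluation/test_discLP_data.pkl",
    "evaluation/test_discLP_text.pkl",
    "model/ControlDiffusion/lightning_logs/version_0/checkpoints/last.ckpt",
    "model/ControlPAE/lightning_logs/version_0/checkpoints/last.ckpt",
    "model/Diffusion/lightning_logs/version_0/checkpoints/last.ckpt",
    "model/PAE/lightning_logs/version_0/checkpoints/last.ckpt",
]
missing = [p for p in expected if not Path(p).exists()]
print("All expected files found." if not missing else f"Missing: {missing}")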

🔧 Training

1. Train ACTOR-PAE

python -m model.PAE.train

2. Encode Training Data into Periodic Latent

python -m processed_data.process_paecode

3. Train Diffusion Model and ControlNets

# train the models one by one
python -m model.Diffusion.train
python -m model.ControlDiffusion.train
python -m model.ControlPAE.train

📈 Evaluation

To generate the evaluation output for our model, execute the following commands:

python -m evaluation.test_diffuse
python -m evaluation.test_opt

To run the evaluation for the motion inbetweening task, execute the following command:

python -m evaluation.evaluation_script

Note: Please refer to the T2M Repository for details on the text-to-motion evaluation.

🖥️ Visualization

We use the SMPL-X Blender add-on to visualize the generated .npz file.

Please register at https://smpl-x.is.tue.mpg.de, download the SMPL-X for Blender add-on, and follow the provided installation instructions.

Once installed, select Animation -> Add Animation within the SMPL-X sidebar tool, and navigate to the generated .npz file for visualization.
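
If the add-on fails to load a file, it can help to inspect the archive first. The sketch below only lists the stored arrays; the file name is a placeholder, and key names such as 'poses', 'trans', or 'betas' (AMASS-style) are assumptions rather than the repository's documented output format.

# Minimal sketch: list the arrays stored in a generated .npz before loading it
# in Blender. "generated_motion.npz" is a placeholder file name; typical
# AMASS-style keys ('poses', 'trans', 'betas') are assumptions, not confirmed.
import numpy as np

with np.load("generated_motion.npz", allow_pickle=True) as data:
    for key in data.files:
        arr = data[key]
        print(f"{key}: shape={getattr(arr, 'shape', None)}, "
              f"dtype={getattr(arr, 'dtype', type(arr))}")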

🙏 Acknowledgments

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
