Official implementation of LegoACE, published at SIGGRAPH Asia 2025.
LegoACE is an autoregressive transformer that generates LEGO® assemblies. The repository supports four generation modes:
- Unconditional generation (GPT-2 backbone)
- Text-conditioned generation (CLIP text encoder)
- Image-conditioned generation (DINOv2; single-view and multi-view)
- DPO refinement for image-conditioned models

Contents:
- Installation
- Data preparation
- Training
- Inference
- Pre-trained models
- Project structure
- Citation
- Acknowledgments
- License

Requirements:
- Python 3.10+
- PyTorch 2.0+ with CUDA support
- Blender 4.2+ (only required to convert generated LDR files to GLB meshes)
Clone the repository:
git clone https://github.com/VAST-AI-Research/LegoACE.git
cd LegoACE
Install Python dependencies using uv (recommended):
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment and install the project + dependencies
uv sync
# Install PyTorch with the CUDA wheel that matches your driver
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
Or with plain pip:
pip install -e .
# Optional metrics deps (Chamfer / EMD evaluation):
pip install -e ".[metrics]"
Install Blender 4.2.x with the ImportLDraw add-on:
wget https://download.blender.org/release/Blender4.2/blender-4.2.3-linux-x64.tar.xz
tar -xf blender-4.2.3-linux-x64.tar.xz
# Then in Blender: Edit > Preferences > Add-ons > Install... and select the ImportLDraw .zip
Export the binary path so the inference scripts can find it:
export BLENDER_BIN=/absolute/path/to/blender-4.2.3-linux-x64/blender
Make sure the project root is on PYTHONPATH whenever you run a script directly:
export PYTHONPATH="$(pwd)"
All datasets live under a single root directory. The default root is ./data; override it with the LEGOACE_DATA_ROOT environment variable:
export LEGOACE_DATA_ROOT=/path/to/datasets
Each dataset is a sub-directory:
$LEGOACE_DATA_ROOT/
└── <dataset_name>/
    ├── train_dataset.json
    ├── val_dataset.json
    ├── test_dataset.json
    ├── <dataset_name>_dat_dict.json   # brick type -> token id
    ├── <dataset_name>_rot_dict.json   # rotation matrix string -> token id
    └── models/
        ├── <model_id>.ldr             # LEGO instruction file
        ├── <model_id>.npz             # mesh data (vertices/faces/normals per brick)
        └── <model_id>_normal_*.png    # optional: pre-rendered multi-view images
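Before training, it can help to confirm that a dataset directory matches the layout above. The following is a minimal sketch, not part of the repository: the helper names `dataset_root` and `missing_dataset_files` are hypothetical, and it only checks that the listed entries exist, not that their contents are valid.

```python
import os
from pathlib import Path


def dataset_root() -> Path:
    """Resolve the dataset root: LEGOACE_DATA_ROOT if set, otherwise ./data."""
    return Path(os.environ.get("LEGOACE_DATA_ROOT", "./data"))


def missing_dataset_files(root: Path, dataset_name: str) -> list[str]:
    """Return the expected entries that are absent from <root>/<dataset_name>/."""
    base = root / dataset_name
    expected = [
        "train_dataset.json",
        "val_dataset.json",
        "test_dataset.json",
        f"{dataset_name}_dat_dict.json",  # brick type -> token id
        f"{dataset_name}_rot_dict.json",  # rotation matrix string -> token id
        "models",                         # directory with .ldr / .npz files
    ]
    return [name for name in expected if not (base / name).exists()]
```

Calling `missing_dataset_files(dataset_root(), "<dataset_name>")` returns an empty list when every expected entry is present.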
Image-conditioned (train_dataset.json):
{
  "model_001": {
    "ldr": "/abs/path/to/model_001.ldr",
    "npz": "/abs/path/to/model_001.npz"
  }
}
Text-conditioned:
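{
  "model_001": {
    "ldr": "/abs/path/to/model_001.ldr",
    "text": [
      "A red sports car",
      "Racing vehicle with spoiler",
      "Small car model",
      "Automotive build"
    ]
  }
}
Each brick in an .ldr file is one line in the LDraw part-reference (type 1) format: color code, position, row-major 3x3 rotation matrix, then the part file:
1 <color> x y z r00 r01 r02 r10 r11 r12 r20 r21 r22 <brick_type>.dat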
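To make the brick line format concrete, here is a small parsing sketch. It is an illustration only and is not the repository's tokenizer: `BrickLine` and `parse_brick_line` are hypothetical names, and the color field is assumed to be a plain decimal code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrickLine:
    color: int                            # decimal color code (assumption)
    position: tuple[float, float, float]  # x, y, z
    rotation: tuple[float, ...]           # 9 values, row-major: r00 r01 ... r22
    brick_type: str                       # part file, e.g. "3001.dat"


def parse_brick_line(line: str) -> Optional[BrickLine]:
    """Parse one type-1 line; return None for any other kind of line."""
    parts = line.split()
    if len(parts) < 15 or parts[0] != "1":
        return None
    color = int(parts[1])
    x, y, z = (float(v) for v in parts[2:5])
    rotation = tuple(float(v) for v in parts[5:14])
    # Rejoin the remainder in case the part file name contains spaces.
    brick_type = " ".join(parts[14:])
    return BrickLine(color, (x, y, z), rotation, brick_type)
```

For example, `parse_brick_line("1 4 0 -24 0 1 0 0 0 1 0 0 0 1 3001.dat")` yields a brick with color 4, an identity rotation, and part file `3001.dat`.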
Multi-GPU launch uses 🤗 Accelerate config files under accelerate_config/ (2/4/6/8 GPU variants are provided).
Image-conditioned (multi-view) training:
accelerate launch --config_file ./accelerate_config/4-gpu.yaml \
./train/train_image_npz_mv.py \
--output_dir ./outputs/image-condition \
--split <dataset_name> \
--train_batch_size 6 \
--eval_batch_size 6 \
--dataloader_num_workers 12 \
--num_epochs 200 \
--learning_rate 1e-5 \
--mixed_precision bf16 \
--checkpointing_steps 10000 \
--logger wandb \
--wandb_id my-experiment
Or use the example shell script:
OUTPUT_DIR=./outputs/image-condition SPLIT=<dataset_name> bash train/train_npz_mv.sh
Text-conditioned training:
accelerate launch --config_file ./accelerate_config/8-gpu.yaml \
./train/train_text_condition.py \
--output_dir ./outputs/text-condition \
--split <dataset_name> \
--train_batch_size 8 \
--num_epochs 100 \
--learning_rate 1e-4 \
--mixed_precision bf16 \
--checkpointing_steps 15000 \
--logger wandb
Unconditional training:
accelerate launch --config_file ./accelerate_config/4-gpu.yaml \
./train/train_unconditional.py \
--train_data_dir /path/to/unconditional/data \
--output_dir ./outputs/unconditional \
--train_batch_size 16 \
--num_epochs 200 \
--learning_rate 1e-4 \
--mixed_precision bf16
Build a preference dataset from per-sample Chamfer distance scores, then run DPO:
# Step 1: build preference pairs
python dpo/dpo_dataset/build_cd_dataset.py \
--cd_file /path/to/cd_scores.json \
--output /path/to/preferences.json
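As a rough illustration of what building preference pairs from Chamfer distance scores can look like, the sketch below ranks samples per reference and prefers the one with the lower distance. This is not the logic of build_cd_dataset.py: the input layout (reference id -> sample id -> CD score), the output keys, and the `min_gap` parameter are all assumptions made for the example.

```python
from itertools import combinations


def build_preference_pairs(
    cd_scores: dict[str, dict[str, float]], min_gap: float = 0.0
) -> list[dict[str, str]]:
    """Pair samples of the same reference; the lower Chamfer distance wins."""
    pairs = []
    for ref_id, samples in cd_scores.items():
        # Sort ascending so the better (lower CD) sample comes first in each pair.
        ranked = sorted(samples.items(), key=lambda kv: kv[1])
        for (win_id, win_cd), (lose_id, lose_cd) in combinations(ranked, 2):
            # Skip ties and near-ties, which carry little preference signal.
            if lose_cd - win_cd > min_gap:
                pairs.append({"ref": ref_id, "chosen": win_id, "rejected": lose_id})
    return pairs
```

A larger `min_gap` keeps only pairs whose quality difference is clear, at the cost of a smaller dataset.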
# Step 2: DPO training
accelerate launch --config_file ./accelerate_config/8-gpu.yaml \
dpo/train_dpo_acce.py \
--dataset_name <dataset_name> \
--dataset_path /path/to/preferences.json \
--ldr_dir /path/to/per_sample_ldrs \
--ref_image_dir /path/to/reference_4view_images \
--model_path ./outputs/image-condition/checkpoint-260000/transformer \
--save_dir ./outputs/dpo \
--epochs 3 --beta 0.1 --batch_size 2 --lr 1e-6
All inference scripts accept arguments via tyro; run any of them with --help to see every option.
Image-conditioned (multi-view):
python inference/inference_multi_view.py \
--dataset_name <dataset_name> \
--ckpt_dir ./outputs/image-condition \
--ckpt_iter 260000 \
--dataset_class dataset.MVNpzDataset.MVNpzDataset \
--save_dir ./outputs/inference \
--save_name my-mv-run \
--infer_number 100 \
--batch_size 4 \
--repeat 4 \
--dataset_split val
Image-conditioned:
python inference/inference_image_condition.py \
--dataset_name <dataset_name> \
--ckpt_dir ./outputs/image-condition \
--ckpt_iter 260000 \
--save_dir ./outputs/inference \
--save_name my-image-run \
--infer_number 100
Text-conditioned:
python inference/inference_text_condition.py \
--ckpt_dir VAST-AI/LegoACE \
--dataset_name <dataset_name> \
--save_dir ./outputs/inference \
--save_name my-text-run \
--prompts "A red sports car" "A modern brick bed" "A bridge over a river"
Unconditional:
python inference/infer_uncondition.py \
--ckpt_dir ./outputs/unconditional/checkpoint-200000/transformer \
--dataset_name <dataset_name> \
--dataset_dir /path/to/unconditional/data \
--save_dir ./outputs/inference \
--save_name my-uncond-run \
--num_samples 400
Every inference script writes its results into <save_dir>/<save_name>/...:
| Sub-directory | Contents |
|---|---|
| ldr/ | Generated LDR files |
| glb/ | Converted GLB meshes (requires Blender) |
| render/ | Normal-map renderings of each GLB |
| input_image/ | Conditioning images (image-conditioned modes only) |
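For post-processing, the outputs of a run can be gathered with a few lines of Python. This is a minimal sketch under the assumption that the four sub-directories sit directly under <save_dir>/<save_name>/; `collect_outputs` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path


def collect_outputs(save_dir: str, save_name: str) -> dict[str, list[Path]]:
    """Map each output sub-directory to its sorted files (empty if absent)."""
    run_dir = Path(save_dir) / save_name
    result: dict[str, list[Path]] = {}
    for sub in ("ldr", "glb", "render", "input_image"):
        folder = run_dir / sub
        if folder.is_dir():
            result[sub] = sorted(p for p in folder.iterdir() if p.is_file())
        else:
            # For example, input_image/ only exists for image-conditioned modes.
            result[sub] = []
    return result
```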
Defaults are defined in configs/config.py and can be overridden on the command line:
| Argument | Default | Description |
|---|---|---|
| --sample_type | top_k_and_p | one of top_k_and_p, top_k, no_sample |
| --top_k_number | 10 | top-k sampling cutoff |
| --top_p_number | 0.95 | nucleus sampling threshold |
| --cfg_number | 0.0 | classifier-free guidance scale |
| --max_length | 5000 | maximum sequence length |
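To illustrate how the top-k and nucleus (top-p) settings interact, here is a generic pure-Python sketch of combined filtering. It is not the repository's sampler: the function names are hypothetical, and applying top-k before top-p is an assumption about the order of the two cuts.

```python
import math
import random


def top_k_top_p_probs(logits, top_k=10, top_p=0.95):
    """Return probabilities with everything outside the top-k / nucleus set zeroed."""
    # Indices of the top_k largest logits, best first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the survivors, shifted by the max for numerical stability.
    peak = logits[order[0]]
    weights = [math.exp(logits[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Nucleus cut: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for idx, p in zip(order, probs):
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the kept tokens so they sum to 1.
    mass = sum(p for _, p in kept)
    out = [0.0] * len(logits)
    for idx, p in kept:
        out[idx] = p / mass
    return out


def sample_token(logits, top_k=10, top_p=0.95, rng=random):
    """Draw one token index from the filtered distribution."""
    probs = top_k_top_p_probs(logits, top_k, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Lowering `top_p` or `top_k` narrows the candidate set and makes generation more deterministic; in the limit only the most likely token survives.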
Released on the HuggingFace Hub: VAST-AI/LegoACE.
| Model | Conditioning | Subfolder |
|---|---|---|
| LegoACE-MV | Multi-view images (DINOv2) | mv/ |
| LegoACE-Text | Text descriptions (CLIP) | text/ |
Loading from Python:
from model.llama_image_condition import ImageConditionModel
from model.llama_text_condition import TextConditionModel
mv_model = ImageConditionModel.from_pretrained("VAST-AI/LegoACE", subfolder="mv").to("cuda")
text_model = TextConditionModel.from_pretrained("VAST-AI/LegoACE", subfolder="text").to("cuda")
LegoACE/
├── accelerate_config/              # multi-GPU configs (2/4/6/8-gpu, debug)
├── configs/
│   └── config.py                   # InferenceArgs dataclass
├── dataset/
│   ├── MVNpzDataset.py             # multi-view image dataset
│   ├── SingleTokenDataset.py       # unconditional dataset
│   ├── dpodataset.py               # DPO preference dataset
│   └── textDataset.py              # text-conditioned dataset
├── dpo/
│   ├── dpo_dataset/
│   │   └── build_cd_dataset.py     # build preference pairs from CD scores
│   ├── train_dpo_acce.py           # DPO training
│   └── train_dpo_multi_gpu.sh
├── inference/
│   ├── inference_image_condition.py
│   ├── inference_multi_view.py
│   ├── inference_text_condition.py
│   └── infer_uncondition.py
├── model/
│   ├── gpt2.py                     # GPT2 baseline (unconditional)
│   ├── llama_image_condition.py    # image-conditioned Llama
│   ├── llama_text_condition.py     # text-conditioned Llama
│   ├── logitsprocessor.py          # brick-format-valid logits masking
│   └── tokenizer.py                # LDR tokenizer
├── train/
│   ├── train_image_npz_mv.py       # multi-view image training
│   ├── train_text_condition.py     # text-conditioned training
│   └── train_unconditional.py      # unconditional training
├── utils/
│   ├── brick_ids.py                # brick id <-> class id mappings
│   ├── data_utils.py               # LDR I/O helpers
│   ├── infer_utils.py              # inference-time image grid helpers
│   ├── ldr_export_dir.py           # Blender script: LDR directory -> GLBs
│   ├── log_utils.py                # code-snapshot logger
│   ├── metric.py                   # CD/EMD evaluation (optional deps)
│   ├── misc.py                     # config-string instantiation
│   ├── render.py                   # pyrender normal-map renderer
│   ├── utils.py                    # math / geometry helpers
│   └── shader/                     # pyrender GLSL shaders
├── LICENSE
├── pyproject.toml
└── README.md
If you find this work useful, please cite:
@inproceedings{xu2025legoace,
author = {Hao Xu and Yuqing Zhang and Yiqian Wu and Xinyang Zheng and
Yutao Liu and Xiangjun Tang and Yunhan Yang and Ding Liang and
Yingtian Liu and Yuanchen Guo and Yanpei Cao and Xiaogang Jin},
title = {LegoACE: Autoregressive Construction Engine for Expressive LEGO{\textregistered}
Assemblies},
booktitle = {Proceedings of the {SIGGRAPH} Asia 2025 Conference Papers},
publisher = {{ACM}},
year = {2025},
pages = {40:1--40:11},
doi = {10.1145/3757377.3763881},
url = {https://doi.org/10.1145/3757377.3763881}
}
This project builds on several excellent open-source projects:
- DINOv2 (Meta AI): image feature extraction
- CLIP (OpenAI): text encoding
- Transformers: model implementations
- Accelerate: distributed training
- Diffusers: LR schedulers / utilities
- PyRender: mesh rendering
- Trimesh: mesh processing
- Blender + ImportLDraw: LDR → GLB conversion
This project is released under the MIT License.
LEGO® is a trademark of the LEGO Group, which does not sponsor, authorize, or endorse this project.