This is a research fork of 3DTopia/OpenLRM. On top of the original OpenLRM inference and training pipelines, this repository adds diagnostic and correction experiments targeting the structure of encoder errors and the spectral response of the decoder; the accompanying experiment results and paper draft live under the `exps/` directory. The original OpenLRM README is preserved below for reference.
This work takes OpenLRM-Mix-Base-1.1 as its study object, probes the internal feature propagation of the feed-forward architecture, and is the first to quantitatively expose a frequency-domain mismatch between encoder errors and decoder sensitivity. Building on this diagnosis, it further proposes LoRA-FreqLoss, a parameter-efficient fine-tuning method aligned with the decoder's frequency transfer function. Core findings:
- Spectral mismatch: encoder residuals are dominated by high frequencies (47–52% of the energy), while the decoder, shaped by spectral bias, acts as a low-pass amplifier (2.4× more sensitive to low frequencies than to high ones). Consequently, the mere 15–17% of residual energy in the low band accounts for 53–77% of the rendering degradation, with low-frequency errors amplified 30–50×.
- Latent equivalence-class trap: the triplane optimization landscape contains multiple equivalent basins separated by an L1 distance of roughly 2.0, so residual analysis against a randomly initialized ground truth amounts to comparing two unrelated solutions. We introduce a pred-init anchoring strategy to eliminate this representational ambiguity.
- Extremely low-dimensional error manifold: the rendering-relevant encoder error lives in a very low-dimensional subspace; a rank-4 LoRA with 147K parameters already reaches the global LPIPS optimum, and larger ranks overfit.
- LoRA-FreqLoss correction: at the architecture level, low-rank adaptation blocks the memorization of high-frequency noise; at the optimization level, a frequency-weighted loss steers gradients toward the critical low-frequency structure. With only 0.09% of the model's parameters trainable, it lowers LPIPS by 11.4%, gains +1.9 dB PSNR, and reduces L1 by 16.6%, with no sample regressing.
All new experiment scripts live under `scripts/`, with result visualizations under `exps/`; the full paper is in `./typst`. The end-to-end pipeline runs as follows:
```bash
# 1. Download Objaverse objects and render 32 views with Blender (reference script)
python scripts/render_objaverse_subset.py
```
```bash
# 2. Optimize the Pred-Init Oracle Triplane (the core methodological correction)
python scripts/optimize_gt_triplane.py \
    --data_dir ./data/rendered/<uid> \
    --output_dir ./exps/gt_triplane \
    --num_iters 2000 --lr 0.01 \
    --init_from_pred
```
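For intuition, the anchoring behind `--init_from_pred` fits in a few lines. This is a minimal sketch, not the script's actual code: `encode_to_triplane` and `render_views` are hypothetical helpers standing in for the repository's model and renderer.

```python
import torch

def optimize_oracle_triplane(model, render_views, image, gt_views, cameras,
                             num_iters=2000, lr=0.01):
    # Encode once with the frozen model; this prediction picks the basin.
    with torch.no_grad():
        pred = model.encode_to_triplane(image)  # hypothetical helper
    # Pred-init: start from the prediction rather than torch.randn_like(pred),
    # so the optimized oracle stays in the prediction's equivalence basin.
    oracle = pred.clone().requires_grad_(True)
    opt = torch.optim.Adam([oracle], lr=lr)
    for _ in range(num_iters):
        opt.zero_grad()
        rendered = render_views(oracle, cameras)  # hypothetical helper
        loss = torch.nn.functional.mse_loss(rendered, gt_views)
        loss.backward()
        opt.step()
    # pred - oracle is now a meaningful residual; with a random init the two
    # solutions would sit in different basins (L1 distance ~2.0) and the
    # residual would compare unrelated representations.
    return oracle.detach(), pred - oracle.detach()
```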
```bash
# 3. Verify the spectral mismatch
python scripts/spatial_frequency_analysis.py \
    --triplane_path ./exps/gt_triplane/<uid>_triplane.pt
```
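The measurement at the heart of this step is a radial band split of the residual's 2D power spectrum. The function below is an illustrative sketch (the real analysis lives in `scripts/spatial_frequency_analysis.py`); it returns the share of residual energy below a normalized radial cutoff:

```python
import torch

def low_band_energy_fraction(residual: torch.Tensor, cutoff: float = 0.3) -> float:
    """residual: (C, H, W) plane of a triplane residual; cutoff: normalized freq."""
    spec = torch.fft.fftshift(torch.fft.fft2(residual), dim=(-2, -1))
    power = spec.abs() ** 2
    _, H, W = residual.shape
    fy = torch.linspace(-0.5, 0.5, H).view(H, 1)
    fx = torch.linspace(-0.5, 0.5, W).view(1, W)
    radius = torch.sqrt(fx ** 2 + fy ** 2) / 0.5  # 1.0 = Nyquist along an axis
    return (power[..., radius <= cutoff].sum() / power.sum()).item()
```

Under this convention, the mismatch reads: the low band holds only 15–17% of the residual energy, yet (per step 4) it is where the decoder responds most strongly.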
```bash
# 4. Measure the decoder's spectral transfer function H(f)
python scripts/decoder_sensitivity_and_calibration.py \
    --triplane_path ./exps/gt_triplane/<uid>_triplane.pt
```
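Conceptually, H(f) is obtained by injecting small band-limited perturbations into the triplane and recording the image-space response per band. A hedged sketch follows; `render_fn` and its signature, as well as the band convention, are assumptions rather than the repository API:

```python
import torch

def decoder_sensitivity(render_fn, triplane, bands, eps=1e-2):
    """bands: list of (lo, hi) normalized-frequency intervals."""
    base = render_fn(triplane)  # assumed: triplane -> rendered image tensor
    h, w = triplane.shape[-2:]
    fy = torch.linspace(-0.5, 0.5, h).view(h, 1)
    fx = torch.linspace(-0.5, 0.5, w).view(1, w)
    radius = torch.sqrt(fx ** 2 + fy ** 2) / 0.5
    response = []
    for lo, hi in bands:
        spec = torch.fft.fftshift(torch.fft.fft2(torch.randn_like(triplane)),
                                  dim=(-2, -1))
        spec[..., (radius < lo) | (radius >= hi)] = 0   # keep a single band
        noise = torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real
        noise = eps * noise / noise.norm()              # unit-energy probe
        response.append(((render_fn(triplane + noise) - base).norm() / eps).item())
    return response  # a low-pass decoder yields larger values for low bands
```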
```bash
# 5. LoRA-FreqLoss fine-tuning
python scripts/finetune_lora_freq.py \
    --rank 4 --freq_weight 5.0 --freq_cutoff 0.3 \
    --steps 500 --lr 5e-4
```
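Both ingredients of the method are small enough to sketch. The snippet below is illustrative, not the actual contents of `scripts/finetune_lora_freq.py`; in particular, whether the frequency-weighted loss is applied to triplane features or to rendered images is an assumption here:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a rank-r update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def freq_weighted_mse(pred, target, freq_weight=5.0, freq_cutoff=0.3):
    """Spectral MSE that up-weights low-frequency error (cf. the flags above)."""
    err = torch.fft.fft2(pred - target)  # unshifted layout matches fftfreq below
    h, w = pred.shape[-2:]
    fy = torch.fft.fftfreq(h).view(h, 1)
    fx = torch.fft.fftfreq(w).view(1, w)
    radius = torch.sqrt(fx ** 2 + fy ** 2) / 0.5
    weight = 1.0 + (freq_weight - 1.0) * (radius <= freq_cutoff).float()
    return (weight * err.abs() ** 2).mean()
```

With `freq_weight=1.0` this reduces, by Parseval's theorem, to an ordinary MSE up to a constant factor; the low-frequency up-weighting is what redirects gradients toward the band the decoder amplifies.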
```bash
# 6. LoRA rank ablation
python scripts/lora_rank_ablation.py
```

The `exps/` subdirectories are organized as follows:

| Subdirectory | Contents |
|---|---|
| `exps/residual_vis/` | Triplane residual heatmaps and 3D surface error point clouds |
| `exps/frequency_analysis_predinit/` | Triplane residual power spectra under the pred-init baseline |
| `exps/decoder_analysis/` | Decoder spectral transfer function |
| `exps/spatial_freq/` | Band-selective causal interventions (low- vs. high-frequency correction) |
| `exps/lowfreq_decomp/` | Internal decomposition of the low band into DC and structured low frequencies |
| `exps/latent_refine/` | Test-time optimization (TTO) failure cases |
| `exps/finetune_freq_v2/` | Full fine-tuning baseline experiments |
| `exps/lora_freq/` | LoRA-FreqLoss main experiments |
| `exps/lora_ablation/` | LoRA rank ablation (r ∈ {2, 4, 8, 16, 32}) |
| `exps/visibility_analysis/` | Error comparison grouped by visibility (ruling out the occlusion hypothesis) |
The table below lists all new experiment scripts together with the paper sections/figures they produce.
| Script | Purpose | Output directory / figure |
|---|---|---|
| `scripts/render_objaverse_subset.py` | Download Objaverse objects and render 32 views with Blender | `data/rendered/<uid>/` |
| `scripts/optimize_gt_triplane.py` | Optimize the Pred-Init Oracle Triplane against multi-view renders with an L2 loss | `exps/gt_triplane/` |
| `scripts/visualize_triplane_residual.py` | Triplane residual heatmaps + 3D surface error point clouds | `exps/residual_vis/` · fig:residual-heatmap |
| `scripts/visibility_aware_analysis.py` | Residual comparison grouped by visibility (ruling out the occlusion hypothesis) | `exps/visibility_analysis/` · fig:visibility |
| `scripts/frequency_analysis.py` | Frequency decomposition in triplane feature space (pred-init baseline) | `exps/frequency_analysis_predinit/` · fig:cross-sample-freq |
| `scripts/decoder_sensitivity_and_calibration.py` | Decoder spectral transfer function | `exps/decoder_analysis/` · fig:decoder-sens |
| `scripts/spatial_frequency_analysis.py` | Band-selective causal intervention (correct only low / only high frequencies) | `exps/spatial_freq/` · fig:correction-effect |
| `scripts/lowfreq_decomposition.py` | Internal structure of the low band (DC vs. structured low frequencies) | `exps/lowfreq_decomp/` · fig:lowfreq-decomp |
| `scripts/latent_space_refinement.py` | Latent-space test-time optimization (TTO) failure cases | `exps/latent_refine/` · fig:tto |
| `scripts/finetune_freq_loss.py` | Full fine-tuning + frequency-weighted loss (per-sample regression control) | `exps/finetune_freq_v2/` · fig:fullft-comparison |
| `scripts/finetune_lora_freq.py` | LoRA-FreqLoss main method (standard MSE vs. FreqLoss comparison) | `exps/lora_freq/` · fig:convergence |
| `scripts/lora_rank_ablation.py` | LoRA rank ablation (r ∈ {2, 4, 8, 16, 32}) | `exps/lora_ablation/` · fig:rank-ablation |
| `scripts/visualize_finetune_multi.py` | Multi-sample, multi-view rendering comparison overview | `exps/finetune_freq_v2/render_comparison_labeled.png` |
- [2024.03.13] Update training code and release OpenLRM v1.1.1.
- [2024.03.08] We have released the core blender script used to render Objaverse images.
- [2024.03.05] The Hugging Face demo now uses the `openlrm-mix-base-1.1` model by default. Please refer to the model card for details on the updated model architecture and training settings.
- [2024.03.04] Version update v1.1. Release model weights trained on both Objaverse and MVImgNet. Codebase is majorly refactored for better usability and extensibility. Please refer to v1.1.0 for details.
- [2024.01.09] Updated all v1.0 models trained on Objaverse. Please refer to HF Models and overwrite previous model weights.
- [2023.12.21] Hugging Face Demo is online. Have a try!
- [2023.12.20] Release weights of the base and large models trained on Objaverse.
- [2023.12.20] We release this project OpenLRM, which is an open-source implementation of the paper LRM.
```bash
git clone https://github.com/3DTopia/OpenLRM.git
cd OpenLRM
```
- Install the requirements for OpenLRM first.

  ```bash
  pip install -r requirements.txt
  ```

- Then follow the xFormers installation guide to enable memory-efficient attention inside the DINOv2 encoder.
- Model weights are released on Hugging Face.
- Weights will be downloaded automatically when you run the inference script for the first time.
- Please be aware of the license before using the weights.
| Model | Training Data | Layers | Feat. Dim | Trip. Dim. | In. Res. | Link |
|---|---|---|---|---|---|---|
| openlrm-obj-small-1.1 | Objaverse | 12 | 512 | 32 | 224 | HF |
| openlrm-obj-base-1.1 | Objaverse | 12 | 768 | 48 | 336 | HF |
| openlrm-obj-large-1.1 | Objaverse | 16 | 1024 | 80 | 448 | HF |
| openlrm-mix-small-1.1 | Objaverse + MVImgNet | 12 | 512 | 32 | 224 | HF |
| openlrm-mix-base-1.1 | Objaverse + MVImgNet | 12 | 768 | 48 | 336 | HF |
| openlrm-mix-large-1.1 | Objaverse + MVImgNet | 16 | 1024 | 80 | 448 | HF |
Model cards with additional details can be found in model_card.md.
- We put some sample inputs under `assets/sample_input`, and you can quickly try them.
- Prepare RGBA images or RGB images with a white background (using background removal tools such as Rembg or Clipdrop).
- Run the inference script to get 3D assets.
- You may specify which form of output to generate by setting the flags `EXPORT_VIDEO=true` and `EXPORT_MESH=true`.
- Please set `INFER_CONFIG` according to the model you want to use, e.g., `infer-b.yaml` for base models and `infer-s.yaml` for small models.
- An example usage is as follows:
```bash
# Example usage
EXPORT_VIDEO=true
EXPORT_MESH=true
INFER_CONFIG="./configs/infer-b.yaml"
MODEL_NAME="zxhezexin/openlrm-mix-base-1.1"
IMAGE_INPUT="./assets/sample_input/owl.png"

python -m openlrm.launch infer.lrm --infer $INFER_CONFIG \
    model_name=$MODEL_NAME image_input=$IMAGE_INPUT \
    export_video=$EXPORT_VIDEO export_mesh=$EXPORT_MESH
```
- The recommended PyTorch version is `>=2.1`. The code is developed and tested under PyTorch `2.1.2`.
- If you encounter CUDA OOM issues, try reducing the `frame_size` in the inference configs.
- You should see `UserWarning: xFormers is available` if xFormers is actually working.
- We provide a sample accelerate config file under `configs/accelerate-train.yaml`, which defaults to 8 GPUs with `bf16` mixed precision.
- You may modify the configuration file to fit your own environment.
- We provide the core Blender script used to render Objaverse images.
- Please refer to Objaverse Rendering for other scripts including distributed rendering.
- A sample training config file is provided under `configs/train-sample.yaml`.
- Please replace the data-related paths in the config file with your own paths and customize the training settings.
- An example training usage is as follows:
```bash
# Example usage
ACC_CONFIG="./configs/accelerate-train.yaml"
TRAIN_CONFIG="./configs/train-sample.yaml"

accelerate launch --config_file $ACC_CONFIG -m openlrm.launch train.lrm --config $TRAIN_CONFIG
```
- The inference pipeline is compatible with Hugging Face utilities for better convenience.
- You need to convert a training checkpoint to inference models by running the following script.

  ```bash
  python scripts/convert_hf.py --config <YOUR_EXACT_TRAINING_CONFIG> convert.global_step=null
  ```

- The converted model will be saved under `exps/releases` by default and can be used for inference following the inference guide.
- We thank the authors of the original paper for their great work! Special thanks to Kai Zhang and Yicong Hong for assistance during the reproduction.
- This project is supported by Shanghai AI Lab, which provided the computing resources.
- This project is advised by Ziwei Liu and Jiaya Jia.
If you find this work useful for your research, please consider citing:
```bibtex
@article{hong2023lrm,
  title={Lrm: Large reconstruction model for single image to 3d},
  author={Hong, Yicong and Zhang, Kai and Gu, Jiuxiang and Bi, Sai and Zhou, Yang and Liu, Difan and Liu, Feng and Sunkavalli, Kalyan and Bui, Trung and Tan, Hao},
  journal={arXiv preprint arXiv:2311.04400},
  year={2023}
}

@misc{openlrm,
  title = {OpenLRM: Open-Source Large Reconstruction Models},
  author = {Zexin He and Tengfei Wang},
  year = {2023},
  howpublished = {\url{https://github.com/3DTopia/OpenLRM}},
}
```
- OpenLRM as a whole is licensed under the Apache License, Version 2.0, while certain components are covered by NVIDIA's proprietary license. Users are responsible for complying with the respective licensing terms of each component.
- Model weights are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. They are provided for research purposes only, and CANNOT be used commercially.