Yuanxun Lu1*, Jingyang Zhang2*, Tian Fang2, Jean-Daniel Nahmias2, Yanghai Tsin2, Long Quan3, Xun Cao1, Yao Yao1†, Shiwei Li2
1Nanjing University, 2Apple, 3HKUST
*Equal contribution †Corresponding author
Project Page | Paper | Weights
This is the official implementation of Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis, within a single model.
This repository includes the model inference pipeline and the modified 3DGS reconstruction pipeline for 3D reconstruction.
Matrix3D supports various photogrammetry tasks via masked inference.
- This project has been tested on Ubuntu 20.04 with PyTorch 2.4 (Python 3.10). We recommend creating a new environment and installing the necessary dependencies:

  ```sh
  conda create -y -n matrix3d python=3.10
  conda activate matrix3d
  # Here we take PyTorch 2.4 with CUDA 11.8 as an example.
  # If you install a different PyTorch version, please select a matched xformers/pytorch3d version.
  pip install torch==2.4.0 torchvision==0.19.0 xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu118
  pip install --extra-index-url https://miropsota.github.io/torch_packages_builder pytorch3d==0.7.7+pt2.4.0cu118
  pip install -r requirements.txt
  # Fix the requirement conflicts from nerfstudio
  pip install timm==1.0.11
  ```

  Some dependencies require a system CUDA installation matching the version used by `torch`, and their installation may not work out of the box. Please refer to their official repos for troubleshooting.
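  As an optional sanity check (our suggestion, not part of the repository), you can confirm that the pinned packages import correctly and that CUDA is visible:

  ```python
  # Optional environment check for the pinned versions above.
  import torch
  import torchvision
  import xformers
  import pytorch3d

  print("torch:", torch.__version__)          # expect 2.4.0+cu118
  print("torchvision:", torchvision.__version__)
  print("xformers:", xformers.__version__)    # expect 0.0.27.post2
  print("pytorch3d:", pytorch3d.__version__)  # expect 0.7.7
  print("CUDA available:", torch.cuda.is_available())
  ```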
- Download the pre-trained models:
  - Download the checkpoint `matrix3d_512.pt`.
  - Create a `checkpoints` folder and put the pre-trained model into it.
  - (Optional) If you would like to use single-view 3D reconstruction, download the IS-Net checkpoint `isnet-general-use.pth` from the DIS official repo and also put it into the `checkpoints` folder.
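  A small check (illustrative, not part of the repository) confirms the expected files are in place, assuming the `checkpoints` folder described above:

  ```python
  # Optional check that the expected checkpoints are in place.
  from pathlib import Path

  ckpt_dir = Path("checkpoints")
  for name in ("matrix3d_512.pt", "isnet-general-use.pth"):  # the second is optional
      status = "found" if (ckpt_dir / name).exists() else "missing"
      print(f"{name}: {status}")
  ```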
- Matrix3D supports several photogrammetry tasks and their dynamic compositions via masked inference. Here we provide several example scripts on the CO3Dv2 dataset. All results are saved to the `results` folder by default.

  - **Novel View Synthesis**

    ```sh
    sh scripts/novel_view_synthesis.sh examples/co3dv2-samples/31_1359_4114
    ```

    This script demonstrates novel view synthesis from a single input image. For all diffusion sampling tasks, we use the indicators `mod_flags` and `view_ids` to control the input states in `L48-L56` of the script. You can set different modality flags or view numbers to achieve different tasks, such as predicting novel views from 2 posed RGB images; a toy sketch of the masking idea is shown below.
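    To make the masking idea concrete, here is a toy Python sketch (purely illustrative; the actual `mod_flags`/`view_ids` syntax lives in the sampling scripts): each (modality, view) slot is either a condition the model observes or a target it must generate.

    ```python
    # Toy illustration of masked inference; not Matrix3D's actual code.
    # 'c' marks a (modality, view) slot given as a condition,
    # 'g' marks a slot the diffusion model must generate.
    MODALITIES = ("rgb", "pose", "depth")

    def build_masks(mod_flags: dict[str, str]) -> tuple[list, list]:
        """Split (modality, view) slots into condition and generation sets."""
        cond, gen = [], []
        for modality in MODALITIES:
            for view_id, flag in enumerate(mod_flags[modality]):
                (cond if flag == "c" else gen).append((modality, view_id))
        return cond, gen

    # Hypothetical single-view NVS over 3 views: view 0's RGB and all poses
    # are conditions; the remaining RGBs and all depths are generated.
    cond, gen = build_masks({"rgb": "cgg", "pose": "ccc", "depth": "ggg"})
    print("condition slots:", cond)
    print("generation slots:", gen)
    ```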
  - **Pose Estimation**

    ```sh
    sh scripts/pose_estimation.sh examples/co3dv2-samples/31_1359_4114
    ```

    This script demonstrates pose prediction from images. The saved `*.png` and `*.html` files show a visual comparison between the predictions and the ground-truth values. Replacing the data root with an unposed data folder such as `examples/unposed-samples/co3dv2/201_21613_43652` generates results without comparisons to ground-truth poses. It is strongly recommended to provide the camera intrinsics saved in the `.txt` files, since the model is trained with known camera intrinsics; if they are absent, the processor sets a default fov=60 and performance may degrade. You can also change the default FOV value by passing `--default_fov`. The sketch below shows the pinhole relation behind such a default.
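    For intuition, the focal length derived from an FOV value follows the standard pinhole relation; the helper below is illustrative, not part of the codebase:

    ```python
    import math

    def focal_from_fov(fov_deg: float, image_size_px: int) -> float:
        """Pinhole focal length in pixels from a field of view and an image size."""
        return 0.5 * image_size_px / math.tan(math.radians(fov_deg) / 2.0)

    # The default fov=60 on a 512-pixel image corresponds to ~443 px focal length.
    print(focal_from_fov(60.0, 512))  # ~443.4
    ```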
  - **Depth Prediction**

    ```sh
    sh scripts/depth_prediction.sh examples/co3dv2-samples/31_1359_4114
    ```

    This script demonstrates depth prediction from several posed images. The back-projected ground-truth and prediction point clouds can be found in the output folder; a minimal back-projection sketch is shown below.
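    Back-projecting a depth map into a point cloud needs only the pinhole intrinsics; this numpy sketch is illustrative rather than the repository's exact code:

    ```python
    import numpy as np

    def backproject(depth, fx, fy, cx, cy):
        """Lift an (H, W) depth map to an (H*W, 3) point cloud in the camera frame."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) / fx * depth
        y = (v - cy) / fy * depth
        return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    # Toy example: a flat 4x4 depth map at 2 m with hypothetical intrinsics.
    points = backproject(np.full((4, 4), 2.0), fx=443.4, fy=443.4, cx=2.0, cy=2.0)
    print(points.shape)  # (16, 3)
    ```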
- By dynamically combining the above tasks, one can further apply a modified 3DGS pipeline to achieve 3D reconstruction from various inputs, even with unknown camera parameters. Below we provide two specific examples:
  - **Single-view to 3D**

    ```sh
    sh scripts/single_view_to_3d.sh single-view-to-3d examples/single-view/skull.png
    ```

    The 3DGS rendering results are saved to `results/single-view-to-3d/skull/3DGS-render-traj.mp4`. In this task, the camera FOV is set to 60 by default; you can also set it manually by creating a `$name.txt` file alongside the image, which the data processor loads automatically. For example, you can replace `skull.png` with `ghost.png`. Please check the `examples/single-view` folder for more examples.
  - **Unposed Few-shot to 3D**

    ```sh
    sh scripts/unposed_fewshot_to_3d_co3dv2.sh unposed-fewshot-to-3d examples/unposed-samples/co3dv2/31_1359_4114
    ```

    This script demonstrates reconstruction from unposed images in the CO3Dv2 dataset. Note that the camera trajectories of the novel views are sampled from splines fitted to the predicted poses and are designed for object-centric scenes; a spline-fitting sketch is shown at the end of this example. The interpolated video is saved as `3DGS-render-traj1.mp4` by default. You can also apply the reconstruction to ARKitScenes data as follows:

    ```sh
    sh scripts/unposed_fewshot_to_3d_arkitscenes.sh unposed-fewshot-to-3d examples/unposed-samples/arkitscenes/41069043
    ```

    The only difference lies in the splined camera trajectory generation; the 3DGS part is exactly the same. You may need to tune the trajectory-generation and 3DGS-reconstruction parameters for different datasets to achieve higher quality.
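    For intuition about the splined trajectories, here is an illustrative sketch (not the repository's actual trajectory code) that fits a cubic B-spline through hypothetical predicted camera centers and aims each sampled camera at the object center:

    ```python
    import numpy as np
    from scipy.interpolate import splev, splprep

    # Hypothetical predicted camera centers (N, 3), e.g. from pose estimation.
    centers = np.array([
        [2.0, 0.0, 0.5], [1.4, 1.4, 0.6], [0.0, 2.0, 0.5],
        [-1.4, 1.4, 0.4], [-2.0, 0.0, 0.5],
    ])

    # Fit a cubic B-spline through the centers and sample a dense trajectory.
    tck, _ = splprep(centers.T, s=0.0, k=3)
    u_new = np.linspace(0.0, 1.0, 120)                 # 120 novel-view positions
    trajectory = np.stack(splev(u_new, tck), axis=-1)  # (120, 3)

    # For an object-centric scene, point every camera toward the object center.
    look_at = centers.mean(axis=0)
    forward = look_at - trajectory
    forward /= np.linalg.norm(forward, axis=-1, keepdims=True)
    print(trajectory.shape, forward.shape)  # (120, 3) (120, 3)
    ```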
- Based on the examples above, you can flexibly define your own tailored tasks by combining different inputs.
- Notes:
  - When running the diffusion process, please carefully assign the values of the indicators `mod_flags` and `view_ids`. Also, the model is trained with a maximum of 8 views, so do not set `view_ids` to more than 8 views.
  - The example data in `examples/co3dv2-samples` and `examples/unposed-samples` are part of the CO3Dv2 and ARKitScenes datasets. The camera parameters are saved as FOV values and Blender camera coordinates. During processing, we convert them into PyTorch3D cameras; this code can be found in `L654-659` of `data/data_preprocessor.py`. It is therefore easy to switch to a different camera representation, e.g., you can apply the official PyTorch3D conversion function `pytorch3d.utils.cameras_from_opencv_projection` to convert OpenCV cameras into PyTorch3D cameras, as sketched below.
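    A minimal sketch of that conversion, assuming batched torch tensors for rotation, translation, and intrinsics (the values below are placeholders):

    ```python
    import torch
    from pytorch3d.utils import cameras_from_opencv_projection

    R = torch.eye(3).unsqueeze(0)                      # (1, 3, 3) world-to-camera rotation
    tvec = torch.zeros(1, 3)                           # (1, 3) translation
    camera_matrix = torch.tensor([[[443.4, 0.0, 256.0],
                                   [0.0, 443.4, 256.0],
                                   [0.0, 0.0, 1.0]]])  # (1, 3, 3) pinhole intrinsics
    image_size = torch.tensor([[512, 512]])            # (1, 2) as (height, width)

    cameras = cameras_from_opencv_projection(R, tvec, camera_matrix, image_size)
    print(type(cameras).__name__)  # PerspectiveCameras
    ```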
This sample code is released under the LICENSE terms.
```bibtex
@inproceedings{lu2025matrix3d,
  title={Matrix3D: Large Photogrammetry Model All-in-One},
  author={Lu, Yuanxun and Zhang, Jingyang and Fang, Tian and Nahmias, Jean-Daniel and Tsin, Yanghai and Quan, Long and Cao, Xun and Yao, Yao and Li, Shiwei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```