Matrix3D: Large Photogrammetry Model All-in-One

Yuanxun Lu¹*, Jingyang Zhang²*, Tian Fang², Jean-Daniel Nahmias², Yanghai Tsin², Long Quan³, Xun Cao¹, Yao Yao¹†, Shiwei Li²
¹Nanjing University, ²Apple, ³HKUST
*Equal contribution    †Corresponding author

CVPR 2025 Highlight

This is the official implementation of Matrix3D, a single unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis.

This repository includes the model inference pipeline and the modified 3DGS reconstruction pipeline for 3D reconstruction.

Matrix3D supports various photogrammetry tasks via masked inference.

Environment Setup

  • This project has been tested on Ubuntu 20.04 with PyTorch 2.4 (Python 3.10). We recommend creating a new environment and installing the necessary dependencies:

    conda create -y -n matrix3d python=3.10
    conda activate matrix3d
    # Here we take PyTorch 2.4 with CUDA 11.8 as an example
    # If you install a different PyTorch version, please select matching xformers/pytorch3d versions
    pip install torch==2.4.0 torchvision==0.19.0 xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu118
    pip install --extra-index-url https://miropsota.github.io/torch_packages_builder pytorch3d==0.7.7+pt2.4.0cu118
    pip install -r requirements.txt
    # fix the requirement conflict from nerfstudio
    pip install timm==1.0.11
    

    Some dependencies require a CUDA toolkit matching the version used by PyTorch on your system, so their installation may not work out of the box. Please refer to their official repositories for troubleshooting.
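    As a quick sanity check (illustrative only, not part of the official setup), you can verify that the CUDA build of PyTorch and the key dependencies import correctly:

    import torch
    import xformers
    import pytorch3d

    # Expect something like 2.4.0+cu118 and True if the GPU build is visible.
    print(torch.__version__, torch.cuda.is_available())
    print(xformers.__version__, pytorch3d.__version__)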

  • Download the pre-trained model:
    • Download the checkpoint: matrix3d_512.pt
    • Create a checkpoints folder and put the pre-trained model into it.
  • (Optional) Download the IS-Net checkpoint if you would like to use single-view-to-3D reconstruction:
    • Download the pre-trained model isnet-general-use.pth from the official DIS repo and put it into the checkpoints folder as well.

Run Demo

  • Matrix3D supports several photogrammetry tasks and their dynamic compositions via masked inference. Below are example scripts on the CO3Dv2 dataset. All results are saved to the results folder by default.

    • Novel View Synthesis

      sh scripts/novel_view_synthesis.sh examples/co3dv2-samples/31_1359_4114
      

      This script demonstrates novel view synthesis from a single input image.

      For all diffusion sampling tasks, the indicators mod_flags and view_ids (set in L48-L56 of the script) control which modalities and views are provided as input. You can set different modality flags or view counts to perform other tasks, such as predicting novel views from two posed RGB images.

    • Pose Estimation

      sh scripts/pose_estimation.sh examples/co3dv2-samples/31_1359_4114
      

      This script demonstrates pose prediction from images. The saved *.png and *.html files show a visual comparison between predictions and ground-truth values.

      Replacing the data root with an unposed data folder such as examples/unposed-samples/co3dv2/201_21613_43652 generates results without comparisons to ground-truth poses.

      It is strongly recommended to provide the camera intrinsics saved in the .txt files, since the model is trained with known camera intrinsics. If they are missing, the processor falls back to a default FoV of 60 degrees and performance may degrade. You can change this default by passing --default_fov.
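      To see what this default implies, the sketch below converts a field of view into a pinhole focal length in pixels. This is a generic illustration of the standard pinhole model, not the repository's actual preprocessing code:

      import math

      def fov_to_focal(fov_deg: float, image_size_px: int) -> float:
          # Focal length in pixels of a pinhole camera covering fov_deg over image_size_px.
          return 0.5 * image_size_px / math.tan(math.radians(fov_deg) / 2)

      # With the default fov=60 on a 512-pixel-wide image, the assumed focal length is ~443 px;
      # an incorrect FoV therefore shifts the assumed intrinsics noticeably.
      print(fov_to_focal(60.0, 512))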

    • Depth Prediction

      sh scripts/depth_prediction.sh examples/co3dv2-samples/31_1359_4114
      

      This script demonstrates depth prediction from several posed images. The back-projected ground-truth and predicted point clouds can be found in the results folder.
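      For reference, back-projecting a depth map into a camera-space point cloud only requires the depth values and the intrinsics. The sketch below is a minimal, generic illustration with hypothetical variable names, not the script's actual code:

      import numpy as np

      def backproject(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
          # Lift an HxW depth map to an (H*W, 3) point cloud in camera coordinates.
          h, w = depth.shape
          u, v = np.meshgrid(np.arange(w), np.arange(h))
          x = (u - cx) * depth / fx
          y = (v - cy) * depth / fy
          return np.stack([x, y, depth], axis=-1).reshape(-1, 3)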

  • By dynamically combining the above tasks and then applying a modified 3DGS pipeline, you can reconstruct 3D scenes from various inputs, even with unknown camera parameters. Below we provide two specific examples:

    • Single-view to 3D

      sh scripts/single_view_to_3d.sh single-view-to-3d examples/single-view/skull.png
      

      The 3DGS rendering results are saved in results/single-view-to-3d/skull/3DGS-render-traj.mp4.

      In this task, the camera FoV is set to 60 degrees by default; you can also set it manually by creating a $name.txt file next to the image, which the data processor loads automatically. For example, you could replace skull.png with ghost.png.

      Please check the examples/single-view folder for more examples.

    • Unposed Few-shot to 3D

      sh scripts/unposed_fewshot_to_3d_co3dv2.sh unposed-fewshot-to-3d examples/unposed-samples/co3dv2/31_1359_4114
      

      This script demonstrates reconstruction from unposed images in the CO3Dv2 dataset. Note that the camera trajectories for novel views are sampled from splines fitted to the predicted poses and are designed for object-centric scenes. The interpolated trajectory video is saved as 3DGS-render-traj1.mp4 by default. You can also run reconstruction on ARKitScenes data as follows:

      sh scripts/unposed_fewshot_to_3d_arkitscenes.sh unposed-fewshot-to-3d examples/unposed-samples/arkitscenes/41069043
      

      The only difference lies in the splined camera generation; the 3DGS part is exactly the same. You may need to tune the trajectory-generation and 3DGS-reconstruction parameters for different datasets to achieve better results.
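      As a rough illustration of the splined-trajectory idea (a sketch assuming scipy is available, not the repository's trajectory code), one can fit a spline through the predicted camera centers and resample it densely:

      import numpy as np
      from scipy.interpolate import splev, splprep

      # Hypothetical (N, 3) predicted camera centers for an object-centric scene.
      centers = np.array([[1.0, 0.0, 0.5], [0.7, 0.7, 0.5], [0.0, 1.0, 0.5], [-0.7, 0.7, 0.5]])

      # Fit an interpolating spline through the centers and sample 60 novel camera positions.
      tck, _ = splprep(centers.T, s=0.0)
      novel_centers = np.stack(splev(np.linspace(0.0, 1.0, 60), tck), axis=-1)  # (60, 3)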

  • Based on the examples above, you can flexibly define tailored tasks by combining different inputs.

  • Notes:

    • When running the diffusion sampling, please assign the indicator values mod_flags and view_ids carefully. In addition, the model is trained with a maximum of 8 views, so do not set view_ids to more than 8 views.
    • The example data in examples/co3dv2-samples and examples/unposed-samples are part of the CO3Dv2 and ARKitScenes datasets. Their camera parameters are stored as FoV values and Blender-convention camera coordinates. During preprocessing we convert them into PyTorch3D cameras; the relevant code is in L654-659 of data/data_preprocessor.py. It is therefore easy to switch to a different camera representation, e.g., you can use the official PyTorch3D function pytorch3d.utils.cameras_from_opencv_projection to convert OpenCV cameras into PyTorch3D cameras, as sketched below.
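      A minimal sketch of that conversion (tensor shapes follow the PyTorch3D documentation; the R, tvec, and K values here are placeholders):

      import torch
      from pytorch3d.utils import cameras_from_opencv_projection

      # OpenCV-convention inputs for a batch of N cameras (placeholder values).
      N = 1
      R = torch.eye(3).expand(N, 3, 3)        # (N, 3, 3) rotation matrices
      tvec = torch.zeros(N, 3)                # (N, 3) translation vectors
      K = torch.tensor([[[443.4, 0.0, 256.0],
                         [0.0, 443.4, 256.0],
                         [0.0, 0.0, 1.0]]])   # (N, 3, 3) intrinsics
      image_size = torch.tensor([[512, 512]]) # (N, 2) as (height, width)

      cameras = cameras_from_opencv_projection(R, tvec, K, image_size)  # PerspectiveCameras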

License

This sample code is released under the LICENSE terms.

Citation

@inproceedings{lu2025matrix3d,
  title={Matrix3D: Large Photogrammetry Model All-in-One},
  author={Lu, Yuanxun and Zhang, Jingyang and Fang, Tian and Nahmias, Jean-Daniel and Tsin, Yanghai and Quan, Long and Cao, Xun and Yao, Yao and Li, Shiwei},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
