Physics-guided Shape-from-Template:
Monocular Video Perception through Neural Surrogate Models

David Stotko · Nils Wandel · Reinhard Klein
University of Bonn

CVPR 2024

Paper   |   Project Page   |   Video

This repository contains the source code for our paper "Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models".

Abstract

3D reconstruction of dynamic scenes is a long-standing problem in computer graphics and increasingly difficult the less information is available. Shape-from-Template (SfT) methods aim to reconstruct a template-based geometry from RGB images or video sequences, often leveraging just a single monocular camera without depth information, such as regular smartphone recordings. Unfortunately, existing reconstruction methods are either unphysical and noisy or slow in optimization. To solve this problem, we propose a novel SfT reconstruction algorithm for cloth using a pre-trained neural surrogate model that is fast to evaluate, stable, and produces smooth reconstructions due to a regularizing physics simulation. Differentiable rendering of the simulated mesh enables pixel-wise comparisons between the reconstruction and a target video sequence that can be used for a gradient-based optimization procedure to extract not only shape information but also physical parameters such as stretching, shearing, or bending stiffness of the cloth. This allows us to retain a precise, stable, and smooth reconstructed geometry while reducing the runtime by a factor of 400–500 compared to $\phi$-SfT, a state-of-the-art physics-based SfT approach.

Prerequisites

To install the necessary dependencies, you can use the provided environment file. Note that the environment is written for CUDA 12.1. Please change the environment file according to your CUDA version or install CUDA 12.1.

conda env create --name physics-guided_SfT --file=environment.yml
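
After creating the environment, a quick way to confirm that PyTorch sees your GPU and reports the expected CUDA version is the following one-liner (this assumes the environment provides a CUDA-enabled PyTorch build, which the code requires):

conda activate physics-guided_SfT
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"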

Usage

To perform a 3D reconstruction, the method needs to be provided with a $32 \times 32$ grid mesh (OBJ file) of the first frame with sorted vertices and uv-coordinates, a corresponding texture file, as well as RGB and mask image files of the video sequence. All paths and additional information about camera properties are listed in a json file that is given as a command line argument to the program. Moreover, you need to provide a neural network, which can be downloaded from here, or you can train one yourself as described in the section below. Make sure to move the network file into the lowest subdirectory of ./code/logger.
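
If you need to build such a template grid yourself, the following is a minimal sketch (not part of the repository) that writes a $32 \times 32$ grid as an OBJ file with row-major vertex ordering and matching uv-coordinates. The ordering, orientation, and scale expected by your scene may differ, so treat it only as a starting point and adapt it to your template:

import numpy as np

N = 32                                                   # grid resolution per side
xs, ys = np.meshgrid(np.linspace(0, 1, N), np.linspace(0, 1, N))
vertices = np.stack([xs, ys, np.zeros_like(xs)], axis=-1).reshape(-1, 3)   # row-major order
uvs = np.stack([xs, ys], axis=-1).reshape(-1, 2)

# With this ordering, vertex 0 and vertex 992 (= 31 * 32) are the first vertices of the
# first and last rows, i.e. two corners on one edge, consistent with the corner indices
# mentioned in the next paragraph. Verify that this matches your scene setup.
with open("template_grid.obj", "w") as f:
    for v in vertices:
        f.write(f"v {v[0]:.6f} {v[1]:.6f} {v[2]:.6f}\n")
    for uv in uvs:
        f.write(f"vt {uv[0]:.6f} {uv[1]:.6f}\n")
    for i in range(N - 1):                               # two triangles per grid cell
        for j in range(N - 1):
            a = i * N + j + 1                            # OBJ indices are 1-based
            b, c, d = a + 1, a + N, a + N + 1
            f.write(f"f {a}/{a} {b}/{b} {d}/{d}\n")
            f.write(f"f {a}/{a} {d}/{d} {c}/{c}\n")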

This method was evaluated on the ϕ-SfT real dataset and corresponding option files are given in ./code/scenes/. To reproduce the results, make sure to download the dataset, create a $32 \times 32$ template mesh grid with the properties described above (note that the cloth is always held at the two corner vertices with indices 0 and 992), and set the paths in the json scene files. For evaluation, PyTorch files containing sample points on the ground-truth surface need to be provided. The reconstruction quality depends heavily on the template, so results might deviate significantly. To evaluate all scenes at once, run the command:

cd code
python main.py --json=[scenes/R1.json,scenes/R2.json,scenes/R3.json,scenes/R4.json,scenes/R5.json,scenes/R6.json,scenes/R7.json,scenes/R8.json,scenes/R9.json] --evaluate=False --net=SMP_param_a_gated3 --load_index=network

There are four command line arguments:

  • --json provides a list of all scenes to evaluate
  • --evaluate whether to evaluate against the ground-truth point cloud during reconstruction (tracks quality over iterations; see the single-scene example after this list)
  • --net determines the network architecture
  • --load_index determines which network file to load (given in subdirectories of ./code/logger)
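
For example, to reconstruct only scene R1 while tracking the reconstruction quality against the ground-truth point cloud, the call from above reduces to the following (assuming --evaluate=True enables the comparison, mirroring the flag format of the command above):

python main.py --json=[scenes/R1.json] --evaluate=True --net=SMP_param_a_gated3 --load_index=network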

Training a network

To train a new cloth network, navigate to the ./network directory and start the training script from there via the command:

cd network
python train.py --clip_grad_value=1 --max_stretching=5000 --min_stretching=100 --max_shearing=20 --min_shearing=0.05 --max_bending=1 --min_bending=0.001 --lr=0.001 --dataset_size=5000 --batch_size=300 --g=0.125 --net=SMP_param_a_gated3

You can vary the parameter ranges for stretching, shearing, and bending as well as other training parameters. More information is given in README.md or in ./network/get_param.py. The network states will be saved in the subdirectories of ./network/logger. Note that results may vary between runs.
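
As an illustration, a training run with narrower stiffness ranges and a smaller dataset uses the same flags with different values; the numbers below are chosen for demonstration only and are not recommended settings:

python train.py --clip_grad_value=1 --max_stretching=1000 --min_stretching=100 --max_shearing=5 --min_shearing=0.05 --max_bending=0.1 --min_bending=0.001 --lr=0.001 --dataset_size=2000 --batch_size=100 --g=0.125 --net=SMP_param_a_gated3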

Citation

If you use this code for your project, please cite the following publication:

@article{stotko2023physics,
  title={Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models},
  author={Stotko, David and Wandel, Nils and Klein, Reinhard},
  journal={arXiv preprint arXiv:2311.12796},
  year={2023}
}

License

This software is provided under the MIT license. See LICENSE for more information.
