🎬 Vid2Sim 🤖: Realistic and Interactive Simulation from Video for Urban Navigation

Ziyang Xie, Zhizheng Liu, Zhenghao Peng, Wayne Wu, Bolei Zhou

Paper | Project Page

Vid2Sim is a novel framework that converts monocular videos into photorealistic and physically interactive simulation environments for training embodied agents with minimal sim-to-real gap.

🚧 Installation

# Clone the repository
git clone https://github.com/Vid2Sim/Vid2Sim.git --recursive
cd Vid2Sim

# Create a new environment
conda create -n vid2sim python=3.10
conda activate vid2sim

# Install dependencies
pip install -e .

# Install reconstruction dependencies
pip install -e submodules/vid2sim-rasterizer
pip install -e submodules/vid2sim-deva-segmentation
pip install -e submodules/simple-knn

# Install RL dependencies
pip install -r src/vid2sim_rl/requirements.txt
pip install -e submodules/ml-agents/ml-agents
# [Optional] Install r3m
pip install -e submodules/r3m
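
After the install steps above, a quick check that the packages can be located may save time before starting reconstruction or training. The sketch below is only an illustration: the top-level module names (`mlagents`, `simple_knn`, `r3m`) are assumptions about what the submodules install, not names confirmed by this README, so adjust the list to your environment.

```python
from importlib.util import find_spec

# Assumed top-level module names; these are hypothetical and may differ
# from what the submodules actually install.
ASSUMED_MODULES = ["mlagents", "simple_knn", "r3m"]


def check_modules(names):
    """Return {module_name: bool} telling whether each module can be located."""
    status = {}
    for name in names:
        try:
            status[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat lookup errors as "not available".
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_modules(ASSUMED_MODULES).items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

A module reported as `MISSING` usually means the matching `pip install -e` step did not complete, or that the assumed name is wrong. `r3m` is optional, so a missing entry there is expected if you skipped it.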

🎥 Reconstruct the Simulation Environments from Videos

Vid2Sim transforms monocular videos into simulation environments by reconstructing the scene geometry and appearance. The generated environments preserve real-world diversity and visual fidelity, providing minimal sim-to-real gap for agent training.

👉 To get started, follow the reconstruction guide in vid2sim_recon to reconstruct the simulation environment from video.

🤖 Train the Agent in Real-to-Sim Environments

After the environment is reconstructed, Vid2Sim converts it into an interactive environment with both realistic visual appearance and physical collision, so the agent can be trained in diverse situations.

👉 To set up the environment and launch RL training, refer to vid2sim_rl.

📦 Repository Structure

Vid2Sim/
├── data/ # Source data
├── src/
│   ├── vid2sim_recon/ # Reconstruct the simulation environment from video
│   ├── vid2sim_rl/ # Train the agent in real-to-sim environments
├── tools/ # Tool scripts
├── README.md # This file

📚 Vid2Sim Dataset

The Vid2Sim dataset includes 30 high-quality real-to-sim simulation environments reconstructed from video clips sourced from 9 web videos. Each clip includes 15 seconds of forward-facing video recorded at 30 fps, providing 450 frames per scene for environment reconstruction and simulation.
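
The frame counts follow directly from the numbers above. As a small worked example (the constant names are illustrative, not part of the codebase):

```python
# Figures stated in the dataset description.
NUM_SCENES = 30      # reconstructed environments
CLIP_SECONDS = 15    # length of each clip
FPS = 30             # recording frame rate

frames_per_scene = CLIP_SECONDS * FPS            # 15 s * 30 fps
total_frames = NUM_SCENES * frames_per_scene     # across all scenes
total_minutes = NUM_SCENES * CLIP_SECONDS / 60   # total video duration

print(frames_per_scene, total_frames, total_minutes)  # prints: 450 13500 7.5
```

So the full dataset amounts to 13,500 frames, or 7.5 minutes of source video.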

We provide both the source video data and the interactive Unity environments for agent training.

Citation πŸ“

If you find this work useful in your research, please consider citing:

@article{xie2024vid2sim,
  title={Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation},
  author={Xie, Ziyang and Liu, Zhizheng and Peng, Zhenghao and Wu, Wayne and Zhou, Bolei},
  journal={CVPR},
  year={2025}
}
