ContinuousPerceptionResearch/cp-bench

Continuous Perception Matters: Diagnosing Temporal Integration Failures in Multimodal Models

CP-Bench is a benchmark designed to expose fundamental temporal integration failures in modern multimodal models and to reveal their limitations in spatiotemporally consistent visual reasoning.


⚙️ Setup

First, clone the repository and change into its directory:

git clone https://github.com/ContinuousPerceptionResearch/cp-bench.git
cd cp-bench

We use uv to manage Python dependencies. After installing uv, run the following commands to set up the environment:

uv sync
source .venv/bin/activate
MAX_JOBS=4 uv pip install flash-attn --no-build-isolation

🌟 Data Generation

To generate CP-Bench data, run the following command from the data_generation directory:

uv run bash render_videos.sh

Rendering a single 10-second video may take several minutes, depending on your GPU. To speed up the process, you can split the configuration files in the configs directory and run them in parallel.
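
One way to split the work is sketched below. This is a minimal illustration, not part of the repository: the helper name `shard_configs` is hypothetical, and it assumes each configuration is a separate file whose path you can list yourself. How `render_videos.sh` selects its configuration files is not described here, so check the script before wiring the shards into it.

```python
def shard_configs(config_paths, num_shards):
    """Split config paths into num_shards groups, assigned round-robin.

    Hypothetical helper: it only partitions a list of paths and does not
    launch any rendering itself.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Sort first so the assignment is reproducible across runs.
    ordered = sorted(str(path) for path in config_paths)
    # Shard i takes items i, i + num_shards, i + 2 * num_shards, ...
    return [ordered[i::num_shards] for i in range(num_shards)]
```

Each returned group could then be rendered by a separate process or on a separate GPU. Round-robin assignment keeps the shard sizes within one file of each other.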

The data configurations are easy to modify, allowing you to generate variations with different object counts, colors, shapes, textures, and more.
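
As a rough illustration of generating such variations programmatically, the sketch below expands a base configuration into one copy per combination of option values. The function name `expand_variations` and the field names (`num_objects`, `shape`, `duration_seconds`) are hypothetical; the actual schema is defined by the files in the configs directory and may differ.

```python
import copy
import itertools


def expand_variations(base_config, options):
    """Return one config per combination of the given option values."""
    keys = list(options)
    variants = []
    for values in itertools.product(*(options[key] for key in keys)):
        # Deep-copy so nested values in the base config are never shared.
        config = copy.deepcopy(base_config)
        config.update(zip(keys, values))
        variants.append(config)
    return variants


# Hypothetical field names; consult the real config files for the schema.
base = {"num_objects": 3, "shape": "cube", "duration_seconds": 10}
variants = expand_variations(
    base, {"num_objects": [3, 5], "shape": ["cube", "sphere"]}
)
```

Each resulting dictionary could be written out as its own configuration file, which also pairs naturally with parallel rendering.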

🚀 Training

Data Preparation

Download the pre-rendered dataset from Hugging Face by running:

bash prepare_data.sh

Start Training

Launch fine-tuning with:

uv run bash finetune.sh

The fine-tuning pipeline is built on the SFT Trainer from TRL. You can customize the training process by adjusting the relevant arguments as needed.

🙏 Acknowledgement

This repo is built upon TRL and CLEVR. We sincerely thank the developers of these projects.

📚 Citation

If you find CP-Bench useful, please consider citing our work:

@article{cpbench2025,
  title={Continuous Perception Matters: Diagnosing Temporal Integration Failures in Multimodal Models},
  author={Zeyu Wang and Zhenzhen Weng and Serena Yeung-Levy},
  journal={arXiv preprint arXiv:2408.07867},
  year={2025}
}
