A benchmark designed to expose fundamental temporal-integration failures in modern multimodal models, revealing their limitations in spatiotemporally consistent visual reasoning.
First, clone the repository:

```bash
git clone https://github.com/ContinuousPerceptionResearch/cp-bench.git
```

We use uv to manage Python dependencies. After installing uv, run the following commands to set up the environment:
```bash
uv sync
source .venv/bin/activate
MAX_JOBS=4 uv pip install flash-attn --no-build-isolation
```

To generate CP-Bench data, run the following command from the `data_generation` directory:
```bash
uv run bash render_videos.sh
```

Rendering a single 10-second video may take several minutes, depending on your GPU. To speed up the process, you can split the configuration files in the `configs` directory and run them in parallel.
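One way to fan out the work is with `xargs -P`, sketched below. The placeholder `echo` stands in for the real render call, and the idea that each job takes a single config path is an assumption, not something the repo guarantees:

```bash
# Sketch: run one job per config file, up to 4 concurrently.
# Replace 'echo' with the actual render invocation for your setup.
mkdir -p configs
for i in 1 2 3 4; do : > "configs/scene_$i.yaml"; done   # placeholder configs

ls configs/*.yaml | xargs -P4 -I{} echo "rendering {}"
```

With `-I{}`, each config path becomes its own invocation, and `-P4` keeps four of them running at once.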
The data configurations are easy to modify, allowing you to generate variations with different object counts, colors, shapes, textures, and more.
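For illustration, a scene configuration might look something like the following. The field names here are hypothetical (check the actual files in `configs` for the real schema); the shape and material values follow CLEVR's conventions:

```yaml
num_objects: 5
shapes: [cube, sphere, cylinder]
colors: [red, blue, green]
materials: [rubber, metal]
video_length_seconds: 10
```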
Download the pre-rendered dataset from Hugging Face by running:

```bash
bash prepare_data.sh
```

Launch fine-tuning with:
```bash
uv run bash finetune.sh
```

The fine-tuning pipeline is built on the SFT Trainer from TRL. You can customize the training process by adjusting the relevant arguments as needed.
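If `finetune.sh` forwards its arguments to the trainer, standard Hugging Face `TrainingArguments` flags should apply. A hypothetical sketch (whether the script accepts these overrides is an assumption; inspect `finetune.sh` for the flags it actually passes through):

```bash
uv run bash finetune.sh \
    --learning_rate 1e-5 \
    --per_device_train_batch_size 2 \
    --num_train_epochs 3
```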
This repo is built upon TRL and CLEVR. We sincerely thank the developers of these projects.
If you find CP-Bench useful, please consider citing our work:
```bibtex
@article{cpbench2025,
  title={Continuous Perception Matters: Diagnosing Temporal Integration Failures in Multimodal Models},
  author={Zeyu Wang and Zhenzhen Weng and Serena Yeung-Levy},
  journal={arXiv preprint arXiv:2408.07867},
  year={2025}
}
```