@Q-Future

Visual Evaluation with Foundation Models

We are working towards a future in which a single foundation model serves as a multi-purpose expert for low-level visual perception and visual evaluation.

👁️‍🗨️ Low-level Visual Perception in the Foundation Model Era

🔖Aiming at next-era cornerstone research

Low-level Visual Perception | Multi-Modality Large Language Models | Visual Quality Assessment

📖Main Projects

  • Co-Instruct [ECCV 2024, Oral]: Homepage, Repo, Demo. An open-ended visual quality comparer (up to 4 images) and low-level visual assistant; an improved version of ②Q-Instruct [CVPR 2024].

  • Q-Align [ICML 2024]: Homepage, Repo, Demo. A unified visual scorer for images and videos, built via text-instructed alignment of multi-modality foundation models; it can be efficiently fine-tuned on additional datasets with consistently strong performance. State-of-the-art on IQA, VQA, and IAA.

  • Q-Instruct [CVPR 2024]: Homepage, Repo, 200K Dataset, Technical Report. A large-scale instruction-tuning dataset that improves the low-level perceptual abilities of foundation models.

  • Q-Bench+ [ICLR 2024, Spotlight]: Homepage, Repo, Data-Single, Data-Pair, Preprint. The first benchmark for foundation models on low-level vision.
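The "text-instructed alignment" behind Q-Align teaches the model to answer with a discrete rating word, then recovers a continuous score as the probability-weighted average of the rating levels. The sketch below illustrates that scoring step only; the level words follow the common five-level ITU convention, and the function name and example log-probabilities are illustrative assumptions, not the released API.

```python
import math

# Five discrete rating levels, mapped to scores 1..5 (assumed ordering).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]

def rating_token_score(level_logprobs):
    """Turn per-level token log-probabilities into a scalar quality score.

    Applies a softmax restricted to the five level tokens, then takes the
    expectation of the level index, yielding a score in [1, 5].
    """
    probs = [math.exp(level_logprobs[level]) for level in LEVELS]
    total = sum(probs)
    probs = [p / total for p in probs]  # normalize over the level tokens only
    return sum((i + 1) * p for i, p in enumerate(probs))

# Example: a model fairly confident the image is "good" lands near 4.
example = {"bad": -6.0, "poor": -4.0, "fair": -1.5, "good": 0.0, "excellent": -2.0}
print(round(rating_token_score(example), 2))
```

Reading off the level-token probabilities rather than parsing free-form text is what makes the scorer differentiable-friendly and stable across datasets.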

🖋️Extension Projects

  • Q-Boost: Homepage. A discussion on boosting IQA performance for MLLMs that are not specifically aligned for IQA.

  • [Pending] Chinese-Q-Bench / 质衡: Homepage, Repo. The first attempt to test multilingual abilities on low-level vision.

Maintained by Teo Wu@Singapore and Zicheng Zhang@Shanghai.

Pinned

  1. Q-Align

    ③ [ICML 2024] [IQA, IAA, VQA] An all-in-one foundation model for visual scoring; can be efficiently fine-tuned on downstream datasets.

    Python · 445 stars · 26 forks

  2. Q-Bench

    ① [ICLR 2024, Spotlight] (GPT-4V / Gemini-Pro / Qwen-VL-Plus + 16 open-source MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.

    Jupyter Notebook · 269 stars · 13 forks

  3. Q-Instruct

    ② [CVPR 2024] Low-level visual instruction tuning, with a 200K dataset and a model zoo of fine-tuned checkpoints.

    Python · 224 stars · 10 forks

  4. Co-Instruct

    ④ [ECCV 2024, Oral; comparison among multiple images!] A study on open-ended multi-image quality comparison: a dataset, a model, and a benchmark.

    81 stars · 5 forks

  5. Visual-Question-Answering-for-Video-Quality-Assessment

    Officially released code for the VQA² series of models.

    Python · 42 stars · 1 fork

  6. A-Bench

    [ICLR 2025] What do we expect from LMMs as AIGI evaluators and how do they perform?

    143 stars · 3 forks
