[CVPR 2025] Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution

Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution

¹Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; ²The Chinese University of Hong Kong; ³Shanghai AI Laboratory; ⁴Shenzhen University of Advanced Technology
#Corresponding author.
Homepage | Model Weights (Full Tuning / LoRA Tuning) | Datasets | Paper

Motivation

Model Architecture

[Installation Free!] Quicker Start with Hugging Face AutoModel

This path requires only transformers==4.36.1. There is no need to install this GitHub repo.

import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

# Load the released checkpoint; trust_remote_code pulls the model class from the Hub.
model = AutoModelForCausalLM.from_pretrained(
  "zhiyuanyou/DeQA-Score-Mix3",
  trust_remote_code=True,
  attn_implementation="eager",
  torch_dtype=torch.float16,
  device_map="auto",
)

# The input must be a list of PIL images (one or more).
image_url = "https://raw.githubusercontent.com/zhiyuanyou/DeQA-Score/main/fig/singapore_flyer.jpg"
model.score(
  [Image.open(requests.get(image_url, stream=True).raw)]
)

Installation

If you only need to run inference or evaluation:

git clone https://github.com/zhiyuanyou/DeQA-Score.git
cd DeQA-Score
pip install -e .

For training, install the additional dependencies as follows:

pip install -e ".[train]"
pip install flash_attn --no-build-isolation

Quick Start

Image Quality Scorer

  • CLI Interface
python src/evaluate/scorer.py --img_path fig/singapore_flyer.jpg
  • Python API
from src import Scorer
from PIL import Image

scorer = Scorer()
img_list = [Image.open("fig/singapore_flyer.jpg")] # can be a list of multiple PIL images
print(scorer(img_list).tolist())
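
When scoring many images, passing them all in one list may exceed GPU memory. The following is a minimal sketch of chunked scoring. The helper `score_in_batches` and its `batch_size` parameter are hypothetical and not part of this repo; it only assumes that the scoring callable takes a list of items and returns one score per item, as the Python API above does.

```python
from typing import Callable, List, Sequence


def score_in_batches(
    score_fn: Callable[[list], Sequence[float]],
    items: Sequence,
    batch_size: int = 8,
) -> List[float]:
    """Call score_fn on consecutive chunks of items and concatenate the scores.

    score_fn is any callable that maps a list of inputs to one score per input.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    scores: List[float] = []
    for start in range(0, len(items), batch_size):
        batch = list(items[start:start + batch_size])
        scores.extend(float(s) for s in score_fn(batch))
    return scores
```

With the `Scorer` shown above, a call might look like `score_in_batches(lambda imgs: scorer(imgs).tolist(), img_list, batch_size=4)`.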

Training, Inference & Evaluation

Datasets

|-- DeQA-Score
|-- Data-DeQA-Score
  |-- KONIQ
    |-- images/*.jpg
    |-- metas
  |-- SPAQ
    |-- images/*.jpg
    |-- metas
  |-- KADID10K
    |-- images/*.png
    |-- metas
  |-- PIPAL
    |-- images/Distortion_*/*.bmp
    |-- metas
  |-- LIVE-WILD
    |-- images/*.bmp
    |-- metas
  |-- AGIQA3K
    |-- images/*.jpg
    |-- metas
  |-- TID2013
    |-- images/distorted_images/*.bmp
    |-- metas
  |-- CSIQ
    |-- images/dst_imgs/*/*.png
    |-- metas
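
Before running inference, it can help to confirm that the folders match the tree above. The following is a minimal sketch, not a script shipped with this repo: `check_layout` is a hypothetical helper, and the glob patterns are copied from the tree as listed.

```python
from pathlib import Path
from typing import Dict

# Image glob patterns per dataset, relative to each dataset folder (taken from the tree above).
EXPECTED_IMAGE_GLOBS = {
    "KONIQ": "images/*.jpg",
    "SPAQ": "images/*.jpg",
    "KADID10K": "images/*.png",
    "PIPAL": "images/Distortion_*/*.bmp",
    "LIVE-WILD": "images/*.bmp",
    "AGIQA3K": "images/*.jpg",
    "TID2013": "images/distorted_images/*.bmp",
    "CSIQ": "images/dst_imgs/*/*.png",
}


def check_layout(data_root: str) -> Dict[str, dict]:
    """Report, per dataset, how many images match the expected pattern
    and whether the metas folder exists."""
    root = Path(data_root)
    report = {}
    for name, pattern in EXPECTED_IMAGE_GLOBS.items():
        dataset_dir = root / name
        n_images = 0
        if dataset_dir.is_dir():
            n_images = sum(1 for _ in dataset_dir.glob(pattern))
        report[name] = {
            "images": n_images,
            "metas": (dataset_dir / "metas").is_dir(),
        }
    return report
```

Calling `check_layout("../Data-DeQA-Score")` from inside the `DeQA-Score` folder returns a dictionary in which a dataset with `images == 0` or `metas == False` is missing or misplaced.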

Pretrained Weights

We provide two sets of model weights (full tuning and LoRA tuning) with similar performance.

| Tuning Method | Training Datasets   | Weights          |
|---------------|---------------------|------------------|
| Full Tuning   | KonIQ, SPAQ, KADID  | Huggingface Full |
| LoRA Tuning   | KonIQ, SPAQ, KADID  | Huggingface LoRA |

Download one of the above model weights, then arrange the folders as follows:

|-- DeQA-Score
  |-- checkpoints
    |-- DeQA-Score-Mix3
    |-- DeQA-Score-LoRA-Mix3

To use the LoRA tuning weights, you also need to download the base mPLUG-Owl2 weights from Hugging Face (mPLUG-Owl2), then arrange the folders as follows:

|-- DeQA-Score
|-- ModelZoo
  |-- mplug-owl2-llama2-7b

Inference

After preparing the datasets, you can run inference with the pre-trained DeQA-Score or DeQA-Score-LoRA weights:

sh scripts/infer.sh $ONE_GPU_ID
sh scripts/infer_lora.sh $ONE_GPU_ID

Evaluation

After inference, you can evaluate the inference results:

  • SRCC / PLCC for quality score.
sh scripts/eval_score.sh
  • KL Divergence / JS Divergence / Wasserstein Distance for score distribution.
sh scripts/eval_dist.sh

Fine-tuning

Fine-tuning requires the base mPLUG-Owl2 weights. Download them as described in Pretrained Weights.

LoRA Fine-tuning

  • Only 2 RTX3090 GPUs are required. Revise --data_paths in the training shell script to load different datasets. The default training datasets are KonIQ, SPAQ, and KADID.
sh scripts/train_lora.sh $GPU_IDs

Full Fine-tuning from Scratch

  • At least 8 A6000 GPUs or 4 A100 GPUs are required. Revise --data_paths in the training shell script to load different datasets. The default training datasets are KonIQ, SPAQ, and KADID.
sh scripts/train.sh $GPU_IDs

Soft Label Construction

  • Download split.json (training and test split info) and mos.json (MOS and standard deviation info) for KonIQ, SPAQ, and KADID from Hugging Face (Metas), and arrange the folders as in Datasets.

  • Run the following scripts to construct the distribution-based soft labels.

cd build_soft_labels
python gen_soft_label.py
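
To illustrate the idea behind distribution-based soft labels, the following is a minimal sketch that discretizes a Gaussian score distribution (mean opinion score and standard deviation) into probabilities over quality levels. It is not `gen_soft_label.py`. The five level centers 1 to 5, the bin width of 1, the assumption that scores are already rescaled to that range, and the simple renormalization step are all assumptions made for this example.

```python
import math
from typing import List, Sequence

# Assumed level centers (for example, five rating levels scored 1 to 5).
LEVEL_CENTERS = (1.0, 2.0, 3.0, 4.0, 5.0)


def _normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def soft_label(
    mean: float,
    std: float,
    centers: Sequence[float] = LEVEL_CENTERS,
    width: float = 1.0,
) -> List[float]:
    """Integrate the Gaussian over a bin of the given width around each center,
    then renormalize so the probabilities sum to 1."""
    if std <= 0:
        raise ValueError("std must be positive")
    masses = [
        _normal_cdf(c + width / 2, mean, std) - _normal_cdf(c - width / 2, mean, std)
        for c in centers
    ]
    total = sum(masses)
    if total <= 0:
        raise ValueError("distribution has no mass on the level bins")
    return [m / total for m in masses]


def expected_score(probs: Sequence[float], centers: Sequence[float] = LEVEL_CENTERS) -> float:
    """Recover a scalar score as the probability-weighted mean of the level centers."""
    return sum(p * c for p, c in zip(probs, centers))
```

Renormalizing corrects the probability mass lost outside the bins, but it does not guarantee that `expected_score(soft_label(mean, std))` equals `mean` when the distribution is cut off near the ends of the scale; a real pipeline may need an additional adjustment for that.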

Acknowledgements

This work is based on Q-Align. We sincerely thank the authors for their awesome work.

Citation

If you find our work useful for your research and applications, please cite it using the following BibTeX entries:

@inproceedings{deqa_score,
    title={Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution},
    author={You, Zhiyuan and Cai, Xin and Gu, Jinjin and Xue, Tianfan and Dong, Chao},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
    year={2025}
}

@article{depictqa_v2,
    title={Descriptive Image Quality Assessment in the Wild},
    author={You, Zhiyuan and Gu, Jinjin and Li, Zheyuan and Cai, Xin and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
    journal={arXiv preprint arXiv:2405.18842},
    year={2024}
}

@inproceedings{depictqa_v1,
    title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
    author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
    booktitle={European Conference on Computer Vision},
    year={2024}
}
