Only `transformers==4.36.1` is required. No need to install this GitHub repo.
```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "zhiyuanyou/DeQA-Score-Mix3",
    trust_remote_code=True,
    attn_implementation="eager",
    torch_dtype=torch.float16,
    device_map="auto",
)

# The input should be a list of PIL images (one or more).
model.score(
    [
        Image.open(
            requests.get(
                "https://raw.githubusercontent.com/zhiyuanyou/DeQA-Score/main/fig/singapore_flyer.jpg",
                stream=True,
            ).raw
        )
    ]
)
```
If you only need to infer / evaluate:

```bash
git clone https://github.com/zhiyuanyou/DeQA-Score.git
cd DeQA-Score
pip install -e .
```

For training, install the additional dependencies as follows:

```bash
pip install -e ".[train]"
pip install flash_attn --no-build-isolation
```
- CLI Interface

```bash
python src/evaluate/scorer.py --img_path fig/singapore_flyer.jpg
```

- Python API

```python
from PIL import Image

from src import Scorer

scorer = Scorer()
img_list = [Image.open("fig/singapore_flyer.jpg")]  # can be a list of multiple PIL images
print(scorer(img_list).tolist())
```
- Download our meta files from Huggingface Metas.
- Download the source images from KonIQ, SPAQ, KADID, PIPAL, LIVE-Wild, AGIQA, TID2013, and CSIQ.
- Arrange the folders as follows:
```
|-- DeQA-Score
|-- Data-DeQA-Score
    |-- KONIQ
        |-- images/*.jpg
        |-- metas
    |-- SPAQ
        |-- images/*.jpg
        |-- metas
    |-- KADID10K
        |-- images/*.png
        |-- metas
    |-- PIPAL
        |-- images/Distortion_*/*.bmp
        |-- metas
    |-- LIVE-WILD
        |-- images/*.bmp
        |-- metas
    |-- AGIQA3K
        |-- images/*.jpg
        |-- metas
    |-- TID2013
        |-- images/distorted_images/*.bmp
        |-- metas
    |-- CSIQ
        |-- images/dst_imgs/*/*.png
        |-- metas
```
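A quick way to catch path mistakes before running anything is a small layout check from the directory that contains both DeQA-Score and Data-DeQA-Score. This is not part of the repository; the dataset names and glob patterns below simply mirror the tree above, so adjust them if you only downloaded a subset:

```python
# Sanity check for the Data-DeQA-Score layout shown above (not part of the repo).
from pathlib import Path

EXPECTED = {
    "KONIQ": "images/*.jpg",
    "SPAQ": "images/*.jpg",
    "KADID10K": "images/*.png",
    "PIPAL": "images/Distortion_*/*.bmp",
    "LIVE-WILD": "images/*.bmp",
    "AGIQA3K": "images/*.jpg",
    "TID2013": "images/distorted_images/*.bmp",
    "CSIQ": "images/dst_imgs/*/*.png",
}

root = Path("Data-DeQA-Score")
for name, pattern in EXPECTED.items():
    num_images = len(list((root / name).glob(pattern)))
    has_metas = (root / name / "metas").is_dir()
    print(f"{name:10s} images: {num_images:6d}  metas: {'found' if has_metas else 'MISSING'}")
```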
We provide two sets of model weights (full tuning and LoRA tuning) with similar performance.
| Tuning | Training Datasets | Weights |
|---|---|---|
| Full Tuning | KonIQ, SPAQ, KADID | Huggingface Full |
| LoRA Tuning | KonIQ, SPAQ, KADID | Huggingface LoRA |
Download one of the above model weights, then arrange the folders as follows:
```
|-- DeQA-Score
    |-- checkpoints
        |-- DeQA-Score-Mix3
        |-- DeQA-Score-LoRA-Mix3
```
If you would like to use the LoRA tuning weights, you need to download the base mPLUG-Owl2 weights from Huggingface mPLUG-Owl2, then arrange the folders as follows:
```
|-- DeQA-Score
    |-- ModelZoo
        |-- mplug-owl2-llama2-7b
```
After preparing the datasets, you can infer using pre-trained DeQA-Score or DeQA-Score-LoRA:
```bash
sh scripts/infer.sh $ONE_GPU_ID
sh scripts/infer_lora.sh $ONE_GPU_ID
```
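For example, to run inference with the full-tuning weights on GPU 0 (assuming, as the variable name suggests, that $ONE_GPU_ID is a single CUDA device index):

```bash
sh scripts/infer.sh 0
```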
After inference, you can evaluate the inference results:
- SRCC / PLCC for quality scores:

```bash
sh scripts/eval_score.sh
```

- KL divergence / JS divergence / Wasserstein distance for score distributions:

```bash
sh scripts/eval_dist.sh
```
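For reference, these metrics can also be computed directly with SciPy. The sketch below is not the repository's evaluation code; it assumes the predicted and ground-truth scores (and discrete score distributions over five quality levels) are already loaded as arrays:

```python
# Reference sketch of the evaluation metrics using SciPy (not the repo's eval code).
import numpy as np
from scipy.stats import entropy, pearsonr, spearmanr, wasserstein_distance

def score_metrics(pred, gt):
    """SRCC / PLCC between predicted and ground-truth quality scores."""
    srcc = spearmanr(pred, gt)[0]
    plcc = pearsonr(pred, gt)[0]
    return srcc, plcc

def distribution_metrics(p, q, levels=np.arange(1, 6)):
    """KL / JS divergence and Wasserstein distance between two discrete score
    distributions defined on the same quality levels (assumed 1..5 here)."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    kl = entropy(p, q)                               # KL(p || q), in nats
    m = 0.5 * (p + q)
    js = 0.5 * entropy(p, m) + 0.5 * entropy(q, m)   # JS divergence
    wd = wasserstein_distance(levels, levels, p, q)  # earth mover's distance on the levels
    return kl, js, wd
```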
Fine-tuning requires the mPLUG-Owl2 weights, downloaded as described in Pretrained Weights.
- LoRA tuning: only 2 RTX3090 GPUs are required. Revise `--data_paths` in the training shell script to load different datasets. The default training datasets are KonIQ, SPAQ, and KADID.

  ```bash
  sh scripts/train_lora.sh $GPU_IDs
  ```

- Full tuning: 8 A6000 GPUs or 4 A100 GPUs are enough. Revise `--data_paths` in the training shell script to load different datasets. The default training datasets are KonIQ, SPAQ, and KADID.

  ```bash
  sh scripts/train.sh $GPU_IDs
  ```
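The expected format of $GPU_IDs is defined by the shell scripts themselves; a comma-separated device list is a common convention for such launchers, but this is an assumption, so check the script before running. A hypothetical invocation:

```bash
# Hypothetical example: LoRA tuning on GPUs 0 and 1
# (verify the expected argument format in scripts/train_lora.sh).
sh scripts/train_lora.sh 0,1
```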
- Download `split.json` (training & test split info) and `mos.json` (mos & std info) of KonIQ, SPAQ, and KADID from Huggingface Metas, and arrange the folders as in Datasets.
- Run the following script to construct the distribution-based soft labels:

```bash
cd build_soft_labels
python gen_soft_label.py
```
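As a conceptual illustration of what a distribution-based soft label looks like, the sketch below discretizes a Gaussian defined by an image's MOS and std over five quality levels. This is an assumption made for illustration only; refer to `gen_soft_label.py` for the construction actually used by DeQA-Score.

```python
# Illustration only; see build_soft_labels/gen_soft_label.py for the real construction.
# Assumption: the soft label is a Gaussian N(mos, std^2) discretized over 5 bins in [1, 5].
import numpy as np
from scipy.stats import norm

def soft_label(mos, std, num_levels=5, lo=1.0, hi=5.0):
    """Probability mass over `num_levels` equal-width bins covering [lo, hi]."""
    edges = np.linspace(lo, hi, num_levels + 1)
    cdf = norm.cdf(edges, loc=mos, scale=max(std, 1e-6))
    probs = np.diff(cdf)
    return probs / probs.sum()  # renormalize the mass clipped outside [lo, hi]

print(soft_label(mos=3.2, std=0.8))  # ~[0.038, 0.189, 0.378, 0.300, 0.095]
```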
This work is based on Q-Align. Sincere thanks for this awesome work.
If you find our work useful for your research and applications, please cite it using the following BibTeX:
```bibtex
@inproceedings{deqa_score,
    title={Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution},
    author={You, Zhiyuan and Cai, Xin and Gu, Jinjin and Xue, Tianfan and Dong, Chao},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
    year={2025},
}

@article{depictqa_v2,
    title={Descriptive Image Quality Assessment in the Wild},
    author={You, Zhiyuan and Gu, Jinjin and Li, Zheyuan and Cai, Xin and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
    journal={arXiv preprint arXiv:2405.18842},
    year={2024},
}

@inproceedings{depictqa_v1,
    title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
    author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
    booktitle={European Conference on Computer Vision},
    year={2024},
}
```