Only `transformers==4.36.1` is required. No need to install this GitHub repo.
```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "zhiyuanyou/DeQA-Score-Mix3",
    trust_remote_code=True,
    attn_implementation="eager",
    torch_dtype=torch.float16,
    device_map="auto",
)

# The input should be a list of PIL images (one or more).
model.score(
    [
        Image.open(
            requests.get(
                "https://raw.githubusercontent.com/zhiyuanyou/DeQA-Score/main/fig/singapore_flyer.jpg",
                stream=True,
            ).raw
        )
    ]
)
```
If you only need to infer / evaluate:

```bash
git clone https://github.com/zhiyuanyou/DeQA-Score.git
cd DeQA-Score
pip install -e .
```

For training, install the additional dependencies as follows:

```bash
pip install -e ".[train]"
pip install flash_attn --no-build-isolation
```
- CLI Interface

```bash
python src/evaluate/scorer.py --img_path fig/singapore_flyer.jpg
```

- Python API

```python
from PIL import Image

from src import Scorer

scorer = Scorer()
img_list = [Image.open("fig/singapore_flyer.jpg")]  # can be a list of multiple PIL images
print(scorer(img_list).tolist())
```
- Download our meta files from Huggingface Metas.
- Download the source images from KonIQ, SPAQ, KADID, PIPAL, LIVE-Wild, AGIQA, TID2013, and CSIQ.
- Arrange the folders as follows:
```
|-- DeQA-Score
|-- Data-DeQA-Score
    |-- KONIQ
        |-- images/*.jpg
        |-- metas
    |-- SPAQ
        |-- images/*.jpg
        |-- metas
    |-- KADID10K
        |-- images/*.png
        |-- metas
    |-- PIPAL
        |-- images/Distortion_*/*.bmp
        |-- metas
    |-- LIVE-WILD
        |-- images/*.bmp
        |-- metas
    |-- AGIQA3K
        |-- images/*.jpg
        |-- metas
    |-- TID2013
        |-- images/distorted_images/*.bmp
        |-- metas
    |-- CSIQ
        |-- images/dst_imgs/*/*.png
        |-- metas
```
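A quick way to catch path mistakes before running anything is a small layout check from the directory that contains both DeQA-Score and Data-DeQA-Score. This is not part of the repository; the dataset names and glob patterns below simply mirror the tree above, so adjust them if you only downloaded a subset:

```python
# Sanity check for the Data-DeQA-Score layout shown above (not part of the repo).
from pathlib import Path

EXPECTED = {
    "KONIQ": "images/*.jpg",
    "SPAQ": "images/*.jpg",
    "KADID10K": "images/*.png",
    "PIPAL": "images/Distortion_*/*.bmp",
    "LIVE-WILD": "images/*.bmp",
    "AGIQA3K": "images/*.jpg",
    "TID2013": "images/distorted_images/*.bmp",
    "CSIQ": "images/dst_imgs/*/*.png",
}

root = Path("Data-DeQA-Score")
for name, pattern in EXPECTED.items():
    num_images = len(list((root / name).glob(pattern)))
    has_metas = (root / name / "metas").is_dir()
    print(f"{name:10s} images: {num_images:6d}  metas: {'found' if has_metas else 'MISSING'}")
```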
We provide two sets of model weights (full tuning and LoRA tuning) with similar performance.
| Tuning | Training Datasets | Weights |
|---|---|---|
| Full Tuning | KonIQ, SPAQ, KADID | Huggingface Full |
| LoRA Tuning | KonIQ, SPAQ, KADID | Huggingface LoRA |
Download one of the above model weights, then arrange the folders as follows:
```
|-- DeQA-Score
    |-- checkpoints
        |-- DeQA-Score-Mix3
        |-- DeQA-Score-LoRA-Mix3
```
If you would like to use the LoRA tuning weights, you need to download the base mPLUG-Owl2 weights from Huggingface mPLUG-Owl2, then arrange the folders as follows:
```
|-- DeQA-Score
    |-- ModelZoo
        |-- mplug-owl2-llama2-7b
```
After preparing the datasets, you can infer using pre-trained DeQA-Score or DeQA-Score-LoRA:
```bash
sh scripts/infer.sh $ONE_GPU_ID
sh scripts/infer_lora.sh $ONE_GPU_ID
```
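For example, to run inference with the full-tuning weights on GPU 0 (assuming, as the variable name suggests, that $ONE_GPU_ID is a single CUDA device index):

```bash
sh scripts/infer.sh 0
```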
After inference, you can evaluate the inference results:
- SRCC / PLCC for quality scores:

```bash
sh scripts/eval_score.sh
```

- KL divergence / JS divergence / Wasserstein distance for score distributions:

```bash
sh scripts/eval_dist.sh
```
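For reference, these metrics can also be computed directly with SciPy. The sketch below is not the repository's evaluation code; it assumes the predicted and ground-truth scores (and discrete score distributions over five quality levels) are already loaded as arrays:

```python
# Reference sketch of the evaluation metrics using SciPy (not the repo's eval code).
import numpy as np
from scipy.stats import entropy, pearsonr, spearmanr, wasserstein_distance

def score_metrics(pred, gt):
    """SRCC / PLCC between predicted and ground-truth quality scores."""
    srcc = spearmanr(pred, gt)[0]
    plcc = pearsonr(pred, gt)[0]
    return srcc, plcc

def distribution_metrics(p, q, levels=np.arange(1, 6)):
    """KL / JS divergence and Wasserstein distance between two discrete score
    distributions defined on the same quality levels (assumed 1..5 here)."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    kl = entropy(p, q)                               # KL(p || q), in nats
    m = 0.5 * (p + q)
    js = 0.5 * entropy(p, m) + 0.5 * entropy(q, m)   # JS divergence
    wd = wasserstein_distance(levels, levels, p, q)  # earth mover's distance on the levels
    return kl, js, wd
```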
Fine-tuning requires the mPLUG-Owl2 weights, downloaded as described in Pretrained Weights.
- LoRA tuning: only 2 RTX3090 GPUs are required. Revise `--data_paths` in the training shell script to load different datasets. The default training datasets are KonIQ, SPAQ, and KADID.

  ```bash
  sh scripts/train_lora.sh $GPU_IDs
  ```

- Full tuning: 8 A6000 GPUs or 4 A100 GPUs are enough. Revise `--data_paths` in the training shell script to load different datasets. The default training datasets are KonIQ, SPAQ, and KADID.

  ```bash
  sh scripts/train.sh $GPU_IDs
  ```
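The expected format of $GPU_IDs is defined by the shell scripts themselves; a comma-separated device list is a common convention for such launchers, but this is an assumption, so check the script before running. A hypothetical invocation:

```bash
# Hypothetical example: LoRA tuning on GPUs 0 and 1
# (verify the expected argument format in scripts/train_lora.sh).
sh scripts/train_lora.sh 0,1
```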
- Download `split.json` (training & test split info) and `mos.json` (mos & std info) of KonIQ, SPAQ, and KADID from Huggingface Metas, and arrange the folders as in Datasets.
- Run the following script to construct the distribution-based soft labels:

```bash
cd build_soft_labels
python gen_soft_label.py
```
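As a conceptual illustration of what a distribution-based soft label looks like, the sketch below discretizes a Gaussian defined by an image's MOS and std over five quality levels. This is an assumption made for illustration only; refer to `gen_soft_label.py` for the construction actually used by DeQA-Score.

```python
# Illustration only; see build_soft_labels/gen_soft_label.py for the real construction.
# Assumption: the soft label is a Gaussian N(mos, std^2) discretized over 5 bins in [1, 5].
import numpy as np
from scipy.stats import norm

def soft_label(mos, std, num_levels=5, lo=1.0, hi=5.0):
    """Probability mass over `num_levels` equal-width bins covering [lo, hi]."""
    edges = np.linspace(lo, hi, num_levels + 1)
    cdf = norm.cdf(edges, loc=mos, scale=max(std, 1e-6))
    probs = np.diff(cdf)
    return probs / probs.sum()  # renormalize the mass clipped outside [lo, hi]

print(soft_label(mos=3.2, std=0.8))  # ~[0.038, 0.189, 0.378, 0.300, 0.095]
```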
This work is based on Q-Align. Sincere thanks for this awesome work.
If you find our work useful for your research and applications, please cite it using the following BibTeX:
```bibtex
@inproceedings{deqa_score,
    title={Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution},
    author={You, Zhiyuan and Cai, Xin and Gu, Jinjin and Xue, Tianfan and Dong, Chao},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
    year={2025},
}

@article{depictqa_v2,
    title={Descriptive Image Quality Assessment in the Wild},
    author={You, Zhiyuan and Gu, Jinjin and Li, Zheyuan and Cai, Xin and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
    journal={arXiv preprint arXiv:2405.18842},
    year={2024},
}

@inproceedings{depictqa_v1,
    title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
    author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
    booktitle={European Conference on Computer Vision},
    year={2024},
}
```