This project is a semantic full-reference image quality assessment (IQA) pipeline trained on very little data (510 pairs), with score labels in 0..5 and SROCC / PLCC as the official metrics.
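For reference, the two official metrics can be computed as in the sketch below. It is a plain-Python illustration, not the project's evaluation code: the function names `pearson`, `average_ranks`, and `spearman` are hypothetical, and tied values are assumed to receive average ranks.

```python
import math


def pearson(x, y):
    """PLCC: Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """SROCC: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

SROCC only rewards correct ordering, while PLCC also depends on how linear the relation between predictions and labels is, which is why a monotonic calibration step can change PLCC without changing SROCC.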
The model is implemented in `siqa/model.py` as a Siamese network with two frozen pretrained backbones:
- Backbone A (structure-aware), configurable:
  - DINOv3-Base (`vit_base_patch16_dinov3`, default)
  - Swin-T (`swin_tiny_patch4_window7_224`, rollback option)
- Backbone B (semantic-aware):
  - CLIP Vision (default `clip_vit_l14_336`, can be switched to `clip_vit_b32`)
For each pair (Ref, Dist), the model extracts four feature vectors (the $swin$ subscript denotes Backbone A):
- $F_{swin\_ref}$, $F_{swin\_dist}$
- $F_{clip\_ref}$, $F_{clip\_dist}$
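To break the “quality-only shortcut”, the model explicitly injects the CLIP cosine similarity between the reference and distorted features:

$S_{cos}=\cos(F_{clip\_ref},F_{clip\_dist})=\dfrac{F_{clip\_ref}\cdot F_{clip\_dist}}{\|F_{clip\_ref}\|\,\|F_{clip\_dist}\|}$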
Fusion uses:
- $Diff_{swin}=|F_{swin\_ref}-F_{swin\_dist}|$
- $Diff_{clip}=|F_{clip\_ref}-F_{clip\_dist}|$
- CLIP multiplication branch: $F_{mult\_clip}=\tilde{F}_{clip\_ref}\odot\tilde{F}_{clip\_dist}$
- Safe default (no feature-dim inflation): $Fused=Concat(F_{swin\_ref},F_{swin\_dist},Diff_{swin},F_{clip\_ref},Diff_{clip},F_{mult\_clip},S_{cos})$, i.e. $F_{clip\_dist}$ is replaced by $F_{mult\_clip}$.
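The fusion above can be sketched on plain Python lists as follows. This is an illustration, not the code in `siqa/model.py`: the helper names are hypothetical, and $\tilde{F}$ is assumed to mean the L2-normalized CLIP feature (suggested by the `model.clip_mult_l2_norm` option, but not confirmed here).

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """S_cos between two feature vectors; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def fuse(f_swin_ref, f_swin_dist, f_clip_ref, f_clip_dist):
    """Build the 'safe default' fused vector: F_clip_dist is replaced by F_mult_clip."""
    diff_swin = [abs(r - d) for r, d in zip(f_swin_ref, f_swin_dist)]
    diff_clip = [abs(r - d) for r, d in zip(f_clip_ref, f_clip_dist)]
    # Assumption: the tilde features are L2-normalized before the element-wise product.
    f_mult_clip = [
        r * d for r, d in zip(l2_normalize(f_clip_ref), l2_normalize(f_clip_dist))
    ]
    s_cos = cosine_similarity(f_clip_ref, f_clip_dist)
    return (
        list(f_swin_ref) + list(f_swin_dist) + diff_swin
        + list(f_clip_ref) + diff_clip + f_mult_clip + [s_cos]
    )
```

With structure features of size $D_s$ and CLIP features of size $D_c$, the fused vector has $3D_s + 3D_c + 1$ entries, the same as if $F_{clip\_dist}$ had been kept instead of $F_{mult\_clip}$. Under the normalization assumption, the entries of $F_{mult\_clip}$ sum to $S_{cos}$.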
Then a required anti-overfitting bottleneck is applied:
- `Linear -> BN -> SiLU -> Dropout(0.5)`
- default `bottleneck_dim=256` (can be changed to `128`)
In inference/eval mode (`model.eval()`), a hard semantic gate is enabled:
- If `S_cos < semantic_gate_threshold` (default `0.4`), the logits are forced toward class `0`.
- This strongly biases semantically unrelated pairs toward near-zero scores.
The gate now supports rollback-friendly modes:
- `semantic_gate_mode: off` → gate disabled
- `semantic_gate_mode: hard` → single-threshold hard veto
- `semantic_gate_mode: soft` → dual-threshold soft gate (recommended)

Soft gate behavior (when `semantic_gate_mode: soft`):
- `S_cos < semantic_gate_threshold` (e.g., `0.4`) → hard veto to class `0`
- `semantic_gate_threshold <= S_cos < semantic_gate_high_threshold` (e.g., `0.5`) → soft logit penalty toward lower scores
- `S_cos >= semantic_gate_high_threshold` → no gate intervention
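The three gate modes can be illustrated with the sketch below. Only the threshold logic follows the description above; everything else is an assumption made for illustration: the exact form of the hard veto and of the soft penalty, the strength values (loosely standing in for `model.gate_logit_strength` and `model.soft_gate_logit_strength`), and the read-out of a score as the softmax expectation over classes 0..5.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_score(logits):
    # Assumption: class k stands for score k, read out as a softmax expectation.
    return sum(k * p for k, p in enumerate(softmax(logits)))


def apply_semantic_gate(logits, s_cos, mode="soft", low=0.4, high=0.5,
                        hard_strength=10.0, soft_strength=2.0):
    """Return a gated copy of the class logits (index = score class)."""
    if mode not in ("off", "hard", "soft"):
        raise ValueError(f"unknown semantic_gate_mode: {mode!r}")
    gated = list(logits)
    if mode == "off":
        return gated
    if s_cos < low:
        # Hard veto (hypothetical form): boost class 0, suppress all others.
        gated[0] += hard_strength
        for k in range(1, len(gated)):
            gated[k] -= hard_strength
        return gated
    if mode == "soft" and s_cos < high:
        # Soft penalty (hypothetical form): grows with the class index and
        # with how far S_cos falls below the high threshold.
        t = (high - s_cos) / (high - low)
        for k in range(len(gated)):
            gated[k] -= soft_strength * t * k
    return gated
```

For a pair whose ungated logits favor class 5, a low `s_cos` drives the expected score close to 0 in both `hard` and `soft` mode, while a value between the two thresholds only lowers it in `soft` mode.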
`train_siqa.py` still uses a hybrid objective:
Prediction score:
Because servers in China may have unstable external network access, local-first loading is supported:
- CLIP supports local-first loading:
  - `model.clip_local_dir`
  - `model.clip_local_files_only: true`
- Swin supports local checkpoint loading:
  - `model.swin_local_path`
- DINOv3 (timm) supports online pretrained loading (default) and cache reuse

If no local files are given:
- CLIP may require `HF_ENDPOINT=https://hf-mirror.com`
- Swin may download from the `torchvision` model hub once
- DINOv3 may download via the `timm` / HuggingFace cache on first run
Latest verified defaults in `configs/siqa_base.yaml`:
- `model.structure_backbone: vit_base_patch16_dinov3`
- `model.swin_local_path: ""` (empty = allow automatic Swin download)
- `model.clip_local_dir: ""`
- `model.clip_local_files_only: false` (allow online CLIP download)

Important notes:
- One-line switch back to Swin: set `model.structure_backbone: swin_tiny_patch4_window7_224`.
- `HF_ENDPOINT` affects HuggingFace downloads (CLIP and most timm DINOv3 checkpoints), but not the torchvision Swin download.
In `configs/siqa_base.yaml`:
- `model.structure_backbone`
- `model.ablation_mode` (`full` / `clip_only` / `structure_only`)
- `model.swin_name`
- `model.clip_name`
- `model.freeze_backbones`
- `model.swin_local_path`
- `model.clip_local_dir`
- `model.clip_local_files_only`
- `model.clip_interpolate_pos_encoding`
- `model.clip_mult_enabled`
- `model.clip_mult_replace_raw`
- `model.clip_mult_l2_norm`
- `model.bottleneck_dim`
- `model.bottleneck_dropout`
- `model.semantic_gate_enabled`
- `model.semantic_gate_mode`
- `model.semantic_gate_threshold`
- `model.semantic_gate_high_threshold`
- `model.gate_logit_strength`
- `model.soft_gate_logit_strength`
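A few of these options only accept specific values, so a small sanity check before a long run can save time. The sketch below is a hypothetical helper, not part of the repository: it takes the `model` section as a plain dict, and the fallback values used when a key is missing are assumptions based on the defaults and examples mentioned in this README.

```python
from typing import Any, Dict, List

ABLATION_MODES = {"full", "clip_only", "structure_only"}
GATE_MODES = {"off", "hard", "soft"}


def validate_model_cfg(model_cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []

    ablation = model_cfg.get("ablation_mode", "full")
    if ablation not in ABLATION_MODES:
        problems.append(f"ablation_mode must be one of {sorted(ABLATION_MODES)}")

    gate_mode = model_cfg.get("semantic_gate_mode", "soft")  # assumed fallback
    if gate_mode not in GATE_MODES:
        problems.append(f"semantic_gate_mode must be one of {sorted(GATE_MODES)}")

    low = model_cfg.get("semantic_gate_threshold", 0.4)
    high = model_cfg.get("semantic_gate_high_threshold", 0.5)  # assumed fallback
    if gate_mode == "soft" and not low <= high:
        problems.append("semantic_gate_threshold must not exceed semantic_gate_high_threshold")

    dropout = model_cfg.get("bottleneck_dropout", 0.5)
    if not 0.0 <= dropout < 1.0:
        problems.append("bottleneck_dropout must be in [0, 1)")

    dim = model_cfg.get("bottleneck_dim", 256)
    if not (isinstance(dim, int) and dim > 0):
        problems.append("bottleneck_dim must be a positive integer")

    return problems
```

For example, `validate_model_cfg({"ablation_mode": "swin_only"})` reports one problem, because the structural-only mode is spelled `structure_only`.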
Install:
```bash
pip install -r requirements.txt
```
Train:
```bash
python3 train_siqa.py --config configs/siqa_base.yaml
```
One-line rollback to Swin in config:
```yaml
model:
  structure_backbone: swin_tiny_patch4_window7_224
```
Ablation by config only:
```yaml
model:
  ablation_mode: full               # full model (default)
  # ablation_mode: clip_only        # semantic-only (CLIP)
  # ablation_mode: structure_only   # structural-only (DINOv3/Swin)
```
One-click ablation suite (Baseline + A1/A2/A3/A4):
```bash
source /data/miniforge3/etc/profile.d/conda.sh
conda activate lovif
cd /data/SIQAv3
export HF_ENDPOINT=https://hf-mirror.com
export HF_HUB_ENABLE_HF_TRANSFER=0
bash run_ablation_suite.sh configs/siqa_base.yaml
```
Outputs are saved under `${output.work_dir}_ablation_suite`:
- per experiment: `kfold_summary.json`, `kfold_prediction_mean.csv`, `kfold_prediction_weighted.csv`
- global summary table: `ablation_summary.csv` and `ablation_summary.md`
Run stratified 5-fold training, out-of-fold (OOF) prediction, and ensembling in one command:
```bash
source /data/miniforge3/etc/profile.d/conda.sh
conda activate lovif
cd /data/SIQAv3
export HF_ENDPOINT=https://hf-mirror.com
export HF_HUB_ENABLE_HF_TRANSFER=0
bash run_kfold.sh configs/siqa_base.yaml
```
K-fold options:
- The default uses `--num_folds 5` with stratification by `score_cls`.
- You can run a single fold manually:
  ```bash
  python3 train_siqa.py --config configs/siqa_base.yaml --num_folds 5 --fold 0
  ```
- To use a weighted ensemble with custom fold weights:
  ```bash
  FOLD_WEIGHTS=0.10,0.20,0.25,0.20,0.25 bash run_kfold.sh configs/siqa_base.yaml
  ```

K-fold outputs (under `output.work_dir`):
- `fold_i/checkpoints/best.pth`: best checkpoint for fold `i`
- `fold_i/oof_val_predictions.csv`: fold validation predictions
- `kfold_oof_predictions.csv`: merged OOF predictions from all folds
- `kfold_summary.json`: OOF metrics and ensemble metadata
- `kfold_prediction_mean.csv`: mean-ensemble inference predictions
- `kfold_prediction_weighted.csv`: weighted-ensemble inference predictions
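The difference between the mean and the weighted ensemble can be sketched as follows. This is an illustration rather than the logic inside `run_kfold.sh`: the function names are hypothetical, and normalizing the `FOLD_WEIGHTS` values so that they sum to 1 is an assumption.

```python
def parse_fold_weights(spec, num_folds):
    """Parse a comma-separated weight string and normalize it to sum to 1."""
    weights = [float(w) for w in spec.split(",")]
    if len(weights) != num_folds:
        raise ValueError(f"expected {num_folds} weights, got {len(weights)}")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    return [w / total for w in weights]


def ensemble(fold_preds, weights=None):
    """Combine per-fold prediction lists (same sample order in every fold).

    With weights=None every fold counts equally (mean ensemble).
    """
    num_folds = len(fold_preds)
    if weights is None:
        weights = [1.0 / num_folds] * num_folds
    num_samples = len(fold_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, fold_preds))
        for i in range(num_samples)
    ]
```

For instance, `ensemble([[1.0, 2.0], [3.0, 4.0]])` averages the two folds, while passing `weights=[0.25, 0.75]` shifts every prediction toward the second fold.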
Train with the HF mirror (for China):
```bash
source /data/miniforge3/etc/profile.d/conda.sh
conda activate lovif
cd /data/SIQAv3
export HF_ENDPOINT=https://hf-mirror.com
export HF_HUB_ENABLE_HF_TRANSFER=0
python3 train_siqa.py --config configs/siqa_base.yaml
```
Optional fast transfer:
```bash
source /data/miniforge3/etc/profile.d/conda.sh
conda activate lovif
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
```
Troubleshooting:
- `FileNotFoundError: Swin local checkpoint not found ...` means `model.swin_local_path` points to a non-existent local file. Set it to empty (`""`) for auto-download, or provide a valid local path.
- `HF_HUB_ENABLE_HF_TRANSFER=1` requires the `hf_transfer` package to be installed.
- Older `transformers` versions may not support `interpolate_pos_encoding`; the code now handles this in a backward-compatible way.
- If the DINOv3 preload is slow or fails in China, keep `HF_ENDPOINT=https://hf-mirror.com` and retry.
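The first two failure modes above can be caught before training starts. The sketch below is a hypothetical pre-flight helper (the name `preflight` and its arguments are not part of the repository); it only checks that a configured Swin checkpoint exists and that `hf_transfer` is importable when fast transfer is switched on.

```python
import importlib.util
import os


def preflight(swin_local_path="", env=None):
    """Return a list of likely setup problems; an empty list means none found."""
    env = os.environ if env is None else env
    problems = []

    # An empty path means auto-download, so only a non-empty path is checked.
    if swin_local_path and not os.path.isfile(swin_local_path):
        problems.append(
            f"model.swin_local_path does not exist: {swin_local_path!r}; "
            'set it to "" for auto-download or fix the path'
        )

    # Fast transfer needs the hf_transfer package to be installed.
    if env.get("HF_HUB_ENABLE_HF_TRANSFER") == "1":
        if importlib.util.find_spec("hf_transfer") is None:
            problems.append(
                "HF_HUB_ENABLE_HF_TRANSFER=1 but hf_transfer is not installed"
            )

    return problems
```

Calling `preflight(cfg_swin_path)` at the top of a launch script and printing the returned messages is one way to fail early instead of partway through model construction.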
Submission inference:
```bash
python3 infer_val_submission.py \
  --config configs/siqa_base.yaml \
  --ckpt /data/SIQAdn2/workdirs/siqa_dual_dinov3_clip/fold_1/checkpoints/best.pth \
  --out_dir /data/SIQAdn2/submission
```
To analyze the relationship between the discrete labels (0..5) and CLIP cosine similarity on all training pairs:
```bash
python3 tools/analyze_clip_semantic_distribution.py \
  --config configs/siqa_base.yaml \
  --output_dir tools/output_clip_semantic_analysis
```
Outputs:
- `clip_cosine_per_pair.csv`: per-pair `img_name, score, score_cls, cos_sim`
- `clip_cosine_class_summary.csv`: class-wise cosine range / percentiles / statistics
- `clip_cosine_global_summary.json`: Pearson / Spearman / linear fit (`R^2`)
- `clip_score_relationship.png`: visualization (scatter + trend)
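A class-wise summary of the kind stored in `clip_cosine_class_summary.csv` can be approximated with the sketch below. It is not the analysis tool itself: the function name and the chosen statistics are assumptions, and the input is taken to be `(score_cls, cos_sim)` pairs such as those in the per-pair CSV.

```python
import statistics
from collections import defaultdict


def class_summary(rows):
    """Group (score_cls, cos_sim) pairs by class and summarize each group."""
    by_cls = defaultdict(list)
    for score_cls, cos_sim in rows:
        by_cls[score_cls].append(cos_sim)
    return {
        cls: {
            "count": len(values),
            "min": min(values),
            "median": statistics.median(values),
            "mean": statistics.fmean(values),
            "max": max(values),
        }
        for cls, values in sorted(by_cls.items())
    }
```

Comparing the per-class ranges is one way to judge whether thresholds such as `semantic_gate_threshold` separate class `0` from the higher classes on the training pairs.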
For end-score calibration (especially to stretch both extremes smoothly toward 0 and 5), use:
```bash
python3 tools/logistic_5pl_mapping.py \
  --train_pred_csv /data/SIQAdn2/workdirs/siqa_dual_dinov3_clip/kfold_oof_predictions.csv \
  --label_xlsx /data/dataset/LoViF/Train/Train_scores.xlsx \
  --test_pred_csv /data/SIQAdn2/submission/prediction.csv \
  --out_csv /data/SIQAdn2/submission/prediction_mapped.csv \
  --out_json /data/SIQAdn2/submission/logistic_5pl_params.json
```
Outputs:
- mapped prediction CSV (`Score` after 5PL calibration)
- fitted beta parameters (`b1..b5`) and before/after metrics in JSON
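The functional form used by `tools/logistic_5pl_mapping.py` is not spelled out here. The sketch below assumes the five-parameter logistic commonly used when evaluating IQA models, $f(x)=b_1\left(\tfrac{1}{2}-\tfrac{1}{1+e^{b_2(x-b_3)}}\right)+b_4x+b_5$; the parameter names follow `b1..b5`, but whether the tool uses exactly this form is an assumption.

```python
import math


def logistic_5pl(x, b1, b2, b3, b4, b5):
    """Map a raw prediction x with the assumed 5-parameter logistic."""
    z = b2 * (x - b3)
    # Numerically stable sigmoid(z); note 0.5 - 1/(1 + e^z) == sigmoid(z) - 0.5.
    if z >= 0:
        sig = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        sig = e / (1.0 + e)
    return b1 * (sig - 0.5) + b4 * x + b5
```

With `b1 > 0`, `b2 > 0`, and `b4 >= 0` the mapping is monotonically increasing, so it leaves SROCC unchanged and can only affect PLCC.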