CAFramework: Consistency-Aware Evaluation for Fundus MLLMs

A consistency-aware evaluation framework that extends FunBench with multi-dimensional consistency analysis of Multimodal Large Language Models (MLLMs) on fundus image interpretation.

Overview

This framework extends FunBench with four consistency evaluation dimensions:

  • L5 — Cross-task Reasoning Consistency: Detects logical contradictions between L3 lesion recognition and L4 disease diagnosis
  • L6 — Description Dependency: Evaluates how much models rely on text descriptions (E-mode2 vs E-mode3)
  • L7 — Option Order Robustness: Tests prediction stability when answer options are shuffled (see the sketch after this list)
  • L8 — Hierarchical Consistency: Evaluates logical consistency across DR grading granularities
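
To make the L7 dimension concrete, here is a minimal sketch of an option-order robustness check. It is illustrative only: ask_model is a hypothetical callable standing in for whatever model backend you use, and the actual metric is implemented in the evaluation/ modules.

import random
from collections import Counter

# Illustrative sketch of the L7 idea, not the repo's implementation:
# re-ask the same multiple-choice question with shuffled options and
# measure how often the model sticks to its modal answer.
def option_order_robustness(ask_model, question, options, n_shuffles=5, seed=0):
    rng = random.Random(seed)
    answers = []
    for _ in range(n_shuffles):
        shuffled = options[:]
        rng.shuffle(shuffled)
        letter = ask_model(question, shuffled)            # e.g. returns "B"
        answers.append(shuffled[ord(letter) - ord("A")])  # map letter back to option text
    _, modal_count = Counter(answers).most_common(1)[0]
    return modal_count / n_shuffles                       # 1.0 = fully order-robust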

Repository Structure

├── evaluate.py              # Original FunBench evaluation entry
├── evaluate_all.py          # Batch evaluation across all tasks
├── evaluate_cot.py          # Chain-of-thought evaluation
├── predict.py               # Prediction script for MLLMs
├── predict_cot.py           # CoT prediction script
├── predict_hf_api.py        # HuggingFace API predictor
├── predict_huatuo_local.py  # Local HuatuoGPT predictor
├── predict_qilin_local_cot.py  # Local Qilin CoT predictor
├── preprocess.py            # Dataset preprocessing
├── preprocess_info.json     # Preprocessing configuration
├── evaluation/              # Evaluation modules (L5–L8)
├── cot_evaluation/          # Chain-of-thought evaluation modules
├── FunBench/                # FunBench benchmark data (L1–L4)
└── requirements.txt

Getting Started

1. Install dependencies

pip install -r requirements.txt

2. Download FunBench images

FunBench draws on 14 public fundus datasets. Download the images from each dataset's original release page and place them under datasets/.

3. Preprocess images

python preprocess.py
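
The concrete steps are configured in preprocess_info.json. Purely for orientation (the repo's actual pipeline may differ), a typical fundus preprocessing pass crops the circular field of view and resizes to a fixed resolution:

from pathlib import Path
import numpy as np
from PIL import Image

# Illustrative only; the real pipeline is preprocess.py + preprocess_info.json.
def preprocess_fundus(src: Path, dst: Path, size: int = 512) -> None:
    img = np.asarray(Image.open(src).convert("RGB"))
    mask = img.sum(axis=2) > 20          # non-black pixels = the fundus region
    ys, xs = np.where(mask)
    img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]  # crop to field of view
    Image.fromarray(np.ascontiguousarray(img)).resize((size, size)).save(dst)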

4. Run prediction

python predict.py
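
predict.py and its variants query an MLLM per question and write the answers to disk. As a hypothetical sketch of that loop (the field names id/image/question/options and the query_mllm callable are assumptions, not the repo's schema):

import json

# Hypothetical shape of a prediction run; the real entry points are
# predict.py, predict_hf_api.py, etc.
def run_predictions(questions_path, out_path, query_mllm):
    with open(questions_path, encoding="utf-8") as f:
        questions = json.load(f)          # assumed: a list of question dicts
    predictions = [
        {"id": q["id"],
         "prediction": query_mllm(q["image"], q["question"], q["options"])}
        for q in questions
    ]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, ensure_ascii=False, indent=2)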

5. Run evaluation

python evaluate.py
# or for all tasks:
python evaluate_all.py
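
The consistency dimensions L5–L8 are computed by the modules under evaluation/. To give a flavor of the L8 check (a sketch under assumed conventions, not the repo's code): on the international 0–4 DR severity scale, grades 2–4 are referable, so a fine-grained grade and a coarse referable/non-referable answer for the same image must agree.

# Illustrative L8-style hierarchical check, not the repo's implementation.
def dr_hierarchy_consistent(fine_grade: int, referable: bool) -> bool:
    # On the international 0-4 DR scale, grade >= 2 implies referable DR.
    return (fine_grade >= 2) == referable

def l8_consistency_rate(pairs):
    # pairs: (fine_grade, referable) prediction pairs for the same image
    return sum(dr_hierarchy_consistent(g, r) for g, r in pairs) / len(pairs)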

Acknowledgements

This work builds upon FunBench. We sincerely thank the original authors for their contribution:

@inproceedings{miccai25-funbench,
  title  = {FunBench: Benchmarking Fundus Reading Skills of MLLMs},
  author = {Qijie Wei and Kaiheng Qian and Xirong Li},
  booktitle = {MICCAI},
  year   = {2025}
}
