Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey [Miyai+, arXiv2024]


![Generalized OOD Detection v2 framework](figures/generalized_ood_v2.png)

🚀 Our framework encapsulates the evolution of OOD detection and related tasks in the VLM era, fostering collaborative efforts among each community 🤝

¹The University of Tokyo  ²S-Lab, Nanyang Technological University  ³Duke University  ⁴Salesforce AI Research  ⁵LY Corporation  ⁶Tokyo University of Science  ⁷University of Wisconsin-Madison

About This Repository

This is the repository for our survey paper. We hope the survey helps readers and practitioners better understand the demanding challenges of OOD detection and related topics in the VLM era.
This repository plays the following two roles:

  • It provides an easily accessible list of the references cited in Table 2 of the paper. The list will continue to grow as promising new works emerge, so please feel free to recommend relevant, high-quality works via a Pull Request.
  • We hope it will serve as a discussion panel where readers can ask questions, raise concerns, and make constructive comments. Feel free to post your ideas in Issues.

Abstract

We present generalized OOD detection v2, a framework encapsulating the evolution of Anomaly Detection (AD), Novelty Detection (ND), Open-set Recognition (OSR), Out-of-distribution (OOD) detection, and Outlier Detection (OD) in the VLM era. Our framework reveals that, with some fields becoming inactive or merging, the demanding challenges in the VLM era have become OOD detection and AD. Beyond this inter-field evolution, we also highlight significant shifts in definitions, problem settings, and benchmarks; our work thus features a comprehensive review of the methodology for OOD detection, including in-depth discussion of related tasks to clarify their relationship to and influence on OOD detection. Finally, we explore advancements in the emerging Large Vision Language Model (LVLM) era, represented by GPT-4V. We conclude this survey with open challenges and potential research directions for OOD detection in the VLM and LVLM eras.

Common Benchmarks

CLIP-based OOD Detection
CLIP-based AD

Methodology

We introduce methods for CLIP-based OOD detection and CLIP-based AD.
To provide diverse perspectives on OOD detection approaches, we include a wide range of methods, including preprints.

Timeline

![Timeline of CLIP-based OOD detection and AD methods](timeline.png)

Paper List

![Overview of the methods in the paper list](methods.png)

CLIP-based OOD Detection

Zero-shot
  • ZOC
  • MCM
  • GL-MCM
  • CLIPN
  • NegLabel
  • EOE
  • SeTAR
Few-shot
  • PEFT-MCM
  • LoCoOp
  • LSN
  • IDPrompt
  • NegPrompt
  • GalLoP
  • Dual-Adapter
  • LAPT
Others
  • LSA
  • EmptyClass
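Most of the zero-shot methods above score OOD-ness directly from CLIP's image-text similarities. As a rough illustration of the idea behind MCM (maximum concept matching), here is a minimal NumPy sketch, not the authors' implementation: embeddings are L2-normalized, cosine similarities to the in-distribution (ID) class prompts are softmaxed with a temperature, and the maximum probability serves as the ID confidence.

```python
import numpy as np

def mcm_score(image_feat, text_feats, temperature=0.07):
    """MCM-style zero-shot OOD score (illustrative sketch).

    image_feat: (d,) image embedding; text_feats: (k, d) ID class-prompt
    embeddings. Returns the max softmax probability over class prompts:
    ID images tend to match one prompt sharply, OOD images do not.
    """
    image_feat = image_feat / np.linalg.norm(image_feat)
    text_feats = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sims = text_feats @ image_feat / temperature  # scaled cosine similarities
    sims -= sims.max()                            # numerical stability
    probs = np.exp(sims)
    probs /= probs.sum()                          # softmax over class prompts
    return probs.max()                            # high => likely ID

# Toy deterministic example: class prompts are orthogonal unit vectors.
prompts = np.eye(3, 8)            # 3 class prompts in an 8-dim space
id_image = prompts[0]             # aligns exactly with class 0
ood_image = prompts.sum(axis=0)   # equally similar to every class
print(mcm_score(id_image, prompts))   # near 1.0 -> in-distribution
print(mcm_score(ood_image, prompts))  # ~0.33 -> low score, flagged as OOD
```

In practice the embeddings come from a CLIP image and text encoder, and a threshold on the score separates ID from OOD inputs; the temperature and prompt templates are important hyperparameters in the actual papers.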

CLIP-based AD

Zero-shot
  • WinCLIP
  • APRIL-GAN
  • AnoCLIP
  • AnomalyCLIP
  • RWDA
  • SDP
  • FiLo
Few-shot
  • WinCLIP+
  • APRIL-GAN
  • PromptAD
  • InCTRL
Others
  • LTAD
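The zero-shot AD methods above typically contrast text prompts describing a normal object with prompts describing a defective one. Below is a hedged NumPy sketch of that two-state prompt scoring, in the spirit of WinCLIP's image-level score; the prompt ensembles, shapes, and temperature here are illustrative assumptions, not any paper's exact recipe.

```python
import numpy as np

def anomaly_score(image_feat, normal_feats, anomalous_feats, temperature=0.07):
    """Softmax mass on the 'anomalous' prompt prototype (higher = more anomalous)."""
    def l2(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    image_feat = l2(image_feat)
    # Average each prompt ensemble (e.g. "a photo of a flawless <obj>" vs
    # "a photo of a damaged <obj>") into one prototype per state.
    normal_proto = l2(l2(normal_feats).mean(axis=0))
    anomalous_proto = l2(l2(anomalous_feats).mean(axis=0))
    sims = np.array([normal_proto @ image_feat,
                     anomalous_proto @ image_feat]) / temperature
    sims -= sims.max()             # numerical stability
    probs = np.exp(sims)
    return probs[1] / probs.sum()  # probability of the "anomalous" state

# Toy example with orthogonal prototype directions.
normal_prompts = np.array([[1.0, 0.0, 0.0, 0.0]])
anomalous_prompts = np.array([[0.0, 1.0, 0.0, 0.0]])
good_item = np.array([1.0, 0.0, 0.0, 0.0])
defect_item = np.array([0.0, 1.0, 0.0, 0.0])
print(anomaly_score(good_item, normal_prompts, anomalous_prompts))    # near 0
print(anomaly_score(defect_item, normal_prompts, anomalous_prompts))  # near 1
```

Methods such as WinCLIP additionally score windowed patches with the same contrast to localize defects; this sketch covers only the image-level decision.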

Early Advance in LVLM Era

![Evolution of OOD detection and related tasks in the LVLM era](evolution_lvlm.png)

In the LVLM Era, OOD detection and related topics have evolved as follows:

(i) Sensory Anomaly Detection ⇒ Sensory Anomaly Detection

AD
  • AnomalyGPT
  • Myriad
  • Generic AD

(ii) OOD Detection ⇒ Unsolvable Problem Detection

UPD
  • UPD

Acknowledgment

This repository is built upon the foundation of the following resources: generalized OOD detection v1, OpenOOD codebase.

Contact

If you have questions or find any mistakes, please open an issue and mention @AtsuMiyai.

Citation

If you find our survey paper helpful for your research, please consider citing the following paper:

@article{miyai2024generalized2,
  title={Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey},
  author={Miyai, Atsuyuki and Yang, Jingkang and Zhang, Jingyang and Ming, Yifei and Lin, Yueqian and Yu, Qing and Irie, Go and Joty, Shafiq and Li, Yixuan and Li, Hai and Liu, Ziwei and Yamasaki, Toshihiko and Aizawa, Kiyoharu},
  journal={arXiv preprint arXiv:2407.21794},
  year={2024}
}

Besides, please also consider citing our other projects that are closely related to this survey.

  • Our Directly Related Projects
# generalized OOD detection framework v1, survey
@article{yang2024generalized,
  title={Generalized out-of-distribution detection: A survey},
  author={Yang, Jingkang and Zhou, Kaiyang and Li, Yixuan and Liu, Ziwei},
  journal={IJCV},
  pages={1--28},
  year={2024},
}

# MCM (Zero-shot OOD detection)
@inproceedings{ming2022delving,
  title={Delving into out-of-distribution detection with vision-language representations},
  author={Ming, Yifei and Cai, Ziyang and Gu, Jiuxiang and Sun, Yiyou and Li, Wei and Li, Yixuan},
  booktitle={NeurIPS},
  year={2022}
}

# GL-MCM (Zero-shot OOD detection)
@article{miyai2023zero,
  title={Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models},
  author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
  journal={arXiv preprint arXiv:2304.04521},
  year={2023}
}

# PEFT-MCM (Few-shot OOD detection, Concurrent work with LoCoOp)
@article{ming2024does,
  title={How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?},
  author={Ming, Yifei and Li, Yixuan},
  journal={IJCV},
  volume={132},
  number={2},
  pages={596--609},
  year={2024},
}

# LoCoOp (Few-shot OOD detection, Concurrent work with PEFT-MCM)
@inproceedings{miyai2023locoop,
  title={LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning},
  author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
  booktitle={NeurIPS},
  year={2023}
}

# UPD
@article{miyai2024upd,
  title={Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models},
  author={Miyai, Atsuyuki and Yang, Jingkang and Zhang, Jingyang and Ming, Yifei and Yu, Qing and Irie, Go and Li, Yixuan and Li, Hai and Liu, Ziwei and Aizawa, Kiyoharu},
  journal={arXiv preprint arXiv:2403.20331},
  year={2024}
}
  • Our Other Projects
# OpenOOD 
@inproceedings{yang2022openood,
  title={Openood: Benchmarking generalized out-of-distribution detection},
  author={Yang, Jingkang and Wang, Pengyun and Zou, Dejian and Zhou, Zitang and Ding, Kunyuan and Peng, Wenxuan and Wang, Haoqi and Chen, Guangyao and Li, Bo and Sun, Yiyou and others},
  booktitle={NeurIPS Datasets and Benchmarks Track},
  year={2022}
}

# OpenOOD v1.5 report
@article{zhang2023openood,
  title={OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection},
  author={Zhang, Jingyang and Yang, Jingkang and Wang, Pengyun and Wang, Haoqi and Lin, Yueqian and Zhang, Haoran and Sun, Yiyou and Du, Xuefeng and Zhou, Kaiyang and Zhang, Wayne and Li, Yixuan and Liu, Ziwei and Chen, Yiran and Li, Hai},
  journal={arXiv preprint arXiv:2306.09301},
  year={2023}
}
