
Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models

This codebase provides a PyTorch implementation of the paper "Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models".

Novel Task: Zero-Shot ID detection

[Figure: ID_detection]

Abstract

Extracting in-distribution (ID) images from noisy images scraped from the Internet is an important preprocessing step for constructing datasets, and it has traditionally been done manually. Automating this preprocessing with deep learning techniques presents two key challenges. First, images should be collected using only the name of the ID class, without training on the ID data. Second, as the motivation behind datasets such as COCO suggests, it is crucial to identify images containing not only ID objects but also both ID and out-of-distribution (OOD) objects as ID images, in order to build robust recognizers. In this paper, we propose a novel problem setting called zero-shot in-distribution (ID) detection, where we identify images containing ID objects as ID images (even if they also contain OOD objects) and images lacking ID objects as OOD images, without any training. To solve this problem, we present a simple and effective approach, Global-Local Maximum Concept Matching (GL-MCM), based on both global and local visual-text alignments of CLIP features. Extensive experiments demonstrate that GL-MCM outperforms comparison methods on both multi-object datasets and single-object ImageNet benchmarks.

Illustration

Global-Local Maximum Concept Matching (GL-MCM)

[Figure: Arch_figure]
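
For intuition, the following is a minimal PyTorch sketch (not the repository code) of the global and global-local scores described above. global_feat, local_feats, text_feats, and the temperature T are placeholder inputs, and the way global and local CLIP features are actually extracted in this repo may differ.

import torch
import torch.nn.functional as F

def mcm_score(global_feat, text_feats, T=1.0):
    """MCM: softmax over ID concepts of the global image-text
    similarities, then take the maximum probability."""
    # global_feat: (D,) L2-normalized global image feature
    # text_feats:  (K, D) L2-normalized text features for K ID concepts
    probs = F.softmax(global_feat @ text_feats.t() / T, dim=-1)  # (K,)
    return probs.max()

def gl_mcm_score(global_feat, local_feats, text_feats, T=1.0):
    """GL-MCM: the global MCM score plus the best local
    concept-matching score over all spatial locations."""
    # local_feats: (H*W, D) L2-normalized local image features
    local_probs = F.softmax(local_feats @ text_feats.t() / T, dim=-1)  # (H*W, K)
    return mcm_score(global_feat, text_feats, T) + local_probs.max()

The local term lets an image be scored by its best-matching region, which is what allows ID objects to be detected even when OOD objects dominate the global feature.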

Setup

Required Packages

We ran the code on a single NVIDIA A100 (or V100) GPU. We follow the environment used in MCM.

Our experiments are conducted with Python 3.8 and PyTorch 1.10. In addition, the following commonly used packages are required:

$ pip install ftfy regex tqdm scipy matplotlib seaborn scikit-learn

Data Preparation

In-distribution Datasets

We use the following datasets as ID:

  • COCO_single: used in Table 1, Table 3, and Table 4 (each image in COCO_single has single-class ID objects and one or more OOD objects)
  • VOC_single: used in Table 1 and Table 3 (each image in VOC_single has single-class ID objects and one or more OOD objects)
  • ImageNet: used in Table 2
  • COCO_multi: used in the supplementary material (each image in COCO_multi has multi-class ID objects and one or more OOD objects)

We provide our curated ID and OOD datasets via this url.
For ImageNet-1k, we use the validation partition of the officially provided dataset.
After downloading, please place the datasets under ./datasets.

Out-of-Distribution Datasets

We use the large-scale OOD datasets iNaturalist, SUN, Places, and Texture curated by Huang et al. 2021. We follow the instructions in this repository to download the subsampled datasets. For ImageNet-22K, we use this url from the repository curated by Wang et al. 2021.

We also use ood_coco and ood_voc from this url.

The overall file structure is as follows:

GL-MCM
|-- datasets
    |-- ImageNet
    |-- ID_COCO_single
    |-- ID_VOC_single
    |-- ID_COCO_multi
    |-- OOD_COCO
    |-- OOD_VOC
    |-- iNaturalist
    |-- SUN
    |-- Places
    |-- Texture
    |-- ImageNet-22K
    ...
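
If the downloaded datasets are stored elsewhere on disk, one way to obtain this layout is to symlink them into ./datasets; the source paths below are placeholders:

$ mkdir -p datasets
$ ln -s /path/to/downloads/ID_COCO_single datasets/ID_COCO_single
$ ln -s /path/to/downloads/iNaturalist datasets/iNaturalist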

Quick Start

The main script for evaluating OOD detection performance is eval_id_detection.py. Here is the list of arguments:

  • --name: A unique ID for the experiment, can be any string
  • --score: The OOD detection score to use (e.g., MCM or GL-MCM)
  • --seed: A random seed for the experiments
  • --gpu: The index of the GPU to use. For example --gpu=0
  • --in_dataset: The in-distribution dataset
    • Accepts: ImageNet, COCO_single, COCO_multi, VOC_single
  • -b, --batch_size: Mini-batch size
  • --CLIP_ckpt: Specifies the pretrained CLIP encoder to use
    • Accepts: RN50, RN101, ViT-B/16.
  • --num_ood_sumple: The number of OOD samples

The OOD detection results will be generated and stored in results/in_dataset/score/CLIP_ckpt_name/.

We provide bash scripts:

sh scripts/eval_coco_single.sh
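
For reference, a direct invocation of the evaluation script might look like the following; the value passed to --score here is an assumption, so please check the bash scripts for the exact option strings used in this repo:

$ python eval_id_detection.py --name my_exp --in_dataset COCO_single --CLIP_ckpt ViT-B/16 --score GL-MCM --gpu 0 -b 128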

Acknowledgement

This code is based on the implementation of MCM.

Citation

If you find our work interesting or use our code/models, please cite:

@article{miyai2023zero,
  title={Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models},
  author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
  journal={arXiv preprint arXiv:2304.04521},
  year={2023}
}