CityU-AIM-Group/SOMA: [ICCV'23 Oral] Novel Scenes & Classes: Towards Adaptive Open-set Object Detection

Domain Adaptive Object Detection (DAOD) strongly assumes a shared class space between the two domains.

This work breaks this assumption and formulates Adaptive Open-set Object Detection (AOOD), which allows the target domain to contain novel-class objects.

The object detector uses the base-class labels in the source domain for training, and aims to detect base-class objects and identify novel-class objects as unknown in the target domain.

If you have ideas or problems you would like to discuss, you can reach me by e-mail.

2024/02/29:

I sincerely apologize for a serious mistake I made when cleaning and publishing the code, and to the readers who ran it earlier and could not reproduce similar results.

Since City Val has only 500 images, which is insufficient to evaluate open-set performance (e.g., AOSE), we follow the p2c setting and use all unlabeled data for evaluation. Please check the corrected target-domain dataset settings below. I forgot this change when cleaning the code, and I am very sorry. Thank you to those who raised issues #5 (comment) and #4 (comment) and brought this mistake to my attention.

SOMA/datasets/DAOD.py

Line 44 in 97af7f0

# 'val_data_list': root / 'Cityscapes/AOOD_Main/val_target.txt',
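
To confirm which split an evaluation run actually covers, it can help to count the entries in each image list. The sketch below assumes the AOOD_Main list files hold one image ID per line, as in VOC-style image sets; `count_entries` and `summarize_splits` are hypothetical helpers and are not part of the repository.

```python
from pathlib import Path


def count_entries(list_file):
    """Count non-empty lines in an image list file (assumed: one image ID per line)."""
    with open(list_file, "r", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def summarize_splits(main_dir):
    """Return {file name: entry count} for every .txt list found in main_dir."""
    main_dir = Path(main_dir)
    return {p.name: count_entries(p) for p in sorted(main_dir.glob("*.txt"))}
```

Calling `summarize_splits` on the Cityscapes/AOOD_Main folder under your data root lists the size of each split, which makes it easy to see whether the target list being evaluated is the small validation split or the full unlabeled set.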

💡 Preparation

Step 1: Clone and Install the Project

(a) Clone the repository

git clone https://github.com/CityU-AIM-Group/SOMA.git

(b) Install the project following Deformable DETR

Note that the following matches our experimental environment, which differs slightly from the official one.

# Linux, CUDA>=9.2, GCC>=5.4
# (ours) CUDA=10.2, GCC=8.4, NVIDIA V100 
# Establish the conda environment

conda create -n aood python=3.7 pip
conda activate aood
conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

# Compile the project
cd ./models/ops
sh ./make.sh

# unit test (every check should print True)
python test.py

# NOTE: if you hit a "permission denied" error when starting training
cd ../../ 
chmod -R 777 ./
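
After installation, a quick look at the interpreter and PyTorch versions can catch a mismatched environment before the operators are compiled. This is a minimal sketch that uses only standard attributes (`torch.__version__`, `torch.cuda.is_available()`, `torchvision.__version__`); `describe_environment` is a hypothetical helper, not part of the repository, and it returns `None` for anything that is not installed.

```python
import importlib
import sys


def describe_environment():
    """Collect interpreter and (if installed) PyTorch details without failing on a missing package."""
    info = {
        "python": "{}.{}".format(sys.version_info[0], sys.version_info[1]),
        "torch": None,
        "torchvision": None,
        "cuda_available": None,
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        return info  # PyTorch is not importable in this environment
    info["torch"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    try:
        info["torchvision"] = importlib.import_module("torchvision").__version__
    except (ImportError, OSError):
        pass  # torchvision missing; leave the entry as None
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

With the conda commands above, the expected report would show Python 3.7, a 1.5.1 PyTorch build, a 0.6.1 torchvision build, and CUDA available.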

Step 2: Download Necessary Resources

(a) Download pre-processed datasets (VOC format) from the following links

                 (Foggy) Cityscapes   Pascal VOC    Clipart       BDD100K (Daytime)
Official Links   Imgs                 Imgs+Labels   -             Imgs
Our Links        Labels               -             Imgs+Labels   Labels

(b) Download DINO-pretrained ResNet-50 from this link

Step 3: Change the Path

(a) Change the data path as follows.

[DATASET_PATH]
└─ Cityscapes
   └─ AOOD_Annotations
   └─ AOOD_Main
      └─ train_source.txt
      └─ train_target.txt
      └─ val_source.txt
      └─ val_target.txt
   └─ leftImg8bit
      └─ train
      └─ val
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ bdd_daytime
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ clipart
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ VOCdevkit
   └─ VOC2007
   └─ VOC2012

For bdd100k daytime, put all images into bdd_daytime/JPEGImages/*.jpg.

The image settings for other benchmarks are consistent with SIGMA.
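
A small script can confirm that the data root matches the tree above before training starts. This is a sketch only: the relative paths are copied from that tree, while `EXPECTED_PATHS` and `missing_paths` are hypothetical names that do not exist in the repository.

```python
from pathlib import Path

# Relative paths copied from the directory tree in Step 3 (a).
EXPECTED_PATHS = [
    "Cityscapes/AOOD_Annotations",
    "Cityscapes/AOOD_Main/train_source.txt",
    "Cityscapes/AOOD_Main/train_target.txt",
    "Cityscapes/AOOD_Main/val_source.txt",
    "Cityscapes/AOOD_Main/val_target.txt",
    "Cityscapes/leftImg8bit/train",
    "Cityscapes/leftImg8bit/val",
    "Cityscapes/leftImg8bit_foggy/train",
    "Cityscapes/leftImg8bit_foggy/val",
    "bdd_daytime/Annotations",
    "bdd_daytime/ImageSets",
    "bdd_daytime/JPEGImages",
    "clipart/Annotations",
    "clipart/ImageSets",
    "clipart/JPEGImages",
    "VOCdevkit/VOC2007",
    "VOCdevkit/VOC2012",
]


def missing_paths(dataset_root):
    """Return the expected relative paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

An empty list from `missing_paths("/path/to/your/data")` means every folder and list file named in the tree is present; it does not check the contents of those folders.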

(b) Change the data root in the config files

Replace DATASET.COCO_PATH in all yaml files in the configs directory with your data root $DATASET_PATH, e.g.,

SOMA/configs/soma_aood_city_to_foggy_r50.yaml

Line 22 in 41c11cb

COCO_PATH: /home/wuyangli2/data/
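
Editing every config by hand is error-prone, so the replacement can be scripted. The sketch below assumes each yaml file contains a single-line `COCO_PATH: ...` entry like the one shown above; `set_coco_path` is a hypothetical helper, not part of the repository, and it edits files in place, so run it on a clean checkout.

```python
import re
from pathlib import Path

# Matches an uncommented "COCO_PATH: <value>" line and keeps its indentation.
COCO_PATH_LINE = re.compile(r"^([ \t]*COCO_PATH:).*$", re.MULTILINE)


def set_coco_path(config_dir, dataset_root):
    """Rewrite the COCO_PATH entry in every .yaml file in config_dir; return changed file names."""
    changed = []
    for yaml_file in sorted(Path(config_dir).glob("*.yaml")):
        text = yaml_file.read_text(encoding="utf-8")
        # A function replacement avoids backslash-escape surprises in the path string.
        new_text = COCO_PATH_LINE.sub(lambda m: f"{m.group(1)} {dataset_root}", text)
        if new_text != text:
            yaml_file.write_text(new_text, encoding="utf-8")
            changed.append(yaml_file.name)
    return changed
```

For example, `set_coco_path("configs", "/path/to/your/data/")` updates all configs at once and returns the names of the files it modified.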

(c) Change the path of DINO-pretrained backbone

Replace the backbone loading path:

SOMA/models/backbone.py

Line 107 in 41c11cb

state_dict = torch.load('./dino_resnet50_pretrain.pth')

🔥 Start Training

We use two GPUs for training with 2 source images and 2 target images as input. The per-epoch evaluation results are saved in LaTeX table format to the generated eval_results.txt file in OUTPUT_DIR.

GPUS_PER_NODE=2 
./tools/run_dist_launch.sh 2 python main_multi_eval.py --config_file {CONFIG_FILE} --opts DATASET.AOOD_SETTING 1

Some of the scripts used in our experiments are provided in run.sh. Settings given after "--opts" override the defaults in the config file, as in the maskrcnn-benchmark framework.
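
The override behavior can be pictured as applying a flat list of KEY VALUE pairs to a nested config. The sketch below is a simplified stand-in written with plain dictionaries, not the repository's config code; `parse_value`, `apply_opts`, and the default values shown are assumptions made for illustration.

```python
import ast
import copy


def parse_value(text):
    """Turn a command-line string into a Python literal when possible ("1" -> 1, "True" -> True)."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text  # leave plain strings such as paths untouched


def apply_opts(config, opts):
    """Return a copy of a nested dict config with KEY VALUE pairs from opts applied."""
    if len(opts) % 2 != 0:
        raise ValueError("opts must be a flat list of KEY VALUE pairs")
    merged = copy.deepcopy(config)
    for dotted_key, raw_value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            node = node[name]  # KeyError if the section is not in the defaults
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = parse_value(raw_value)
    return merged


# Placeholder defaults for illustration; the real defaults live in the config files.
defaults = {"DATASET": {"AOOD_SETTING": 0}, "AOOD": {"OW_DETR_ON": False}}
overridden = apply_opts(defaults, ["DATASET.AOOD_SETTING", "1", "AOOD.OW_DETR_ON", "True"])
print(overridden)
```

Rejecting unknown keys mirrors the usual behavior of this config style, where a mistyped option name fails loudly instead of being silently ignored.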

📦 Well-trained models

Will be provided later

💬 Notification

The core idea is to select informative motifs (which can be treated as the mix-up of object queries) for self-training.
You can try the DA version of OW-DETR in this repository by setting:

--opts AOOD.OW_DETR_ON True

Adopting SAM to address AOOD may be a good direction.
To visualize unknown boxes, post-processing is needed in PostProcess.

📝 Citation

If you find this work helpful for your project, please consider giving it a star and citing it. We sincerely appreciate your acknowledgment.

@InProceedings{Li_2023_ICCV,
    author    = {Li, Wuyang and Guo, Xiaoqing and Yuan, Yixuan},
    title     = {Novel Scenes \& Classes: Towards Adaptive Open-set Object Detection},
    booktitle = {ICCV},
    year      = {2023},
}

Relevant project:

Exploring a similar task for image classification. [link]

@InProceedings{Li_2023_CVPR,
    author    = {Li, Wuyang and Liu, Jie and Han, Bo and Yuan, Yixuan},
    title     = {Adjustment and Alignment for Unbiased Open Set Domain Adaptation},
    booktitle = {CVPR},
    year      = {2023},
}

🤞 Acknowledgements

We greatly appreciate the tremendous effort behind the following works.

This work is based on the DAOD framework AQT.
Our work is highly inspired by OW-DETR and OpenDet.
The implementation of the basic detector is based on Deformable DETR.

📒 Abstract

Domain Adaptive Object Detection (DAOD) transfers an object detector to a novel domain free of labels. However, in the real world, besides encountering novel scenes, novel domains always contain novel-class objects de facto, which are ignored in existing research. Thus, we formulate and study a more practical setting, Adaptive Open-set Object Detection (AOOD), considering both novel scenes and classes. Directly combining off-the-shelf cross-domain and open-set approaches is sub-optimal since their low-order dependence, such as the confidence score, is insufficient for the AOOD with two dimensions of novel information. To address this, we propose a novel Structured Motif Matching (SOMA) framework for AOOD, which models the high-order relation with motifs, i.e., statistically significant subgraphs, and formulates AOOD solution as motif matching to learn with high-order patterns. In a nutshell, SOMA consists of Structure-aware Novel-class Learning (SNL) and Structure-aware Transfer Learning (STL). As for SNL, we establish an instance-oriented graph to capture the class-independent object feature hidden in different base classes. Then, a high-order metric is proposed to match the most significant motif as high-order patterns, serving for motif-guided novel-class learning. In STL, we set up a semantic-oriented graph to model the class-dependent relation across domains, and match unlabelled objects with high-order motifs to align the cross-domain distribution with structural awareness. Extensive experiments demonstrate that the proposed SOMA achieves state-of-the-art performance.
