OMG

Title: OMG: Observe Multiple Granularities for Natural Language-Based Vehicle Retrieval

The paper has been accepted to the CVPR 2022 Workshops.

Abstract

Retrieving tracked vehicles by natural language descriptions plays a critical role in smart city construction. The task is to find the best match for a given text description among a set of tracked vehicles in surveillance videos. Existing works generally solve it with a dual-stream framework, which consists of a text encoder, a visual encoder and a cross-modal loss function. Although some progress has been made, they fail to fully exploit the information at various levels of granularity. To tackle this issue, we propose a novel framework for the natural language-based vehicle retrieval task, OMG, which Observes Multiple Granularities with respect to visual representation, textual representation and objective functions. For the visual representation, target features, context features and motion features are encoded separately. For the textual representation, one global embedding, three local embeddings and a color-type prompt embedding are extracted to represent various granularities of semantic features. Finally, the overall framework is optimized by a cross-modal multi-granularity contrastive loss function. Experiments demonstrate the effectiveness of our method. Our OMG significantly outperforms all previous methods and ranks 9th on Track 2 of the 6th AI City Challenge. The code is available at https://github.com/dyhBUPT/OMG.
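
To make the idea of a cross-modal multi-granularity contrastive loss more concrete, here is a minimal sketch in plain Python. It assumes a symmetric InfoNCE loss computed per (text, visual) embedding pair and averaged with equal weights across granularities. The function names, the temperature of 0.07, and the equal weighting are illustrative assumptions and are not taken from the repository; please refer to the code for the actual loss.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def similarity_matrix(text_embs, visual_embs):
    """Cosine similarity between every text and every visual embedding."""
    text_embs = [l2_normalize(t) for t in text_embs]
    visual_embs = [l2_normalize(v) for v in visual_embs]
    return [[sum(a * b for a, b in zip(t, v)) for v in visual_embs]
            for t in text_embs]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the target of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(text_embs, visual_embs, temperature=0.07):
    """Symmetric InfoNCE: text-to-visual plus visual-to-text, averaged."""
    sim = similarity_matrix(text_embs, visual_embs)
    t2v = [[s / temperature for s in row] for row in sim]
    v2t = [list(col) for col in zip(*t2v)]  # transpose
    return 0.5 * (cross_entropy_diagonal(t2v) + cross_entropy_diagonal(v2t))


def multi_granularity_loss(pairs, temperature=0.07):
    """Average the contrastive loss over (text, visual) granularity pairs.

    `pairs` is a list of (text_embs, visual_embs) tuples, one per
    granularity; equal weighting is an assumption of this sketch.
    """
    losses = [symmetric_info_nce(t, v, temperature) for t, v in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy batch of two samples with 2-D embeddings at two granularities.
    global_pair = ([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]])
    local_pair = ([[1.0, 0.2], [0.1, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    print(multi_granularity_loss([global_pair, local_pair]))
```

In this sketch, row i of each text batch is treated as the positive match for row i of the corresponding visual batch, and all other rows in the batch act as negatives.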

Framework

Experiments

Run

Data Preparation

Baidu Disk: link (extraction code: "city")

Requirements

CLIP
requirements.txt

Train

python train.py --config configs/Swin-B+CLIP-B_OMG2a_NLAug_IDLoss.yaml --valnum 4

Test

python test.py

Note

We also designed the OSG framework for use in the ensemble. Please refer to the code for details.

Citation

@InProceedings{Du_2022_CVPR,
    author    = {Du, Yunhao and Zhang, Binyu and Ruan, Xiangning and Su, Fei and Zhao, Zhicheng and Chen, Hong},
    title     = {OMG: Observe Multiple Granularities for Natural Language-Based Vehicle Retrieval},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {3124-3133}
}

Acknowledgement

A large part of the code is borrowed from CLT. Thanks to the authors for their excellent work!
