dengyuhai/MS-UGCML

Installation

We have trained and tested our model on Ubuntu 18.04 with CUDA 11.6 and PyTorch 1.12.0. Follow the steps below to set up the environment.

conda create -n msgcml python=3.8
conda activate msgcml
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
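After installation, a short check like the following can confirm that the active environment matches the versions listed above. This is a sketch and is not part of the repository. It only reads `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`, and the helper `base_version` is hypothetical.

```python
import importlib.util

# Versions stated in this README.
EXPECTED_TORCH = "1.12.0"
EXPECTED_CUDA = "11.6"


def base_version(version):
    """Drop a local build tag such as '+cu116' that some PyTorch builds append."""
    return version.split("+", 1)[0]


def main():
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed in the active environment")
        return
    import torch

    torch_ok = base_version(torch.__version__) == EXPECTED_TORCH
    cuda_ok = torch.version.cuda == EXPECTED_CUDA  # None for CPU-only builds
    print(f"torch {torch.__version__} (expected {EXPECTED_TORCH}): {'ok' if torch_ok else 'mismatch'}")
    print(f"CUDA build {torch.version.cuda} (expected {EXPECTED_CUDA}): {'ok' if cuda_ok else 'mismatch'}")
    print(f"GPU visible: {torch.cuda.is_available()}")


if __name__ == "__main__":
    main()
```

Save it as a script and run it with `python` inside the activated `msgcml` environment.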

Backbone

We use GoogLeNet and MiT-B2 backbones pretrained on ImageNet-1K. We also use ViT-B/16 backbones pretrained with MoCo v3 and CLIP.

Dataset

Training data preparation

The training set consists of the COCO 2017 training set and the VOC 2007 trainval set, totaling 121,298 images.

We use SAM (Segment Anything) to extract 8,035,395 objects from these images for training:

cd data_prepare/segment_anything
python segment.py --image_dir DATA_root/mscoco2017/train2017 --save_dir DATA_root/SAM_train
python segment.py --image_dir DATA_root/VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007/JPEGImages --save_dir DATA_root/SAM_train
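In the `segment_anything` package, `SamAutomaticMaskGenerator.generate` returns a list of dicts whose keys include `bbox` (the mask's box in XYWH pixel format) and `area`. The sketch below shows one plausible way to turn such records into object crops. The function name `crop_objects` and the `min_area` threshold are hypothetical, and `segment.py` may filter or save objects differently.

```python
import numpy as np


def crop_objects(image, masks, min_area=100):
    """Crop one patch per SAM-style mask record.

    image: H x W x C uint8 array.
    masks: list of dicts with keys 'bbox' ([x, y, w, h]) and 'area' (pixels).
    min_area: hypothetical threshold for discarding tiny masks.
    """
    height, width = image.shape[:2]
    crops = []
    for record in masks:
        if record["area"] < min_area:
            continue  # skip masks too small to be useful objects
        x, y, w, h = (int(round(v)) for v in record["bbox"])
        # Clamp the box to the image so slicing never goes out of bounds.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, width), min(y + h, height)
        if x1 <= x0 or y1 <= y0:
            continue  # degenerate box after clamping
        crops.append(image[y0:y1, x0:x1].copy())
    return crops


if __name__ == "__main__":
    img = np.zeros((64, 64, 3), dtype=np.uint8)
    fake_masks = [
        {"bbox": [10, 5, 20, 30], "area": 450},
        {"bbox": [0, 0, 4, 4], "area": 12},       # dropped by min_area
        {"bbox": [50, 50, 30, 30], "area": 200},  # clamped to the image border
    ]
    for crop in crop_objects(img, fake_masks):
        print(crop.shape)
```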
| Split | Images | Objects |
|---|---:|---:|
| Training dataset | 121,298 | 8,035,395 |
| COCO-VOC test query | 9,904 | 51,757 |
| COCO-VOC test gallery | 9,904 | 862,861 |
| BelgaLogos test query | 55 | 55 |
| BelgaLogos test gallery | 10,000 | 939,361 |
| LVIS test query | 4,726 | 50,537 |
| LVIS test gallery | 4,726 | 433,671 |
| Visual Genome test query | 108,077 | 79,921 |
| Visual Genome test gallery | 108,077 | 9,803,549 |
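As a sanity check on scale, the table's counts for the SAM-extracted splits can be turned into an average number of objects per image. This only rearranges the numbers in the table; the dictionary and function names are illustrative.

```python
# (images, objects) for the SAM-extracted splits, copied from the table above.
SAM_SPLITS = {
    "Training dataset": (121_298, 8_035_395),
    "COCO-VOC test gallery": (9_904, 862_861),
    "BelgaLogos test gallery": (10_000, 939_361),
    "LVIS test gallery": (4_726, 433_671),
    "Visual Genome test gallery": (108_077, 9_803_549),
}


def objects_per_image(images, objects):
    """Average number of SAM objects extracted per image."""
    return objects / images


for name, (images, objects) in SAM_SPLITS.items():
    print(f"{name:28s} {objects_per_image(images, objects):6.2f} objects/image")
```

This works out to roughly 66 objects per training image and between about 87 and 94 objects per image in the test galleries.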

Test set

The COCO-VOC test set is built from the COCO 2017 validation set and the VOC 2007 test set, totaling 9,904 images. The gallery set contains 862,861 objects extracted with SAM, and the query set contains 51,757 objects extracted from the labeled bounding boxes.
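The query objects come from labeled bounding boxes, and the two source datasets store boxes differently: COCO annotations give `bbox` as `[x, y, width, height]`, while VOC XML annotations give `<bndbox>` corners `xmin`, `ymin`, `xmax`, `ymax`. The sketch below normalizes both to corner format using only the standard library. The function names are hypothetical and this is not the repository's preprocessing code.

```python
import xml.etree.ElementTree as ET


def voc_boxes(xml_text):
    """Return (class_name, (xmin, ymin, xmax, ymax)) for each <object> in a VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bnd = obj.find("bndbox")  # direct child, so <part> boxes are ignored
        coords = tuple(int(float(bnd.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, coords))
    return boxes


def coco_boxes(annotations):
    """Convert COCO-style annotation dicts (bbox = [x, y, w, h]) to corner format."""
    out = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        out.append((ann["category_id"], (x, y, x + w, y + h)))
    return out


if __name__ == "__main__":
    sample = (
        "<annotation><object><name>dog</name><bndbox>"
        "<xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax>"
        "</bndbox></object></annotation>"
    )
    print(voc_boxes(sample))
    print(coco_boxes([{"category_id": 18, "bbox": [10.0, 20.0, 30.0, 40.0]}]))
```

Once both sources share one box format, each query object can be cropped the same way regardless of which dataset it came from.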

We also evaluate on BelgaLogos, LVIS, and Visual Genome. For BelgaLogos, the 55 provided query logo images serve as the query set, and the gallery set comprises 939,361 objects extracted with SAM from 10,000 images. For LVIS, the query set contains 50,537 labeled objects from 4,726 images, and the gallery set contains 433,671 objects extracted with SAM from the same images. Visual Genome contains 108,077 images with 2,516,939 labeled bounding boxes; we define a set of 79,921 objects as the query set, and the gallery set contains 9,803,549 objects extracted with SAM.
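Each benchmark pairs a query set with a gallery set, so evaluation ranks gallery objects for every query. This README does not state how `eval.sh` scores the rankings. The sketch below assumes L2-normalized embeddings, cosine similarity, and mean average precision over gallery objects marked relevant; the helper names and the `relevance` matrix are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def average_precision(ranked_relevance):
    """AP of a ranked 0/1 relevance list (1 = gallery object matches the query)."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    hits = np.cumsum(rel)
    precision_at_k = hits / np.arange(1, len(rel) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())


def mean_average_precision(query_emb, gallery_emb, relevance):
    """relevance[i, j] = 1 if gallery object j is a true match for query i."""
    sims = l2_normalize(query_emb) @ l2_normalize(gallery_emb).T
    order = np.argsort(-sims, axis=1)  # gallery indices, most similar first
    aps = [average_precision(relevance[i, order[i]]) for i in range(len(query_emb))]
    return float(np.mean(aps))
```

For the larger galleries (for example, the 9,803,549 Visual Genome objects) the full query-by-gallery similarity matrix would not fit in memory, so a practical implementation would process queries in batches or use a nearest-neighbour index.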

Training

bash train.sh

Evaluation

bash eval.sh
