Installation

We trained and tested our model on Ubuntu 18.04 with CUDA 11.6 and PyTorch 1.12.0. Follow the steps below for installation.

conda create -n msgcml python=3.8
conda activate msgcml
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
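As a quick sanity check after installation, the following sketch compares the installed package versions with the ones pinned in the conda command above. It uses only the standard library. The helper name `check_versions` and the `EXPECTED` mapping are illustrative and not part of this repository, and the sketch assumes the packages register their metadata under the names `torch`, `torchvision`, and `torchaudio`.

```python
from importlib import metadata

# Versions copied from the conda install command above.
# The metadata names used as keys are an assumption.
EXPECTED = {"torch": "1.12.0", "torchvision": "0.13.0", "torchaudio": "0.12.0"}


def check_versions(expected):
    """Return {package: problem} for packages that are missing or mismatched."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Ignore local build tags such as "+cu116" when comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_versions(EXPECTED)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("All pinned packages match.")
```

An empty result means the environment matches the pinned versions. It does not confirm that CUDA is usable at runtime.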

Backbone

We use GoogLeNet and MiT-B2 backbones pretrained on the ImageNet-1K dataset. In addition, we use ViT-B/16 backbones trained with MoCo-V3 and CLIP.

Dataset

Training data preparation

The training set combines the training set of COCO 2017 and the train-val set of VOC 2007, totaling 121,298 images.

We use SAM to extract 8,035,395 objects from these images for training.

cd data_prepare/segment_anything
python segment.py --image_dir DATA_root/mscoco2017/train2017 --save_dir DATA_root/SAM_train
python segment.py --image_dir DATA_root/VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007/JPEGImages --save_dir DATA_root/SAM_train
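Before launching segmentation, it can help to confirm that the image directories are populated. The sketch below counts image files in the directories passed on the command line. The function names and the set of accepted suffixes are assumptions for illustration and are not taken from `segment.py`. A raw file count may differ from the reported 121,298 images if any filtering was applied, which is not described here.

```python
import sys
from pathlib import Path

# Assumed image suffixes; adjust if the datasets use other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(image_dir):
    """Count image files directly inside image_dir (non-recursive)."""
    return sum(
        1
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def total_images(image_dirs):
    """Sum the image counts over several directories."""
    return sum(count_images(d) for d in image_dirs)


if __name__ == "__main__":
    dirs = sys.argv[1:]
    if not dirs:
        print("usage: python count_images.py IMAGE_DIR [IMAGE_DIR ...]")
    else:
        for d in dirs:
            print(f"{d}: {count_images(d)} images")
        print(f"total: {total_images(dirs)} images")
```

Pass the same directories given to `--image_dir` above as arguments to see the per-directory and total counts.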

Split	Images	Objects
Training dataset	121,298	8,035,395
COCO-VOC test query	9,904	51,757
COCO-VOC test gallery	9,904	862,861
BelgaLogos test query	55	55
BelgaLogos test gallery	10,000	939,361
LVIS test query	4,726	50,537
LVIS test gallery	4,726	433,671
Visual Genome test query	108,077	79,921
Visual Genome test gallery	108,077	9,803,549
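To give a feel for how densely SAM segments each image, the sketch below computes the average number of extracted objects per image for the training set and the gallery sets. The counts are copied from the table above. The `STATS` mapping and `objects_per_image` helper are illustrative names only.

```python
# (images, objects) pairs copied from the table above.
STATS = {
    "training dataset": (121298, 8035395),
    "COCO-VOC test gallery": (9904, 862861),
    "BelgaLogos test gallery": (10000, 939361),
    "LVIS test gallery": (4726, 433671),
    "Visual Genome test gallery": (108077, 9803549),
}


def objects_per_image(images, objects):
    """Average number of SAM-extracted objects per image."""
    return objects / images


if __name__ == "__main__":
    for name, (images, objects) in STATS.items():
        ratio = objects_per_image(images, objects)
        print(f"{name}: {ratio:.1f} objects per image")
```

With these counts, the averages fall between roughly 66 and 94 objects per image.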

Test set

The test set consists of 862,861 gallery objects extracted using SAM from the validation set of COCO 2017 and the test set of VOC 2007, totaling 9,904 images. The query set contains 51,757 objects extracted based on labeled bounding boxes.

We also evaluate on BelgaLogos, LVIS, and Visual Genome. For BelgaLogos, the 55 provided query logo images form the query set, while the gallery set comprises 939,361 objects extracted using SAM. For LVIS, the query set contains 4,726 images with 50,537 labeled objects, while the gallery set includes 433,671 objects extracted using SAM from the same images. Visual Genome contains 108,077 images with 2,516,939 labeled bounding boxes. We define a set of 79,921 objects as the query set, while the gallery set contains 9,803,549 objects extracted using SAM.

Training

bash train.sh

Evaluation

bash eval.sh

dengyuhai/MS-UGCML
