
Zero-shot GCN in pytorch

Introduction

This repository replicates the results in zero-shot-gcn. The code is largely migrated from the original TensorFlow implementation.

Requirements

  • pytorch (tensorflow is still needed for tf.FLAGS)

Differences from the original version

  • Fix the wrong word embedding
  • Fix the wrong dropout setting
  • Change a few settings to match adgpm.

None of these actually change the result much.

Before training

Extract features

Download the ImageNet22k images on your own.

Download the resnet model from the site.

(You can also use the torchvision pretrained model, which is what adgpm uses; that actually gives better results.)

python main.py xxx/ImageNet22k/images/ --arch resnet50 --pretrained --evaluate --batch-size 750 --workers 24

Then create a symbolic link named ../feats pointing to the extracted features, as in the example below.
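For example, assuming the features were written to a directory named extracted_feats (a placeholder; substitute the actual output directory from the step above):

ln -s $(pwd)/extracted_feats ../feats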

Preprocess

python tools/obtain_word_embedding.py
python convert_to_gcn_data.py --fc res50 --wv glove

Training

python gcn/train_gcn.py --dataset ../data/glove_res50/

To run the gpm model from the adgpm paper, use the following command:

python gcn/train_gcn.py --dataset ../data/glove_res50/ --save_path log_gpm --hiddens 2048d,d --adj_norm_type in --feat_norm_type l2

Test

Evaluate images in parallel (faster than the original tensorflow version):

python test_imagenet_pll.py --model xxxx/feat__300 --feat ../feats

To evaluate the ConSE result:

python test_imagenet_conse.py --model xxxx/feat__300 --feat ../feats

Original README

This code is a re-implementation of the ImageNet zero-shot classification experiments in the paper Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs. The code is developed based on the TensorFlow framework and the Graph Convolutional Network (GCN) repo.

Our pipeline consists of two parts: CNN and GCN.

  • CNN: Input an image and output deep features for the image.
  • GCN: Input the word embedding for every object class, and output the visual classifier for every object class. Each visual classifier (a 1-D weight vector) can be applied to the deep features for classification.
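To make the division of labor concrete, here is a minimal sketch of how the two outputs fit together at test time. All shapes and tensor names are made up for illustration; the actual code reads pre-extracted features and predicted classifiers from disk.

import torch
import torch.nn.functional as F

# Hypothetical shapes: 2048-d ResNet-50 features for a batch of images,
# and one GCN-predicted classifier (a 1-D weight vector) per object class.
feats = torch.randn(16, 2048)          # CNN output: deep image features
classifiers = torch.randn(1000, 2048)  # GCN output: one weight vector per class

# Score every class by applying its predicted classifier to the image features.
scores = feats @ F.normalize(classifiers, dim=1).t()   # (16, 1000)
pred = scores.argmax(dim=1)            # top-1 predicted class per image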

Citation

If you use our code in your research or wish to refer to the benchmark results, please use the following BibTeX entry.

@article{wang2018zero,
  title={Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs},
  author={Wang, Xiaolong and Ye, Yufei and Gupta, Abhinav},
  journal={CVPR},
  year={2018}
}

Using Our Code

git clone git@github.com:JudyYe/zero-shot-gcn.git
cd zero-shot-gcn/src

Unless otherwise specified, the root directory defaults to zero-shot-gcn/src.

Dataset Preparation

Please read DATASET.md for downloading images and extracting image features.

Testing Demo

With the extracted features and semantic embeddings, we can now perform zero-shot classification with the model we provide.

wget -O ../data/wordnet_resnet_glove_feat_2048_1024_512_300 https://www.dropbox.com/s/e7jg00nx0h2gbte/wordnet_resnet_glove_feat_2048_1024_512_300?dl=0
python test_imagenet.py --model ../data/wordnet_resnet_glove_feat_2048_1024_512_300

The above line defaults to the res50 + 2-hops combination and tests under two settings: unseen classes with or without seen classes (see the paper for further explanation, and the sketch below).
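As a hedged sketch of what the two settings mean in practice (the class counts and the seen_class_idx tensor below are placeholders, not the repository's actual bookkeeping):

import torch

scores = torch.randn(16, 5000)        # made-up: 16 test images, 5000 total classes
seen_class_idx = torch.arange(1000)   # made-up: the first 1000 columns are seen classes

# Setting 1: unseen classes only -- mask out seen classes before ranking.
scores_unseen_only = scores.clone()
scores_unseen_only[:, seen_class_idx] = float('-inf')
top5_unseen_only = scores_unseen_only.topk(5, dim=1).indices

# Setting 2: with seen classes (generalized) -- all classes compete together.
top5_generalized = scores.topk(5, dim=1).indices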

We also provide other configurations. Please refer to the code for details.

Main Results

We report the results with the above testing demo code (using ResNet-50 visual features and GloVe word embeddings). All experiments are conducted with the ImageNet dataset.

We first report the results on testing with only unseen classes. We compare our method with the state-of-the-art method SYNC in this benchmark.

ImageNet Subset | Method      | top 1 | top 2 | top 5 | top 10 | top 20
2-hops          | SYNC        | 10.5  | 17.7  | 28.6  | 40.1   | 52.0
2-hops          | GCNZ (Ours) | 21.0  | 33.7  | 52.7  | 64.8   | 74.3
3-hops          | SYNC        |  2.9  |  4.9  |  9.2  | 14.2   | 20.9
3-hops          | GCNZ (Ours) |  4.3  |  7.7  | 14.2  | 20.4   | 27.6
All             | SYNC        |  1.4  |  2.4  |  4.5  |  7.1   | 10.9
All             | GCNZ (Ours) |  1.9  |  3.4  |  6.4  |  9.3   | 12.7

We then report the results under the generalized zero-shot setting, i.e. testing with both unseen and seen classes. We compare our method with the state-of-the-art method ConSE in this benchmark.

ImageNet Subset | Method      | top 1 | top 2 | top 5 | top 10 | top 20
2-hops (+1K)    | ConSE       |  0.1  | 11.2  | 24.3  | 29.1   | 32.7
2-hops (+1K)    | GCNZ (Ours) | 10.2  | 21.2  | 42.1  | 56.2   | 67.5
3-hops (+1K)    | ConSE       |  0.2  |  3.2  |  7.3  | 10.0   | 12.2
3-hops (+1K)    | GCNZ (Ours) |  2.4  |  5.3  | 12.0  | 18.2   | 25.4
All (+1K)       | ConSE       |  0.1  |  1.5  |  3.5  |  4.9   |  6.2
All (+1K)       | GCNZ (Ours) |  1.1  |  2.4  |  5.4  |  8.3   | 11.7

We also visualize the t-SNE plots of GCN inputs and outputs for two subtrees of WordNet as follows.

[Figures omitted: t-SNE of input word embeddings and of output visual classifiers for the synsets "instrumentality, instrumentation" and "animal, animate being, beast, brute, creature, fauna".]

Training

As DATASET.md illustrates, convert_to_gcn_data.py prepares the data to train the GCN. It supports two CNN networks (fc = res50 or inception) and three semantic embeddings (wv = glove, google, or fasttext). The output will be saved to ../data/$wv_$fc/.

python convert_to_gcn_data.py --fc res50 --wv glove
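For example, to prepare data for inception features with fasttext embeddings instead, the same script can be run as follows (per the naming scheme above, the output would then go to ../data/fasttext_inception/):

python convert_to_gcn_data.py --fc inception --wv fasttext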

After preparing the data, we can start training by using:

python gcn/train_gcn.py --gpu $GPU_ID --dataset ../data/glove_res50/ --save_path $SAVE_PATH
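For instance, a concrete run might look like the following (the GPU id and the save path log_glove_res50 are placeholders; choose whatever suits your setup):

python gcn/train_gcn.py --gpu 0 --dataset ../data/glove_res50/ --save_path log_glove_res50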
