Skip to content
<work in progress> Tensorflow implementation of our phrase detection framework
Python Shell Cuda Other
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
data
experiments initial commit Nov 5, 2019
external initial commit Nov 5, 2019
lib
tools initial commit Nov 5, 2019
.gitmodules initial commit Nov 5, 2019
LICENSE
README.md updated readme Nov 5, 2019

README.md

phrase_detection

A Tensorflow implementation of phrase detection framework by Bryan Plummer (bplum@bu.edu). This repository is based on the tensorflow implementation of Faster R-CNN available here which in turn was based on the python Caffe implementation of Faster RCNN available here.

Prerequisites

  • A basic Tensorflow installation. The code follows r1.2 format.
  • Python packages you might not have: nltk cython opencv-python easydict==1.6 scikit-image pyyaml

Code was tested using python 2.7

Installation

  1. Clone the repository
git clone --recursive https://github.com/BryanPlummer/phrase_detection.git

We shall refer to the repo's root directory as $ROOTDIR

  1. Update your -arch in setup script to match your GPU
cd $ROOTDIR/lib
# Change the GPU architecture (-arch) if necessary
vim setup.py
  1. Download and unpack the Flickr30K Entities and ReferIt Game datasets and build the modules and vocabularies from $ROOTDIR using,
./data/scripts/fetch_datasets.sh
  1. Download pretrained COCO models of the desired network which were released in this repo. Dy default, the code assumes they have been unpacked in a directory called pretrained.

  2. Download a pretrained word embedding. By default, the code assumes you have downloaded the HGLMM 6K-D vectors from here and placed the unziped file in the data directory. If you want to use a different word embedding, please update the pointer to the embedding file and its dimensions in lib/model/config.py.

Train your own model

Assuming you completed the Installation setup correctly, you should be able to train a model with,

./experiments/scripts/train_phrase_detector.sh [GPU_ID] [DATASET] [NET] [TAG]
# GPU_ID is the GPU you want to test on
# NET in {vgg16, res50, res101, res152} is the network arch to use
# DATASET {flickr, referit} is defined in train_phrase_detector.sh
# TAG is an experiment name
# Examples:
./experiments/scripts/train_phrase_detector.sh 0 flickr res101 default
./experiments/scripts/train_phrase_detector.sh 1 referit res101 default

Test and evaluate

You can test your models using,

./experiments/scripts/test_phrase_detector.sh [GPU_ID] [DATASET] [NET] [TAG]
# GPU_ID is the GPU you want to test on
# NET in {vgg16, res50, res101, res152} is the network arch to use
# DATASET {flickr, referit} is defined in test_phrase_detector.sh
# TAG is an experiment name
# Examples:
./experiments/scripts/test_phrase_detector.sh 0 flickr res101 default
./experiments/scripts/test_phrase_detector.sh 1 referit res101 default

By default, trained networks are saved under:

output/[NET]/[DATASET]/{TAG}/

Citation

If you find our code useful please consider citing:

@article{plummerPhrasedetection,
  title={Revisiting Image-Language Networks for Open-ended Phrase Detection},
  author={Bryan A. Plummer and Kevin J. Shih and Yichen Li and Ke Xu and Svetlana Lazebnik and Stan Sclaroff and Kate Saenko},
  journal={arXiv:1811.07212},
  year={2018}
}
You can’t perform that action at this time.