

Implementation of CVPR2017 paper: "A Hierarchical Approach for Generating Descriptive Image Paragraphs" by **Jonathan Krause, Justin Johnson, Ranjay Krishna, Fei-Fei Li**

NOTE: This repo is based on densecap-tensorflow and is still buggy.


Update 2018.1.27

  • The following procedures will be adapted for im2p soon.


Install the required Python modules with:

pip install -r lib/requirements.txt

Preparing data


Website of Visual Genome Dataset

  • Make a new directory VG wherever you like.
  • Download images Part1 and Part2, and extract both parts to directory VG/images.
  • Download the image meta data and extract it to directory VG/1.2 or VG/1.0, depending on the version you downloaded.
  • Download the region descriptions and extract them to directory VG/1.2 or VG/1.0 accordingly.
  • In the following steps, directory VG is referred to as raw_data_path.
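Before preprocessing, it can help to sanity-check the layout. A minimal sketch (the directory names follow the list above; this helper function is not part of the repo):

```python
import os

def missing_vg_dirs(raw_data_path, version="1.2"):
    """Return the expected Visual Genome subdirectories that are absent.

    Checks only the layout described above: VG/images for the two image
    parts, and VG/1.2 (or VG/1.0) for meta data and region descriptions.
    """
    expected = [
        os.path.join(raw_data_path, "images"),
        os.path.join(raw_data_path, version),
    ]
    return [p for p in expected if not os.path.isdir(p)]
```

An empty return value means the layout matches the steps above.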

Unlimited RAM (more than 16G)

If you have more than 16G of RAM, you can preprocess the dataset with the following commands.

$ cd $ROOT/lib
$ python --version [version] --path [raw_data_path] \
        --output_dir [dir] --max_words [max_len]
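If you are unsure how much RAM the machine has, a quick stdlib check on Linux reads /proc/meminfo (this helper is not part of the repo):

```python
def total_ram_gib(meminfo_text):
    """Parse the MemTotal line of /proc/meminfo (the value is in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024.0 ** 2)
    raise ValueError("MemTotal not found")

# On a Linux machine:
# with open("/proc/meminfo") as f:
#     print(total_ram_gib(f.read()))
```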

Limited RAM (less than 16G)

If you have less than 16G of RAM, use the two-step process below.

  • First, set up the data path in info/ accordingly and run the script with Python. It will dump the regions into the REGION_JSON directory. Processing more than 100k images takes a while, so be patient.
$ cd $ROOT/info
$ python read_regions --version [version] --vg_path [raw_data_path]
  • In lib/, set up the data path accordingly. Running the script dumps the gt_regions of every image, one file per image, into the OUTPUT_DIR directory.
$ cd $ROOT/lib
$ python --version [version] --path [raw_data_path] \
        --output_dir [dir] --max_words [max_len] --limit_ram
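The --limit_ram mode avoids holding every region in memory by writing one ground-truth file per image. A sketch of that dump pattern (the function and the file naming are assumptions, not the repo's actual code):

```python
import json
import os

def dump_gt_regions(regions_by_image, output_dir):
    """Write one JSON file per image id into output_dir,
    so each image's regions can be loaded independently later."""
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    for image_id, regions in regions_by_image.items():
        path = os.path.join(output_dir, "{}.json".format(image_id))
        with open(path, "w") as f:
            json.dump(regions, f)
```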

Compile local libs

$ cd $ROOT/lib
$ make


Add or modify configurations in $ROOT/scripts/dense_cap_config.yml; refer to lib/ for more configuration details.

$ cd $ROOT
$ bash scripts/ [dataset] [net] [ckpt_to_init] [data_dir] [step]


  • dataset: visual_genome_1.2 or visual_genome_1.0.
  • net: res50, res101
  • ckpt_to_init: pretrained model to be initialized with. Refer to tf_faster_rcnn for more init weight details.
  • data_dir: the data directory where you save the outputs after prepare data.
  • step: the training stage to run (used for continuing training).
    • step 1: fix convnet weights
    • step 2: finetune convnet weights
    • step 3: add context fusion, but fix convnet weights
    • step 4: finetune the whole model
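The four steps differ mainly in which variables the optimizer updates. A plain-Python sketch of that selection (the "resnet" scope prefix is an assumption about the repo's variable names, not taken from its code):

```python
def vars_to_train(var_names, step):
    """Pick the variable names to optimize at each training step.

    Steps 1 and 3 keep the convnet frozen; steps 2 and 4 finetune
    everything currently in the model.
    """
    if step in (1, 3):
        return [v for v in var_names if not v.startswith("resnet")]
    return list(var_names)
```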


Create a directory data/demo

$ mkdir $ROOT/data/demo

Then put the images to be tested in the directory and run

$ cd $ROOT
$ bash scripts/ [ckpt_path] [vocab_path]

It will create HTML files in $ROOT/demo; just open them in a browser. Alternatively, use the web-based visualizer created by karpathy:

$ cd $ROOT/vis
$ python -m SimpleHTTPServer 8181

Then point your web browser to http://localhost:8181/view_results.html. (SimpleHTTPServer is Python 2 only; with Python 3, run python3 -m http.server 8181 instead.)


  • Debugging.

