anubhavshrimal/Attention-Beam-Image-Captioning

Attention-Beam-Image-Captioning

We apply beam search as a decoding heuristic on top of an encoder-decoder architecture, which gives better-quality captions on three benchmark datasets: Flickr8k, Flickr30k, and MS COCO.

Beam search keeps several candidate captions alive and returns the one with the best overall score, instead of greedily choosing the best-scoring word at each decoding step. The following figure shows how a beam width (k) of 3 helps generate better captions:

beam search
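
The idea in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the repository's decoder: `beam_search`, `toy_step`, and `TOY_MODEL` are hypothetical names, and the toy probability table stands in for the trained decoder's next-word scores. It also simplifies by keeping finished captions in a separate list rather than shrinking the beam.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(sequence) must return a dict {next_token: probability}.
    """
    # Each entry is (cumulative log-probability, token sequence).
    beams = [(0.0, [start_token])]
    completed = []

    for _ in range(max_len):
        # Expand every live sequence by every possible next token.
        candidates = []
        for score, seq in beams:
            for token, prob in step_fn(seq).items():
                if prob > 0.0:
                    candidates.append((score + math.log(prob), seq + [token]))
        if not candidates:
            break

        # Keep only the `beam_size` best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            if seq[-1] == end_token:
                completed.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break

    # Fall back to unfinished sequences if none reached the end token.
    pool = completed or beams
    return max(pool, key=lambda c: c[0])[1]


# Hypothetical toy "decoder": the best first word ("a") does not lead
# to the best complete caption.
TOY_MODEL = {
    ("<start>",): {"a": 0.6, "the": 0.4},
    ("<start>", "a"): {"dog": 0.55, "cat": 0.45},
    ("<start>", "the"): {"dog": 0.9, "cat": 0.1},
}


def toy_step(seq):
    return TOY_MODEL.get(tuple(seq), {"<end>": 1.0})


greedy = beam_search(toy_step, "<start>", "<end>", beam_size=1)
beam = beam_search(toy_step, "<start>", "<end>", beam_size=3)
print(greedy)  # ['<start>', 'a', 'dog', '<end>']   (probability 0.33)
print(beam)    # ['<start>', 'the', 'dog', '<end>'] (probability 0.36)
```

With a beam width of 1 the search is greedy and commits to "a" at the first step. With a beam width of 3 it keeps "the" alive long enough to find the higher-probability caption.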

Dependencies

The project's dependencies are listed in the provided environment.yml and requirements.txt files.

To install the dependencies using conda:

conda env create -f environment.yml
conda env list

Training

Set the paths to the data folder and the annotations JSON file of the downloaded dataset (MSCOCO, Flickr8k, or Flickr30k) in create_input_files.py, then run the script to create the required input files.
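
The file names used in the inference command (for example WORDMAP_coco_5_cap_per_img_5_min_word_freq.json) suggest that preprocessing produces a word map filtered by a minimum word frequency. The sketch below shows what such a step might look like. It is an assumption, not the contents of create_input_files.py: the function names, the special tokens (`<start>`, `<end>`, `<unk>`, `<pad>`), and the index layout are all hypothetical.

```python
from collections import Counter


def build_word_map(captions, min_word_freq=5):
    """Map every word seen at least `min_word_freq` times to an index.

    `captions` is a list of tokenized captions (lists of words).
    The special-token names and indices are assumptions.
    """
    freq = Counter(word for caption in captions for word in caption)
    words = sorted(w for w, count in freq.items() if count >= min_word_freq)

    word_map = {w: i + 1 for i, w in enumerate(words)}  # index 0 is padding
    word_map["<unk>"] = len(word_map) + 1
    word_map["<start>"] = len(word_map) + 1
    word_map["<end>"] = len(word_map) + 1
    word_map["<pad>"] = 0
    return word_map


def encode_caption(tokens, word_map, max_len):
    """Encode one caption as <start> + words + <end>, padded to max_len + 2."""
    ids = [word_map["<start>"]]
    ids += [word_map.get(t, word_map["<unk>"]) for t in tokens]
    ids.append(word_map["<end>"])
    ids += [word_map["<pad>"]] * (max_len - len(tokens))
    return ids


captions = [["a", "dog"], ["a", "cat"]]
word_map = build_word_map(captions, min_word_freq=2)
encoded = encode_caption(["a", "dog"], word_map, max_len=3)
print(word_map)
print(encoded)
```

Words below the frequency threshold ("dog" and "cat" in this toy input) are encoded with the `<unk>` index.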

To train a model, run python train.py. All training hyperparameters are set in train.py.

Note: Pretrained models for MSCOCO, Flickr8k, Flickr30k can be downloaded from here.

Extract the downloaded zip file into the models/ directory.

Testing / Inference

  • Use caption.py to generate a caption and an attention map for an image:

    python caption.py --img='path/to/image.jpeg' --model='path/to/BEST_checkpoint_coco_5_cap_per_img_5_min_word_freq.pth.tar' --word_map='path/to/WORDMAP_coco_5_cap_per_img_5_min_word_freq.json' --beam_size=5
    
  • The Jupyter Notebook Caption-Sample-Images.ipynb can be used to caption specified images using the trained model.

  • Generate-Testset-Predictions.ipynb generates predictions for the test set in the required format.
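
To compare captions across beam widths, the caption.py command above can be assembled programmatically. The sketch below only builds and prints the commands, using the four flags shown in the example invocation. `build_caption_command` and `beam_size_sweep` are hypothetical helpers, and the paths are the same placeholders as above.

```python
import shlex


def build_caption_command(img, model, word_map, beam_size=5):
    """Build the argument list for one caption.py run."""
    return [
        "python", "caption.py",
        f"--img={img}",
        f"--model={model}",
        f"--word_map={word_map}",
        f"--beam_size={beam_size}",
    ]


def beam_size_sweep(img, model, word_map, beam_sizes=(1, 3, 5)):
    """Return one command per beam size for the same image and model."""
    return [build_caption_command(img, model, word_map, k) for k in beam_sizes]


commands = beam_size_sweep(
    "path/to/image.jpeg",
    "path/to/BEST_checkpoint_coco_5_cap_per_img_5_min_word_freq.pth.tar",
    "path/to/WORDMAP_coco_5_cap_per_img_5_min_word_freq.json",
)
for cmd in commands:
    # To execute instead of printing: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

A beam size of 1 corresponds to greedy decoding, so the first command gives a baseline for the other two.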

Results

results table

comparing captions

image1 image1a
image2 image2a
image3 image3a

Interactive User Interface

To use the UI-based image captioner, run the following commands:

cd ui/
python MainWindowUI.py 

This opens the following user interface:

ui-view1 ui-view3

Project UI Demo

You can find the demo video here on YouTube.
