PyTorch Implementation of Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning Paper
The original Torch implementation by Lu et al. can be found here
I'm using the Flickr30k Dataset. You may download the images from here. If you wish to use the COCO Dataset, you will need to comment out 2 lines in the code.
I'm also using Karpathy's Train/Val/Test Split. You may download it from here.
You may also use the WORDMAP.json file in the directory if you don't wish to create it again.
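Karpathy's split files are JSON documents with a top-level `images` list, where each entry carries a `filename`, a `split` label and tokenized `sentences`. Below is a minimal sketch of grouping captions by split under that layout. The function name `group_by_split` and the sample record are illustrative only and are not taken from this repository's code.

```python
from collections import defaultdict


def group_by_split(karpathy_data):
    """Map each split name to a list of (filename, tokenized captions)."""
    splits = defaultdict(list)
    for img in karpathy_data["images"]:
        split = img["split"]
        # The COCO version of the file also has a "restval" label, which is
        # commonly folded into the training set (an assumption here).
        if split == "restval":
            split = "train"
        captions = [s["tokens"] for s in img["sentences"]]
        splits[split].append((img["filename"], captions))
    return dict(splits)


# Tiny hand-written sample in the same layout as the real split file.
sample = {
    "images": [
        {"filename": "a.jpg", "split": "train",
         "sentences": [{"tokens": ["a", "dog", "runs"]}]},
        {"filename": "b.jpg", "split": "val",
         "sentences": [{"tokens": ["two", "men", "talk"]}]},
    ]
}
grouped = group_by_split(sample)
print({name: len(items) for name, items in grouped.items()})
```

With the real file, `karpathy_data` would come from `json.load` on the downloaded split.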
preprocess.py
Creates the WORDMAP.json file and the .h5 files
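As a rough illustration of what a word map is, the sketch below builds a word-to-index dictionary from tokenized captions and encodes one caption with it. The special tokens (`<start>`, `<end>`, `<unk>`, `<pad>`), the `min_word_freq` threshold and both function names are assumptions following a common convention. The actual preprocess.py may differ.

```python
from collections import Counter


def build_word_map(tokenized_captions, min_word_freq=5):
    """Assign an index to every word seen at least min_word_freq times."""
    freq = Counter(w for cap in tokenized_captions for w in cap)
    words = sorted(w for w, c in freq.items() if c >= min_word_freq)
    word_map = {w: i + 1 for i, w in enumerate(words)}  # 0 is kept for padding
    word_map["<unk>"] = len(word_map) + 1
    word_map["<start>"] = len(word_map) + 1
    word_map["<end>"] = len(word_map) + 1
    word_map["<pad>"] = 0
    return word_map


def encode_caption(tokens, word_map, max_len):
    """Wrap a caption in <start>/<end> and pad it to max_len + 2 indices."""
    ids = [word_map["<start>"]]
    ids += [word_map.get(w, word_map["<unk>"]) for w in tokens]
    ids.append(word_map["<end>"])
    ids += [word_map["<pad>"]] * (max_len - len(tokens))
    return ids


captions = [["a", "dog"], ["a", "cat"]]
word_map = build_word_map(captions, min_word_freq=2)
encoded = encode_caption(["a", "dog"], word_map, max_len=3)
print(word_map)
print(encoded)
```

Words below the frequency threshold map to `<unk>`, so the vocabulary stays small while every caption can still be encoded.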
dataset.py
Creates the custom dataset
util.py
Functions to be used throughout the code
models.py
Defines the architectures
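The core idea of the paper is that the decoder attends over the image regions plus one extra "visual sentinel" vector, and the attention weight that lands on the sentinel (called beta) tells how much the model relies on its language memory instead of the image. The pure-Python sketch below shows only that mixing step, following the paper's formulation. It is not the code in models.py: the function names are hypothetical, and the scores are passed in directly, whereas in the model they come from learned layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_feats, region_scores, sentinel, sentinel_score):
    """Mix k region features and the sentinel into one context vector.

    Returns (context, beta), where beta is the weight on the sentinel.
    """
    # One softmax over the k region scores plus the sentinel score.
    alpha_hat = softmax(region_scores + [sentinel_score])
    beta = alpha_hat[-1]
    dim = len(sentinel)
    # Start with the sentinel's share, then add each weighted region.
    context = [beta * sentinel[d] for d in range(dim)]
    for weight, feat in zip(alpha_hat[:-1], region_feats):
        for d in range(dim):
            context[d] += weight * feat[d]
    return context, beta


regions = [[1.0, 0.0], [0.0, 1.0]]
context, beta = adaptive_context(regions, [0.0, 0.0], [1.0, 1.0], 0.0)
print(context, beta)
```

A beta close to 1 means the word is generated mostly from the sentinel (typical for words like "the" or "of"), while a beta close to 0 means the model is looking at the image.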
train_eval
For Training and Evaluation
visualization.ipynb
For Testing and Visualization
It's very simple! Place the test image in your directory, name it test.jpg, and then run the visualization.ipynb Jupyter notebook to get the results.
Results for some validation and test images from Karpathy's split of Flickr30k are shown below.
Thanks to @https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning