Image captioning is the process of generating a textual description of an image. It combines natural language processing and computer vision to produce the captions. In this work I implement a neural network that generates text given an image.
I use two networks: the first extracts features from the image, and the second, an LSTM, generates the caption from those features.
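
To make the two-network idea concrete, here is a minimal sketch of a feature extractor feeding an LSTM decoder. It is not the code from model.py: the class names `EncoderCNN` and `DecoderLSTM`, the tiny convolutional stack (a stand-in for whatever feature extractor the project actually uses), and all layer sizes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class EncoderCNN(nn.Module):
    """Hypothetical stand-in encoder: maps an image to one feature vector."""

    def __init__(self, embed_size):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 32, 1, 1)
        )
        self.fc = nn.Linear(32, embed_size)

    def forward(self, images):
        x = self.conv(images).flatten(1)  # (batch, 32)
        return self.fc(x)                 # (batch, embed_size)


class DecoderLSTM(nn.Module):
    """Hypothetical decoder: predicts caption tokens from the image feature."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Teacher forcing: the image feature is the first input step,
        # followed by every caption token except the last one.
        embeddings = self.embed(captions[:, :-1])
        inputs = torch.cat([features.unsqueeze(1), embeddings], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)  # (batch, caption_length, vocab_size)

    @torch.no_grad()
    def sample(self, features, max_len=20):
        # Greedy decoding: feed the most likely token back in at each step.
        inputs = features.unsqueeze(1)
        states = None
        tokens = []
        for _ in range(max_len):
            hidden, states = self.lstm(inputs, states)
            predicted = self.fc(hidden.squeeze(1)).argmax(dim=1)
            tokens.append(predicted)
            inputs = self.embed(predicted).unsqueeze(1)
        return torch.stack(tokens, dim=1)  # (batch, max_len)


# Shape check on random data (no real images or captions involved).
encoder = EncoderCNN(embed_size=32)
decoder = DecoderLSTM(embed_size=32, hidden_size=64, vocab_size=100)
images = torch.randn(2, 3, 64, 64)
captions = torch.randint(0, 100, (2, 7))
logits = decoder(encoder(images), captions)
print(logits.shape)  # one score per vocabulary word at each caption position
```

During training, the logits would typically be compared against the caption tokens with a cross-entropy loss. At test time, `sample` returns token ids that still need to be mapped back to words through the vocabulary.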
The dataset used in this project is the COCO 2015 image captioning dataset. I cannot upload the dataset here, but you can download it from the link below.
Link: https://cocodataset.org/#download

- dataset.py :- This file contains the custom dataloader that returns the train and test data.
- vocabulary.py :- This file is used for creating the vocabulary.
- model.py :- This file defines the model and the sample function used for testing.
- train.py :- This file is used for training the network.
- test.py :- As the name suggests, this file is for testing.
- Python==3.6.6
- PyTorch==1.6.0
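
As an illustration of the vocabulary step handled by vocabulary.py, the sketch below builds a word-to-index mapping from a list of captions. It is only an assumed design, not the project's actual code: the `Vocabulary` class, the `min_count` threshold, the special tokens, and the whitespace tokenizer (which leaves punctuation attached to words) are all hypothetical choices.

```python
from collections import Counter


class Vocabulary:
    """Hypothetical vocabulary: maps words to ids, rare words to <unk>."""

    SPECIALS = ["<pad>", "<start>", "<end>", "<unk>"]

    def __init__(self, captions, min_count=2):
        # Count every word and keep those seen at least min_count times.
        counts = Counter(w for c in captions for w in self.tokenize(c))
        words = sorted(w for w, n in counts.items() if n >= min_count)
        self.idx2word = self.SPECIALS + words
        self.word2idx = {w: i for i, w in enumerate(self.idx2word)}

    @staticmethod
    def tokenize(caption):
        # Simplest possible tokenizer: lowercase and split on whitespace.
        return caption.lower().split()

    def encode(self, caption):
        # Wrap the caption in <start> ... <end>; unknown words become <unk>.
        unk = self.word2idx["<unk>"]
        ids = [self.word2idx.get(w, unk) for w in self.tokenize(caption)]
        return [self.word2idx["<start>"]] + ids + [self.word2idx["<end>"]]

    def decode(self, ids):
        # Map ids back to words, dropping padding and boundary tokens.
        words = [self.idx2word[i] for i in ids]
        skip = ("<pad>", "<start>", "<end>")
        return " ".join(w for w in words if w not in skip)

    def __len__(self):
        return len(self.idx2word)


vocab = Vocabulary(["a dog runs", "a dog sleeps", "a cat runs"], min_count=2)
print(vocab.encode("a cat runs"))                # [1, 4, 3, 6, 2]
print(vocab.decode(vocab.encode("a cat runs")))  # a <unk> runs
```

Here "cat" appears only once, so it falls below `min_count` and is encoded as `<unk>`. The length of the vocabulary would be the output size of the decoder's final layer.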
I am a beginner learning about this fascinating field. Please feel free to point out my mistakes and to contribute. I hope to upload more interesting projects in the future.


