Our aim is to perform image-to-sentence generation, also known as 'Image Captioning', bridging the gap between vision and natural language. If our results turn out to be accurate, we can then use NLP techniques to understand the world depicted in images. The dataset we are currently working with is Flickr8k. To achieve this goal, we study and apply existing pre-trained CNN models, namely VGG16, ResNet50 and Inception V3, and attach an LSTM-based RNN language model to each of them. In the end, we evaluate the results and compare the three models.
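As a rough illustration, a minimal Keras sketch of this kind of CNN + LSTM captioning model is shown below. The layer sizes (256-unit LSTM, 4096-d image features) and the `vocab_size` / `max_length` values are assumptions for the sketch, not fixed choices from this project.

```python
# Minimal sketch of a CNN-feature + LSTM captioning model (merge-style decoder).
# vocab_size, max_length and layer sizes are illustrative assumptions.
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000   # assumed vocabulary size after tokenizing the captions
max_length = 34     # assumed maximum caption length in tokens

# Image branch: features from a pre-trained CNN (e.g. the 4096-d fc2 output of VGG16)
image_input = Input(shape=(4096,))
image_dense = Dense(256, activation='relu')(Dropout(0.5)(image_input))

# Language branch: the partial caption fed through an embedding layer and an LSTM
caption_input = Input(shape=(max_length,))
caption_embed = Embedding(vocab_size, 256, mask_zero=True)(caption_input)
caption_lstm = LSTM(256)(Dropout(0.5)(caption_embed))

# Merge the two branches and predict the next word of the caption
decoder = Dense(256, activation='relu')(add([image_dense, caption_lstm]))
output = Dense(vocab_size, activation='softmax')(decoder)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

The same decoder can be reused with ResNet50 or Inception V3 features by changing only the image input dimension.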
A Convolutional Neural Network is used as the feature extractor.
Brief introduction to the Flickr8k dataset used: Flickr8k_Dataset contains a total of 8092 images in JPEG format with different shapes and sizes, of which 6000 are used for training, 1000 for testing and 1000 for development. Flickr8k_text contains text files describing the train and test splits; Flickr8k.token.txt contains 5 captions for each image, i.e. 40460 captions in total.
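A minimal sketch of loading these caption and split files is given below; the folder layout and file names follow the standard Flickr8k_text distribution, and the paths are assumptions.

```python
# Minimal sketch: parse Flickr8k.token.txt and one of the split files.
# Paths assume the standard Flickr8k_text folder layout.
from collections import defaultdict

def load_captions(token_path="Flickr8k_text/Flickr8k.token.txt"):
    """Map each image file name to its list of 5 captions."""
    captions = defaultdict(list)
    with open(token_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Each line looks like: "<image_name>.jpg#<caption_index>\t<caption text>"
            image_id, caption = line.split("\t")
            captions[image_id.split("#")[0]].append(caption)
    return captions

def load_split(split_path="Flickr8k_text/Flickr_8k.trainImages.txt"):
    """Read the list of image names belonging to a split (train/dev/test)."""
    with open(split_path, encoding="utf-8") as f:
        return set(line.strip() for line in f if line.strip())

captions = load_captions()
train_images = load_split()
print(len(captions), "captioned images,", len(train_images), "in the training split")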
Download from here: https://www.kaggle.com/ming666/flicker8k-dataset
VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes.
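To use VGG16 as a feature extractor rather than a classifier, the final 1000-way softmax layer is dropped and the activations of the penultimate fully connected layer ('fc2', 4096-d) are kept as the image representation. A minimal sketch using Keras with ImageNet weights is shown below; the image path is only an example.

```python
# Minimal sketch: extract 4096-d fc2 features from a pre-trained VGG16.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet")
# Re-wire the model to output the penultimate fully connected layer ('fc2')
extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

def extract_features(image_path):
    # VGG16 expects 224x224 RGB inputs preprocessed as during ImageNet training
    image = img_to_array(load_img(image_path, target_size=(224, 224)))
    image = preprocess_input(np.expand_dims(image, axis=0))
    return extractor.predict(image, verbose=0)  # shape: (1, 4096)

features = extract_features("Flickr8k_Dataset/example.jpg")  # example path, adjust as needed
```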