
Image Captioning Project

A deep learning model that takes an image as input and generates a description of its contents.

Concepts used: Convolutional Neural Networks, Recurrent Neural Networks, Transfer Learning, Word Embeddings, Image and Text Processing, Multilayer Perceptrons, Backpropagation, Gradient Descent, and more.

Language used: Python

Libraries used: TensorFlow, Keras, NumPy, Pandas, re, etc.

For transfer learning, GloVe word embeddings and the ResNet50 model are used, as sketched below.
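As a minimal illustration of the image-feature side of the transfer learning, the sketch below uses Keras' bundled ResNet50 with its classification head removed; the helper name `encode_image` is ours, not from the repository:

```python
# Minimal sketch: extract 2048-d image features with a pretrained ResNet50.
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

# Drop the final classification layer so the model outputs feature vectors.
base = ResNet50(weights="imagenet")
encoder = Model(inputs=base.input, outputs=base.layers[-2].output)

def encode_image(path):
    """Load an image, preprocess it for ResNet50, and return its features."""
    img = image.load_img(path, target_size=(224, 224))
    x = np.expand_dims(image.img_to_array(img), axis=0)
    x = preprocess_input(x)
    return encoder.predict(x, verbose=0).reshape(2048,)
```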

REAL-WORLD APPLICATIONS OF IMAGE CAPTIONING

  1. In self-driving cars, it can caption the scene around the car and thereby give a boost to the self-driving system.

  2. It can serve as an aid to the blind: we first convert the scene into text and then the text into voice. This can help guide them on roads and in crowded places.

  3. In CCTV surveillance, generating relevant captions alongside the video feed would allow alarms to be raised when malicious activity takes place, since such activity could be detected from the generated captions.

  4. It can help make Google Image Search as powerful as Google Search: every image could first be converted to a caption, and a text search could then be performed to find similar images.

DATA DESCRIPTION (Dataset used: Flickr8k)

The Flickr8k dataset contains around 8,000 images, divided into training and testing sets. Each image has 5 different captions written by 5 different people, to account for the fact that an image can be described in multiple ways.

METHODOLOGY ADOPTED

STEP 1:

Words are generated one at a time in order to build complete sentences. To generate each word, we provide 2 types of inputs:

  1. The image itself.
  2. The part of the sentence that has already been predicted, so that the model can use the context to predict the next word (a sketch of a model wired this way follows this list).
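The following is a minimal sketch of one way to combine these two inputs in Keras (a merge-style decoder); the layer sizes and the `vocab_size` and `max_len` values are illustrative assumptions, not taken from the repository:

```python
# Hedged sketch of a two-input next-word predictor.
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model

vocab_size, max_len = 8000, 35   # illustrative values, not from the repo

# Input 1: the ResNet50 image feature vector.
img_in = Input(shape=(2048,))
img_feat = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Input 2: the partial caption predicted so far, as padded word indices.
txt_in = Input(shape=(max_len,))
txt_emb = Embedding(vocab_size, 200, mask_zero=True)(txt_in)
txt_feat = LSTM(256)(Dropout(0.5)(txt_emb))

# Merge both representations and predict a distribution over the next word.
merged = Dense(256, activation="relu")(add([img_feat, txt_feat]))
out = Dense(vocab_size, activation="softmax")(merged)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```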

STEP 2: PREPROCESSING TEXT DATA

  1. We add 2 special tokens to each caption, marking the start and the end of the sentence.
  2. We then create multiple training samples from each image-caption pair: one sample per next-word prediction (see the sketch after this list).
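A minimal sketch of this preprocessing, assuming captions are plain strings and the special tokens are named `startseq`/`endseq` (the token and helper names here are illustrative):

```python
import re

def clean_caption(text):
    """Lowercase, strip punctuation/digits, and wrap with special tokens."""
    text = re.sub(r"[^a-z ]", "", text.lower())
    words = [w for w in text.split() if len(w) > 1]
    return "startseq " + " ".join(words) + " endseq"

def make_samples(caption, word_index):
    """Turn one caption into (input sequence, next word) training pairs."""
    seq = [word_index[w] for w in caption.split() if w in word_index]
    # e.g. "startseq a dog runs endseq" yields pairs like
    # ([startseq], a), ([startseq, a], dog), ([startseq, a, dog], runs), ...
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]
```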

STEP 3: EXTRACTING TEXT FEATURES

  1. We use pretrained GloVe word embeddings to represent our words (a loading sketch follows).
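A hedged sketch of loading GloVe vectors into an embedding matrix, assuming the 200-d `glove.6B.200d.txt` file and a `word_index` dict produced by a tokenizer (the path and helper name are assumptions):

```python
import numpy as np

def load_glove_matrix(path, word_index, dim=200):
    """Map each vocabulary word to its pretrained GloVe vector."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    matrix = np.zeros((len(word_index) + 1, dim))  # row 0 is the padding index
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix

# The matrix can then initialise the Embedding layer, e.g.
# Embedding(vocab_size, 200, weights=[glove], trainable=False)
```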

Results obtained on test images: sample captioned images are included in the repository.
