# Image_Captioning_Project

Image captioning project from Coursera. In this project we define and train an image-to-caption model that can produce descriptions for real-world images.

Model architecture: CNN encoder and RNN decoder (https://research.googleblog.com/2014/11/a-picture-is-worth-thousand-coherent.html).

Encoder: We use the pre-trained InceptionV3 model as the CNN encoder (https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html) and extract its last hidden layer as an image embedding.
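A minimal sketch of the encoder in Keras (assuming TensorFlow 2.x; the repo's actual code may differ). Dropping the classification head and applying global average pooling yields a 2048-dimensional embedding per image; `weights=None` keeps the sketch light, whereas the project would use `weights="imagenet"` for the pre-trained model.

```python
import tensorflow as tf

# Sketch: InceptionV3 without the classification head; global average
# pooling over the last hidden layer gives a 2048-d image embedding.
# weights=None avoids downloading weights here; in practice the project
# uses the pre-trained ImageNet weights (weights="imagenet").
encoder = tf.keras.applications.InceptionV3(
    include_top=False, weights=None, pooling="avg")

# A batch of 299x299 RGB images maps to a batch of 2048-d embeddings.
images = tf.zeros((2, 299, 299, 3))
embeddings = encoder(images)
```

Each row of `embeddings` is the fixed-length image representation that conditions the decoder.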

Decoder: The decoder is a recurrent neural network with LSTM cells that generates the captions.

Since our task is to generate image captions, the RNN text generator should be conditioned on the image. The idea is to use the image features as the initial state of the RNN instead of zeros. During training we feed the ground-truth tokens into the LSTM to get predictions of the next tokens, a scheme known as teacher forcing (http://cs.stanford.edu/people/karpathy/).
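The conditioning and teacher-forcing ideas above can be sketched in NumPy with toy sizes (all weights and the `rnn_step` function here are hypothetical stand-ins, not the project's actual layers — a real decoder would use a full LSTM cell):

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden, vocab = 2048, 128, 50  # toy sizes; the real vocab is far larger

# Hypothetical parameters: a projection turning the image embedding into
# the RNN's initial state, token embeddings, and an output layer.
W_init = rng.normal(scale=0.01, size=(embed_dim, hidden))
E = rng.normal(scale=0.01, size=(vocab, hidden))
W_out = rng.normal(scale=0.01, size=(hidden, vocab))

def rnn_step(h, x):
    # Stand-in for an LSTM cell: one simple recurrent update.
    return np.tanh(h + x)

image_embedding = rng.normal(size=(embed_dim,))
h = np.tanh(image_embedding @ W_init)  # condition on the image, not on zeros

ground_truth = [3, 7, 1, 0]            # a toy caption as token ids
logits = []
for token in ground_truth:             # teacher forcing: feed the true tokens
    h = rnn_step(h, E[token])
    logits.append(h @ W_out)           # scores for predicting the next token
```

Each entry of `logits` scores the next token given the image and the ground-truth prefix; at inference time the model's own sampled tokens would be fed back instead.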

## Dataset

The dataset is a collection of images and captions; here we use the COCO dataset. For each image, a set of sentences (captions) serves as the label describing the scene.

## Results

Following are a few results obtained after training the model for 12 epochs.

- Generated caption: a close up of a laptop computer on a desk
- Generated caption: a elephant is standing in the dirt next to a fence
- Generated caption: a baseball player swinging a bat at a ball
- Generated caption: a group of people standing around a boat on a river
- Generated caption: a man sitting at a table with a laptop computer
- Generated caption: a group of people sitting on a couch playing video games
