Image captioning is the task of generating a short description of a given image. When an image containing multiple objects is shown to humans, each person perceives it differently and therefore produces a different caption. Generating captions from images can thus offer insight into a deep learning model's perspective on the image. We build an attention-based image captioning system that preserves the model's perspective. Because captions are language specific, the choice of language can affect the performance of an image caption generator. We therefore compare Tamil and Telugu captions for building accurate image caption generators. We use the Flickr8k dataset and translate its captions into Tamil and Telugu using Google Translate. The image captioning model used in this project combines CNN and LSTM networks: the CNN serves as a feature extractor, extracting semantic and object information from the input image and producing attention maps, and this information is passed into an LSTM that predicts the image captions.
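As a rough illustration of the attention step described above, the sketch below shows Bahdanau-style soft attention over CNN feature-map regions: at each decoding step, the LSTM hidden state scores every spatial region, and a softmax over those scores produces the attention map used to weight the features. All weight matrices here are random placeholders for illustration, not the trained parameters of this project.

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, v):
    """Soft attention over CNN feature-map regions (a sketch).

    features: (L, D) -- L spatial regions from the CNN feature extractor
    hidden:   (H,)   -- current LSTM hidden state
    W_f: (D, A), W_h: (H, A), v: (A,) -- learned projections (random here)
    Returns the context vector (D,) and the attention weights (L,).
    """
    # Score each region against the current hidden state.
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v   # (L,)
    # Softmax over regions gives the attention map.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: attention-weighted sum of region features,
    # which is fed into the LSTM for the next prediction step.
    context = weights @ features
    return context, weights

rng = np.random.default_rng(0)
L, D, H, A = 64, 512, 256, 128     # e.g. an 8x8 feature map with 512 channels
features = rng.standard_normal((L, D))
hidden = rng.standard_normal(H)
W_f = rng.standard_normal((D, A))
W_h = rng.standard_normal((H, A))
v = rng.standard_normal(A)
context, weights = soft_attention(features, hidden, W_f, W_h, v)
```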
cse99388/Image-Captioning