Image Captioning using Recurrent Neural Networks
- In this project we use deep neural network models to caption Flickr images.
- The dataset contains 8091 images. Each image has an ID and 5 captions.
- We used a pretrained BERT model to get the embeddings and an LSTM layer to generate the captions.
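
The dataset layout described above (one ID per image, several captions per ID) can be sketched as a dictionary from image ID to a list of captions. This is only an illustration: the line layout `<image file>#<caption index><TAB><caption>`, the sample lines, and the names `SAMPLE_LINES`, `group_captions` and `captions_by_image` are assumptions and are not taken from the project.

```python
from collections import defaultdict

# Made-up sample lines in an ASSUMED layout:
#   "<image file>#<caption index><TAB><caption>"
# The real caption file used by the project may look different.
SAMPLE_LINES = [
    "img_001.jpg#0\tA dog runs across a field .",
    "img_001.jpg#1\tA brown dog is running outside .",
    "img_002.jpg#0\tTwo children play on a beach .",
]


def group_captions(lines):
    """Map each image ID to the list of its (lower-cased) captions."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, caption = line.split("\t", 1)
        # Drop the "#<index>" suffix and the file extension to get the ID.
        image_id = key.split("#")[0].rsplit(".", 1)[0]
        captions[image_id].append(caption.strip().lower())
    return dict(captions)


captions_by_image = group_captions(SAMPLE_LINES)
```

On the full dataset, a mapping built this way would be expected to hold 8091 keys with 5 captions each.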
|--------------------------------|     |--------------------------------|
|     pictures_input(2048,)      |     |  captions_input(max_length,)   |
|--------------------------------|     |--------------------------------|
                ↓                                      ↓
|--------------------------------|     |--------------------------------|
|          Dropout(0.5)          |     |   Embedding(vocab_size, 128)   |
|--------------------------------|     |--------------------------------|
                ↓                                      ↓
|--------------------------------|     |--------------------------------|
|        Dense(256, relu)        |     |           LSTM(128)            |
|--------------------------------|     |--------------------------------|
                ↓                                      ↓
|--------------------------------|                     ↓
|          Dropout(0.5)          |                     ↓
|--------------------------------|                     ↓
                ↓                                      ↓
|--------------------------------|                     ↓
|        Dense(256, relu)        |                     ↓
|--------------------------------|                     ↓
                ↓                                      ↓
|-----------------------------------------------------------------------|
|                              Concatenate                              |
|-----------------------------------------------------------------------|
                                    ↓
|-----------------------------------------------------------------------|
|                           Dense(128, relu)                            |
|-----------------------------------------------------------------------|
                                    ↓
|-----------------------------------------------------------------------|
|                      Dense(vocab_size, softmax)                       |
|-----------------------------------------------------------------------|
                                    ↓
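
The diagram above can be written with the Keras functional API roughly as follows. This is a minimal sketch and not necessarily the project's actual code: the function name `build_caption_model` and the example values `vocab_size=5000` and `max_length=34` are hypothetical, the 2048-dimensional picture input is assumed to be a precomputed image feature vector, and the `Embedding` layer here is a plain trainable layer as drawn in the diagram (loading pretrained BERT embeddings is not shown).

```python
from tensorflow.keras.layers import (
    Concatenate, Dense, Dropout, Embedding, Input, LSTM,
)
from tensorflow.keras.models import Model


def build_caption_model(vocab_size: int, max_length: int) -> Model:
    """Two-branch model following the diagram (hypothetical helper)."""
    # Picture branch: one 2048-d feature vector per image.
    pictures_input = Input(shape=(2048,), name="pictures_input")
    x = Dropout(0.5)(pictures_input)
    x = Dense(256, activation="relu")(x)
    x = Dropout(0.5)(x)
    x = Dense(256, activation="relu")(x)

    # Caption branch: a sequence of max_length token indices.
    captions_input = Input(shape=(max_length,), name="captions_input")
    y = Embedding(vocab_size, 128)(captions_input)
    y = LSTM(128)(y)  # keeps only the final hidden state (128-d)

    # Merge both branches (256 + 128 = 384 features) and score the vocabulary.
    merged = Concatenate()([x, y])
    z = Dense(128, activation="relu")(merged)
    outputs = Dense(vocab_size, activation="softmax")(z)

    return Model(inputs=[pictures_input, captions_input], outputs=outputs)


# Example values only; the real vocabulary size and caption length
# depend on the tokenizer and the dataset.
model = build_caption_model(vocab_size=5000, max_length=34)
```

Calling `model.summary()` prints the layer shapes, which can be checked against the diagram. The softmax output has one probability per vocabulary entry, which suggests the model scores the next word given the image and the caption so far.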