# Image Caption Generator
This project aims to build a model that automatically generates a caption for an image from features extracted from the photo.

## Business Understanding
[...]

## Data Understanding
The dataset used throughout this process is the Flickr8k dataset.

### Resources:
1. This [post](https://www.analyticsvidhya.com/blog/2020/11/create-your-own-image-caption-generator-using-keras/) discusses implementing an auto-caption generator in Keras.
2. Download the Flickr8k dataset [here](https://www.kaggle.com/datasets/adityajn105/flickr8k).

#### Load Dependencies

In [38]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import string
import os
import glob
from nltk import word_tokenize
from PIL import Image
from time import time
from keras import Input, layers, optimizers
from keras.preprocessing import sequence, image
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import LSTM, Embedding, Dense, Activation, Flatten, Reshape, Dropout
# import from keras.layers directly; the wrappers/merge submodule paths are not available in newer Keras releases
from keras.layers import Bidirectional, add
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.models import Model
from keras.utils import to_categorical

In [39]:
# set up filepath for text file
caption_path = '/Users/addingtongraham/Documents/datasets/Flickr8k/'

In [40]:
# read the full contents of captions.txt, closing the file afterwards
with open(os.path.join(caption_path, 'captions.txt'), 'r') as f:
    captions = f.read()

# print first 10 lines
print(captions.split('\n')[:10])

['image,caption', '1000268201_693b08cb0e.jpg,A child in a pink dress is climbing up a set of stairs in an entry way .', '1000268201_693b08cb0e.jpg,A girl going into a wooden building .', '1000268201_693b08cb0e.jpg,A little girl climbing into a wooden playhouse .', '1000268201_693b08cb0e.jpg,A little girl climbing the stairs to her playhouse .', '1000268201_693b08cb0e.jpg,A little girl in a pink dress going into a wooden cabin .', '1001773457_577c3a7d70.jpg,A black dog and a spotted dog are fighting', '1001773457_577c3a7d70.jpg,A black dog and a tri-colored dog playing with each other on the road .', '1001773457_577c3a7d70.jpg,A black dog and a white dog with brown spots are staring at each other in the street .', '1001773457_577c3a7d70.jpg,Two dogs of different breeds looking at each other on the road .']


Each line pairs an image filename with one caption, separated by a comma. Looking at the above, we can see that each image has several captions. For this reason, we will parse all the captions and build a dictionary with the image ID as the key and a list of that image's captions as the value.

We can also see that the first item in `captions.split('\n')` can be dropped, as it is a header row (`image,caption`).

In [41]:
# drop first item
split_captions = captions.split('\n')[1:]
split_captions[:10]

['1000268201_693b08cb0e.jpg,A child in a pink dress is climbing up a set of stairs in an entry way .',
 '1000268201_693b08cb0e.jpg,A girl going into a wooden building .',
 '1000268201_693b08cb0e.jpg,A little girl climbing into a wooden playhouse .',
 '1000268201_693b08cb0e.jpg,A little girl climbing the stairs to her playhouse .',
 '1000268201_693b08cb0e.jpg,A little girl in a pink dress going into a wooden cabin .',
 '1001773457_577c3a7d70.jpg,A black dog and a spotted dog are fighting',
 '1001773457_577c3a7d70.jpg,A black dog and a tri-colored dog playing with each other on the road .',
 '1001773457_577c3a7d70.jpg,A black dog and a white dog with brown spots are staring at each other in the street .',
 '1001773457_577c3a7d70.jpg,Two dogs of different breeds looking at each other on the road .',
 '1001773457_577c3a7d70.jpg,Two dogs on pavement moving toward each other .']

In [47]:
# tokenize a single line and inspect the first token, which is the image filename
line = split_captions[1]
tokens = word_tokenize(line)
tokens[0]

'1000268201_693b08cb0e.jpg'

With the header row removed, we want to use the image filename as a key in a dictionary, with all of that image's captions held in a list as the value.

In [52]:
caption_dict = {}
for line in split_captions:
    
    # ensure line is longer than 2 chars
    if len(line) > 2:
        
        # generate tokens
        tokens = word_tokenize(line)
        img_id = tokens[0]
        img_caption = ' '.join(tokens[1:])

        # check if the id is already in the dictionary 
        if img_id not in caption_dict:

            # add to dictionary if not in already
            caption_dict[img_id] = []

        # append description to appropriate id
        caption_dict[img_id].append(img_caption)

In [59]:
# check this worked
print(caption_dict['1000268201_693b08cb0e.jpg'])
print('\n', caption_dict['1001773457_577c3a7d70.jpg'])

[', A child in a pink dress is climbing up a set of stairs in an entry way .', ', A girl going into a wooden building .', ', A little girl climbing into a wooden playhouse .', ', A little girl climbing the stairs to her playhouse .', ', A little girl in a pink dress going into a wooden cabin .']

 [', A black dog and a spotted dog are fighting', ', A black dog and a tri-colored dog playing with each other on the road .', ', A black dog and a white dog with brown spots are staring at each other in the street .', ', Two dogs of different breeds looking at each other on the road .', ', Two dogs on pavement moving toward each other .']
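
The output above shows every caption beginning with `, `. This is likely because `word_tokenize` emits the comma that separates the filename from the caption as its own token, so `tokens[1:]` still includes it. A minimal sketch of one alternative, assuming each line has the form `filename,caption` as in the output above, is to split each line on its first comma with `str.partition`, which skips tokenization entirely and leaves the caption text unchanged. The names `sample_lines`, `build_caption_dict`, and `clean_caption_dict` are hypothetical and not part of the original notebook.

```python
from collections import defaultdict

# Sample lines copied from the captions.txt output shown above,
# plus a hypothetical blank line to exercise the guard below
sample_lines = [
    "1000268201_693b08cb0e.jpg,A girl going into a wooden building .",
    "1000268201_693b08cb0e.jpg,A little girl climbing into a wooden playhouse .",
    "1001773457_577c3a7d70.jpg,A black dog and a spotted dog are fighting",
    "",
]


def build_caption_dict(lines):
    """Group captions by image id, splitting each line on its first comma only."""
    grouped = defaultdict(list)
    for line in lines:
        img_id, sep, caption = line.partition(',')
        # skip blank or malformed lines that contain no comma
        if not sep:
            continue
        grouped[img_id].append(caption.strip())
    return dict(grouped)


clean_caption_dict = build_caption_dict(sample_lines)
print(clean_caption_dict['1000268201_693b08cb0e.jpg'])
# ['A girl going into a wooden building .', 'A little girl climbing into a wooden playhouse .']
```

On the full file, the same function could be called as `build_caption_dict(split_captions)`. Because only the first comma is used as the separator, commas inside a caption stay part of the caption text.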
