[View in Colaboratory](https://colab.research.google.com/github/SwapnilSParkhe/Project-Image_Caption_Generation/blob/master/Building&Fitting_Model.ipynb)

# Building Model Architecture and Fitting to Data

**Checking GPU status**

In [2]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 4435247333737526409, name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 11288962663
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 15647989203120819664
 physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"]

## Analytical Data Creation

**Image ID or Identifiers**

In [5]:
#Uploading relevant files from local to cloud (using google.colab lib)

#Library for Uploading data from local to cloud
from google.colab import files

#Upload train image text
files.upload()   #upload files 

#Upload valid image text
files.upload()   #upload files

{}

In [0]:
#Importing file: reading the content of a text file into a string
def import_file(input_file):
    with open(input_file,'r') as file:   #file is closed automatically on exit
        content=file.read()   #reading the whole file
    return content

imported_train=import_file('Flickr_8k.trainImages.txt') 
imported_valid=import_file('Flickr_8k.devImages.txt')    

#Creating a set of image-IDs
def create_img_set(file):
    imgID_set=set()
    for item in file.split('\n'):   #accessing line by line
        if len(item)<1:   #skipping empty lines
            continue
        imgID=item.split('.')[0]   #only taking imgID (dropping the 'jpg' extension)
        imgID_set.add(imgID)   #adding imgID to imgID_set
    return imgID_set

imgID_trainset=create_img_set(imported_train)
imgID_validset=create_img_set(imported_valid)
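
To make the ID extraction concrete, here is a minimal sketch that applies the same split-and-strip logic to an in-memory string instead of an uploaded file. The file names in `sample` are made up for illustration, and `ids_from_text` is a hypothetical stand-in for `create_img_set`.

```python
def ids_from_text(text):
    """Return the set of image IDs (file names without extension) found in text."""
    ids = set()
    for line in text.split('\n'):
        if not line:                  # skip empty lines, e.g. after a trailing newline
            continue
        ids.add(line.split('.')[0])   # keep everything before the first '.'
    return ids

# Hypothetical file content; the real IDs come from Flickr_8k.trainImages.txt
sample = "img_0001.jpg\nimg_0002.jpg\n\nimg_0001.jpg\n"
print(sorted(ids_from_text(sample)))
```

Because the result is a set, a repeated file name contributes only one ID.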

**Importing previously created files (from PreprocessingData NoteBook): Img desc and Img features**

In [5]:
#Uploading relevant files from local to cloud (using google.colab lib)

#Library for Uploading data from local to cloud
from google.colab import files

#Upload cleaned organised text file (from Text preprocessing step)
files.upload()   #upload files 

#Upload features or weights file (from Image preprocessing step)
files.upload()   #upload files

Saving cln_orgnse_text.txt to cln_orgnse_text.txt


Saving features.pkl to features.pkl


In [0]:
#Importing image descriptions for the image IDs in the given dataset (train or valid)
def import_prepro_desc(prepro_file, dataset):
    file=import_file(prepro_file)
    desc=dict()
    for item in file.split('\n'):
        tokens=item.split()   #splitting by whitespaces
        if len(tokens)<1:   #skipping blank lines (tokens[0] would raise IndexError)
            continue
        image_ID,image_desc=tokens[0],tokens[1:]   #separating ID, desc
        if image_ID in dataset:   #keeping only imgIDs present in the dataset
            if image_ID not in desc:   #new list for new image_ID key 
                desc[image_ID]=list()
            desc_='start ' + ' '.join(image_desc)+' end'   #wrap in start/end tokens
            desc[image_ID].append(desc_)
    return desc

desc_train=import_prepro_desc('cln_orgnse_text.txt',imgID_trainset)
desc_valid=import_prepro_desc('cln_orgnse_text.txt',imgID_validset)

#Importing image features for the image IDs in the given dataset (train or valid)
from pickle import load
def import_features(feature_file, dataset):
    with open(feature_file, 'rb') as f:   #file is closed automatically on exit
        all_features = load(f)  #load all features
    features = {k: all_features[k] for k in dataset} #keep only the dataset's imgIDs
    return features

feature_train=import_features('features.pkl',imgID_trainset) #used later
feature_valid=import_features('features.pkl',imgID_validset) #used later
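
The grouping step can be checked without the uploaded files. The sketch below assumes, as the code above implies, that each line of the cleaned text file has the form `<image ID> <words...>`. The sample lines and the name `group_descriptions` are hypothetical.

```python
def group_descriptions(text, wanted_ids):
    """Map each wanted image ID to its list of 'start ... end' wrapped captions."""
    desc = {}
    for line in text.split('\n'):
        tokens = line.split()
        if not tokens:                      # blank line: nothing to parse
            continue
        image_id, words = tokens[0], tokens[1:]
        if image_id in wanted_ids:          # keep only IDs from the chosen split
            wrapped = 'start ' + ' '.join(words) + ' end'
            desc.setdefault(image_id, []).append(wrapped)
    return desc

# Hypothetical content in the assumed "<image ID> <words...>" format
sample = ("img_0001 dog runs on grass\n"
          "img_0001 brown dog outside\n"
          "img_0002 child on swing\n")
grouped = group_descriptions(sample, {'img_0001'})
print(grouped)
```

Each image keeps all of its captions, and captions of images outside the requested split are dropped.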

**Training data manipulations: Creating a custom Tokenizer function and tokenizing descriptions**

In [8]:
#Creating a simple list of desc from dict of desc
def dict2list(input_dict):
    desc_list=list()
    for key in input_dict.keys():
        desc_list.extend(input_dict[key])   #adding all descriptions of this image
    return desc_list

desc_train_list=dict2list(desc_train)

#tokenizing (could be improved by filtering English stopwords later)
#Note: turning each text into a sequence of integers (integer: token ID)
from keras.preprocessing.text import Tokenizer
def tokenize(input_list):
    tokenizer=Tokenizer()
    tokenizer.fit_on_texts(input_list)
    return tokenizer

tokenizer=tokenize(desc_train_list) #to be used later
vocab_size=len(tokenizer.word_index)+1 #to be used later
print("Vocab Size:",vocab_size)

#Length of the description with the most words
#Note: the function has its own name so that the max_length variable does not overwrite it
def get_max_length(desc_list):
    max_len=max([len(item.split()) for item in desc_list])
    return max_len
max_length = get_max_length(desc_train_list) #to be used later
print('Description Length', max_length)

#Longest desc check
def longest_desc(desc_list):
    max_length=max([len(item.split()) for item in desc_list])
    print("Max_len:",max_length)
    print("Desc:", [item for item in desc_list if len(item.split())==max_length])

longest_desc(desc_train_list)

Using TensorFlow backend.


Vocab Size: 7264
Description Length 33
Max_len: 33
Desc: ['start an man wearing green sweatshirt and blue vest is holding up dollar bills in front of his face while standing on busy sidewalk in front of group of men playing instruments end']
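
As a rough illustration of what `fit_on_texts` produces, the pure-Python sketch below builds a word-to-integer index ranked by frequency, starting at 1 so that 0 stays free for padding. That reserved 0 is why `vocab_size` is `len(tokenizer.word_index)+1`. This only imitates the idea and is not Keras code: the helper names and captions are hypothetical, and the real `Tokenizer` also lowercases text and strips punctuation.

```python
from collections import Counter

def build_word_index(texts):
    """Assign integer IDs from 1 upward, most frequent words first."""
    counts = Counter(word for text in texts for word in text.split())
    ranked = [word for word, _ in counts.most_common()]
    return {word: i for i, word in enumerate(ranked, start=1)}

def encode(text, word_index):
    """Turn a caption into a list of integer IDs, dropping unknown words."""
    return [word_index[w] for w in text.split() if w in word_index]

# Hypothetical captions, already wrapped in start/end tokens
captions = ['start dog runs end', 'start dog sits end']
word_index = build_word_index(captions)
vocab_size_demo = len(word_index) + 1   # +1 for the padding ID 0

print(word_index)
print('vocab size:', vocab_size_demo)
print(encode('start dog sits end', word_index))
```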


**LSTM's Analytical Dataset: Input(ImageID and Seq_item)-Output(SeqWord) data**

In [0]:
#Creating ADS for LSTM: Input(Image_ID and Seq_item)-Output(SeqWord)
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
import numpy as np
def create_ADS(tokenizer, max_length, desc_list, img):
    X_img_ID, X_desc_item, y=list(), list(), list()
    for desc in desc_list:
        seq=tokenizer.texts_to_sequences([desc])[0] #encoding seq
        for i in range(1,len(seq)): #split seq into multiple X,y pairs
            in_seq, out_seq=seq[:i], seq[i] #desc input-output pair
            in_seq=pad_sequences([in_seq], maxlen=max_length)[0] #left-pad with 0s
            out_seq=to_categorical([out_seq], num_classes=vocab_size)[0] #uses global vocab_size
            X_img_ID.append(img) #appending image features (repeated for each pair)
            X_desc_item.append(in_seq)  #padded input sequence
            y.append(out_seq)   #oneHot encoded version of output word
    return np.array(X_img_ID), np.array(X_desc_item), np.array(y)

#Progressive Data Loading: Generate data (yield one photo’s data/batch) 
#Note: intended to be used in a call to model.fit_generator()
def generate_data(tokenizer, max_length, desc_dict, img):
    while 1:   #loop forever over images
        for key, desc_list in desc_dict.items(): #one image and its descriptions
            img_ = img[key][0]  #feature vector of this image ID
            in_img,in_seq,out_word=create_ADS(tokenizer,
                                              max_length,
                                              desc_list,img_)
            yield [[in_img, in_seq], out_word]
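
The way `create_ADS` turns one caption into several training samples is easier to see on a tiny example. The sketch below is pure Python and leaves out the one-hot encoding of the target. `left_pad` imitates the default behaviour of `pad_sequences` (zeros added at the front), and both helper names and the encoded caption are hypothetical.

```python
def left_pad(seq, maxlen):
    """Keep the last maxlen items and fill the front with zeros."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

def expand_caption(seq, maxlen):
    """Split an encoded caption into (padded prefix, next word) pairs."""
    pairs = []
    for i in range(1, len(seq)):
        pairs.append((left_pad(seq[:i], maxlen), seq[i]))
    return pairs

# Hypothetical encoded caption, e.g. 'start dog sits end'
encoded = [1, 2, 5, 3]
for prefix, target in expand_caption(encoded, maxlen=5):
    print(prefix, '->', target)
```

A caption of n tokens yields n-1 pairs, so every image contributes many rows to each batch the generator yields.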

## Defining the model (Merge Model of Embeddings+LSTMs with CNN penultimate layer)
**Note:** The model combines the encoded image (its features) with the encoded text (the context of the description generated so far). A very simple decoder then uses this combined encoding to generate the next word in the sequence.

![Merge Model of Image Captioning](https://i.pinimg.com/originals/35/8b/dc/358bdc11e71f8c78632560c7c819919d.png)

**Importing relevant libraries**

In [0]:
from keras.layers import Input, Dropout, Dense #feat. encoding
from keras.layers import Embedding, LSTM #desc. encoding (also uses Dropout)
from keras.layers import add #decoding
from keras.models import Model #Model-Input-Output architecture

In [11]:
def build_model_arch(vocab_size, max_length):
    #Encoder Models (Img-Feat and Desc Encoding)
    #1.Image feature extractor model
    feat_input=Input(shape=(4096,))
    feat_1=Dropout(0.5)(feat_input)
    feat_2=Dense(256, activation='relu')(feat_1)

    #2.Embedding+LSTM sequence model
    desc_input=Input(shape=(max_length,))
    desc_1=Embedding(vocab_size, 256, mask_zero=True)(desc_input)
    desc_2=Dropout(0.5)(desc_1)
    desc_3=LSTM(256)(desc_2)

    #Decoder Model ('adding' above encoding model layers; with FFNs)
    deco_1=add([feat_2, desc_3]) #adding element wise for both vectors
    deco_2=Dense(256, activation='relu')(deco_1)
    output=Dense(vocab_size, activation='softmax')(deco_2)

    #Creating Model-Input-Output architecture; Compiling (with loss, opt.)
    model=Model(inputs=[feat_input, desc_input], outputs=output)
    model.compile(loss='categorical_crossentropy', optimizer='adam')

    #Summarizing model (summary() prints the table itself and returns None)
    model.summary()
    return model

model = build_model_arch(vocab_size, max_length)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 33)           0                                            
__________________________________________________________________________________________________
input_1 (InputLayer)            (None, 4096)         0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 33, 256)      1859584     input_2[0][0]                    
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 4096)         0           input_1[0][0]                    
__________________________________________________________________________________________________
dropout_2 
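
The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for embedding, dense and LSTM layers with the sizes from `build_model_arch` and the logged vocabulary size of 7264. Only the embedding count (1859584) can be checked against the truncated output above. The other figures are computed under the assumption of a standard LSTM with four gates and one bias vector per gate, and the helper names are hypothetical.

```python
def embedding_params(vocab, dim):
    """One dim-sized vector per vocabulary entry."""
    return vocab * dim

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

def lstm_params(n_in, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((n_in + units) * units + units)

vocab_size_logged = 7264   # 'Vocab Size' printed earlier in the notebook
counts = {
    'embedding (7264 -> 256)': embedding_params(vocab_size_logged, 256),
    'dense on image features (4096 -> 256)': dense_params(4096, 256),
    'lstm (256 -> 256)': lstm_params(256, 256),
    'decoder dense (256 -> 256)': dense_params(256, 256),
    'output dense (256 -> 7264)': dense_params(256, vocab_size_logged),
}
for name, n in counts.items():
    print(name, n)
print('total', sum(counts.values()))
```

The dropout and add layers have no weights, so they do not appear in the tally.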

## Fitting Model

In [12]:
#Defining checkpoint callback; specifying model hyperparams
from keras.callbacks import ModelCheckpoint
filepath = 'best_model_weights.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, 
                             save_best_only=True, mode='min')
epochs_=10
steps_train=len(desc_train)  #steps=N/batch_size
steps_valid=len(desc_valid)  #steps=N/batch_size

#Fitting model to generated data (along side validation loss checks)
generated_data_train=generate_data(tokenizer, max_length, 
                                   desc_train, feature_train)
generated_data_valid=generate_data(tokenizer, max_length, 
                                   desc_valid, feature_valid)
model.fit_generator(generated_data_train, epochs=epochs_,
                    steps_per_epoch=steps_train,
                    validation_data=generated_data_valid,
                    validation_steps=steps_valid,
                    callbacks=[checkpoint], verbose=1)

#Downloading best model
from google.colab import files
files.download('best_model_weights.h5')

Epoch 1/10

Epoch 00001: val_loss improved from inf to 4.11764, saving model to best_model_weights.h5
Epoch 2/10
 296/6000 [>.............................] - ETA: 11:49 - loss: 4.0182

Epoch 00002: val_loss improved from 4.11764 to 3.91799, saving model to best_model_weights.h5
Epoch 3/10
 567/6000 [=>............................] - ETA: 11:13 - loss: 3.7551

Epoch 00003: val_loss improved from 3.91799 to 3.85186, saving model to best_model_weights.h5
Epoch 4/10
 567/6000 [=>............................] - ETA: 11:12 - loss: 3.5764

Epoch 00004: val_loss improved from 3.85186 to 3.82874, saving model to best_model_weights.h5
Epoch 5/10
 567/6000 [=>............................] - ETA: 11:12 - loss: 3.4353

Epoch 00005: val_loss did not improve
Epoch 6/10
 833/6000 [===>..........................] - ETA: 10:38 - loss: 3.3712

Epoch 00006: val_loss did not improve
Epoch 7/10
 833/6000 [===>..........................] - ETA: 10:43 - loss: 3.3277

Epoch 00007: val_loss did not improve
Epoch 8/10
 833/6000 [===>..........................] - ETA: 10:40 - loss: 3.2781

Epoch 00008: val_loss did not improve
Epoch 9/10
 833/6000 [===>..........................] - ETA: 10:36 - loss: 3.2479

Epoch 00009: val_loss did not improve
Epoch 10/10
 826/6000 [===>..........................] - ETA: 10:35 - loss: 3.2285

Epoch 00010: val_loss did not improve

MessageError: ignored
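
The pattern in the log follows from `save_best_only=True` with `mode='min'`: weights are written only when the monitored `val_loss` drops below the best value seen so far. The sketch below is a pure-Python imitation of that rule, not the Keras implementation. It uses the four `val_loss` values printed in the log, and the extra value 3.9 is hypothetical, added only to show a non-improving epoch. The function name is also hypothetical.

```python
import math

def epochs_that_save(val_losses):
    """Return the 1-based epochs at which a 'keep best, lower is better' rule saves."""
    best = math.inf          # matches 'improved from inf' in the first epoch
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:      # strictly better than the best so far
            best = loss
            saved.append(epoch)
    return saved

logged = [4.11764, 3.91799, 3.85186, 3.82874]   # values taken from the log above
print(epochs_that_save(logged))
print(epochs_that_save(logged + [3.9]))         # 3.9 is a hypothetical fifth value
```

Training loss keeps falling after epoch 4 while the validation loss no longer improves, so the saved file holds the epoch 4 weights.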

In [0]:
#Training manually (if above doesn't work)
epochs=20
steps=len(desc_train)  #steps=N/batch_size
steps_valid=len(desc_valid)  #needed because validation data is a generator
for epoch in range(epochs):
    generated_data_train=generate_data(tokenizer, max_length, 
                                       desc_train, feature_train)
    generated_data_valid=generate_data(tokenizer, max_length, 
                                       desc_valid, feature_valid)
    model.fit_generator(generated_data_train,steps_per_epoch=steps, 
                        validation_data= generated_data_valid,
                        validation_steps=steps_valid,
                        epochs=1, verbose=1)
    model.save('model_' + str(epoch) + '.h5')
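
Both training cells set the number of steps per epoch to `len(desc_train)` because `generate_data` never terminates and yields the samples of one image per step, so the step count is what tells Keras where an epoch ends. The sketch below shows the same idea with a plain Python generator and `itertools.islice`. The image IDs and the name `cycle_batches` are hypothetical, and Keras may prefetch batches in the background, so the alignment in a real run is only approximate.

```python
from itertools import islice

def cycle_batches(items):
    """Yield one item per step, looping over items forever."""
    while True:
        for item in items:
            yield item

images = ['img_a', 'img_b', 'img_c']   # hypothetical image IDs
steps_per_epoch = len(images)          # one step per image
gen = cycle_batches(images)

for epoch in range(2):
    seen = list(islice(gen, steps_per_epoch))   # what one epoch consumes
    print('epoch', epoch, seen)
```

With the step count equal to the number of images, each epoch passes over every image once before the generator wraps around.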

In [0]:
#Downloading best model
from google.colab import files
files.download('best_model_weights.h5')

In [2]:
ls

[0m[01;34mdatalab[0m/


In [4]:
ls

best_model_weights.h5  Flickr_8k.devImages (1).txt
cln_orgnse_text.txt    Flickr_8k.devImages.txt
[0m[01;34mdatalab[0m/               Flickr_8k.trainImages (1).txt
features.pkl           Flickr_8k.trainImages.txt
