# Image Captioning using Deep Learning

### CS5661 - Topics in Data Science
#### Group :- Hiralben Hirpara, Ruchita Savaliya
--------------------------------------------------------------------------------------------------------------------
In this notebook, we extract features from images using the VGG16 model and store them in pickle files: encode_train.pkl for the training set and encode_test.pkl for the testing set.

### Import important libraries

In [2]:
from pickle import dump, load

#import numpy as np

from keras.preprocessing.image import load_img, img_to_array
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.models import Model

print('All modules imported.')

All modules imported.


#### The location of the image directory and the pickle file directory

In [3]:
## Set file paths
img_dir = 'IMGCG-DataSet/Flickr8k_Dataset/Images/'
pickle_file = 'IMGCG-DataSet/Pickle File/'

### Load pickle data

In [4]:
## Load training image paths from the pickle file
with open(pickle_file + 'train_img_paths.pkl', 'rb') as fid:
    train_img_paths = load(fid)

## Load testing image paths from the pickle file
with open(pickle_file + 'test_img_paths.pkl', 'rb') as fid:
    test_img_paths = load(fid)


In [5]:
print("Size of Training Set Images: ",len(train_img_paths))
print("Size of Testing Set Images: ",len(test_img_paths))

Size of Training Set Images:  6000
Size of Testing Set Images:  1000
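
Before extracting features, it can be worth confirming that the two path lists contain no duplicates and do not overlap, since an image present in both sets would leak test data into training. The following is a minimal sketch, assuming the lists hold plain filename strings; `summarize_split` is a hypothetical helper and the filenames are toy values, not part of the notebook.

```python
def summarize_split(train_paths, test_paths):
    """Count unique and overlapping image paths (hypothetical helper)."""
    train_unique = set(train_paths)
    test_unique = set(test_paths)
    return {
        'train_unique': len(train_unique),
        'test_unique': len(test_unique),
        'overlap': len(train_unique & test_unique),
    }


# Toy filenames used purely for illustration
example = summarize_split(['a.jpg', 'b.jpg', 'b.jpg'], ['c.jpg', 'a.jpg'])
print(example)
```

With the real data, an `overlap` of 0 would indicate that the training and testing sets are disjoint.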


## Step 2 :  Prepare Image Data

In [6]:
## Define VGG16 model
base_model = VGG16(include_top=True,weights='imagenet')

## Model Summary
base_model.summary()

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0     

In [7]:
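## Build the feature extractor: reuse the VGG16 input and take the output of
## the second-to-last layer (4096 units), dropping the final classification layer
new_input = base_model.input
hidden_layer = base_model.layers[-2].output

model = Model(new_input, hidden_layer)
model.summary()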

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0     

In [9]:
%%time

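# Extract a VGG16 feature vector for each image path
def extract_img_feature(image_paths):
    # Map image filename -> feature array of shape (1, 4096)
    features = dict()
    for file in image_paths:
        file = img_dir+file
        # Load the image at the 224x224 input size VGG16 expects
        img = load_img(file, target_size=(224, 224))
        img = img_to_array(img)
        # Add a leading batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
        img = img.reshape((1, img.shape[0], img.shape[1], img.shape[2]))
        # Apply the VGG16-specific preprocessing
        img = preprocess_input(img)
        # Predict the feature vector
        batch_features = model.predict(img, verbose=1)
        # Use the filename (without the directory) as the dictionary key
        image_id = file.split('/')[-1]
        features[image_id] = batch_features
    return features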


CPU times: user 5 µs, sys: 1 µs, total: 6 µs
Wall time: 10 µs
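
The function above calls `model.predict` once per image. Keras models also accept a batch of images in a single call, so grouping the paths into fixed-size chunks is one possible way to reduce per-call overhead. The sketch below shows only the chunking step; `chunked` and the batch sizes are illustrative assumptions, and loading, stacking, and predicting on each batch are left out.

```python
def chunked(items, batch_size=32):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Toy example: 5 filenames split into batches of 2
batches = list(chunked(['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg'], batch_size=2))
print(batches)
```

Each batch could then be loaded and stacked into one array of shape (batch, 224, 224, 3) before being passed to the model, which would return one 4096-value row per image.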


###  Training set  :

In [12]:
%%time
# Get unique images
unique_train_paths = sorted(set(train_img_paths))
print("Size of Train Images: ",len(unique_train_paths))

# Extract features: dictionary of image filename -> feature array
encode_train = extract_img_feature(unique_train_paths)

Size of Train Images:  6000

CPU times: user 1h 21min 46s, sys: 5min, total: 1h 26min 47s
Wall time: 15min 8s


In [16]:
print(encode_train['1000268201_693b08cb0e.jpg'])

[[2.5076473 0.        0.        ... 0.        0.        0.       ]]


#### Training image feature size and shape :

In [17]:
print("Shape of one Image: ",encode_train["1000268201_693b08cb0e.jpg"].shape)

Shape of one Image:  (1, 4096)


###  Testing set  :

In [25]:
%%time
# Get unique images
unique_test_paths = sorted(set(test_img_paths))
print("Size of Test Images: ",len(unique_test_paths))

# Extract features: dictionary of image filename -> feature array
encode_test = extract_img_feature(unique_test_paths)

Size of Test Images:  1000

CPU times: user 14min 43s, sys: 57.9 s, total: 15min 41s
Wall time: 4min 23s


In [22]:
encode_test['1056338697_4f7d7ce270.jpg']

array([[0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.9546603]], dtype=float32)

#### Testing image feature size and shape :

In [23]:
print("Shape of one Image: ",encode_test["1056338697_4f7d7ce270.jpg"].shape)

Shape of one Image:  (1, 4096)


### Store Training set image features in pickle file :

In [125]:
# Use a context manager so the file is flushed and closed after writing
with open('encode_train.pkl', 'wb') as fid:
    dump(encode_train, fid)

### Store Testing set image features in pickle file :

In [126]:
# Use a context manager so the file is flushed and closed after writing
with open('encode_test.pkl', 'wb') as fid:
    dump(encode_test, fid)
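
A later notebook can read these files back with `pickle.load`. The sketch below round-trips a small stand-in dictionary through a temporary file to show the pattern. The nested list stands in for the (1, 4096) feature arrays so the example runs without Keras, and the file name `encode_example.pkl` and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile
from pickle import dump, load

# Stand-in for the real feature dictionary (image filename -> feature vector)
features = {'example.jpg': [[0.0, 1.5, 0.0]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'encode_example.pkl')
    # Write the dictionary, then read it back from the same file
    with open(path, 'wb') as fid:
        dump(features, fid)
    with open(path, 'rb') as fid:
        restored = load(fid)

# The restored dictionary should match what was written
print(restored == features)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.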