# Image Captioning using Deep Learning

### CS5661 - Topics in Data Science
#### Group :- Hiralben Hirpara, Ruchita Savaliya
-------------------------------------------------------------------------------------------------------------------
In this file, we extract features from images using the InceptionV3 model and store them in pickle files: inceptionV3_encode_train.pkl for the training set and inceptionV3_encode_test.pkl for the testing set.

### Import important libraries

In [1]:
from pickle import dump, load

#import numpy as np

from keras.preprocessing.image import load_img, img_to_array
from keras.models import Model
from keras.applications.inception_v3 import InceptionV3, preprocess_input

print('All modules imported.')

All modules imported.


#### The location of the caption file, image file and pickle file

In [2]:
## Set file paths
img_dir = 'IMGCG-DataSet/Flickr8k_Dataset/Images/'
pickle_file = 'IMGCG-DataSet/Pickle File/'

### Load Pickle data

In [3]:
## Load Training images path pickle file
with open(pickle_file + 'train_img_paths.pkl', 'rb') as fid:
    train_img_paths = load(fid)

## Load Testing images path pickle file
with open(pickle_file + 'test_img_paths.pkl', 'rb') as fid:
    test_img_paths = load(fid)


In [4]:
print("Size of Training Set Images: ",len(train_img_paths))
print("Size of Testing Set Images: ",len(test_img_paths))

Size of Training Set Images:  6000
Size of Testing Set Images:  1000
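
Because the later cells deduplicate each path list with `set`, it can help to confirm up front that neither list contains repeats and that the two splits share no images. The following is a minimal sketch, assuming the path lists are plain lists of file-name strings; the helper name `check_splits` and the sample file names are hypothetical and not part of the original notebook.

```python
def check_splits(train_paths, test_paths):
    """Return (duplicates in train, duplicates in test, names shared by both)."""
    train_set, test_set = set(train_paths), set(test_paths)
    train_dupes = len(train_paths) - len(train_set)
    test_dupes = len(test_paths) - len(test_set)
    shared = sorted(train_set & test_set)
    return train_dupes, test_dupes, shared


# Toy file names for illustration only (not taken from the dataset).
demo_train = ["a.jpg", "b.jpg", "b.jpg"]
demo_test = ["b.jpg", "c.jpg"]
train_dupes, test_dupes, shared = check_splits(demo_train, demo_test)
print(train_dupes, test_dupes, shared)
```

On the real data, the same call would be `check_splits(train_img_paths, test_img_paths)`; a clean split would report zero duplicates and an empty shared list.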


## Step 2 :  Prepare Image Data

In [5]:
## Define InceptionV3 model
image_model = InceptionV3(weights='imagenet')

image_model.summary()

Model: "inception_v3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 299, 299, 3) 0                                            
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 149, 149, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 149, 149, 32) 96          conv2d[0][0]                     
__________________________________________________________________________________________________
activation (Activation)         (None, 149, 149, 32) 0           batch_normalization[0][0]        
_______________________________________________________________________________________

In [6]:
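## Build the feature extractor: keep the InceptionV3 input and take the output
## of the second-to-last layer (the layer just before the final classifier)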
new_input = image_model.input
hidden_layer = image_model.layers[-2].output

model = Model(new_input, hidden_layer)
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 299, 299, 3) 0                                            
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 149, 149, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 149, 149, 32) 96          conv2d[0][0]                     
__________________________________________________________________________________________________
activation (Activation)         (None, 149, 149, 32) 0           batch_normalization[0][0]        
______________________________________________________________________________________________

In [7]:
%%time

# Extract features for each image in the list of file names
def extract_img_feature(image_paths):
    features = dict()
    for file in image_paths:
        file = img_dir + file
        # Load the image at the 299x299 input size used by the model
        img = load_img(file, target_size=(299, 299))
        img = img_to_array(img)
        # Add a leading batch dimension: (1, 299, 299, 3)
        img = img.reshape((1, img.shape[0], img.shape[1], img.shape[2]))
        img = preprocess_input(img)
        batch_features = model.predict(img, verbose=1)
        # Key the features by file name
        image_id = file.split('/')[-1]
        features[image_id] = batch_features
    return features


CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.96 µs
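
The function above calls `model.predict` once per image. Grouping images into batches usually reduces the per-call overhead. The sketch below shows only the batching logic, under stated assumptions: `load_fn` stands in for the load, resize and preprocess steps, and `predict_fn` stands in for a prediction call on a whole batch. These two names, `chunked`, `extract_in_batches` and the batch size are all hypothetical and not part of the original notebook.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extract_in_batches(image_paths, load_fn, predict_fn, batch_size=32):
    """Map each file name to its feature, predicting one batch at a time.

    load_fn(path)      -> one preprocessed image (hypothetical stand-in)
    predict_fn(images) -> one feature per image, in order (hypothetical stand-in)
    """
    features = {}
    for batch_paths in chunked(image_paths, batch_size):
        batch_images = [load_fn(p) for p in batch_paths]
        batch_features = predict_fn(batch_images)
        for path, feat in zip(batch_paths, batch_features):
            # Key by file name, as in the per-image version
            features[path.split('/')[-1]] = feat
    return features


# Toy stand-ins so the sketch runs without Keras: each "image" is the length
# of its path string and the "model" doubles every value in the batch.
demo = extract_in_batches(
    ["dir/a.jpg", "dir/bb.jpg", "dir/ccc.jpg"],
    load_fn=len,
    predict_fn=lambda batch: [2 * x for x in batch],
    batch_size=2,
)
print(demo)
```

With the real model, `predict_fn` would need to stack the preprocessed images into a single array (for example with `numpy.stack`) before calling `model.predict`, and each stored feature would then be one row of the batch output rather than a `(1, 2048)` array.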


### Training set

In [8]:
%%time
# Get unique images
iv_encode_train = sorted(set(train_img_paths))
print("Size of Train Images: ",len(iv_encode_train))

iv_encode_train = extract_img_feature(iv_encode_train)

Size of Train Images:  6000

CPU times: user 40min 7s, sys: 7min 19s, total: 47min 26s
Wall time: 9min 40s


In [51]:
iv_encode_train["2513260012_03d33305cf.jpg"]

array([[0.08447915, 0.09467582, 0.09229987, ..., 0.07690592, 1.4561858 ,
        0.45119098]], dtype=float32)

#### Training Images size and shape :

In [50]:
print("Shape of one Image: ",iv_encode_train["2513260012_03d33305cf.jpg"].shape)

Shape of one Image:  (1, 2048)


### Testing set

In [43]:
%%time
# Get unique images
iv_encode_test = sorted(set(test_img_paths))
print("Size of Test Images: ",len(iv_encode_test))

iv_encode_test = extract_img_feature(iv_encode_test)

Size of Test Images:  1000

CPU times: user 7min 8s, sys: 1min 11s, total: 8min 20s
Wall time: 3min 4s


In [48]:
iv_encode_test["1056338697_4f7d7ce270.jpg"]

array([[0.4530923 , 0.25752303, 0.11301485, ..., 1.1222706 , 0.34380347,
        1.0150962 ]], dtype=float32)

#### Testing Images size and shape :

In [49]:
print("Shape of one Image: ",iv_encode_test["1056338697_4f7d7ce270.jpg"].shape)

Shape of one Image:  (1, 2048)


### Store Training set image features in pickle file :

In [37]:
with open('inceptionV3_encode_train.pkl', 'wb') as fid:
    dump(iv_encode_train, fid)

### Store Testing set image features in pickle file :

In [35]:
with open('inceptionV3_encode_test.pkl', 'wb') as fid:
    dump(iv_encode_test, fid)
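
Writing the pickle files does not by itself confirm that they can be read back. A minimal sketch of a save-then-reload check follows; the helper name `save_and_verify`, the demo file name and the toy values are hypothetical, and the real dictionaries would hold NumPy arrays rather than short lists.

```python
import os
import tempfile
from pickle import dump, load


def save_and_verify(features, path):
    """Pickle `features` to `path`, reload it, and return the entry count read back."""
    with open(path, 'wb') as fid:
        dump(features, fid)
    with open(path, 'rb') as fid:
        reloaded = load(fid)
    if reloaded.keys() != features.keys():
        raise ValueError("reloaded keys differ from the saved ones")
    return len(reloaded)


# Toy data written to a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo_features.pkl')
    n_saved = save_and_verify({'a.jpg': [0.1, 0.2], 'b.jpg': [0.3, 0.4]}, demo_path)
print(n_saved)
```

Applied to this notebook, the expected counts would be the sizes printed earlier: 6000 entries for the training file and 1000 for the testing file.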