# Use a Pre-trained Computer Vision Model for Feature Extraction

Several powerful models have already been pre-trained to recognize patterns in images and classify them into many classes. Keras provides an API for using these models. In this notebook I use Inception V3.
It was pre-trained for ImageNet classification (assigning images to 1,000 different categories). I do not use the final predictions; instead I use the output of a hidden layer a few layers before the output.

If I used the full output tensor of that layer (flattened into a vector), each image would yield over 131,000 features. Even for my test sets, which contain at most around 11,000 dresses, the resulting matrix (one feature vector per test dress) would be huge and would not fit into my computer's memory. Instead I use a smaller portion of the tensor, which still gives over 16,000 features per image.
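The feature counts and the memory concern above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the chosen layer outputs a tensor of shape `(1, 8, 8, 2048)` for one 299x299 image (not stated in this notebook, but consistent with the two counts quoted) and that values are stored as `float32`. The helper `matrix_gib` is a hypothetical name.

```python
import numpy as np

# Assumed output of the hidden layer for a single image:
# (batch, height, width, channels). This shape is an assumption.
pred = np.zeros((1, 8, 8, 2048), dtype=np.float32)

full = pred[0].flatten()        # the whole tensor as one vector
partial = pred[0][3].flatten()  # one slice only, as in the extraction loops

n_images = 11_000                      # rough size of the largest test set
bytes_per_value = pred.dtype.itemsize  # 4 bytes for float32


def matrix_gib(n_features):
    """Size in GiB of an n_images x n_features matrix of float32 values."""
    return n_images * n_features * bytes_per_value / 2**30


print('full vector length:   ', len(full))
print('partial vector length:', len(partial))
print('full matrix GiB:      ', round(matrix_gib(len(full)), 2))
print('partial matrix GiB:   ', round(matrix_gib(len(partial)), 2))
```

Under these assumptions the reduced feature matrix is one eighth the size of the full one, which is the difference between several gigabytes and well under one.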

In [18]:
# load necessary packages
from keras.preprocessing import image
from keras.applications import inception_v3
from keras.models import Model
import pandas as pd
import numpy as np
import os
from PIL import Image

In [2]:
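# build a model that takes the Inception V3 input and returns the output of one of its hidden layers
base_model = inception_v3.InceptionV3(weights='imagenet', include_top=False)
# note: auto-generated layer names such as 'average_pooling2d_9' can differ between
# Keras versions and sessions; check base_model.summary() if this lookup fails
model = Model(inputs=base_model.input, outputs=base_model.get_layer('average_pooling2d_9').output)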

In [19]:
# load test data sets to predict features for
test_data = pd.read_csv('data/test_images.csv', header=0)
test_dresses_small = pd.read_csv('data/test_dresses_small.csv', header=0)
test_dresses_large = pd.read_csv('data/test_dresses_large.csv', header=0)

In [4]:
# feature extraction for test part of the main dataset
predicted1 = []
directory = 'data/cropped_images_300x300/'
for i in range(len(test_data)):
    image_path = directory + test_data.loc[i, 'short_path']
    img = image.load_img(image_path, target_size=(299,299))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = inception_v3.preprocess_input(img)
    pred = model.predict(img)
    # keep only the slice at index 3 of the output tensor to limit the number of features
    flat_pred = pred[0][3].flatten()
    predicted1.append(flat_pred)

In [30]:
print('Each prediction contains', len(predicted1[0]), 'features.')

Each prediction contains 16384 features.


In [20]:
# create a directory and save the predicted features
directory = 'data/predictions/'
if not os.path.isdir(directory):
    os.mkdir(directory)

np.save(directory+'test_predictions_inceptionV3.npy', np.asarray(predicted1))

In [21]:
# predict features for small test set
predicted_small = []
directory = 'data/test_dresses_small_squared/'
for i in range(len(test_dresses_small)):
    image_path = directory + 'img' + str(i) + '.png'
    img = image.load_img(image_path, target_size=(299,299))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = inception_v3.preprocess_input(img)
    pred = model.predict(img)
    flat_pred = pred[0][3].flatten()
    # append the reduced feature vector, not the full prediction tensor
    predicted_small.append(flat_pred)

In [22]:
# predict features for large test set
predicted_large = []
directory = 'data/test_dresses_large_squared/'
for i in range(len(test_dresses_large)):
    image_path = directory + 'img' + str(i) + '.png'
    img = image.load_img(image_path, target_size=(299,299))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = inception_v3.preprocess_input(img)
    pred = model.predict(img)
    flat_pred = pred[0][3].flatten()
    # append the reduced feature vector, not the full prediction tensor
    predicted_large.append(flat_pred)
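
The three extraction loops in this notebook differ only in how the image paths are built, so they could be folded into a single helper. The following is a minimal sketch, not the code used above: `extract_features` and its `load_image` argument are hypothetical names, and `model` is assumed to be any object with a Keras-style `predict` method.

```python
import numpy as np


def extract_features(model, image_paths, load_image, index=3):
    """Return a 2-D array with one flattened feature vector per image.

    model       -- object with a Keras-style predict(batch) method
    image_paths -- iterable of image file paths
    load_image  -- callable that turns a path into a preprocessed batch
                   of shape (1, height, width, 3)
    index       -- which slice of the output tensor to keep (3 in this notebook)
    """
    features = []
    for path in image_paths:
        pred = model.predict(load_image(path))
        features.append(pred[0][index].flatten())
    return np.asarray(features)
```

Here `load_image` would wrap the same four steps used in the loops above (`image.load_img`, `image.img_to_array`, `np.expand_dims`, `inception_v3.preprocess_input`), and each data set would only need to supply its own list of paths.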

In [23]:
# save predictions
directory = 'data/predictions/'

np.save(directory+'small_predictions_inceptionV3.npy', np.asarray(predicted_small))
np.save(directory+'large_predictions_inceptionV3.npy', np.asarray(predicted_large))
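
Since memory is the main constraint here, it may help to read the saved features back without loading the whole file at once. The sketch below shows one way to do that with the `mmap_mode` argument of `np.load`. It uses a small random stand-in array and a temporary file (both hypothetical) rather than the real prediction files.

```python
import os
import tempfile

import numpy as np

# Stand-in for a saved feature matrix: 5 images x 16384 features (made-up data).
features = np.random.rand(5, 16384).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_predictions.npy')
    np.save(path, features)

    # mmap_mode='r' maps the file read-only instead of reading it all into RAM;
    # rows are fetched from disk only when they are accessed.
    loaded = np.load(path, mmap_mode='r')
    loaded_shape = loaded.shape
    first_row_matches = bool(np.array_equal(loaded[0], features[0]))

    # release the memory map before the temporary directory is removed
    del loaded

print(loaded_shape, first_row_matches)
```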

In [25]:
# save model
directory = 'data/models/'
if not os.path.isdir(directory):
    os.mkdir(directory)
model.save(directory+'InceptionV3.h5')