## DSI 15 Capstone Project

## Product Image Classification on Amazon

### Problem Statement

E-commerce sites receive thousands of listings every day, and at times users may classify their uploaded images incorrectly or use the wrong product depiction. Mismatched product listing information reduces the likelihood of successful transactions and also causes resources to be spent unnecessarily on correcting listings at scale. A product detection system would help to ensure the correct listing and categorization of products, or assist the user in classifying product types. To meet this need, an image classifier based on neural networks will be trained to identify the correct image labels accurately.

### Executive Summary

This capstone project aims to build an image classification model using convolutional neural networks that will identify the following 3 categories of products: clothing, footwear and watches. The source data for this model will be images scraped from Amazon, the world's largest online retailer.

The stakeholders are e-commerce companies and the users of their services. The model will help companies improve the effectiveness of potential transactions, and it will improve the user experience through more accurate listings and fewer problems arising from wrongly identified products.

Metrics used to measure the performance would be the AUC & Type I / Type II errors.
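
As a rough illustration of the error metrics named above, the sketch below counts Type I errors (false positives) and Type II errors (false negatives) per class for a 3-class problem. The function name `error_counts`, the class names in `CLASSES` and the example labels are made up for illustration and are not part of the project code.

```python
from collections import Counter

# Hypothetical class names, used only for this illustration
CLASSES = ["clothing", "footwear", "watches"]

def error_counts(y_true, y_pred, classes=CLASSES):
    """Return per-class false positives (Type I) and false negatives (Type II)."""
    false_pos = Counter()
    false_neg = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth != pred:
            false_pos[pred] += 1   # the predicted class gained an item it should not have
            false_neg[truth] += 1  # the true class missed one of its items
    return {c: {"type_I": false_pos[c], "type_II": false_neg[c]} for c in classes}

# Made-up labels, purely for demonstration
y_true = ["clothing", "clothing", "footwear", "watches", "watches"]
y_pred = ["clothing", "footwear", "footwear", "watches", "clothing"]
errors = error_counts(y_true, y_pred)
print(errors)
```

In a multi-class setting each class has its own pair of error counts, because a single wrong prediction is a false negative for the true class and a false positive for the predicted class.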

Challenges foreseen would be potential imbalanced data, complex background noise or poor resolution images.  
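
If the classes do turn out to be imbalanced, one common mitigation is to weight each class inversely to its frequency during training. The sketch below is a minimal example of that idea; the helper name `inverse_frequency_weights` and the label counts are assumptions for illustration only.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Made-up label distribution: class 0 is common, class 2 is rare
labels_example = [0] * 6 + [1] * 3 + [2] * 1
weights = inverse_frequency_weights(labels_example)
print(weights)
```

A dictionary of this shape, mapping class index to weight, can be passed to the `class_weight` argument of Keras `model.fit` so that rarer classes contribute more to the loss.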

The goal at present appears to be sufficiently scoped, as there are 3 categories with distinct image features and the quantity of images is adequate for the purposes of this analysis. The tentative completion date of end of July 2020 remains a reasonable target.

## Data Extraction

Selenium and the Chrome webdriver are used to scrape the 3 categories from Amazon. The images are known to be fixed in size and are saved in JPEG format. To ensure a sufficient quantity for testing and to allow for discarding ineligible data points, 10,000 images were saved from each class. They were extracted to the respective folders:
<ul>
    <li>Watch_Images</li>
    <li>Shirt_Images</li>
    <li>Footwear_Images</li>
</ul>

In [None]:
from bs4 import BeautifulSoup as soup
from selenium import webdriver
from multiprocessing import Pool
import pandas as pd
from PIL import Image
import requests
import random
import time
import numpy as np
import matplotlib.pyplot as plt
import os
import cv2
from os import listdir
from matplotlib import image
from random import shuffle 
from tqdm import tqdm
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split
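import seaborn as sns  # used by the PCA and t-SNE scatter plots
import tensorflow as tf
from tensorflow import keras  # used by make_model and data_augmentation
from tensorflow.keras import layers, models, Model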
# tf.__version__

%matplotlib inline
%config InlineBackend.figure_format = 'svg' # 'svg', 'retina'
plt.style.use('seaborn-white')

### Source URLs & Directory Folders for Image Scrape

In [None]:
# Watches
watch_url = 'https://www.amazon.com/s?i=specialty-aps&bbn=16225019011&rh=n%3A7141123011%2Cn%3A16225019011%2Cn%3A6358539011&pf_rd_i=16225019011&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=5cd8272b-5ce4-4c26-bfcb-d6dca0c1e427&pf_rd_p=5cd8272b-5ce4-4c26-bfcb-d6dca0c1e427&pf_rd_r=CX53J0NV7EFDPJSMDE9S&pf_rd_r=CX53J0NV7EFDPJSMDE9S&pf_rd_s=merchandised-search-left-2&pf_rd_t=101&ref=AE_Men_Watches'
#watch_dir = '../Assets/Watch_Images/'
watch_dir = 'C:/Users/silve/Desktop/materials-master/materials-master/DSI15 Capstone Project/Assets/Watch_Images/'

# Shirts
shirt_url = "https://www.amazon.com/s?i=fashion-mens-intl-ship&bbn=16225019011&rh=n%3A16225019011%2Cn%3A1040658%2Cn%3A2476517011&dc&pf_rd_i=16225019011&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=554625a3-8de1-4fdc-8877-99874d353388&pf_rd_r=SK809FT75R844KJ5WXGY&pf_rd_s=merchandised-search-4&pf_rd_t=101&qid=1594157542&rnid=1040658&ref=sr_nr_n_1"
#shirt_dir = '../Assets/Shirt_Images/'
shirt_dir = 'C:/Users/silve/Desktop/materials-master/materials-master/DSI15 Capstone Project/Assets/Shirt_Images/'

# Footwear
footwear_url = "https://www.amazon.com/s?i=specialty-aps&bbn=16225019011&rh=n%3A7141123011%2Cn%3A16225019011%2Cn%3A679255011&pf_rd_i=16225019011&pf_rd_m=ATVPDKIKX0DER&pf_rd_p=5cd8272b-5ce4-4c26-bfcb-d6dca0c1e427&pf_rd_p=5cd8272b-5ce4-4c26-bfcb-d6dca0c1e427&pf_rd_r=V3G56PM79KZ6R1KBGHTK&pf_rd_r=V3G56PM79KZ6R1KBGHTK&pf_rd_s=merchandised-search-left-2&pf_rd_t=101&ref=AE_Men_Shoes"
#footwear_dir = '../Assets/Footwear_Images/'
footwear_dir = 'C:/Users/silve/Desktop/materials-master/materials-master/DSI15 Capstone Project/Assets/Footwear_Images/'

### Construct a general function to perform the web scrape (based on 10,000 image pull)

In [None]:
def amazon_scrape(url, directory, max_images=10000):
  # Save product thumbnails page by page until more than max_images are stored
  # or the page no longer offers a "next page" element.
  image_num = 0
  driver.get(url)
  while True:
    page_content = soup(driver.page_source, 'html.parser')
    images = page_content.findAll("img", {"class": "s-image"})
    for img in images:
      # A running counter names the files, so pages of different sizes never overwrite each other
      with open(directory + str(image_num) + ".jpg", 'wb') as f:
        f.write(requests.get(img['src']).content)
      image_num += 1
    if image_num > max_images or page_content.find("li", {'class': 'a-last'}) is None:
      break
    # find_element("xpath", ...) is the form accepted by both Selenium 3 and Selenium 4
    driver.find_element("xpath", "//li[contains(@class, 'a-last')]/a").click()
    time.sleep(3)

### Extract Images from the 3 respective product categories

Selenium running on chromedriver will be used to extract the images. Since there are 10,000 images per category, each pull will be done in a separate operation rather than in a combined loop, in case disruptions occur.

In [None]:
# Launch the chrome selenium webdriver
driver = webdriver.Chrome("chromedriver")

In [None]:
# Extract Images from the Amazon Category of Watches
amazon_scrape(watch_url,watch_dir)

In [None]:
# Extract Images from the Amazon Category of T-shirts
amazon_scrape(shirt_url,shirt_dir)

In [None]:
# Extract Images from the Amazon Category of Footwear
amazon_scrape(footwear_url, footwear_dir)

Image data must be prepared before it can be used as the basis for modeling in image classification tasks.

One aspect of preparing image data is scaling pixel values, such as normalizing the values to the range 0-1, centering, standardization, and more.
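
The three scaling options mentioned above can be sketched with NumPy as follows. The random array `pixels` stands in for a batch of real images and is an assumption for illustration; the per-channel statistics are one possible choice among several.

```python
import numpy as np

# Stand-in batch of 4 RGB images of 8 x 8 pixels with values in 0..255 (made-up data)
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(4, 8, 8, 3)).astype("float32")

# Normalization: rescale pixel values to the range 0-1
normalized = pixels / 255.0

# Centering: subtract the per-channel mean so each channel has zero mean
channel_mean = normalized.mean(axis=(0, 1, 2))
centered = normalized - channel_mean

# Standardization: additionally divide by the per-channel standard deviation
channel_std = normalized.std(axis=(0, 1, 2))
standardized = centered / channel_std

print(normalized.min(), normalized.max())
print(standardized.mean(axis=(0, 1, 2)), standardized.std(axis=(0, 1, 2)))
```

When centering or standardizing, the mean and standard deviation would normally be computed on the training images only and then reused for the validation and test images.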

### Load Images from Directory

path = "C:/Users/silve/Desktop/materials-master/materials-master/DSI15 Capstone Project/Assets/" 
image_size = 128
types = ["Watch_Images", "Shirt_Images", "Footwear_Images"]
labels = []
dataset = []
height = []
width = []

for i in types:
    print("Reading the  data for: ", i)
    for p in os.listdir(path + i):
        image = cv2.imread(path + i + '/' + p)
        height.append(image.shape[0])
        width.append(image.shape[1])
        image = cv2.resize(image, (image_size, image_size))  
        image = image/255.0 # This scales each value to be between 0 and 1.
        dataset.append(image) 
        labels.append(types.index(i))

print("\nNumber of Dataset {} and Number of Labels {}".format(len(dataset),len(labels))) 
print("Single Image Shape:", dataset[0].shape)

In [None]:
path = "C:/Users/silve/Desktop/materials-master/materials-master/DSI15 Capstone Project/Assets/" 
image_size = 128
channels = 3
types = ["Shirt_Images", "Shoe_Images", "Watch_Images"]
# labels = []
# images = []

total_images = 0
for root, dirs, files in os.walk(path):
    total_images += len(files)
labels = np.empty(total_images)
images = np.empty((total_images,image_size,image_size,channels))
gray_images = np.empty((total_images,image_size,image_size))
heights = np.empty(total_images)
widths = np.empty(total_images) 
count = 0

for i in types:
    print("Reading the  data for: ", i)
    for p in os.listdir(path + i):
        image = cv2.imread(path + i + '/' + p)
        heights[count] = image.shape[0]
        widths[count] = image.shape[1]
        #height.append(image.shape[0])
        #width.append(image.shape[1])
        image = cv2.resize(image, (image_size, image_size))
        #images.append(image) 
        #labels.append(types.index(i))
        gray_images[count] = (cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))/255.0
        image = image/255.0
        images[count] = image
        labels[count] = types.index(i)
        
        count+=1

# print("\nNumber of Dataset {} and Number of Labels {}".format(len(images),len(labels))) 
# print("Single Image Shape:", dataset[0].shape)

In [None]:
dataset = np.array(dataset)
labels = np.array(labels)

# Randomly shuffle the dataset and labels with the same permutation
indices = np.arange(dataset.shape[0])
np.random.shuffle(indices)
dataset = dataset[indices]
labels = labels[indices]

# Flatten after shuffling so that each flattened row still lines up with its label
dataset_flattened = dataset.reshape(dataset.shape[0], -1)
print(dataset.shape)
print(dataset_flattened.shape)

### Preliminary EDA

A preliminary visual inspection shows that there is minimal image noise from type inconsistency. Some images are misclassified or out-of-category, but these make up only a minuscule portion of the data set.
<ul>
<li>The watch images are mostly consistent and of similar representation, but there appear to be many duplicate images. This will be further investigated.</li>
<li>The shoe images are mostly clean, unique and of similar representation without much internal image noise. However, there is a mix of variations such as slippers, boots and sandals.</li>
<li>The shirt category has a mixed image representation. There are variations such as long and short sleeves, hoodies, vests and singlets, and some of the images also include human models.</li>
</ul>
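
One simple way to investigate the suspected duplicates is to hash the raw bytes of every file and group files that share a hash. The sketch below is a minimal example under that assumption; the helper name `find_duplicate_files` is hypothetical, and the temporary folder with three small files only stands in for a real image folder.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

def find_duplicate_files(directory):
    """Group files in a directory whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            groups[digest].append(name)
    # Keep only the hashes shared by more than one file
    return [names for names in groups.values() if len(names) > 1]

# Demonstration on a temporary folder with made-up file contents
with tempfile.TemporaryDirectory() as tmp:
    for name, payload in [("a.jpg", b"same"), ("b.jpg", b"same"), ("c.jpg", b"different")]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(payload)
    duplicates = find_duplicate_files(tmp)

print(duplicates)
```

Calling the helper on `watch_dir` would list groups of identical watch images. This approach only catches exact copies; images that were re-encoded or resized would need a perceptual comparison instead.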

#### Check on Images Integrity

A random sample of images is inspected.

In [None]:
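sample_indices = np.random.randint(0, len(dataset), size=10)  # upper bound is exclusive
plt.figure(figsize=(15,15))
for i, idx in enumerate(sample_indices, 1):
    img = dataset[idx]
    type_ind = int(labels[idx])  # cast so both int and float label arrays can index the list
    title = types[type_ind]
    plt.subplot(4, 5, i)
    plt.title(title)
    plt.imshow(img[:, :, ::-1])  # cv2 loads images as BGR; reverse the channels for display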

In [None]:
dimensions_df = pd.DataFrame({'height':height,'width':width})
n,bins,_ = plt.hist(dimensions_df['height'],bins=5)
plt.title('Height Distribution')
plt.show()

plt.hist(dimensions_df['width'],bins=5)
plt.title('Width Distribution')
plt.show()

In [None]:
labels[:100]

### Train Test Split

In [None]:
labels_categorical = to_categorical(labels)

train_ratio = 0.75
validation_ratio = 0.15
test_ratio = 0.10
# First hold out 25% of the data, then split that hold-out into validation (15%) and test (10%)
X_train, X_temp, y_train, y_temp = train_test_split(dataset, labels_categorical, test_size=1 - train_ratio, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=test_ratio/(test_ratio + validation_ratio), random_state=1)
print("X Train shape:", X_train.shape)
print("X Test shape:", X_test.shape)
print("X Val shape:", X_val.shape)
print("Y Train shape:", y_train.shape)
print("Y Test shape:", y_test.shape)
print("Y Val shape:", y_val.shape)

### Build the Model

In [None]:
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = data_augmentation(inputs)

    # Entry block
    x = layers.experimental.preprocessing.Rescaling(1.0 / 255)(x)
    x = layers.Conv2D(32, 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.Conv2D(64, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    previous_block_activation = x  # Set aside residual

    for size in [128, 256, 512, 728]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

        # Project residual
        residual = layers.Conv2D(size, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    x = layers.SeparableConv2D(1024, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.GlobalAveragePooling2D()(x)
    if num_classes == 2:
        activation = "sigmoid"
        units = 1
    else:
        activation = "softmax"
        units = num_classes

    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(units, activation=activation)(x)
    return keras.Model(inputs, outputs)


model = make_model(input_shape=image_size + (3,), num_classes=3)  # 3 product categories; image_size must be the (height, width) tuple
keras.utils.plot_model(model, show_shapes=True)

Let's apply PCA to the dataset and see what it looks like.

In [None]:
pca = PCA(n_components=3)
pca_result = pca.fit_transform(dataset_flattened)

In [None]:
pca_result.shape

In [None]:
plt.figure(figsize=(10,10))
sns.scatterplot(x=pca_result[:,0], y=pca_result[:,1],hue=labels,palette=sns.color_palette("hls", 3),legend="full")

In [None]:
ax = plt.figure(figsize=(10,10)).add_subplot(projection='3d')  # gca(projection=...) is no longer supported in recent Matplotlib
ax.scatter(
    xs=pca_result[:,0], 
    ys=pca_result[:,1], 
    zs=pca_result[:,2], 
    c=labels
)
ax.set_xlabel('pca-one')
ax.set_ylabel('pca-two')
ax.set_zlabel('pca-three')
plt.show()

In the projections above, PCA does not separate the three classes clearly, so t-SNE is applied next.

### Applying t-SNE

In [None]:
#tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne = TSNE(n_components=2).fit_transform(dataset_flattened)

In [None]:
plt.figure(figsize=(10,10))
sns.scatterplot(
    x=tsne[:,0], y=tsne[:,1],
    hue=labels,
    palette=sns.color_palette("hls", 3),
    legend="full",
    alpha=0.3
)

### Custom Function to Load Images from Directory

def image_load(directory, list_name):

    for filename in listdir(directory):
    
        # load image
        img_data = image.imread(directory + filename)
    
        # store loaded image
        list_name.append(img_data)
    
        print('Loaded %s %s' % (filename, img_data.shape))

### Load Images

watch_images = list()
shirt_images = list()
footwear_images = list()

# Load images
image_load(watch_dir,watch_images)
image_load(shirt_dir,shirt_images)
image_load(footwear_dir,footwear_images)

# Check that all images have been loaded in
print('Watch images - {}'.format(len(watch_images)))
print('Shirt images - {}'.format(len(shirt_images)))
print('Footwear images - {}'.format(len(footwear_images)))

### Display the first few images from the NumPy arrays to check integrity

In [None]:
x = watch_images[:4]
_, axs = plt.subplots(nrows=1, ncols=4, figsize=(10, 10))
axs = axs.flatten()
for img, ax in zip(x, axs):
    ax.imshow(img)
plt.show()

In [None]:
y = shirt_images[:4]
_, axs = plt.subplots(nrows=1, ncols=4, figsize=(10, 10))
axs = axs.flatten()
for img, ax in zip(y, axs):
    ax.imshow(img)
plt.show()

In [None]:
z = footwear_images[:4]
_, axs = plt.subplots(nrows=1, ncols=4, figsize=(10, 10))
axs = axs.flatten()
for img, ax in zip(z, axs):
    ax.imshow(img)
plt.show()

In [None]:
watch_images
shirt_images
footwear_images

In [None]:
# Reading the dataset

path = "Dataset/" 
image_size = 128
types = ["Shirt_Images", "Shoe_Images", "Watch_Images"]
labels = []
dataset = []
height = []
width = []
for i in types:
    print("Reading the  data for: ", i)
    for p in os.listdir(path + i):
        image = cv2.imread(path + i + '/' + p)
        height.append(image.shape[0])
        width.append(image.shape[1])
        image = cv2.resize(image, (image_size, image_size))
        image = image/255.0
        dataset.append(image) 
        labels.append(types.index(i))

print("\nNumber of Dataset {} and Number of Labels {}".format(len(dataset),len(labels))) 
print("Single Image Shape:", dataset[0].shape)



### Histogram of the average image resolution

In [None]:
dimensions_df = pd.DataFrame({'height':height,'width':width})

In [None]:
n,bins,_ = plt.hist(dimensions_df['height'],bins=5)
plt.show()
plt.hist(dimensions_df['width'],bins=5)
plt.show()

In [None]:
# index_of_Shirt=types.index("Shirt_Images")
# index_of_Shoe=types.index("Shoe_Images")
# index_of_Watch=types.index("Watch_Images")
# total_Shirts=labels.count_nonzero(index_of_Shirt)
# unique, shirts_counts = numpy.unique(a, return_counts=True)
# total_Shoes=labels.count(index_of_Shoe)
# total_Watches=labels.count(index_of_Watch)
# print("Total Number of Shirts: ",total_Shirts)
# print("Total Number of Shoes: ",total_Shoes)
# print("Total Number of Watches: ",total_Watches)

unique, counts = np.unique(labels, return_counts=True)
print(unique)
print(counts)

### Image Augmentation

In [None]:
image_size = (180, 180)
batch_size = 32

train_ds = tf.keras.preprocessing.image_dataset_from_directory('Dataset', validation_split=0.2, subset='training', seed=1337,image_size=image_size, batch_size=batch_size,label_mode='categorical')
val_ds = tf.keras.preprocessing.image_dataset_from_directory('Dataset', validation_split=0.2, subset='validation', seed=1337,image_size=image_size, batch_size=batch_size,label_mode='categorical')

In [None]:
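from skimage.transform import rotate
from skimage.util import random_noise

img = dataset[0]  # assumption: any loaded image array; the original cell does not define img
image = np.fliplr(img) # Horizontal Flip
image = np.flipud(img) # Vertical Flip
image = rotate(img, angle = 45) # Rotate Angle
image = random_noise(img) # Adding Noise (from skimage library)
image = cv2.GaussianBlur(img, (11,11),0) # Blurring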

In [None]:
data_augmentation = keras.Sequential([
  layers.experimental.preprocessing.RandomFlip('horizontal'),
  layers.experimental.preprocessing.RandomRotation(0.1),
])

In [None]:
plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
  for i in range(9):
    augmented_images = data_augmentation(images)
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(augmented_images[0].numpy().astype('uint8'))
    plt.axis('off')

### Original, Average blur & Gaussian Blur

In [None]:
# Make dummy data for the image
a = watch_images[0] 

# Show subplots | shape: (1,3) 
fig, axs = plt.subplots(nrows=1, ncols=3, figsize=(12,4))
for i, ax in enumerate(axs.flatten()):
    plt.sca(ax)
    plt.imshow(a**(i+1), cmap=plt.cm.jet)
    
    #plt.colorbar()
    plt.title('Image: {}'.format(i+1))

#plt.tight_layout()
plt.suptitle('Overall Title')
plt.show()

In [None]:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

In [None]:
### Prepare Train 

### Change images dimensions to 224 X 224 px 

In [None]:
import os, sys
from PIL import Image

size = 128, 128

for infile in sys.argv[1:]:
    outfile = os.path.splitext(infile)[0] + ".thumbnail"
    if infile != outfile:
        try:
            im = Image.open(infile)
            im.thumbnail(size, Image.LANCZOS)  # LANCZOS is the current name of the former ANTIALIAS filter
            im.save(outfile, "JPEG")
        except IOError:
            print("cannot create thumbnail for '%s'" % infile)

In [None]:
from PIL import Image

basewidth = 300
img = Image.open('somepic.jpg')
wpercent = (basewidth/float(img.size[0]))
hsize = int((float(img.size[1])*float(wpercent)))
img = img.resize((basewidth,hsize), Image.LANCZOS)  # LANCZOS is the current name of the former ANTIALIAS filter
img.save('sompic.jpg') 

### View Grayscale

In [None]:
# example of saving a grayscale version of a loaded image
from PIL import Image
# load the image
image = Image.open('opera_house.jpg')
# convert the image to grayscale
gs_image = image.convert(mode='L')
# save in jpeg format
gs_image.save('opera_house_grayscale.jpg')
# load the image again and show it
image2 = Image.open('opera_house_grayscale.jpg')
# show the image
image2.show()

In [None]:
train_dir = 'C:/Users/silve/Desktop/materials-master/materials-master/DSI15 Capstone Project/Assets/Train'
test_dir = 'C:/Users/silve/Desktop/materials-master/materials-master/DSI15 Capstone Project/Assets/Test'
IMG_SIZE = 50
LR = 1e-3

In [None]:
import os

X = []
y = []
base_dir = '<full path to dataset folder>/'
for f in sorted(os.listdir(base_dir)):
    if os.path.isdir(base_dir+f):
        print(f"{f} is a target class")
        for i in sorted(os.listdir(base_dir+f)):
            print(f"{i} is an input image path")
            X.append(base_dir+f+'/'+i)
            y.append(f)
print(X)
print(y)

In [None]:
loaded_images[0]

In [None]:
for label in labels:
    dirname = train_dir + label + '\\'
    for imgfile in os.listdir(dirname):
        if(imgfile[0] == '.'):
            pass
        else:
            image = load_img(dirname + imgfile, target_size=(100, 100, 3))
            image_arr = img_to_array(image)
            image_train.append(image_arr)
            train_Y.append(label)

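PCA likely failed because each image has a very large number of features (128 × 128 × 3 = 49,152 pixel values), and compressing them into only 3 components loses most of the information.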
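
One way to check how much information a few principal components retain is to compute the explained variance ratio. The sketch below does this with a plain NumPy SVD on made-up random data; the helper name `explained_variance_ratio` and the array `X_demo` are assumptions for illustration, and the same quantity is exposed by scikit-learn's `PCA` as `explained_variance_ratio_`.

```python
import numpy as np

def explained_variance_ratio(X, n_components):
    """Fraction of total variance captured by each of the first n_components."""
    X_centered = X - X.mean(axis=0)
    # Singular values of the centered data; their squares are proportional to the component variances
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variances = singular_values ** 2
    return variances[:n_components] / variances.sum()

# Made-up data: 50 samples with 200 features each, standing in for flattened images
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 200))
ratio = explained_variance_ratio(X_demo, 3)
print(ratio, ratio.sum())
```

If the summed ratio for the first 3 components is small on the real flattened images, that would support the explanation that most of the information is lost in the projection.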

In [None]:
def create_train_data():
    training_data = []
    for img in tqdm(os.listdir(TRAIN_DIR)):
        label = label_img(img)
        path = os.path.join(TRAIN_DIR,img)
        img = cv2.imread(path,cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (IMG_SIZE,IMG_SIZE))
        training_data.append([np.array(img),np.array(label)])
    shuffle(training_data)
    np.save('train_data.npy', training_data)
    return training_data

In [None]:
def process_test_data():
    testing_data = []
    for img in tqdm(os.listdir(TEST_DIR)):
        path = os.path.join(TEST_DIR,img)
        img_num = img.split('.')[0]
        img = cv2.imread(path,cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (IMG_SIZE,IMG_SIZE))
        testing_data.append([np.array(img), img_num])
        
    shuffle(testing_data)
    np.save('test_data.npy', testing_data)
    return testing_data

In [None]:
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression

convnet = input_data(shape=[None, IMG_SIZE, IMG_SIZE, 1], name='input')

convnet = conv_2d(convnet, 32, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 64, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)

convnet = fully_connected(convnet, 1024, activation='relu')
convnet = dropout(convnet, 0.8)

convnet = fully_connected(convnet, 3, activation='softmax')  # one output unit per product category
convnet = regression(convnet, optimizer='adam', learning_rate=LR, loss='categorical_crossentropy', name='targets')

model = tflearn.DNN(convnet, tensorboard_dir='log')

### Image Evaluation

In [None]:
from keras import backend as K
import inception_v4
import numpy as np
import cv2
import os

os.environ['CUDA_VISIBLE_DEVICES'] = ''

def preprocess_input(x):
    x = np.divide(x, 255.0)
    x = np.subtract(x, 0.5)  # subtract 0.5 so the final range is [-1, 1] after doubling
    x = np.multiply(x, 2.0)
    return x

# This function comes from Google's ImageNet Preprocessing Script
def central_crop(image, central_fraction):
	"""Crop the central region of the image.
	Remove the outer parts of an image but retain the central region of the image
	along each dimension. If we specify central_fraction = 0.5, this function
	returns the region marked with "X" in the below diagram.
	   --------
	  |        |
	  |  XXXX  |
	  |  XXXX  |
	  |        |   where "X" is the central 50% of the image.
	   --------
	Args:
	image: 3-D array of shape [height, width, depth]
	central_fraction: float (0, 1], fraction of size to crop
	Raises:
	ValueError: if central_crop_fraction is not within (0, 1].
	Returns:
	3-D array
	"""
	if central_fraction <= 0.0 or central_fraction > 1.0:
		raise ValueError('central_fraction must be within (0, 1]')
	if central_fraction == 1.0:
		return image

	img_shape = image.shape
	depth = img_shape[2]
	fraction_offset = int(1 / ((1 - central_fraction) / 2.0))
	# Integer division keeps the offsets usable as slice indices
	bbox_h_start = img_shape[0] // fraction_offset
	bbox_w_start = img_shape[1] // fraction_offset

	bbox_h_size = img_shape[0] - bbox_h_start * 2
	bbox_w_size = img_shape[1] - bbox_w_start * 2

	image = image[bbox_h_start:bbox_h_start+bbox_h_size, bbox_w_start:bbox_w_start+bbox_w_size]
	return image

def get_processed_image(img_path):
	# Load image and convert from BGR to RGB
	im = np.asarray(cv2.imread(img_path))[:,:,::-1]
	im = central_crop(im, 0.875)
	im = cv2.resize(im, (299, 299))
	im = preprocess_input(im)
	if K.image_data_format() == "channels_first":  # replaces the older image_dim_ordering() == "th" check
		im = np.transpose(im, (2,0,1))
		im = im.reshape(-1,3,299,299)
	else:
		im = im.reshape(-1,299,299,3)
	return im

if __name__ == "__main__":
	# Create model and load pre-trained weights
	model = inception_v4.create_model(weights='imagenet')

	# Open Class labels dictionary. (human readable label given ID)
	classes = eval(open('validation_utils/class_names.txt', 'r').read())

	# Load test image!
	img_path = 'elephant.jpg'
	img = get_processed_image(img_path)

	# Run prediction on test image
	preds = model.predict(img)
	print("Class is: " + classes[np.argmax(preds)-1])
	print("Certainty is: " + str(preds[0][np.argmax(preds)]))

In [None]:
th ~/fb.resnet.torch/main.lua -nClasses 122 -nEpochs 100 -data ~/imageclassification/train/ -save ~/Desktop/imageclassification_c122e100b30t4g1 -batchSize 30 -nThreads 4 -nGPU 1

### Image Recommender

In [None]:
# import the necessary packages
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications import Xception # TensorFlow ONLY
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications import imagenet_utils
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
import argparse

In [None]:
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to the input image")
ap.add_argument("-model", "--model", type=str, default="vgg16",
	help="name of pre-trained network to use")
args = vars(ap.parse_args())

In [None]:
# define a dictionary that maps model names to their classes
# inside Keras
MODELS = {
	"vgg16": VGG16,
	"vgg19": VGG19,
	"inception": InceptionV3,
	"xception": Xception, # TensorFlow ONLY
	"resnet": ResNet50
}
# ensure a valid model name was supplied via command line argument
if args["model"] not in MODELS.keys():
	raise AssertionError("The --model command line argument should "
		"be a key in the `MODELS` dictionary")

### Flask App

In [None]:
from __future__ import absolute_import, division, print_function
import os
from uuid import uuid4
from flask import Flask, request, render_template, send_from_directory
import ann
import sys
from config import *
os.environ['TF_CPP_MIN_LOG_LEVEL']='3'
sys.dont_write_bytecode=True

app = Flask(__name__)

ALLOWED_EXTENSIONS = set(['jpg', 'jpeg'])
APP_ROOT = os.path.dirname(os.path.abspath(__file__))

def allowed_file(filename):
    return '.' in filename and \
           filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

@app.route("/")
def index():
    return render_template("upload.html")

@app.route("/similar", methods=["POST"])
def upload():
    target = os.path.join(APP_ROOT, UPLOAD_FOLDER)
    print(target)
    # Create the upload directory on first use
    if not os.path.isdir(target):
        os.mkdir(target)
    list_images = []  # stays empty if no uploaded file has an allowed extension
    print(request.files.getlist("file"))
    for upload in request.files.getlist("file"):
        print(upload)
        print("{} is the file name".format(upload.filename))
        filename = upload.filename
        if allowed_file(filename):
            destination = "/".join([target, filename])
            print ("Accept incoming file:", filename)
            print ("Save it to:", destination)
            upload.save(destination)
            uploaded_image = UPLOAD_FOLDER+filename
            similar_images = ann.find_similar_images(uploaded_image)
            similar_images = [image.split("/")[1] for image in similar_images]
            list_images = [filename]+ similar_images 
            print(list_images)
    return render_template("similar.html", image_names=list_images)

@app.route('/upload/<filename>')
def send_uploaded(filename):
    return send_from_directory(UPLOAD_FOLDER, filename)

@app.route('/similar/<filename>')
def send_similar(filename):
    return send_from_directory(DATASET_PATH, filename)


if __name__ == "__main__":
    # app.run(port=8080, debug=True)
    app.run(host= '0.0.0.0', port=8080)