## Theme-based Aesthetic Image Assessment

With the spread of digital cameras and smartphones, billions of photos are taken every day. People who care about the beauty or mood of their photos would like guidance while taking or editing them, so that they can improve their aesthetic value. People also love to share photos and browse other people's photos on social networks, and they would rather see high-quality photos with mediocre ones filtered out. A trained 'aesthetic evaluator' can give them a preliminary judgement of how beautiful a photo is and help them create and select their best photos.

## Project Content
In this project, we first collect an image dataset and then implement two feature extraction approaches: a convolutional neural network (CNN) trained with a triplet loss function, and a pretrained VGG network. The features from both approaches are evaluated with an SVM classifier and compared at the end.

1. Data Preparation 
2. CNN with Triplet loss function
3. Summary 

## Requirements 

In [None]:
import os
import sys
import time
import urllib.request

import cv2
import numpy as np
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
from PIL import Image
%matplotlib inline

## Data Preparation 

We chose photos from Photo.net to train and test the algorithm. Photo.net is a large online photo-sharing community with over 400,000 active photographers who constantly peer-rate each other's work.

To fetch the image data, we parsed the HTML gallery pages of photo.net, collected each photo's id and rating, and then downloaded the images by id. Because the images on photo.net come in different sizes, we also cropped and scaled them to a common size.

We restricted the dataset to the Travel and Landscape categories.

In [None]:
def getPage(url):
    """Download a page and return its HTML as text."""
    # Send a browser-like User-Agent header with the request
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    request = urllib.request.Request(url, headers=headers)

    # Open the URL, read the response body and decode it
    with urllib.request.urlopen(request) as response:
        return response.read().decode('utf-8')

def parse_page(html, ids, rates):
    """Append the id and rating of every rated, not-yet-seen photo on a gallery page."""
    soup = BeautifulSoup(html, "html.parser")
    photos = soup.find_all("div", attrs={'class': 'trp_photo'})
    for photo in photos:
        # The photo id is the text after '=' in the photo's link
        photo_id = photo.find('a')['href'].split('=')[1]
        details = str(photo.find('div', attrs={'class': 'trp-details'}))
        # The rating follows "<strong>Rating:</strong>" and ends at the next <br>
        l = details.find("<strong>Rating:</strong>")
        if l < 0:
            continue  # no rating on this photo
        r = details.find("<br>", l)
        rate = details[l:r].split(' ')[-1]
        if len(rate) < 2 or photo_id in ids:
            continue  # skip malformed ratings and photos already collected
        ids.append(photo_id)
        rates.append(float(rate))
    return ids, rates

ids = []
rates = []
for page_index in range(0,3000,12):
    for category in ['Travel','Landscape']:
        for period in ['90','365','365-1','365-2','365-3','365-4','365-5','5000']:
            url = "http://photo.net/gallery/photocritique/filter?period="+ period 
            url += "&rank_by=avg&category=" + category + "&start_index=" + str(page_index) 
            url += "&store_prefs_p=1&shown_tab=1&page=Next"
            html = getPage(url)
            [ids,rates] = parse_page(html,ids,rates)
print("number of different image ids obtained: ",len(ids))
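The gallery filter URLs above are assembled by string concatenation. A slightly more robust alternative, sketched below, lets `urllib.parse.urlencode` build and escape the query string. The helper name `build_gallery_url` is hypothetical; the base URL and query parameters are copied from the crawl loop, and photo.net may no longer serve this endpoint.

```python
from urllib.parse import urlencode

# Base URL and query parameters copied from the crawl loop above.
# build_gallery_url is a hypothetical helper, not part of the original notebook.
BASE_URL = "http://photo.net/gallery/photocritique/filter"

def build_gallery_url(period, category, start_index):
    """Return the filter URL for one page of a photo.net critique gallery."""
    params = [
        ("period", period),
        ("rank_by", "avg"),
        ("category", category),
        ("start_index", start_index),
        ("store_prefs_p", 1),
        ("shown_tab", 1),
        ("page", "Next"),
    ]
    # urlencode converts the integers to strings and percent-escapes values
    return BASE_URL + "?" + urlencode(params)

print(build_gallery_url("365-1", "Travel", 24))
```

Keeping the parameters in a list also preserves their order, so the generated URL is identical to the concatenated one.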

After crawling, we plot the score distribution to get a quick overview of the data.

In [None]:
#score distribution visualization
scores = rates
rate = {}
for score in scores:
    score = int(score*10)
    if score not in rate:
        rate[score] = 0
    rate[score] += 1
sd = sorted(rate.items())
X = []
Y = []
for a,b in sd:
    X.append(a/10.0)
    Y.append(b)
plt.xlabel('Score')
plt.ylabel('Count')
plt.title('Score distribution of imgs by photo.net')
plt.plot(X,Y)
plt.savefig('ds_all.png')
plt.show()

## Download image data
To obtain clearly distinguishable training examples, we only kept images from the two tails of the score distribution: roughly the 12% lowest-rated images as negative examples and the 12% highest-rated images as positive examples. Concretely, images rated 4.0 or lower are labeled “negative” and images rated 6.0 or higher are labeled “positive”.

In [None]:
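# Download only the photos in the two tails of the score distribution
import os
import urllib.request

os.makedirs("dataset", exist_ok=True)
for photo_id, score in zip(ids, rates):
    if 4.0 < score < 6.0:
        continue  # skip mid-range scores
    url = 'http://gallery.photo.net/photo/' + photo_id + '-md.jpg'
    # The file name encodes id and score: dataset/<id>_<score>.jpeg
    urllib.request.urlretrieve(url, "dataset/" + photo_id + "_" + str(score) + ".jpeg")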
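The tail selection can be expressed as a small pure function. The sketch below keeps only scores at or below 4.0 or at or above 6.0 (the same condition the download loop uses, which skips scores strictly between 4.0 and 6.0) and reports the fraction of crawled scores in each tail; the text above reports about 12% per tail on the real crawl. The name `select_tails` and the toy scores are illustrative, not project data.

```python
def select_tails(scores, low=4.0, high=6.0):
    """Split scores into the two tails used for the dataset.

    Scores strictly between `low` and `high` are discarded, matching the
    `4.0 < score < 6.0` skip in the download loop.
    Returns (negative_indices, positive_indices, negative_fraction, positive_fraction).
    """
    neg = [i for i, s in enumerate(scores) if s <= low]
    pos = [i for i, s in enumerate(scores) if s >= high]
    n = len(scores)
    return neg, pos, len(neg) / n, len(pos) / n

toy_scores = [3.2, 4.0, 4.5, 5.1, 5.9, 6.0, 6.7]
neg, pos, neg_frac, pos_frac = select_tails(toy_scores)
print(neg, pos)  # → [0, 1] [5, 6]
```

Raising `low` or lowering `high` trades a larger dataset for less clearly separated classes.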

## Resizing images and extracting labels from file names
The images come in different sizes, so we center-crop each one to a square and scale it to 96 x 96 pixels. We also parse each image's label from its file name: images with a score above 5 are labeled +1 and the rest -1.

In [None]:
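def crop_and_scale_image(im):
    """Center-crop a PIL image to a square and resize it to 96 x 96 RGB."""
    if im.mode != 'RGB':
        im = im.convert('RGB')
    width, height = im.size
    # Cut the excess evenly from both ends of the longer dimension
    if width > height:
        diff = width - height
        box = diff // 2, 0, width - (diff - diff // 2), height
    else:
        diff = height - width
        box = 0, diff // 2, width, height - (diff - diff // 2)
    im = im.crop(box)
    toSize = 96, 96
    # LANCZOS is the filter that ANTIALIAS aliased (ANTIALIAS was removed in Pillow 10)
    im = im.resize(toSize, Image.LANCZOS)
    return im

def fnames_to_labels(fnames):
    """Return +1 for files whose score in '<id>_<score>.jpeg' exceeds 5, else -1."""
    res = []
    for fname in fnames:
        stem = os.path.splitext(os.path.basename(fname))[0]
        score = float(stem.rsplit('_', 1)[1])
        res.append(1 if score > 5 else -1)
    return np.asarray(res)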
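The crop arithmetic is easy to get wrong by one pixel when the size difference is odd. The sketch below isolates it in a hypothetical helper, `center_crop_box`, which returns the same `(left, upper, right, lower)` box that `crop_and_scale_image` passes to PIL's `Image.crop`, together with a hypothetical label parser for the `<id>_<score>.jpeg` naming scheme used by the download loop.

```python
import os

def center_crop_box(width, height):
    """Return the (left, upper, right, lower) box of the largest centered square."""
    if width > height:
        diff = width - height
        # The left margin gets floor(diff / 2); the right margin gets the rest.
        return (diff // 2, 0, width - (diff - diff // 2), height)
    diff = height - width
    return (0, diff // 2, width, height - (diff - diff // 2))

def label_from_filename(path, threshold=5.0):
    """Map 'dataset/<id>_<score>.jpeg' to +1 (score > threshold) or -1."""
    stem = os.path.splitext(os.path.basename(path))[0]
    score = float(stem.rsplit("_", 1)[1])
    return 1 if score > threshold else -1

print(center_crop_box(101, 60))                         # → (20, 0, 80, 60)
print(label_from_filename("dataset/123456_6.25.jpeg"))  # → 1
```

Using `basename` and `rsplit` keeps the parser correct even if the directory name itself contains an underscore.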

## Splitting data into Train / Validation / Test sets
Next, we load all 9047 images (4503 positive and 4544 negative) and perform the usual data split: 500 images for testing, 1000 for validation and the remaining 7547 for training. The test and validation sets are balanced, with 250 + 250 and 500 + 500 images from the two classes respectively.

The dataset folder contains all images downloaded from photo.net.

In [None]:
dname = "dataset/"
im_paths = np.array([dname + fname for fname in os.listdir(dname) if fname.endswith(".jpeg")])
np.random.shuffle(im_paths)  # in-place shuffle; seed np.random first for a reproducible split
im_labels = fnames_to_labels(im_paths)

im_paths_pos = im_paths[im_labels > 0]
im_paths_neg = im_paths[im_labels < 0]
print(len(im_paths_pos), len(im_paths_neg))
# Per class: first 250 for testing, next 500 for validation, the rest for training
fnames_te = np.concatenate((im_paths_pos[:250], im_paths_neg[:250]))
fnames_va = np.concatenate((im_paths_pos[250:750], im_paths_neg[250:750]))
fnames_tr = np.concatenate((im_paths_pos[750:len(im_paths_neg)], im_paths_neg[750:]))
print("Train data size: ", len(fnames_tr))
print("Validation data size: ", len(fnames_va))
print("Test data size: ", len(fnames_te))
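The split above takes the first 250 positive and 250 negative paths for testing, the next 500 of each for validation, and everything that remains for training. The sketch below restates that procedure as a reusable function with an explicit seed; `stratified_split` is a hypothetical name, and integers stand in for file paths. With the reported class counts (4503 positive, 4544 negative) it reproduces the 7547 / 1000 / 500 sizes.

```python
import random

def stratified_split(pos, neg, n_test=250, n_val=500, seed=0):
    """Per class: n_test items for test, the next n_val for validation, the rest for training."""
    rng = random.Random(seed)  # a private RNG makes the split reproducible
    pos, neg = list(pos), list(neg)
    rng.shuffle(pos)
    rng.shuffle(neg)
    cut = n_test + n_val
    test = pos[:n_test] + neg[:n_test]
    val = pos[n_test:cut] + neg[n_test:cut]
    train = pos[cut:] + neg[cut:]
    return train, val, test

# Stand-in "paths": integers for 4503 positive and 4544 negative images.
train, val, test = stratified_split(range(4503), range(10000, 14544))
print(len(train), len(val), len(test))  # → 7547 1000 500
```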

## CNN with Triplet loss function
Inspired by *Understanding Aesthetics with Deep Learning*, an article from NVIDIA, we approach this problem by training a convolutional neural network (CNN) with the following triplet loss:

$$\text{Triplet loss} = \max\big(0,\ c + \mathrm{Dist}(\phi(I_1), \phi(I_2)) - \mathrm{Dist}(\phi(I_1), \phi(I_3))\big)$$
 
Here $I_1$ and $I_2$ are two high-quality images, $I_3$ is a low-quality image, $\phi(\cdot)$ is the feature representation computed by the CNN, and $c$ is a margin. The loss lets the CNN learn what high-quality images have in common and how they differ from mediocre ones. By training the network, we hope to learn a representation $\phi(\cdot)$ in which the distance between two high-quality images is smaller than the distance between a high-quality image and a low-quality image. The margin $c$ makes this requirement strict: the loss is zero only when $\mathrm{Dist}(\phi(I_1), \phi(I_3))$ exceeds $\mathrm{Dist}(\phi(I_1), \phi(I_2))$ by at least $c$.

Note that the triplet loss and the deep network are not used to learn the classifier itself; they only learn the feature representation $\phi(\cdot)$. Once $\phi(\cdot)$ is learned, we train an SVM classifier on the feature representations of the images.
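To make the loss concrete, the sketch below evaluates it for a single triplet of feature vectors, using the squared Euclidean distance as $\mathrm{Dist}$ (the same choice the TensorFlow code below makes with `tf.reduce_sum(tf.square(...), 1)`). The vectors are toy values chosen by hand, not learned features.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin):
    """max(0, c + Dist(anchor, positive) - Dist(anchor, negative))."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)

anchor = [0.0, 0.0]    # phi(I_1): a high-quality image
positive = [1.0, 0.0]  # phi(I_2): another high-quality image, Dist = 1
negative = [2.0, 0.0]  # phi(I_3): a low-quality image, Dist = 4

print(triplet_loss(anchor, positive, negative, margin=1.0))  # → 0.0 (gap of 3 already exceeds c = 1)
print(triplet_loss(anchor, positive, negative, margin=5.0))  # → 2.0 (gap of 3 falls short of c = 5)
```

Only triplets that violate the margin produce a gradient, which is why the choice of $c$ matters relative to the scale of the feature distances.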


## Requirements

In [None]:
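import tensorflow.compat.v1 as tf  # the graph code below uses the TF1 API
tf.disable_v2_behavior()            # turn off eager execution so placeholders and sessions work under TF2
import numpy as np
from PIL import Image
import random
import time
from sklearn import svm
import logging

dataset = './dataset4'
logging.basicConfig(filename='cnnlog017.log', level=logging.DEBUG)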

In [None]:
def read_split(csv_path):
    """Read an '<id>,<score>,<label>' CSV (with header) into positive/negative id and score lists."""
    pos_ids, pos_scores, neg_ids, neg_scores = [], [], [], []
    with open(csv_path, 'r') as f:
        f.readline()  # skip the header row
        for line in f:
            img_id, img_score, img_label = line.strip().split(',')
            if float(img_label) > 0:
                pos_ids.append(img_id)
                pos_scores.append(img_score)
            else:
                neg_ids.append(img_id)
                neg_scores.append(img_score)
    return pos_ids, pos_scores, neg_ids, neg_scores

# Reading the train set
posImg_ids_Tr, posImg_scores_Tr, negImg_ids_Tr, negImg_scores_Tr = read_split(dataset + '/ratingsInfo/train.csv')
print("Train Set Positive Images:", len(posImg_ids_Tr))
print("Train Set Negative Images:", len(negImg_ids_Tr))

# Reading the validation set
posImg_ids_Va, posImg_scores_Va, negImg_ids_Va, negImg_scores_Va = read_split(dataset + '/ratingsInfo/validation.csv')
print("Validation Set Positive Images: ", len(posImg_ids_Va))
print("Validation Set Negative Images: ", len(negImg_ids_Va))

# Reading the test set
posImg_ids_Te, posImg_scores_Te, negImg_ids_Te, negImg_scores_Te = read_split(dataset + '/ratingsInfo/test.csv')
print("Test Set Positive Images: ", len(posImg_ids_Te))
print("Test Set Negative Images: ", len(negImg_ids_Te))

In [None]:
def readImgs_intoNpArray(path, image_id_list, image_score_list):
    filenames = []
    for img_id, img_score in zip(image_id_list, image_score_list):
        name = path + img_id + '_' + img_score + '.jpeg'
        filenames.append(name)
    imgs = np.array([np.array(Image.open(fname)) for fname in filenames])
    return imgs


path_Tr = dataset + '/train/'
pos_img_Tr = readImgs_intoNpArray(path_Tr, posImg_ids_Tr, posImg_scores_Tr)
neg_img_Tr = readImgs_intoNpArray(path_Tr, negImg_ids_Tr, negImg_scores_Tr)
anchor_img_Tr = pos_img_Tr.copy()  # anchors are the positive images; copy instead of re-reading from disk
# Real lists rather than range objects, which cannot be shuffled or modified in place in Python 3
pos_list_Tr = list(range(len(pos_img_Tr)))
neg_list_Tr = list(range(len(neg_img_Tr)))
                                                                           
path_Va = dataset + '/validation/'                                         
pos_img_Va = readImgs_intoNpArray(path_Va, posImg_ids_Va, posImg_scores_Va)
neg_img_Va = readImgs_intoNpArray(path_Va, negImg_ids_Va, negImg_scores_Va)
                                                                           
path_Te = dataset + '/test/'                                               
pos_img_Te = readImgs_intoNpArray(path_Te, posImg_ids_Te, posImg_scores_Te)
neg_img_Te = readImgs_intoNpArray(path_Te, negImg_ids_Te, negImg_scores_Te)

print ("Positive Training Images size ", pos_img_Tr.shape)
print ("Negative Training Images size ", neg_img_Tr.shape)

print ("Positive Validation Images size ", pos_img_Va.shape)
print("Negative Validation Images size ", neg_img_Va.shape)

print ("Positive Test Images size ", pos_img_Te.shape)
print ("Negative Test Images size ", neg_img_Te.shape)
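The fully connected layer in the next cell expects a `3 * 3 * 256` input. That number follows from the architecture: the convolutions use stride 1 with `'SAME'` padding and keep the spatial size, while each of the five pooling layers uses a 2x2 window with stride 2 and `'SAME'` padding, which maps a spatial size n to ceil(n / 2); the last convolution has 256 channels. A quick check of the arithmetic (the helper name `pooled_sizes` is hypothetical):

```python
import math

def pooled_sizes(size, num_pools):
    """Spatial size after each 2x2, stride-2 pooling layer with 'SAME' padding."""
    sizes = []
    for _ in range(num_pools):
        size = math.ceil(size / 2)
        sizes.append(size)
    return sizes

sizes = pooled_sizes(96, 5)
print(sizes)                 # → [48, 24, 12, 6, 3]
print(sizes[-1] ** 2 * 256)  # → 2304, i.e. 3 * 3 * 256 inputs to W_fcl
```

Changing the input resolution or the number of pooling layers would require updating `3 * 3 * 256` in both `W_fcl` and the `tf.reshape` call.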


In [None]:
def weight_variable(shape):                                       
  initial = tf.truncated_normal(shape, stddev=0.1)                
  return tf.Variable(initial)                                     
                                                                  
def bias_variable(shape):                                         
  initial = tf.constant(0.1, shape=shape)                         
  return tf.Variable(initial)                                     
                                                                  
def max_pool_2x2(x):                                              
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],                    
                        strides=[1, 2, 2, 1], padding='SAME')     
def avg_pool_2x2(x):                                              
  return tf.nn.avg_pool(x, ksize=[1, 2, 2, 1],                    
                        strides=[1, 2, 2, 1], padding='SAME')  

def predict_from_features(X_train, y, X_test, SVMReg=1):
    """Fit a linear SVM with regularization C=SVMReg; return test and train predictions."""
    lin_svc = svm.SVC(kernel='linear', C=SVMReg, verbose=True, max_iter=1000).fit(X_train, y)
    Te_res = lin_svc.predict(X_test)
    Tr_res = lin_svc.predict(X_train)
    return Te_res, Tr_res

# featuresRes_pos_*: CNN features of positive images (label +1)
# featuresRes_neg_*: CNN features of negative images (label -1)
def doSVM(featuresRes_pos_Tr, featuresRes_neg_Tr, featuresRes_pos_Te, featuresRes_neg_Te, SVMReg=1):
    # Use only the first half of the positive training features
    TrainExamplesNum = len(featuresRes_pos_Tr) // 2
    posTrainX = featuresRes_pos_Tr[:TrainExamplesNum, :]
    posTrainy = np.ones(len(posTrainX))
    posTestX = featuresRes_pos_Te
    posTesty = np.ones(len(posTestX))

    negTrainX = featuresRes_neg_Tr
    negTrainy = -1 * np.ones(len(negTrainX))
    negTestX = featuresRes_neg_Te
    negTesty = -1 * np.ones(len(negTestX))

    trainX = np.concatenate((posTrainX, negTrainX), axis=0).astype(float)
    trainy = np.concatenate((posTrainy, negTrainy))
    testX = np.concatenate((posTestX, negTestX), axis=0).astype(float)
    testy = np.concatenate((posTesty, negTesty))

    start = time.time()
    y_p, y_p_tr = predict_from_features(trainX, trainy, testX, SVMReg)
    end = time.time()

    # Per-class recall on the test set: [positive recall, negative recall]
    recall = [np.mean(y_p[testy > 0] > 0), np.mean(y_p[testy < 0] < 0)]

    train_acc = np.mean(y_p_tr == trainy)
    return np.mean(y_p == testy), end - start, recall, y_p, train_acc

def RunAndTest( pos_img, neg_img, anchor_img, pos_list, neg_list, pos_img_Te, ne_img_Te, pos_img_Va, neg_img_Va, margin=1000, keepProb = 0.2, learningRate=1e-4, iter_time=200, batch_size = 16, SVMReg = 1):
                                                                                                                                                                                                             
    sess = tf.InteractiveSession()                                                                                                                                                                           
    width = 96                                                                                                                                                                                               
    height = 96                                                                                                                                                                                              
    channel = 3                                                                                                                                                                                              
                                                                                                                                                                                                             
    anchor_input = tf.placeholder(tf.float32, [None, width, height, channel])                                                                                                                                
    positive_input = tf.placeholder(tf.float32, [None, width, height, channel])                                                                                                                              
    negative_input = tf.placeholder(tf.float32, [None, width, height, channel])                                                                                                                              
    global_step = tf.Variable(0, trainable=False)                                                                                                                                                            
                                                                                                                                                                                                             
                                                                                                                                                                                                             
    # Initialize variables                                                                                                                                                                                   
    W_conv1 = weight_variable([5, 5, 3, 32])                                                                                                                                                                 
    b_conv1 = bias_variable([32])                                                                                                                                                                            
                                                                                                                                                                                                             
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])

    W_conv3 = weight_variable([3, 3, 64, 128])
    b_conv3 = bias_variable([128])

    W_conv4 = weight_variable([3, 3, 128, 256])
    b_conv4 = bias_variable([256])

    W_conv5 = weight_variable([3, 3, 256, 256])
    b_conv5 = bias_variable([256])

    W_fcl = weight_variable([3 * 3 * 256, 1024])
    b_fcl = bias_variable([1024])
    fcl = []                                                                                                                                                                                                 
    flat_list = []                                                                                                                                                                                           
    for train_img in [anchor_input, positive_input, negative_input]:                                                                                                                                         
        # CNN1                                                                                                                                                                                               
        h_conv1 = tf.nn.relu(tf.nn.conv2d(train_img, W_conv1, [1,1,1,1], 'SAME') + b_conv1)                                                                                                                  
        h_pool1 = max_pool_2x2(h_conv1)                                                                                                                                                                      
                                                                                                                                                                                                             
        # CNN2                                                                                                                                                                                               
        h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2, [1,1,1,1], 'SAME') + b_conv2)                                                                                                                    
        h_pool2 = max_pool_2x2(h_conv2)                                                                                                                                                                      
                                                                                                                                                                                                             
        # CNN3                                                                                                                                                                                               
        h_conv3 = tf.nn.relu(tf.nn.conv2d(h_pool2, W_conv3, [1,1,1,1], 'SAME') + b_conv3)                                                                                                                    
        h_pool3 = max_pool_2x2(h_conv3)                                                                                                                                                                      
                                                                                                                                                                                                             
        # CNN4                                                                                                                                                                                               
        h_conv4 = tf.nn.relu(tf.nn.conv2d(h_pool3, W_conv4, [1,1,1,1], 'SAME') + b_conv4)                                                                                                                    
        h_pool4 = max_pool_2x2(h_conv4)                                                                                                                                                                      
                                                                                                                                                                                                             
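        # CNN5: fifth convolutional layer followed by 2x2 average pooling
        h_conv5 = tf.nn.relu(tf.nn.conv2d(h_pool4, W_conv5, [1,1,1,1], 'SAME') + b_conv5)
        h_pool5 = avg_pool_2x2(h_conv5)

        # flatten the last pooled feature map (3 x 3 spatial, 256 channels)
        h_pool5_flat = tf.reshape(h_pool5, [-1, 3 * 3 * 256])

        # fully connected layer; dropout is applied every time this tensor is evaluated,
        # because keepProb is a fixed Python float rather than a placeholder
        h_fcl = tf.nn.relu(tf.matmul(h_pool5_flat, W_fcl) + b_fcl)
        h_fcl = tf.nn.dropout(h_fcl, keep_prob=keepProb)

        # collect this branch's embedding (anchor, positive, negative in that order)
        fcl.append(h_fcl)
        flat_list.append(h_pool5_flat)
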
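    # fcl holds one embedding per input branch: anchor, positive, negative
    anchor_image = fcl[0]
    positive_image = fcl[1]
    negative_image = fcl[2]
    # squared Euclidean distances anchor-positive and anchor-negative, one value per triplet
    d_pos = tf.reduce_sum(tf.square(anchor_image - positive_image), 1)
    d_neg = tf.reduce_sum(tf.square(anchor_image - negative_image), 1)

    # hinge triplet loss: a triplet is penalized while d_neg < d_pos + margin
    triplet_loss_val = tf.maximum(0., margin + d_pos - d_neg)
    triplet_loss = tf.reduce_mean(triplet_loss_val)

    # learning rate is multiplied by 0.975 every 100 steps
    decay_lr = tf.train.exponential_decay(learningRate, global_step, 100, 0.975, staircase=True)
    train_step = tf.train.AdamOptimizer(decay_lr).minimize(triplet_loss, global_step=global_step)
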
    sess.run(tf.global_variables_initializer())
    print("start AdamOptimizer...")

    # Iteration begins
    for i in range(iter_time):
        # Select a random training batch of anchor, positive and negative images
        batch_anchor = anchor_img[random.sample(pos_list, batch_size)]
        batch_pos = pos_img[random.sample(pos_list, batch_size)]
        batch_neg = neg_img[random.sample(neg_list, batch_size)]
        batch_feed = {anchor_input: batch_anchor, positive_input: batch_pos, negative_input: batch_neg}

        # Train a step, then report the loss on the same batch after the update
        train_step.run(feed_dict=batch_feed)

        lossVal = triplet_loss.eval(feed_dict=batch_feed)
        print("lossVal: ", lossVal)

        # periodic evaluation whenever i % 5000 == 1, skipping the first iteration
        # (compare integers with ==/!=, not with `is`)
        if i != 1 and i % 5000 == 1:
            print("iteration at: " + str(i))
            print("Start evaluating the features...")

            # Training features: positive images go through branch 1, negative images through branch 2
            train_feed = {anchor_input: pos_img, positive_input: pos_img, negative_input: neg_img}
            featuresRes_pos_Tr = fcl[1].eval(feed_dict=train_feed)
            featuresRes_neg_Tr = fcl[2].eval(feed_dict=train_feed)

            # Validation features, extracted the same way
            val_feed = {anchor_input: pos_img_Va, positive_input: pos_img_Va, negative_input: neg_img_Va}
            featuresRes_pos_Va = fcl[1].eval(feed_dict=val_feed)
            featuresRes_neg_Va = fcl[2].eval(feed_dict=val_feed)

            # SVM on the extracted training/validation features
            print("start SVM...")
            Testacc, SVMtime, Recall, y_p, train_acc = doSVM(featuresRes_pos_Tr, featuresRes_neg_Tr, featuresRes_pos_Va, featuresRes_neg_Va, SVMReg)
            logging.info("At iteration: " + str(i) + " loss value: " + str(lossVal) + " recall: " + str(Recall) + " Train Accuracy: " + str(train_acc) + ", Test Accuracy: " + str(Testacc))
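
For reference, the loss built in the cell above is the hinge form of the triplet loss: for each triplet, max(0, margin + ||a - p||^2 - ||a - n||^2), averaged over the batch. The NumPy sketch below mirrors the `d_pos`, `d_neg` and `triplet_loss` computation so it can be checked by hand on toy 2-D embeddings; `triplet_loss_np` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def triplet_loss_np(anchor, positive, negative, margin):
    """Mean hinge triplet loss over a batch of embeddings (one row per sample)."""
    # Squared Euclidean distance per row, like tf.reduce_sum(tf.square(...), 1)
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # A triplet adds loss only while d_neg < d_pos + margin
    return np.mean(np.maximum(0.0, margin + d_pos - d_neg))


a = np.array([[0.0, 0.0], [1.0, 1.0]])
p = np.array([[1.0, 0.0], [1.0, 1.0]])
n = np.array([[3.0, 0.0], [1.0, 2.0]])
print(triplet_loss_np(a, p, n, margin=1.0))   # → 0.0
print(triplet_loss_np(a, p, n, margin=10.0))  # → 5.5
```

With margin 1 both toy triplets already satisfy d_neg >= d_pos + margin, so the loss is zero; with margin 10 both still contribute (2 and 9). The same embeddings can therefore look "solved" or not depending on the margin, which is why `margin` is one of the searched hyperparameters.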

In [None]:
import logging
import random

# Random search: each of the 100 trials draws one value per hyperparameter
margins = [1, 10]
batchSizes = [16, 32]
# note: these values are passed to RunAndTest as keepProb, i.e. the probability of keeping a unit
dropOutProbs = [0.01, 0.2]
learnRates = [5e-4, 2e-4]
SVMRegularizations = [0.1, 1]
iterationTime = 10001
for i in range(100):
    margin = random.choice(margins)
    batch_size = random.choice(batchSizes)
    dropOutProb = random.choice(dropOutProbs)
    learnRate = random.choice(learnRates)
    SVMreg = random.choice(SVMRegularizations)
    logging.info("trial: " + str(i) + " margin: " + str(margin) + " batchSize: " + str(batch_size) + " dropOut: " + str(dropOutProb) + " learnRate: " + str(learnRate) + " SVM REG: " + str(SVMreg))
    testErr = RunAndTest(pos_img_Tr, neg_img_Tr, anchor_img_Tr, pos_list_Tr, neg_list_Tr, pos_img_Te, neg_img_Te, pos_img_Va, neg_img_Va, margin=margin, keepProb=dropOutProb, learningRate=learnRate, iter_time=iterationTime, batch_size=batch_size, SVMReg=SVMreg)
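
The cell above draws each hyperparameter independently. With two candidate values for each of five parameters there are only 2^5 = 32 distinct configurations, so 100 random draws revisit some of them several times and may miss others. As a minimal sketch of an alternative, the code below enumerates every configuration once, repeats each a fixed number of times, and keeps the best mean score. `grid_search` and `toy_evaluate` are hypothetical; `toy_evaluate` only stands in for a function like `RunAndTest`, which would have to return a validation accuracy for this to work.

```python
import itertools
import random

# Candidate values copied from the random-search cell above
search_space = {
    "margin": [1, 10],
    "batch_size": [16, 32],
    "keep_prob": [0.01, 0.2],
    "learning_rate": [5e-4, 2e-4],
    "svm_reg": [0.1, 1],
}


def grid_search(evaluate, space, n_runs=3):
    """Evaluate every configuration n_runs times; return (best_config, best_mean_score)."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        # Average repeated runs to smooth out randomness in training and batch sampling
        mean_score = sum(evaluate(config) for _ in range(n_runs)) / n_runs
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config, best_score


calls = []


def toy_evaluate(config):
    """Hypothetical stand-in scorer: prefers margin=10 and learning_rate=2e-4, plus small noise."""
    calls.append(config)
    return (0.60
            + 0.05 * (config["margin"] == 10)
            + 0.02 * (config["learning_rate"] == 2e-4)
            + random.uniform(0.0, 0.01))


random.seed(0)
best_config, best_score = grid_search(toy_evaluate, search_space, n_runs=3)
print(len(calls))  # 32 configurations x 3 runs = 96 evaluations
print(best_config["margin"], best_config["learning_rate"])  # → 10 0.0002
```

Exhaustive enumeration is only practical for a grid this small; for larger search spaces, random search with several repeated runs per sampled configuration is the usual compromise.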

## CNN with Triplet Loss: Result Analysis
The CNN trained with the triplet loss reaches 66.8% accuracy. This shows that training a convolutional network with the triplet loss does extract useful features, but the result is still below our expectations, even though we spent considerable effort tuning the parameters and optimizing the network structure. We believe the main reasons are:

1. Structure of network: The network may not be deep or wide enough. With only 5 convolutional layers and 1 fully connected layer, the most discriminative features are hard to extract.

2. Parameter tuning: Selecting the right parameters takes far more time and resources than we had. We implemented stochastic validation (a random search over candidate values) to find good parameters, but it is not enough. The candidate values for each parameter came from our experiments and experience; in practice many more values need to be tried, with more runs and validations per configuration, to find the best parameters.

3. Network optimization: Many other parts of the network could be tuned, e.g. within the convolutional layers, the kernel size, the pooling method, the activation function and the number of channels; across the whole network, the number of convolutional and fully connected layers and the number of neurons in each fully connected layer.
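
Point 3 has a practical consequence in the code: the flatten step hard-codes `tf.reshape(h_pool5, [-1, 3 * 3 * 256])`, so changing the input size, the number of pooling layers, or the channel count of the last layer means recomputing that size. A minimal sketch, assuming every pooling layer (such as `avg_pool_2x2`, whose definition is not shown in this section) uses a 2x2 window with stride 2, and that the convolutions keep the spatial size (stride 1, 'SAME' padding, as in the conv layers above); `flattened_size` is a hypothetical helper.

```python
import math


def flattened_size(input_side, n_pools, channels, padding="SAME"):
    """Length of the flattened vector after n_pools 2x2, stride-2 pooling layers.

    Assumes square inputs and size-preserving convolutions, so only pooling
    changes the spatial size.
    """
    side = input_side
    for _ in range(n_pools):
        # 'SAME' pooling rounds odd sizes up (ceil), 'VALID' pooling rounds down (floor)
        side = math.ceil(side / 2) if padding == "SAME" else side // 2
    return side * side * channels


print(flattened_size(96, 5, 256))  # 96 -> 48 -> 24 -> 12 -> 6 -> 3, so 3 * 3 * 256 = 2304
print(flattened_size(96, 6, 256))  # one more pooling layer: 3 -> 2, so 2 * 2 * 256 = 1024
```

The 96x96 input is only an example: under these assumptions any side length from 65 to 96 ends at a 3x3 map after five poolings, while adding a sixth pooling layer or changing the last layer's channel count changes the reshape size.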

Reaching the best performance would require much more work, which we could not complete within our time and resource limits. We implemented a simplified network to demonstrate the effectiveness of this approach, and we believe it has considerable potential once it is well trained and tuned.