# ECBM E4040 - Assignment 2 - Task 5: Kaggle Open-ended Competition

Kaggle is a platform for predictive modelling and analytics competitions: companies and researchers post data, and statisticians and data miners compete to produce the best models for predicting and describing it.

1. Train a custom model for the bottle dataset classification problem. You are free to use any method taught in class or found on the Internet (ALWAYS provide a reference to the source). General training methods include:
    * Dropout
    * Batch normalization
    * Early stopping
    * l1-norm & l2-norm penalization
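
To make two of the listed methods concrete, here is a minimal pure-Python sketch of norm penalization and early stopping. It is an illustration only, not the course implementation: the names `l1_penalty`, `l2_penalty`, and `EarlyStopping`, the `patience` parameter, and the sample numbers are all assumptions. Some frameworks also scale the L2 term by 1/2, which this sketch does not.

```python
def l1_penalty(weights, strength):
    """L1 penalty: strength times the sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """L2 penalty: strength times the sum of squared weights (no 1/2 factor)."""
    return strength * sum(w * w for w in weights)


class EarlyStopping:
    """Signal a stop once validation accuracy fails to improve `patience` times in a row."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("-inf")
        self.bad_checks = 0

    def should_stop(self, val_accuracy):
        if val_accuracy > self.best:
            # New best: remember it and reset the counter.
            self.best = val_accuracy
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Hypothetical weights and validation accuracies, for illustration only.
weights = [0.5, -1.0, 2.0]
l1_value = l1_penalty(weights, 0.01)
l2_value = l2_penalty(weights, 0.01)

stopper = EarlyStopping(patience=2)
history = [0.70, 0.75, 0.74, 0.73, 0.80]
stopped_at = None
for step, accuracy in enumerate(history):
    if stopper.should_stop(accuracy):
        stopped_at = step
        break
```

In a real training loop the penalty would be added to the data loss before computing gradients, and the model weights would be checkpointed whenever a new best accuracy is recorded.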

## Train your model here

In [1]:
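import os

import numpy as np
import tensorflow as tf
from PIL import Image
# NOTE: scipy.misc.imresize was removed in SciPy 1.3, so this cell needs an older SciPy.
from scipy.misc import imresize

train_path = os.path.join(os.getcwd(), 'data', 'train_128')
test_path = os.path.join(os.getcwd(), 'data', 'test_128')

X_train = []
y_train = []

# Each sub-folder of train_path is named after its integer class label.
for folder in os.listdir(train_path):
    train_sub_path = os.path.join(train_path, folder)
    y_value = int(folder)

    for file_name in os.listdir(train_sub_path):
        file_path = os.path.join(train_sub_path, file_name)
        img = Image.open(file_path)
        # Resize every image to 32x32 before training.
        X_train.append(imresize(np.array(img), (32, 32, 3)))
        y_train.append(y_value)

X_train = np.array(X_train)
y_train = np.array(y_train)

num_training = 13500
# Not used below: the validation set is whatever remains after drawing num_training samples.
num_validation = 1500

# Draw num_training distinct indices at random (unseeded, so the split differs between runs).
rand_ind = np.random.choice(X_train.shape[0], num_training, replace=False)
X_train_rn = X_train[rand_ind]
y_train_rn = y_train[rand_ind]

# Every sample not drawn above goes to the validation set.
non_rand = np.array(list(set(range(X_train.shape[0])) - set(rand_ind)))
X_val_rn = X_train[non_rand]
y_val_rn = y_train[non_rand]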
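
The split above draws its indices from an unseeded `np.random.choice`, so the training and validation sets change on every run. One possible alternative, sketched below on toy arrays, takes both index sets from a single seeded permutation. The helper name `split_indices` and the seed value are assumptions made for illustration, not part of the assignment code.

```python
import numpy as np


def split_indices(num_samples, num_validation, seed):
    """Return disjoint (train_idx, val_idx) arrays taken from one seeded permutation."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(num_samples)
    val_idx = perm[:num_validation]
    train_idx = perm[num_validation:]
    return train_idx, val_idx


# Toy arrays standing in for X_train / y_train (shapes are illustrative only).
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

train_idx, val_idx = split_indices(len(X), num_validation=3, seed=0)
X_tr, y_tr = X[train_idx], y[train_idx]
X_va, y_va = X[val_idx], y[val_idx]
```

Because both index sets come from the same permutation, they cannot overlap, and rerunning with the same seed reproduces the same split.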

In [2]:
from ecbm4040.neuralnets.kaggle import my_training

tf.reset_default_graph()
my_training(X_train_rn, y_train_rn, X_val_rn, y_val_rn,
            conv_featmap=[256, 64],
            fc_units=[1024],
            conv_kernel_size=[5, 5],
            pooling_size=[2, 2],
            l2_norm=0.01,
            seed=235,
            learning_rate=0.001,
            epoch=20,
            batch_size=245,
            verbose=False,
            pre_trained_model=None)

Building my LeNet. Parameters: 
conv_featmap=[256, 64]
fc_units=[1024]
conv_kernel_size=[5, 5]
pooling_size=[2, 2]
l2_norm=0.01
seed=235
learning_rate=0.001
number of batches for training: 55
0.001
epoch 1 
epoch 2 
Best validation accuracy! iteration:100 accuracy: 71.66666666666667%
epoch 3 
epoch 4 
Best validation accuracy! iteration:200 accuracy: 74.86666666666667%
epoch 5 
epoch 6 
Best validation accuracy! iteration:300 accuracy: 77.06666666666666%
epoch 7 
epoch 8 
Best validation accuracy! iteration:400 accuracy: 77.66666666666667%
epoch 9 
epoch 10 
Best validation accuracy! iteration:500 accuracy: 78.26666666666667%
epoch 11 
Best validation accuracy! iteration:600 accuracy: 79.33333333333333%
epoch 12 
epoch 13 
Best validation accuracy! iteration:700 accuracy: 80.66666666666667%
epoch 14 
epoch 15 
Best validation accuracy! iteration:800 accuracy: 82.0%
epoch 16 
epoch 17 
epoch 18 
epoch 19 
epoch 20 
Traning ends. The best valid accuracy is 82.0. Model named lenet_1509755
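
The numbers in the log can be checked with a little arithmetic. The sketch below assumes that iterations are counted from 1 and that validation accuracy is evaluated every 100 iterations; both are inferred from the log, not confirmed by the `my_training` source. Under those assumptions, 13500 training samples in batches of 245 give 55 full batches per epoch, which matches the reported batch count.

```python
num_training = 13500
batch_size = 245
epochs = 20
eval_every = 100  # assumption inferred from the log, not from the training code

# Full batches per epoch; the 25 leftover samples do not form a complete batch.
batches_per_epoch = num_training // batch_size
total_iterations = batches_per_epoch * epochs


def epoch_of(iteration):
    """Return the 1-based epoch that contains a 1-based iteration index."""
    return (iteration + batches_per_epoch - 1) // batches_per_epoch


# Map every assumed evaluation point to the epoch in which it falls.
for iteration in range(eval_every, total_iterations + 1, eval_every):
    print("iteration %d falls in epoch %d" % (iteration, epoch_of(iteration)))
```

This reproduces the pattern in the log: iteration 100 lands in epoch 2, iteration 600 in epoch 11, and iteration 800 (the last improvement) in epoch 15.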

In [4]:
X_test = []
extension = ".png"
num_test_samples = 3500  # Hard-coded; ideally count the files in the test folder instead.
# File names are generated as 0.png, 1.png, ..., matching the Ids written to the submission file.
img_names = [str(idx) + extension for idx in range(num_test_samples)]
for img_name in img_names:
    file_path = test_path + '/' + img_name
    img = Image.open(file_path)
    X_test.append(imresize(np.array(img), (32, 32, 3)))
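
The cell above hard-codes the number of test images and notes that counting the folder contents would be better. A minimal sketch of that idea follows. It assumes every matching file is named with an integer stem such as `12.png`; any other name would raise a `ValueError`. The helper name `list_numbered_images` is hypothetical.

```python
import os
import tempfile


def list_numbered_images(folder, extension=".png"):
    """Return names such as '0.png', '1.png', ... sorted by their integer stem."""
    names = [name for name in os.listdir(folder) if name.endswith(extension)]
    return sorted(names, key=lambda name: int(name[:-len(extension)]))


# Demonstration on a throwaway folder containing empty placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    for idx in (10, 2, 0, 1):
        with open(os.path.join(demo_dir, "%d.png" % idx), "w"):
            pass
    demo_names = list_numbered_images(demo_dir)
```

Sorting by the integer stem matters: a plain string sort would place `10.png` before `2.png`, and the predictions would no longer line up with the Ids in the submission file.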

## Save your best model

In [6]:
tf.reset_default_graph()

with tf.Session() as sess:
    # Rebuild the saved graph, then load the most recent checkpoint in model/.
    saver = tf.train.import_meta_graph('model/lenet_1509755228.meta')
    saver.restore(sess, tf.train.latest_checkpoint('model/'))
    graph = tf.get_default_graph()

    # Use the first output of the graph's first operation, assumed to be the input placeholder.
    idx = 0
    tf_input = graph.get_operations()[idx].name + ':0'
    x = graph.get_tensor_by_name(tf_input)
    # Same procedure for y: the predicted class index.
    tf_output = "evaluate/ArgMax:0"
    y = graph.get_tensor_by_name(tf_output)
    # Make predictions for the whole test set.
    y_out = sess.run(y, feed_dict={x: X_test})

INFO:tensorflow:Restoring parameters from model/lenet_1509755228
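
The cell above feeds the whole test set to the network in a single `sess.run` call. If memory becomes a constraint, predictions can be made in chunks instead. The sketch below shows the idea with a generic callable. The names `predict_in_batches` and `fake_predict` are hypothetical, and the stand-in predictor only mimics a model.

```python
def predict_in_batches(predict_fn, samples, batch_size):
    """Apply predict_fn to consecutive slices of samples and concatenate the results."""
    predictions = []
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        predictions.extend(predict_fn(batch))
    return predictions


# Hypothetical stand-in for the model: "predicts" the parity of each number.
def fake_predict(batch):
    return [value % 2 for value in batch]


demo_out = predict_in_batches(fake_predict, list(range(7)), batch_size=3)
```

Inside the session, `predict_fn` could be something like `lambda batch: sess.run(y, feed_dict={x: batch})`, which keeps the order of predictions identical to the order of `X_test`.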


## Generate .csv file for Kaggle

In [7]:
# The following code snippet can be used to generate your prediction .csv file.

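import csv

# newline='' lets the csv module control line endings (avoids blank rows on Windows).
with open('predicted.csv', 'w', newline='') as csvfile:
    fieldnames = ['Id', 'label']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    # One row per test image: the file name as Id, the predicted class as label.
    for index, pred in enumerate(y_out):
        filename = str(index) + '.png'
        label = str(pred)
        writer.writerow({'Id': filename, 'label': label})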
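
Before uploading, it can help to read the file back and confirm its shape. The following sketch checks the header, the row count, and the uniqueness of the Ids. The helper name `check_submission` and the demo labels are assumptions made for illustration. The actual Kaggle submission rules may impose further requirements that this check does not cover.

```python
import csv
import os
import tempfile


def check_submission(path, expected_rows):
    """Return a list of problems found in a submission CSV (empty if none)."""
    problems = []
    with open(path, newline="") as csvfile:
        reader = csv.DictReader(csvfile)
        if reader.fieldnames != ["Id", "label"]:
            problems.append("unexpected header: %r" % (reader.fieldnames,))
            return problems
        rows = list(reader)
    if len(rows) != expected_rows:
        problems.append("expected %d rows, found %d" % (expected_rows, len(rows)))
    ids = [row["Id"] for row in rows]
    if len(set(ids)) != len(ids):
        problems.append("duplicate Id values")
    return problems


# Demonstration on a small throwaway file with made-up labels.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "predicted.csv")
    with open(demo_path, "w", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=["Id", "label"])
        writer.writeheader()
        for index, label in enumerate([3, 0, 1]):
            writer.writerow({"Id": "%d.png" % index, "label": str(label)})

    demo_problems = check_submission(demo_path, expected_rows=3)
    short_problems = check_submission(demo_path, expected_rows=4)
```

For the notebook's own file, the call would be `check_submission('predicted.csv', expected_rows=3500)`, using the test-set size set earlier.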