diff --git a/README.md b/README.md index b95f966..c605203 100644 --- a/README.md +++ b/README.md @@ -1,139 +1,160 @@ -
-overview -
- -# Deep Classification - -## updates -- 9/26/2017: provide [subset of dataset](https://drive.google.com/drive/folders/0B3fKFm-j0RqeWGdXZUNRUkpybU0?usp=sharing), separated into train/test set -- 9/27/2017: in this homework, we only evaluat the performance of object classification. You can use other label for multi-task learning, etc. - -## Brief -* ***+2 extra credit of the whole semester*** -* Due: Oct. 5, 11:59pm. -* Required files: results/index.md, and code/ -* [Project reference](http://aliensunmin.github.io/project/handcam/) +# 邱浩翰 (105061607) +#Project 5: Deep Classification ## Overview +
+overview +
-Recently, the technological advance of wearable devices has led to significant interests in recognizing human behaviors in daily life (i.e., uninstrumented environment). Among many devices, egocentric camera systems have drawn significant attention, since the camera is aligned with the field-of-view of wearer, it naturally captures what a person sees. These systems have shown great potential in recognizing daily activities(e.g., making meals, watching TV, etc.), estimating hand poses, generating howto videos, etc. - -Despite many advantages of egocentric camera systems, there exists two main issues which are much less discussed. Firstly, hand localization is not solved especially for passive camera systems. Even for active camera systems like Kinect, hand localization is challenging when two hands are interacting or a hand is interacting with an object. Secondly, the limited field-of-view of an egocentric camera implies that hands will inevitably move outside the images sometimes. - -HandCam (Fig. 1), a novel wearable camera capturing activities of hands, for recognizing human behaviors. HandCam has two main advantages over egocentric systems : (1) it avoids the need to detect hands and manipulation regions; (2) it observes the activities of hands almost at all time. - -## Requirement - -- Python -- [TensorFlow](https://github.com/tensorflow/tensorflow) - -## Data - -### Introduction - -This is a [dataset](https://drive.google.com/drive/folders/0BwCy2boZhfdBdXdFWnEtNWJYRzQ) recorded by hand camera system. - -The camera system consist of three wide-angle cameras, two mounted on the left and right wrists to -capture hands (referred to as HandCam) and one mounted on the head (referred to as HeadCam). - -The dataset consists of 20 sets of video sequences (i.e., each set includes two HandCams and one -HeadCam synchronized videos) captured in three scenes: a small office, a mid-size lab, and a large home.) - -We want to classify some kinds of hand states including free v.s. active (i.e., hands holding objects or not), -object categories, and hand gestures. At the same time, a synchronized video has two sequence need to be labeled, -the left hand states and right hand states. - -For each classification task (i.e., free vs. active, object categories, or hand gesture), there are forty -sequences of data. We split the dataset into two parts, half for training, half for testing. The object instance is totally separated into training and testing. - -### Zip files - -`frames.zip` contains all the frames sample from the original videos by 6fps. - -`labels.zip` conatins the labels for all frames. - -FA : free vs. active (only 0/1) - -obj: object categories (24 classes, including free) - -ges: hand gesture (13 gestures, including free) - - -### Details of obj. and ges. 
- -``` -Obj = { 'free':0, - 'computer':1, - 'cellphone':2, - 'coin':3, - 'ruler':4, - 'thermos-bottle':5, - 'whiteboard-pen':6, - 'whiteboard-eraser':7, - 'pen':8, - 'cup':9, - 'remote-control-TV':10, - 'remote-control-AC':11, - 'switch':12, - 'windows':13, - 'fridge':14, - 'cupboard':15, - 'water-tap':16, - 'toy':17, - 'kettle':18, - 'bottle':19, - 'cookie':20, - 'book':21, - 'magnet':22, - 'lamp-switch':23} - -Ges= { 'free':0, - 'press'1, - 'large-diameter':2, - 'lateral-tripod':3, - 'parallel-extension':4, - 'thumb-2-finger':5, - 'thumb-4-finger':6, - 'thumb-index-finger':7, - 'precision-disk':8, - 'lateral-pinch':9, - 'tripod':10, - 'medium-wrap':11, - 'light-tool':12} -``` - -## Writeup +The project is related to +* handcam object classification +* VGG16 +* Reference to: + +code +>https://github.com/kevin28520/My-TensorFlow-tutorials + +vgg16 +>https://arxiv.org/abs/1409.1556 +## Implementation +1. Load data + * image & label + ``` + for dirPath, dirNames, fileNames in os.walk(label_path): + for i in range(len(train_label_file)): + train_labels = np.hstack((train_labels, np.load(label_path + train_label_file[i], mmap_mode='r'))) + + for dirPath, dirNames, fileNames in os.walk(label_path): + for i in range(len(test_label_file)): + test_labels = np.hstack((test_labels, np.load(label_path + test_label_file[i], mmap_mode='r'))) + + for i in range(len(train_image_file)): + for dirPath, dirNames, fileNames in os.walk(train_image_path + train_image_file[i]): + for f in fileNames: + train_images.append(os.path.join(dirPath, f)) + + for i in range(len(test_image_file)): + for dirPath, dirNames, fileNames in os.walk(test_image_path + test_image_file[i]): + for f in fileNames: + test_images.append(os.path.join(dirPath, f)) + ``` + + +2. Training + * Optimizer + ``` + loss_func = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = y_logits, labels= y_label))#tools.loss(y_logits, y_label) + optimizer = tf.train.AdamOptimizer(LR).minimize(loss_func) + + ``` + +3. Testing + * Accuracy_evaluation + ``` + correct_prediction = tf.equal(tf.argmax(y_logits, 1), tf.argmax(y_label, 1)) + accuracy = tf.cast(tf.reduce_mean(tf.cast(correct_prediction, dtype=tf.float32)), dtype=tf.float32)#tools.accuracy(y_logits, y_label) + + ``` + +4. Architecture + * VGG16 + ``` + def VGG16N(x, n_classes, is_pretrain=True): -You are required to implement a **deep-learning-based method** to recognize hand states (free vs. active hands, hand gestures, object categories). Moreover, You might need to further take advantage of both HandCam and HeadCam. You will have to compete the performance with your classmates, so try to use as many techniques as possible to improve. **Your score will based on the performance ranking.** - -For this project, and all other projects, you must do a project report in results folder using [Markdown](https://help.github.com/articles/markdown-basics). We provide you with a placeholder [index.md](./results/index.md) document which you can edit. In the report you will describe your algorithm and any decisions you made to write your algorithm a particular way. Then, you will describe how to run your code and if your code depended on other packages. You also need to show and discuss the results of your algorithm. Discuss any extra credit you did, and clearly show what contribution it had on the results (e.g. performance with and without each extra credit component). - -You should also include the precision-recall curve of your final classifier and any interesting variants of your algorithm. 
- -## Rubric - - -## Get start & hand in -* Publicly fork version (+2 extra points) - - [Fork the homework](https://education.github.com/guide/forks) to obtain a copy of the homework in your github account - - [Clone the homework](http://gitref.org/creating/#clone) to your local space and work on the code locally - - Commit and push your local code to your github repo - - Once you are done, submit your homework by [creating a pull request](https://help.github.com/articles/creating-a-pull-request) - -* [Privately duplicated version](https://help.github.com/articles/duplicating-a-repository) - - Make a bare clone - - mirror-push to new repo - - [make new repo private](https://help.github.com/articles/making-a-private-repository-public) - - [add aliensunmin as collaborator](https://help.github.com/articles/adding-collaborators-to-a-personal-repository) - - [Clone the homework](http://gitref.org/creating/#clone) to your local space and work on the code locally - - Commit and push your local code to your github repo - - I will clone your repo after the due date - -## Credits -Assignment designed by Cheng-Sheng Chan. Contents in this handout are from Chan et al.. + with tf.name_scope('VGG16'): + + x = tools.conv('conv1_1', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv1_2', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool1'): + x = tools.pool('pool1', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv2_1', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv2_2', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool2'): + x = tools.pool('pool2', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + + x = tools.conv('conv3_1', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_2', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_3', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool3'): + x = tools.pool('pool3', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.conv('conv4_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool4'): + x = tools.pool('pool4', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.conv('conv5_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool5'): + x = tools.pool('pool5', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.FC_layer('fc6', x, out_nodes=4096) + #with tf.name_scope('batch_norm1'): + #x = tools.batch_norm(x) + x = tools.FC_layer('fc7', x, out_nodes=4096) + #with tf.name_scope('batch_norm2'): + #x = tools.batch_norm(x) + x = tools.FC_layer('fc8', x, out_nodes=n_classes) + + return x + ``` + +6. Using pretrained weights or not + ``` + pre_trained_weights = 'vgg16.npy' + tools.load_with_skip(pre_trained_weights, sess, ['fc6', 'fc7', 'fc8']) + ``` + +7. 
Parameters + * BATCH_SIZE = 12 + * INPUT_WIDTH = 224 + * INPUT_HEIGHT = 224 + * EPOCH = 50 + * LR = 10**(-4) # learning rate + * NUM_CLASS = 24 + +## Installation +* import VGG and tools (for network) +* Set the dataset directory +* switch train or test with comment + ``` + #for test : comment line261~321 + ''' + # strat trainng + startTime = time() + init = tf.global_variables_initializer() + + + with tf.Session() as sess: + sess.run(init) + . + . + . + # duration calculating + duration = time()-startTime + print('duration = ', duration) + #### End of training #### + ''' + ``` + +### Results + +| Learning Rate |Loss| Testing Accurancy | +| --- | --- | --- | +| 0.001 | 2.67885 | 52.61% | + + + +* Training time + +> about 12 hours with GTX 1080 Ti diff --git a/README_files/overview.png b/README_files/overview.png index 5aaea3a..3dca57d 100644 Binary files a/README_files/overview.png and b/README_files/overview.png differ diff --git a/code/VGG.py b/code/VGG.py new file mode 100644 index 0000000..b3fc7fc --- /dev/null +++ b/code/VGG.py @@ -0,0 +1,102 @@ + +import tensorflow as tf +import tools + +#%% +def VGG16(x, n_classes, is_pretrain=True): + + x = tools.conv('conv1_1', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv1_2', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.pool('pool1', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv2_1', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv2_2', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.pool('pool2', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv3_1', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_2', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_3', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.pool('pool3', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv4_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.pool('pool3', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv5_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.pool('pool3', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.FC_layer('fc6', x, out_nodes=4096) + #x = tools.batch_norm(x) + x = tools.FC_layer('fc7', x, out_nodes=4096) + #x = tools.batch_norm(x) + x = tools.FC_layer('fc8', x, out_nodes=n_classes) + + return x + + + + + +#%% TO get better tensorboard figures! 
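# Note (added comment): VGG16N below builds exactly the same network as VGG16 above;
# the only difference is that VGG16N wraps each stage in tf.name_scope so the
# TensorBoard graph is grouped more readably.
# Also note that VGG16 above reuses the layer name 'pool3' for the pooling ops after
# conv4_3 and conv5_3. tf.nn.max_pool only uses that string as an op name and
# TensorFlow uniquifies duplicate op names, so the output is unaffected, but
# 'pool4'/'pool5' would be clearer, as done in VGG16N.
# Minimal usage sketch (an assumption, not part of the original file):
#   x = tf.placeholder(tf.float32, [None, 224, 224, 3])
#   logits = VGG16N(x, n_classes=24, is_pretrain=True)   # 24 object classes, as in the README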
+ +def VGG16N(x, n_classes, is_pretrain=True): + + with tf.name_scope('VGG16'): + + x = tools.conv('conv1_1', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv1_2', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool1'): + x = tools.pool('pool1', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv2_1', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv2_2', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool2'): + x = tools.pool('pool2', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + + x = tools.conv('conv3_1', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_2', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_3', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool3'): + x = tools.pool('pool3', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.conv('conv4_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool4'): + x = tools.pool('pool4', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.conv('conv5_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool5'): + x = tools.pool('pool5', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.FC_layer('fc6', x, out_nodes=4096) + #with tf.name_scope('batch_norm1'): + #x = tools.batch_norm(x) + x = tools.FC_layer('fc7', x, out_nodes=4096) + #with tf.name_scope('batch_norm2'): + #x = tools.batch_norm(x) + x = tools.FC_layer('fc8', x, out_nodes=n_classes) + + return x + + + +#%% + + + + + + + + diff --git a/code/tools.py b/code/tools.py new file mode 100644 index 0000000..d36d248 --- /dev/null +++ b/code/tools.py @@ -0,0 +1,256 @@ + +import tensorflow as tf +import numpy as np + + +#%% +def conv(layer_name, x, out_channels, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=True): + '''Convolution op wrapper, use RELU activation after convolution + Args: + layer_name: e.g. conv1, pool1... + x: input tensor, [batch_size, height, width, channels] + out_channels: number of output channels (or comvolutional kernels) + kernel_size: the size of convolutional kernel, VGG paper used: [3,3] + stride: A list of ints. 1-D of length 4. VGG paper used: [1, 1, 1, 1] + is_pretrain: if load pretrained parameters, freeze all conv layers. + Depending on different situations, you can just set part of conv layers to be freezed. + the parameters of freezed layers will not change when training. 
+ Returns: + 4D tensor + ''' + + in_channels = x.get_shape()[-1] + with tf.variable_scope(layer_name): + w = tf.get_variable(name='weights', + trainable=is_pretrain, + shape=[kernel_size[0], kernel_size[1], in_channels, out_channels], + initializer=tf.contrib.layers.xavier_initializer()) # default is uniform distribution initialization + b = tf.get_variable(name='biases', + trainable=is_pretrain, + shape=[out_channels], + initializer=tf.constant_initializer(0.0)) + x = tf.nn.conv2d(x, w, stride, padding='SAME', name='conv') + x = tf.nn.bias_add(x, b, name='bias_add') + x = tf.nn.relu(x, name='relu') + return x + +#%% +def pool(layer_name, x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True): + '''Pooling op + Args: + x: input tensor + kernel: pooling kernel, VGG paper used [1,2,2,1], the size of kernel is 2X2 + stride: stride size, VGG paper used [1,2,2,1] + padding: + is_max_pool: boolen + if True: use max pooling + else: use avg pooling + ''' + if is_max_pool: + x = tf.nn.max_pool(x, kernel, strides=stride, padding='SAME', name=layer_name) + else: + x = tf.nn.avg_pool(x, kernel, strides=stride, padding='SAME', name=layer_name) + return x + +#%% +def batch_norm(x): + '''Batch normlization(I didn't include the offset and scale) + ''' + epsilon = 1e-3 + batch_mean, batch_var = tf.nn.moments(x, [0]) + x = tf.nn.batch_normalization(x, + mean=batch_mean, + variance=batch_var, + offset=None, + scale=None, + variance_epsilon=epsilon) + return x + +#%% +def FC_layer(layer_name, x, out_nodes): + '''Wrapper for fully connected layers with RELU activation as default + Args: + layer_name: e.g. 'FC1', 'FC2' + x: input feature map + out_nodes: number of neurons for current FC layer + ''' + shape = x.get_shape() + if len(shape) == 4: + size = shape[1].value * shape[2].value * shape[3].value + else: + size = shape[-1].value + + with tf.variable_scope(layer_name): + w = tf.get_variable('weights', + shape=[size, out_nodes], + initializer=tf.contrib.layers.xavier_initializer()) + b = tf.get_variable('biases', + shape=[out_nodes], + initializer=tf.constant_initializer(0.0)) + flat_x = tf.reshape(x, [-1, size]) # flatten into 1D + + x = tf.nn.bias_add(tf.matmul(flat_x, w), b) + x = tf.nn.relu(x) + return x + +#%% +def loss(logits, labels): + '''Compute loss + Args: + logits: logits tensor, [batch_size, n_classes] + labels: one-hot labels + ''' + with tf.name_scope('loss') as scope: + cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels,name='cross-entropy') + loss = tf.reduce_mean(cross_entropy, name='loss') + tf.summary.scalar(scope+'/loss', loss) + return loss + +#%% +def accuracy(logits, labels): + """Evaluate the quality of the logits at predicting the label. + Args: + logits: Logits tensor, float - [batch_size, NUM_CLASSES]. + labels: Labels tensor, + """ + with tf.name_scope('accuracy') as scope: + correct = tf.equal(tf.arg_max(logits, 1), tf.arg_max(labels, 1)) + correct = tf.cast(correct, tf.float32) + accuracy = tf.reduce_mean(correct)*100.0 + tf.summary.scalar(scope+'/accuracy', accuracy) + return accuracy + + + +#%% +def num_correct_prediction(logits, labels): + """Evaluate the quality of the logits at predicting the label. 
+ Return: + the number of correct predictions + """ + correct = tf.equal(tf.arg_max(logits, 1), tf.arg_max(labels, 1)) + correct = tf.cast(correct, tf.int32) + n_correct = tf.reduce_sum(correct) + return n_correct + + + +#%% +def optimize(loss, learning_rate, global_step): + '''optimization, use Gradient Descent as default + ''' + with tf.name_scope('optimizer'): + optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate) + #optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) + train_op = optimizer.minimize(loss, global_step=global_step) + return train_op + + + + +#%% +def load(data_path, session): + data_dict = np.load(data_path, encoding='latin1').item() + + keys = sorted(data_dict.keys()) + for key in keys: + with tf.variable_scope(key, reuse=True): + for subkey, data in zip(('weights', 'biases'), data_dict[key]): + session.run(tf.get_variable(subkey).assign(data)) + + +#%% +def test_load(): + data_path = './vgg16.npy' + + data_dict = np.load(data_path, encoding='latin1').item() + keys = sorted(data_dict.keys()) + for key in keys: + weights = data_dict[key][0] + biases = data_dict[key][1] + print('\n') + print(key) + print('weights shape: ', weights.shape) + print('biases shape: ', biases.shape) + + +#%% +def load_with_skip(data_path, session, skip_layer): + data_dict = np.load(data_path, encoding='latin1').item() + for key in data_dict: + if key not in skip_layer: + with tf.variable_scope(key, reuse=True): + for subkey, data in zip(('weights', 'biases'), data_dict[key]): + session.run(tf.get_variable(subkey).assign(data)) + + +#%% +def print_all_variables(train_only=True): + """Print all trainable and non-trainable variables + without tl.layers.initialize_global_variables(sess) + + Parameters + ---------- + train_only : boolean + If True, only print the trainable variables, otherwise, print all variables. + """ + # tvar = tf.trainable_variables() if train_only else tf.all_variables() + if train_only: + t_vars = tf.trainable_variables() + print(" [*] printing trainable variables") + else: + try: # TF1.0 + t_vars = tf.global_variables() + except: # TF0.12 + t_vars = tf.all_variables() + print(" [*] printing global variables") + for idx, v in enumerate(t_vars): + print(" var {:3}: {:15} {}".format(idx, str(v.get_shape()), v.name)) + +#%% + + + + + + + + +##***** the followings are just for test the tensor size at diferent layers *********## + +#%% +def weight(kernel_shape, is_uniform = True): + ''' weight initializer + Args: + shape: the shape of weight + is_uniform: boolen type. 
+ if True: use uniform distribution initializer + if False: use normal distribution initizalizer + Returns: + weight tensor + ''' + w = tf.get_variable(name='weights', + shape=kernel_shape, + initializer=tf.contrib.layers.xavier_initializer()) + return w + +#%% +def bias(bias_shape): + '''bias initializer + ''' + b = tf.get_variable(name='biases', + shape=bias_shape, + initializer=tf.constant_initializer(0.0)) + return b + +#%% + + + + + + + + + + \ No newline at end of file diff --git a/code/training_and_val_new.py b/code/training_and_val_new.py new file mode 100644 index 0000000..f2789c1 --- /dev/null +++ b/code/training_and_val_new.py @@ -0,0 +1,176 @@ +import os +import os.path + +import numpy as np +import tensorflow as tf + +import input_data +import VGG +import tools + +# +''' +train_image_path = "/Disk2/cedl/handcam/frames/train/" +test_image_path = "/Disk2/cedl/hancam/frames/test/" +label_path = "/Disk2/cedl/hancam/labels/" +''' +IMG_W = 224 +IMG_H = 224 +N_CLASSES = 24 +BATCH_SIZE = 16 +learning_rate = 0.001 +MAX_STEP = 25000 +IS_PRETRAIN = True +CAPACITY = 20000 + + +# Training +def train(): + + pre_trained_weights = 'vgg16.npy' + data_dir = './train' + #val_data_dir = './test' + train_log_dir = './logs/train/' + val_log_dir = './logs/val/' + + with tf.name_scope('input'): + #train + tra_image_list, tra_label_list = input_data.read_data(data_dir=data_dir, + is_train=True) + + tra_image_batch, tra_label_batch = input_data.get_batch(tra_image_list, + tra_label_list, + IMG_W, + IMG_H, + BATCH_SIZE, + CAPACITY) + ''' + #validation + tra_image_list, tra_label_list = input_data.read_files(data_dir=val_data_dir, + is_train=True) + val_image_batch, val_label_batch = input_data.read_decode(data_dir=val_data_dir, + is_train=False, + batch_size= BATCH_SIZE, + shuffle=False) + ''' + x = tf.placeholder(tf.float32, shape=[BATCH_SIZE, IMG_W, IMG_H, 3]) + y_ = tf.placeholder(tf.int16, shape=[BATCH_SIZE, N_CLASSES]) + + logits = VGG.VGG16N(x, N_CLASSES, IS_PRETRAIN) + loss = tools.loss(logits, y_) + accuracy = tools.accuracy(logits, y_) + + my_global_step = tf.Variable(0, name='global_step', trainable=False) + train_op = tools.optimize(loss, learning_rate, my_global_step) + + saver = tf.train.Saver(tf.global_variables()) + summary_op = tf.summary.merge_all() + + init = tf.global_variables_initializer() + sess = tf.Session() + sess.run(init) + + # load the parameter file, assign the parameters, skip the specific layers + tools.load_with_skip(pre_trained_weights, sess, ['fc6','fc7','fc8']) + + + coord = tf.train.Coordinator() + threads = tf.train.start_queue_runners(sess=sess, coord=coord) + #tra_summary_writer = tf.summary.FileWriter(train_log_dir, sess.graph) + #val_summary_writer = tf.summary.FileWriter(val_log_dir, sess.graph) + + try: + for step in np.arange(MAX_STEP): + if coord.should_stop(): + break + + tra_images,tra_labels = sess.run([tra_image_batch, tra_label_batch]) + _, tra_loss, tra_acc = sess.run([train_op, loss, accuracy], + feed_dict={x:tra_images, y_:tra_labels}) + if step % 50 == 0 or (step + 1) == MAX_STEP: + print ('Step: %d, loss: %.4f, accuracy: %.4f%%' % (step, tra_loss, tra_acc)) + #summary_str = sess.run(summary_op) + #tra_summary_writer.add_summary(summary_str, step) + ''' + if step % 200 == 0 or (step + 1) == MAX_STEP: + val_images, val_labels = sess.run([val_image_batch, val_label_batch]) + val_loss, val_acc = sess.run([loss, accuracy], + feed_dict={x:val_images,y_:val_labels}) + print('** Step %d, val loss = %.2f, val accuracy = %.2f%% **' %(step, val_loss, val_acc)) 
+ + summary_str = sess.run(summary_op) + val_summary_writer.add_summary(summary_str, step) + ''' + if step % 2000 == 0 or (step + 1) == MAX_STEP: + checkpoint_path = os.path.join(train_log_dir, 'model.ckpt') + saver.save(sess, checkpoint_path, global_step=step) + + except tf.errors.OutOfRangeError: + print('Done training -- epoch limit reached') + finally: + coord.request_stop() + + coord.join(threads) + sess.close() + + + + + +''' +import math +def evaluate(): + with tf.Graph().as_default(): + +# log_dir = 'C://Users//kevin//Documents//tensorflow//VGG//logsvgg//train//' + log_dir = 'C:/Users/kevin/Documents/tensorflow/VGG/logs/train/' + test_dir = './/data//cifar-10-batches-bin//' + n_test = 12776 + + images, labels = input_data.read_data(data_dir=test_dir, + is_train=False, + batch_size= BATCH_SIZE, + shuffle=False) + + logits = VGG.VGG16N(images, N_CLASSES, IS_PRETRAIN) + correct = tools.num_correct_prediction(logits, labels) + saver = tf.train.Saver(tf.global_variables()) + + with tf.Session() as sess: + + print("Reading checkpoints...") + ckpt = tf.train.get_checkpoint_state(log_dir) + if ckpt and ckpt.model_checkpoint_path: + global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1] + saver.restore(sess, ckpt.model_checkpoint_path) + print('Loading success, global_step is %s' % global_step) + else: + print('No checkpoint file found') + return + + coord = tf.train.Coordinator() + threads = tf.train.start_queue_runners(sess = sess, coord = coord) + + try: + print('\nEvaluating......') + num_step = int(math.floor(n_test / BATCH_SIZE)) + num_sample = num_step*BATCH_SIZE + step = 0 + total_correct = 0 + while step < num_step and not coord.should_stop(): + batch_correct = sess.run(correct) + total_correct += np.sum(batch_correct) + step += 1 + print('Total testing samples: %d' %num_sample) + print('Total correct predictions: %d' %total_correct) + print('Average accuracy: %.2f%%' %(100*total_correct/num_sample)) + except Exception as e: + coord.request_stop(e) + finally: + coord.request_stop() + coord.join(threads) +''' +# +train() + + diff --git a/hw1/README.md b/hw1/README.md new file mode 100644 index 0000000..b95f966 --- /dev/null +++ b/hw1/README.md @@ -0,0 +1,139 @@ +
+overview +
+ +# Deep Classification + +## updates +- 9/26/2017: provide [subset of dataset](https://drive.google.com/drive/folders/0B3fKFm-j0RqeWGdXZUNRUkpybU0?usp=sharing), separated into train/test set +- 9/27/2017: in this homework, we only evaluat the performance of object classification. You can use other label for multi-task learning, etc. + +## Brief +* ***+2 extra credit of the whole semester*** +* Due: Oct. 5, 11:59pm. +* Required files: results/index.md, and code/ +* [Project reference](http://aliensunmin.github.io/project/handcam/) + + +## Overview + + +Recently, the technological advance of wearable devices has led to significant interests in recognizing human behaviors in daily life (i.e., uninstrumented environment). Among many devices, egocentric camera systems have drawn significant attention, since the camera is aligned with the field-of-view of wearer, it naturally captures what a person sees. These systems have shown great potential in recognizing daily activities(e.g., making meals, watching TV, etc.), estimating hand poses, generating howto videos, etc. + +Despite many advantages of egocentric camera systems, there exists two main issues which are much less discussed. Firstly, hand localization is not solved especially for passive camera systems. Even for active camera systems like Kinect, hand localization is challenging when two hands are interacting or a hand is interacting with an object. Secondly, the limited field-of-view of an egocentric camera implies that hands will inevitably move outside the images sometimes. + +HandCam (Fig. 1), a novel wearable camera capturing activities of hands, for recognizing human behaviors. HandCam has two main advantages over egocentric systems : (1) it avoids the need to detect hands and manipulation regions; (2) it observes the activities of hands almost at all time. + +## Requirement + +- Python +- [TensorFlow](https://github.com/tensorflow/tensorflow) + +## Data + +### Introduction + +This is a [dataset](https://drive.google.com/drive/folders/0BwCy2boZhfdBdXdFWnEtNWJYRzQ) recorded by hand camera system. + +The camera system consist of three wide-angle cameras, two mounted on the left and right wrists to +capture hands (referred to as HandCam) and one mounted on the head (referred to as HeadCam). + +The dataset consists of 20 sets of video sequences (i.e., each set includes two HandCams and one +HeadCam synchronized videos) captured in three scenes: a small office, a mid-size lab, and a large home.) + +We want to classify some kinds of hand states including free v.s. active (i.e., hands holding objects or not), +object categories, and hand gestures. At the same time, a synchronized video has two sequence need to be labeled, +the left hand states and right hand states. + +For each classification task (i.e., free vs. active, object categories, or hand gesture), there are forty +sequences of data. We split the dataset into two parts, half for training, half for testing. The object instance is totally separated into training and testing. + +### Zip files + +`frames.zip` contains all the frames sample from the original videos by 6fps. + +`labels.zip` conatins the labels for all frames. + +FA : free vs. active (only 0/1) + +obj: object categories (24 classes, including free) + +ges: hand gesture (13 gestures, including free) + + +### Details of obj. and ges. 
+ +``` +Obj = { 'free':0, + 'computer':1, + 'cellphone':2, + 'coin':3, + 'ruler':4, + 'thermos-bottle':5, + 'whiteboard-pen':6, + 'whiteboard-eraser':7, + 'pen':8, + 'cup':9, + 'remote-control-TV':10, + 'remote-control-AC':11, + 'switch':12, + 'windows':13, + 'fridge':14, + 'cupboard':15, + 'water-tap':16, + 'toy':17, + 'kettle':18, + 'bottle':19, + 'cookie':20, + 'book':21, + 'magnet':22, + 'lamp-switch':23} + +Ges= { 'free':0, + 'press'1, + 'large-diameter':2, + 'lateral-tripod':3, + 'parallel-extension':4, + 'thumb-2-finger':5, + 'thumb-4-finger':6, + 'thumb-index-finger':7, + 'precision-disk':8, + 'lateral-pinch':9, + 'tripod':10, + 'medium-wrap':11, + 'light-tool':12} +``` + +## Writeup + +You are required to implement a **deep-learning-based method** to recognize hand states (free vs. active hands, hand gestures, object categories). Moreover, You might need to further take advantage of both HandCam and HeadCam. You will have to compete the performance with your classmates, so try to use as many techniques as possible to improve. **Your score will based on the performance ranking.** + +For this project, and all other projects, you must do a project report in results folder using [Markdown](https://help.github.com/articles/markdown-basics). We provide you with a placeholder [index.md](./results/index.md) document which you can edit. In the report you will describe your algorithm and any decisions you made to write your algorithm a particular way. Then, you will describe how to run your code and if your code depended on other packages. You also need to show and discuss the results of your algorithm. Discuss any extra credit you did, and clearly show what contribution it had on the results (e.g. performance with and without each extra credit component). + +You should also include the precision-recall curve of your final classifier and any interesting variants of your algorithm. + +## Rubric + + +## Get start & hand in +* Publicly fork version (+2 extra points) + - [Fork the homework](https://education.github.com/guide/forks) to obtain a copy of the homework in your github account + - [Clone the homework](http://gitref.org/creating/#clone) to your local space and work on the code locally + - Commit and push your local code to your github repo + - Once you are done, submit your homework by [creating a pull request](https://help.github.com/articles/creating-a-pull-request) + +* [Privately duplicated version](https://help.github.com/articles/duplicating-a-repository) + - Make a bare clone + - mirror-push to new repo + - [make new repo private](https://help.github.com/articles/making-a-private-repository-public) + - [add aliensunmin as collaborator](https://help.github.com/articles/adding-collaborators-to-a-personal-repository) + - [Clone the homework](http://gitref.org/creating/#clone) to your local space and work on the code locally + - Commit and push your local code to your github repo + - I will clone your repo after the due date + +## Credits +Assignment designed by Cheng-Sheng Chan. Contents in this handout are from Chan et al.. 
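The writeup above asks for a precision-recall curve of the final classifier. A minimal sketch of one way to produce it is given below, assuming the per-frame softmax scores and ground-truth labels of the test set have already been collected into NumPy arrays; the names `scores`, `y_true`, `class_names` and the use of scikit-learn/matplotlib are assumptions, not part of the assignment code.

```
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

def plot_pr_curves(y_true, scores, class_names):
    '''y_true: (N,) integer labels; scores: (N, n_classes) softmax outputs.'''
    n_classes = scores.shape[1]
    y_onehot = np.eye(n_classes)[y_true]          # one-vs-rest binary targets
    for c in range(n_classes):
        precision, recall, _ = precision_recall_curve(y_onehot[:, c], scores[:, c])
        plt.plot(recall, precision, label=class_names[c])
    # micro-averaged curve over all classes
    precision, recall, _ = precision_recall_curve(y_onehot.ravel(), scores.ravel())
    plt.plot(recall, precision, 'k--', label='micro-average')
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.legend(fontsize=6)
    plt.savefig('pr_curve.png')
```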
diff --git a/hw1/README_files/overview.png b/hw1/README_files/overview.png new file mode 100644 index 0000000..5aaea3a Binary files /dev/null and b/hw1/README_files/overview.png differ diff --git a/hw1/code/VGG.py b/hw1/code/VGG.py new file mode 100644 index 0000000..b3fc7fc --- /dev/null +++ b/hw1/code/VGG.py @@ -0,0 +1,102 @@ + +import tensorflow as tf +import tools + +#%% +def VGG16(x, n_classes, is_pretrain=True): + + x = tools.conv('conv1_1', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv1_2', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.pool('pool1', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv2_1', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv2_2', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.pool('pool2', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv3_1', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_2', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_3', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.pool('pool3', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv4_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.pool('pool3', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv5_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.pool('pool3', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.FC_layer('fc6', x, out_nodes=4096) + #x = tools.batch_norm(x) + x = tools.FC_layer('fc7', x, out_nodes=4096) + #x = tools.batch_norm(x) + x = tools.FC_layer('fc8', x, out_nodes=n_classes) + + return x + + + + + +#%% TO get better tensorboard figures! 
+ +def VGG16N(x, n_classes, is_pretrain=True): + + with tf.name_scope('VGG16'): + + x = tools.conv('conv1_1', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv1_2', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool1'): + x = tools.pool('pool1', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv2_1', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv2_2', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool2'): + x = tools.pool('pool2', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + + x = tools.conv('conv3_1', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_2', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_3', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool3'): + x = tools.pool('pool3', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.conv('conv4_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool4'): + x = tools.pool('pool4', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.conv('conv5_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool5'): + x = tools.pool('pool5', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.FC_layer('fc6', x, out_nodes=4096) + #with tf.name_scope('batch_norm1'): + #x = tools.batch_norm(x) + x = tools.FC_layer('fc7', x, out_nodes=4096) + #with tf.name_scope('batch_norm2'): + #x = tools.batch_norm(x) + x = tools.FC_layer('fc8', x, out_nodes=n_classes) + + return x + + + +#%% + + + + + + + + diff --git a/hw1/code/tools.py b/hw1/code/tools.py new file mode 100644 index 0000000..d36d248 --- /dev/null +++ b/hw1/code/tools.py @@ -0,0 +1,256 @@ + +import tensorflow as tf +import numpy as np + + +#%% +def conv(layer_name, x, out_channels, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=True): + '''Convolution op wrapper, use RELU activation after convolution + Args: + layer_name: e.g. conv1, pool1... + x: input tensor, [batch_size, height, width, channels] + out_channels: number of output channels (or comvolutional kernels) + kernel_size: the size of convolutional kernel, VGG paper used: [3,3] + stride: A list of ints. 1-D of length 4. VGG paper used: [1, 1, 1, 1] + is_pretrain: if load pretrained parameters, freeze all conv layers. + Depending on different situations, you can just set part of conv layers to be freezed. + the parameters of freezed layers will not change when training. 
+ Returns: + 4D tensor + ''' + + in_channels = x.get_shape()[-1] + with tf.variable_scope(layer_name): + w = tf.get_variable(name='weights', + trainable=is_pretrain, + shape=[kernel_size[0], kernel_size[1], in_channels, out_channels], + initializer=tf.contrib.layers.xavier_initializer()) # default is uniform distribution initialization + b = tf.get_variable(name='biases', + trainable=is_pretrain, + shape=[out_channels], + initializer=tf.constant_initializer(0.0)) + x = tf.nn.conv2d(x, w, stride, padding='SAME', name='conv') + x = tf.nn.bias_add(x, b, name='bias_add') + x = tf.nn.relu(x, name='relu') + return x + +#%% +def pool(layer_name, x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True): + '''Pooling op + Args: + x: input tensor + kernel: pooling kernel, VGG paper used [1,2,2,1], the size of kernel is 2X2 + stride: stride size, VGG paper used [1,2,2,1] + padding: + is_max_pool: boolen + if True: use max pooling + else: use avg pooling + ''' + if is_max_pool: + x = tf.nn.max_pool(x, kernel, strides=stride, padding='SAME', name=layer_name) + else: + x = tf.nn.avg_pool(x, kernel, strides=stride, padding='SAME', name=layer_name) + return x + +#%% +def batch_norm(x): + '''Batch normlization(I didn't include the offset and scale) + ''' + epsilon = 1e-3 + batch_mean, batch_var = tf.nn.moments(x, [0]) + x = tf.nn.batch_normalization(x, + mean=batch_mean, + variance=batch_var, + offset=None, + scale=None, + variance_epsilon=epsilon) + return x + +#%% +def FC_layer(layer_name, x, out_nodes): + '''Wrapper for fully connected layers with RELU activation as default + Args: + layer_name: e.g. 'FC1', 'FC2' + x: input feature map + out_nodes: number of neurons for current FC layer + ''' + shape = x.get_shape() + if len(shape) == 4: + size = shape[1].value * shape[2].value * shape[3].value + else: + size = shape[-1].value + + with tf.variable_scope(layer_name): + w = tf.get_variable('weights', + shape=[size, out_nodes], + initializer=tf.contrib.layers.xavier_initializer()) + b = tf.get_variable('biases', + shape=[out_nodes], + initializer=tf.constant_initializer(0.0)) + flat_x = tf.reshape(x, [-1, size]) # flatten into 1D + + x = tf.nn.bias_add(tf.matmul(flat_x, w), b) + x = tf.nn.relu(x) + return x + +#%% +def loss(logits, labels): + '''Compute loss + Args: + logits: logits tensor, [batch_size, n_classes] + labels: one-hot labels + ''' + with tf.name_scope('loss') as scope: + cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels,name='cross-entropy') + loss = tf.reduce_mean(cross_entropy, name='loss') + tf.summary.scalar(scope+'/loss', loss) + return loss + +#%% +def accuracy(logits, labels): + """Evaluate the quality of the logits at predicting the label. + Args: + logits: Logits tensor, float - [batch_size, NUM_CLASSES]. + labels: Labels tensor, + """ + with tf.name_scope('accuracy') as scope: + correct = tf.equal(tf.arg_max(logits, 1), tf.arg_max(labels, 1)) + correct = tf.cast(correct, tf.float32) + accuracy = tf.reduce_mean(correct)*100.0 + tf.summary.scalar(scope+'/accuracy', accuracy) + return accuracy + + + +#%% +def num_correct_prediction(logits, labels): + """Evaluate the quality of the logits at predicting the label. 
+ Return: + the number of correct predictions + """ + correct = tf.equal(tf.arg_max(logits, 1), tf.arg_max(labels, 1)) + correct = tf.cast(correct, tf.int32) + n_correct = tf.reduce_sum(correct) + return n_correct + + + +#%% +def optimize(loss, learning_rate, global_step): + '''optimization, use Gradient Descent as default + ''' + with tf.name_scope('optimizer'): + optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate) + #optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) + train_op = optimizer.minimize(loss, global_step=global_step) + return train_op + + + + +#%% +def load(data_path, session): + data_dict = np.load(data_path, encoding='latin1').item() + + keys = sorted(data_dict.keys()) + for key in keys: + with tf.variable_scope(key, reuse=True): + for subkey, data in zip(('weights', 'biases'), data_dict[key]): + session.run(tf.get_variable(subkey).assign(data)) + + +#%% +def test_load(): + data_path = './vgg16.npy' + + data_dict = np.load(data_path, encoding='latin1').item() + keys = sorted(data_dict.keys()) + for key in keys: + weights = data_dict[key][0] + biases = data_dict[key][1] + print('\n') + print(key) + print('weights shape: ', weights.shape) + print('biases shape: ', biases.shape) + + +#%% +def load_with_skip(data_path, session, skip_layer): + data_dict = np.load(data_path, encoding='latin1').item() + for key in data_dict: + if key not in skip_layer: + with tf.variable_scope(key, reuse=True): + for subkey, data in zip(('weights', 'biases'), data_dict[key]): + session.run(tf.get_variable(subkey).assign(data)) + + +#%% +def print_all_variables(train_only=True): + """Print all trainable and non-trainable variables + without tl.layers.initialize_global_variables(sess) + + Parameters + ---------- + train_only : boolean + If True, only print the trainable variables, otherwise, print all variables. + """ + # tvar = tf.trainable_variables() if train_only else tf.all_variables() + if train_only: + t_vars = tf.trainable_variables() + print(" [*] printing trainable variables") + else: + try: # TF1.0 + t_vars = tf.global_variables() + except: # TF0.12 + t_vars = tf.all_variables() + print(" [*] printing global variables") + for idx, v in enumerate(t_vars): + print(" var {:3}: {:15} {}".format(idx, str(v.get_shape()), v.name)) + +#%% + + + + + + + + +##***** the followings are just for test the tensor size at diferent layers *********## + +#%% +def weight(kernel_shape, is_uniform = True): + ''' weight initializer + Args: + shape: the shape of weight + is_uniform: boolen type. 
+ if True: use uniform distribution initializer + if False: use normal distribution initizalizer + Returns: + weight tensor + ''' + w = tf.get_variable(name='weights', + shape=kernel_shape, + initializer=tf.contrib.layers.xavier_initializer()) + return w + +#%% +def bias(bias_shape): + '''bias initializer + ''' + b = tf.get_variable(name='biases', + shape=bias_shape, + initializer=tf.constant_initializer(0.0)) + return b + +#%% + + + + + + + + + + \ No newline at end of file diff --git a/hw1/code/training_and_val_new.py b/hw1/code/training_and_val_new.py new file mode 100644 index 0000000..f2789c1 --- /dev/null +++ b/hw1/code/training_and_val_new.py @@ -0,0 +1,176 @@ +import os +import os.path + +import numpy as np +import tensorflow as tf + +import input_data +import VGG +import tools + +# +''' +train_image_path = "/Disk2/cedl/handcam/frames/train/" +test_image_path = "/Disk2/cedl/hancam/frames/test/" +label_path = "/Disk2/cedl/hancam/labels/" +''' +IMG_W = 224 +IMG_H = 224 +N_CLASSES = 24 +BATCH_SIZE = 16 +learning_rate = 0.001 +MAX_STEP = 25000 +IS_PRETRAIN = True +CAPACITY = 20000 + + +# Training +def train(): + + pre_trained_weights = 'vgg16.npy' + data_dir = './train' + #val_data_dir = './test' + train_log_dir = './logs/train/' + val_log_dir = './logs/val/' + + with tf.name_scope('input'): + #train + tra_image_list, tra_label_list = input_data.read_data(data_dir=data_dir, + is_train=True) + + tra_image_batch, tra_label_batch = input_data.get_batch(tra_image_list, + tra_label_list, + IMG_W, + IMG_H, + BATCH_SIZE, + CAPACITY) + ''' + #validation + tra_image_list, tra_label_list = input_data.read_files(data_dir=val_data_dir, + is_train=True) + val_image_batch, val_label_batch = input_data.read_decode(data_dir=val_data_dir, + is_train=False, + batch_size= BATCH_SIZE, + shuffle=False) + ''' + x = tf.placeholder(tf.float32, shape=[BATCH_SIZE, IMG_W, IMG_H, 3]) + y_ = tf.placeholder(tf.int16, shape=[BATCH_SIZE, N_CLASSES]) + + logits = VGG.VGG16N(x, N_CLASSES, IS_PRETRAIN) + loss = tools.loss(logits, y_) + accuracy = tools.accuracy(logits, y_) + + my_global_step = tf.Variable(0, name='global_step', trainable=False) + train_op = tools.optimize(loss, learning_rate, my_global_step) + + saver = tf.train.Saver(tf.global_variables()) + summary_op = tf.summary.merge_all() + + init = tf.global_variables_initializer() + sess = tf.Session() + sess.run(init) + + # load the parameter file, assign the parameters, skip the specific layers + tools.load_with_skip(pre_trained_weights, sess, ['fc6','fc7','fc8']) + + + coord = tf.train.Coordinator() + threads = tf.train.start_queue_runners(sess=sess, coord=coord) + #tra_summary_writer = tf.summary.FileWriter(train_log_dir, sess.graph) + #val_summary_writer = tf.summary.FileWriter(val_log_dir, sess.graph) + + try: + for step in np.arange(MAX_STEP): + if coord.should_stop(): + break + + tra_images,tra_labels = sess.run([tra_image_batch, tra_label_batch]) + _, tra_loss, tra_acc = sess.run([train_op, loss, accuracy], + feed_dict={x:tra_images, y_:tra_labels}) + if step % 50 == 0 or (step + 1) == MAX_STEP: + print ('Step: %d, loss: %.4f, accuracy: %.4f%%' % (step, tra_loss, tra_acc)) + #summary_str = sess.run(summary_op) + #tra_summary_writer.add_summary(summary_str, step) + ''' + if step % 200 == 0 or (step + 1) == MAX_STEP: + val_images, val_labels = sess.run([val_image_batch, val_label_batch]) + val_loss, val_acc = sess.run([loss, accuracy], + feed_dict={x:val_images,y_:val_labels}) + print('** Step %d, val loss = %.2f, val accuracy = %.2f%% **' %(step, 
val_loss, val_acc)) + + summary_str = sess.run(summary_op) + val_summary_writer.add_summary(summary_str, step) + ''' + if step % 2000 == 0 or (step + 1) == MAX_STEP: + checkpoint_path = os.path.join(train_log_dir, 'model.ckpt') + saver.save(sess, checkpoint_path, global_step=step) + + except tf.errors.OutOfRangeError: + print('Done training -- epoch limit reached') + finally: + coord.request_stop() + + coord.join(threads) + sess.close() + + + + + +''' +import math +def evaluate(): + with tf.Graph().as_default(): + +# log_dir = 'C://Users//kevin//Documents//tensorflow//VGG//logsvgg//train//' + log_dir = 'C:/Users/kevin/Documents/tensorflow/VGG/logs/train/' + test_dir = './/data//cifar-10-batches-bin//' + n_test = 12776 + + images, labels = input_data.read_data(data_dir=test_dir, + is_train=False, + batch_size= BATCH_SIZE, + shuffle=False) + + logits = VGG.VGG16N(images, N_CLASSES, IS_PRETRAIN) + correct = tools.num_correct_prediction(logits, labels) + saver = tf.train.Saver(tf.global_variables()) + + with tf.Session() as sess: + + print("Reading checkpoints...") + ckpt = tf.train.get_checkpoint_state(log_dir) + if ckpt and ckpt.model_checkpoint_path: + global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1] + saver.restore(sess, ckpt.model_checkpoint_path) + print('Loading success, global_step is %s' % global_step) + else: + print('No checkpoint file found') + return + + coord = tf.train.Coordinator() + threads = tf.train.start_queue_runners(sess = sess, coord = coord) + + try: + print('\nEvaluating......') + num_step = int(math.floor(n_test / BATCH_SIZE)) + num_sample = num_step*BATCH_SIZE + step = 0 + total_correct = 0 + while step < num_step and not coord.should_stop(): + batch_correct = sess.run(correct) + total_correct += np.sum(batch_correct) + step += 1 + print('Total testing samples: %d' %num_sample) + print('Total correct predictions: %d' %total_correct) + print('Average accuracy: %.2f%%' %(100*total_correct/num_sample)) + except Exception as e: + coord.request_stop(e) + finally: + coord.request_stop() + coord.join(threads) +''' +# +train() + + diff --git a/hw1/results/index.md b/hw1/results/index.md new file mode 100644 index 0000000..96ce61c --- /dev/null +++ b/hw1/results/index.md @@ -0,0 +1,47 @@ +# Your Name (id) + +#Project 5: Deep Classification + +## Overview +The project is related to +> quote + + +## Implementation +1. One + * item + * item +2. Two + +``` +Code highlights +``` + +## Installation +* Other required packages. +* How to compile from source? + +### Results + + + + + + + + + + +
+ + diff --git a/results/index.md b/results/index.md index 96ce61c..e1d72ea 100644 --- a/results/index.md +++ b/results/index.md @@ -1,47 +1,163 @@ -# Your Name (id) +# 邱浩翰 (105061607) # Project 5: Deep Classification ## Overview + +
+overview +
+ The project is related to -> quote +* handcam object classification +* VGG16 +* Reference to: +code +>https://github.com/kevin28520/My-TensorFlow-tutorials +vgg16 +>https://arxiv.org/abs/1409.1556 ## Implementation -1. One - * item - * item -2. Two +1. Load data + * image & label + ``` + for dirPath, dirNames, fileNames in os.walk(label_path): + for i in range(len(train_label_file)): + train_labels = np.hstack((train_labels, np.load(label_path + train_label_file[i], mmap_mode='r'))) + + for dirPath, dirNames, fileNames in os.walk(label_path): + for i in range(len(test_label_file)): + test_labels = np.hstack((test_labels, np.load(label_path + test_label_file[i], mmap_mode='r'))) + + for i in range(len(train_image_file)): + for dirPath, dirNames, fileNames in os.walk(train_image_path + train_image_file[i]): + for f in fileNames: + train_images.append(os.path.join(dirPath, f)) + + for i in range(len(test_image_file)): + for dirPath, dirNames, fileNames in os.walk(test_image_path + test_image_file[i]): + for f in fileNames: + test_images.append(os.path.join(dirPath, f)) + ``` + + +2. Training + * Optimizer + ``` + loss_func = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = y_logits, labels= y_label))#tools.loss(y_logits, y_label) + optimizer = tf.train.AdamOptimizer(LR).minimize(loss_func) + + ``` + +3. Testing + * Accuracy_evaluation + ``` + correct_prediction = tf.equal(tf.argmax(y_logits, 1), tf.argmax(y_label, 1)) + accuracy = tf.cast(tf.reduce_mean(tf.cast(correct_prediction, dtype=tf.float32)), dtype=tf.float32)#tools.accuracy(y_logits, y_label) + + ``` + +4. Architecture + * VGG16 + ``` + def VGG16N(x, n_classes, is_pretrain=True): + + with tf.name_scope('VGG16'): + + x = tools.conv('conv1_1', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv1_2', x, 64, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool1'): + x = tools.pool('pool1', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + x = tools.conv('conv2_1', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv2_2', x, 128, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool2'): + x = tools.pool('pool2', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + + x = tools.conv('conv3_1', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_2', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv3_3', x, 256, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool3'): + x = tools.pool('pool3', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + -``` -Code highlights -``` + x = tools.conv('conv4_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv4_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool4'): + x = tools.pool('pool4', x, kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.conv('conv5_1', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_2', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + x = tools.conv('conv5_3', x, 512, kernel_size=[3,3], stride=[1,1,1,1], is_pretrain=is_pretrain) + with tf.name_scope('pool5'): + x = tools.pool('pool5', x, 
kernel=[1,2,2,1], stride=[1,2,2,1], is_max_pool=True) + + + x = tools.FC_layer('fc6', x, out_nodes=4096) + #with tf.name_scope('batch_norm1'): + #x = tools.batch_norm(x) + x = tools.FC_layer('fc7', x, out_nodes=4096) + #with tf.name_scope('batch_norm2'): + #x = tools.batch_norm(x) + x = tools.FC_layer('fc8', x, out_nodes=n_classes) + + return x + ``` + +6. Using pretrained weights or not + ``` + pre_trained_weights = 'vgg16.npy' + tools.load_with_skip(pre_trained_weights, sess, ['fc6', 'fc7', 'fc8']) + ``` + +7. Parameters + * BATCH_SIZE = 12 + * INPUT_WIDTH = 224 + * INPUT_HEIGHT = 224 + * EPOCH = 50 + * LR = 10**(-4) # learning rate + * NUM_CLASS = 24 ## Installation -* Other required packages. -* How to compile from source? +* import VGG and tools (for network) +* Set the dataset directory +* switch train or test with comment + ``` + #for test : comment line261~321 + ''' + # strat trainng + startTime = time() + init = tf.global_variables_initializer() + + with tf.Session() as sess: + sess.run(init) + . + . + . + # duration calculating + duration = time()-startTime + print('duration = ', duration) + #### End of training #### + ''' + ``` + ### Results - - - - - - - - - -
+| Learning Rate | Loss | Testing Accuracy | +| --- | --- | --- | +| 0.001 | 2.67885 | 52.61% | + + + +* Training time + +> about 12 hours on a GTX 1080 Ti +
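One gap worth noting: `training_and_val_new.py` imports an `input_data` module whose `read_data` and `get_batch` functions are not part of this diff. The sketch below is only a guess at what `get_batch` could look like, written to be consistent with how the training script calls it (lists of frame paths and integer labels in, batched 224x224 images and one-hot labels out); every name and detail is an assumption, not the code that produced the results above.

```
import tensorflow as tf

def get_batch(image_list, label_list, img_w, img_h, batch_size, capacity, n_classes=24):
    '''image_list: list of frame paths; label_list: list of integer class labels (0-23).'''
    images = tf.convert_to_tensor(image_list, dtype=tf.string)
    labels = tf.convert_to_tensor(label_list, dtype=tf.int32)
    # queue of (path, label) pairs, shuffled and refilled every epoch
    input_queue = tf.train.slice_input_producer([images, labels])
    label = input_queue[1]
    image = tf.image.decode_jpeg(tf.read_file(input_queue[0]), channels=3)
    image = tf.image.resize_images(image, [img_h, img_w])
    image_batch, label_batch = tf.train.batch([image, label],
                                              batch_size=batch_size,
                                              capacity=capacity)
    # the training script feeds y_ with shape [BATCH_SIZE, N_CLASSES], so return one-hot labels
    label_batch = tf.one_hot(label_batch, depth=n_classes)
    return image_batch, label_batch
```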