
# Problem Set 3


By Xide Xia  with help of Brian Kulis, Kate Saenko, Ali Siahkamari, and Kun He.


This assignment will introduce you to:
1. Building and training a convolutional network
2. Saving snapshots of your trained model
3. Reloading weights from a saved model
4. Fine-tuning a pre-trained network
5. Visualizations using Tensorboard

This code has been tested and should for Python 3.5 and 2.7 with tensorflow. You can update to recent tensorflow version just by doing `pip install tensorflow`,  or `pip install tensorflow-gpu` if you want to use GPU.

**Note:** This notebook contains problem descriptions and demo/starter code. However, you're welcome to implement and submit .py files directly, if that's easier for you. Starter .py files are provided in the same `pset4/` directory.

**Warning:** The gpu queue on SCC may be long when the deadline comes. Please start your homework early.

## Part 0: Tutorials

You will find these TensorFlow tutorials on CNNs useful:
 - [Deep MNIST for experts](https://www.tensorflow.org/get_started/mnist/pros)
 - [Convolutional Neural Networks](https://www.tensorflow.org/tutorials/deep_cnn)
 
Note that there are many ways to implement the same thing in TensorFlow, for example, both tf.nn and tf.layers provide convolutional layers but with slightly different interfaces. You will need to read the documentation of the functions provided below to understand how they work.

Also, you can run your experiments on SCC if you want to use GPU. You will find the SCC tutorial helpful: - [SCC tutorials](http://rcs.bu.edu/classes/DeepLearning/)

## Part 1: Building and Training a ConvNet on SVHN
(25 points)

First we provide demo code that trains a convolutional network on the [SVHN Dataset](http://ufldl.stanford.edu/housenumbers/).. 

You will need to download   __Format 2__ from the link above.
- Create a directory named `svhn_mat/` in the working directory. Or, you can create it anywhere you want, but change the path in `svhn_dataset_generator` to match it.
- Download `train_32x32.mat` and `test_32x32.mat` to this directory.
- `extra_32x32.mat` is NOT needed.
- You may find the `wget` command useful for downloading on linux. 



The following defines a generator for the SVHN Dataset, yielding the next batch every time next is invoked.

In [48]:
import copy
import os
import math
import numpy as np
import scipy
import scipy.io

from six.moves import range

import read_data

@read_data.restartable
def svhn_dataset_generator(dataset_name, batch_size):
    assert dataset_name in ['train', 'test']
    assert batch_size > 0 or batch_size == -1  # -1 for entire dataset
    
    path = './svhn_mat/' # path to the SVHN dataset you will download in Q1.1
    file_name = '%s_32x32.mat' % dataset_name
    file_dict = scipy.io.loadmat(os.path.join(path, file_name))
    X_all = file_dict['X'].transpose((3, 0, 1, 2))
    y_all = file_dict['y']
    data_len = X_all.shape[0]
    batch_size = batch_size if batch_size > 0 else data_len
    
    X_all_padded = np.concatenate([X_all, X_all[:batch_size]], axis=0)
    y_all_padded = np.concatenate([y_all, y_all[:batch_size]], axis=0)
    y_all_padded[y_all_padded == 10] = 0
    
    for slice_i in range(int(math.ceil(data_len / batch_size))):
        idx = slice_i * batch_size
        X_batch = X_all_padded[idx:idx + batch_size]
        y_batch = np.ravel(y_all_padded[idx:idx + batch_size])
        yield X_batch, y_batch

In [49]:
import tensorflow as tf

# The following defines a simple CovNet Model.
def SVHN_net_v0(x_):
    conv1 = tf.layers.conv2d(
            inputs=x_,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu,
            name='conv1')
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu,
            name ='conv2')
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits


def apply_classification_loss(model_function):
    with tf.Graph().as_default() as g:
        with tf.device("/gpu:0"):  # use gpu:0 if on GPU
            x_ = tf.placeholder(tf.float32, [None, 32, 32, 3])
            y_ = tf.placeholder(tf.int32, [None])
            y_logits = model_function(x_)
            
            y_dict = dict(labels=y_, logits=y_logits)
            losses = tf.nn.sparse_softmax_cross_entropy_with_logits(**y_dict)
            cross_entropy_loss = tf.reduce_mean(losses)
            trainer = tf.train.AdamOptimizer(learning_rate=0.001)
            train_op = trainer.minimize(cross_entropy_loss)
            
            y_pred = tf.argmax(tf.nn.softmax(y_logits), axis=1)
            correct_prediction = tf.equal(tf.cast(y_pred, tf.int32), y_)
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    model_dict = {'graph': g, 'inputs': [x_, y_], 'train_op': train_op,
                  'accuracy': accuracy, 'loss': cross_entropy_loss}
    
    return model_dict

### Q1.1 Training SVHN Net
(2 points)

Now we train the SVHN_net_v0 net on Format 2 of the SVHN Dataset.  

**Note:** training will take a while, so you might want to use GPU.

In [50]:
def train_model(model_dict, dataset_generators, epoch_n, print_every):
    with model_dict['graph'].as_default(), tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        
        for epoch_i in range(epoch_n):
            for iter_i, data_batch in enumerate(dataset_generators['train']):
                train_feed_dict = dict(zip(model_dict['inputs'], data_batch))
                sess.run(model_dict['train_op'], feed_dict=train_feed_dict)
                
                if iter_i % print_every == 0:
                    collect_arr = []
                    for test_batch in dataset_generators['test']:
                        test_feed_dict = dict(zip(model_dict['inputs'], test_batch))
                        to_compute = [model_dict['loss'], model_dict['accuracy']]
                        collect_arr.append(sess.run(to_compute, test_feed_dict))
                    averages = np.mean(collect_arr, axis=0)
                    fmt = (epoch_i, iter_i, ) + tuple(averages)
                    print('epoch {:d} iter {:d}, loss: {:.3f}, '
                          'accuracy: {:.3f}'.format(*fmt))

In [5]:
dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}
    
model_dict = apply_classification_loss(SVHN_net_v0)
train_model(model_dict, dataset_generators, epoch_n=50, print_every=20)

epoch 0 iter 0, loss: 57.970, accuracy: 0.079
epoch 0 iter 20, loss: 2.282, accuracy: 0.186
epoch 0 iter 40, loss: 2.243, accuracy: 0.196
epoch 0 iter 60, loss: 2.241, accuracy: 0.196
epoch 0 iter 80, loss: 2.238, accuracy: 0.196
epoch 0 iter 100, loss: 2.235, accuracy: 0.196
epoch 0 iter 120, loss: 2.234, accuracy: 0.196
epoch 0 iter 140, loss: 2.235, accuracy: 0.196
epoch 0 iter 160, loss: 2.233, accuracy: 0.196
epoch 0 iter 180, loss: 2.229, accuracy: 0.195
epoch 0 iter 200, loss: 2.216, accuracy: 0.209
epoch 0 iter 220, loss: 2.208, accuracy: 0.200
epoch 0 iter 240, loss: 2.192, accuracy: 0.219
epoch 0 iter 260, loss: 2.172, accuracy: 0.221
epoch 0 iter 280, loss: 2.150, accuracy: 0.222
epoch 1 iter 0, loss: 2.138, accuracy: 0.237
epoch 1 iter 20, loss: 2.113, accuracy: 0.248
epoch 1 iter 40, loss: 2.036, accuracy: 0.301
epoch 1 iter 60, loss: 1.704, accuracy: 0.436
epoch 1 iter 80, loss: 1.540, accuracy: 0.513
epoch 1 iter 100, loss: 1.458, accuracy: 0.531
epoch 1 iter 120, loss: 

### Q1.2 Understanding the CNN Architecture
(7 points)

Explain the definition of the following terms. What is the corresponding setting in our SVHN net? Are there any other choices?

  - Stride
  - Padding
  - Non-linearity
  - Pooling
  - Optimizer
  - Learning rate
  - Loss function

**[Double click here to add your answer]**

**Stride and Padding**

On the conv layers, we need to operate the filters. There are 2 paremeters that we can control to modify the conv layer behaviors. They are 'Stride' and 'Padding'. The amount by which the filter shifts is the stride. By setting stride to 1, the filter convolves around the input volume by shifting one unit at a time. If we use three 5*5*3 filters in a 32*32*3 input, we will get an output of 28*28*3. There will be a spatial dimensions decreasion. If we want to preserve as much information about the original input volume, we can apply a size 2 zerro padding to the input. So that after the 5*5*3 padding with stride = 1, we can still get an output of 32*32*3.

**Non-linearity**

Non-linearity or activation function is very important for the CNN. For example, when we using linear classification to compute the scores for different categories, we will use formula $s = Wx$. However, when we use neural network, we will chose a formula like $s = W_2 \max(0, W_1x)$ ,where $\max(0,-)$ is a non-linearity. Without it, the two matrices could be collapsed to a single matrix, and therefore the predicted class scores would again be a linear function of the input. Every activation function (or non-linearity) takes a single number and performs a certain fixed mathematical operation on it. Commly used non-linearity: sigmoid, tanh, Relu, Leaky Relu, Maxout and so on.

**Pooling**

Sometimes you can choose to add Pooling Layers to your CNN. It is also known as a downsampling layer. For example, using maxpooling, it basically takes a filter (normally of size 2x2) and a stride of the same length. It then applies it to the input volume and outputs the maximum number in every subregion that the filter convolves around. The purpose of using pooling is to reduce the comuptation cost and deal with overfitting.

**Learning Rate**

Learning rate is a hyper-parameter that controls how much we are adjusting the weights of our network with respect the loss gradient. new_weight = existing_weight — learning_rate * gradient.

**Loss Function**

After a score function mapped the raw image pixels to class scores. A loss function is a function measured the quality of a particular set of parameters based on how well the induced scores agreed with the ground truth labels in the training data.

**Optimizer**

The Goal of optimization is to find the parameters for the lowest loss. Opitimizer defined specific gradient descent ways.

**SVHN**

Stride: Of conv layers should be default as 1, Of Pooling layer should be 2. You can set the stride to other values but to make sure the result of the conv to be integers, it usually sets as 1. And the stride of pooling usually equals to the size of the pooling filter.

Padding: Here is Padding = 'same'. By setting it to 'Same' means that we want the output size is equal to the input size. Since our input size is 32*32*3, and the filter size is 5*5*3, to get the same size output, we need zero padding of size 2. You can also use Padding = 'valid', it means no padding at all.

Non-linearity: From 'activation=tf.nn.relu' we can get that we used Relu activation function here. We can also use sigmoid, tanh, Leaky Relu, Maxout and so on.

Pooling: pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)
         We chose maxpooling here. We can also use 'average pooling' and 'mean pooling'.

Opitimizer: tf.train.AdamOptimizer(learning_rate=0.001) We using AdamOptimizer here. And we can also choose GD, SGD, NAdam, Adam with Momentum and so on.

Learning Rate: learning_rate=0.001

Loss function: losses = tf.nn.sparse_softmax_cross_entropy_with_logits(**y_dict), sparse softmax cross entropy.


### Q1.3 SVHN Net Variations
(16 points)

Now we vary the structure of the network. To keep things simple, we still use  two identical conv layers, but vary their parameters. 

Report the final test accuracy on 3 different number of filters, 3 different size of kernels, 3 different number of strides, and 3 different dimension of final fully connected layer. Each time when you vary one parameter, keep the other fixed at the original value. Explain the results.

|# of Filter|Accuracy|
|--|-------------------------------|
| / | / |
| / | / |
| / | / |

|Kernel size|Accuracy|
|--|-------------------------------|
| / | / |
| / | / |
| / | / |

|Stride|Accuracy|
|--|-------------------------------|
| / | / |
| / | / |
| / | / |

|FC size|Accuracy|
|--|-------------------------------|
| / | / |
| / | / |
| / | / |

A template for one sample modification is given below. 

**Note:** you're welcome to decide how many training epochs to use, if that gets you the same results but faster.

In [8]:
def my_SVHN_net(x_):    
    conv1 = tf.layers.conv2d(
            inputs=x_,
            strides=1,
            filters=16,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            strides=1,
            filters=16, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits

dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}    

modified_model_dict = apply_classification_loss(my_SVHN_net)
train_model(modified_model_dict, dataset_generators, epoch_n=100, print_every=50)

epoch 0 iter 0, loss: 98.486, accuracy: 0.184
epoch 0 iter 50, loss: 2.245, accuracy: 0.200
epoch 0 iter 100, loss: 2.218, accuracy: 0.212
epoch 0 iter 150, loss: 2.133, accuracy: 0.255
epoch 0 iter 200, loss: 1.534, accuracy: 0.503
epoch 0 iter 250, loss: 1.158, accuracy: 0.645
epoch 1 iter 0, loss: 1.023, accuracy: 0.694
epoch 1 iter 50, loss: 0.932, accuracy: 0.725
epoch 1 iter 100, loss: 0.830, accuracy: 0.757
epoch 1 iter 150, loss: 0.831, accuracy: 0.755
epoch 1 iter 200, loss: 0.782, accuracy: 0.774
epoch 1 iter 250, loss: 0.757, accuracy: 0.781
epoch 2 iter 0, loss: 0.704, accuracy: 0.801
epoch 2 iter 50, loss: 0.718, accuracy: 0.796
epoch 2 iter 100, loss: 0.713, accuracy: 0.798
epoch 2 iter 150, loss: 0.683, accuracy: 0.805
epoch 2 iter 200, loss: 0.681, accuracy: 0.807
epoch 2 iter 250, loss: 0.655, accuracy: 0.815
epoch 3 iter 0, loss: 0.683, accuracy: 0.811
epoch 3 iter 50, loss: 0.680, accuracy: 0.812
epoch 3 iter 100, loss: 0.658, accuracy: 0.816
epoch 3 iter 150, loss: 

In [5]:
def my_SVHN_net(x_):    
    conv1 = tf.layers.conv2d(
            inputs=x_,
            strides=1,
            filters=64,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            strides=1,
            filters=64, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits

dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}    

modified_model_dict = apply_classification_loss(my_SVHN_net)
train_model(modified_model_dict, dataset_generators, epoch_n=100, print_every=50)

epoch 0 iter 0, loss: 112.212, accuracy: 0.092
epoch 0 iter 50, loss: 2.242, accuracy: 0.195
epoch 0 iter 100, loss: 2.220, accuracy: 0.219
epoch 0 iter 150, loss: 2.121, accuracy: 0.254
epoch 0 iter 200, loss: 2.096, accuracy: 0.268
epoch 0 iter 250, loss: 2.026, accuracy: 0.287
epoch 1 iter 0, loss: 2.046, accuracy: 0.280
epoch 1 iter 50, loss: 1.958, accuracy: 0.308
epoch 1 iter 100, loss: 2.173, accuracy: 0.211
epoch 1 iter 150, loss: 1.942, accuracy: 0.315
epoch 1 iter 200, loss: 1.980, accuracy: 0.299
epoch 1 iter 250, loss: 1.906, accuracy: 0.316
epoch 2 iter 0, loss: 1.885, accuracy: 0.335
epoch 2 iter 50, loss: 1.867, accuracy: 0.340
epoch 2 iter 100, loss: 2.227, accuracy: 0.196
epoch 2 iter 150, loss: 2.224, accuracy: 0.196
epoch 2 iter 200, loss: 2.224, accuracy: 0.196
epoch 2 iter 250, loss: 2.222, accuracy: 0.196
epoch 3 iter 0, loss: 2.224, accuracy: 0.196
epoch 3 iter 50, loss: 2.224, accuracy: 0.196
epoch 3 iter 100, loss: 2.224, accuracy: 0.196
epoch 3 iter 150, loss:

In [6]:
def my_SVHN_net(x_):    
    conv1 = tf.layers.conv2d(
            inputs=x_,
            strides=1,
            filters=32,  # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            strides=1,
            filters=32, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits

dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}    

modified_model_dict = apply_classification_loss(my_SVHN_net)
train_model(modified_model_dict, dataset_generators, epoch_n=100, print_every=70)

epoch 0 iter 0, loss: 161.557, accuracy: 0.111
epoch 0 iter 70, loss: 2.245, accuracy: 0.196
epoch 0 iter 140, loss: 2.232, accuracy: 0.196
epoch 0 iter 210, loss: 2.225, accuracy: 0.196
epoch 0 iter 280, loss: 2.222, accuracy: 0.196
epoch 1 iter 0, loss: 2.223, accuracy: 0.196
epoch 1 iter 70, loss: 2.219, accuracy: 0.197
epoch 1 iter 140, loss: 2.221, accuracy: 0.197
epoch 1 iter 210, loss: 2.215, accuracy: 0.199
epoch 1 iter 280, loss: 2.215, accuracy: 0.199
epoch 2 iter 0, loss: 2.212, accuracy: 0.199
epoch 2 iter 70, loss: 2.216, accuracy: 0.196
epoch 2 iter 140, loss: 2.202, accuracy: 0.206
epoch 2 iter 210, loss: 2.190, accuracy: 0.223
epoch 2 iter 280, loss: 2.148, accuracy: 0.248
epoch 3 iter 0, loss: 2.147, accuracy: 0.241
epoch 3 iter 70, loss: 2.205, accuracy: 0.204
epoch 3 iter 140, loss: 2.147, accuracy: 0.244
epoch 3 iter 210, loss: 2.120, accuracy: 0.279
epoch 3 iter 280, loss: 1.382, accuracy: 0.555
epoch 4 iter 0, loss: 1.359, accuracy: 0.572
epoch 4 iter 70, loss: 1.

In [8]:
def my_SVHN_net(x_):    
    conv1 = tf.layers.conv2d(
            inputs=x_,
            strides=1,
            filters=32,  # number of filters
            kernel_size=[7, 7],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            strides=1,
            filters=32, # number of filters
            kernel_size=[7, 7],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits

dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}    

modified_model_dict = apply_classification_loss(my_SVHN_net)
train_model(modified_model_dict, dataset_generators, epoch_n=100, print_every=70)

epoch 0 iter 0, loss: 109.549, accuracy: 0.159
epoch 0 iter 70, loss: 2.167, accuracy: 0.238
epoch 0 iter 140, loss: 1.648, accuracy: 0.468
epoch 0 iter 210, loss: 1.317, accuracy: 0.586
epoch 0 iter 280, loss: 1.141, accuracy: 0.649
epoch 1 iter 0, loss: 1.123, accuracy: 0.658
epoch 1 iter 70, loss: 1.059, accuracy: 0.678
epoch 1 iter 140, loss: 0.986, accuracy: 0.707
epoch 1 iter 210, loss: 0.961, accuracy: 0.718
epoch 1 iter 280, loss: 0.933, accuracy: 0.728
epoch 2 iter 0, loss: 0.928, accuracy: 0.731
epoch 2 iter 70, loss: 0.945, accuracy: 0.720
epoch 2 iter 140, loss: 0.877, accuracy: 0.748
epoch 2 iter 210, loss: 0.867, accuracy: 0.751
epoch 2 iter 280, loss: 0.872, accuracy: 0.748
epoch 3 iter 0, loss: 0.850, accuracy: 0.756
epoch 3 iter 70, loss: 0.853, accuracy: 0.751
epoch 3 iter 140, loss: 0.823, accuracy: 0.766
epoch 3 iter 210, loss: 0.835, accuracy: 0.762
epoch 3 iter 280, loss: 0.817, accuracy: 0.769
epoch 4 iter 0, loss: 0.822, accuracy: 0.765
epoch 4 iter 70, loss: 0.

In [4]:
def my_SVHN_net(x_):    
    conv1 = tf.layers.conv2d(
            inputs=x_,
            strides=2,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            strides=2,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits

dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}    

modified_model_dict = apply_classification_loss(my_SVHN_net)
train_model(modified_model_dict, dataset_generators, epoch_n=100, print_every=70)

epoch 0 iter 0, loss: 23.178, accuracy: 0.076
epoch 0 iter 70, loss: 2.093, accuracy: 0.281
epoch 0 iter 140, loss: 1.610, accuracy: 0.465
epoch 0 iter 210, loss: 1.280, accuracy: 0.598
epoch 0 iter 280, loss: 1.092, accuracy: 0.660
epoch 1 iter 0, loss: 1.098, accuracy: 0.667
epoch 1 iter 70, loss: 1.011, accuracy: 0.696
epoch 1 iter 140, loss: 0.911, accuracy: 0.731
epoch 1 iter 210, loss: 0.953, accuracy: 0.717
epoch 1 iter 280, loss: 0.881, accuracy: 0.737
epoch 2 iter 0, loss: 0.893, accuracy: 0.737
epoch 2 iter 70, loss: 0.870, accuracy: 0.749
epoch 2 iter 140, loss: 0.808, accuracy: 0.765
epoch 2 iter 210, loss: 0.810, accuracy: 0.765
epoch 2 iter 280, loss: 0.807, accuracy: 0.765
epoch 3 iter 0, loss: 0.795, accuracy: 0.768
epoch 3 iter 70, loss: 0.787, accuracy: 0.773
epoch 3 iter 140, loss: 0.772, accuracy: 0.780
epoch 3 iter 210, loss: 0.768, accuracy: 0.779
epoch 3 iter 280, loss: 0.758, accuracy: 0.783
epoch 4 iter 0, loss: 0.741, accuracy: 0.785
epoch 4 iter 70, loss: 0.7

In [6]:
def my_SVHN_net(x_):    
    conv1 = tf.layers.conv2d(
            inputs=x_,
            strides=3,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            strides=3,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits

dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}    

modified_model_dict = apply_classification_loss(my_SVHN_net)
train_model(modified_model_dict, dataset_generators, epoch_n=100, print_every=70)

epoch 0 iter 0, loss: 5.854, accuracy: 0.078
epoch 0 iter 70, loss: 2.155, accuracy: 0.233
epoch 0 iter 140, loss: 1.807, accuracy: 0.384
epoch 0 iter 210, loss: 1.414, accuracy: 0.532
epoch 0 iter 280, loss: 1.183, accuracy: 0.625
epoch 1 iter 0, loss: 1.168, accuracy: 0.635
epoch 1 iter 70, loss: 1.074, accuracy: 0.661
epoch 1 iter 140, loss: 0.978, accuracy: 0.707
epoch 1 iter 210, loss: 0.928, accuracy: 0.723
epoch 1 iter 280, loss: 1.030, accuracy: 0.688
epoch 2 iter 0, loss: 0.925, accuracy: 0.724
epoch 2 iter 70, loss: 0.895, accuracy: 0.736
epoch 2 iter 140, loss: 0.870, accuracy: 0.746
epoch 2 iter 210, loss: 0.826, accuracy: 0.759
epoch 2 iter 280, loss: 0.916, accuracy: 0.726
epoch 3 iter 0, loss: 0.831, accuracy: 0.759
epoch 3 iter 70, loss: 0.839, accuracy: 0.758
epoch 3 iter 140, loss: 0.851, accuracy: 0.753
epoch 3 iter 210, loss: 0.780, accuracy: 0.773
epoch 3 iter 280, loss: 0.851, accuracy: 0.750
epoch 4 iter 0, loss: 0.775, accuracy: 0.775
epoch 4 iter 70, loss: 0.79

In [7]:
def my_SVHN_net(x_):    
    conv1 = tf.layers.conv2d(
            inputs=x_,
            strides=1,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            strides=1,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=100, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits

dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}    

modified_model_dict = apply_classification_loss(my_SVHN_net)
train_model(modified_model_dict, dataset_generators, epoch_n=100, print_every=70)

epoch 0 iter 0, loss: 63.278, accuracy: 0.094
epoch 0 iter 70, loss: 2.236, accuracy: 0.188
epoch 0 iter 140, loss: 2.148, accuracy: 0.251
epoch 0 iter 210, loss: 2.089, accuracy: 0.270
epoch 0 iter 280, loss: 2.055, accuracy: 0.270
epoch 1 iter 0, loss: 2.041, accuracy: 0.273
epoch 1 iter 70, loss: 2.006, accuracy: 0.293
epoch 1 iter 140, loss: 1.682, accuracy: 0.444
epoch 1 iter 210, loss: 1.329, accuracy: 0.584
epoch 1 iter 280, loss: 1.145, accuracy: 0.650
epoch 2 iter 0, loss: 1.119, accuracy: 0.660
epoch 2 iter 70, loss: 1.138, accuracy: 0.656
epoch 2 iter 140, loss: 1.068, accuracy: 0.680
epoch 2 iter 210, loss: 0.990, accuracy: 0.702
epoch 2 iter 280, loss: 0.864, accuracy: 0.745
epoch 3 iter 0, loss: 0.925, accuracy: 0.727
epoch 3 iter 70, loss: 0.856, accuracy: 0.750
epoch 3 iter 140, loss: 0.804, accuracy: 0.766
epoch 3 iter 210, loss: 0.839, accuracy: 0.753
epoch 3 iter 280, loss: 0.758, accuracy: 0.782
epoch 4 iter 0, loss: 0.794, accuracy: 0.767
epoch 4 iter 70, loss: 0.7

In [8]:
def my_SVHN_net(x_):    
    conv1 = tf.layers.conv2d(
            inputs=x_,
            strides=1,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            strides=1,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=1000, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits

dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}    

modified_model_dict = apply_classification_loss(my_SVHN_net)
train_model(modified_model_dict, dataset_generators, epoch_n=100, print_every=70)

epoch 0 iter 0, loss: 95.843, accuracy: 0.062
epoch 0 iter 70, loss: 2.236, accuracy: 0.196
epoch 0 iter 140, loss: 2.178, accuracy: 0.224
epoch 0 iter 210, loss: 2.130, accuracy: 0.254
epoch 0 iter 280, loss: 2.102, accuracy: 0.278
epoch 1 iter 0, loss: 2.105, accuracy: 0.267
epoch 1 iter 70, loss: 2.055, accuracy: 0.290
epoch 1 iter 140, loss: 2.029, accuracy: 0.295
epoch 1 iter 210, loss: 2.012, accuracy: 0.303
epoch 1 iter 280, loss: 1.898, accuracy: 0.347
epoch 2 iter 0, loss: 2.329, accuracy: 0.292
epoch 2 iter 70, loss: 1.970, accuracy: 0.310
epoch 2 iter 140, loss: 1.983, accuracy: 0.306
epoch 2 iter 210, loss: 1.937, accuracy: 0.326
epoch 2 iter 280, loss: 1.950, accuracy: 0.320
epoch 3 iter 0, loss: 1.976, accuracy: 0.306
epoch 3 iter 70, loss: 1.923, accuracy: 0.332
epoch 3 iter 140, loss: 1.261, accuracy: 0.609
epoch 3 iter 210, loss: 1.100, accuracy: 0.666
epoch 3 iter 280, loss: 1.074, accuracy: 0.669
epoch 4 iter 0, loss: 1.028, accuracy: 0.689
epoch 4 iter 70, loss: 0.9

**[Double click here to add your answer]**

From 1.1 we can get that: strides = 1 kernel_size = [5,5] # of filter = 32 and units = 500 and finally the accuraccy = 0.822.

By changing filter = 16, we can get accuraccy = 0.853.

By changing filter = 64, we can get accuraccy = 0.853.
 
By changing kernel—size = [3,3] we can get the accuraccy = 0.832.

By changing kernel—size = [7,7] we can get the accuraccy = 0.806

By changing the strides = 2 we can get accuraccy = 0.817.

By changing the strides = 3 we can get accuraccy = 0.792.

By changing the unites = 100 we can get accuraccy = 0.858.

By changing the units = 1000 we can get accuraccy = 0.856.

## Part 2: Saving and Reloading Model Weights
(25 points)

In this section you learn to save the weights of a trained model, and to load the weights of a saved model. This is really useful when we would like to load an already trained model in order to continue training or to fine-tune it. Often times we save “snapshots” of the trained model as training progresses in case the training is interrupted, or in case we would like to fall back to an earlier model, this is called snapshot saving.


### Q2.1 Defining another network
(10 points)

Define a network with a slightly different structure in `def cnn_expanded(x_)` below. `cnn_expanded` is an expanded version of `cnn_model`. 
It should have: 
- followed by one additional convolutional layer, and 
- followed by one additional pooling layer.

The last fully-connected layer will stay the same.

In [9]:
# Define the new model 
def cnn_expanded(x_):
    conv1 = tf.layers.conv2d(
            inputs=x_,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv3 = tf.layers.conv2d(
            inputs=pool2,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)  
    
    pool3 = tf.layers.max_pooling2d(inputs=conv3, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    pool_flat = tf.contrib.layers.flatten(pool3, scope='pool2flat')
    
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits

### Q2.2 Saving and Loading Weights
(15 points)

`new_train_model()` below has two additional parameters `save_model=False, load_model=False` than `train_model` defined previously. Modify `new_train_model()` such that it would 
- save weights after the training is complete if `save_model` is `True`, and
- load weights on start-up before training if `load_model` is `True`.

*Hint:*  take a look at the docs for `tf.train.Saver()` here: https://www.tensorflow.org/api_docs/python/tf/train/Saver#__init__. You probably will be specifying the first argument `var_list` in `cnn_expanded` to accomplish this question.

**Note:** you're welcome to decide how many training epochs to use, if that gets you the same results but faster.

In [13]:
#### Modify this:
def new_train_model(model_dict, dataset_generators, epoch_n, print_every,
                    save_model=False, load_model=False):
    with model_dict['graph'].as_default(), tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver()
        if load_model:
            ## -- ! code required 
            saver.restore(sess,'checkpoint_dir/MyModel')
            print('Model loaded')
        for epoch_i in range(epoch_n):
            for iter_i, data_batch in enumerate(dataset_generators['train']):
                train_feed_dict = dict(zip(model_dict['inputs'], data_batch))
                sess.run(model_dict['train_op'], feed_dict=train_feed_dict)
                
                if iter_i % print_every == 0:
                    collect_arr = []
                    for test_batch in dataset_generators['test']:
                        test_feed_dict = dict(zip(model_dict['inputs'], test_batch))
                        to_compute = [model_dict['loss'], model_dict['accuracy']]
                        collect_arr.append(sess.run(to_compute, test_feed_dict))
                    averages = np.mean(collect_arr, axis=0)
                    fmt = (epoch_i, iter_i, ) + tuple(averages)
                    print('iteration {:d} {:d}\t loss: {:.3f}, '
                          'accuracy: {:.3f}'.format(*fmt))
                    
        if save_model:
            ## -- ! code required 
            saver.save(sess, 'checkpoint_dir/MyModel')
            print('Model saved')
            
cnn_expanded_dict = apply_classification_loss(cnn_expanded)

In [14]:
### Hint: call the saver like this: tf.train.Saver(var_list)
### where var_list is a list of TF variables you want to save
new_train_model(cnn_expanded_dict, dataset_generators, epoch_n=100, print_every=70, save_model=True)

iteration 0 0	 loss: 26.153, accuracy: 0.119
iteration 0 70	 loss: 2.223, accuracy: 0.198
iteration 0 140	 loss: 2.198, accuracy: 0.214
iteration 0 210	 loss: 2.081, accuracy: 0.269
iteration 0 280	 loss: 1.930, accuracy: 0.320
iteration 1 0	 loss: 1.922, accuracy: 0.324
iteration 1 70	 loss: 1.778, accuracy: 0.374
iteration 1 140	 loss: 1.615, accuracy: 0.450
iteration 1 210	 loss: 1.597, accuracy: 0.453
iteration 1 280	 loss: 1.288, accuracy: 0.593
iteration 2 0	 loss: 1.208, accuracy: 0.623
iteration 2 70	 loss: 1.069, accuracy: 0.674
iteration 2 140	 loss: 0.974, accuracy: 0.704
iteration 2 210	 loss: 0.880, accuracy: 0.735
iteration 2 280	 loss: 0.871, accuracy: 0.742
iteration 3 0	 loss: 0.856, accuracy: 0.744
iteration 3 70	 loss: 0.815, accuracy: 0.755
iteration 3 140	 loss: 0.804, accuracy: 0.763
iteration 3 210	 loss: 0.774, accuracy: 0.767
iteration 3 280	 loss: 0.754, accuracy: 0.776
iteration 4 0	 loss: 0.770, accuracy: 0.769
iteration 4 70	 loss: 0.727, accuracy: 0.783
it

In [16]:
### Hint: call the saver like this: tf.train.Saver(var_list)
### where var_list is a list of TF variables you want to load from the checkpoint 
new_train_model(cnn_expanded_dict, dataset_generators, epoch_n=10, print_every=70, load_model=True)

INFO:tensorflow:Restoring parameters from checkpoint_dir/MyModel
Model loaded
iteration 0 0	 loss: 3.107, accuracy: 0.860
iteration 0 70	 loss: 3.127, accuracy: 0.858
iteration 0 140	 loss: 3.119, accuracy: 0.857
iteration 0 210	 loss: 3.238, accuracy: 0.853
iteration 0 280	 loss: 3.312, accuracy: 0.841
iteration 1 0	 loss: 3.273, accuracy: 0.850
iteration 1 70	 loss: 3.054, accuracy: 0.856
iteration 1 140	 loss: 3.073, accuracy: 0.856
iteration 1 210	 loss: 3.278, accuracy: 0.856
iteration 1 280	 loss: 2.958, accuracy: 0.851
iteration 2 0	 loss: 3.082, accuracy: 0.854
iteration 2 70	 loss: 3.232, accuracy: 0.857
iteration 2 140	 loss: 3.320, accuracy: 0.855
iteration 2 210	 loss: 3.348, accuracy: 0.859
iteration 2 280	 loss: 3.175, accuracy: 0.851
iteration 3 0	 loss: 3.222, accuracy: 0.854
iteration 3 70	 loss: 3.142, accuracy: 0.854
iteration 3 140	 loss: 3.193, accuracy: 0.848
iteration 3 210	 loss: 3.359, accuracy: 0.857
iteration 3 280	 loss: 3.347, accuracy: 0.849
iteration 4 0	

## Part 3: Fine-tuning a Pre-trained Network on CIFAR-10
(20 points)

[CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) is another popular benchmark for image classification.
We provide you with modified verstion of the file cifar10.py from [https://github.com/Hvass-Labs/TensorFlow-Tutorials](https://github.com/Hvass-Labs/TensorFlow-Tutorials).


In [17]:
import read_cifar10 as cf10

We also provide a generator for the CIFAR-10 Dataset, yielding the next batch every time next is invoked.

In [18]:
@read_data.restartable
def cifar10_dataset_generator(dataset_name, batch_size, restrict_size=1000):
    assert dataset_name in ['train', 'test']
    assert batch_size > 0 or batch_size == -1  # -1 for entire dataset
    
    X_all_unrestricted, y_all = (cf10.load_training_data() if dataset_name == 'train'
                                 else cf10.load_test_data())
    
    actual_restrict_size = restrict_size if dataset_name == 'train' else int(1e10)
    X_all = X_all_unrestricted[:actual_restrict_size]
    data_len = X_all.shape[0]
    batch_size = batch_size if batch_size > 0 else data_len
    
    X_all_padded = np.concatenate([X_all, X_all[:batch_size]], axis=0)
    y_all_padded = np.concatenate([y_all, y_all[:batch_size]], axis=0)
    
    for slice_i in range(math.ceil(data_len / batch_size)):
        idx = slice_i * batch_size
        #X_batch = X_all_padded[idx:idx + batch_size]
        X_batch = X_all_padded[idx:idx + batch_size]*255  
        y_batch = np.ravel(y_all_padded[idx:idx + batch_size])
        yield X_batch.astype(np.uint8), y_batch.astype(np.uint8)

cifar10_dataset_generators = {
    'train': cifar10_dataset_generator('train', 1000),
    'test': cifar10_dataset_generator('test', -1)
}


### Q3.1 Fine-tuning
Let's fine-tune SVHN net on **1000 examples** from CIFAR-10. 
Compare test accuracies of the following scenarios: 
  - Train from scratch on the 1000 CIFAR-10 examples
  - Fine-tuning a pretrained SVHN net (trained on SVHN dataset) on 1000 exampes from CIFAR-10. Use `new_train_model()` defined above to load SVHN net weights, but train on the CIFAR-10 examples.
  
**Note:** you're welcome to decide how many training epochs to use, if that gets you the same results but faster.

**Important:** please do not change the `restrict_size=1000` parameter.

In [20]:
cnn_expanded_dict = apply_classification_loss(cnn_expanded)

## train a model from scratch
new_train_model(cnn_expanded_dict, cifar10_dataset_generators, epoch_n=100, 
                print_every=10)

iteration 0 0	 loss: 35.235, accuracy: 0.094
iteration 1 0	 loss: 34.882, accuracy: 0.105
iteration 2 0	 loss: 31.594, accuracy: 0.090
iteration 3 0	 loss: 26.190, accuracy: 0.130
iteration 4 0	 loss: 19.750, accuracy: 0.113
iteration 5 0	 loss: 12.925, accuracy: 0.108
iteration 6 0	 loss: 8.208, accuracy: 0.112
iteration 7 0	 loss: 5.369, accuracy: 0.122
iteration 8 0	 loss: 4.191, accuracy: 0.113
iteration 9 0	 loss: 3.790, accuracy: 0.107
iteration 10 0	 loss: 3.399, accuracy: 0.106
iteration 11 0	 loss: 3.103, accuracy: 0.110
iteration 12 0	 loss: 2.857, accuracy: 0.118
iteration 13 0	 loss: 2.638, accuracy: 0.130
iteration 14 0	 loss: 2.489, accuracy: 0.133
iteration 15 0	 loss: 2.408, accuracy: 0.119
iteration 16 0	 loss: 2.360, accuracy: 0.120
iteration 17 0	 loss: 2.328, accuracy: 0.126
iteration 18 0	 loss: 2.310, accuracy: 0.131
iteration 19 0	 loss: 2.290, accuracy: 0.132
iteration 20 0	 loss: 2.269, accuracy: 0.143
iteration 21 0	 loss: 2.256, accuracy: 0.154
iteration 22 0

In [23]:
## fine-tuning SVHN Net using Cifar-10 weights saved in Q2
new_train_model(cnn_expanded_dict, cifar10_dataset_generators, epoch_n=100, 
                print_every=10, load_model=True)

INFO:tensorflow:Restoring parameters from checkpoint_dir/MyModel
Model loaded
iteration 0 0	 loss: 28.455, accuracy: 0.098
iteration 1 0	 loss: 14.205, accuracy: 0.107
iteration 2 0	 loss: 6.367, accuracy: 0.105
iteration 3 0	 loss: 4.516, accuracy: 0.108
iteration 4 0	 loss: 4.076, accuracy: 0.105
iteration 5 0	 loss: 3.162, accuracy: 0.112
iteration 6 0	 loss: 2.445, accuracy: 0.101
iteration 7 0	 loss: 2.341, accuracy: 0.100
iteration 8 0	 loss: 2.330, accuracy: 0.100
iteration 9 0	 loss: 2.325, accuracy: 0.100
iteration 10 0	 loss: 2.323, accuracy: 0.100
iteration 11 0	 loss: 2.323, accuracy: 0.100
iteration 12 0	 loss: 2.329, accuracy: 0.100
iteration 13 0	 loss: 2.399, accuracy: 0.100
iteration 14 0	 loss: 2.322, accuracy: 0.100
iteration 15 0	 loss: 2.320, accuracy: 0.100
iteration 16 0	 loss: 2.320, accuracy: 0.100
iteration 17 0	 loss: 2.319, accuracy: 0.100
iteration 18 0	 loss: 2.319, accuracy: 0.100
iteration 19 0	 loss: 2.318, accuracy: 0.100
iteration 20 0	 loss: 2.318, a

## Part 4: TensorBoard Visualization
(30 points)

[TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard) is a very helpful tool for visualization of neural networks. 

Present at least one visualization for each of the following:
  - Filters
  - Loss
  - Accuracy
  - Feature map  

Modify code you have wrote above to also have summary writers. To  run tensorboard, the command is `tensorboard --logdir=path/to/your/log/directory`.

Please notice that there may be some difficulty to run the tensorboard on SCC and you may want to run it locally.

In [None]:
# Filter, loss, accuracy, and feature map visualizations

def visualize():
    log_dir = 'Tensorboard'
    dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
    }
    model_dict = apply_classification_loss(SVHN_net_v0)
    #train_model(model_dict, dataset_generators, epoch_n=50, print_every=20)
    
    with model_dict['graph'].as_default(), tf.Session() as sess:
        #filters of each conv layer
        filters1 = tf.get_default_graph().get_tensor_by_name('conv1/kernel:0')
        filters1_t = tf.transpose(filters1, [3, 0, 1, 2])
        tf.summary.image('conv1/filters1', filters1_t, max_outputs=1)
        
        #filters2 = tf.get_default_graph().get_tensor_by_name('conv2/kernel:0')
        #filters2_t = tf.transpose(filters2, [3, 0, 1, 2])
        #tf.summary.image('conv2/filters2', filters2_t, max_outputs=1)
        #loss and accuracy
        tf.summary.scalar('loss', model_dict['loss'])
        tf.summary.scalar('accuracy', model_dict['accuracy'])
        #feature map
        feature1 = tf.gather(tf.get_default_graph().get_tensor_by_name('conv1/Relu:0'),[0])
        features_t = tf.transpose(feature1, [3, 1, 2, 0])
        tf.summary.image('conv1/feature1', features_t, max_outputs = 1)
        
        feature2 = tf.gather(tf.get_default_graph().get_tensor_by_name('conv2/Relu:0'),[0])
        features_t = tf.transpose(feature2, [3, 1, 2, 0])
        tf.summary.image('conv1/feature2', features_t, max_outputs=1)
        
        img = tf.get_default_graph().get_tensor_by_name('Placeholder:0')
        img = tf.gather(img, [0])
        tf.summary.image('input', img, max_outputs=1)
        
        merged = tf.summary.merge_all()
        
        test_writer = tf.summary.FileWriter(log_dir+'/test', sess.graph)
        
        sess.run(tf.global_variables_initializer())

        
        for epoch_i in range(20):
            for iter_i, data_batch in enumerate(dataset_generators['train']):
                train_feed_dict = dict(zip(model_dict['inputs'], data_batch))
                sess.run(model_dict['train_op'], feed_dict=train_feed_dict)
                
                if iter_i % 287 == 0:
                    collect_arr = []
                    for test_batch in dataset_generators['test']:
                        test_feed_dict = dict(zip(model_dict['inputs'], test_batch))
                        to_compute = [model_dict['loss'], model_dict['accuracy']]
                        collect_arr.append(sess.run(to_compute, test_feed_dict))
                                         
                        # Vistualize testing
                        summary_test = sess.run(merged, feed_dict=test_feed_dict)
                        test_writer.add_summary(summary_test, epoch_i)
                        
                    averages = np.mean(collect_arr, axis=0)
                    fmt = (epoch_i, iter_i, ) + tuple(averages)
                    print('iteration {:d} {:d}\t loss: {:.3f}, '
                          'accuracy: {:.3f}'.format(*fmt))
    

            
visualize()

iteration 0 0	 loss: 141.749, accuracy: 0.159
iteration 1 0	 loss: 2.239, accuracy: 0.196
iteration 2 0	 loss: 2.210, accuracy: 0.196
iteration 3 0	 loss: 2.208, accuracy: 0.196
iteration 4 0	 loss: 2.225, accuracy: 0.196
iteration 5 0	 loss: 2.194, accuracy: 0.194
iteration 6 0	 loss: 2.089, accuracy: 0.263
iteration 7 0	 loss: 1.005, accuracy: 0.692
iteration 8 0	 loss: 0.870, accuracy: 0.739
iteration 9 0	 loss: 0.804, accuracy: 0.761
iteration 10 0	 loss: 0.695, accuracy: 0.802
iteration 11 0	 loss: 0.712, accuracy: 0.809
iteration 12 0	 loss: 0.783, accuracy: 0.796
iteration 13 0	 loss: 0.795, accuracy: 0.801
iteration 14 0	 loss: 0.779, accuracy: 0.816
iteration 15 0	 loss: 0.809, accuracy: 0.816
iteration 16 0	 loss: 0.829, accuracy: 0.814


## Part 5: Bonus
(20 points)

### Q5.1 SVHN Net ++
Improve the accuracy of SVHN Net beyond that of the provided demo: SVHN Net ++. Report your result and explain why it is improved. (The best result will get the most bonus points!)

In [9]:
def SVHN_plusplus(x_):
    conv1 = tf.layers.conv2d(
            inputs=x_,
            filters=64,  # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            filters=128, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv3 = tf.layers.conv2d(
            inputs=pool2,
            filters=256,  # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu)  
    
    pool3 = tf.layers.max_pooling2d(inputs=conv3, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv4 = tf.layers.conv2d(
            inputs=pool3,
            filters=256,  # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.leaky_relu)  
    
    pool4 = tf.layers.max_pooling2d(inputs=conv3, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    pool_flat = tf.contrib.layers.flatten(pool4, scope='pool2flat')
    
    dense1 = tf.layers.dense(inputs=pool_flat, units=2000, activation=tf.nn.relu)
    dense2 = tf.layers.dense(inputs=dense1, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense2, units=10)
    return logits


dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}

SVHN_plusplus_dict = apply_classification_loss(SVHN_plusplus)
train_model(SVHN_plusplus_dict, dataset_generators, epoch_n=100, print_every=280)

epoch 0 iter 0, loss: 143.839, accuracy: 0.196
epoch 0 iter 280, loss: 0.794, accuracy: 0.762
epoch 1 iter 0, loss: 0.748, accuracy: 0.777
epoch 1 iter 280, loss: 0.584, accuracy: 0.829
epoch 2 iter 0, loss: 0.565, accuracy: 0.834
epoch 2 iter 280, loss: 0.533, accuracy: 0.843
epoch 3 iter 0, loss: 0.495, accuracy: 0.858
epoch 3 iter 280, loss: 0.529, accuracy: 0.843
epoch 4 iter 0, loss: 0.462, accuracy: 0.871
epoch 4 iter 280, loss: 0.463, accuracy: 0.871
epoch 5 iter 0, loss: 0.489, accuracy: 0.872
epoch 5 iter 280, loss: 0.495, accuracy: 0.866
epoch 6 iter 0, loss: 0.526, accuracy: 0.872
epoch 6 iter 280, loss: 0.539, accuracy: 0.863
epoch 7 iter 0, loss: 0.557, accuracy: 0.867
epoch 7 iter 280, loss: 0.545, accuracy: 0.874
epoch 8 iter 0, loss: 0.684, accuracy: 0.849
epoch 8 iter 280, loss: 0.627, accuracy: 0.868
epoch 9 iter 0, loss: 0.594, accuracy: 0.875
epoch 9 iter 280, loss: 0.631, accuracy: 0.871
epoch 10 iter 0, loss: 0.639, accuracy: 0.866
epoch 10 iter 280, loss: 0.597, 

The accuraccy of my SVHN_plusplus is 0.890. I tried to make the network deeper by adding 1 more conv layer, 1 more maxpooling layer and 1 more fully connected layer. I chose to use the $3*3$ conv kernel and increasing the number of filters in the deeper conv layers. I adjust the network according to the development of CNN networks: LeNet, Alexnet, VGG16, RestNet. Increase the accuracy by using deeper network and increasing the non-linearity at the same time avoid  the gradient problems.