# Deep learning HW2
### Name: Bangguo Wang            
### NetID: bangguo2

## Problem 1: 
#### (1)
In forward propagation with dropout, from layer $L_l$ to layer $L_{l+1}$,
$$Z_{l+1}=\frac{W_{l+1}(A_{l}*D_{l})}{keep\_prob}+B_{l+1},$$
where $W_{l+1}$ and $B_{l+1}$ are the parameters, $A_{l}$ is the input matrix (the activation of layer $L_l$), $D_{l}$ is the 0/1 dropout matrix, and $*$ denotes the element-wise product.
If we know the derivative of the loss with respect to $Z_{l+1}$, $dZ_{l+1}= \frac{\partial Loss}{\partial Z_{l+1}}$,
then by the chain rule $$dA_{l}=\frac{\partial Loss}{\partial A_{l}} = \frac{(W_{l+1}^T dZ_{l+1})*D_{l}}{keep\_prob}$$
Because in forward propagation $A_{l}=g(Z_{l})$, where $g(\cdot)$ is an activation function applied element-wise,
we have $\frac{\partial A_{l}}{\partial Z_{l}}=g'(Z_{l})$, therefore $$dZ_{l}=dA_{l}*g'(Z_{l})$$
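From $dZ_{l}$ we get the partial derivatives with respect to $W_l$ and $B_l$. Since $Z_{l}=\frac{W_{l}(A_{l-1}*D_{l-1})}{keep\_prob}+B_{l}$,
$$dW_{l}=\frac{dZ_{l}(A_{l-1}*D_{l-1})^{T}}{keep\_prob},\qquad dB_{l}=dZ_{l}$$
(when a mini-batch is processed at once, $dB_{l}$ is summed over the columns of $dZ_{l}$).  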
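
As an illustration, here is a minimal NumPy sketch of one layer's forward and backward pass with dropout. It assumes ReLU as the activation $g$, samples stored as columns, and a toy loss equal to the sum of the activations so that the gradient can be checked numerically. The function names `forward` and `backward` are hypothetical and not part of the homework code.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(A_prev, W, B, D, keep_prob):
    """Z = W (A_prev * D) / keep_prob + B, A = g(Z) with g = ReLU (assumed)."""
    A_drop = A_prev * D / keep_prob      # masked and rescaled input
    Z = W @ A_drop + B
    return Z, relu(Z), A_drop

def backward(dA, Z, A_drop, W, D, keep_prob):
    """Given dA = dLoss/dA of this layer, return dW, dB and dA_prev."""
    dZ = dA * (Z > 0)                    # dZ = dA * g'(Z), with g' of ReLU
    dW = dZ @ A_drop.T                   # dZ (A_prev * D)^T / keep_prob
    dB = dZ.sum(axis=1, keepdims=True)   # summed over the samples (columns)
    dA_prev = (W.T @ dZ) * D / keep_prob # gradient flows only through kept units
    return dW, dB, dA_prev

# Toy check with loss = sum of all activations, so dA is a matrix of ones.
rng = np.random.default_rng(0)
keep_prob = 0.8
A_prev = rng.normal(size=(4, 5))         # 4 units, 5 samples
W = rng.normal(size=(3, 4))
B = rng.normal(size=(3, 1))
D = (rng.random(A_prev.shape) < keep_prob).astype(float)

Z, A, A_drop = forward(A_prev, W, B, D, keep_prob)
dW, dB, dA_prev = backward(np.ones_like(A), Z, A_drop, W, D, keep_prob)

# Central-difference estimate of dLoss/dW[0, 0] for comparison.
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (forward(A_prev, W_plus, B, D, keep_prob)[1].sum()
           - forward(A_prev, W_minus, B, D, keep_prob)[1].sum()) / (2 * eps)
print(dW[0, 0], numeric)
```

The two printed numbers should agree closely, and `dA_prev` is zero wherever the mask `D` dropped a unit.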
#### (2)
The stochastic gradient descent algorithm for this neural network:  
1. Let the batch size be $m$. At each iteration, randomly choose $m$ samples from the training set as the input.
2. Run forward propagation with dropout and compute the loss (the average of the losses of the $m$ samples).
3. Run backward propagation with dropout, using the same dropout matrices as in the forward pass, and compute the average over the $m$ samples of the derivatives with respect to $W$ and $B$.
4. Update the parameters: $W \leftarrow W - \alpha\, dW$ and $B \leftarrow B - \alpha\, dB$, where $\alpha$ is the learning rate.
5. Repeat steps 1 to 4 until the maximum number of iterations is reached.

Specifically, when $m$ equals 1, i.e. each iteration uses a single randomly chosen sample, this is the plain stochastic gradient descent algorithm; for $m > 1$ it is mini-batch stochastic gradient descent.
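
The five steps can be sketched as a short NumPy loop. This is only an illustration under stated assumptions: `linear_grads` is a hypothetical stand-in for the network's forward and backward propagation (a linear model with half mean squared error, no dropout), samples are stored as rows, and the learning rate and iteration count are arbitrary.

```python
import numpy as np

def linear_grads(params, X_batch, Y_batch):
    """Hypothetical stand-in for forward + backward propagation (steps 2-3).

    Model: y = X w + b with half mean squared error; the returned gradients
    are averaged over the m samples of the batch.
    """
    err = X_batch @ params["w"] + params["b"] - Y_batch
    m = X_batch.shape[0]
    return {"w": X_batch.T @ err / m, "b": err.mean()}

def sgd(params, grad_fn, X, Y, batch_size, lr, max_iters, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(max_iters):                                # step 5: repeat
        idx = rng.choice(n, size=batch_size, replace=False)  # step 1: sample m points
        grads = grad_fn(params, X[idx], Y[idx])               # steps 2-3: loss gradients
        for name in params:                                   # step 4: update
            params[name] = params[name] - lr * grads[name]
    return params

# Synthetic, noise-free data so the result is easy to inspect.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
w_true = np.array([1.0, -2.0, 0.5])
Y = X @ w_true + 0.3

params = sgd({"w": np.zeros(3), "b": 0.0}, linear_grads, X, Y,
             batch_size=32, lr=0.1, max_iters=1000)
print(params["w"], params["b"])
```

Calling `sgd` with `batch_size=1` gives the single-sample case described above.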

## Problem 2:

In the experiments, the test accuracy is 0.9883 without dropout and 0.9932 with dropout.  
The code (shown with dropout, using `keep_prob = 0.75` during training) is as follows:

In [4]:
import tensorflow as tf
import numpy as np
import h5py
from sklearn.preprocessing import OneHotEncoder
import random

# load MNIST data
MNIST_data = h5py.File('MNISTdata.hdf5', 'r')
x_train = np.float32(MNIST_data['x_train'][:])
y_train = np.int32(np.array(MNIST_data['y_train'][:, 0]))
x_test = np.float32(MNIST_data['x_test'][:])
y_test = np.int32(np.array(MNIST_data['y_test'][:, 0]))
MNIST_data.close()


# organize the data
X_train = x_train.reshape([-1,28,28,1])
X_test = x_test.reshape([-1,28,28,1])

encoder = OneHotEncoder()
Y_train = encoder.fit_transform(y_train.reshape(-1,1)).todense()
Y_test = encoder.transform(y_test.reshape(-1,1)).todense()




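def initialize_W(shape):
    # Xavier-style scaling; note that for a conv filter [h, w, in, out], shape[0] and
    # shape[1] are the kernel height and width rather than the fan-in and fan-out
    initial = tf.truncated_normal(shape,stddev = np.sqrt(2./(shape[0]+shape[1])))
    W = tf.Variable(initial)
    return W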
def initialize_b(shape):
    initial = tf.constant(0.0,shape = shape)
    b = tf.Variable(initial)
    return b


X = tf.placeholder(tf.float32,[None,28,28,1])
# conv layer1
W1 = initialize_W([5,5,1,32])
b1 = initialize_b([32])

conv_1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + b1
print(conv_1)
A_1 = tf.nn.relu(conv_1)
pool_1 = tf.nn.max_pool(A_1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print(pool_1)

# conv layer2
W2 = initialize_W([3,3,32,64])
b2 = initialize_b([64])
conv_2 = tf.nn.conv2d(pool_1, W2, strides=[1, 1, 1, 1], padding='SAME') + b2
A_2 = tf.nn.relu(conv_2)
pool_2 = tf.nn.max_pool(A_2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# conv layer3
W3 = initialize_W([3,3,64,128])
b3 = initialize_b([128])
conv_3 = tf.nn.conv2d(pool_2, W3, strides=[1, 1, 1, 1], padding='SAME') + b3
A_3 = tf.nn.relu(conv_3)
pool_3 = tf.nn.max_pool(A_3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

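# dropout on the output of the last conv block
keep_prob = tf.placeholder("float")
D3 = tf.nn.dropout(pool_3, keep_prob)

# fully connected layer (4*4*128 = 2048 inputs); reshape D3, not pool_3,
# so that the dropout above actually takes effect
flat_layer = tf.reshape(D3, [-1,2048])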
W4 = tf.Variable(tf.truncated_normal([2048,50],stddev = np.sqrt(1./2048)))
b4 = tf.Variable(tf.constant(0.0,shape = [50]))
Z4 = tf.add(tf.matmul(flat_layer,W4),b4)
A4 = tf.nn.relu(Z4)

# dropout
D4 = tf.nn.dropout(A4, keep_prob)

# softmax layer
W5 = tf.Variable(tf.truncated_normal([50,10],stddev = np.sqrt(1./50)))
b5 = tf.Variable(tf.constant(0.0,shape = [10]))
Z5 = tf.add(tf.matmul(D4,W5),b5)
A5 = tf.nn.softmax(Z5)

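# loss: cross-entropy; reduce_mean averages over all batch_size*10 entries,
# so this equals the mean per-sample cross-entropy divided by 10
Y = tf.placeholder(tf.float32,[None,10])
loss = -tf.reduce_mean(Y*tf.log(tf.clip_by_value(A5,1e-11,1.0)))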

# train
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

# accuracy (tf.argmax replaces the deprecated tf.arg_max)
y_pred = tf.argmax(A5,1)
bool_pred = tf.equal(tf.argmax(Y,1),y_pred)
accuracy = tf.reduce_mean(tf.cast(bool_pred,tf.float32))


# session
batch_size = 32
sample_size = np.shape(x_train)[0]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    epoch = 15
    for i in range(epoch):
        index = np.random.permutation(sample_size)  # fresh random order each epoch
        for j in range(sample_size // batch_size):
            start = j * batch_size
            end = min((j + 1) * batch_size, sample_size)
            X_batch = X_train[index[start:end]]
            Y_batch = Y_train[index[start:end]]
            train_step.run(feed_dict = {X: X_batch, Y: Y_batch,keep_prob:0.75})
            if (j%100 == 0):
                print(j)
        # accuracy on the last mini-batch of the epoch only (a rough estimate of the training accuracy)
        train_accuracy = accuracy.eval(feed_dict={X: X_batch, Y: Y_batch, keep_prob:1.0})
        print("Epoch %d, train accuracy (last mini-batch) is %g"%(i+1, train_accuracy))

    test_accuracy = accuracy.eval(feed_dict={X: X_test, Y: Y_test,keep_prob:1.0})
    print("Test accuracy is %g" % (test_accuracy))

Tensor("add_3:0", shape=(?, 28, 28, 32), dtype=float32)
Tensor("MaxPool_3:0", shape=(?, 14, 14, 32), dtype=float32)

In [2]:
X_train.shape

(60000, 28, 28, 1)
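
The flattened size 2048 used in the `tf.reshape(..., [-1,2048])` call of the Problem 2 code can be checked with a few lines of arithmetic. The sketch below assumes the usual rule that a `'SAME'`-padded layer with stride $s$ maps a spatial size $n$ to $\lceil n/s \rceil$; the helper name `same_pool_out` is hypothetical.

```python
import math

def same_pool_out(size, stride=2):
    """Spatial size after a 'SAME'-padded pooling layer: ceil(size / stride)."""
    return math.ceil(size / stride)

size = 28                      # MNIST images are 28 x 28
for _ in range(3):             # three 2x2 max-pool layers with stride 2
    size = same_pool_out(size) # 28 -> 14 -> 7 -> 4
flat_dim = size * size * 128   # 128 channels after the third conv layer
print(size, flat_dim)          # 4 2048
```

The stride-1 `'SAME'` convolutions do not change the spatial size, so only the pooling layers enter the calculation.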

## Problem 3:

Test accuracy for different numbers of layers and different values of keep_prob, without data augmentation: 

| keep_prob | layers = 6 | layers = 10 | layers = 15 |
| :------: | :------: | :------: | :------: |
| 0.4 | 0.732812 | 0.757812 | 0.760312 |
| 0.7 | 0.742812 | 0.735313 | 0.737187 |


Test accuracy for different numbers of layers and different values of keep_prob, with data augmentation: 

| keep_prob | layers = 6 | layers = 10 | layers = 15 |
| :------: | :------: | :------: | :------: |
| 0.4 | 0.811875 | 0.885625 | 0.892188 |
| 0.7 | 0.834688 | 0.865937 | 0.849687 |
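
For reference, the kind of data augmentation compared in these tables (random flip, brightness, and contrast) can be sketched in NumPy as follows. This is not the augmentation code used for the experiments: the function name `augment`, the value ranges, and the 24x24 image size are assumptions chosen only for illustration.

```python
import numpy as np

def augment(image, rng, max_delta=0.2, contrast_range=(0.8, 1.2)):
    """Randomly flip, brighten and re-contrast one image.

    image: float array of shape (H, W, C) with values in [0, 1].
    max_delta and contrast_range are assumed values, not taken from the homework.
    """
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                          # horizontal flip
    out = out + rng.uniform(-max_delta, max_delta)     # brightness shift
    factor = rng.uniform(*contrast_range)
    mean = out.mean(axis=(0, 1), keepdims=True)        # per-channel mean
    out = (out - mean) * factor + mean                 # contrast about the mean
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((24, 24, 3))    # stand-in for one training image
augmented = augment(image, rng)
print(augmented.shape)
```

Each training image would get a fresh random transformation every time it is drawn, while test images are left unchanged.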
 
  
The code for one of the models (`mode.py`, the version with 15 convolutional layers) is as follows:

In [5]:
import tensorflow as tf


def conv(x, w, b, stride, name):
    with tf.variable_scope('conv'):
        return tf.nn.conv2d(x,
                            filter=w,
                            strides=[1, stride, stride, 1],
                            padding='SAME',
                            name=name) + b


######## after 30k iterations (batch_size=64)
# with data augmentation (flip, brightness, contrast) ~81.0%
# without data augmentation 69.6%
def cifar10_conv(X, keep_prob, reuse=False):
    with tf.variable_scope('cifar10_conv'):
        if reuse:
            tf.get_variable_scope().reuse_variables()

        batch_size = tf.shape(X)[0]
        K1 = 32
        K2 = 32
        K3 = 32
        K4 = 48
        K5 = 48
        K6 = 80
        K7 = 80
        K8 = 80
        K9 = 80
        K10 = 80
        K11 = 128
        K12 = 128
        K13 = 128
        K14 = 128
        K15 = 128
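        # flattened size of d15: 576 spatial positions * K15 channels (e.g. 24x24 feature maps;
        # the stride-1 'SAME' convs and pools below keep the spatial size unchanged)
        T = 73728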
        K16 = 500
        
        W1 = tf.get_variable('D_W1', [3, 3, 3, K1], initializer=tf.contrib.layers.xavier_initializer())
        print(W1)
        B1 = tf.get_variable('D_B1', [K1], initializer=tf.constant_initializer())
        conv1 = conv(X, W1, B1, stride=1, name='conv1')
        bn1 = tf.nn.relu(tf.contrib.layers.batch_norm(conv1))

        W2 = tf.get_variable('D_W2', [3, 3, K1, K2], initializer=tf.contrib.layers.xavier_initializer())
        B2 = tf.get_variable('D_B2', [K2], initializer=tf.constant_initializer())
        conv2 = conv(bn1, W2, B2, stride=1, name='conv2')
        bn2 = tf.nn.relu(tf.contrib.layers.batch_norm(conv2))

        W3 = tf.get_variable('D_W3', [3, 3, K2, K3], initializer=tf.contrib.layers.xavier_initializer())
        B3 = tf.get_variable('D_B3', [K3], initializer=tf.constant_initializer())
        conv3 = conv(bn2, W3, B3, stride=1, name='conv3')
        bn3 = tf.nn.relu(tf.contrib.layers.batch_norm(conv3))
        
        W4 = tf.get_variable('D_W4', [3, 3, K3, K4], initializer=tf.contrib.layers.xavier_initializer())
        B4 = tf.get_variable('D_B4', [K4], initializer=tf.constant_initializer())
        conv4 = conv(bn3, W4, B4, stride=1, name='conv4')
        bn4 = tf.nn.relu(tf.contrib.layers.batch_norm(conv4))
        
        W5 = tf.get_variable('D_W5', [3, 3, K4, K5], initializer=tf.contrib.layers.xavier_initializer())
        B5 = tf.get_variable('D_B5', [K5], initializer=tf.constant_initializer())
        conv5 = conv(bn4, W5, B5, stride=1, name='conv5')
        bn5 = tf.nn.relu(tf.contrib.layers.batch_norm(conv5))
        pooled5 = tf.nn.max_pool(bn5, ksize=[1, 2, 2, 1], strides=[1, 1, 1, 1], padding='SAME')
        d5 = tf.nn.dropout(pooled5, keep_prob)
        
        W6 = tf.get_variable('D_W6', [3, 3, K5, K6], initializer=tf.contrib.layers.xavier_initializer())
        B6 = tf.get_variable('D_B6', [K6], initializer=tf.constant_initializer())
        conv6 = conv(d5, W6, B6, stride=1, name='conv6')
        bn6 = tf.nn.relu(tf.contrib.layers.batch_norm(conv6))
        
        W7 = tf.get_variable('D_W7', [3, 3, K6, K7], initializer=tf.contrib.layers.xavier_initializer())
        B7 = tf.get_variable('D_B7', [K7], initializer=tf.constant_initializer())
        conv7 = conv(bn6, W7, B7, stride=1, name='conv7')
        bn7 = tf.nn.relu(tf.contrib.layers.batch_norm(conv7))
        
        W8 = tf.get_variable('D_W8', [3, 3, K7, K8], initializer=tf.contrib.layers.xavier_initializer())
        B8 = tf.get_variable('D_B8', [K8], initializer=tf.constant_initializer())
        conv8 = conv(bn7, W8, B8, stride=1, name='conv8')
        bn8 = tf.nn.relu(tf.contrib.layers.batch_norm(conv8))
        
        W9 = tf.get_variable('D_W9', [3, 3, K8, K9], initializer=tf.contrib.layers.xavier_initializer())
        B9 = tf.get_variable('D_B9', [K9], initializer=tf.constant_initializer())
        conv9 = conv(bn8, W9, B9, stride=1, name='conv9')
        bn9 = tf.nn.relu(tf.contrib.layers.batch_norm(conv9))
        
        W10 = tf.get_variable('D_W10', [3, 3, K9, K10], initializer=tf.contrib.layers.xavier_initializer())
        B10 = tf.get_variable('D_B10', [K10], initializer=tf.constant_initializer())
        conv10 = conv(bn9, W10, B10, stride=1, name='conv10')
        bn10 = tf.nn.relu(tf.contrib.layers.batch_norm(conv10))
        pooled10 = tf.nn.max_pool(bn10, ksize=[1, 2, 2, 1], strides=[1, 1, 1, 1], padding='SAME')
        d10 = tf.nn.dropout(pooled10, keep_prob)
        
        
        W11 = tf.get_variable('D_W11', [3, 3, K10, K11], initializer=tf.contrib.layers.xavier_initializer())
        B11 = tf.get_variable('D_B11', [K11], initializer=tf.constant_initializer())
        conv11 = conv(d10, W11, B11, stride=1, name='conv11')
        bn11 = tf.nn.relu(tf.contrib.layers.batch_norm(conv11))
        
        W12 = tf.get_variable('D_W12', [3, 3, K11, K12], initializer=tf.contrib.layers.xavier_initializer())
        B12 = tf.get_variable('D_B12', [K12], initializer=tf.constant_initializer())
        conv12 = conv(bn11, W12, B12, stride=1, name='conv12')
        bn12 = tf.nn.relu(tf.contrib.layers.batch_norm(conv12))
        
        W13 = tf.get_variable('D_W13', [3, 3, K12, K13], initializer=tf.contrib.layers.xavier_initializer())
        B13 = tf.get_variable('D_B13', [K13], initializer=tf.constant_initializer())
        conv13 = conv(bn12, W13, B13, stride=1, name='conv13')
        bn13 = tf.nn.relu(tf.contrib.layers.batch_norm(conv13))
        
        W14 = tf.get_variable('D_W14', [3, 3, K13, K14], initializer=tf.contrib.layers.xavier_initializer())
        B14 = tf.get_variable('D_B14', [K14], initializer=tf.constant_initializer())
        conv14 = conv(bn13, W14, B14, stride=1, name='conv14')
        bn14 = tf.nn.relu(tf.contrib.layers.batch_norm(conv14))
        
        W15 = tf.get_variable('D_W15', [3, 3, K14, K15], initializer=tf.contrib.layers.xavier_initializer())
        B15 = tf.get_variable('D_B15', [K15], initializer=tf.constant_initializer())
        conv15 = conv(bn14, W15, B15, stride=1, name='conv15')
        bn15 = tf.nn.relu(tf.contrib.layers.batch_norm(conv15))
        pooled15 = tf.nn.max_pool(bn15, ksize=[1, 2, 2, 1], strides=[1, 1, 1, 1], padding='SAME')
        d15 = tf.nn.dropout(pooled15, keep_prob)
        
        
        flat = tf.reshape(d15, [batch_size, T])
    
        W16 = tf.get_variable('D_W16', [T, K16], initializer=tf.contrib.layers.xavier_initializer())
        B16 = tf.get_variable('D_B16', [K16], initializer=tf.constant_initializer())
        M16 = tf.matmul(flat, W16) + B16
        bn16 = tf.nn.relu(tf.contrib.layers.batch_norm(M16))
        d16 = tf.nn.dropout(bn16, keep_prob)
        
        
        W17 = tf.get_variable('D_W17', [K16, 10], initializer=tf.contrib.layers.xavier_initializer())
        B17 = tf.get_variable('D_B17', [10], initializer=tf.constant_initializer())
        M17 = tf.matmul(d16, W17) + B17
        output = tf.nn.softmax(M17)
        
        return output