# SYS ENG 6213 - Deep Learning and Advanced Neural Networks 

The dataset used here is CIFAR-10, which contains 60000 images of shape 32x32x3 in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). For now, we will concentrate on TensorBoard and activation functions.

In [None]:
# import necessary libraries
import tensorflow as tf
import numpy as np
import datetime
import matplotlib.pyplot as plt
# import necessary functions
from utilities import *

In [None]:
# load the data
maybe_download_and_extract()
train_,tr_target,_,_ = load_CIFAR10_data()

In [None]:
#view a few examples
img_idx = np.random.choice(40000, 9, replace=False)
sample = train_[img_idx].reshape(9, 3, 32, 32).transpose(0,2,3,1).astype("uint8")
fig, axes1 = plt.subplots(3,3,figsize=(3,3))
i=0
for j in range(3):
    for k in range(3):
        #i = np.random.choice(range(len(sample)),replace=False)
        axes1[j][k].set_axis_off()
        axes1[j][k].imshow(sample[i])
        i=i+1
plt.show()

The block below initializes all the values for the neural network model. Assign the following names to the values:
1. 'X' to input
2. 'Y' to labels
3. 'w1' to weights of input layer
4. 'b1' to bias of input layer
5. 'w2' to weights of output layer
6. 'b2' to bias of output layer

In addition to naming the values, we can also build histograms for values in a graph using **tf.summary.histogram(name_for_histogram,value)**. This is useful to understand how values such as weights/biases change during training. Using the function given, generate summaries for weights as follows:
1. 'W_input' for input layer weights
2. 'W_output' for output layer weights

In [None]:
# Declare the necessary values which will be used for training.
input_size = 32*32*3
hidden_size = 75
batch_size = 200
learning_rate = 7.5e-4
parameters = {}

### Add names to values below ###
x = tf.placeholder(tf.float32, shape=(None,input_size))
y = tf.placeholder(tf.float32,shape=(None,10))
parameters['w1'] = tf.Variable(tf.random_normal([input_size,hidden_size]))
parameters['b1'] = tf.Variable(tf.random_normal([hidden_size]))
parameters['w2'] = tf.Variable(tf.random_normal([hidden_size,10]))
parameters['b2'] = tf.Variable(tf.random_normal([10]))
#### End of naming values ####    

### generate histograms for weights ###

#### End of histogram code ####

parameters_for_later_use = parameters # you will use this for question 2

Now that you have named all the values in the graph, the next step is to create scopes that cluster operations together for better visualization. The syntax for building a scope is: **with tf.name_scope(scope_name):**

The **two_layered_nn** function below contains the forward pass of the model. Complete the code for the forward pass and assign the following scopes:
1. 'input_layer' to {hi = w1.x+b1 and ho = ReLU(hi)}
2. 'output_layer' to {scores = w2.ho+b2}
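
As a rough check of the shapes involved, the same two steps can be sketched in plain NumPy. This is only an illustration, not the TensorFlow graph code the exercise asks for. The function name `forward_pass_numpy` and the `*_demo` arrays are made up for this sketch, and the product is written as `x @ w1` on the assumption that inputs are stored one example per row, with shape `(N, input_size)`.

```python
import numpy as np

def forward_pass_numpy(x, w1, b1, w2, b2):
    """NumPy analogue of the two-layer forward pass (illustration only)."""
    # 'input_layer': affine transform followed by ReLU
    hi = x @ w1 + b1
    ho = np.maximum(hi, 0.0)
    # 'output_layer': affine transform producing one score per class
    scores = ho @ w2 + b2
    return scores

# Hypothetical demo data with the same sizes as the notebook's model
rng = np.random.default_rng(0)
x_demo = rng.standard_normal((4, 32 * 32 * 3))     # 4 flattened images
w1_demo = rng.standard_normal((32 * 32 * 3, 75))   # input_size x hidden_size
b1_demo = rng.standard_normal(75)
w2_demo = rng.standard_normal((75, 10))            # hidden_size x 10 classes
b2_demo = rng.standard_normal(10)

scores_demo = forward_pass_numpy(x_demo, w1_demo, b1_demo, w2_demo, b2_demo)
print(scores_demo.shape)  # (4, 10)
```

Each row of `scores_demo` holds the 10 class scores for one image, which is the shape the cost function in the next step expects.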


In [None]:
# Write a function for forward pass
def two_layered_nn(data,hidden_size,parameters):
    # Forward pass steps
    ### Type your code here ####
    
    #### End of your code ####
    return(scores)

The train function trains the two layered neural network.
Complete the function as per the following specifications:
1. use softmax and cross entropy for cost
2. use stochastic gradient descent for learning
3. Since cost is scalar, use **tf.summary.scalar(name_for_plot,value)** to build a plot for cost.
4. Build a plot for accuracy in the same way, since accuracy is also a scalar.
5. The **tf.summary.merge_all()** function merges all the summaries defined in the graph into a single op. Since there is no need to generate summaries at every iteration, we generate them every 10 iterations. Your job is to run a session on the output given by tf.summary.merge_all() and add the result to log_writer using **writer_name.add_summary(summary,iteration)**, where summary is the value returned by that session run.
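
To make the two scalar quantities concrete, here is a minimal NumPy sketch of a softmax cross-entropy cost and an accuracy measure. It is an illustration under stated assumptions, not the graph code for the exercise: the helper names `softmax_cross_entropy_numpy` and `accuracy_numpy` and the demo arrays are hypothetical, and labels are assumed to be one-hot, as the `(None,10)` label placeholder suggests.

```python
import numpy as np

def softmax_cross_entropy_numpy(scores, one_hot_labels):
    """Mean softmax cross-entropy over a batch (illustration only)."""
    # Subtract the row-wise max for numerical stability before exponentiating
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1).mean()

def accuracy_numpy(scores, one_hot_labels):
    """Fraction of rows whose highest score matches the labelled class."""
    return (scores.argmax(axis=1) == one_hot_labels.argmax(axis=1)).mean()

# Hypothetical scores for 3 examples and 3 classes
scores_demo = np.array([[2.0, 1.0, 0.1],
                        [0.5, 2.5, 0.3],
                        [1.2, 0.2, 0.4]])
labels_demo = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1]], dtype=float)

cost_demo = softmax_cross_entropy_numpy(scores_demo, labels_demo)
acc_demo = accuracy_numpy(scores_demo, labels_demo)
print(cost_demo, acc_demo)
```

Both results are single numbers per batch, which is why each can be logged with a scalar summary.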

In [None]:
# function to train the neural network
def train(x,y,learning_rate,batch_size,hidden_size,train_,tr_target):
    
    scores = two_layered_nn(x,hidden_size,parameters)
    
    # Type code for both cost and accuracy
    ### Start your code here ###
    
    #### End of your code ####
     
    
    # Start a session to run the graph
    with tf.Session() as sess:
        log_writer = tf.summary.FileWriter("logs/",sess.graph) # defining the writer for tensorboard
        merged = tf.summary.merge_all() # merge all summaries into a single op
        sess.run(tf.global_variables_initializer()) # initialize all variables
        
        loss_history = [] # defined once, outside the loop, so it keeps every loss
        for it in range(1000): 
            # generate random batches
            idx = np.random.choice(40000, batch_size, replace=True)
            ex = train_[idx]
            ey = tr_target[idx]  
            # run a session on optimizer and cost
            _,c = sess.run([optimizer,cost],feed_dict = {x:ex,y:ey}) 
            
            # run a session on variable 'merged' to get summaries
            if it%10 == 0:
                ### Type your code here ###
                
                #### End of your code ####
                log_writer.flush()
                
            loss_history.append(c)
            if it%50 == 0:
                print('iteration ',it, 'completed out of 1000, loss: ',c)
        log_writer.close()
    

In [None]:
# Call the train function to build and execute the graph
start_time = datetime.datetime.now()
train(x,y,learning_rate,batch_size,hidden_size,train_,tr_target)
print('Total execution time: '+ str(datetime.datetime.now() - start_time))

Now open a command prompt and type **tensorboard --logdir=path --debug**, where path is the log directory (logs/ in the code above).
It will print a localhost link which you can copy and open in a browser (Chrome is preferred over Firefox).
You can now view the following:
1. model under graphs tab
2. plots under scalars tab
3. histograms under histograms tab

#### Q2. Explore activation functions

So far, you have been using the ReLU activation function. In lecture #4, we discussed other activation functions such as sigmoid and tanh, and variations of ReLU such as PReLU. A few of them are shown below:

![Activation Functions](activations.png)

Even though sigmoid and tanh are well-known activation functions, ReLU or its variants are preferred in deep learning models. There are two main reasons for this (the second being the more important one):
1. ReLU (including its variants) is piecewise linear and hence faster to compute than functions such as tanh/sigmoid, which need exponentials
2. Gradients passing through ReLU do not saturate for positive inputs.
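
The second point can be illustrated with a small NumPy sketch that evaluates both derivatives at a few inputs. The helper names `sigmoid_grad` and `relu_grad` and the sample inputs are chosen here for illustration. The sketch uses the standard sigmoid derivative $s(x)(1-s(x))$ and takes the ReLU derivative to be 1 for positive inputs and 0 otherwise.

```python
import numpy as np

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x))
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of max(x, 0): 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

x_demo = np.array([-10.0, -2.0, 0.5, 2.0, 10.0])
print(sigmoid_grad(x_demo))  # shrinks toward 0 as |x| grows
print(relu_grad(x_demo))     # stays at 1 for every positive input
```

The sigmoid derivative never exceeds 0.25 and becomes tiny for large positive or negative inputs, whereas the ReLU derivative stays at 1 wherever the input is positive (it is 0 for negative inputs).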

In order to show that ReLU is faster, change the activation function in the Q1 code to **sigmoid** and use the weights and biases stored in the **parameters_for_later_use** dictionary to train the model. You should observe that training takes a few seconds longer with sigmoid. For neural networks with hundreds of layers, this difference becomes more prominent.
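
Independently of the full training run, the cost of the two activations can be compared in isolation. The following is a rough micro-benchmark sketch in NumPy, not a substitute for the TensorFlow experiment. The array shape mirrors one batch of hidden-layer pre-activations (200 x 75), the names are hypothetical, and the timings depend on the machine.

```python
import timeit
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activations: one batch of 200 examples, 75 hidden units
rng = np.random.default_rng(0)
z_demo = rng.standard_normal((200, 75))

# Time 1000 evaluations of each activation; absolute numbers vary by machine
relu_seconds = timeit.timeit(lambda: relu(z_demo), number=1000)
sigmoid_seconds = timeit.timeit(lambda: sigmoid(z_demo), number=1000)
print('relu:    %.4f s' % relu_seconds)
print('sigmoid: %.4f s' % sigmoid_seconds)
```

The sigmoid needs an exponential and a division per element, while ReLU needs only a comparison, so any gap in the timings comes from that extra work.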

In [None]:
### Run the code for Q1 using sigmoid function here ###




The next advantage, which is more important than speed, is that gradients passing through ReLU do not get saturated. To understand what this means, complete the **sigmoid_backward** function. Then execute the function calls given in the next block.

Note: The sigmoid function for an input $x$ is given as $s(x) = \frac{1}{1+e^{-x}}$. The derivative of sigmoid is given as $\frac{ds(x)}{dx}= s(x)(1-s(x))$. The proof can be found at [sigmoid derivation](http://www.ai.mit.edu/courses/6.892/lecture8-html/sld015.htm)
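
The derivative formula in the note can be checked numerically. The sketch below compares a central finite difference of the sigmoid with the closed form $s(x)(1-s(x))$. The step size `eps` and the sample points are arbitrary choices made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x_demo = np.linspace(-5.0, 5.0, 11)
eps = 1e-5

# Central finite difference approximates ds/dx numerically
numeric = (sigmoid(x_demo + eps) - sigmoid(x_demo - eps)) / (2.0 * eps)
# Closed form from the note: s(x) * (1 - s(x))
analytic = sigmoid(x_demo) * (1.0 - sigmoid(x_demo))

# Largest disagreement between the two; should be very small
print(np.max(np.abs(numeric - analytic)))
```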

In [None]:
def sigmoid_backward(grad):
    """
    Function to calculate backward pass for sigmoid
    Inputs: 
    grad: numpy array of any shape
    Outputs:
    out: derivative of grad
    """
    ### Type your code here ###
    
    #### End of your code ####
    return(out)

In [None]:
grad=np.asarray([[0.98, 0.96, 0.31, 0],[1, 0.58, 0.01, 0.92]])
sigmoid_backward(grad)

Notice that the gradient values which were given as input to sigmoid_backward are mostly at the extremes, i.e. near 0 or near 1, except for 0.31 and 0.58.