# Deep Neural Networks in TensorFlow

![image.png](attachment:image.png)

### Linear Model 

1. Complexity
For N inputs and K outputs, a linear model has (N+1)K parameters. In practice we often want many more parameters than that.

2. Linear models are linear
Interactions between inputs are limited to weighted sums, so a linear model cannot represent complex interactions efficiently.

3. Linear models are numerically stable
A small change in the input can never produce a large change in the output: the change is bounded by the size of the weights.

4. Linear operations are efficient
Large matrix multiplications run efficiently on GPUs.

5. Derivatives are stable
The derivative of a linear function is constant.
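
To make the parameter count in item 1 concrete, here is a minimal sketch in plain Python. The helper name `linear_param_count` and the example sizes (784 inputs and 10 outputs, the MNIST dimensions used later in this document) are illustrative assumptions.

```python
def linear_param_count(n_inputs, n_outputs):
    """Parameters of y = xW + b: an N x K weight matrix plus K biases."""
    return (n_inputs + 1) * n_outputs


print(linear_param_count(784, 10))  # 7850
```

The count grows only linearly in N and K, which is why a purely linear model has limited capacity.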

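We want to keep the parameters inside big linear functions, but we also want the model as a whole to be non-linear. Stacking linear layers alone does not help, because a composition of linear functions is still linear. So we need to introduce a non-linearity.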

### Simplest Non-Linear Function: ReLU

![image.png](attachment:image.png)

A rectified linear unit (ReLU) is a type of activation function defined as f(x) = max(0, x). The function returns 0 if x is negative; otherwise it returns x. TensorFlow provides the ReLU function as tf.nn.relu().
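
The same definition written out in plain Python, as a minimal sketch. The `relu` helper below is hypothetical and works on single numbers; it is not the TensorFlow implementation, which applies the same rule elementwise to tensors.

```python
def relu(x):
    """Rectified linear unit: 0 for negative inputs, x otherwise."""
    return max(0.0, x)


values = [-2.0, -0.5, 0.0, 1.5, 3.0]
print([relu(v) for v in values])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```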

Adding a hidden layer to a network allows it to model more complex functions, and using a non-linear activation function on that hidden layer lets it model non-linear functions. So we take a logistic classifier and insert a ReLU in the middle.

In [4]:
## Example

import tensorflow as tf

hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

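# Input: three examples with four features each
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0], [11.0, 12.0, 13.0, 14.0]])
# Hidden layer: linear transform followed by ReLU
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
# Output layer: linear transform, then softmax to turn logits into probabilities
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
softmax = tf.nn.softmax(logits)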

# Run the graph (TensorFlow 1.x session API)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(softmax))

[[  3.45562175e-02   9.65443790e-01]
 [  5.00000000e-01   5.00000000e-01]
 [  6.60679859e-07   9.99999285e-01]]
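
The printed probabilities can be checked by hand. The following is a minimal sketch that repeats the same forward pass in plain Python, using the weights and inputs from the example; the helper names `matvec`, `softmax`, and `forward` are illustrative and not part of TensorFlow.

```python
import math

hidden_w = [[0.1, 0.2, 0.4],
            [0.4, 0.6, 0.6],
            [0.5, 0.9, 0.1],
            [0.8, 0.2, 0.8]]
out_w = [[0.1, 0.6],
         [0.2, 0.1],
         [0.7, 0.9]]
features = [[1.0, 2.0, 3.0, 4.0],
            [-1.0, -2.0, -3.0, -4.0],
            [11.0, 12.0, 13.0, 14.0]]


def matvec(row, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(x * w for x, w in zip(row, col)) for col in zip(*matrix)]


def softmax(logits):
    """Softmax with the maximum subtracted first for numerical stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(row):
    # Biases are zero in the example, so they are omitted here.
    hidden = [max(0.0, h) for h in matvec(row, hidden_w)]  # ReLU
    return softmax(matvec(hidden, out_w))


probs = [forward(row) for row in features]
for p in probs:
    print(p)
```

For the first example the hidden activations are [5.6, 4.9, 5.1] and the logits are [5.11, 8.44]. For the second example every hidden pre-activation is negative, so the ReLU zeroes the whole layer, both logits are 0, and the softmax returns [0.5, 0.5].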


### Regularization to prevent overfitting

1. Early stopping:
Monitor performance on a validation set and stop training once it stops improving.
2. Regularization:
Apply constraints to the network. L2 regularization, for example, adds a penalty on large weights to the loss.
3. Dropout:
During training, randomly set a fraction of the activations (the data flowing through the network), typically half, to zero. This helps the model learn redundant representations and acts like an ensemble of networks, which improves performance.
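
Before turning to dropout in TensorFlow, the first two items can be sketched in plain Python. This is an illustration under stated assumptions, not code from the original notebook: the names `l2_penalty`, `early_stopping_epoch`, `beta`, and `patience` are hypothetical, and the penalty uses half the sum of squared weights, the same quantity `tf.nn.l2_loss` computes.

```python
def l2_penalty(weights, beta):
    """beta times half the sum of squared weights, to be added to the loss.

    `beta` is an assumed name for the regularization strength.
    """
    return beta * 0.5 * sum(w * w for w in weights)


def early_stopping_epoch(val_losses, patience):
    """Return the index of the best epoch seen before stopping.

    Training stops once `patience` (an assumed hyperparameter) consecutive
    epochs fail to improve on the best validation loss so far.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


print(l2_penalty([3.0, -4.0], beta=0.01))
# Hypothetical validation losses, one per epoch
print(early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.62, 0.64, 0.5], patience=2))  # 2
```

In the second call, training stops after epoch 4 because epochs 3 and 4 both fail to beat the loss from epoch 2, so the later value 0.5 is never reached. That is the trade-off of a small patience.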

``` python
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

```

The tf.nn.dropout() function takes in two parameters:

+ hidden_layer: the tensor to which you would like to apply dropout
+ keep_prob: the probability of keeping (i.e. not dropping) any given unit

keep_prob allows you to adjust the number of units to drop. In order to compensate for dropped units, tf.nn.dropout() multiplies all units that are kept (i.e. not dropped) by 1/keep_prob.

+ During training, a good starting value for keep_prob is 0.5.
+ During testing, use a keep_prob value of 1.0 to keep all units and maximize the power of the model.
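
To illustrate the scaling by 1/keep_prob, here is a minimal plain-Python sketch of the same idea. It is not TensorFlow's implementation; the `dropout` helper and the use of `random.Random` with a fixed seed are assumptions made for the example.

```python
import random


def dropout(activations, keep_prob, rng):
    """Keep each unit with probability keep_prob, scaled by 1 / keep_prob.

    Dropped units become 0.0.
    """
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10000

train_out = dropout(activations, keep_prob=0.5, rng=rng)
test_out = dropout(activations, keep_prob=1.0, rng=rng)

print(sum(train_out) / len(train_out))  # close to 1.0 on average
print(test_out == activations)          # True: nothing is dropped
```

With keep_prob = 0.5 each surviving unit is doubled, so the expected value of every activation stays the same as without dropout. This is why no extra rescaling is needed when keep_prob is set to 1.0 at test time.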


    

### Implementation of a Hidden Layer on MNIST Dataset

In [20]:
import tensorflow as tf
import math

In [7]:
# Load the data

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".",one_hot=True,reshape=False)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting .\train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting .\train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting .\t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting .\t10k-labels-idx1-ubyte.gz


In [21]:
# Learning Parameters

learning_rate = 0.01
training_epochs = 100
batch_size = 128
display_step = 1

n_input = 784
n_classes = 10

In [22]:
# Hidden Layer parameters

n_hidden_layer = 256  # number of hidden units, also called the width of the layer

In [23]:
# Weights and Biases

weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'output_layer': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}

biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'output_layer': tf.Variable(tf.random_normal([n_classes]))
}
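
As a quick sanity check on the shapes above, the following sketch counts the trainable parameters of this network and compares them with the (N+1)K count of a purely linear classifier on the same data. It is plain arithmetic on the sizes defined earlier; the variable names `hidden_params`, `output_params`, and `linear_params` are illustrative.

```python
n_input, n_hidden_layer, n_classes = 784, 256, 10

hidden_params = n_input * n_hidden_layer + n_hidden_layer  # weights + biases
output_params = n_hidden_layer * n_classes + n_classes     # weights + biases
linear_params = (n_input + 1) * n_classes                  # no hidden layer

print(hidden_params)                  # 200960
print(output_params)                  # 2570
print(hidden_params + output_params)  # 203530
print(linear_params)                  # 7850
```

A single hidden layer of width 256 gives the model roughly 26 times as many parameters as the linear classifier.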

In [24]:
# Input

# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

# The MNIST data is made up of 28px by 28px images with a single channel. The tf.reshape()
# call above flattens each 28x28x1 image in x into a row vector of 784 values.
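
The flattening can be pictured with nested Python lists. This is a minimal sketch with a made-up image whose pixel values encode their own position; it mimics the row-major order that `tf.reshape` uses but does not call TensorFlow.

```python
# A hypothetical 28 x 28 x 1 image: the pixel at (row, col) holds row * 28 + col.
image = [[[float(r * 28 + c)] for c in range(28)] for r in range(28)]

# Row-major flattening: walk rows first, then columns, then channels.
flat = [channel for row in image for pixel in row for channel in pixel]

print(len(flat))                     # 784
print(flat[0], flat[28], flat[783])  # 0.0 28.0 783.0
```

Index 28 of the flat vector is the first pixel of the second row, which shows that each image row is laid out one after another.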

In [25]:
# Multilayer Perceptron

# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['output_layer']), biases['output_layer'])

In [26]:
# Define loss and optimizer

cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=learning_rate).minimize(cost)
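
To show what the cost above measures, here is a minimal plain-Python sketch of softmax cross-entropy averaged over a batch, followed by a single gradient-descent update. The batch values, the weight, and the gradient are made up for illustration, and the helper names are not TensorFlow functions.

```python
import math


def softmax(logits):
    """Softmax with the maximum subtracted first for numerical stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot):
    """-sum(label * log(softmax(logits))) for a single example."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


# Hypothetical batch of two examples with three classes
batch_logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

losses = [cross_entropy(z, t) for z, t in zip(batch_logits, batch_labels)]
cost = sum(losses) / len(losses)  # the batch mean, as tf.reduce_mean computes
print(losses, cost)

# One plain gradient-descent step on a single weight: w <- w - lr * dC/dw
learning_rate = 0.01
w, grad = 0.5, 2.0  # hypothetical weight and gradient
w = w - learning_rate * grad
print(w)
```

The second example has equal logits, so the model assigns probability 1/3 to each class and the loss is log(3), about 1.0986. The first example puts most of the probability on the correct class and has a lower loss.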
    

In [30]:
# Session

# Initializing the variables
init = tf.global_variables_initializer()


# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run the optimization op (backprop) on this batch
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
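
To get a feel for how much work this loop does, the sketch below repeats the batch arithmetic in plain Python. The training-set size of 55,000 is an assumption about what `mnist.train.num_examples` returns for this loader; the other values come from the learning parameters above.

```python
num_examples = 55000  # assumed size of mnist.train for this loader
batch_size = 128
training_epochs = 100

# Same expression as in the training loop: the fractional batch is dropped.
total_batch = int(num_examples / batch_size)
updates = total_batch * training_epochs

print(total_batch)  # 429
print(updates)      # 42900
```

Under that assumption, each epoch performs 429 weight updates and the full run performs 42,900.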
            
