## This Jupyter notebook is intended for my own use as a quick refresher / class notes on the Neural Networks course material from Udacity (not to reproduce any of the material as such)

This guide DOES NOT explain the fundamentals in depth, and it may appear to follow an arbitrary sequence, as it is aligned to the class labs. One should go through the proper course material before using this guide. I highly recommend the Udacity nanodegree course(s) as the right medium to learn this subject.

Content Credit: Udacity

## Hello World!

In [1]:
import tensorflow as tf

# Create TensorFlow object called hello_constant
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)

b'Hello World!'


## Let's understand what is going under the hood

**Tensor**

In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated in an object called a tensor. In the case of hello_constant = tf.constant('Hello World!'), hello_constant is a 0-dimensional string tensor, but tensors come in a variety of sizes as shown below:

1. A is a 0-dimensional int32 tensor <br>
A = tf.constant(1234) 
2. B is a 1-dimensional int32 tensor <br>
B = tf.constant([123,456,789]) 
3. C is a 2-dimensional int32 tensor <br>
C = tf.constant([ [123,456,789], [222,333,444] ])

tf.constant() is one of many TensorFlow operations you will use in this lesson. The tensor returned by tf.constant() is called a constant tensor, because the value of the tensor never changes.
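
As a rough illustration of what "0-, 1-, and 2-dimensional" means for A, B, and C above, the sketch below counts the nesting depth of plain Python values. It does not use TensorFlow; `nesting_depth` is a hypothetical helper written only for these notes, and it assumes non-empty, regularly shaped lists.

```python
# Illustrative sketch in plain Python (not TensorFlow): for a tensor built
# from nested lists, the number of dimensions equals the nesting depth.
def nesting_depth(value):
    """Return how many levels of list nesting `value` has (hypothetical helper)."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]  # assumes non-empty, regularly shaped lists
    return depth


A = 1234
B = [123, 456, 789]
C = [[123, 456, 789], [222, 333, 444]]

for name, value in [("A", A), ("B", B), ("C", C)]:
    print(name, "has", nesting_depth(value), "dimension(s)")
```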

**Session**

TensorFlow’s API is built around the idea of a computational graph, a way of visualizing a mathematical process which you learned about in the MiniFlow lesson. Let’s take the TensorFlow code you ran and turn that into a graph:

<img src="images/session.png" alt="Drawing" style="width: 500px;"/>

A "TensorFlow Session", as shown above, is an environment for running a graph. The session is in charge of allocating the operations to GPU(s) and/or CPU(s), including remote machines. Let’s see how you use it.

>with tf.Session() as sess:<br>
>>output = sess.run(hello_constant)<br>
>>print(output)

The code has already created the tensor, hello_constant, from the previous lines. The next step is to evaluate the tensor in a session.

The code creates a session instance, sess, using tf.Session. The sess.run() function then evaluates the tensor and returns the results.

After you run the above, you will see the following printed out:

>b'Hello World!'


What if you want to use a non-constant? This is where tf.placeholder() and feed_dict come into play.

**tf.placeholder()**

Sadly you can’t just set x to your dataset and put it in TensorFlow, because over time you'll want your TensorFlow model to take in different datasets with different parameters. You need tf.placeholder()!


tf.placeholder() returns a tensor that gets its value from data passed to the tf.Session.run() function, allowing you to set the input right before the session runs.

In [3]:
x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})
    print(output)

Hello World


Use the feed_dict parameter in tf.Session.run() to set the placeholder tensor. The above example shows the tensor x being set to the string "Hello World". It's also possible to set more than one tensor using feed_dict as shown below.

In [5]:
x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)

with tf.Session() as sess:
    output = sess.run(y, feed_dict={x: 'Test String', y: 123, z: 45.67})
    print (output)

123


Note: If the data passed to the feed_dict doesn’t match the tensor type and can’t be cast into the tensor type, you’ll get the error “ValueError: invalid literal for...”.

**Let's look at Arithmetic Functions**

In [21]:
x = tf.add(5, 2)  # 7
y = tf.subtract(10, 4)  # 6
z = tf.multiply(2, 5)  # 10
w = tf.divide(6.0, 2.0)  # 3.0 (separate name so the tf.add result in x is not overwritten)

**Converting types**
It may be necessary to convert between types to make certain operators work together. For example, if you tried the following, it would fail with an exception:


tf.subtract(tf.constant(2.0),tf.constant(1))  

# Fails with ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int32:


That's because the constant 1 is an integer but the constant 2.0 is a floating point value and subtract expects them to match.


In cases like these, you can either make sure your data is all of the same type, or you can cast a value to another type. In this case, converting the 2.0 to an integer before subtracting, like so, will give the correct result:



In [8]:
tf.subtract(tf.cast(tf.constant(2.0), tf.int32), tf.constant(1))   # 1

<tf.Tensor 'Sub_1:0' shape=() dtype=int32>

In [18]:
# Another Example 

import tensorflow as tf

x = tf.constant(10)
y = tf.constant(2)
s = tf.divide(x,y)
#z = tf.subtract(s,1)  This works as well but better way is
z = tf.subtract(s, tf.cast(tf.constant(1), tf.float64))

with tf.Session() as sess:
    output = sess.run(z)
    print(output)

4.0


**Linear functions in TensorFlow**

The most common operation in neural networks is calculating the linear combination of inputs, weights, and biases. As a reminder, we can write the output of the linear operation as

y = xW + b

Here, W is a matrix of the weights connecting two layers. The output y, the input x, and the biases b are all vectors.
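
To make the shapes in y = xW + b concrete, here is a minimal plain-Python sketch for a single input vector. It is not TensorFlow; `linear_combination` and the numbers are made up purely for illustration.

```python
# Minimal sketch of y = xW + b with plain Python lists (illustrative values).
def linear_combination(x, W, b):
    """x has n_features entries, W is n_features x n_labels, b has n_labels entries."""
    n_labels = len(b)
    return [
        sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        for j in range(n_labels)
    ]


x = [1.0, 2.0]            # 2 input features
W = [[1.0, 0.0, -1.0],    # weights, shape (2, 3)
     [0.5, 2.0, 1.0]]
b = [0.0, 1.0, 0.5]       # one bias per output

y = linear_combination(x, W, b)
print(y)
```

Each output is one column of W dotted with x, plus the matching bias, which is why W needs one row per input feature and one column per label.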


**Weights and Bias in TensorFlow**
The goal of training a neural network is to modify weights and biases to best predict the labels. In order to use weights and bias, you'll need a Tensor that can be modified. This leaves out tf.placeholder() and tf.constant(), since those Tensors can't be modified. This is where tf.Variable class comes in.

tf.Variable()


x = tf.Variable(5)

The tf.Variable class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually. You'll use the tf.global_variables_initializer() function to initialize the state of all the Variable tensors.


**Initialization**

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)


The tf.global_variables_initializer() call returns an operation that will initialize all TensorFlow variables from the graph. You call the operation using a session to initialize all the variables as shown above. Using the tf.Variable class allows us to change the weights and bias, but an initial value needs to be chosen.


Initializing the weights with random numbers from a normal distribution is good practice. Randomizing the weights helps keep the model from becoming stuck in the same place every time you train it. You'll learn more about this in the next lesson, when you study gradient descent.


Similarly, choosing weights from a normal distribution prevents any one weight from overwhelming other weights. You'll use the tf.truncated_normal() function to generate random numbers from a normal distribution.

**tf.truncated_normal()**

    n_features = 120
    n_labels = 5
    weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))


The tf.truncated_normal() function returns a tensor with random values from a normal distribution whose magnitude is no more than 2 standard deviations from the mean.
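
The "no more than 2 standard deviations" behaviour can be sketched in plain Python by redrawing any sample that falls outside the range. This is only an illustration of the idea, not TensorFlow's implementation; the helper name, the seed, and the small shape are assumptions for the example.

```python
import random


def truncated_normal_sample(mean=0.0, stddev=1.0, rng=random):
    """Draw one normal sample, redrawing until it is within 2 stddev of the mean."""
    while True:
        sample = rng.gauss(mean, stddev)
        if abs(sample - mean) <= 2.0 * stddev:
            return sample


rng = random.Random(0)        # fixed seed only so the sketch is repeatable
n_features, n_labels = 4, 3   # tiny shape for illustration
weights = [
    [truncated_normal_sample(rng=rng) for _ in range(n_labels)]
    for _ in range(n_features)
]
for row in weights:
    print(row)
```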


Since the weights are already helping prevent the model from getting stuck, you don't need to randomize the bias. Let's use the simplest solution, setting the bias to 0.


**tf.zeros()**

    n_labels = 5
    bias = tf.Variable(tf.zeros(n_labels))


The tf.zeros() function returns a tensor with all zeros.

**Linear Classifier Quiz**

<img src="images/mnist-012.png" alt="Drawing" style="width: 500px;"/>


You'll be classifying the handwritten numbers 0, 1, and 2 from the MNIST dataset using TensorFlow. The above is a small sample of the data you'll be training on. Notice how some of the 1s are written with a serif at the top and at different angles. The similarities and differences will play a part in shaping the weights of the model.

<img src="images/weights-0-1-2.png" alt="Drawing" style="width: 500px;"/>

The images above are trained weights for each label (0, 1, and 2). The weights display the unique properties of each digit they have found. Complete this quiz to train your own weights using the MNIST dataset.



In [31]:
import tensorflow as tf

def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    # TODO: Return weights
    weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
    return weights


def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    # TODO: Return biases
    return tf.Variable(tf.zeros(n_labels))

def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    # TODO: Linear Function (xW + b)
    return tf.add(tf.matmul(input,w),b)

In [35]:
import tensorflow as tf
# Sandbox Solution
from tensorflow.examples.tutorials.mnist import input_data
#from quiz import get_weights, get_biases, linear


def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []

    mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

    # In order to make quizzes run faster, we're only looking at 10000 images
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):

        # Add features and labels only if the sample belongs to one of the first <n> labels
        # In simple words, we keep the data for the first n digits only:
        # if n_labels is 3, we will get data for the digits 0, 1, 2
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels


# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels (i.e. first 3 numbers or classes)
n_labels = 3

# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

# Linear Function xW + b
logits = linear(features, w, b)

# Training data
train_features, train_labels = mnist_features_labels(n_labels)

with tf.Session() as session:
    ## Initialize variables: key step, otherwise we get the error
    ## "Attempting to use uninitialized value xxxxx..."
    session.run(tf.global_variables_initializer())

    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # You'll learn more about this in future lessons.
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), axis=1)

    # Training loss
    # You'll learn more about this in future lessons.
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed
    # You'll learn more about this in future lessons.
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    # You'll learn more about this in future lessons.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {}'.format(l))

Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz
Loss: 8.99665641784668


### A lot happened in the code above; let's study the key aspects

**TensorFlow Softmax**

The softmax function squashes its inputs, typically called logits or logit scores, to be between 0 and 1 and also normalizes the outputs such that they all sum to 1. This means the output of the softmax function is equivalent to a categorical probability distribution. It's the perfect function to use as the output activation for a network predicting multiple classes.

<img src="images/softmax-input-output.png" alt="Drawing" style="width: 500px;"/>
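
For reference, the softmax formula itself (exponentiate each logit, then divide by the sum of the exponentials) can be sketched in a few lines of plain Python. This is an illustration of the math, not TensorFlow's implementation; subtracting the maximum logit first is a common numerical-stability step that I am adding here and that does not change the result.

```python
import math


def softmax(logits):
    """Return exp(l_i) / sum_j exp(l_j) for a list of logits."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print("Softmax output:", probs)
print("Sum of outputs:", sum(probs))
```

For the logits [2.0, 1.0, 0.1] this gives roughly 0.659, 0.242 and 0.099, which add up to 1.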

TensorFlow Softmax : We're using TensorFlow to build neural networks and, appropriately, there's a function for calculating softmax.

x = tf.nn.softmax([2.0, 1.0, 0.2])

In [53]:
# Quiz Solution
import tensorflow as tf


def run():
    output = None
    logit_data = [2.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)

    softmax = tf.nn.softmax(logits)

    with tf.Session() as sess:
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output

o = run()

print ("Softmax Output: ", o)

total = 0
for i in range(len(o)):
    total += o[i]

print ("Total of Softmax Outputs from Tensorflow, it should always add to 1 -- : ", total)

Softmax Output:  [ 0.65900117  0.24243298  0.09856589]
Total of Softmax Outputs from Tensorflow, it should always add to 1 -- :  1.0000000447


## Remember Cross Entropy from the NN Study?

If not, look here: https://github.com/anshoomehra/Deep-Learning/blob/master/Neural%20Networks%20in%20Nut%20Shell.ipynb

In summary: we take the natural log of the probabilities so that we can add terms instead of multiplying many small probabilities together, which avoids numerical errors. The formula is shown below. Something to keep in mind: **a higher probability for the correct label results in a lower cross entropy, i.e. higher cross-entropy values mean a higher error, and vice versa.**

<img src="images/cross-entropy-diagram.png" alt="Drawing" style="width: 500px;"/>

To create a cross entropy function in TensorFlow, you'll need to use two new functions:

    tf.reduce_sum()
    tf.log()

**Reduce Sum**

    x = tf.reduce_sum([1, 2, 3, 4, 5])  # 15

The tf.reduce_sum() function takes an array of numbers and sums them together.

**Natural Log**

    x = tf.log(100.0)  # 4.60517

This function does exactly what you would expect it to do. tf.log() takes the natural log of a number.

So let's code an example doing the same.

In [61]:
import tensorflow as tf

softmax_data = [0.7, 0.2, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)

# Multiply y [the one-hot labels] by natural_log(y_hat) [the softmax output]
# Sum the outputs of the above computation and negate to get the cross entropy, which tells us the overall error
ce = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))

with tf.Session() as sess:
    output = sess.run(ce, feed_dict={softmax:softmax_data,one_hot:one_hot_data})
    print (output)

0.356675
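
As a cross-check of the value above, the same cross entropy can be computed with plain Python and the math module. This is just a sketch of the formula; skipping the terms whose label is 0 is my own choice here, so that a predicted probability of exactly 0 for a wrong class never reaches math.log.

```python
import math


def cross_entropy(one_hot, probabilities):
    """Return -sum(y_i * ln(p_i)), skipping terms whose label y_i is 0."""
    return -sum(
        y * math.log(p)
        for y, p in zip(one_hot, probabilities)
        if y != 0
    )


one_hot_data = [1.0, 0.0, 0.0]
ce = cross_entropy(one_hot_data, [0.7, 0.2, 0.1])
ce_confident = cross_entropy(one_hot_data, [0.9, 0.05, 0.05])

print("Cross entropy with p(correct) = 0.7:", ce)
print("Cross entropy with p(correct) = 0.9:", ce_confident)
```

The second call illustrates the rule of thumb from the summary: a higher probability for the correct label gives a lower cross entropy.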


**Mini-batching**

Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.

Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.

It's also quite useful combined with SGD. The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you're performing SGD with each batch.
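
The "shuffle at the start of each epoch, then create the mini-batches" idea can be sketched in plain Python as below. The helper name `shuffled_batches`, the toy data, and the seed are assumptions for illustration; the point is that features and labels are shuffled with the same index order so they stay aligned.

```python
import random


def shuffled_batches(features, labels, batch_size, rng):
    """Shuffle sample indices, then slice them into mini-batches."""
    indices = list(range(len(features)))
    rng.shuffle(indices)
    result = []
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        result.append(([features[i] for i in chunk], [labels[i] for i in chunk]))
    return result


# Toy data: 10 samples, the label is the parity of the first feature
features = [[i, i + 0.5] for i in range(10)]
labels = [i % 2 for i in range(10)]

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
for epoch in range(2):
    epoch_batches = shuffled_batches(features, labels, 4, rng)
    print("epoch", epoch, "batch sizes:", [len(f) for f, _ in epoch_batches])
```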

Let's look at the MNIST dataset with weights and a bias to see if your machine can handle it.

In [99]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

print ("Train Feature Shape: ", train_features.shape)

print ("Train Label Shape: ", train_labels.shape)

print ("Weights Shape: ", weights.shape)

print ("Bias Shape: ", bias.shape)

print ("Total Memory Needed in MB (float is 32 bits which is 4 bytes):",\
    (float((train_features.shape[0] * train_features.shape[1] * 4)) +\
    float(train_labels.shape[0] * train_labels.shape[1] * 4) +\
    float(str(weights.shape[0] * weights.shape[1] * 4)) +\
    float(str((bias.shape[0] * 4))))/1000000, "MB" )



Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz
Train Feature Shape:  (55000, 784)
Train Label Shape:  (55000, 10)
Weights Shape:  (784, 10)
Bias Shape:  (10,)
Total Memory Needed in MB (float is 32 bits which is 4 bytes): 174.7114 MB


The total memory space required for the inputs, weights and bias is around 174 megabytes, which isn't that much memory. You could train this whole dataset on most CPUs and GPUs.
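
The arithmetic behind that figure can be spelled out without loading MNIST, using only the shapes printed above. `float32_bytes` is a hypothetical helper for these notes; it assumes 4 bytes per float32 value and 1 MB = 1,000,000 bytes, as in the cell above.

```python
BYTES_PER_FLOAT32 = 4


def float32_bytes(*shape):
    """Bytes needed for a float32 array of the given shape (hypothetical helper)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_FLOAT32


total_bytes = (
    float32_bytes(55000, 784)   # train features
    + float32_bytes(55000, 10)  # train labels
    + float32_bytes(784, 10)    # weights
    + float32_bytes(10)         # bias
)
print(total_bytes / 1000000, "MB")
```

Almost all of the memory goes to the training features; the weights and bias are tiny by comparison.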

But the larger datasets that you'll use in the future are measured in gigabytes or more. It's possible to purchase more memory, but it's expensive. A Titan X GPU with 12 GB of memory costs over $1,000.

Instead, in order to run large models on your machine, you'll learn how to use mini-batching.

Let's look at how you implement mini-batching in TensorFlow.

**TensorFlow Mini-batching**
In order to use mini-batching, you must first divide your data into batches.

Unfortunately, it's sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you'd like to create batches of 128 samples each from a dataset of 1000 samples. Since 128 does not evenly divide into 1000, you'd wind up with 7 batches of 128 samples, and 1 batch of 104 samples. (7*128 + 1*104 = 1000)
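
A quick way to check this kind of arithmetic for any dataset size is sketched below in plain Python; `batch_sizes` is a hypothetical helper written only for these notes.

```python
import math


def batch_sizes(n_samples, batch_size):
    """Return the size of every batch, including a smaller final batch if needed."""
    n_batches = math.ceil(n_samples / batch_size)
    return [
        min(batch_size, n_samples - i * batch_size)
        for i in range(n_batches)
    ]


sizes = batch_sizes(1000, 128)
print("Number of batches:", len(sizes))
print("Batch sizes:", sizes)
```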

In that case, the size of the batches would vary, so you need to take advantage of TensorFlow's tf.placeholder() function to receive the varying batch sizes.

Continuing the example, if each sample had n_input = 784 features and n_classes = 10 possible labels, the dimensions for features would be [None, n_input] and labels would be [None, n_classes].

    # Features and Labels
    features = tf.placeholder(tf.float32, [None, n_input])
    labels = tf.placeholder(tf.float32, [None, n_classes])

What does None do here?

The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.

Going back to our earlier example, this setup allows you to feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.

In [100]:
import math
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    # TODO: Implement batching
    #print (len(features), len(labels))
    samples = len(features) 
    output_batches = []
    
    for start_i in range(0, samples , batch_size):
        end_i = start_i + batch_size
        output_batches.append([features[start_i:end_i], labels[start_i:end_i]])
        
        #batch = [features[start_i:end_i], labels[start_i:end_i]]
        #outout_batches.append(batch)
    return output_batches

In [101]:
#from quiz import batches
from pprint import pprint

# 4 Samples of features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]
# 4 Samples of labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]

# PPrint prints data structures like 2d arrays, so they are easier to read
pprint(batches(3, example_features, example_labels))

[[[['F11', 'F12', 'F13', 'F14'],
   ['F21', 'F22', 'F23', 'F24'],
   ['F31', 'F32', 'F33', 'F34']],
  [['L11', 'L12'], ['L21', 'L22'], ['L31', 'L32']]],
 [[['F41', 'F42', 'F43', 'F44']], [['L41', 'L42']]]]


Let's use mini-batching to feed batches of MNIST features and labels into a linear model.

Set the batch size and run the optimizer over all the batches with the batches function. The recommended batch size is 128. If you have memory restrictions, feel free to make it smaller.



In [102]:
import math
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    output_batches = []

    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)

    return output_batches


In [123]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
#from helper import batches

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# TODO: Set batch size
batch_size = 128
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    i = 0
    # TODO: Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        if i % 4 ==0:
            print ("Processing Batch :", i+1)
            
        i += 1
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz
Processing Batch : 1
Processing Batch : 5
Processing Batch : 9
Processing Batch : 13
Processing Batch : 17
Processing Batch : 21
Processing Batch : 25
Processing Batch : 29
Processing Batch : 33
Processing Batch : 37
Processing Batch : 41
Processing Batch : 45
Processing Batch : 49
Processing Batch : 53
Processing Batch : 57
Processing Batch : 61
Processing Batch : 65
Processing Batch : 69
Processing Batch : 73
Processing Batch : 77
Processing Batch : 81
Processing Batch : 85
Processing Batch : 89
Processing Batch : 93
Processing Batch : 97
Processing Batch : 101
Processing Batch : 105
Processing Batch : 109
Processing Batch : 113
Processing Batch : 117
Processing Batch : 121
Processing Batch : 125
Processing Batch : 129
Processing Batch : 133
Processing B

**Epochs**
An epoch is a single forward and backward pass of the whole dataset. This is used to increase the accuracy of the model without requiring more data. This section will cover epochs in TensorFlow and how to choose the right number of epochs.
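
To see why repeating passes over the same data helps, here is a toy plain-Python sketch that fits a single weight w to the relationship y = 2x with mini-batch gradient descent. The data, the learning rate, and the `train` helper are all made up for illustration and have nothing to do with MNIST.

```python
def train(epochs, learn_rate=0.01, batch_size=2):
    """Fit y = w * x by mini-batch gradient descent; return w and the cost per epoch."""
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x for x in xs]  # target relationship: y = 2x
    w = 0.0
    costs = []
    for _ in range(epochs):
        # One epoch = one pass over every mini-batch of the dataset
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # Gradient of the mean squared error with respect to w
            grad = sum(2 * x * (w * x - y) for x, y in zip(bx, by)) / len(bx)
            w -= learn_rate * grad
        cost = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        costs.append(cost)
    return w, costs


w, costs = train(epochs=20)
print("Cost after epoch 1:", costs[0])
print("Cost after epoch 20:", costs[-1])
print("Learned w:", w)
```

The cost keeps dropping with every extra pass even though no new data is added, which is the effect epochs are meant to exploit.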

The following TensorFlow code trains a model using 100 epochs (the value assigned to epochs in the code).

In [125]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
#from helper import batches  # Helper function created in Mini-batching section


def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """
    Print cost and validation accuracy of an epoch
    """
    current_cost = sess.run(
        cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 100
learn_rate = 0.001

train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):

        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz
Epoch: 0    - Cost: 11.6     Valid Accuracy: 0.101
Epoch: 1    - Cost: 10.4     Valid Accuracy: 0.117
Epoch: 2    - Cost: 9.64     Valid Accuracy: 0.134
Epoch: 3    - Cost: 9.03     Valid Accuracy: 0.147
Epoch: 4    - Cost: 8.52     Valid Accuracy: 0.164
Epoch: 5    - Cost: 8.09     Valid Accuracy: 0.178
Epoch: 6    - Cost: 7.71     Valid Accuracy: 0.191
Epoch: 7    - Cost: 7.36     Valid Accuracy: 0.21 
Epoch: 8    - Cost: 7.04     Valid Accuracy: 0.227
Epoch: 9    - Cost: 6.74     Valid Accuracy: 0.244
Epoch: 10   - Cost: 6.46     Valid Accuracy: 0.26 
Epoch: 11   - Cost: 6.19     Valid Accuracy: 0.277
Epoch: 12   - Cost: 5.94     Valid Accuracy: 0.292
Epoch: 13   - Cost: 5.7      Valid Accuracy: 0.307
Epoch: 14   - Cost: 5.48     Valid Accuracy: 0.323
E

## Detailed Project in Tensorflow

https://github.com/anshoomehra/udacity-deep-learning/tree/master/intro-to-tensorflow

## Multilayer Neural Networks with Tensorflow

We saw a good number of examples of single-layer networks above. Below we will start dealing with more complexity by adding more layers to our network; after all, that is what defines Deep Learning.

**ReLUs**

We studied the ReLU activation function during the NN class. We will use the same here as the activation function for the hidden layer.

ReLUs are like switches: by turning off all negative values (outputting 0 for them), they help our network handle non-linearity and, subsequently, more complex functions.

So how do we achieve this in TensorFlow?

**TensorFlow ReLUs**
TensorFlow provides the ReLU function as tf.nn.relu(), as shown below.

    # Hidden Layer with ReLU activation function
    hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
    hidden_layer = tf.nn.relu(hidden_layer)

    output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)


The above code applies the tf.nn.relu() function to the hidden_layer, effectively turning off any negative values and acting like an on/off switch. Adding additional layers, like the output layer, after an activation function turns the model into a nonlinear function. This nonlinearity allows the network to solve more complex problems.

<img src="images/relu-network.png" alt="Drawing" style="width: 500px;"/>

We've seen the linear function tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer']) before, also known as xW + b. Combining linear functions together using a ReLU will give you a two-layer network.

In [128]:
### Simple Example of using ReLU Activation
import tensorflow as tf

output = None
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# TODO: Print session results
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits))

[[  5.11000013   8.44000053]
 [  0.           0.        ]
 [ 24.01000214  38.23999786]]
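
The result above can be cross-checked by hand with plain Python, which also shows what ReLU does to the second input row. This is only a sketch: `relu` and `matmul_row` are hypothetical helpers for these notes, and the biases are left out because they are all zeros in the example.

```python
def relu(values):
    """Replace every negative value with 0 and keep the rest unchanged."""
    return [max(0.0, v) for v in values]


def matmul_row(row, matrix):
    """Multiply one row vector by a matrix given as a list of rows."""
    return [
        sum(row[i] * matrix[i][j] for i in range(len(row)))
        for j in range(len(matrix[0]))
    ]


hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]
features = [
    [1.0, 2.0, 3.0, 4.0],
    [-1.0, -2.0, -3.0, -4.0],
    [11.0, 12.0, 13.0, 14.0]]

logits = []
for row in features:
    hidden = relu(matmul_row(row, hidden_layer_weights))  # hidden layer + ReLU
    logits.append(matmul_row(hidden, out_weights))        # output layer

for row in logits:
    print([round(v, 2) for v in row])
```

The second input row produces only negative hidden values, so ReLU zeroes all of them and both logits come out as 0.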


## Deep Neural Network in TensorFlow

You've seen how to build a logistic classifier using TensorFlow. Now you're going to see how to use the logistic classifier to build a deep neural network.

In [129]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

n_hidden_layer = 256  # number of units (features) in the hidden layer

# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.matmul(layer_1, weights['out']) + biases['out']

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop); the cost is evaluated separately below
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        # Display logs per epoch step
        if epoch % display_step == 0:
            c = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Epoch:", '%04d' % (epoch+1), "cost=",
                  "{:.9f}".format(c))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # Decrease test_size if you don't have enough memory
    test_size = 256
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:test_size], y: mnist.test.labels[:test_size]}))


Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting ./train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting ./train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting ./t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting ./t10k-labels-idx1-ubyte.gz
Epoch: 0001 cost= 47.498840332
Epoch: 0002 cost= 28.323375702
Epoch: 0003 cost= 22.000427246
Epoch: 0004 cost= 14.770172119
Epoch: 0005 cost= 13.637517929
Epoch: 0006 cost= 19.591865540
Epoch: 0007 cost= 13.236281395
Epoch: 0008 cost= 12.072938919
Epoch: 0009 cost= 9.566403389
Epoch: 0010 cost= 8.450783730
Epoch: 0011 cost= 6.087422371
Epoch: 0012 cost= 6.903230190
Epoch: 0013 cost= 7.663883686
Epoch: 0014 cost= 8.361234665
Epoch: 0015 cost= 5.720129967
Epoch: 0016 cost= 5.607900620
Epoch: 0017 cost= 9.265755653
Epoch: 0018 cost= 3.941402674
Epoch: 0019 cost= 6.534650326
Epoch: 
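For reference, the cost and accuracy ops in the cell above can be sketched in NumPy. This is an illustration of the math (softmax followed by cross-entropy, averaged over the batch, and an argmax comparison for accuracy), not TensorFlow's actual implementation. The function name softmax_cross_entropy and the sample logits and labels are made up.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Shift by the row max for numerical stability (does not change the result)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # One loss value per example: -sum(labels * log(softmax(logits)))
    return -(labels * log_probs).sum(axis=1)

# Made-up logits and one-hot labels for 3 examples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3],
                   [1.2, 0.2, 0.4]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

# Mean loss over the batch, analogous to tf.reduce_mean(...) in the cell above
cost = softmax_cross_entropy(logits, labels).mean()

# Fraction of rows where the predicted class matches the labeled class
correct = logits.argmax(axis=1) == labels.argmax(axis=1)
accuracy = correct.astype(np.float32).mean()

print(cost, accuracy)
```

In this toy batch the first two rows are classified correctly and the third is not, so the accuracy is 2/3.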

**Save and Restore TensorFlow Models**

Training a model can take hours. But once you close your TensorFlow session, you lose all the trained weights and biases. If you were to reuse the model in the future, you would have to train it all over again!

Fortunately, TensorFlow gives you the ability to save your progress using a class called tf.train.Saver. This class provides the functionality to save any tf.Variable to your file system.

**Saving Variables**

Let's start with a simple example of saving weights and bias Tensors. For the first example you'll just save two variables. Later examples will save all the weights in a practical model.

In [133]:
import tensorflow as tf

# The file path to save the data
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias
w = tf.Variable(tf.truncated_normal([2, 3]))
b = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(w))
    print('Bias:')
    print(sess.run(b))

    # Save the model
    saver.save(sess, save_file)

Weights:
[[-0.98569649  0.68323284 -1.91659582]
 [ 0.61796433  1.05859601  0.39688444]]
Bias:
[ 0.62646139 -0.83633977 -0.67695129]


The Tensors weights and bias are set to random values using the tf.truncated_normal() function. The values are then saved to the save_file location, "model.ckpt", using the tf.train.Saver.save() function. (The ".ckpt" extension stands for "checkpoint".)

If you're using TensorFlow 0.11.0RC1 or newer, a file called "model.ckpt.meta" will also be created. This file contains the TensorFlow graph.

**Loading Variables**

Now that the Tensor Variables are saved, let's load them back into a new model.

In [134]:
# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
w = tf.Variable(tf.truncated_normal([2, 3]))
b = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weight:')
    print(sess.run(w))
    print('Bias:')
    print(sess.run(b))

INFO:tensorflow:Restoring parameters from ./model.ckpt
Weight:
[[ 0.15486424 -0.3900055   1.08488607]
 [-1.51948488  0.14477894 -0.08510909]]
Bias:
[ 0.68925333  0.22952777  0.39401624]


You'll notice you still need to create the weights and bias Tensors in Python. The tf.train.Saver.restore() function loads the saved data into weights and bias.

Since tf.train.Saver.restore() sets all the TensorFlow Variables, you don't need to call tf.global_variables_initializer().

**Save a Trained Model**

Let's see how to train a model and save its weights.

First start with a model:

In [135]:
# Remove previous Tensors and Operations
tf.reset_default_graph()

from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('.', one_hot=True)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz


In [136]:
import math

save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))

    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')

Epoch 0   - Validation Accuracy: 0.1242000013589859
Epoch 10  - Validation Accuracy: 0.24860000610351562
Epoch 20  - Validation Accuracy: 0.37940001487731934
Epoch 30  - Validation Accuracy: 0.48339998722076416
Epoch 40  - Validation Accuracy: 0.5432000160217285
Epoch 50  - Validation Accuracy: 0.5896000266075134
Epoch 60  - Validation Accuracy: 0.6223999857902527
Epoch 70  - Validation Accuracy: 0.6484000086784363
Epoch 80  - Validation Accuracy: 0.6732000112533569
Epoch 90  - Validation Accuracy: 0.6941999793052673
Trained Model Saved.


**Load a Trained Model**

Let's load the weights and bias from the saved checkpoint file, then check the test accuracy.

In [137]:
saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})

print('Test Accuracy: {}'.format(test_accuracy))

INFO:tensorflow:Restoring parameters from ./train_model.ckpt
Test Accuracy: 0.7236999869346619


**Loading the Weights and Biases into a New Model**

Sometimes you might want to adjust, or "fine-tune", a model that you have already trained and saved.

However, loading saved Variables directly into a modified model can generate errors. Let's go over how to avoid these problems.

**Naming Error**

TensorFlow uses a string identifier for Tensors and Operations called name. If a name is not given, TensorFlow will create one automatically. TensorFlow will give the first node the name <Type>, and then give the name <Type>_<number> for the subsequent nodes. Let's see how this can affect loading a model with a different order of weights and bias:

In [138]:
import tensorflow as tf

# Remove the previous weights and bias
tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - ERROR
    saver.restore(sess, save_file)

Save Weights: Variable:0
Save Bias: Variable_1:0
Load Weights: Variable_1:0
Load Bias: Variable:0
INFO:tensorflow:Restoring parameters from model.ckpt


InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2,3] rhs shape= [3]
	 [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Variable_1, save/RestoreV2_1)]]

Caused by op 'save/Assign_1', defined at:
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 478, in start
    self.io_loop.start()
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
    if self.run_code(code, result):
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-138-32cce566247b>", line 29, in <module>
    saver = tf.train.Saver()
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1056, in __init__
    self.build()
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1086, in build
    restore_sequentially=self._restore_sequentially)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 691, in build
    restore_sequentially, reshape)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 155, in restore
    self.op.get_shape().is_fully_defined())
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 270, in assign
    validate_shape=validate_shape)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
    use_locking=use_locking, name=name)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/anmehra/Desktop/Anconda_Py2.7/anaconda/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [2,3] rhs shape= [3]
	 [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Variable_1, save/RestoreV2_1)]]


You'll notice that the name properties for weights and bias are different than when you saved the model. This is why the code produces the "Assign requires shapes of both tensors to match" error. The code saver.restore(sess, save_file) is trying to load weight data into bias and bias data into weights.

Instead of letting TensorFlow set the name property, let's set it manually:

In [139]:
import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - No Error
    saver.restore(sess, save_file)

print('Loaded Weights and Bias successfully.')

Save Weights: weights_0:0
Save Bias: bias_0:0
Load Weights: weights_0:0
Load Bias: bias_0:0
INFO:tensorflow:Restoring parameters from model.ckpt
Loaded Weights and Bias successfully.
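The two runs above can be summarized with a toy model in plain Python. It is not how TensorFlow is implemented internally; it only assumes that a checkpoint behaves like a mapping from variable name to saved data, and that default names follow the Variable, Variable_1 pattern seen in the output. The helper names auto_names and mismatches are hypothetical.

```python
# Toy model of name-keyed checkpoints (plain Python, not TensorFlow internals)

def auto_names(shapes_in_creation_order):
    # Mimics the default naming seen above: Variable, Variable_1, Variable_2, ...
    names = {}
    for i, shape in enumerate(shapes_in_creation_order):
        name = 'Variable' if i == 0 else 'Variable_{}'.format(i)
        names[name] = shape
    return names

def mismatches(checkpoint, graph):
    # Names whose saved shape differs from the shape the new graph expects
    return sorted(name for name in graph if checkpoint.get(name) != graph[name])

weights_shape, bias_shape = (2, 3), (3,)

# Default names depend on creation order, so swapping the order swaps the names
saved = auto_names([weights_shape, bias_shape])    # weights created first
loaded = auto_names([bias_shape, weights_shape])   # bias created first
print(mismatches(saved, loaded))                   # prints ['Variable', 'Variable_1']

# Explicit names make the creation order irrelevant
saved_named = {'weights_0': weights_shape, 'bias_0': bias_shape}
loaded_named = {'bias_0': bias_shape, 'weights_0': weights_shape}
print(mismatches(saved_named, loaded_named))       # prints []
```

With automatic names, both entries end up paired with data of the wrong shape, which corresponds to the "Assign requires shapes of both tensors to match" error. With explicit names, every lookup finds data of the expected shape.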


## TensorFlow Dropout

<img src="images/dropout-node.jpeg" alt="Drawing" style="width: 500px;"/>

Dropout is a regularization technique for reducing overfitting. The technique temporarily drops units (artificial neurons) from the network, along with all of those units' incoming and outgoing connections. The figure above illustrates how dropout works.

TensorFlow provides the tf.nn.dropout() function, which you can use to implement dropout.

Let's look at an example of how to use tf.nn.dropout().

    keep_prob = tf.placeholder(tf.float32) # probability to keep units

    hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
    hidden_layer = tf.nn.relu(hidden_layer)
    hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

    logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

The code above illustrates how to apply dropout to a neural network.

The tf.nn.dropout() function takes in two parameters:

1. hidden_layer: the tensor to which you would like to apply dropout
2. keep_prob: the probability of keeping (i.e. not dropping) any given unit


keep_prob allows you to adjust the number of units to drop. In order to compensate for dropped units, tf.nn.dropout() multiplies all units that are kept (i.e. not dropped) by 1/keep_prob.

During training, a good starting value for keep_prob is 0.5.

During testing, use a keep_prob value of 1.0 to keep all units and maximize the power of the model.
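The scaling rule described above can be sketched in NumPy. This is a minimal illustration of the idea (keep each unit with probability keep_prob, scale the survivors by 1/keep_prob), not TensorFlow's implementation. The function name dropout, the rng argument, and the sample values are assumptions made for the example.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    # Keep each unit independently with probability keep_prob
    mask = rng.random(x.shape) < keep_prob
    # Scale the kept units by 1/keep_prob; dropped units become 0
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)  # fixed seed only to make the sketch repeatable
hidden = np.array([[5.5, 4.7, 4.7],
                   [0.56, 0.49, 0.51]])

# Training-style call: every entry is either 0 or twice its input value
train_out = dropout(hidden, keep_prob=0.5, rng=rng)
# Test-style call: every unit is kept and the values are unchanged
test_out = dropout(hidden, keep_prob=1.0, rng=rng)

print(train_out)
print(test_out)
```

Scaling by 1/keep_prob keeps the expected value of each unit the same during training as at test time, which is why no extra rescaling is needed when keep_prob is set to 1.0.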

**Let's Code!**

**Note: Output will be different every time the code is run. This is caused by dropout randomizing the units it drops.**

In [140]:
# Quiz Solution
import tensorflow as tf

hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model with Dropout
keep_prob = tf.placeholder(tf.float32)
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# TODO: Print logits from a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits, feed_dict={keep_prob: 0.5}))

[[  1.10000002   6.60000038]
 [  0.30800003   0.7700001 ]
 [ 43.30000305  48.15999985]]
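The printed logits from this particular run can be reproduced in NumPy by plugging in a dropout mask. The mask below is inferred from the output above (it is one mask consistent with those numbers, not something TensorFlow reported), and another run would draw a different one.

```python
import numpy as np

hidden_layer_weights = np.array([
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]])
out_weights = np.array([
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]])
features = np.array([
    [0.0, 2.0, 3.0, 4.0],
    [0.1, 0.2, 0.3, 0.4],
    [11.0, 12.0, 13.0, 14.0]])
keep_prob = 0.5

# Hidden layer with ReLU (zero biases omitted)
hidden = np.maximum(features @ hidden_layer_weights, 0.0)

# Mask inferred from the printed logits (1 = kept, 0 = dropped)
mask = np.array([[1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 1]])

# Kept units are scaled by 1/keep_prob before the output layer
logits = (hidden * mask / keep_prob) @ out_weights
print(np.round(logits, 3))
```

For example, the first row keeps only the first hidden unit: 5.5 is doubled to 11, giving 11 * 0.1 = 1.1 and 11 * 0.6 = 6.6.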
