## Intro to TensorFlow

Check the TensorFlow installation:

```
import tensorflow as tf

# Create a constant tensor holding a string
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)
```

Output:

```
Hello World!
```


* In TensorFlow, values are encapsulated in objects called tensors, such as the one returned by `tf.constant()`.

* A session (`tf.Session`) is an environment for running a graph.

```
# A is a 0-dimensional int32 tensor (a scalar)
A = tf.constant(1234)
# B is a 1-dimensional int32 tensor
B = tf.constant([123, 456, 789])
# C is a 2-dimensional int32 tensor
C = tf.constant([[123, 456, 789], [222, 333, 444]])
```
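To make "dimension" concrete, here is a small pure-Python sketch (no TensorFlow needed) that infers the shape of the nested lists passed to `tf.constant` above. The helper `shape_of` is hypothetical and assumes the lists are regular, i.e. every row at the same depth has the same length.

```python
def shape_of(value):
    """Return the shape of a regular nested list as a tuple.

    A scalar has shape (), a flat list has shape (n,), and so on.
    Assumes every sub-list at the same depth has the same length.
    """
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        # Descend into the first element; regular nesting is assumed
        value = value[0] if value else None
    return tuple(shape)


print(shape_of(1234))                                # → ()
print(shape_of([123, 456, 789]))                     # → (3,)
print(shape_of([[123, 456, 789], [222, 333, 444]]))  # → (2, 3)
```

The number of entries in each shape (0, 1, 2) is the tensor's rank, matching the comments on `A`, `B`, and `C`.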

To feed non-constant data, use `tf.placeholder()` together with `feed_dict`.

`tf.placeholder()` returns a tensor that gets its value from data passed to the `tf.Session.run()` function, allowing you to set the input right before the session runs.

`feed_dict` is a parameter of `tf.Session.run()` that sets the values of the _placeholder_ tensors.

```
x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)

with tf.Session() as sess:
    # Only x is evaluated, but feed_dict can set several placeholders at once
    output = sess.run(x, feed_dict={x: 'Test String', y: 123, z: 45.67})
    print(output)
```

Output:

```
Test String
```


```
# Math

a = tf.add(5, 2)        # 7
s = tf.subtract(10, 4)  # 6
m = tf.multiply(2, 5)   # 10

with tf.Session() as sess:
    print(sess.run(a))
    print(sess.run(s))
    print(sess.run(m))
```

Output:

```
7
6
10
```


```
x = tf.constant(10)
y = tf.constant(2)

# tf.divide() on two int32 tensors returns a float64 tensor,
# so the constant 1 is cast to float64 before subtracting
z = tf.subtract(tf.divide(x, y), tf.cast(tf.constant(1), tf.float64))

# TODO: Print z from a session
with tf.Session() as sess:
    print(sess.run(z))
```

Output:

```
4.0
```


## Logistic classifier

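A logistic classifier generates predictions with a linear function: a matrix multiplication of the input by the weights, plus a bias.

$$
y = xW + b
$$

The resulting scores (logits) are turned into probabilities by the softmax function.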
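A minimal pure-Python sketch of the linear step $y = xW + b$ for a single input row, assuming `x` has `n_features` entries, `W` is an `n_features` × `n_labels` list of rows, and `b` has `n_labels` entries. The function name `linear_row` and the numbers are made up for illustration.

```python
def linear_row(x, W, b):
    """Compute xW + b for one input row x (pure Python, no TensorFlow)."""
    n_labels = len(b)
    return [
        sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        for j in range(n_labels)
    ]


x = [1.0, 2.0]              # 2 features
W = [[0.5, -1.0, 0.0],      # weights: 2 features x 3 labels
     [0.25, 1.0, 2.0]]
b = [0.5, 0.0, -0.5]        # one bias per label
print(linear_row(x, W, b))  # → [1.5, 1.0, 3.5]
```

Each output is one logit per label; TensorFlow's `tf.matmul` does the same multiplication for a whole batch of rows at once.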

The goal of training a neural network is to modify weights and biases to best predict the labels. To do that, you need a tensor that can be modified.

The `tf.Variable` class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually.

The `tf.global_variables_initializer()` function returns an operation that initializes the state of all variable tensors when you run it in a session.

Initializing the weights with random numbers from a normal distribution is good practice. Randomizing the weights helps keep the model from becoming stuck in the same place every time you train it, and it prevents any one weight from overwhelming the others. Use the `tf.truncated_normal()` function to generate random numbers from a normal distribution; values more than two standard deviations from the mean are dropped and re-drawn.

`weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))`

Since the weights are already helping prevent the model from getting stuck, you don't need to randomize the bias.

`bias = tf.Variable(tf.zeros(n_labels))`
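The same initialization can be sketched in pure Python to show what "truncated" means: draw from a normal distribution and re-draw any value more than two standard deviations from the mean. This assumes `mean=0.0` and `stddev=1.0` (the TensorFlow defaults); `truncated_normal` here is a hypothetical helper, not TensorFlow's implementation.

```python
import random


def truncated_normal(shape, mean=0.0, stddev=1.0, rng=random):
    """Sample a rows x cols matrix from a normal distribution, re-drawing any
    value that falls more than 2 standard deviations from the mean."""
    rows, cols = shape
    matrix = []
    for _ in range(rows):
        row = []
        for _ in range(cols):
            value = rng.gauss(mean, stddev)
            while abs(value - mean) > 2 * stddev:  # reject and re-draw
                value = rng.gauss(mean, stddev)
            row.append(value)
        matrix.append(row)
    return matrix


rng = random.Random(0)      # fixed seed so the sketch is reproducible
n_features, n_labels = 4, 3
weights = truncated_normal((n_features, n_labels), rng=rng)
bias = [0.0] * n_labels     # biases start at zero, like tf.zeros(n_labels)
```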

### Quiz

```
# quiz.py
import tensorflow as tf

def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    # TODO: Return weights
    return tf.Variable(tf.truncated_normal((n_features, n_labels)))


def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    # TODO: Return biases
    return tf.Variable(tf.zeros(n_labels))


def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    # TODO: Linear Function (xW + b)
    return tf.add(tf.matmul(input, w), b)


# sandbox.py: a separate file that imports the functions above from quiz.py
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from quiz import get_weights, get_biases, linear


def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []

    mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

    # In order to make quizzes run faster, we're only looking at 10000 images
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):

        # Add features and labels if it's for the first <n>th labels
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels


# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels
n_labels = 3

# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

# Linear Function xW + b
logits = linear(features, w, b)

# Training data
train_features, train_labels = mnist_features_labels(n_labels)

with tf.Session() as session:
    # TODO: Initialize session variables
    session.run(tf.global_variables_initializer())
    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # You'll learn more about this in future lessons.
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), axis=1)

    # Training loss
    # You'll learn more about this in future lessons.
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed
    # You'll learn more about this in future lessons.
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    # You'll learn more about this in future lessons.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {}'.format(l))

```

## Softmax function

The softmax function turns a list of scores (logits) into a categorical probability distribution: every output lies between 0 and 1, and the outputs sum to 1. In TensorFlow, use:

[`tf.nn.softmax()`](https://www.tensorflow.org/api_docs/python/tf/nn/softmax)
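Softmax computes $S(y_i) = e^{y_i} / \sum_j e^{y_j}$. A pure-Python sketch for a single list of logits follows; subtracting the largest logit first is a common trick to avoid overflow in `exp()` and does not change the result.

```python
import math


def softmax(logits):
    """Return the softmax of a list of logits as a list of probabilities."""
    largest = max(logits)
    # Shifting by the max avoids overflow in exp() without changing the ratios
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # → [0.659, 0.242, 0.099]
```

These are the same logits used in the quiz, so `tf.nn.softmax` should return approximately the same values.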

### Quiz

```
import tensorflow as tf

def run():
    output = None
    logit_data = [2.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)
    
    # TODO: Calculate the softmax of the logits
    softmax = tf.nn.softmax(logits)    
    
    with tf.Session() as sess:
        # TODO: Feed in the logit data
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output
```

```
# One-hot encoding (using scikit-learn)

import numpy as np
from sklearn import preprocessing

# Example labels
labels = np.array([1, 5, 3, 2, 1, 4, 2, 1, 3])

# Create the encoder
lb = preprocessing.LabelBinarizer()

# Here the encoder finds the classes and assigns one-hot vectors
lb.fit(labels)

# And finally, transform the labels into one-hot encoded vectors
lb.transform(labels)
```

Output:

```
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0]])
```
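What `LabelBinarizer` does here can be sketched in pure Python: collect the sorted set of distinct labels, then emit a row with a single 1 at each label's position. This is a simplified illustration, not scikit-learn's implementation (for example, `LabelBinarizer` produces a single column when there are only two classes).

```python
def one_hot_encode(labels):
    """Return (classes, rows): the sorted distinct classes and one-hot rows."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1
        rows.append(row)
    return classes, rows


classes, encoded = one_hot_encode([1, 5, 3, 2, 1, 4, 2, 1, 3])
print(classes)     # → [1, 2, 3, 4, 5]
print(encoded[1])  # → [0, 0, 0, 0, 1]  (the label 5)
```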

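## Cross-Entropy Loss Function

Cross-entropy measures the distance between the softmax output $S$ and the one-hot label vector $L$:

$$
D(S, L) = -\sum_i L_i \log(S_i)
$$

Create a cross-entropy function in TensorFlow using `tf.reduce_sum()`, which sums the numbers in an array, and `tf.log()`, which takes the natural logarithm: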

```
# softmax_data = [0.7, 0.2, 0.1]
# one_hot_data = [1.0, 0.0, 0.0]   # gives 0.356675
softmax_data = [0.27, 0.11, 0.33, 0.10, 0.19]
one_hot_data = [0, 0, 0, 1, 0]

softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)

# TODO: Print cross entropy from session
with tf.Session() as sess:
    print(sess.run(tf.negative(tf.reduce_sum(tf.multiply(tf.log(softmax), one_hot))),
                   feed_dict={softmax: softmax_data, one_hot: one_hot_data}))

# ALTERNATIVE:
# cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))
#
# with tf.Session() as sess:
#     print(sess.run(cross_entropy, feed_dict={softmax: softmax_data, one_hot: one_hot_data}))
```

Output:

```
2.30259
```
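The same cross-entropy can be computed in pure Python to double-check the numbers above. This sketch assumes every softmax probability is strictly positive, since `math.log(0)` raises an error.

```python
import math


def cross_entropy(softmax_probs, one_hot_labels):
    """Return -sum(L_i * ln(S_i)) for one prediction and its one-hot label."""
    return -sum(l * math.log(s) for s, l in zip(softmax_probs, one_hot_labels))


print(round(cross_entropy([0.7, 0.2, 0.1], [1.0, 0.0, 0.0]), 6))                  # → 0.356675
print(round(cross_entropy([0.27, 0.11, 0.33, 0.10, 0.19], [0, 0, 0, 1, 0]), 5))  # → 2.30259
```

Because the label is one-hot, only the probability of the correct class contributes: the loss is simply $-\ln(0.10)$ in the second case.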

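## Minimize the loss function

Training looks for the weights and biases that minimize the average cross-entropy over the $N$ training samples:

$$
\mathcal{L} = \frac{1}{N} \sum_{i} D\big(S(x_i W + b), L_i\big)
$$

where $S$ is the softmax function, $D$ is the cross-entropy, and $L_i$ is the one-hot label of sample $i$.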

## Stochastic Gradient Descent (SGD)

Plain gradient descent computes the loss and its gradient over the entire training set at every step, which doesn't scale to large datasets, so in practice SGD is used instead.

SGD estimates the descent direction from a small random batch of the data instead of all of it. Each step is much cheaper to compute, but many more steps are needed; in practice, this trade-off usually works better.

* Normalize the inputs (zero mean and equal, small variance).

* Initialize the weights randomly, also with zero mean and small variance.

* Keep a running average of the gradients and use it, rather than the current batch's gradient alone, to set the direction of each step. This is called 'momentum'.

* Use learning rate decay: reduce the step size as training progresses.

When tuning SGD, always try lowering the learning rate first. AdaGrad is a variant of SGD that adapts the learning rate for each parameter, which reduces the amount of hyperparameter tuning needed.
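The ingredients above (random minibatches, normalized inputs, momentum, and learning rate decay) can be sketched on a toy problem. This is a hypothetical one-parameter example that fits $y = wx$ with a mean-squared-error loss rather than the cross-entropy classifier in this notebook; the hyperparameter values are illustrative assumptions, not recommendations.

```python
import random


def sgd_momentum(data, lr=0.05, momentum=0.9, decay=0.99, batch_size=4,
                 epochs=100, seed=0):
    """Fit y = w * x with minibatch SGD plus momentum and learning rate decay.

    Toy 1-parameter model; the loss on each minibatch is the mean squared error.
    """
    rng = random.Random(seed)
    data = list(data)             # copy so shuffling doesn't touch the caller's list
    w, velocity = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)         # fresh random minibatches every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of mean((w*x - y)^2) w.r.t. w, estimated on this batch only
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            # Momentum: exponentially decaying sum of past gradient steps
            velocity = momentum * velocity - lr * grad
            w += velocity
        lr *= decay               # learning rate decay: smaller steps over time
    return w


# Noise-free toy data on y = 3x, with x normalized to [-1, 1]
points = [(i / 10, 3 * i / 10) for i in range(-10, 11)]
w = sgd_momentum(points)
print(round(w, 3))  # should be close to 3.0
```

Each update only looks at a handful of points, yet because the minibatches change every epoch the estimate still settles near the true slope.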

## Minibatching

Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.

Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.

Mini-batching is also quite useful combined with SGD. The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you're performing SGD with each batch.
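The shuffle-then-batch step can be sketched as a small generator that keeps each feature paired with its label. `shuffled_batches` is a hypothetical helper, not part of TensorFlow; calling it once per epoch gives a new random order each time.

```python
import random


def shuffled_batches(features, labels, batch_size, rng=random):
    """Shuffle (feature, label) pairs together, then split them into minibatches.

    Yields (batch_features, batch_labels); the last batch may be smaller.
    """
    assert len(features) == len(labels)
    order = list(range(len(features)))
    rng.shuffle(order)                  # one shuffle per call, i.e. per epoch
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [features[i] for i in idx], [labels[i] for i in idx]


rng = random.Random(42)
toy_features = [[i, i + 0.5] for i in range(10)]  # 10 toy samples
toy_labels = [i % 2 for i in range(10)]
for epoch in range(2):
    sizes = [len(bf) for bf, _ in shuffled_batches(toy_features, toy_labels, 4, rng)]
    print(epoch, sizes)  # the sizes are [4, 4, 2] every epoch; only the order changes
```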

```
# First time download of data
# from: https://www.tensorflow.org/get_started/mnist/beginners

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```

Output:

```
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
```


```
import numpy as np

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
```

```
# Calculate the memory size of train_features, train_labels, weights, and bias in bytes.
# (just calculate the memory required for the stored data)

print(train_features.nbytes)
print(train_labels.nbytes)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(weights).nbytes)
    print(sess.run(bias).nbytes)
```

Output:

```
172480000
2200000
31360
40
```
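The printed sizes follow from element count × 4 bytes per `float32` value. A quick check of the arithmetic, assuming the 55,000-image training split that `read_data_sets` produces by default (172,480,000 / (784 × 4) = 55,000, which matches the output above):

```python
BYTES_PER_FLOAT32 = 4
n_train, n_input, n_classes = 55000, 784, 10

sizes = {
    "train_features": n_train * n_input * BYTES_PER_FLOAT32,
    "train_labels": n_train * n_classes * BYTES_PER_FLOAT32,
    "weights": n_input * n_classes * BYTES_PER_FLOAT32,
    "bias": n_classes * BYTES_PER_FLOAT32,
}
for name, nbytes in sizes.items():
    print(name, nbytes)
```

The training data dominates: the weights and bias together need only about 31 KB, while the training features need about 172 MB.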


## TensorFlow minibatching

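For example, suppose you split a dataset of 1000 samples into batches of 128 samples each. Since 128 does not evenly divide into 1000, you'd wind up with 7 batches of 128 samples and 1 batch of 104 samples (7\*128 + 1\*104 = 1000). To handle the varying batch size, define the placeholders with `None` as the first dimension: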

```
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
```

The `None` dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0. This setup lets you feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.
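The batch arithmetic can be checked with `divmod`: the quotient is the number of full batches and the remainder, if any, is the size of the last partial batch. `batch_sizes` is a hypothetical helper for illustration.

```python
def batch_sizes(n_samples, batch_size):
    """Return the size of every minibatch when n_samples is split in order."""
    full, remainder = divmod(n_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


sizes = batch_sizes(1000, 128)
print(len(sizes), sizes[-1])  # → 8 104
```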


```
feature_dim = [50000, 400]
labels_dim = [50000, 10]
batch_size = 128

def total_batches(feature_dim, batch_size):
    # Round up: a final, partial batch still counts as a batch
    # ref: https://stackoverflow.com/questions/2356501/how-do-you-round-up-a-number-in-python
    return feature_dim[0] // batch_size + (feature_dim[0] % batch_size > 0)

print("How many batches are there?")
print(total_batches(feature_dim, batch_size))

def last_batch_size(feature_dim, batch_size):
    return feature_dim[0] - (total_batches(feature_dim, batch_size) - 1) * batch_size

print("What is the last batch size?")
print(last_batch_size(feature_dim, batch_size))
```

Output:

```
How many batches are there?
391
What is the last batch size?
80
```


```
# Mini-batching implementation

# 4 Samples of features
example_features = [
    ['F11', 'F12', 'F13', 'F14'],
    ['F21', 'F22', 'F23', 'F24'],
    ['F31', 'F32', 'F33', 'F34'],
    ['F41', 'F42', 'F43', 'F44']]
# 4 Samples of labels
example_labels = [
    ['L11', 'L12'],
    ['L21', 'L22'],
    ['L31', 'L32'],
    ['L41', 'L42']]

def batches(batch_size, features, labels):
    assert len(features) == len(labels)
    batched_data = []
    total_samples = len(features)
    for init_index in range(0, total_samples, batch_size):
        cut_index = init_index + batch_size
        current_batch = [features[init_index:cut_index], labels[init_index:cut_index]]
        batched_data.append(current_batch)
    return batched_data


example_batches = batches(3, example_features, example_labels)

print(example_batches)
```

Output:

```
[[[['F11', 'F12', 'F13', 'F14'], ['F21', 'F22', 'F23', 'F24'], ['F31', 'F32', 'F33', 'F34']], [['L11', 'L12'], ['L21', 'L22'], ['L31', 'L32']]], [[['F41', 'F42', 'F43', 'F44']], [['L41', 'L42']]]]
```


```
learning_rate = 0.001


# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# TODO: Set batch size
batch_size = 128
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # TODO: Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))
```

Output:

```
Test Accuracy: 0.0930999964476
```


The complete code in a single cell:

```
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

def batches(batch_size, features, labels):
    assert len(features) == len(labels)
    batched_data = []
    total_samples = len(features)
    for init_index in range(0, total_samples, batch_size):
        cut_index = init_index + batch_size
        current_batch = [features[init_index:cut_index], labels[init_index:cut_index]]
        batched_data.append(current_batch)
    return batched_data

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# TODO: Set batch size
batch_size = 128
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    # TODO: Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))
```

## Epochs

An epoch is a single forward and backward pass of the whole dataset. Training for several epochs can increase the accuracy of the model without requiring more data.
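To get a feel for how much work multiple epochs add, here is the step count for the settings in the training cell that follows (`batch_size = 128`, `epochs = 80`), assuming the 55,000-image MNIST training split. Every epoch runs one optimizer update per minibatch, including the final partial batch.

```python
import math

n_train, batch_size, epochs = 55000, 128, 80
batches_per_epoch = math.ceil(n_train / batch_size)  # the last batch is partial
total_steps = batches_per_epoch * epochs
print(batches_per_epoch, total_steps)  # → 430 34400
```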

```
def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """
    Print cost and validation accuracy of an epoch
    """
    current_cost = sess.run(
        cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))


# Reused from the previous cells: mnist, np, batches(), and the graph built
# there (features and labels placeholders, weights, bias, and logits)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Define loss and optimizer; the learning rate is now a placeholder fed at run time
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 80
learn_rate = 0.001

# Note: the batches are built once, so every epoch sees them in the same order
train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):

        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))
```

Output:

Epoch: 0    - Cost: 11.8     Valid Accuracy: 0.076
Epoch: 1    - Cost: 10.6     Valid Accuracy: 0.0902
Epoch: 2    - Cost: 9.55     Valid Accuracy: 0.103
Epoch: 3    - Cost: 8.73     Valid Accuracy: 0.116
Epoch: 4    - Cost: 8.05     Valid Accuracy: 0.134
Epoch: 5    - Cost: 7.48     Valid Accuracy: 0.15 
Epoch: 6    - Cost: 6.99     Valid Accuracy: 0.166
Epoch: 7    - Cost: 6.57     Valid Accuracy: 0.186
Epoch: 8    - Cost: 6.21     Valid Accuracy: 0.2  
Epoch: 9    - Cost: 5.88     Valid Accuracy: 0.219
Epoch: 10   - Cost: 5.6      Valid Accuracy: 0.237
Epoch: 11   - Cost: 5.36     Valid Accuracy: 0.253
Epoch: 12   - Cost: 5.16     Valid Accuracy: 0.267
Epoch: 13   - Cost: 4.97     Valid Accuracy: 0.28 
Epoch: 14   - Cost: 4.81     Valid Accuracy: 0.298
Epoch: 15   - Cost: 4.67     Valid Accuracy: 0.318
Epoch: 16   - Cost: 4.54     Valid Accuracy: 0.333
Epoch: 17   - Cost: 4.42     Valid Accuracy: 0.35 
Epoch: 18   - Cost: 4.31     Valid Accuracy: 0.363
Epoch: 19   - Cost: 4.21     V