## TensorFlow

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

https://www.tensorflow.org/

### The Computational Graph

You might think of TensorFlow Core programs as consisting of two discrete sections: building the computational graph and running the computational graph.

A computational graph is a series of TensorFlow operations arranged into a graph of nodes. Each node takes zero or more tensors as inputs and produces a tensor as an output. Let's build a simple computational graph.
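
To make the split between building and running concrete, here is a minimal pure-Python sketch of a graph that is evaluated only on request. The `Constant` and `Add` classes and the `run` function are hypothetical stand-ins invented for this illustration. They are not TensorFlow APIs, and the real runtime is far more involved.

```python
class Constant:
    """A node with zero inputs that outputs the value it stores."""
    def __init__(self, value):
        self.value = value
        self.inputs = []

    def compute(self):
        return self.value


class Add:
    """A node with two input edges that outputs their sum."""
    def __init__(self, left, right):
        self.inputs = [left, right]

    def compute(self, left_value, right_value):
        return left_value + right_value


def run(node):
    """Evaluate a node by first evaluating the nodes it depends on."""
    input_values = [run(dep) for dep in node.inputs]
    return node.compute(*input_values)


# Building the graph computes nothing yet.
graph = Add(Constant(10), Constant(20))
# Running it walks the graph and produces a value.
result = run(graph)
print(result)  # 30
```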

### Hello, world!
Try running the following code in your Python console to make sure you have TensorFlow properly installed. If it is, the console will print a description of the tensor followed by <code>b'Hello World!'</code>.

One type of node is a constant. Like all TensorFlow constants, it takes no inputs, and it outputs a value it stores internally. We can create a string Tensor hello_constant.

In [1]:
import tensorflow as tf
# Create a constant string tensor called hello_constant
hello_constant = tf.constant('Hello World!')
print(hello_constant)

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)

Tensor("Const:0", shape=(), dtype=string)
b'Hello World!'


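Notice that printing the node does not output the value "Hello World!" as you might expect. It prints a description of the tensor instead; the value is produced only when the node is run in a session.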

Below, there are 2 nodes with integer constants 10 and 20. When evaluated, they produce 10 and 20 respectively. To actually evaluate the nodes, we must run the computational graph within a __session__. A _session_ encapsulates the control and state of the TensorFlow runtime.

The following code creates a Session object and then invokes its run method to run enough of the computational graph to evaluate _a+b_.

In [3]:
a = tf.constant(10)
b = tf.constant(20)
with tf.Session() as sess:
    # Named total so the built-in sum() is not shadowed
    total = sess.run(a+b)
    print(total)

30


The same approach works with list constants. Here the scalar <code>a</code> is multiplied by every element of a 2x3 constant.

In [20]:
c = tf.constant([[1, 2, 3], [4, 5, 6]])
with tf.Session() as sess:
    product = sess.run(a*c)
    print(product)

[[10 20 30]
 [40 50 60]]
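
Multiplying a scalar by a nested list of values this way is usually called broadcasting. As a rough illustration only, the same arithmetic can be written in plain Python with no TensorFlow involved (the variable names here are made up for the example):

```python
scalar = 10
matrix = [[1, 2, 3], [4, 5, 6]]

# Apply the scalar to every element, row by row.
scaled = [[scalar * value for value in row] for row in matrix]
print(scaled)  # [[10, 20, 30], [40, 50, 60]]
```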


As it stands, this graph is not especially interesting because it always produces a constant result. A graph can be parameterized to accept external inputs, known as _placeholders_. A __placeholder__ is a promise to provide a value later.

We can evaluate a graph with placeholders by using the <code>feed_dict</code> parameter of <code>run</code> to supply concrete values for those placeholders:

In [2]:
x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})
    print(output)

Hello World
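
The idea of a placeholder as a promise can be sketched in plain Python. The `Placeholder` class and `run` function below are hypothetical names used only for this illustration, not TensorFlow's implementation. The point is that the value lives in the dictionary passed at run time, not in the node.

```python
class Placeholder:
    """A node with no stored value; one must be fed at run time."""
    def __init__(self, name):
        self.name = name


def run(node, feed_dict):
    """Look up the value promised for this placeholder."""
    if node not in feed_dict:
        raise ValueError("no value fed for placeholder " + node.name)
    return feed_dict[node]


x = Placeholder("x")
print(run(x, {x: "Hello World"}))  # Hello World

# Running without feeding the placeholder fails.
try:
    run(x, {})
except ValueError as error:
    print(error)  # no value fed for placeholder x
```

TensorFlow likewise reports an error if you run a node that depends on a placeholder you did not feed.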


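You cannot combine tensors of different types in a single operation. In the cell below, adding the <code>int32</code> placeholder <code>y</code> to the <code>float32</code> placeholder <code>z</code> raises a <code>ValueError</code>.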

In [6]:
x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)

with tf.Session() as sess:
    print(sess.run(x, feed_dict={x: 'Test String', y: 20, z: 45.67}))
    print(sess.run(a + y, feed_dict={x: 'Test String', y: 20, z: 45.67}))
    print(sess.run([x, y], feed_dict={x: 'Test String', y: 20, z: 45.67}))
    # y is int32 and z is float32, so this line raises a ValueError
    print(sess.run(y + z, feed_dict={x: 'Test String', y: 20, z: 45.67}))

Test String
30
[array('Test String', dtype=object), array(20, dtype=int32)]


ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("Placeholder_6:0", dtype=float32)'

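To cast a tensor of one type to another, say an integer tensor 1 to a float tensor 1.0, you must use <code>tf.cast()</code>.

TensorFlow provides various methods to perform mathematical operations, like <code>tf.subtract</code>, <code>tf.divide</code> and <code>tf.matmul</code>. In the cell below, <code>tf.divide</code> on two integer tensors returns a <code>float64</code> tensor, which is why the constant 1 is cast to <code>tf.float64</code> before subtracting.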

In [7]:
# Do 10/2 - 1 = 4 
x = tf.constant(10)
y = tf.constant(2)
z = tf.subtract(tf.divide(x, y), tf.cast(tf.constant(1), tf.float64))

with tf.Session() as sess:
    output = sess.run(z)
    print(output)

4.0


To apply the <a href=https://en.wikipedia.org/wiki/Softmax_function>Softmax function</a> to a list of values, you can use __<a href=https://www.tensorflow.org/versions/master/api_docs/python/tf/nn/softmax><code>tf.nn.softmax</code></a>__.

In [3]:
logit_data = [2.0, 1.0, 0.1]
logits = tf.placeholder(tf.float32)

softmax = tf.nn.softmax(logits)

with tf.Session() as sess:
    print(sess.run(softmax, feed_dict={logits: logit_data}))

[ 0.65900117  0.24243298  0.09856589]
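
To see where these numbers come from, here is a small standard-library sketch of the softmax formula: each exponentiated logit is divided by the sum of all exponentiated logits. The `softmax` helper below is written for illustration and is not how TensorFlow implements it internally.

```python
import math

def softmax(logits):
    """Return softmax probabilities for a list of logits."""
    # Subtracting the maximum keeps the exponentials from overflowing
    # and does not change the result.
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

probabilities = softmax([2.0, 1.0, 0.1])
print([round(p, 4) for p in probabilities])  # [0.659, 0.2424, 0.0986]
```

The rounded values agree with the TensorFlow output above, and the probabilities sum to 1.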


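You can use <code>tf.reduce_sum</code> to calculate the <a href=https://en.wikipedia.org/wiki/Cross_entropy>cross entropy</a> between a one-hot label vector and a softmax output: multiply each label by the natural log of the matching probability, sum the products, and negate the result.

TensorFlow also provides a helper function __<a href=https://www.tensorflow.org/versions/master/api_docs/python/tf/losses/softmax_cross_entropy><code>tf.losses.softmax_cross_entropy</code></a>__ that computes the cross entropy from one-hot labels and unscaled logits, applying softmax internally.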



In [8]:
softmax_data = [0.7, 0.2, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)

cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))

with tf.Session() as sess:
    print(sess.run(cross_entropy, feed_dict={softmax: softmax_data, one_hot: one_hot_data}))

0.356675
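
The same calculation can be checked by hand. The sketch below uses only the standard library and a hypothetical `cross_entropy` helper. With a one-hot label, only the term for the true class survives, so the result is simply the negative natural log of 0.7.

```python
import math

def cross_entropy(one_hot, predicted):
    """Return -sum(y * ln(y_hat)) over matching positions."""
    # Skip terms whose label is 0; they contribute nothing to the sum.
    return -sum(y * math.log(p) for y, p in zip(one_hot, predicted) if y != 0)

loss = cross_entropy([1.0, 0.0, 0.0], [0.7, 0.2, 0.1])
print(round(loss, 6))  # 0.356675
```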


### Classification

You'll use TensorFlow to classify the handwritten digits 0, 1, and 2 from the MNIST dataset.

In [10]:
from tensorflow.examples.tutorials.mnist import input_data

def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    return tf.Variable(tf.truncated_normal((n_features, n_labels)))

def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    return tf.Variable(tf.zeros(n_labels))

def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    return tf.add(tf.matmul(input, w), b)

def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []

    mnist = input_data.read_data_sets('./datasets/mnist', one_hot=True)

    # In order to make this run faster, we're only looking at 10000 images
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):

        # Keep the sample only if its label is one of the first <n> classes
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels

In [11]:
# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels
n_labels = 3

# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

# Linear Function xW + b
logits = linear(features, w, b)

# Training data
train_features, train_labels = mnist_features_labels(n_labels)

Extracting ./datasets/mnist/train-images-idx3-ubyte.gz
Extracting ./datasets/mnist/train-labels-idx1-ubyte.gz
Extracting ./datasets/mnist/t10k-images-idx3-ubyte.gz
Extracting ./datasets/mnist/t10k-labels-idx1-ubyte.gz


In [12]:
with tf.Session() as session:
    session.run(tf.global_variables_initializer())

    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # Sum over the class axis (axis=1 replaces the deprecated reduction_indices)
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), axis=1)

    # Training loss
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {:>4.4f}'.format(l))

Loss: 6.0711
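
The cell above runs a single optimizer step. To give a feel for what one gradient descent step does, here is a simplified standard-library sketch. It assumes, purely for illustration, that the three logits of one example are themselves the trainable parameters (the model above trains `w` and `b` instead), and it uses the standard result that the gradient of softmax cross entropy with respect to the logits is the predicted probabilities minus the one-hot labels. The starting values are made up.

```python
import math

def softmax(logits):
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def loss_fn(logits, one_hot):
    """Cross entropy between the one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y != 0)

logits = [0.5, 1.5, -0.3]   # illustrative starting values
one_hot = [1.0, 0.0, 0.0]   # the true class is index 0
learning_rate = 0.08        # same rate as the cell above

loss_before = loss_fn(logits, one_hot)

# For softmax followed by cross entropy, d(loss)/d(logits) = probs - labels.
gradient = [p - y for p, y in zip(softmax(logits), one_hot)]

# One gradient descent step: move each parameter against its gradient.
logits = [z - learning_rate * g for z, g in zip(logits, gradient)]

loss_after = loss_fn(logits, one_hot)
print(loss_after < loss_before)  # True
```

A single step lowers the loss only slightly, which is why training normally repeats this update many times.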


### Mini-Batching
Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.

Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.

It's also quite useful combined with stochastic gradient descent (SGD). The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you're performing SGD with each batch.

In [14]:
import math
from pprint import pprint

from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    output_batches = []

    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)

    return output_batches

# 5 samples of features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44'],
    ['F51','F52','F53','F54']]
# 5 samples of labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42'],
    ['L51','L52']]

pprint(batches(3, example_features, example_labels))

[[[['F11', 'F12', 'F13', 'F14'],
   ['F21', 'F22', 'F23', 'F24'],
   ['F31', 'F32', 'F33', 'F34']],
  [['L11', 'L12'], ['L21', 'L22'], ['L31', 'L32']]],
 [[['F41', 'F42', 'F43', 'F44'], ['F51', 'F52', 'F53', 'F54']],
  [['L41', 'L42'], ['L51', 'L52']]]]
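
The `batches` helper above slices the data in its existing order. The shuffling step described earlier could be sketched as follows. The `shuffled_batches` function and its `seed` parameter are hypothetical additions for illustration. The fixed seed only makes the example repeatable; in training you would typically reshuffle differently on every epoch.

```python
import math
import random

def shuffled_batches(batch_size, features, labels, seed=0):
    """Shuffle features and labels together, then slice into batches."""
    assert len(features) == len(labels)
    # Shuffle indices so each feature stays paired with its label.
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    result = []
    for start in range(0, len(order), batch_size):
        chunk = order[start:start + batch_size]
        result.append(([features[i] for i in chunk], [labels[i] for i in chunk]))
    return result

features = ['F1', 'F2', 'F3', 'F4', 'F5']
labels = ['L1', 'L2', 'L3', 'L4', 'L5']
batch_list = shuffled_batches(3, features, labels)

# The number of batches is the sample count divided by the batch size, rounded up.
print(len(batch_list) == math.ceil(len(features) / 3))  # True
# The last batch is smaller when the data does not divide evenly.
print([len(batch_features) for batch_features, _ in batch_list])  # [3, 2]
```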


In [15]:
learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

In [16]:
# Import MNIST data
mnist = input_data.read_data_sets('./datasets/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Extracting ./datasets/mnist/train-images-idx3-ubyte.gz
Extracting ./datasets/mnist/train-labels-idx1-ubyte.gz
Extracting ./datasets/mnist/t10k-images-idx3-ubyte.gz
Extracting ./datasets/mnist/t10k-labels-idx1-ubyte.gz
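
The accuracy calculation in the cell above compares the index of the largest logit with the index of the 1 in each one-hot label, then averages the matches. A plain-Python sketch of the same logic follows. The `argmax` and `accuracy` helpers and the example data are made up for illustration.

```python
def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(logits, one_hot_labels):
    """Fraction of rows whose largest logit matches the labeled class."""
    correct = [
        1.0 if argmax(row) == argmax(label) else 0.0
        for row, label in zip(logits, one_hot_labels)
    ]
    return sum(correct) / len(correct)

example_logits = [[2.0, 0.1, 0.3], [0.2, 0.1, 3.0], [1.0, 4.0, 0.5], [0.9, 0.2, 0.1]]
example_labels = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0]]

# Three of the four rows are predicted correctly.
print(accuracy(example_logits, example_labels))  # 0.75
```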


In [17]:
batch_size = 128
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

In [18]:
with tf.Session() as sess:
    sess.run(init)
    
    # Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {:>4.3f}'.format(test_accuracy))

Test Accuracy: 0.085


The accuracy is low because the model saw each training batch only once. You can train a model over the dataset multiple times, and each full pass is called an epoch.
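
As a rough illustration of why extra passes help, the sketch below fits a one-parameter model with mini-batch gradient descent and compares the loss after 1 epoch and after 10. The toy data, the `train` function and its default values are all assumptions made for this example. It is not the MNIST model above.

```python
def train(epochs, learning_rate=0.01, batch_size=2):
    """Fit y = w * x with mini-batch gradient descent; return final loss."""
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # generated with w = 2
    w = 0.0
    for _ in range(epochs):
        # One epoch is one full pass over every mini-batch.
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # Gradient of mean squared error with respect to w.
            grad = sum(2 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
            w -= learning_rate * grad
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# More passes over the same data leave a smaller loss.
print(train(epochs=1) > train(epochs=10))  # True
```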