# TensorFlow classification

## Weights and Bias in TensorFlow

The goal of training a neural network is to modify weights and biases to best predict the labels. To use weights and biases, you'll need a Tensor that can be modified. This rules out tf.placeholder() and tf.constant(), since those Tensors can't be modified. This is where the **tf.Variable** class comes in.

### tf.Variable() 

```python
import tensorflow as tf

x = tf.Variable(5)
```

The **tf.Variable** class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually. You'll use the **tf.global_variables_initializer()** function to initialize the state of all the Variable tensors.

### Initialization

```python
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
```

The **tf.global_variables_initializer()** call returns an operation that initializes all TensorFlow variables in the graph. As shown above, you run that operation in a session to initialize the variables.

Using the **tf.Variable** class allows us to change the weights and bias, but an initial value needs to be chosen.

Initializing the weights with random numbers from a normal distribution is good practice. Randomizing the weights helps keep the model from becoming stuck in the same place every time you train it. You'll learn more about this in the next lesson, when you study gradient descent.

Similarly, choosing weights from a normal distribution prevents any one weight from overwhelming other weights. You'll use the **tf.truncated_normal()** function to generate random numbers from a normal distribution.

### tf.truncated_normal()

```python
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
```

```python
print(weights)
```

```
Tensor("Variable_1/read:0", shape=(120, 5), dtype=float32)
```


The **tf.truncated_normal()** function returns a tensor of random values drawn from a normal distribution. Any value more than 2 standard deviations from the mean is dropped and re-picked, so every value lies within 2 standard deviations of the mean.
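The truncation rule can be shown with a small pure-Python sketch of rejection sampling. It draws from a normal distribution and redraws any value that falls more than two standard deviations from the mean. The helper `truncated_normal_sample` is hypothetical. Its `mean=0.0` and `stddev=1.0` defaults mirror those of tf.truncated_normal(), but this is not TensorFlow's actual implementation.

```python
import random


def truncated_normal_sample(shape, mean=0.0, stddev=1.0, seed=None):
    """Hypothetical helper: return a 2-D nested list of normal samples,
    redrawing any sample farther than 2 * stddev from the mean."""
    rng = random.Random(seed)
    rows, cols = shape

    def draw():
        while True:
            value = rng.gauss(mean, stddev)
            # Keep only samples inside [mean - 2*stddev, mean + 2*stddev]
            if abs(value - mean) <= 2 * stddev:
                return value

    return [[draw() for _ in range(cols)] for _ in range(rows)]


weights = truncated_normal_sample((120, 5), seed=0)
print(len(weights), len(weights[0]))                         # 120 5
print(max(abs(v) for row in weights for v in row) <= 2.0)    # True
```

Rejection keeps the bell shape inside the interval and rules out extreme outliers. That is why no single initial weight can dominate the others.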


Since the weights are already helping prevent the model from getting stuck, you don't need to randomize the bias. Let's use the simplest solution, setting the bias to 0.

### tf.zeros()

```python
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))
```

The **tf.zeros()** function returns a tensor with all zeros.

## Linear Classifier Quiz 

You'll be classifying the handwritten numbers 0, 1, and 2 from the MNIST dataset using TensorFlow. The above is a small sample of the data you'll be training on. Notice how some of the 1s are written with a serif at the top and at different angles. The similarities and differences will play a part in shaping the weights of the model.

Left: Weights for labeling 0. Middle: Weights for labeling 1. Right: Weights for labeling 2.

The images above show the trained weights for each label (0, 1, and 2). Each set of weights captures the distinctive features it has learned for its digit. Complete this quiz to train your own weights on the MNIST dataset.

### Instructions



- Open quiz.py.
    - Implement get_weights to return a tf.Variable of weights.
    - Implement get_biases to return a tf.Variable of biases.
    - Implement xW + b in the linear function.
- Open sandbox.py.
    - Initialize all weights.

Since xW in xW + b is a matrix multiplication, you have to use tf.matmul() instead of tf.multiply(), which multiplies element-wise. Don't forget that order matters in matrix multiplication, so tf.matmul(a, b) is not the same as tf.matmul(b, a).
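The following pure-Python sketch shows why the operation and the order matter. It computes xW + b for a batch of 2 examples with 3 features and 2 labels. The helpers `matmul` and `linear_forward` are hypothetical and only mimic the shapes involved: x is (batch, n_features), W is (n_features, n_labels), and b holds one bias per label, added to every row.

```python
def matmul(a, b):
    """Multiply an (m x n) matrix by an (n x p) matrix given as nested lists."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("inner dimensions must match")
    p = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(n)) for j in range(p)]
            for row in a]


def linear_forward(x, w, b):
    """Compute xW + b, adding the bias vector to each row of xW."""
    xw = matmul(x, w)
    return [[value + bias for value, bias in zip(row, b)] for row in xw]


x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]      # 2 examples, 3 features
w = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]           # 3 features, 2 labels
b = [1.0, -1.0]            # one bias per label

print(linear_forward(x, w, b))   # approximately [[3.2, 1.8], [5.9, 5.4]]

# Swapping the operands gives a 3 x 3 result, not a 2 x 2 one
wx = matmul(w, x)
print(len(wx), len(wx[0]))       # 3 3
```

An element-wise multiply could not combine a (2 x 3) input with a (3 x 2) weight matrix at all. Only the matrix product sums over the shared feature dimension and yields one score per label for each example.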

### quiz.py

```python
import tensorflow as tf

def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    # TODO: Return weights
    weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
    return weights


def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    # TODO: Return biases
    bias = tf.Variable(tf.zeros(n_labels))
    return bias


def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    # TODO: Linear Function (xW + b)
    y = tf.add(tf.matmul(input, w), b)
    return y
```

### sandbox.py 

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# get_weights, get_biases and linear are defined in the quiz.py cell above;
# outside a notebook, import them instead:
# from quiz import get_weights, get_biases, linear


def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []

    mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

    # In order to make quizzes run faster, we're only looking at 10000 images
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):

        # Add features and labels if it's for the first <n>th labels
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels


# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels
n_labels = 3

# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

# Linear Function xW + b
logits = linear(features, w, b)

# Training data
train_features, train_labels = mnist_features_labels(n_labels)

# Softmax
prediction = tf.nn.softmax(logits)

# Cross entropy
# This quantifies how far off the predictions were.
# You'll learn more about this in future lessons.
cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), axis=1)

# Training loss
# You'll learn more about this in future lessons.
loss = tf.reduce_mean(cross_entropy)

# Rate at which the weights are changed
# You'll learn more about this in future lessons.
learning_rate = 0.08

# Gradient Descent
# This is the method used to train the model
# You'll learn more about this in future lessons.
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

with tf.Session() as session:
    # Initialize all variables before running any op that reads them
    session.run(tf.global_variables_initializer())

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {}'.format(l))
```

```
PermissionDeniedError: /datasets
```
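The filtering step in mnist_features_labels is easy to overlook. A one-hot label is kept only if its 1 lies within the first n_labels positions, and the kept label is then truncated to those positions. The following pure-Python sketch reproduces that logic on made-up toy data, so it needs no dataset directory. The function name `filter_first_n_labels` and the 4-class toy labels are hypothetical.

```python
def filter_first_n_labels(features, one_hot_labels, n_labels):
    """Keep examples whose class index is < n_labels; truncate their labels."""
    kept_features, kept_labels = [], []
    for feature, label in zip(features, one_hot_labels):
        truncated = label[:n_labels]
        # Same test as mnist_label[:n_labels].any(): is the 1 in the first n slots?
        if any(truncated):
            kept_features.append(feature)
            kept_labels.append(truncated)
    return kept_features, kept_labels


toy_features = ["img_a", "img_b", "img_c", "img_d"]
toy_labels = [
    [0, 1, 0, 0],  # class 1 -> kept
    [0, 0, 0, 1],  # class 3 -> dropped
    [1, 0, 0, 0],  # class 0 -> kept
    [0, 0, 1, 0],  # class 2 -> kept
]
print(filter_first_n_labels(toy_features, toy_labels, 3))
# (['img_a', 'img_c', 'img_d'], [[0, 1, 0], [1, 0, 0], [0, 0, 1]])
```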

## Softmax

The next step is to assign a probability to each label, which you can then use to classify the data. Use the softmax function to turn your logits into probabilities.

For each logit $y_i$, the softmax function computes

$$S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$

where $e \approx 2.718$ is the mathematical constant. Raising $e$ to any real power always gives a positive number, so even negative logits map to positive values. The sum in the denominator adds up $e^{y_j}$ over all logits. This normalizes the outputs so that they sum to 1 and can be read as probabilities.
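As a worked example, here is the arithmetic for the logits $[3.0, 1.0, 0.2]$, which the quiz below also uses. $S(y_i)$ denotes the probability assigned to logit $y_i$, and values are rounded:

$$
\begin{aligned}
e^{3.0} + e^{1.0} + e^{0.2} &\approx 20.0855 + 2.7183 + 1.2214 = 24.0252 \\
S(3.0) &\approx 20.0855 / 24.0252 \approx 0.8360 \\
S(1.0) &\approx 2.7183 / 24.0252 \approx 0.1131 \\
S(0.2) &\approx 1.2214 / 24.0252 \approx 0.0508
\end{aligned}
$$

The largest logit receives most of the probability mass, and the three values add up to 1 up to rounding.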


### Quiz

For the next quiz, you'll implement a softmax(x) function that takes in x, a one- or two-dimensional array of logits.

In the one-dimensional case, the array is just a single set of logits. In the two-dimensional case, each column in the array is a set of logits. The softmax(x) function should return a NumPy array of the same shape as x.

```python
# For example, given a one-dimensional array:
import numpy as np

# logits is a one-dimensional array with 3 elements
logits = [1.0, 2.0, 3.0]
# softmax will return a one-dimensional array with 3 elements
# print(softmax(logits))
```

```python
# Given a two-dimensional array where each column represents a set of logits:

# logits is a two-dimensional array
logits = np.array([
    [1, 2, 3, 6],
    [2, 4, 5, 6],
    [3, 8, 7, 6]])
# softmax will return a two-dimensional array with the same shape
# print(softmax(logits))
```

```python
import numpy as np


def softmax(x):
    """Compute softmax values for each set of scores in x (each column if 2-D)."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

logits = [3.0, 1.0, 0.2]
print(softmax(logits))

logits = np.array([
    [1, 2, 3, 6],
    [2, 4, 5, 6],
    [3, 8, 7, 6]])
print(softmax(logits))
```

```
[ 0.8360188   0.11314284  0.05083836]
[[ 0.09003057  0.00242826  0.01587624  0.33333333]
 [ 0.24472847  0.01794253  0.11731043  0.33333333]
 [ 0.66524096  0.97962921  0.86681333  0.33333333]]
```
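The implementation above calls np.exp(x) directly. For large logits this overflows: np.exp(1000.0) is inf, and inf / inf gives nan. A common remedy is to subtract the largest logit in each set before exponentiating. The result is unchanged because the common factor $e^{-\max}$ cancels between numerator and denominator. The following sketch uses pure Python instead of NumPy. The names `stable_softmax` and `_softmax_1d` are hypothetical, and like the quiz, the function treats each column of a 2-D input as one set of logits.

```python
import math


def _softmax_1d(values):
    """Softmax of one set of logits, shifted by its maximum for stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(x):
    """Softmax of a 1-D list, or of each column of a 2-D list of lists."""
    if not isinstance(x[0], (list, tuple)):
        return _softmax_1d(x)
    # Transpose so each column becomes a row, apply softmax, transpose back
    columns = [list(col) for col in zip(*x)]
    soft_columns = [_softmax_1d(col) for col in columns]
    return [list(row) for row in zip(*soft_columns)]


print(stable_softmax([3.0, 1.0, 0.2]))
print(stable_softmax([1000.0, 1001.0, 1002.0]))  # no overflow
print(stable_softmax([[1, 2, 3, 6],
                      [2, 4, 5, 6],
                      [3, 8, 7, 6]]))
```

Because of the shift, [1000.0, 1001.0, 1002.0] produces exactly the same probabilities as [0.0, 1.0, 2.0].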


## TensorFlow softmax

Now that you've built a softmax function from scratch, let's see how softmax is done in TensorFlow.

```python
x = tf.nn.softmax([2.0, 1.0, 0.2])
```

Easy as that! tf.nn.softmax() implements the softmax function for you. It takes in logits and returns softmax activations.

Use the softmax function in the quiz below to return the softmax of the logits.

```python
import tensorflow as tf

def run():
    output = None
    logit_data = [2.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)

    # TODO: Calculate the softmax of the logits
    softmax = tf.nn.softmax(logits)

    with tf.Session() as sess:
        # TODO: Feed in the logit data
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output

print(run())
```

```
[ 0.65900117  0.24243298  0.09856589]
```


## TensorFlow Cross-Entropy

```python
import tensorflow as tf

x = tf.reduce_sum([1, 2, 3, 4, 5])  # 15
x = tf.log(100.)  # 4.60517
```

```python
import tensorflow as tf

softmax_data = [0.7, 0.2, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)
cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))

# TODO: Print cross entropy from session
with tf.Session() as sess:
    output = sess.run(cross_entropy,
                      feed_dict={softmax: softmax_data, one_hot: one_hot_data})
    print(output)
```

```
0.356675
```
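The cell above computes the cross-entropy $D(S, L) = -\sum_i L_i \ln(S_i)$ between a softmax output $S$ and a one-hot label $L$. Because the label is one-hot, only the term for the true class survives, so the result is simply $-\ln(0.7) \approx 0.356675$. The following pure-Python sketch reproduces that value. For a batch, it also averages the per-example cross-entropies, which is what sandbox.py does with a per-example tf.reduce_sum followed by tf.reduce_mean. The function names are hypothetical.

```python
import math


def cross_entropy(softmax, one_hot):
    """-sum(L_i * ln(S_i)) for one example (fails if any S_i is exactly 0)."""
    return -sum(l * math.log(s) for s, l in zip(softmax, one_hot))


def mean_cross_entropy(batch_softmax, batch_one_hot):
    """Average the per-example cross-entropies over a batch."""
    losses = [cross_entropy(s, l) for s, l in zip(batch_softmax, batch_one_hot)]
    return sum(losses) / len(losses)


print(round(cross_entropy([0.7, 0.2, 0.1], [1.0, 0.0, 0.0]), 6))  # 0.356675
print(mean_cross_entropy([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
                         [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
```

The batch of two averages $-\ln(0.7)$ and $-\ln(0.8)$, giving about 0.2899. The more probability the model puts on the true class, the closer the loss is to 0.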


## Numerical stability issues 

```python
a = 1000000000
for i in range(1000000):
    a = a + 1e-6
print(a - 1000000000)  # mathematically we expect 1.0
```

```
0.95367431640625
```

```python
a = 1
for i in range(1000000):
    a = a + 1e-6
print(a - 1)  # mathematically we expect 1.0
```

```
0.9999999999177334
```
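Why is the first result off by almost 5%? Python floats are IEEE 754 doubles with a 52-bit fraction. 1e9 lies between 2^29 and 2^30, so the gap between neighbouring floats there (one ulp) is 2^-23, about 1.19e-7. Each addition of 1e-6 is about 8.39 ulps and gets rounded to exactly 8 ulps, which is 2^-20 ≈ 9.5367e-7. A million of those add up to exactly 0.95367431640625. Near 1.0 the gap is only 2^-52, so the rounding error stays tiny. The following sketch checks this with math.ulp (Python 3.9+). It also shows math.fsum, which tracks the lost low-order bits and returns the correctly rounded sum.

```python
import math

# Spacing between adjacent doubles near 1e9 versus near 1.0
print(math.ulp(1e9) == 2 ** -23)   # True
print(math.ulp(1.0) == 2 ** -52)   # True

# Each +1e-6 near 1e9 is rounded to a whole number of ulps: 8 of them
print(round(1e-6 / math.ulp(1e9), 3))        # 8.389
print((1e9 + 1e-6) - 1e9 == 8 * 2 ** -23)    # True

# The naive loop from above versus compensated summation with math.fsum
naive = 1000000000
for _ in range(1000000):
    naive = naive + 1e-6
print(naive - 1000000000)   # 0.95367431640625

exact = math.fsum([1e9] + [1e-6] * 1000000)
print(exact - 1e9)          # 1.0
```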
