`pip install tensorflow`

 - validate
https://www.tensorflow.org/install/install_mac#ValidateYourInstallation

 - CPU warning? https://github.com/lakshayg/tensorflow-build

Following along with MachineLearning @ Cal

 - notebook https://github.com/mlberkeley/IntroToTensorFlow/blob/master/Basic_Models-Student.ipynb

 - video https://www.youtube.com/watch?v=etNdsIfcMEQ

>A known warning appears.
It is ignorable: https://github.com/h5py/h5py/issues/995

In [1]:
# suppress warning
import warnings
with warnings.catch_warnings():
    warnings.filterwarnings("ignore",category=FutureWarning)
    import h5py

In [2]:
import tensorflow as tf

## Feed Forward Network on MNIST

Here we build a simple fully-connected network for MNIST. The network has 784 input neurons (one per pixel of a 28x28 MNIST image), 2 hidden layers with 256 neurons each, and 10 output neurons (1 for each digit).


TensorFlow provides a convenient interface for MNIST data, which makes it easy to test your code on a commonly used dataset. The code below shows how to read the MNIST images and store the labels as one-hot vectors.

#### Q: What is a Feed Forward Network? How is it different from other networks?
https://en.wikipedia.org/wiki/Feedforward_neural_network
https://en.wikipedia.org/wiki/Artificial_neural_network

> The first and simplest type of ANN. Information only moves forward through nodes (input, hidden, output)


#### Q: What makes a layer hidden? 
https://stats.stackexchange.com/a/63163

> The hidden layers' job is to transform the inputs into something that the output layer can use

#### Q: What is a neuron / hidden neuron?
https://en.wikipedia.org/wiki/Perceptron <- First Primitive neuron <br>
https://en.wikipedia.org/wiki/Artificial_neuron
https://en.wikipedia.org/wiki/Sigmoid_function <- special case of logistic function

> Elementary units in a NN that receive one or more inputs. Each neuron sums its weighted inputs and passes the sum through an activation function (also called a transfer function) to produce its output. Transfer functions often have a sigmoid shape.


#### Q: What is a one-hot vector?
https://en.wikipedia.org/wiki/One-hot

> One-hot is a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low. A similar implementation in which all bits are '1' except one '0' is sometimes called one-cold.
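To make the one-hot idea concrete for MNIST labels, here is a minimal sketch in plain Python (no TensorFlow). The helper names `one_hot` and `from_one_hot` are made up for illustration and are not part of any library.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def from_one_hot(vec):
    """Recover the class index from a one-hot vector (an argmax)."""
    return vec.index(max(vec))


three = one_hot(3)
print(three)                # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(from_one_hot(three))  # 3
```

The digit 3 becomes a length-10 vector with a single 1 in position 3, which is the label format the `one_hot = True` option asks for.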

In [3]:
from tensorflow.examples.tutorials.mnist import input_data
MNIST = input_data.read_data_sets('../data/mnist', one_hot = True)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting ../data/mnist/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting ../data/mnist/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting ../data/mnist/t10k-images-idx3-ubyte.gz
Extracting ../data/mnist/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


Create placeholders for X and Y.

 - Note that each MNIST image is 28x28. Additionally, the data will already be flattened into a 784 dimensional vector when we input it into the model
 - Each label is 10d - a vector element for every possible digit.
 - Make sure the shapes of the placeholders are defined so that a variable number of images and labels can be fed in each batch. The batch size is dimension 0 of the shape; put None there instead of a fixed size
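The shape bookkeeping above can be sketched in plain Python. This is only an illustration of what "flattened to 784" and "variable batch size" mean; the all-zero image and the `flatten` / `batch_shape` helpers are hypothetical stand-ins, not part of the MNIST loader.

```python
def flatten(image):
    """Flatten a 2-D list of rows into one 1-D list, row by row."""
    return [pixel for row in image for pixel in row]


def batch_shape(batch):
    """Return (number of examples, features per example)."""
    return (len(batch), len(batch[0]))


# Hypothetical all-black 28x28 image standing in for one MNIST digit
image = [[0.0] * 28 for _ in range(28)]
flat = flatten(image)

small_batch = [flat] * 3      # 3 images
large_batch = [flat] * 128    # 128 images

print(len(flat))                 # 784
print(batch_shape(small_batch))  # (3, 784)
print(batch_shape(large_batch))  # (128, 784)
```

Only the first dimension changes between the two batches, which is why that dimension is left as None in the placeholder while the second stays fixed at 784.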

In [4]:
with tf.name_scope('input'):

    # X is where the images go, taken from a 1-D vector len = 784
    # shape has dim1 = None b/c we want variable batch size
    X = tf.placeholder(tf.float32, shape = [None, 784])
    
    # y has dim2 = 10 b/c of the ten digits (0 -> 9)
    # for a 3 it will be zero everywhere and 1 in the 3's position
    y = tf.placeholder(tf.float32, shape = [None, 10])


Create a weights variable and a biases variable of the appropriate shapes.

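 - Initialize the weights variable from a truncated normal distribution using tf.truncated_normal(...) - this is better than setting weights to zero because it breaks the symmetry between neurons: if all weights start out equal, every neuron in a layer receives the same gradient during backpropagation and they never learn different features. Here's a more in depth discussion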
 - The bias variable should also be set to a small value, such as 0.1. Do this by using tf.constant(...) and inputting the value and the appropriate shape
 - When you multiply the feature vector X and the weights variable, the result should be the same shape as the bias tensor so they can be added
 - Make sure to use tf.matmul() when multiplying matrices. Using * will multiply element wise
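A small plain-Python sketch of the difference between a matrix product and an element-wise product, and of how a single bias row is added to every example. The helpers `matmul`, `elementwise` and `add_bias` are written here for illustration only; they mimic what tf.matmul, `*` and broadcast `+` do on tiny 2x2 inputs.

```python
def matmul(A, B):
    """Matrix product of A (n x k) and B (k x m), giving an n x m result."""
    k = len(B)
    assert all(len(row) == k for row in A), "inner dimensions must match"
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def elementwise(A, B):
    """Multiply matching entries of two equally shaped matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def add_bias(Z, b):
    """Add the single bias row b (shape 1 x m) to every row of Z (n x m)."""
    return [[z + bj for z, bj in zip(row, b[0])] for row in Z]


X = [[1.0, 2.0], [3.0, 4.0]]   # 2 examples, 2 features each
W = [[1.0, 0.0], [0.0, 2.0]]   # 2 inputs -> 2 outputs
b = [[0.5, 0.5]]               # one bias per output, shape 1 x 2

print(matmul(X, W))                # [[1.0, 4.0], [3.0, 8.0]]
print(elementwise(X, W))           # [[1.0, 0.0], [0.0, 8.0]]
print(add_bias(matmul(X, W), b))   # [[1.5, 4.5], [3.5, 8.5]]
```

The two products give different results on the same inputs, and the bias row is reused for each example in the batch.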
 
Declare each layer in the network and the final logits by:
 - Creating variables for weights and biases of the appropriate sizes
 - Applying ReLU on $X \cdot W + b$ (hidden layers only; the output layer returns the raw logits)

Network Configurations:
 - First layer has 784 input features and 256 output features
 - Second layer has 256 input features and 256 output features
 - Third layer has 256 input features and 10 output features
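As a quick sanity check on the configuration above, this sketch counts the trainable parameters (weights plus biases) and confirms that each layer's output size matches the next layer's input size. The names `layer_sizes`, `count_parameters` and `layers_chain` are illustrative only.

```python
# (input features, output features) for each layer, as listed above
layer_sizes = [(784, 256), (256, 256), (256, 10)]


def count_parameters(layers):
    """Weights (n_in * n_out) plus one bias per output, summed over layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in layers)


def layers_chain(layers):
    """True if every layer's output size equals the next layer's input size."""
    return all(a[1] == b[0] for a, b in zip(layers, layers[1:]))


for n_in, n_out in layer_sizes:
    print(n_in, "->", n_out, ":", n_in * n_out + n_out, "parameters")

print(layers_chain(layer_sizes))      # True
print(count_parameters(layer_sizes))  # 269322
```

Most of the parameters sit in the first layer, because it connects all 784 pixels to 256 hidden neurons.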


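#### Note: What does ReLU do?

> ReLU (rectified linear unit) computes max(0, x) element-wise. It adds the nonlinearity between layers; without it, stacked layers of the form $X \cdot W + b$ would collapse into a single linear transformation.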
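A minimal sketch of ReLU on scalars, assuming the usual definition max(0, x). The two toy "layers" `f` and `g` are hypothetical one-weight linear maps, used only to show why a nonlinearity between layers matters.

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp the rest to 0."""
    return max(0.0, x)


def f(x):
    return 2.0 * x   # toy linear "layer" with weight 2


def g(x):
    return 3.0 * x   # toy linear "layer" with weight 3


print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5]])  # [0.0, 0.0, 0.0, 1.5]

# Without an activation, two linear layers are just one linear layer (6 * x)
print(g(f(-1.0)))        # -6.0
# With ReLU in between, the result is no longer a linear function of x
print(g(relu(f(-1.0))))  # 0.0
print(g(relu(f(1.0))))   # 6.0
```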

In [5]:
with tf.name_scope('network'):
    # First Layer (Hidden Layer)
    W1 = tf.Variable(tf.truncated_normal([784, 256], stddev = 0.1)) # weights
    b1 = tf.Variable(tf.constant(0.1, shape = [1, 256])) # bias
    layer1 = tf.nn.relu(tf.matmul(X, W1) + b1) # layer
    # + operator is broadcast addition
    
    # Second Layer (Hidden Layer)
    W2 = tf.Variable(tf.truncated_normal([256, 256], stddev = 0.1))
    b2 = tf.Variable(tf.constant(0.1, shape = [1, 256])) 
    layer2 = tf.nn.relu(tf.matmul(layer1, W2) + b2) 
    
    # Third Layer (Output Layer)
    W_out = tf.Variable(tf.truncated_normal([256, 10], stddev = 0.1))
    b_out = tf.Variable(tf.constant(0.1, shape = [1, 10])) 
    logits = tf.matmul(layer2, W_out) + b_out

#### Note: Didn't use ReLU on the third layer because it outputs the raw logits. The softmax nonlinearity is applied inside the loss function in the next cell, and we don't want to apply a nonlinearity twice

Compute the cross-entropy using tf.nn.softmax_cross_entropy_with_logits. This applies the softmax function to the logits before calculating the cross-entropy, so pass in the raw logits. The loss is the mean of the per-example cross-entropy over the batch.
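To see what the loss is computing, here is a plain-Python sketch of softmax followed by cross-entropy against one-hot labels, averaged over a batch. It is a simplified stand-in for illustration, not TensorFlow's implementation; the toy logits and 3-class labels are made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot_label):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


# Toy batch: 2 examples, 3 classes (hypothetical values)
batch_logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

entropy = [cross_entropy(l, t) for l, t in zip(batch_logits, batch_labels)]
loss = sum(entropy) / len(entropy)   # mean over the batch

print(entropy)
print(loss)
```

For the second example all logits are equal, so each class gets probability 1/3 and the cross-entropy is log(3), roughly 1.0986. A confident correct prediction gives a smaller value.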

In [6]:
with tf.name_scope('cross_entropy_loss'):
    entropy = tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = y)
    loss = tf.reduce_mean(entropy)

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.



Declare the optimizer as the GradientDescentOptimizer with an appropriate learning rate. Set it to minimize the loss.
 - Note: When running the optimizer, if the loss is NaN or increases with each epoch, try decreasing the learning rate
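The effect of the learning rate can be seen on a toy problem. This sketch runs plain gradient descent on f(w) = w**2 (gradient 2*w), which is an assumption chosen for simplicity and has nothing to do with the MNIST loss surface; the function name `gradient_descent` is illustrative.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 and return the loss after each update."""
    losses = []
    for _ in range(steps):
        grad = 2.0 * w                 # derivative of w**2
        w = w - learning_rate * grad   # gradient descent update
        losses.append(w ** 2)
    return losses


small = gradient_descent(0.1)   # each step multiplies w by 0.8
large = gradient_descent(1.1)   # each step multiplies w by -1.2

print(small[-1] < small[0])  # True: the loss shrinks
print(large[-1] > large[0])  # True: the loss grows, the step overshoots
```

With the larger rate every update jumps past the minimum and lands further away, which is the "loss increasing with each epoch" symptom mentioned above.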

In [7]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.01).minimize(loss)

Compute the accuracy by:
 - using tf.equal on the predicted label and the true label
 - casting that to a float and computing the mean over all examples
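The two steps above can be sketched in plain Python. The `argmax` and `accuracy` helpers and the 3-class toy predictions are illustrative assumptions; they mirror the argmax / equal / cast / mean sequence without using TensorFlow.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of rows where the predicted class matches the one-hot label."""
    correct = [float(argmax(p) == argmax(t)) for p, t in zip(predictions, labels)]
    return sum(correct) / len(correct)


# Toy predicted probabilities and one-hot labels (hypothetical values)
preds = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]

print([argmax(p) for p in preds])  # [1, 0, 2, 1]
print(accuracy(preds, labels))     # 0.75
```

Three of the four predicted classes match the labels, so the mean of the 0/1 "correct" values is 0.75.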

In [8]:
Y_pred = tf.nn.softmax(logits)
y_pred_cls = tf.argmax(Y_pred, 1)
y_cls = tf.argmax(y, 1)
accuracy = tf.reduce_mean(tf.cast(tf.equal(y_pred_cls, y_cls), tf.float32))

Create summaries for Tensorboard

In [9]:
tf.summary.scalar('loss', loss)
tf.summary.scalar('accuracy', accuracy)
tf.summary.histogram('Weight1', W1)
tf.summary.histogram('bias1', b1)

tf.summary.histogram('Weight2', W2)
tf.summary.histogram('bias2', b2)

tf.summary.histogram('Weightout', W_out)
tf.summary.histogram('biasout', b_out)

<tf.Tensor 'biasout:0' shape=() dtype=string>


Merge all the summaries together so they can be called easily

In [10]:
summary_op = tf.summary.merge_all()

Start an Interactive Session and initialize all the global variables.
 - For each epoch, run the optimizer on each batch of X,y pairs and sum up the batch losses
 - Print the summed loss after each epoch

We set the batch size to 128 and epochs to 30. Feel free to play around with these variables. Note that the cell below does not compute validation accuracy during training; accuracy on the test set is evaluated once, after training, in the following cell
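The loop bookkeeping is simple arithmetic, sketched here in plain Python. The training set size of 55000 is an assumption (the real value comes from MNIST.train.num_examples), and `global_step` is a hypothetical helper mirroring the step index passed to the summary writer.

```python
batch_size = 128
num_train_examples = 55000  # assumption: check MNIST.train.num_examples

# Integer division drops the remainder, so only full batches are counted
n_batches = num_train_examples // batch_size
leftover = num_train_examples - n_batches * batch_size


def global_step(epoch, batch, n_batches):
    """Running batch index across epochs, usable as the TensorBoard x-axis."""
    return epoch * n_batches + batch


print(n_batches)                       # 429
print(leftover)                        # 88
print(global_step(2, 10, n_batches))   # 868

# The printed epoch loss is a sum of per-batch mean losses, so dividing by
# the number of batches gives the average loss per batch for that epoch.
last_epoch_total = 48.54450298845768   # "Epoch 29" value from the run below
print(last_epoch_total / n_batches)
```

Under that assumption, the summed loss of about 48.5 in the last epoch corresponds to an average per-batch loss of roughly 0.113.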

While running this cell, run tensorboard in terminal and view in another browser tab (port 6006)

`tensorboard --logdir=logs/train3`

In [11]:
batch_size = 128
epochs = 30
sess = tf.InteractiveSession()
writer = tf.summary.FileWriter('logs/train3', graph = tf.get_default_graph())
sess.run(tf.global_variables_initializer())
# number of full batches per epoch (integer division drops the remainder)
n_nbatches = MNIST.train.num_examples // batch_size

for i in range(epochs):
    total_loss = 0
    
    for batch in range(n_nbatches):
        X_batch, y_batch = MNIST.train.next_batch(batch_size)
        o, l, summary = sess.run([optimizer, loss, summary_op], feed_dict = {X: X_batch, y: y_batch})
        writer.add_summary(summary, i*n_nbatches + batch)
        total_loss += l
    print("Epoch {0}: {1}".format(i, total_loss))

Epoch 0: 395.8924269974232
Epoch 1: 183.04771149158478
Epoch 2: 150.46270795166492
Epoch 3: 133.85742942988873
Epoch 4: 123.47373594343662
Epoch 5: 114.7441264167428
Epoch 6: 107.80778257548809
Epoch 7: 102.40609376132488
Epoch 8: 97.07909911870956
Epoch 9: 92.80356278270483
Epoch 10: 88.9181501492858
Epoch 11: 84.59605491161346
Epoch 12: 81.6367072686553
Epoch 13: 78.58499363064766
Epoch 14: 75.24895284324884
Epoch 15: 73.02678813785315
Epoch 16: 70.46018522232771
Epoch 17: 67.78374381363392
Epoch 18: 66.3462533429265
Epoch 19: 63.916351579129696
Epoch 20: 62.017591655254364
Epoch 21: 60.29782583191991
Epoch 22: 58.39210659265518
Epoch 23: 56.858104076236486
Epoch 24: 55.28392601758242
Epoch 25: 53.64414206892252
Epoch 26: 52.65501984208822
Epoch 27: 51.16809895262122
Epoch 28: 49.9548333697021
Epoch 29: 48.54450298845768


In [12]:
X_batch, y_batch = MNIST.test.next_batch(MNIST.test.num_examples)
final_accuracy = sess.run(accuracy, feed_dict = {X: X_batch, y: y_batch})
final_accuracy

0.9621