GitHub repo for code: https://github.com/chiphuyen/tf-stanford-tutorials
* All code updated to TensorFlow 1.2 and Python 3 (on 7/11/17)

### Make a simple neural net

In [None]:
## NOTE: This code does not run as written, because the model, cost, and accuracy definitions are elided. Full example here: https://github.com/angoodkind/Tensorboard_demo_tf_v_1.1.0
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

def init_weights(shape, name):
    return tf.Variable(tf.random_normal(shape, stddev=0.01), name=name)

# This network is the same as the previous one except with an extra hidden layer + dropout
def model(X, w_h, w_h2, w_o, p_keep_input, p_keep_hidden):
    ...

#Step 1 - Get Input Data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels

#Step 2 - Create input and output placeholders for data
X = tf.placeholder("float", [None, 784], name="X")
Y = tf.placeholder("float", [None, 10], name="Y")

#Step 3 - Initialize weights
w_h = init_weights([784, 625], "w_h")
w_h2 = init_weights([625, 625], "w_h2")
w_o = init_weights([625, 10], "w_o")

#Step 4 - Add histogram summaries for weights
tf.summary.histogram("w_h_summ", w_h)
tf.summary.histogram("w_h2_summ", w_h2)
tf.summary.histogram("w_o_summ", w_o)

#Step 5 - Add dropout to input and hidden layers
p_keep_input = tf.placeholder("float", name="p_keep_input")
p_keep_hidden = tf.placeholder("float", name="p_keep_hidden")

#Step 6 - Create Model
py_x = model(X, w_h, w_h2, w_o, p_keep_input, p_keep_hidden)

#Step 7 Create cost function
with tf.name_scope("cost"):
    ...

#Step 8 Measure accuracy
with tf.name_scope("accuracy"):
    ...
    
#Step 9 Create a session
with tf.Session() as sess:
    # Step 10 - Create a log writer. Run 'tensorboard --logdir=./logs/nn_logs' to view the logs
    writer = tf.summary.FileWriter("./logs/nn_logs", sess.graph)
    merged = tf.summary.merge_all()

    # Step 11 - Initialize all variables (tf.initialize_all_variables is deprecated)
    tf.global_variables_initializer().run()

    # Step 12 - Train the model
    for i in range(100):
        for start, end in zip(range(0, len(trX), 128), range(128, len(trX)+1, 128)):
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end],
                                          p_keep_input: 0.8, p_keep_hidden: 0.5})
        summary, acc = sess.run([merged, acc_op], feed_dict={X: teX, Y: teY,
                                          p_keep_input: 1.0, p_keep_hidden: 1.0})
        writer.add_summary(summary, i)  # Write summary
        print(i, acc)                   # Report the accuracy

    

### In a terminal

In [None]:
tensorboard --logdir=./logs/nn_logs

![Cost-Accuracy](Images/cost-acc.png)

![Network-Example](Images/network-ex.png)

### Some TensorFlow tidbits
* TF separates the definition of computations from their execution
    * Session 
        * executes operations
        * encapsulates the environment in which Operation objects are executed and Tensor objects are evaluated
* Tensor = n-dimensional array
    * 0-D - scalar/number
    * 1-D - vector
    * 2-D - matrix
    
 ![data-flow](Images/data-flow.png)  
 
 * Run multiple subgraphs in one session; do not try to create multiple graphs, since each graph needs its own session
 
 * Tensors are not iterable
 
 * In many instances, TF types can be interchanged with NumPy types
 
 * Variables must be initialized before use; `tf.global_variables_initializer()` initializes all of them at once
 
 * Placeholders let you define computations without their values; the values are supplied at run time through `feed_dict`

In [None]:
# create a variable whose initial value is 2
a = tf.Variable(2, name="scalar")

# assign a * 2 to a and call that op a_times_two
a_times_two = a.assign(a * 2)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    # have to initialize a first, because the a_times_two op depends on the value of a
    sess.run(init)
    sess.run(a_times_two)  # >> 4
    sess.run(a_times_two)  # >> 8
    sess.run(a_times_two)  # >> 16

## TensorFlow assigns a*2 to a every time a_times_two is fetched. ##

### Optimizers and loss functions
* Gradient descent (or ascent)

Linear Prediction
* Given a feature vector *φ(x)* and a weight vector **w**, we define the prediction **score** to be their inner product. 
    * The score intuitively represents the degree to which the classification is positive or negative.
* The predictor is linear because the score is a linear function of **w**.
* In the context of binary classification with binary features, the score aggregates the contribution of each feature, weighted appropriately. 
    * We can think of each feature present as voting on the classification.
* What if we don't know **w**?
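
The scoring rule above can be sketched in a few lines of plain Python. This is only an illustration: the feature names, the weights, and the convention of predicting +1 when the score is non-negative are assumptions, not part of the original notes.

```python
def score(weights, features):
    """Inner product w . phi(x), with both stored as sparse dicts (name -> value)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def predict(weights, features):
    """Binary prediction: +1 if the score is non-negative, else -1 (assumed convention)."""
    return 1 if score(weights, features) >= 0 else -1

# Hypothetical weight vector and binary feature vector, made up for illustration
w = {"contains_free": -2.0, "from_known_sender": 3.0, "has_link": -0.5}
phi_x = {"contains_free": 1, "has_link": 1}

print(score(w, phi_x))    # -2.5: each present feature "votes" with its weight
print(predict(w, phi_x))  # -1
```

Features that are absent from `phi_x` contribute nothing, which is why the score reads as a weighted vote among the features that are present.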

Loss function
* Def: A loss function Loss(x, y, **w**) quantifies how unhappy you would be if you used **w** to make a prediction on x when the correct output is y. It is the object we want to minimize.
* Key: need to set **w** to make global tradeoffs — not every example can be happy.
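
As a rough sketch of the definition above, here are two common choices of loss for binary classification with labels in {-1, +1}. The notes do not name a specific loss, so the zero-one and hinge losses, the toy examples, and the helper names are all assumptions chosen for illustration.

```python
def dot(w, phi):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, phi))

def zero_one_loss(phi, y, w):
    """1 if the prediction disagrees with the label y (margin <= 0), else 0."""
    margin = dot(w, phi) * y
    return 1 if margin <= 0 else 0

def hinge_loss(phi, y, w):
    """Penalizes any example whose margin is below 1."""
    return max(0.0, 1.0 - dot(w, phi) * y)

def train_loss(loss, examples, w):
    """Average loss over all training examples."""
    return sum(loss(phi, y, w) for phi, y in examples) / len(examples)

# Hypothetical data: the first and third examples share features but disagree on the label
examples = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 0.0], -1)]
w = [2.0, -1.0]

print(train_loss(zero_one_loss, examples, w))  # one of three examples is misclassified
print(train_loss(hinge_loss, examples, w))     # 1.0
```

Because the first and third examples conflict, no choice of `w` drives the training loss to zero here, which is the global tradeoff the definition refers to.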

Gradients
* The gradient $\nabla_{w}$TrainLoss(**w**) is the direction that increases the loss the most.
    * Tells us which direction to move in: the opposite one, to decrease the loss
    * Gradient descent
        * step size
        * number of iterations
        * SLOW - each update must go through every piece of data
            * Solution: stochastic updates (update on one example at a time)
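
The difference between the two update schemes can be sketched in plain Python. This is a minimal illustration, assuming a squared loss and a small noise-free toy dataset; the function names, step size, and iteration counts are hypothetical and not taken from the notes.

```python
import random

def dot(w, phi):
    return sum(wi * xi for wi, xi in zip(w, phi))

def squared_loss_gradient(phi, y, w):
    """Gradient of (w . phi - y)^2 with respect to w."""
    residual = dot(w, phi) - y
    return [2.0 * residual * xi for xi in phi]

def gradient_descent(examples, dim, step_size=0.1, num_iters=200):
    """Batch updates: every step averages the gradient over ALL examples."""
    w = [0.0] * dim
    for _ in range(num_iters):
        grad = [0.0] * dim
        for phi, y in examples:
            g = squared_loss_gradient(phi, y, w)
            grad = [a + b / len(examples) for a, b in zip(grad, g)]
        # Move against the gradient to decrease the training loss
        w = [wi - step_size * gi for wi, gi in zip(w, grad)]
    return w

def stochastic_gradient_descent(examples, dim, step_size=0.1, num_epochs=200, seed=0):
    """Stochastic updates: take a step after looking at a single example."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(num_epochs):
        order = list(examples)
        rng.shuffle(order)
        for phi, y in order:
            g = squared_loss_gradient(phi, y, w)
            w = [wi - step_size * gi for wi, gi in zip(w, g)]
    return w

# Toy data generated by y = 2*x1 - 1*x2 (an assumption for this sketch)
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]

w_gd = gradient_descent(examples, dim=2)
w_sgd = stochastic_gradient_descent(examples, dim=2)
print(w_gd)   # should be close to [2, -1]
print(w_sgd)  # should be close to [2, -1]
```

The batch version touches every example before each update, while the stochastic version makes one update per example, so it takes many more (cheaper, noisier) steps for the same number of passes over the data.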
            
    