<h4>Backprop Refactor</h4>

<p></p>
One goal of refactoring the backprop equations is to keep them from growing longer 
as we go deeper into the stages. Each stage should rely only on the terms computed before that stage and on the output
of the stage itself. 
<p></p>
Refactor the backprop equations into terms that have the same form at every stage, so that we can plug in different
derivatives as the layers and activation functions change. 
For a single logistic neuron, the update for one weight is $\Delta w_i = \eta \delta x_i$ 
<p></p>
where $\delta$ is:
<p></p>
$\delta = (y-\hat y)f'(h)$
<p></p>
where $f'(h)$ refers to the derivative of the activation function and $h$ is the weighted sum of the inputs: 
<p></p>
$h=\sum w_i x_i$
<p></p>
This refactor shows that the change in the hidden layer weights depends on the derivative of the error function multiplied
by the derivative of the activation function, and on the output values of the hidden layer. For the sigmoid,
$f'(h) = f(h)(1 - f(h))$, so the derivative of the activation function can be evaluated directly from the values of the hidden layer output. 
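<p></p>
The idea of plugging in different derivatives can be sketched as below. This is a minimal illustration, not part of the course code: the helper name delta_rule_step and the tanh variant are assumptions added here, and the inputs, weights, target and learning rate are the ones used in the next cell.

```python
import numpy as np

def sigmoid(h):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-h))

def sigmoid_prime(h):
    """Derivative of the logistic activation, written in terms of its output."""
    s = sigmoid(h)
    return s * (1.0 - s)

def tanh_prime(h):
    """Derivative of tanh."""
    return 1.0 - np.tanh(h) ** 2

def delta_rule_step(x, y, w, activation, activation_prime, learnrate):
    """Hypothetical helper: one weight step, Delta w_i = eta * delta * x_i."""
    h = np.dot(w, x)                            # h = sum_i w_i x_i
    y_hat = activation(h)                       # forward pass
    delta = (y - y_hat) * activation_prime(h)   # error term
    return learnrate * delta * x

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, -0.5, 0.3, 0.1])
y = 0.5

# Same formula, two different activation/derivative pairs
step_sigmoid = delta_rule_step(x, y, w, sigmoid, sigmoid_prime, learnrate=0.5)
step_tanh = delta_rule_step(x, y, w, np.tanh, tanh_prime, learnrate=0.5)
print("sigmoid step:", step_sigmoid)
print("tanh step:", step_tanh)
```

Only the activation and its derivative change between the two calls; the form of the update stays the same.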

In [9]:
#from Udacity DL Course lesson 2: week 12. Gradient Descent:The Code
#this is a single gradient step, not full gradient descent: there is no loop over epochs and no convergence check
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

def sigmoid_prime(x): 
    """
    Derivative of the sigmoid function
    """
    return sigmoid(x) * (1 - sigmoid(x))


#the tf xor gate has 4 different inputs combos for 2 nodes
#it has to u
learnrate = 0.5
x = np.array([1, 2, 3, 4])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5, 0.3, 0.1])

### Calculate one gradient descent step for each weight
### Note: Some steps have been consolidated, so there are
###       fewer variable names than in the above sample code

# TODO: Calculate the node's linear combination of inputs and weights
h = np.dot(np.transpose(x),w)

# TODO: Calculate output of neural network
nn_output = sigmoid(h)

# TODO: Calculate error of neural network
error = np.subtract(y,nn_output)

# TODO: Calculate the error term
#       Remember, this requires the output gradient, which we haven't
#       specifically added a variable for.
error_term = error*sigmoid_prime(h)

# TODO: Calculate change in weights
del_w = learnrate*error_term*x

#this is vectorized? no this is completely useless

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)

Neural Network output:
0.689974481128
Amount of Error:
-0.189974481128
Change in Weights:
[-0.02031869 -0.04063738 -0.06095608 -0.08127477]


<h4>Tensorflow inputs</h4>
Inputs to a TensorFlow graph come from a placeholder, which lets values be supplied later, at run time. Input data is streamed through 
the placeholder. Defining a placeholder requires a dtype (here float32); the shape and name are optional. If you don't assign a name, the runtime
will assign one. Giving it the same name as the Python variable makes the two easier to match. 
<p></p>
a = tf.placeholder(tf.float32, shape=(2,2), name="a")
<p></p>
A TensorFlow shape is defined like a NumPy shape, but a dimension can also be None, which leaves that dimension unspecified (for example, the batch size) until data is fed in. 




In [80]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import OneHotEncoder

def trans_for_ohe(labels):
    """Transform a flat list of labels to what one hot encoder needs."""
    print ("labels before reshape:",labels.shape)
    #can use newaxis. 
    a = np.array(labels).reshape(len(labels), -1)
    print("labels after reshape:",a.shape)
    return a


XOR_X=np.array([[0,0],[0,1],[1,0],[1,1]])
XOR_Y = np.array([0,1,1,0])
print("len:",len(XOR_X))
XOR_X = tf.placeholder(tf.float32, shape=[None,len(XOR_X[0])], name="x")


#test different onehot
enc = OneHotEncoder()
enc.fit(trans_for_ohe(XOR_Y))
XOR_T = enc.transform(trans_for_ohe(XOR_Y)).toarray()
print("XOR_T:",XOR_T)

# tf.one_hot needs depth equal to the number of classes (2 for the XOR labels 0 and 1)
yonehot = tf.one_hot(XOR_Y,2)
#print(yonehot)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
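    sess.run(tf.Print(yonehot,[yonehot]))
    sess.run(tf.shape(yonehot))
    # tf.shape builds a graph op, so printing it shows the Tensor object rather than
    # the shape values; evaluate it with sess.run to get the actual shape
    print("shape XOR_X:",tf.shape(XOR_X))
    print("shape onehot:",tf.shape(yonehot))
    #yonehot.eval()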

len: 4
labels before reshape: (4,)
labels after reshape: (4, 1)
labels before reshape: (4,)
labels after reshape: (4, 1)
XOR_T: [[ 1.  0.]
 [ 0.  1.]
 [ 0.  1.]
 [ 1.  0.]]
shape XOR_X: Tensor("Shape_21:0", shape=(2,), dtype=int32)
shape onehot: Tensor("Shape_22:0", shape=(2,), dtype=int32)


<h4>Gradient Descent</h4>
Given the backprop equations for changing the weights, this section adds 
a learning rate and shows how to update the weight values given 
a loss function. The code below runs gradient descent with these definitions: 
forward pass = $\hat y = f(\sum w_i x_i)$
<p></p>
error term = $\delta = (y - \hat y) \cdot f'(\sum w_i x_i)$
<p></p>
update the weight step $\Delta w_i = \Delta w_i + \delta x_i$
<p></p>
update the weights $w_i = w_i + \eta \frac{\Delta w_i}{m}$
<p></p>
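The four definitions above can also be applied to all records at once. The sketch below is an assumption-laden illustration rather than the course code: it uses synthetic data (the names X, true_w and gradient_descent_epoch are made up here) instead of binary.csv, and it replaces the per-record loop with matrix products.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

# Synthetic data (assumption): 200 records, 3 features, labels from a made-up weight vector
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w > 0).astype(float)

def gradient_descent_epoch(X, y, weights, learnrate):
    """One epoch: accumulate delta * x over all m records, then step by eta * Delta w / m."""
    y_hat = sigmoid(X @ weights)                # forward pass for every record
    delta = (y - y_hat) * y_hat * (1 - y_hat)   # error term, using f'(h) = y_hat * (1 - y_hat)
    del_w = X.T @ delta                         # sum over records of delta * x_i
    return weights + learnrate * del_w / len(X)

def mse(X, y, weights):
    return np.mean((sigmoid(X @ weights) - y) ** 2)

weights = np.zeros(3)
loss_before = mse(X, y, weights)
for _ in range(500):
    weights = gradient_descent_epoch(X, y, weights, learnrate=0.5)
loss_after = mse(X, y, weights)

print("loss before:", loss_before)
print("loss after:", loss_after)
```

The product X.T @ delta computes the same sum as looping over the records and adding error_term * x each time, so it can stand in for the inner loop of the cell that follows.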





In [10]:
#From Udacity DL Lesson 2 13: Implementing Gradient Descent. Run this first before cell below
import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:,field] = (data[field]-mean)/std
    
# Split off random 10% of the data for testing
np.random.seed(42)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
# .ix is deprecated; the index holds labels, so use label-based .loc
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']


In [14]:
#From Udacity DL Lesson 2 13: Implementing Gradient Descent. Run cell above to replace imports
import numpy as np
#from data_prep import features, targets, features_test, targets_test


def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# TODO: We haven't provided the sigmoid_prime function like we did in
#       the previous lesson to encourage you to come up with a more
#       efficient solution. If you need a hint, check out the comments
#       in solution.py from the previous lecture.

# Use the same seed to make debugging easier
np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target

        # Note: We haven't included the h variable from the previous
        #       lesson. You can add it if you want, or you can calculate
        #       the h together with the output

        # TODO: Calculate the output
        output = sigmoid(np.dot(x,weights))

        # TODO: Calculate the error
        error = y-output

        # TODO: Calculate the error term
        error_term = error * output*(1-output)

        # TODO: Calculate the change in weights for this sample
        #       and add it to the total weight change
        del_w += error_term * x

    # TODO: Update weights using the learning rate and the average change in weights
    weights += learnrate * del_w/n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss


# Calculate accuracy on test data
tes_out = sigmoid(np.dot(features_test, weights))
predictions = tes_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

Train loss:  0.2627609385
Train loss:  0.209286194093
Train loss:  0.200842929081
Train loss:  0.198621564755
Train loss:  0.197798513967
Train loss:  0.197425779122
Train loss:  0.197235077462
Train loss:  0.197129456251
Train loss:  0.197067663413
Train loss:  0.197030058018
Prediction accuracy: 0.725


In [15]:
#From Udacity DL Lesson 2 14.: Multilayer Perceptron

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(4)

weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))


# TODO: Make a forward pass through the network

hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in) 

print('Hidden-layer Output:')
print(hidden_layer_out)

output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)

print('Output-layer Output:')
print(output_layer_out)






Hidden-layer Output:
[ 0.41492192  0.42604313  0.5002434 ]
Output-layer Output:
[ 0.49815196  0.48539772]


In [81]:
#From Udacity DL class lecture 2 15 Backpropagation
import tensorflow as tf

#second version... modify for input placeholders
XOR_X = [[0, 0], [0, 1], [1, 0], [1, 1]]  # Features
XOR_Y = [0, 1, 1, 0]  # Class labels

# Assumed sizes: nb_classes matches the [None, 2] label placeholder below;
# nb_hidden_nodes is not defined anywhere in the notebook, so 2 is a guess
nb_hidden_nodes = 2
nb_classes = 2

# One column per input feature: len(XOR_X[0]) is 2, while len(XOR_X) is the number of samples (4)
x = tf.placeholder(tf.float32,shape=[None,len(XOR_X[0])],name="x")

y = tf.placeholder(tf.float32,shape=[None,2],name="y")

w1 = tf.Variable(tf.random_uniform([2, nb_hidden_nodes], -1, 1, seed=0),
                 name="w1")
w2 = tf.Variable(tf.random_uniform([nb_hidden_nodes, nb_classes], -1, 1,
                                   seed=0),
                 name="w2")
b1 = tf.Variable(tf.zeros([nb_hidden_nodes]), name="b1")
b2 = tf.Variable(tf.zeros([nb_classes]), name="b2")

In [51]:
import tensorflow as tf
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

x1 = tf.convert_to_tensor(x)
weights_input_hidden1 = tf.convert_to_tensor(weights_input_hidden)

print("x1.get_shape:",x1.get_shape())

x1shape = tf.shape(x1)
with tf.Session() as sess:
    print("x1shape:",sess.run(x1shape))

## Forward pass
# tf.matmul needs rank-2 operands, so reshape x from shape (3,) to (1, 3)
hidden_layer_input = tf.matmul(tf.reshape(x1, [1, -1]), weights_input_hidden1)
hidden_layer_output_op = tf.sigmoid(hidden_layer_input)

with tf.Session() as sess:
    # No tf.Variable is used here, so no initializer has to be run
    print (sess.run(hidden_layer_input))
    # Evaluate to a NumPy array and flatten (1, 2) -> (2,) for the NumPy code below
    hidden_layer_output = sess.run(hidden_layer_output_op).ravel()
    print (hidden_layer_output)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

## Backwards pass
## TODO: Calculate output error
## the error refers to the derivative of the least squares error: target minus the NN output
error = target-output

# TODO: Calculate error term for output layer
output_error_term = error * output*(1-output)

# TODO: Calculate error term for hidden layer
hidden_error = np.dot(output_error_term, weights_hidden_output)
hidden_error_term = hidden_error*hidden_layer_output*(1-hidden_layer_output)

# TODO: Calculate change in weights for hidden layer to output layer
delta_w_h_o = learnrate*output_error_term*hidden_layer_output

# TODO: Calculate change in weights for input layer to hidden layer
delta_w_i_h = learnrate*hidden_error_term*x[:,None]

print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)

x1.get_shape: (3,)
x1shape: [3]

<h4>Backpropagation with 1 hidden layer refactor</h4>
<p></p>
error_hidden is the same as hidden_error_term. error_hidden is not the same as hidden_error; the two have very different
definitions.
<p></p>
error_hidden = $\delta_j$
<p></p>
hidden_error = $w_{jk}\delta_k$
<p></p>
error: the derivative of the least squares error at the output stage. They reversed the signs, giving $y-\hat y$
error = target-output
<p></p>
output_error_term: the error term above times the derivative of the logistic: $(y-\hat y) \cdot \hat y(1-\hat y)$
<p></p>    
output_error_term = error * output * (1-output)
<p></p>
hidden_error=np.dot(weights_hidden_output,output_error_term)  
<p></p>
hidden_error_term = hidden_error * hidden_layer_output * (1 - hidden_layer_output)
<p></p>
The delta rule consists of 3 terms: 
<li>the learning rate, times</li>
<li>the delta error term, which is defined above and has 2 separate definitions, one for the output layer
and one for the hidden layer. The hidden layer definition contains the backpropagated term from the stage processed before it, the output layer. </li>
<p></p>
$\delta^{o} = (y-\hat y) \cdot f'(Wa)$ 
where $f'$ is the derivative of the stage's activation and their term $Wa$ is our $z$, where $z=\sum_j w_j a_j$ is the weighted sum of the hidden layer outputs $a_j$ 
<p></p>
$\delta^{h} = W\delta^{o}f'(h)$ where $W$ is the weight matrix between the hidden layer and the output layer, $\delta^{o}$ 
is the backpropagated error term from the output layer and $f'(h)$ is the derivative of the hidden layer activation,
with $h$ the output of the hidden layer. For the sigmoid they really mean $f'(h) = h(1-h)$. 
<p></p>
<li>input into layer or output of stage before</li>
delta_who = learnrate * output_error_term * hidden_layer_output
<p></p>
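If x is a 1-D array, use x[:, None] to turn it into a column vector, so that multiplying by the hidden error terms broadcasts to the shape of the input-to-hidden weight matrix: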
delta_whi = learnrate * hidden_error_term * x[:, None]
<p></p>
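The definitions above can be checked end to end with a short NumPy-only sketch. It reuses the input, target, learning rate and weights from the earlier TensorFlow cell; doing the forward pass in NumPy rather than TensorFlow is a simplification made here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Values taken from the earlier cell in this notebook
x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5
weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])
weights_hidden_output = np.array([0.1, -0.3])

# Forward pass
hidden_layer_output = sigmoid(np.dot(x, weights_input_hidden))
output = sigmoid(np.dot(hidden_layer_output, weights_hidden_output))

# Backward pass
error = target - output                                   # y - y_hat
output_error_term = error * output * (1 - output)         # delta^o
hidden_error = np.dot(weights_hidden_output, output_error_term)   # w_jk * delta_k
hidden_error_term = hidden_error * hidden_layer_output * (1 - hidden_layer_output)  # delta^h

# Weight steps: learning rate * error term * input into the layer
delta_who = learnrate * output_error_term * hidden_layer_output
delta_whi = learnrate * hidden_error_term * x[:, None]    # (3, 1) * (2,) broadcasts to (3, 2)

print("hidden to output:", delta_who)
print("input to hidden:")
print(delta_whi)
```

Each weight step has the same shape as the weight array it updates: (2,) for the hidden-to-output weights and (3, 2) for the input-to-hidden weights.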

In [20]:
#From Udacity Class lesson 2 16 Implementing Backpropagation run this first before cell below
import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:,field] = (data[field]-mean)/std
    
# Split off random 10% of the data for testing
np.random.seed(21)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
# .ix is deprecated; the index holds labels, so use label-based .loc
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']


In [22]:
#From Udacity DL class Lesson 2 16 Implementing BackPropagation
import numpy as np
#data_prep is supposed to be executed in the above cell first
#from data_prep import features, targets, features_test, targets_test

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))


# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = features.shape
last_loss = None
# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # TODO: Calculate the output
        hidden_input = np.dot(x,weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output,weights_hidden_output))

        ## Backward pass ##
        # TODO: Calculate the network's prediction error
        error = y - output 

        # TODO: Calculate error term for the output unit
        output_error_term = error*output*(1-output)

        ## propagate errors to hidden layer

        # TODO: Calculate the hidden layer's contribution to the error
        # (backpropagate the output error term through the hidden-to-output weights)
        hidden_error = np.dot(output_error_term, weights_hidden_output)
        
        # TODO: Calculate the error term for the hidden layer
        hidden_error_term = hidden_error * hidden_output * (1-hidden_output)
        
        # TODO: Update the change in weights
        del_w_hidden_output += output_error_term * hidden_output 
        del_w_input_hidden += hidden_error_term * x[:,None]

    # TODO: Update weights
    weights_input_hidden += learnrate * del_w_input_hidden/n_records
    weights_hidden_output += learnrate * del_w_hidden_output/n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        # Use the full feature matrix here, not the last record x from the loop
        hidden_output = sigmoid(np.dot(features, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output,
                             weights_hidden_output))
        loss = np.mean((out - targets) ** 2)

        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))


In [9]:
import tensorflow as tf

tf.reset_default_graph()
input_value = tf.constant(0.5,name="input_value") 
weight = tf.Variable(1.0,name="weight") 
expected_output = tf.constant(0.0,name="expected_output") 
model = tf.multiply(input_value,weight,"model")
loss_function = tf.pow(expected_output - model,2,name="loss_function") 
optimizer = tf.train.GradientDescentOptimizer(0.025).minimize(loss_function) 

for value in [input_value,weight,expected_output,model,loss_function]: 
    tf.summary.scalar(value.op.name,value) 
summaries = tf.summary.merge_all()
init = tf.global_variables_initializer()
sess = tf.Session() 

sess.run(init)

summary_writer = tf.summary.FileWriter('log_simple_stats',sess.graph) 

#sess.run(tf.global_variables_initializer())
for i in range(100): 
    summary_writer.add_summary(sess.run(summaries),i) 
    sess.run(optimizer)
    
    

    
    
    
    

In [8]:
import tensorflow as tf
#
x = tf.Variable(2.0)
y = 2.0 * (x**3)
z = 3.0 + y**2
grad_z = tf.gradients(z,[x,y])
init = tf.global_variables_initializer()
sess = tf.Session() 
sess.run(init)
result = sess.run(grad_z)
print (result)

#I dunno which way is better the cool google internal coding style way or the easier to remember way? 

#with tf.Session() as sess:
#    sess.run(x.initializer)
#    result = sess.run(grad_z)
#    print (result)


[768.0, 32.0]


In [26]:
import tensorflow as tf

tf.reset_default_graph()

x1 = tf.constant(0.1)
x2 = tf.constant(0.3)

w1 = tf.Variable(0.4)
w2 = tf.Variable(-0.2)
z = tf.add(tf.multiply(x1,w1),tf.multiply(x2,w2))
sig_out = tf.sigmoid(z)
grad = tf.gradients(z,[w1,w2])
 
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
result = sess.run(grad)
print (result)



[0.1, 0.30000001]


In [22]:
tf.reset_default_graph()

global_step = tf.Variable(0, dtype=tf.int32, trainable=False,
 name='global_step')


x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[4,1], name="y-input")


Theta1 = tf.Variable(tf.random_uniform([2,2], -1, 1), name="Theta1")
Theta2 = tf.Variable(tf.random_uniform([2,1], -1, 1), name="Theta2")
Bias1 = tf.Variable(tf.zeros([2]), name="Bias1")
Bias2 = tf.Variable(tf.zeros([1]), name="Bias2")

A2 = tf.sigmoid(tf.matmul(x_, Theta1) + Bias1)
Hypothesis = tf.sigmoid(tf.matmul(A2, Theta2) + Bias2)

cost = tf.reduce_mean(( (y_ * tf.log(Hypothesis)) + 
        ((1 - y_) * tf.log(1.0 - Hypothesis)) ) * -1)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

XOR_X = [[0,0],[0,1],[1,0],[1,1]]
XOR_Y = [[0],[1],[1],[0]]


# These tensors are not scalars, so log them as histograms
# (tf.summary.scalar only accepts scalar values)
for value in [Theta1, Theta2, Bias1, Bias2, Hypothesis]: 
    tf.summary.histogram(value.op.name, value) 
summaries = tf.summary.merge_all()


init = tf.global_variables_initializer()
sess = tf.Session() 
sess.run(init)

# Create the writer after the session exists so it records this cell's graph
summary_writer = tf.summary.FileWriter('log_simple_stats', sess.graph) 

for i in range(100000):
    # Hypothesis depends on the placeholders, so the summaries need the feed as well
    _, summary = sess.run([train_step, summaries], feed_dict={x_: XOR_X, y_: XOR_Y})
    summary_writer.add_summary(summary, i) 
    
    if i % 1000 == 0:
        print('Epoch ', i)
        print('Hypothesis ', sess.run(Hypothesis, feed_dict={x_: XOR_X, y_: XOR_Y}))
        print('Theta1 ', sess.run(Theta1))
        print('Bias1 ', sess.run(Bias1))
        print('Theta2 ', sess.run(Theta2))
        print('Bias2 ', sess.run(Bias2))
        print('cost ', sess.run(cost, feed_dict={x_: XOR_X, y_: XOR_Y}))
        
        
writer = tf.summary.FileWriter("./logs/xor_logs", sess.graph)


In [None]:
# Siraj code, verify from videos. 



from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf


learning_rate = 0.01
training_iteration = 30
batch_size = 100
display_step = 2


x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

with tf.name_scope("Wx_b") as scope:
    model = tf.nn.softmax(tf.matmul(x, W) + b)

w_h = tf.summary.histogram("weights", W) 
b_h = tf.summary.histogram("biases", b) 

#this looks like a cross entropy. 
with tf.name_scope("cost_function") as scope:
   cost_function = -tf.reduce_sum(y*tf.log(model))
   tf.summary.scalar("cost_function", cost_function)

with tf.name_scope("train") as scope:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_function)

init = tf.global_variables_initializer()  #tf.initialize_all_variables()

# merge_all must be called; without the parentheses this is the function object, not an op
merged_summary_op = tf.summary.merge_all()

#Launch the graph

with tf.Session() as sess:
    sess.run(init)
    summary_writer = tf.summary.FileWriter('/sirajravel/logs', graph=sess.graph)

    for iteration in range(training_iteration):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)  

        # Loop over all batches inside each training iteration
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})

            avg_cost += sess.run(cost_function, feed_dict={x: batch_xs, y: batch_ys})/total_batch

            summary_str = sess.run(merged_summary_op, feed_dict={x: batch_xs, y: batch_ys})
            summary_writer.add_summary(summary_str, iteration*total_batch + i)

        if iteration % display_step == 0:
            print("Iteration", '%04d' % (iteration + 1), "cost=", "{:.9f}".format(avg_cost))
    print("Tuning completed!")

    predictions = tf.equal(tf.argmax(model,1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(predictions, "float"))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))




In [38]:
import tensorflow as tf

x1 = tf.Variable(0.1)
x2 = tf.Variable(0.3)
w1 = tf.Variable(0.4)
w2 = tf.Variable(-0.2)

y = tf.placeholder("float32")

z1 = tf.sigmoid(tf.multiply(x1,w1))
diff = y-z1

cost = tf.multiply(diff, diff)
step = tf.train.GradientDescentOptimizer(0.1)
#.minimize(cost)
compute_gradients = step.compute_gradients(cost)

#for i in range(10000):
#    batch_xs, batch_ys = 
#    sess.run(step, feed_dict = {a_0: batch_xs,
#                                y : batch_ys})
#    if i % 1000 == 0:
#        res = sess.run(acct_res, feed_dict =
#                       {a_0: mnist.test.images[:1000],
#                        y : mnist.test.labels[:1000]})
#        print (res)



In [55]:
#
#
#
import tensorflow as tf
import numpy as np

batch_size = 5
dim = 3
hidden_units = 10


# Start from an empty graph so tf.trainable_variables() only returns w and b.
# Variables left over from earlier cells have no gradient (None) with respect
# to this loss, and sess.run cannot fetch None.
tf.reset_default_graph()
sess = tf.Session()

with sess.as_default():
    x = tf.placeholder(dtype=tf.float32, shape=[None, dim], name="x")
    y = tf.placeholder(dtype=tf.int32, shape=[None], name="y")
    w = tf.Variable(initial_value=tf.random_normal(shape=[dim, hidden_units]), name="w")
    b = tf.Variable(initial_value=tf.zeros(shape=[hidden_units]), name="b")
    logits = tf.nn.tanh(tf.matmul(x, w) + b)

    # y holds integer class ids rather than one-hot rows, so use the sparse variant of the loss
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y,name="xentropy")
    # define model end


    # begin training
    optimizer = tf.train.GradientDescentOptimizer(1e-5)
    grads_and_vars = optimizer.compute_gradients(cross_entropy, tf.trainable_variables())

    # generate data
    data = np.random.randn(batch_size, dim)
    labels = np.random.randint(0, 10, size=batch_size)
    print("data shape:", data.shape," labels shape", labels.shape)
    print("data:", data)
    print("labels:", labels)
    sess.run(tf.global_variables_initializer())
    gradients_and_vars = sess.run(grads_and_vars, feed_dict={x:data, y:labels})
    for g, v in gradients_and_vars:
        if g is not None:
            print ("****************this is variable*************")
            print ("variable's shape:", v.shape)
            print (v)
            print ("****************this is gradient*************")
            print ("gradient's shape:", g.shape)
            print (g)

sess.close()

data shape: (5, 3)  labels shape (5,)
data: [[-1.24571758  1.57002731  1.09734996]
 [-0.64425685  0.41860331  0.35892995]
 [ 0.35279633 -1.16424067  0.40496092]
 [ 0.21339494 -0.65179829 -0.12803234]
 [-0.91103125  0.89913174 -0.31945746]]
labels: [3 0 8 4 3]

In [13]:
import tensorflow as tf

def sigma(x):
    return tf.div(tf.constant(1.0),
                  tf.add(tf.constant(1.0), tf.exp(tf.negative(x))))

def sigmaprime(x):
    return tf.multiply(sigma(x), tf.subtract(tf.constant(1.0), sigma(x)))


from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

a_0 = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

middle = 30
w_1 = tf.Variable(tf.truncated_normal([784, middle]))
b_1 = tf.Variable(tf.truncated_normal([1, middle]))
w_2 = tf.Variable(tf.truncated_normal([middle, 10]))
b_2 = tf.Variable(tf.truncated_normal([1, 10]))
z_1 = tf.add(tf.matmul(a_0, w_1), b_1)
#a_1 = sigma(z_1)
a_1 = tf.sigmoid(z_1)
z_2 = tf.add(tf.matmul(a_1, w_2), b_2)
#a_2 = sigma(z_2)
a_2 = tf.sigmoid(z_2)

diff = tf.subtract(a_2, y)

#d_z_2 = tf.multiply(diff, sigmaprime(z_2))
#d_b_2 = d_z_2
#d_w_2 = tf.matmul(tf.transpose(a_1), d_z_2)

#d_a_1 = tf.matmul(d_z_2, tf.transpose(w_2))
#d_z_1 = tf.multiply(d_a_1, sigmaprime(z_1))
#d_b_1 = d_z_1
#d_w_1 = tf.matmul(tf.transpose(a_0), d_z_1)

eta = tf.constant(0.5)
#step = [
#    tf.assign(w_1,
#            tf.subtract(w_1, tf.multiply(eta, d_w_1)))
#  , tf.assign(b_1,
#            tf.subtract(b_1, tf.multiply(eta,
#                               tf.reduce_mean(d_b_1, axis=[0]))))
#  , tf.assign(w_2,
#            tf.subtract(w_2, tf.multiply(eta, d_w_2)))
#  , tf.assign(b_2,
#            tf.subtract(b_2, tf.multiply(eta,
#                               tf.reduce_mean(d_b_2, axis=[0]))))
#]

acct_mat = tf.equal(tf.argmax(a_2, 1), tf.argmax(y, 1))
acct_res = tf.reduce_sum(tf.cast(acct_mat, tf.float32))

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

cost = tf.multiply(diff, diff)
step = tf.train.GradientDescentOptimizer(0.1).minimize(cost)

for i in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(10)
    sess.run(step, feed_dict = {a_0: batch_xs,
                                y : batch_ys})
    if i % 1000 == 0:
        res = sess.run(acct_res, feed_dict =
                       {a_0: mnist.test.images[:1000],
                        y : mnist.test.labels[:1000]})
        print (res)




Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
94.0
734.0
791.0
809.0
807.0
819.0
822.0
828.0
823.0
838.0
