# Managing Models over the CPU and GPU

TensorFlow allows us to utilize multiple computing devices, if we so desire, to build and train our models. Supported devices are represented by string IDs and normally consist of the following:

* "/cpu:0" The CPU of this machine
* "/gpu:0" The first GPU of this machine (If you've have a Nvidia GPU)
* "/gpu:1" The second GPU of this machine.

When a TensorFlow operation has both CPU and GPU kernels, and GPU use is enabled, TensorFlow will automatically opt to use the GPU implementation. To inspect which devices are used by the computational graph, we can initialize our TensorFlow session with the log_device_placement set to True:

In [36]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

In [4]:
sess=tf.Session(config=tf.ConfigProto(log_device_placement=True))

If we desire to use a specific device, we may do so by using with tf.device to select the appropriate device. If the chosen device is not available, however, an error will be thrown. If we would like TensorFlow to find another available device if the chosen device does not exist, we can pass the allow_soft_placement flag to the session variable as follows.

In [7]:
with tf.device('/gpu:0'):
    a=tf.constant([1.0,2.0,3.0,4.0],shape=[2,2],name='a')
    b=tf.constant([1.0,2.0],shape=[2,1],name='b')
    c=tf.matmul(a,b)

sess=tf.Session(config=tf.ConfigProto(allow_soft_placement=True,log_device_placement=True))
sess.run(c)

array([[ 5.],
       [11.]], dtype=float32)

TensorFlow also  allows us to build models that span multiple GPUs by building models in a tower-like fashion as shown in Figure below. The following code is an example of multi-GPU code: 

In [17]:
c=[]
sum_=0
for d in ['/device:GPU:0','/device:GPU:1']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2],name='a')
        b = tf.constant([1.0, 2.0], shape=[2, 1], name='b') 
        c.append(tf.matmul(a,b))
with tf.device('/cpu:0'):
    sum_=tf.add_n(c)

sess=tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(sum_)

InvalidArgumentError: Cannot assign a device for operation 'MatMul': Operation was explicitly assigned to /device:GPU:2 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0 ]. Make sure the device specification refers to a valid device.
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:2"](a_1, b)]]

Caused by op 'MatMul', defined at:
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
    app.start()
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\ipykernel\kernelapp.py", line 497, in start
    self.io_loop.start()
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\tornado\platform\asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\asyncio\base_events.py", line 421, in run_forever
    self._run_once()
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\asyncio\base_events.py", line 1426, in _run_once
    handle._run()
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\asyncio\events.py", line 127, in _run
    self._callback(*self._args)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\tornado\platform\asyncio.py", line 122, in _handle_events
    handler_func(fileobj, events)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\tornado\stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\zmq\eventloop\zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\zmq\eventloop\zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\zmq\eventloop\zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\tornado\stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\ipykernel\kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\ipykernel\ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\ipykernel\zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\IPython\core\interactiveshell.py", line 2662, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\IPython\core\interactiveshell.py", line 2785, in _run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\IPython\core\interactiveshell.py", line 2901, in run_ast_nodes
    if self.run_code(code, result):
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\IPython\core\interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-6df6d562a2fd>", line 4, in <module>
    c=tf.matmul(a,b)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\ops\math_ops.py", line 2014, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 4567, in mat_mul
    name=name)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op
    op_def=op_def)
  File "c:\users\deepblue\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'MatMul': Operation was explicitly assigned to /device:GPU:2 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0 ]. Make sure the device specification refers to a valid device.
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:2"](a_1, b)]]


<img src='images/image4.PNG'>

# Specifying the Logistic Regression Model in TensorFlow

Now that we’ve developed all of the basic concepts of TensorFlow, let’s build a simple model to tackle the MNIST dataset. As you may recall, our goal is to identify handwritten digits from 28 x 28 black-and-white images. The first network that we’ll build implements a simple machine learning algorithm known as logistic regression.

On a high level, logistic regression is a method by which we can calculate the probability that an input belongs to one of the target classes. In our case, we’ll compute the probability that a given input image is a 0, 1, ..., or 9. Our model uses a matrix W representing the weights of the connections in the network, as well as a vector b corresponding to the biases to estimate whether an input x belongs to class i using the softmax expression we talked about earlier: 

<img src='images/softmax.PNG'>

Our goal is to learn the values for W and b that most effectively classify our inputs as accurately as possible. Pictorially, we can express the logistic regression network as shown in Figure below (bias connections are not shown to reduce clutter).

<img src='images/image5.PNG'>

You’ll notice that the network interpretation for logistic regression is rather primitive. It doesn’t have any hidden layers, meaning that it is limited in its ability to learn complex relationships! We have an output softmax of size 10 because we have 10 possible outcomes for each input. Moreover, we have an input layer of size 784, one input neuron for every pixel in the image! As we’ll see, the model makes decent headway toward correctly classifying our dataset, but there’s lots of room for improvement. 

We’ll build the the logistic regression model in four phases:

* 1. inference: produces a probability distribution over the output classes given a minibatch.

* 2. loss: computes the value of the error function (in this case, the cross-entropy loss).

* 3. training: responsible for computing the gradients of the model’s parameters and updating the model.

* 4. evaluate: will determine the effectiveness of a model 

Given a minibatch, which consists of 784-dimensional vectors representing MNIST images, we can represent logistic regression by taking the softmax of the input multiplied with a matrix representing the weights connecting the input and output layer. Each row of the output tensor represents the probability distribution over output classes for each corresponding data sample in the minibatch:

In [38]:
mnist=input_data.read_data_sets('data',one_hot=True)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting data\train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting data\train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting data\t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data\t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


In [37]:
def inference(x):
    tf.constant_initializer(value=0)
    W=tf.get_variable("W",[784,10])
    b=tf.get_variable("b",[10])
    output=tf.nn.softmax(tf.matmul(x,W)+b)
    return output

Now, given the correct labels for a minibatch, we should be able to compute the average error per data sample. We accomplish this using the following code snippet that computes the cross-entropy loss    over a minibatch:


In [19]:
def loss(output,y):
    dot_product=y*tf.log(output)
    
    ''' Reduction along axis 0 collapses each column into a 
     single value, whereas reduction along axis 1 collapses 
     each row into a single value. In general, reduction along
     axis i collapses the ith dimension of a tensor to size 1.'''
    x_entropy=-tf.reduce_sum(dot_product,reduction_indices=1)
    loss=tf.reduce_mean(x_entropy)
    return loss

Then, given the current cost incurred, we’ll want to compute the gradients and modify the parameters of the model appropriately. TensorFlow makes this easy by giving us access to built-in optimizers that produce a special train operation that we can run via a TensorFlow session when we minimize them. Note that when we create the training operation, we also pass in a variable that represents the number of minibatches that have been processed. Each time the training operation is run, this step variable is incremented so that we can keep track of progress:

In [24]:
def training(cost,global_step):
    print("Cost: ",cost)
    optimizer=tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    train_op=optimizer.minimize(cost,global_step=global_step)
    return train_op

Finally, we put together a simple computational subgraph to evaluate the model on the validation or  test set:


In [49]:
def evaluate(output,y):
    correct_prediction=tf.equal(tf.argmax(output,1),tf.argmax(y,1))
    accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float64))
    print(accuracy)
    return accuracy

This completes Tensorflow graph setup for the logistic regression model.

## Logging and Training the Logistic Regression Model

In [45]:
# Parameters
learning_rate=0.01
training_epochs=100
batch_size=100
display_step=1

with tf.Graph().as_default():
    # mnist data image shape 28*28=784
    x=tf.placeholder("float",[None,784])
    y=tf.placeholder("float",[None,10])
    output=inference(x)
    
    cost=loss(output,y)
    global_step=tf.Variable(0,name='global_step',trainable=False)
    train_op=training(cost,global_step)
    eval_op=evaluate(output,y)
    summary_op=tf.summary.merge_all()
    saver=tf.train.Saver()
    sess=tf.Session()
    
    summary_writer=tf.summary.FileWriter("logistic_logs/",graph_def=sess.graph_def)
    
    init_op=tf.initialize_all_variables()
    
    sess.run(init_op)
    
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost=0
        total_batch=int(mnist.train.num_examples/batch_size)
        # loop over all batchs
        for i in range(total_batch):
            mbatch_x,mbatch_y=mnist.train.next_batch(batch_size)
            # Fit training using batch data
            feed_dict={x : mbatch_x,y : mbatch_y}
            sess.run(train_op,feed_dict)
            # Compute average loss
            minibatch_cost = sess.run(cost,feed_dict)
            avg_cost+=minibatch_cost/total_batch
        # Displat logs per epoch step
        if epoch%display_step==0:
            val_feed_dict={
                x:mnist.validation.images,
                y:mnist.validation.labels
            }
            accuracy=sess.run(eval_op,feed_dict)
            print("Validation Error: ",(1-accuracy))
            #summary_str=sess.run(summary_op,feed_dict=feed_dict)
            saver.save(sess,"logistic_logs/model-checkpoint",global_step)
        
print("Operation finised")


Cost:  Tensor("Mean:0", shape=(), dtype=float32)
Validation Error:  0.13999998569488525
Validation Error:  0.12000000476837158
Validation Error:  0.11000001430511475
Validation Error:  0.1600000262260437
Validation Error:  0.12000000476837158
Validation Error:  0.1899999976158142
Validation Error:  0.07999998331069946
Validation Error:  0.12999999523162842
Validation Error:  0.07999998331069946
Validation Error:  0.11000001430511475
Validation Error:  0.11000001430511475
Validation Error:  0.04000002145767212
Validation Error:  0.07999998331069946
Validation Error:  0.12000000476837158
Validation Error:  0.04000002145767212
Validation Error:  0.0899999737739563
Validation Error:  0.050000011920928955
Validation Error:  0.10000002384185791
Validation Error:  0.11000001430511475
Validation Error:  0.0899999737739563
Validation Error:  0.04000002145767212
Validation Error:  0.17000001668930054
Validation Error:  0.11000001430511475
Validation Error:  0.10000002384185791
Validation Error: 

## Tensorboard Visualization

<img src='images/tensorboard.PNG'>

## Building a Multilayer Model for MNIST

<img src='images/image6.PNG'>

We can reuse most of the code from our logistic regression example with a couple of modifications:


In [65]:
def layer(input,weight_shape,bias_shape):
    print(input)
    weight_stddev=(2.0/weight_shape[0])**0.5
    w_init=tf.random_normal_initializer(stddev=weight_stddev)
    bias_init=tf.constant_initializer(value=0)
    W=tf.get_variable("W",weight_shape,initializer=w_init,dtype=tf.float32)
    b=tf.get_variable("b",bias_shape,initializer=bias_init,dtype=tf.float32)
    return tf.nn.relu(tf.matmul(a=input,b=W)+b)

In [67]:
def inference(x):
    with tf.variable_scope("hidden_1"):
        hidden_1=layer(x,[784,256],[256])
    
    with tf.variable_scope("hidden_2"):
        hidden_2=layer(hidden_1,[256,256],[256])        
    
    with tf.variable_scope("output"):
        output=layer(hidden_2,[256,10],[10])
    return output

Most of the new code is self explanatory, but our initialization strategy deserves some additional description. The performance of deep neural networks very much depends on an effective initialization of its parameters. There are many features of the error surfaces of deep neural networks that make optimization using vanilla stochastic gradient descent very difficult. This problem is exacerbated as the number of layers in the model (and thus the complexity of the error surface) increases. Smart initialization is one way to mitigate this issue. 

For ReLU units, a study published in 2015 by He et al. demonstrates that the variance of weights in a network should be $ \frac{2}{n_{in}} $ where $n_{in}$ is the number of inputs coming into the neuron. The curious reader should investigate what happens when we change our initialization strategy. For example, changing tf.random_nor mal_initializer back to the tf.random_uniform_initializer we used in the logistic regression example significantly hurts performance. 

Finally, for slightly better performance, we perform the softmax while computing the loss instead of during the inference phase of the network. This results in the following modification:


In [74]:
def loss(output,y):
    xentropy=tf.nn.softmax_cross_entropy_with_logits(logits=output,labels=y)
    loss=tf.reduce_mean(xentropy)
    return loss

Running this program for 300 epochs gives us a massive improvement over the logistic regression model. The model operates with an accuracy of 98.2%, which is nearly a 78% reduction in the per-digit error rate compared to our first attempt.

In [80]:
learning_rate=0.01
training_epochs=100
batch_size=128
display_step=1

with tf.Graph().as_default():
    # mnist data image shape 28*28=784
    x=tf.placeholder(tf.float32,[None,784])
    y=tf.placeholder(tf.float32,[None,10])
    output=inference(x)
    
    cost=loss(output,y)
    global_step=tf.Variable(0,name='global_step',trainable=False)
    train_op=training(cost,global_step)
    eval_op=evaluate(output,y)
    summary_op=tf.summary.merge_all()
    saver=tf.train.Saver()
    sess=tf.Session()
    
    summary_writer=tf.summary.FileWriter("logistic_logs/",graph_def=sess.graph_def)
    
    init_op=tf.initialize_all_variables()
    
    sess.run(init_op)
    
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost=0
        total_batch=int(mnist.train.num_examples/batch_size)
        # loop over all batchs
        for i in range(total_batch):
            mbatch_x,mbatch_y=mnist.train.next_batch(batch_size)
            # Fit training using batch data
            feed_dict={x : mbatch_x,y : mbatch_y}
            sess.run(train_op,feed_dict)
            # Compute average loss
            minibatch_cost = sess.run(cost,feed_dict)
            avg_cost+=minibatch_cost/total_batch
        # Displat logs per epoch step
        if epoch%display_step==0:
            val_feed_dict={
                x:mnist.validation.images,
                y:mnist.validation.labels
            }
            accuracy=sess.run(eval_op,feed_dict)
            print("Validation Error: ",(1-accuracy)," Accuracy: ",accuracy)
            #summary_str=sess.run(summary_op,feed_dict=feed_dict)
            saver.save(sess,"logistic_logs/model-checkpoint",global_step)
        
print("Operation finised")

Tensor("Placeholder:0", shape=(?, 784), dtype=float32)
Tensor("hidden_1/Relu:0", shape=(?, 256), dtype=float32)
Tensor("hidden_2/Relu:0", shape=(?, 256), dtype=float32)
Cost:  Tensor("Mean:0", shape=(), dtype=float32)
Tensor("Mean_1:0", shape=(), dtype=float32)
Validation Error:  0.2890625  Accuracy:  0.7109375
Validation Error:  0.1953125  Accuracy:  0.8046875
Validation Error:  0.2265625  Accuracy:  0.7734375
Validation Error:  0.1796875  Accuracy:  0.8203125
Validation Error:  0.1953125  Accuracy:  0.8046875
Validation Error:  0.1484375  Accuracy:  0.8515625
Validation Error:  0.1640625  Accuracy:  0.8359375
Validation Error:  0.0625  Accuracy:  0.9375
Validation Error:  0.0546875  Accuracy:  0.9453125
Validation Error:  0.0546875  Accuracy:  0.9453125
Validation Error:  0.078125  Accuracy:  0.921875
Validation Error:  0.0625  Accuracy:  0.9375
Validation Error:  0.0390625  Accuracy:  0.9609375
Validation Error:  0.0234375  Accuracy:  0.9765625
Validation Error:  0.0546875  Accuracy