![TensorFlow logo](Images/TensorFlowLogo.png)
**TensorFlow** is an open source software library for machine learning across a range of tasks. Google developed it to meet its need for systems that can build and train neural networks to detect and decipher patterns and correlations, analogous to the learning and reasoning that humans use. It is currently used for both research and production at Google.

TensorFlow provides **multiple APIs**. The lowest-level API, **TensorFlow Core**, gives you complete programming control. We recommend TensorFlow Core for machine learning researchers and others who need fine-grained control over their models. The higher-level APIs are built on top of TensorFlow Core and are typically easier to learn and use. They also make repetitive tasks easier and more consistent between different users. A high-level API like tf.contrib.learn helps you manage data sets, estimators, training and inference. The code in this tutorial uses the TensorFlow 1.x graph-and-session API. Let's start by importing everything the tutorial needs.

In [2]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pickle

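The central unit of data in TensorFlow is the **tensor**: a set of values arranged in an array with any number of dimensions. A tensor's rank is its number of dimensions. Here are some examples:

In [None]:
3                                       # a rank 0 tensor; this is a scalar with shape []
[1., 2., 3.]                            # a rank 1 tensor; this is a vector with shape [3]
[[1., 2., 3.], [4., 5., 6.]]            # a rank 2 tensor; a matrix with shape [2, 3]
[[[1., 2., 3.]], [[7., 8., 9.]]]        # a rank 3 tensor with shape [2, 1, 3]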
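
Rank and shape can be read directly off the nesting of a value. The following is a minimal sketch in plain Python, not TensorFlow code: the helper name `shape_of` is hypothetical, and it assumes every sub-list at a given depth has the same length. The shape is the list of lengths found at each nesting level, and the rank is the length of that list.

```python
def shape_of(value):
    """Return the shape of a regularly nested list as a list of dimension sizes."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:          # an empty list has no first element to descend into
            break
        value = value[0]       # assumption: all sub-lists at this depth have equal length
    return shape


examples = [
    3,
    [1., 2., 3.],
    [[1., 2., 3.], [4., 5., 6.]],
    [[[1., 2., 3.]], [[7., 8., 9.]]],
]

for tensor in examples:
    shape = shape_of(tensor)
    print("rank", len(shape), "shape", shape)
```

A scalar is not a list, so the loop never runs and the shape is the empty list, which matches the rank 0 case.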

## The Computational Graph
A computational graph is a series of TensorFlow operations arranged into a graph of nodes. Each node takes zero or more tensors as inputs and produces a tensor as an output. One type of node is a constant. Like all TensorFlow constants, it takes no inputs, and it outputs a value it stores internally.

In [7]:
node1 = tf.constant(3.0,tf.float32)                  # A floating type constant
node2 = tf.constant(4, tf.uint8)                     # An integer type constant
print('Node1: {} \nNode2: {}'.format(node1,node2))

Node1: Tensor("Const_4:0", shape=(), dtype=float32) 
Node2: Tensor("Const_5:0", shape=(), dtype=uint8)


**Notice** that printing the nodes does not output the values 3.0 and 4 as you might expect. Instead, they are nodes that, when evaluated, would produce 3.0 and 4, respectively. **To actually evaluate the nodes, we must run the computational graph within a session. A session encapsulates the control and state of the TensorFlow runtime.** The following code creates a Session object and then invokes its run method to run enough of the computational graph to evaluate node1 and node2.

In [5]:
sess = tf.Session()                                  # Creating Session object
print(sess.run([node1,node2]))

[3.0, 4]


We can build more complicated computations by combining Tensor nodes with operations. For example:

In [11]:
Adder = tf.add(node1,tf.cast(node2, tf.float32))     # Notice the casting operation being performed here
print('Sum: {}'.format(sess.run(Adder)))

Sum: 7.0
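
The difference between building a node and evaluating it can be illustrated without TensorFlow. The sketch below is a toy model of the idea only, not how TensorFlow is implemented; the class names `Constant` and `Add` and the method `evaluate` are hypothetical. Building the objects only records what should be computed, and a number appears only when evaluation is requested.

```python
class Constant:
    """A node with no inputs that outputs the value it stores."""

    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value

    def __repr__(self):
        return "Node(Constant)"


class Add:
    """A node that adds the outputs of two other nodes."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        # The inputs are evaluated only when this node is evaluated.
        return self.left.evaluate() + self.right.evaluate()

    def __repr__(self):
        return "Node(Add)"


node1 = Constant(3.0)
node2 = Constant(4.0)
adder = Add(node1, node2)    # builds the graph; nothing is computed yet

print(adder)                 # shows a description of the node, not a number
print(adder.evaluate())      # the computation happens here
```

In this analogy, calling `evaluate` plays the role that `sess.run` plays in the cells above.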


As it stands, this graph is not especially interesting because it always produces a constant result. A graph can be parameterized to accept external inputs, known as **placeholders**. A placeholder is a promise to provide a value later. 

In [27]:
X = tf.placeholder(tf.float32)                       # Placeholders are fed NumPy arrays or Python values, not tf.Tensor objects
y = tf.placeholder(tf.float32)

X_train = np.random.randn(1,1).astype(np.float32)    # shape (1, 1)
y_train = np.random.randn(2,2).astype(np.float32)    # shape (2, 2)

total = X + y                                        # the (1, 1) value is broadcast across the (2, 2) value
sess.run(total,{X:X_train, y:y_train})

array([[-1.724105  , -0.65286815],
       [-0.1504311 ,  0.19988585]], dtype=float32)
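
A placeholder's "promise to provide a value later" can be sketched with ordinary Python closures. This is an analogy under simplifying assumptions, not TensorFlow's mechanism: the functions `placeholder` and `add` below are hypothetical, and the feed is a plain dictionary keyed by name rather than by placeholder object.

```python
def placeholder(name):
    """Return a node that looks up its value in the feed at run time."""
    def evaluate(feed):
        return feed[name]
    return evaluate


def add(left, right):
    """Return a node that adds the results of two other nodes."""
    def evaluate(feed):
        return left(feed) + right(feed)
    return evaluate


x = placeholder("x")
y = placeholder("y")
total = add(x, y)                        # the graph is fixed here, the inputs are not

print(total({"x": 1.5, "y": 2.0}))       # 3.5
print(total({"x": -1.0, "y": 4.0}))      # 3.0
```

The same graph is reused with different inputs on each call, which is what feeding a dictionary to `sess.run` achieves in the cell above.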

In machine learning, we create models and then evaluate how they perform on training and testing data. A loss function measures how far the model's predictions are from the desired outputs, so it lets us observe how well the model is learning over time. To understand this concept, let us first create a model and then evaluate it. For this tutorial, let's create a NOT gate using TensorFlow.

In [43]:
X_train = np.random.randint(0,2,(5,1)).astype(np.float32)    # Training data
y_train = np.logical_not(X_train).astype(np.float32)         # Training labels

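# Placeholders for the inputs and the target outputs
X = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

# Model parameters (dtype must be passed by keyword; the second positional argument of tf.Variable is trainable)
W = tf.Variable(tf.random_normal([1,1]), dtype=tf.float32)
b = tf.Variable(tf.random_normal([1,1]), dtype=tf.float32)

model = W * X + b

# Sum of squared errors loss
loss = tf.reduce_sum(tf.square(model - y))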
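
To see what this loss measures, here is a small worked example in plain Python. It is a sketch with hand-picked, hypothetical inputs and parameter values rather than the random ones used above. A model that implements the NOT gate exactly has zero loss, and a poor choice of parameters has a larger one.

```python
def linear_model(w, b, xs):
    """Apply the model w * x + b to every input."""
    return [w * x + b for x in xs]


def sum_squared_error(predictions, targets):
    """Sum of squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


xs = [0.0, 1.0, 1.0, 0.0]            # hypothetical inputs
targets = [1.0 - x for x in xs]      # NOT gate: 0 -> 1, 1 -> 0

perfect = linear_model(-1.0, 1.0, xs)    # w = -1, b = 1 reproduces the NOT gate exactly
poor = linear_model(0.5, 0.0, xs)        # an arbitrary bad guess

print(sum_squared_error(perfect, targets))   # 0.0
print(sum_squared_error(poor, targets))      # 2.5
```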

Now that we have defined a loss function, let's use one of the many built-in optimizers to reduce the loss.

In [44]:
optimizer = tf.train.GradientDescentOptimizer(0.01)      # Gradient descent optimizer with learning rate 0.01
train = optimizer.minimize(loss)

init = tf.global_variables_initializer()                 # op that initializes all global variables
sess.run(init)                                           # Note: variables hold no values until this op is run

# Training loop
for i in range(100):
    sess.run(train,{X:X_train, y:y_train})

print('actual output: \n{} \nPredicted output: \n{}'.format(y_train, np.round(sess.run(model,{X:X_train, y:y_train}))))

actual output: 
[[ 1.]
 [ 0.]
 [ 0.]
 [ 0.]
 [ 1.]] 
Predicted output: 
[[ 1.]
 [ 0.]
 [ 0.]
 [ 0.]
 [ 1.]]
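
What the optimizer does on each training step can be written out by hand for this small model. The following is a sketch of plain gradient descent on the same sum-of-squares loss, with the gradients derived manually. The starting values are arbitrary assumptions (TensorFlow draws them at random), and the inputs are chosen to match the labels printed above.

```python
xs = [0.0, 1.0, 1.0, 1.0, 0.0]       # inputs whose NOT-gate labels are 1, 0, 0, 0, 1
ys = [1.0 - x for x in xs]

w, b = 0.3, -0.2                     # arbitrary starting values (an assumption)
learning_rate = 0.01


def loss(w, b):
    """Sum of squared errors of the model w * x + b on the training data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))


initial_loss = loss(w, b)

for _ in range(100):
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = sum(2 * e * x for e, x in zip(errors, xs))   # d(loss)/dw
    grad_b = sum(2 * e for e in errors)                   # d(loss)/db
    w -= learning_rate * grad_w                           # step against the gradient
    b -= learning_rate * grad_b

predictions = [round(w * x + b) for x in xs]
print("loss before:", initial_loss, "loss after:", loss(w, b))
print("rounded predictions:", predictions)
```

After 100 steps the parameters have not fully converged, but the raw outputs are already on the correct side of 0.5, so the rounded predictions match the targets.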


We can clearly see that, once rounded, the model's predictions match all the target values. So, we have learnt suitable parameters (the weight and the bias) for this particular model. Let's look at their values:

In [49]:
print("Weight Matrix: {} \nBias: {}".format(sess.run(W), sess.run(b)))

Weight Matrix: [[-0.43818688]] 
Bias: [[ 0.59506792]]
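
These values can be checked by hand. For an input of 0 the model outputs the bias, about 0.595, which rounds to 1. For an input of 1 it outputs about -0.438 + 0.595 = 0.157, which rounds to 0. The short sketch below repeats that arithmetic in plain Python, using the numbers copied from the output above; the function name `predict` is only illustrative.

```python
weight, bias = -0.43818688, 0.59506792    # values copied from the output above


def predict(x):
    """Evaluate the linear model W * x + b for a single input."""
    return weight * x + bias


for x in (0.0, 1.0):
    raw = predict(x)
    print("input", x, "raw output", round(raw, 3), "rounded", round(raw))
```

The learned parameters are far from the exact solution W = -1, b = 1, yet rounding still produces the correct NOT-gate outputs.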


Let's now move on to the "hello world" program of machine learning, the **MNIST** data set. It is a standard **benchmark** for evaluating machine learning models. MNIST is a simple computer vision dataset consisting of images of handwritten digits like these:

![MNIST data](Images/MNIST_example.png)

It also includes labels for each image, telling us which digit it is. For example, the labels for the above images are 5, 0, 4, and 1.

The MNIST data is split into three parts: 55,000 data points of training data (mnist.train), 10,000 points of test data (mnist.test), and 5,000 points of validation data (mnist.validation). This split is very important: in machine learning we must hold back data that the model does not learn from, so that we can check that what it has learned actually generalizes. Note that the pickled copy loaded below contains 50,000 training examples, as its printed shape shows.

In [3]:
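def oneHot(label, no_class):
    # label: 1-D integer array of class indices; returns a (len(label), no_class) one-hot matrix
    one_hot_labels = np.zeros((label.shape[0],no_class))
    one_hot_labels[np.arange(label.shape[0]),label] = 1      # set column label[i] of row i to 1
    return one_hot_labels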
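
The NumPy indexing in oneHot is compact, so it may help to see the same idea written with explicit loops. This is an illustrative plain-Python sketch with hypothetical names (`one_hot`, `num_classes`), not a replacement for the function above. Each label becomes a row of zeros with a single 1 at the label's index.

```python
def one_hot(labels, num_classes):
    """Encode a list of integer class labels as rows of zeros with a single one."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0              # mark the position of the true class
        rows.append(row)
    return rows


encoded = one_hot([2, 0, 1], 3)
for row in encoded:
    print(row)

# The original label is recovered as the position of the 1 in each row.
decoded = [row.index(1.0) for row in encoded]
print(decoded)
```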

In [4]:
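def loadMnist(path):
    # Load the pickled data set and return the training features and their one-hot labels
    with open(path,'rb') as fp:                            # the with block closes the file when done
        fileData = pickle.load(fp, encoding='latin1')      # latin1 decodes pickles written by Python 2
    feature_vector = fileData[0][0]
    label = oneHot(fileData[0][1],10)
    return(feature_vector, label)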
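
The indexing in loadMnist implies a nested layout in which element 0 of the pickled object is a (features, labels) pair. The sketch below writes and reads back a tiny stand-in file with that layout so the indexing can be tried without the real data. The three-part tuple and all the values in it are assumptions made for illustration; only the `[0][0]` and `[0][1]` accesses come from the code above.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in data: three "images" of four pixels each, with their labels.
train = ([[0.0, 0.1, 0.2, 0.3],
          [0.4, 0.5, 0.6, 0.7],
          [0.8, 0.9, 1.0, 0.0]],
         [1, 0, 2])
empty_split = ([], [])
dataset = (train, empty_split, empty_split)   # assumed layout, for illustration only

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy.pkl")
    with open(path, "wb") as fp:
        pickle.dump(dataset, fp)
    with open(path, "rb") as fp:
        loaded = pickle.load(fp)              # no encoding argument needed for a Python 3 pickle

features = loaded[0][0]
labels = loaded[0][1]
print(len(features), "examples with", len(features[0]), "values each")
print(labels)
```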

In [6]:
no_class = 10
#fp = open('C:\\Users\\pradeepy\\Anconda Notebook\\Images\\mnist.pkl','rb')
#filedata = pickle.load(fp, encoding='latin1')

#feature_vector = filedata[0][0]
#label = filedata[0][1]

#labels = oneHot(label, no_class)
path = 'C:\\Users\\pradeepy\\Anconda Notebook\\Images\\mnist.pkl'
feature_vector, labels = loadMnist(path)

print(feature_vector.shape)
'''

X = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

W = tf.Variable(tf.random_normal([784,10]),dtype=tf.float32)
b = tf.Variable(tf.random_normal([1,10]),dtype=tf.float32)

model = tf.matmul(X,W) + b

softmax = tf.nn.softmax(model)

loss = tf.reduce_mean(-tf.reduce_sum(y * tf.log(softmax),axis=1))

optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for _ in range(50000):
    random_sequence = np.random.randint(0,50000,(100))
    feature = feature_vector[random_sequence,:]
    labell = labels[random_sequence,:]
    sess.run(train,{X:feature, y:labell})
    
#sess.run(softmax,{X:feature_vector,y:labels})
pred = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y,1),tf.argmax(softmax,1)),tf.float32))

print('Accuracy achieved: {}% '.format(np.round(sess.run(pred,{X:feature_vector,y:labels})*100, 2)))

'''

(50000, 784)

Getting 93% accuracy on MNIST is bad; it's almost embarrassingly bad. In this section, we'll fix that by jumping from a very simple model to something moderately sophisticated: a small convolutional neural network. This will get us to around 99.2% accuracy, which is not state of the art, but respectable. Let's jump directly to the code and see how its different parts work.

**Note:** The convolution operations are computationally expensive and use a lot of memory, so this code might take up to 15 minutes before it reports the achieved accuracy.
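
Before reading the network code, it helps to trace how the image size changes from layer to layer, because that explains the 7*7*64 used for the dense layer. The sketch below is plain Python arithmetic, not TensorFlow code. It relies on the rule that with 'SAME' padding the output size is the input size divided by the stride, rounded up. The helper name and the layer table are illustrative.

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a convolution or pooling op with 'SAME' padding."""
    return math.ceil(size / stride)


height, width, channels = 28, 28, 1       # one MNIST image after reshaping

# (layer name, stride, output channels or None if the channel count is unchanged)
layers = [
    ("conv1", 1, 32),
    ("pool1", 2, None),
    ("conv2", 1, 64),
    ("pool2", 2, None),
]

for name, stride, out_channels in layers:
    height = same_padding_output(height, stride)
    width = same_padding_output(width, stride)
    if out_channels is not None:
        channels = out_channels
    print(name, (height, width, channels))

flattened = height * width * channels
print("inputs to the dense layer:", flattened)    # 3136, i.e. 7*7*64
```

Each pooling layer halves the height and width, so two of them reduce 28 to 7, while the convolutions only change the number of channels.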

In [None]:
###################################################################
# Let us start by loading the data into memory from a pickled file
###################################################################
with open('C:\\Users\\pradeepy\\Anconda Notebook\\Images\\mnist.pkl','rb') as fp:   # the with block closes the file
    filedata = pickle.load(fp, encoding='latin1')
feature_vector = filedata[0][0]
label = filedata[0][1]
one_hot_label = np.zeros((label.shape[0],10))                    # Creating one-hot vector for the labels
one_hot_label[np.arange(label.shape[0]),label] = 1


############################################################################
# Let's move on and build our very first convolutional neural network model
############################################################################

# Here are our placeholders for the feature vectors and labels required for learning
X = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

# First we need to reshape the flat images into a rank-4 tensor so that convolution can be applied
X_image = tf.reshape(X, [-1,28,28,1])

#########################
# First Convolution layer
#########################

# Weight and Bias initialization
W_conv1 = tf.Variable(tf.truncated_normal([5,5,1,32],stddev=0.1)) # truncated_normal re-draws any value that lies more
b_conv1 = tf.Variable(tf.constant(0.1,shape = [1,32]))            # than 2*stddev away from the mean

# performing convolution (plus bias) and maxpool operation one after another
h_conv1 = tf.nn.relu(tf.nn.conv2d(X_image, W_conv1, strides=[1,1,1,1], padding='SAME') + b_conv1)   # strides are given as
h_maxpool1 = tf.nn.max_pool(h_conv1, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')            # [batch, height, width, channels]:
                                                                                                    # the middle two values are the
                                                                                                    # spatial strides

############################
# Second Convolutional Layer
############################

# Weight and Bias initialization  
W_conv2 = tf.Variable(tf.truncated_normal([5,5,32,64], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[1,64]))

# performing convolution and maxpool operation one after another
h_conv2 = tf.nn.relu(tf.nn.conv2d(h_maxpool1, W_conv2, strides=[1,1,1,1], padding='SAME') + b_conv2)
h_maxpool2 = tf.nn.max_pool(h_conv2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

#############
# Dense layer
#############

# Weight and Bias initialization
W_dense = tf.Variable(tf.truncated_normal([7*7*64,1024], stddev=0.1))
b_dense = tf.Variable(tf.constant(0.1, shape=[1,1024]))

# Introducing a dense layer
h_dense = tf.nn.relu(tf.matmul(tf.reshape(h_maxpool2,[-1, 7*7*64]), W_dense) + b_dense)

################
# Dropout layer
################

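# Creating a placeholder for the dropout setting and performing the dropout operation.
# Note: in TensorFlow 1.x the second argument of tf.nn.dropout is the probability of KEEPING a unit,
# so despite its name drop_prob is fed 0.5 during training and 1.0 (no dropout) during evaluation
drop_prob = tf.placeholder(tf.float32)
drop_layer = tf.nn.dropout(h_dense, drop_prob)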

###########################
# Creating a softmax layer
###########################

# Weight and Bias initialization
W_softmax = tf.Variable(tf.truncated_normal([1024,10], stddev=0.1))
b_softmax = tf.Variable(tf.constant(0.1, shape=[1,10]))

# performing matrix multiplication operation and adding bias
h_sfmx = tf.matmul(drop_layer, W_softmax) + b_softmax

# Defining loss
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=h_sfmx))  # applies softmax to the logits and computes the
                                                                                         # cross-entropy per image; reduce_mean averages it

# Note: we could also use tf.nn.sparse_softmax_cross_entropy_with_logits(), which takes integer class labels
#       directly, so the one-dimensional label vector would not need one-hot encoding

# Defining optimizing function
train = tf.train.AdamOptimizer(1e-3).minimize(loss)

# Node to check the accuracy of the built model
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(h_sfmx,1), tf.argmax(y,1)),tf.float32))

# Creating the session and initializing the variables
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)


# Training the network using batches of 100 images at a time
for _ in range(1000):
    random_sequence = np.random.randint(0,50000,(100))
    feature = feature_vector[random_sequence,:]
    labell = one_hot_label[random_sequence,:]
    sess.run(train,{X:feature,y:labell,drop_prob:0.5})
    
# Printing the accuracy on the first 5000 training images (a keep probability of 1.0 disables dropout)
print('Accuracy: {}'.format(sess.run(accuracy,{X:feature_vector[:5000,:], y:one_hot_label[:5000,:],drop_prob:1.0})))
print(sess.run(drop_prob,{drop_prob:1.0}))    # sanity check: echoes the fed value, 1.0
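
The loss used above combines two steps, softmax followed by cross-entropy. As a rough illustration of what is computed for a single image, here is a plain-Python sketch. It is not TensorFlow's implementation; the function names and the example logits are assumptions. Softmax turns the raw scores (logits) into probabilities, and the cross-entropy is the negative log of the probability assigned to the true class.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                          # subtracting the max avoids overflow in exp
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, logits):
    """Negative log-probability that softmax assigns to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


logits = [2.0, 1.0, 0.1]                         # hypothetical scores for three classes
probs = softmax(logits)
print([round(p, 3) for p in probs])

# The loss is small when the true class has the highest score and large otherwise.
print(cross_entropy([1, 0, 0], logits))
print(cross_entropy([0, 0, 1], logits))
```

Taking the index of the largest probability gives the predicted class, which is the comparison the accuracy node performs with argmax.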