# TensorFlow 101

The goal of this article is to show how to get started with TensorFlow, to the point of being able to build a convolutional neural network (CNN) and hopefully reach a decent position on the Kaggle leaderboard.

The first step is to install TensorFlow. For this I recommend using Anaconda and installing TensorFlow via the conda-forge channel.

```bash
conda install -c conda-forge tensorflow
```

You can now quickly check that everything works by running the following snippet:

```python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
```

```
b'Hello, TensorFlow!'
```


Before trying to build a CNN directly, I think it's important to learn a few basics about TensorFlow. Following the TensorFlow tutorials, the next section starts by exploring a few TensorFlow commands and ends by building a CNN. Only then will we take it to the next level with the Kaggle competition.

At the lowest level of TF is TensorFlow Core, whose central unit of data is the **tensor**: a set of primitive values shaped into an array of any number of dimensions. Each tensor has a **rank**, which is its number of dimensions. TF Core programs consist of two steps:

1. Building the computational graph
2. Running the computational graph
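To make the two steps and the notion of rank concrete, here is a minimal sketch in plain Python. It is a toy illustration only, not how TensorFlow is implemented: the `Constant` and `Add` classes and the `rank` helper are hypothetical names made up for this example.

```python
# Toy illustration of "build first, run later". Not TensorFlow's implementation.

def rank(value):
    """Number of dimensions of a nested list (a plain number has rank 0).

    Assumes non-empty, regularly nested lists.
    """
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]
    return depth


class Constant:
    """Graph node that holds a fixed value."""

    def __init__(self, value):
        self.value = value

    def run(self):
        return self.value


class Add:
    """Graph node that adds the results of two other nodes when it is run."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def run(self):
        return self.left.run() + self.right.run()


# Step 1: build the graph. Nothing is computed yet.
a = Constant(3.0)
b = Constant(4.0)
total = Add(a, b)

# Step 2: run the graph. Only now is the sum evaluated.
result = total.run()
print(result)                          # 7.0

print(rank(3.0))                       # 0 (scalar)
print(rank([1.0, 2.0, 3.0]))           # 1 (vector)
print(rank([[1.0, 2.0], [3.0, 4.0]]))  # 2 (matrix)
```

The point of the separation is that `total` only describes a computation; a value appears once something asks the graph to run.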

The simple model below creates two constants and adds them. Nothing very complex so far, but it shows how TF builds graphs made of nodes and operations.

```python
from __future__ import print_function

# Building the computational graph
node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0)  # also tf.float32 implicitly
node3 = tf.add(node1, node2)  # we could also have used node1 + node2

# Running the computational graph
sess = tf.Session()
print(sess.run([node1, node2]))
print("node3:", node3)
print("sess.run(node3):", sess.run(node3))
```

```
[3.0, 4.0]
node3: Tensor("Add_1:0", shape=(), dtype=float32)
sess.run(node3): 7.0
```


Another interesting example is a slightly more complex model that we can evaluate. Below we set up a simple linear regression and manually set the parameters W and b to their ideal values, which gives a loss of 0.

```python
# Initialise the problem
W = tf.Variable([-1], dtype=tf.float32)
b = tf.Variable([1], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W*x + b
init = tf.global_variables_initializer()
sess.run(init)
# Placeholder for the target values y, then compute the loss function
y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)
print(sess.run(loss, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))
```

```
0.0
```
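The zero loss can be checked by hand. The sketch below repeats the same arithmetic in plain Python (no TensorFlow), using the values of W, b, x and y from the cell above.

```python
# Plain-Python check of the loss computed above.
W, b = -1.0, 1.0
xs = [1, 2, 3, 4]
ys = [0, -1, -2, -3]

# Same formula as linear_model = W*x + b
predictions = [W * x + b for x in xs]

# Same idea as tf.square followed by tf.reduce_sum
squared_deltas = [(p - y) ** 2 for p, y in zip(predictions, ys)]
loss = sum(squared_deltas)

print(predictions)  # [0.0, -1.0, -2.0, -3.0]
print(loss)         # 0.0
```

Every prediction matches its target exactly, so each squared difference is 0 and so is their sum.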


The example above isn't really useful, as ideally we would train a model so that TF figures out W and b by itself. This is where `tf.train.GradientDescentOptimizer` kicks in. It automatically computes the derivative of the loss function and repeatedly adjusts the variables until it finds the sweet spot where the loss function is minimised.

```python
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
sess.run(init)  # reset W and b to their initial values (here already the optimum: -1 and 1)
for i in range(1000):
  sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]})

print(sess.run([W, b]))
```

```
[array([-1.], dtype=float32), array([ 1.], dtype=float32)]
```
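To see what gradient descent does to W and b, here is a rough plain-Python sketch of the same idea. It is not TensorFlow's optimizer: `gradient_descent` is a hypothetical helper with the derivatives of the summed squared error written out by hand. The starting values 0.3 and -0.3 are an assumption, chosen to be deliberately wrong so that the updates are visible.

```python
def gradient_descent(xs, ys, W, b, learning_rate=0.01, steps=1000):
    """Minimise sum((W*x + b - y)^2) with plain gradient descent."""
    for _ in range(steps):
        errors = [W * x + b - y for x, y in zip(xs, ys)]
        # Derivatives of the summed squared error with respect to W and b
        grad_W = sum(2 * e * x for e, x in zip(errors, xs))
        grad_b = sum(2 * e for e in errors)
        # Take a small step against the gradient
        W -= learning_rate * grad_W
        b -= learning_rate * grad_b
    return W, b


# Deliberately wrong starting point (assumed values, not from the cell above)
W, b = gradient_descent([1, 2, 3, 4], [0, -1, -2, -3], W=0.3, b=-0.3)
print(round(W, 3), round(b, 3))  # -1.0 1.0
```

With the same learning rate (0.01) and number of steps (1000) as above, the parameters end up very close to the ideal values of -1 and 1.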


TF also ships with many predefined models and greatly simplifies feeding in the data, training the model (no ugly loops), etc. Below is an example using the linear regressor from the estimator module:

```python
# NumPy is often used to load, manipulate and preprocess data.
import numpy as np
tf.logging.set_verbosity(tf.logging.ERROR)

# Declare list of features. We only have one numeric feature. There are many
# other types of columns that are more complicated and useful.
feature_columns = [tf.feature_column.numeric_column("x", shape=[1])]

# An estimator is the front end to invoke training (fitting) and evaluation
# (inference). There are many predefined types like linear regression,
# linear classification, and many neural network classifiers and regressors.
# The following code provides an estimator that does linear regression.
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns)

# TensorFlow provides many helper methods to read and set up data sets.
# Here we use two data sets: one for training and one for evaluation.
# We have to tell the function how many batches
# of data (num_epochs) we want and how big each batch should be.
x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])
x_eval = np.array([2., 5., 8., 1.])
y_eval = np.array([-1.01, -4.1, -7, 0.])
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=1000, shuffle=False)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_eval}, y_eval, batch_size=4, num_epochs=1000, shuffle=False)

# We can invoke 1000 training steps by invoking the train method and passing
# the training data set.
estimator.train(input_fn=input_fn, steps=1000)

# Here we evaluate how well our model did.
train_metrics = estimator.evaluate(input_fn=train_input_fn)
eval_metrics = estimator.evaluate(input_fn=eval_input_fn)
print("train metrics: %r"% train_metrics)
print("eval metrics: %r"% eval_metrics)
```

```
train metrics: {'average_loss': 7.2832633e-09, 'loss': 2.9133053e-08, 'global_step': 1000}
eval metrics: {'average_loss': 0.0025343848, 'loss': 0.010137539, 'global_step': 1000}
```
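The two loss figures in each line are related. The small check below, with the numbers copied from the output above, suggests that `loss` is the error summed over a batch of 4 while `average_loss` is the error per example. This is an interpretation read off the printed values, not a statement taken from the estimator's documentation.

```python
# Values copied from the printed metrics above.
batch_size = 4
train_metrics = {"average_loss": 7.2832633e-09, "loss": 2.9133053e-08}
eval_metrics = {"average_loss": 0.0025343848, "loss": 0.010137539}

ratios = {}
for name, metrics in [("train", train_metrics), ("eval", eval_metrics)]:
    # If loss is a per-batch sum, this ratio should equal the batch size.
    ratios[name] = metrics["loss"] / metrics["average_loss"]
    print(name, round(ratios[name], 3))
# train 4.0
# eval 4.0
```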


The model above is still quite simple, and before moving on to the Kaggle competition it's important to understand the complexity that comes with a convolutional neural network. For this we're going to use the Kaggle MNIST dataset and example to see how it performs.

`! kg config -c digit-recognizer | kg download`

```python
import pandas as pd
from IPython.display import display
mnist = pd.read_csv("train.csv")
display(mnist.head())
```

```
Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
```


Now that we have some data, it's time to write our first serious neural network: the convolutional one! A CNN differs from other neural networks in that it applies concepts from vision to the way the network learns from the data.

As shown in the figure below, the first layer of a CNN is a **convolutional layer**. This layer applies a filter to a small area of the image, called the receptive field, and then slides that area over the whole image. The two main parameters of a convolutional layer are the **stride** and the **padding**. The stride is simply how many pixels the filter moves at a time, and the padding is a way to preserve the size of the input after a convolutional layer so that the dimensions don't shrink too fast. Other layers exist, such as pooling, ReLU or dropout, and I recommend reading [this nice post](https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2/) for a deeper explanation.
Let's move to the code now...

![CNN](CNN.png)
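To make the sliding filter, the stride and the padding concrete, here is a naive plain-Python sketch for a single-channel image. `conv2d_single` is a hypothetical helper written for illustration. It assumes a square filter and pads the same number of zeros on every side, which is simpler than what TensorFlow's `'SAME'` option does in general.

```python
def conv2d_single(image, kernel, stride=1, pad=0):
    """Slide a square `kernel` over `image` (lists of lists) and return the feature map.

    Like most deep learning libraries, this multiplies the filter with each
    patch without flipping it first.
    """
    height, width = len(image), len(image[0])

    # Padding: surround the image with `pad` rows and columns of zeros.
    padded = [[0] * (width + 2 * pad) for _ in range(height + 2 * pad)]
    for r in range(height):
        for c in range(width):
            padded[r + pad][c + pad] = image[r][c]

    k = len(kernel)
    out_h = (height + 2 * pad - k) // stride + 1
    out_w = (width + 2 * pad - k) // stride + 1

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Stride: the filter jumps `stride` pixels between positions.
            top, left = i * stride, j * stride
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += padded[top + di][left + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]  # adds the top-left and bottom-right pixel of each patch

print(conv2d_single(image, kernel, stride=1))
# [[7, 9, 11], [15, 17, 19], [23, 25, 27]]
print(conv2d_single(image, kernel, stride=2))
# [[7, 11], [23, 27]]

# A 3x3 filter with one pixel of zero padding keeps the 4x4 size.
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
same_size = conv2d_single(image, ones, stride=1, pad=1)
print(len(same_size), len(same_size[0]))  # 4 4
```

A larger stride shrinks the output faster, while padding lets the output keep the input's size.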

TensorFlow gives us a lot of flexibility: below, each layer is built by hand from weights, biases, convolutions and pooling operations.

```python
# Helpers to create weights (small random values) and biases (small positive constant)
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

# Convolution with stride 1 and 'SAME' padding (output keeps the input size)
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

# 2x2 max pooling with stride 2 (halves the width and height)
def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

# input: flattened 28x28 images and one-hot labels
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

# First layer: 32 filters of size 5x5 on 1 input channel
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second layer: 64 filters of size 5x5 on 32 input channels
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Densely Connected Layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Readout Layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
```
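The `7 * 7 * 64` in the densely connected layer follows from how each layer changes the image size. The sketch below traces a 28x28 image through the two convolution and pooling layers, using the output-size rules TensorFlow documents for `'SAME'` and `'VALID'` padding. `output_size` is a hypothetical helper name.

```python
import math


def output_size(size, kernel, stride, padding):
    """Width (or height) after a convolution or pooling layer."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


size = 28
size = output_size(size, kernel=5, stride=1, padding="SAME")  # conv1: 28
size = output_size(size, kernel=2, stride=2, padding="SAME")  # pool1: 14
size = output_size(size, kernel=5, stride=1, padding="SAME")  # conv2: 14
size = output_size(size, kernel=2, stride=2, padding="SAME")  # pool2: 7

channels = 64  # number of filters in the second convolutional layer
print(size, size * size * channels)  # 7 3136
```

The convolutions keep the size thanks to `'SAME'` padding with stride 1, and each pooling layer halves it, so the flattened input to the dense layer has 7 * 7 * 64 = 3136 values per image.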

```python
from sklearn.preprocessing import OneHotEncoder

# One-hot encode the digit labels (10 classes) and separate the pixel columns
enc = OneHotEncoder(n_values=10, sparse=False)
y_train = enc.fit_transform(mnist[['label']])
x_train = mnist.drop('label', axis=1)

# Loss, optimizer and accuracy
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(200):
        if i % 20 == 0:
            # Evaluate without dropout (keep_prob = 1.0)
            train_accuracy = accuracy.eval(feed_dict={
                x: x_train, y_: y_train, keep_prob: 1.0})
            print('step %d, training accuracy %g' % (i, train_accuracy))
        # Each step feeds the full training set, with 50% dropout
        train_step.run(feed_dict={x: x_train, y_: y_train, keep_prob: 0.5})
```

```
step 0, training accuracy 0.0808095
step 20, training accuracy 0.739286
step 40, training accuracy 0.867333
step 60, training accuracy 0.911905
step 80, training accuracy 0.930619
step 100, training accuracy 0.942571
step 120, training accuracy 0.950952
step 140, training accuracy 0.957619
step 160, training accuracy 0.962548
step 180, training accuracy 0.967024
```
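The loss and accuracy used above can be unpacked with a small plain-Python sketch. It shows the idea behind one-hot labels, softmax cross-entropy and argmax-based accuracy, not the TensorFlow or scikit-learn implementations. All helper names and the two example predictions are made up for illustration.

```python
import math


def one_hot(label, num_classes=10):
    """Turn a class index into a vector with a single 1.0."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    biggest = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - biggest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, logits):
    """Softmax cross-entropy between a one-hot target and raw scores."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Two made-up predictions over 3 classes (hypothetical numbers)
logits_batch = [[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]]
labels = [0, 1]
targets = [one_hot(label, num_classes=3) for label in labels]

losses = [cross_entropy(t, z) for t, z in zip(targets, logits_batch)]
mean_loss = sum(losses) / len(losses)  # plays the role of tf.reduce_mean

correct = [argmax(z) == argmax(t) for z, t in zip(logits_batch, targets)]
accuracy = sum(correct) / len(correct)

print(mean_loss)
print(accuracy)  # 0.5
```

The first example is classified correctly and the second is not, hence an accuracy of 0.5. The second example also contributes most of the loss, because the model puts a high score on the wrong class.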

