# Use of convolutions with tensorflow

In this notebook, we will use tensorflow to build a Convolutional Neural Network (CNN).  

### Convolution

Ressources :  [this notebook](https://nbviewer.jupyter.org/github/marc-moreaux/Deep-Learning-classes/blob/master/notebooks/Convolution.ipynb) and this [wikipedia page](https://en.wikipedia.org/wiki/Convolution)

no, if we consider two functions $f$ and $g$ taking values from $\mathbb{Z} \to \mathbb{R}$ then:  
$ (f * g)[n] = \sum_{m = -\infty}^{+\infty} f[m] \cdot g[n - m] $

In our case, we consider the two vectors $x$ and $w$ :  
$ x = (x_1, x_2, ..., x_{n-1}, x_n) $  
$ w = (w_1, w_2) $

And get :   
$ x * w = (w_1 x_1 + w_2 x_2, w_1 x_2 + w_2 x_3, ..., w_1 x_{n-1} + w_2 x_n)$


#### Deep learning subtility :
    
In most of deep learning framewoks, we need to chose in between three paddings:
- **Same**: $(f*g)$ has the same shape as x (we pad the entry with zeros)
- **valid**: $(f*g)$ has the shape of x minus the shape of w plus 1 (no padding on x)
- **Causal**: $(f*g)(n_t)$ does not depend on any $(n_{t+1})$

### Tensorflow

"TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library, and also used for machine learning applications such as neural networks.[3] It is used for both research and production at Google often replacing its closed-source predecessor, DistBelief." - Wikipedia

We'll be using tensorflow to build the models we want to use. 

Here below, we build a AND gate with a very simple neural network :

In [1]:
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

# Define our Dataset
X = np.array([[0,0],[0,1],[1,0],[1,1]])
Y = np.array([0,0,0,1]).reshape(-1,1)


# Define the tensorflow tensors
x = tf.placeholder(tf.float32, [None, 2], name='X')  # inputs
y = tf.placeholder(tf.float32, [None, 1], name='Y')  # outputs
W = tf.Variable(tf.zeros([2, 1]), name='W')
b = tf.Variable(tf.zeros([1,]), name='b')

# Define the model
pred = tf.nn.sigmoid(tf.matmul(x, W) + b)  # Model

# Define the loss
with tf.name_scope("loss"):
    loss = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred) + (1-y) * tf.log(1-pred), reduction_indices=1))

# Define the optimizer method you want to use
with tf.name_scope("optimizer"):
    optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Include some Tensorboard visualization
writer_train = tf.summary.FileWriter("./my_model/")


# Start training session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer_train.add_graph(sess.graph)
    
    for epoch in range(1000):
        _, c, p = sess.run([optimizer, loss, pred], feed_dict={x: X,
                                                      y: Y})
print (p, y)

[[ 0.00839782]
 [ 0.1499088 ]
 [ 0.1499088 ]
 [ 0.78595555]] Tensor("Y:0", shape=(?, 1), dtype=float32)


To visualize the graph you just created, launch tensorbord.  
`$tensorboard --logdirs=./` on linux (with corresponding logdir)

---
### Building a XOR gate

Design a neural network with 2 layers.
- layer1 has 2 neurons (sigmoid or tanh activation)
- Layer2 has 1 neuron (it outouts the prediction)


In [2]:
# Define our Dataset
X = np.array([[0,0],[0,1],[1,0],[1,1]])
Y = np.array([0,1,1,0]).reshape(-1,1)

tf.reset_default_graph()

# Define the tensorflow tensors
x = tf.placeholder(tf.float32, [None, 2], name='X')  # inputs
y = tf.placeholder(tf.float32, [None, 1], name='Y')  # outputs
W1 = tf.Variable(tf.random_normal([2, 2]), name='W1') #Weights matrixes
W2 = tf.Variable(tf.random_normal([2, 1]), name='W2')
#b1 = tf.Variable(tf.random_normal([2,2],mean=0,stddev=1), name='b1')
b1 = tf.Variable(tf.zeros([1,]), name='b1') #Biases
b2 = tf.Variable(tf.zeros([1,]), name='b2')
#,stddev=0.01

# Define the prediction -> that's the data we want to improve
#pred = tf.nn.sigmoid(tf.matmul(x, W1) + b1)  # Model
pred = tf.nn.sigmoid(tf.matmul(tf.nn.tanh(tf.matmul(x, W1) + b1),W2)+b2)
#pred = tf.nn.sigmoid((tf.nn.tanh(tf.matmul(x, W1) + b1)+tf.nn.tanh(tf.matmul(x, W1) + b1)))

# Define the loss -> minimizing the loss will result in better predictions -> better model
with tf.name_scope("loss"):
    loss = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred) + (1-y) * tf.log(1-pred), reduction_indices=1))
    loss_summary = tf.summary.scalar("loss",loss)
    
# Define the optimizer method we want to use
with tf.name_scope("optimizer"):
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss) #0.1 is the learning rate

# Include some Tensorboard visualization
writer_train = tf.summary.FileWriter("./my_model/")


# Start training session
with tf.Session() as sess:
    #Running the session
    sess.run(tf.global_variables_initializer())
    writer_train.add_graph(sess.graph)
    
    for epoch in range(10000):
        _, c, p = sess.run([optimizer, loss, pred], feed_dict={x: X,
                                              y: Y})
        #We need those to print the weights later on
        variables_names = [v.name for v in tf.trainable_variables()]
        values = sess.run(variables_names)
            

        #writer_train.add_summary(ls,epoch)
print (p, y)


[[ 0.01878518]
 [ 0.49531072]
 [ 0.97386271]
 [ 0.51125109]] Tensor("Y:0", shape=(?, 1), dtype=float32)


As we can see we usually have these results within a 10^-2 margin 
[[ 0.04336669] -> 0 XOR 0 : 0 
[ 0.92657214]  -> 0 XOR 1 : 1
[ 0.92823118]  -> 1 XOR 0 : 1
[ 0.03871076]] -> 1 XOR 1 : 0
So the model is pretty good.
Tensor("Y:0", shape=(?, 1), dtype=float32)

### Print the weights of the model


In [3]:
for k, v in zip(variables_names, values):
    print ("Variable: ", k)
    print ("Shape: ", v.shape)
    print (v)
        
         
#W1:
#[[ 2.74875498 -2.75169706]
# [-2.72136331  2.83918357]]

#W2:
#[[-3.58781314]
# [-3.57973051]]

#The first weights are two opposed vectors basically. So it is meant to balance something at first sight.
#When you look at the columns, you look at the weights applied to the entrance of the OR gate. 
#The OR gate will only give 0 if both entries are at 0, so if any entry is at 1, the result is 1.
#In other words, 25% of outputs will be 0, 75% will be 1
#Hence a positive weight on the first item (0) and a negative one for the second (1)

#Same logic applies to the second column, the NAND will give 1 if there is any 0
#So the 0 is weighted negatively and the 1 positively.

#For W2, we have a XOR gate. 50% of having either 0 or 1
#Hence equal weights for W2


Variable:  W1:0
Shape:  (2, 2)
[[-1.30482197 -2.29883933]
 [ 3.06002903 -3.35569358]]
Variable:  W2:0
Shape:  (2, 1)
[[-2.5333662 ]
 [-3.04376483]]
Variable:  b1:0
Shape:  (1,)
[ 0.72431636]
Variable:  b2:0
Shape:  (1,)
[-0.50039285]


---
### Building a CNN to predict the MNIST digits
We will use a Convolutional neural network to predict the digits from MNIST.

Ressources : [SNN](https://nbviewer.jupyter.org/github/marc-moreaux/Deep-Learning-classes/blob/master/notebooks/Intro_to_SNN.ipynb)

Model :
- 1st layer : 6 convolutional kernels with shape (3,3)
- 2nd layer : 6 convolutional kernels with shape (3,3)
- 3rd layer : Softmax layer


In [4]:
import pickle, gzip, numpy, math
import numpy as np


from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# we create the session
sess = tf.InteractiveSession()

# we initiate the computationnal graph
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32)

import tensorflow as tf
sess = tf.InteractiveSession()

#Simple function to create a wieght variable out of normal random numbers
def weight_variable(shape, given_name):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=given_name)

#Function to create a constant tensor for the biases
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

#We define this function to gian some space, also the strides and padding will always be the same
#Strides at 1 means that the kernel will travel all around the image, pixel per pixel
#Padding = SAME means that the output image will be 28x28 by adding a padding of 1 pixel on each side
#Since output_size = image_size - kernel_size + 1
#So we would have had 26x26 images, which is no good. Now we will have 28x28 convolved images
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

#We initialize the kernels & bias for the first convolution
W_conv1 = weight_variable([3, 3, 1, 6], "First_conv")
b_conv1 = bias_variable([6])
#We reshape x in order to convolve it with W1 whose size is [3,3,1,6], because X's shape is [?, 784] right now
x_image = tf.reshape(x, [-1, 28, 28, 1])
#And we convolve the two of them and make the result go through a rectifier as an activtion function
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

#We initialize the second kernels and bias
W_conv2 = weight_variable([3, 3, 6, 6], "Second_conv")
b_conv2 = bias_variable([6])
#And we convolve the previous result with the new kernels
h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2) + b_conv2)

#Now that the image has been convolved twice, we will softmax the results which will represent the third layer of the CNN
W_fc1 = weight_variable([28 * 28 * 6, 10], "Softmax")
b_fc1 = bias_variable([10])
h_conv2_flat = tf.reshape(h_conv2, [-1, 28*28*6])

#And here we have the result of the whole process
y_conv = tf.nn.softmax(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)

#We create this cross_entropy vector by softmaxing y_conv and applying a mean algorithm on it
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
#Then we optimize the training by calculating the train_step that will be the best for us
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
#We calculate the correct predictions by comparing y_conv and y
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
#And calculate the accuracy by simply applying the mean algorithm
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

#Here is the training!
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    #This is basically the epoch
    for i in range(20000):
        #Shuffling the dataset, 50 at a time to maintain some relationship between the samples
        batch = mnist.train.next_batch(50)
        #Every 100 iterations (fewer prints)
        if i % 100 == 0:
            #We calculate the current accuracy and print it
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print('step %d, training accuracy %g' % (i, train_accuracy))
        #And we train again the model
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    #The next 2 lines are here to print the final values of the kernels later on
    variables_names = [v.name for v in tf.trainable_variables()]
    values = sess.run(variables_names)
    #After we trained, we calculate the accuracy of the model by testing it with new data
    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))


Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data\train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
step 0, training accuracy 0.16
step 100, training accuracy 0.1
step 200, training accuracy 0.28
step 300, training accuracy 0.46
step 400, training accuracy 0.76
step 500, training accuracy 0.66
step 600, training accuracy 0.72
step 700, training accuracy 0.78
step 800, training accuracy 0.78
step 900, training accuracy 0.88
step 1000, training accuracy 0.9
step 1100, training accuracy 0.76
step 1200, training accuracy 0.88
step 1300, training accuracy 0.78
step 1400, training accuracy 0.76
step 1500, training accuracy 0.8
step 1600, training 

### Print of the weights of the model

In [5]:
for k, v in zip(variables_names, values):
    if (k=="First_conv:0") or (k=="Second_conv:0"):
        print (k)
        print (v)

#Those are the convolution kernels, the patterns we want to scan on every image.
#The first_conv variable below is simply 6 kernels of size 3x3. It is hard to see it of course due to the amount of numbers.

#Let's say one of these "filters" is a vertical line : [[0,1,0][0,1,0][0,1,0]]
#Then if passing through the image, it reveals that the number is having this pattern, 
#we can conclude it is a 1, 4 or 5 for example.

#Those numbers are the patterns we want to look after in the images to have this program recognize handwritten numbers.
#This website is a great example of illustration : http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html

#The second layer is the same idea but instead of applying it to the images of the numbers, 
#we apply it to the first layer of convoluted images, which allow greater accuracy

First_conv:0
[[[[ 0.41040269 -0.24751025 -0.02143869 -0.08657531  0.49471778 -0.15619288]]

  [[ 0.21743721 -0.10778099  0.02708933  0.05726246  0.4997879  -0.03044852]]

  [[ 0.20509793 -0.13276838 -0.29111812  0.05846177  0.39199898 -0.25167939]]]


 [[[ 0.34010619 -0.18600784 -0.19548735 -0.24181874  0.29769245 -0.1752955 ]]

  [[ 0.19219247 -0.21399131 -0.19083185 -0.02770928  0.28351545 -0.16490547]]

  [[ 0.07996404 -0.07903549 -0.08292654 -0.30114734  0.39681226 -0.15142746]]]


 [[[ 0.11254784 -0.28572974 -0.24157012 -0.29847753 -0.01149186 -0.12195846]]

  [[ 0.10432011 -0.09811818 -0.15643612 -0.05929291 -0.02031766 -0.17199965]]

  [[-0.06009627 -0.1670676   0.01220296 -0.08182309  0.02755539 -0.04755379]]]]
Second_conv:0
[[[[ 0.00037815 -0.13597956  0.19302335  0.1910414  -0.08074488  0.14495184]
   [ 0.03262785  0.35274938  0.07395323 -0.00901592  0.15325676 -0.10162979]
   [-0.06974695  0.29939443  0.01173009 -0.10176157 -0.00187154 -0.21027333]
   [ 0.02178338  0.2978798

### Initialasing the kernel

In [6]:

import pickle, gzip, numpy, math
import numpy as np


from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# we create the session
sess = tf.InteractiveSession()

# we initiate the computationnal graph
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32)

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# we create the session
sess = tf.InteractiveSession()

# we initiate the computationnal graph
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32)

import tensorflow as tf
sess = tf.InteractiveSession()

def weight_variable(shape, given_name):
    if(given_name=="First_conv"):
        initial = [[[[-1,0.5,-1,-1,-1,-1]],
                    [[1,1,-1,-1,-1,-1]],
                    [[0.5,0.5,-1,-1,-1,1]]],
                    
                   [[[-1,1,1,-1,1,-1]],
                    [[-1,-1,-1,1,1,1]],
                    [[1,1,1,-1,1,1]]],
                    
                   [[[-1,-1,0.5,-1,-1,1]],
                    [[1,-1,1,1,-1,-1]],
                    [[0.5,-1,0.5,-1,-1,-1]]]]

    else:
        initial = tf.truncated_normal(shape, stddev=0.1)
    
    return tf.Variable(initial, name=given_name)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

W_conv1 = weight_variable([3, 3, 1, 6], "First_conv_set")
b_conv1 = bias_variable([6])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)


W_conv2 = weight_variable([3, 3, 6, 6], "Second_conv_set")
b_conv2 = bias_variable([6])
h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2) + b_conv2)

W_fc1 = weight_variable([28 * 28 * 6, 10], "Softmax_set")
b_fc1 = bias_variable([10])
h_conv2_flat = tf.reshape(h_conv2, [-1, 28*28*6])

y_conv = tf.nn.softmax(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    variables_names = [v.name for v in tf.trainable_variables()]
    values = sess.run(variables_names)
    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
    
for k, v in zip(variables_names, values):
    if (k=="First_conv_set:0"):
        print (k)
        print (v)

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
step 0, training accuracy 0.1
step 100, training accuracy 0.26
step 200, training accuracy 0.56
step 300, training accuracy 0.36
step 400, training accuracy 0.68
step 500, training accuracy 0.6
step 600, training accuracy 0.64
step 700, training accuracy 0.68
step 800, training accuracy 0.66
step 900, training accuracy 0.72
test accuracy 0.6697
First_conv_set:0
[[[[-0.08505905 -0.09973254 -0.1043578  -0.03310198 -0.08401346  0.11194234]]

  [[ 0.11677647  0.06157358 -0.00574576  0.06438847 -0.03816743  0.02394777]]

  [[-0.09995101 -0.17703106 -0.14432123  0.15987951 -0.06459424  0.11894002]]]


 [[[-0.17

We initialise the first kernel 
We did a small study to find the patterns we wanted to implement

![deep_1.png](attachment:deep_1.png)
(IF IMAGE DOES NOT DISPLAY : https://i.imgur.com/ytaklrz.jpg)

It was rather easy at this point to say "ok, let's implement it".
The problem came when we saw the shape of W1, since it is like a stack of 6 3x3 pictures. We tried to guess how the kernels were rearranged in this rectangular cuboid (or pavé droit in french) also known as brick.

We then came up with that :

![deep_2.png](attachment:deep_2.png)
(IF IMAGE DOES NOT DISPLAY : https://i.imgur.com/2heocic.jpg)

which resulted in that code : 
        initial = [[[[-1,0.5,-1,-1,-1,-1]],
                    [[1,1,-1,-1,-1,-1]],
                    [[0.5,0.5,-1,-1,-1,1]]],
                    
                   [[[-1,1,1,-1,1,-1]],
                    [[-1,-1,-1,1,1,1]],
                    [[1,1,1,-1,1,1]]],
                    
                   [[[-1,-1,0.5,-1,-1,1]],
                    [[1,-1,1,1,-1,-1]],
                    [[0.5,-1,0.5,-1,-1,-1]]]]
                    
So here are the results

On 1000 samples :


   With the random weights |      With initialized weights<br>
step 000, 0.14 | 0.1<br>
step 100, 0.16 | 0.46<br>
step 200, 0.26 | 0.6<br>
step 300, 0.54 | 0.62<br>
step 400, 0.64 | 0.7<br>
step 500, 0.72 | 0.78<br>
step 600, 0.62 | 0.92<br>
step 700, 0.68 | 0.8<br>
step 800, 0.76 | 0.8<br>
step 900, 0.72 | 0.96<br>
test accuracy 0.7958 | 0.9072<br>

<br>
But the kernel alues are modified by backpropagation, so in the end, we can see that the accuracy goes up way faster than with the random kernel but with 20000 epoch, it is not really distinguishable. Still, that's quite cool.