# Use of convolutions with TensorFlow

In this notebook, you'll use TensorFlow to build a Convolutional Neural Network (CNN).  

### Convolution

Both [this notebook](https://nbviewer.jupyter.org/github/marc-moreaux/Deep-Learning-classes/blob/master/notebooks/Convolution.ipynb) and this [Wikipedia page](https://en.wikipedia.org/wiki/Convolution) might help you understand what a convolution is.

Now, if we consider two functions $f$ and $g$ from $\mathbb{Z}$ to $\mathbb{R}$, then:  
$ (f * g)[n] = \sum_{m = -\infty}^{+\infty} f[m] \cdot g[n - m] $

In our case, we consider the two vectors $x$ and $w$ :  
$ x = (x_1, x_2, ..., x_{n-1}, x_n) $  
$ w = (w_1, w_2) $

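Sliding $w$ along $x$ without flipping it (strictly speaking a cross-correlation, which is what deep learning frameworks compute under the name convolution), we get:  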
$ x * w = (w_1 x_1 + w_2 x_2, w_1 x_2 + w_2 x_3, ..., w_1 x_{n-1} + w_2 x_n)$
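
As a quick check of the formula above, here is a minimal sketch in plain Python. The helper names `sliding_dot` and `strict_convolution` are made up for this example. The first one reproduces the expression for $x * w$ given above; the second one reverses the kernel first, which is what the sum over $m$ in the general definition amounts to.

```python
def sliding_dot(x, w):
    """Slide w over x without flipping it (the deep learning 'convolution')."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def strict_convolution(x, w):
    """Textbook convolution restricted to full overlaps: the kernel is reversed first."""
    return sliding_dot(x, w[::-1])


x = [1, 2, 3, 4]
w = [10, 1]

print(sliding_dot(x, w))         # w_1 * x_i + w_2 * x_{i+1} for each i
print(strict_convolution(x, w))  # same windows, kernel reversed
```

The two results differ as soon as the kernel is not symmetric. Since the kernel is learned anyway, the distinction rarely matters in practice.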


#### Deep learning subtlety:

In most deep learning frameworks, you get to choose between three paddings:
- **Same**: $(x*w)$ has the same shape as $x$ (the input is padded with zeros)
- **Valid**: $(x*w)$ has the shape of $x$ minus the shape of $w$ plus 1 (no padding on $x$)
- **Causal**: $(x*w)[t]$ does not depend on any $x_{t'}$ with $t' > t$
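
The three padding conventions can be compared on a small vector. The following is a minimal sketch in plain Python, not any framework's implementation; the function name `conv1d` and the way zeros are split for "same" padding are assumptions made for this illustration.

```python
def conv1d(x, w, padding="valid"):
    """1-D sliding sum (kernel not flipped) with three padding conventions."""
    k = len(w)
    if padding == "valid":
        padded = list(x)                      # no padding: len(x) - k + 1 outputs
    elif padding == "same":
        left = (k - 1) // 2                   # zeros split between both ends of x
        padded = [0] * left + list(x) + [0] * (k - 1 - left)
    elif padding == "causal":
        padded = [0] * (k - 1) + list(x)      # zeros only before x
    else:
        raise ValueError("unknown padding: " + padding)
    return [sum(w[j] * padded[i + j] for j in range(k))
            for i in range(len(padded) - k + 1)]


x = [1, 2, 3, 4, 5]
w = [1, 1, 1]

for mode in ("valid", "same", "causal"):
    y = conv1d(x, w, mode)
    print(mode, len(y), y)
```

With causal padding, output number $t$ is computed from $x_t$ and earlier entries only, so changing the last entry of `x` leaves all earlier outputs untouched.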

### TensorFlow

"TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library, and also used for machine learning applications such as neural networks. It is used for both research and production at Google often replacing its closed-source predecessor, DistBelief." - Wikipedia

We'll be using TensorFlow to build the models we want to use.

Below, we build an AND gate with a very simple neural network:

In [16]:
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

# Define our Dataset
X = np.array([[0,0],[0,1],[1,0],[1,1]])
Y = np.array([0,0,0,1]).reshape(-1,1)

# Define the tensorflow tensors
x = tf.placeholder(tf.float32, [None, 2], name='X')  # inputs
y = tf.placeholder(tf.float32, [None, 1], name='Y')  # outputs
W = tf.Variable(tf.zeros([2, 1]), name='W')
b = tf.Variable(tf.zeros([1,]), name='b')

# Define the model
pred = tf.nn.sigmoid(tf.matmul(x, W) + b)  # Model

# Define the loss (binary cross-entropy, averaged over the batch)
with tf.name_scope("loss"):
    loss = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred) + (1-y) * tf.log(1-pred), axis=1))

# Define the optimizer method you want to use
with tf.name_scope("optimizer"):
    optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Include some Tensorboard visualization
writer_train = tf.summary.FileWriter("./my_model/")


# Start training session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer_train.add_graph(sess.graph)
    
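    for epoch in range(1000):
        _, c, p = sess.run([optimizer, loss, pred], feed_dict={x: X, y: Y})

# p holds the predictions of the last training step.
# y is the placeholder, so print shows the symbolic tensor; print Y to see the labels.
print(p, y)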

[[ 0.00839782]
 [ 0.1499088 ]
 [ 0.1499088 ]
 [ 0.78595555]] Tensor("Y:0", shape=(?, 1), dtype=float32)
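
To make explicit what the graph and `GradientDescentOptimizer` compute, here is a minimal sketch of the same AND-gate training in plain Python. It assumes the same zero initialisation, a learning rate of 0.1 and 1000 full-batch steps; it is an illustration, not TensorFlow's implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 0, 0, 1]

w = [0.0, 0.0]   # same zero initialisation as W in the cell above
b = 0.0
lr = 0.1

for epoch in range(1000):
    preds = [sigmoid(w[0] * x1 + w[1] * x2 + b) for x1, x2 in X]
    # For a sigmoid output with cross-entropy loss, the gradient with respect
    # to the pre-activation is simply (prediction - label)
    errors = [p - t for p, t in zip(preds, Y)]
    grad_w = [sum(e * xi[k] for e, xi in zip(errors, X)) / len(X) for k in range(2)]
    grad_b = sum(errors) / len(X)
    w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
    b -= lr * grad_b

preds = [sigmoid(w[0] * x1 + w[1] * x2 + b) for x1, x2 in X]
print([round(p, 3) for p in preds])
print([int(p > 0.5) for p in preds])   # thresholded predictions
```

Thresholding the predictions at 0.5 gives the AND truth table, even though the raw probabilities are not yet close to 0 and 1 after 1000 steps.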


To visualize the graph you just created, launch TensorBoard:  
`$ tensorboard --logdir=./` on Linux (adapt the log directory to your setup)

---
### Get inspiration from the preceding code to build an XOR gate

Design a neural network with 2 layers:
- Layer 1 has 2 neurons (sigmoid or tanh activation)
- Layer 2 has 1 neuron (it outputs the prediction)

Then train it.

A **TensorBoard visualization** of your graph is **mandatory**; try to make it look good, please :)

Below is a graph of the model you are aiming for (your weights will differ):
![graph](https://i.stack.imgur.com/nRZ6z.png)

In [17]:
### Code here

tf.reset_default_graph()

# Define our Dataset
X = np.array([[0,0],[0,1],[1,0],[1,1]])
Y = np.array([0,1,1,0]).reshape(-1,1)

# Define the tensorflow tensors
x = tf.placeholder(tf.float32, [None, 2], name='X')  # inputs
y = tf.placeholder(tf.float32, [None, 1], name='Y')  # outputs
W1 = tf.Variable(tf.random_normal([2, 2]), name='W1')
b1 = tf.Variable(tf.random_normal([2,]), name='b1')
W2 = tf.Variable(tf.random_normal([2, 1]), name='W2')
b2 = tf.Variable(tf.random_normal([1,]), name='b2')

# Define the model: a tanh hidden layer followed by a sigmoid output
A2 = tf.nn.tanh(tf.matmul(x, W1) + b1)  # hidden layer activations
pred = tf.nn.sigmoid(tf.matmul(A2, W2) + b2)  # output prediction

# Define the loss (binary cross-entropy, averaged over the batch)
with tf.name_scope("loss"):
    loss = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred) + (1-y) * tf.log(1-pred), axis=1))

# Define the optimizer method you want to use
with tf.name_scope("optimizer"):
    optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Include some Tensorboard visualization
writer_train = tf.summary.FileWriter("./my_modelXOR/")


# Start training session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer_train.add_graph(sess.graph)
    
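    for epoch in range(10000):
        _, c, p = sess.run([optimizer, loss, pred], feed_dict={x: X, y: Y})
    # tf.trainable_variables(scope) returns a list, so each result is a list holding one array
    print_W1 = sess.run(tf.trainable_variables("W1"))
    print_W2 = sess.run(tf.trainable_variables("W2"))

# y is the placeholder, so print shows the symbolic tensor; print Y to see the labels.
print(p, y)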

[[ 0.00197501]
 [ 0.99865156]
 [ 0.49910623]
 [ 0.50053406]] Tensor("Y:0", shape=(?, 1), dtype=float32)


### Print the weights of your model
Then give an interpretation of what they are doing.

In [18]:
### Code here
print("W1", print_W1)
print("W2", print_W2)

W1 [array([[ 4.84613419,  4.65626192],
       [-2.7323525 ,  2.09388518]], dtype=float32)]
W2 [array([[-4.24455166],
       [ 3.93683577]], dtype=float32)]
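
The biases are not printed above, so the trained network cannot be replayed from these numbers alone. As a reference point for the interpretation, here is a minimal sketch of one way a network with two tanh hidden units and a sigmoid output can represent XOR. The weights are hand-picked assumptions, not the trained values: one hidden unit behaves like OR, the other like AND, and the output fires when OR is on and AND is off.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def xor_net(x1, x2, k=5.0):
    """Two tanh hidden units and one sigmoid output, with hand-picked weights."""
    h_or = math.tanh(k * (x1 + x2 - 0.5))     # about +1 if at least one input is 1
    h_and = math.tanh(k * (x1 + x2 - 1.5))    # about +1 only if both inputs are 1
    return sigmoid(k * (h_or - h_and - 1.0))  # high when OR is on and AND is off


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), round(xor_net(x1, x2), 3))
```

A trained network may find a different but equivalent solution, or get stuck: in the run above, the last two predictions stay near 0.5, which suggests that training did not fully separate the four inputs.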


---
### Build a CNN to predict the MNIST digits
You can now move to CNNs. You'll have to train a convolutional neural network to predict the digits from MNIST.

You might want to reuse some pieces of code from the [SNN](https://nbviewer.jupyter.org/github/marc-moreaux/Deep-Learning-classes/blob/master/notebooks/Intro_to_SNN.ipynb) notebook.

Your model should have 3 layers:
- 1st layer : 6 convolutional kernels with shape (3,3)
- 2nd layer : 6 convolutional kernels with shape (3,3)
- 3rd layer : Softmax layer
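
Before writing the model, it helps to work out the tensor shapes and the number of parameters. The sketch below assumes stride 1 and zero padding that preserves the 28x28 image size (the "Same" option), one bias per kernel, and a dense softmax layer on the flattened feature maps; `conv_same` is a hypothetical helper written for this calculation.

```python
def conv_same(height, width, in_channels, kernel, out_channels):
    """Output shape and parameter count of a stride-1, size-preserving convolution."""
    n_params = kernel * kernel * in_channels * out_channels + out_channels  # weights + biases
    return (height, width, out_channels), n_params


shape1, params1 = conv_same(28, 28, 1, 3, 6)   # 1st layer on a 28x28 grayscale image
shape2, params2 = conv_same(*shape1, 3, 6)      # 2nd layer on the 6 feature maps

n_features = shape2[0] * shape2[1] * shape2[2]  # flattened input of the softmax layer
params3 = n_features * 10 + 10                   # dense weights + biases for 10 digits

print(shape1, params1)
print(shape2, params2)
print(n_features, params3)
print("total:", params1 + params2 + params3)
```

Under these assumptions, almost all parameters sit in the final dense layer, not in the convolutions.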

Train your model.

Explain everything you do and why, and make it pleasant to read, please o:)

In [19]:
# Load the data with the MNIST loader shipped with TensorFlow (it fetches the files into MNIST_data if they are missing)
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [20]:
# Helper Functions

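def weight_variable(shape):
    # Small random initial weights to break the symmetry between kernels
    initial = tf.random_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # Slightly positive initial biases
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # Stride 1 and 'SAME' zero padding keep the spatial size of the input
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')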
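
To see what a stride-1 convolution with 'SAME' padding computes, here is a minimal single-channel sketch in plain Python. It is an illustration only: `tf.nn.conv2d` additionally handles the batch dimension, sums over input channels and produces one map per output channel. The function name `conv2d_same` is made up for this example.

```python
def conv2d_same(image, kernel):
    """Single-channel 2-D sliding sum with zero padding and stride 1 (kernel not flipped)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    top, left = (kh - 1) // 2, (kw - 1) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - top, j + b - left
                    if 0 <= r < h and 0 <= c < w:   # outside the image counts as zero
                        total += kernel[a][b] * image[r][c]
            out[i][j] = total
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
box = [[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]

for row in conv2d_same(image, box):
    print(row)
```

With this all-ones kernel, each output pixel is the sum of its 3x3 neighbourhood, and the output has the same 3x3 size as the input.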


In [22]:
# CNN


n_iteration = 20000
learning_rate = 0.01
batch_size = 50

tf.reset_default_graph()

# Initialize placeholders for input and output
with tf.variable_scope('Placeholders') as scope:
    x = tf.placeholder(tf.float32, [None, 784], name='X')  # inputs
    y = tf.placeholder(tf.float32, [None, 10], name='Y')  # outputs

# Define the first layer
with tf.variable_scope('Layer1') as scope:
    W_conv1 = weight_variable([3, 3, 1, 6])
    b_conv1 = bias_variable([6])
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    h_conv1 = conv2d(x_image, W_conv1) + b_conv1

# Define the second layer
with tf.variable_scope('Layer2') as scope:
    W_conv2 = weight_variable([3, 3, 6, 6])
    b_conv2 = bias_variable([6])
    h_conv2 = conv2d(h_conv1, W_conv2) + b_conv2

# Define the last layer
with tf.variable_scope('Layer3') as scope:
    h_layer3 = tf.contrib.layers.flatten(h_conv2)
    logits = tf.layers.dense(h_layer3, 10)
    y_conv = tf.nn.softmax(logits)

# Loss
with tf.name_scope("loss"):
    entropy = tf.losses.softmax_cross_entropy(logits=logits, onehot_labels=y)
    loss = tf.reduce_mean(entropy)
    
# Optimizer
with tf.name_scope("optimizer"):
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(loss)

# Accuracy: fraction of examples whose most likely class matches the label
with tf.name_scope("accuracy"):
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 

# To display the loss and the training accuracy in Tensorboard 
with tf.name_scope("summary"):
    tf.summary.scalar('accuracy', accuracy)
    tf.summary.scalar('loss', loss)
    merge_summary = tf.summary.merge_all()

# Writer that stores the logs for TensorBoard
writer_train = tf.summary.FileWriter("./my_modelMNIST/")

# Run session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer_train.add_graph(sess.graph)
    
    for i in range(n_iteration):
        total_loss = 0
        batch = mnist.train.next_batch(batch_size)
        _, l, summary = sess.run([optimizer, loss, merge_summary], feed_dict={x: batch[0], y: batch[1]})
        total_loss += l
        if i % 100 == 0:
            # total_loss is reset at every step, so the value printed here is
            # the current batch's loss divided by the batch size
            train_accuracy = accuracy.eval(feed_dict={x: batch[0], y: batch[1]})
            print('step :', i, 'training accuracy :', train_accuracy, 'loss :', total_loss/(len(batch[0])))
        writer_train.add_summary(summary, i)

    print_W_conv1 = sess.run(W_conv1)
    print_W_conv2 = sess.run(W_conv2)
    print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y: mnist.test.labels}))


step : 0 training accuracy : 0.48 loss : 0.0455452489853
step : 100 training accuracy : 0.9 loss : 0.0073886090517
step : 200 training accuracy : 0.82 loss : 0.0191392862797
step : 300 training accuracy : 0.9 loss : 0.00964303135872
step : 400 training accuracy : 0.88 loss : 0.00614393174648
step : 500 training accuracy : 0.92 loss : 0.00987895131111
step : 600 training accuracy : 0.96 loss : 0.004123685956
step : 700 training accuracy : 0.94 loss : 0.00504373311996
step : 800 training accuracy : 0.9 loss : 0.00747117459774
step : 900 training accuracy : 0.88 loss : 0.00853220820427
step : 1000 training accuracy : 0.82 loss : 0.00909156858921
step : 1100 training accuracy : 0.96 loss : 0.00326681941748
step : 1200 training accuracy : 0.9 loss : 0.00616743266582
step : 1300 training accuracy : 0.92 loss : 0.00850886523724
step : 1400 training accuracy : 0.92 loss : 0.00566857814789
step : 1500 training accuracy : 0.84 loss : 0.0104892086983
step : 1600 training accuracy : 0.92 loss : 0.

step : 13500 training accuracy : 0.94 loss : 0.00265745341778
step : 13600 training accuracy : 0.78 loss : 0.0156340074539
step : 13700 training accuracy : 0.96 loss : 0.0019284543395
step : 13800 training accuracy : 0.92 loss : 0.00653774738312
step : 13900 training accuracy : 0.94 loss : 0.00685636878014
step : 14000 training accuracy : 0.98 loss : 0.00187950193882
step : 14100 training accuracy : 0.98 loss : 0.00820836484432
step : 14200 training accuracy : 0.92 loss : 0.0102838206291
step : 14300 training accuracy : 0.96 loss : 0.00510863900185
step : 14400 training accuracy : 0.9 loss : 0.0058442735672
step : 14500 training accuracy : 0.92 loss : 0.0035718023777
step : 14600 training accuracy : 0.92 loss : 0.00662855505943
step : 14700 training accuracy : 0.94 loss : 0.00316259086132
step : 14800 training accuracy : 0.98 loss : 0.0025366383791
step : 14900 training accuracy : 0.94 loss : 0.00543116152287
step : 15000 training accuracy : 0.9 loss : 0.0102091228962
step : 15100 trai
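
The loss and accuracy nodes of the graph can be reproduced by hand on a toy batch. The sketch below uses made-up logits and labels (two examples, three classes) and plain Python; the helper names are illustrative and not part of TensorFlow.

```python
import math


def softmax(logits):
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, onehot):
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(onehot, probs))


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Made-up logits for a batch of two examples with three classes
batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 1.5]]
batch_labels = [[1, 0, 0], [0, 1, 0]]

losses = [cross_entropy(logits, target) for logits, target in zip(batch_logits, batch_labels)]
hits = [argmax(logits) == argmax(target) for logits, target in zip(batch_logits, batch_labels)]

print("mean loss:", sum(losses) / len(losses))
print("accuracy:", sum(hits) / len(hits))
```

The first example is classified correctly and contributes a small loss; the second is misclassified and contributes a much larger one. On batches of 50 images, this is also why the logged training accuracy fluctuates from step to step.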

### Print the weights of your model
Then give an interpretation of what they are doing.

In [23]:
# print_W_conv1 and print_W_conv2 are expected to hold the kernels fetched with
# sess.run(W_conv1) and sess.run(W_conv2) at the end of the training cell
print("W_conv1", print_W_conv1)
print("W_conv2", print_W_conv2)


### Choose one (tell me which one you chose...)
- Show how the gradients (show only one kernel) evolve for correct and wrong predictions. (hard)
- Initialize the kernels with values that make sense to you and show how they evolve. (easy)
- When training is finished, show the 6+6=12 results of some convolved images. (easy)
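
For the second option, one possible starting point is a set of classic image-processing kernels. The sketch below is only an illustration: the six kernels are an arbitrary choice, and the nested list is arranged in the `[height, width, in_channels, out_channels]` layout used by `W_conv1`, so that it could serve as an initial value for that variable instead of the random initialisation.

```python
# Six hand-picked 3x3 kernels (an assumption for illustration, not a recommended set)
kernels = {
    "identity":   [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    "blur":       [[1 / 9] * 3 for _ in range(3)],
    "vertical":   [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],   # Sobel, responds to vertical edges
    "horizontal": [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],   # Sobel, responds to horizontal edges
    "diagonal":   [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],
    "laplacian":  [[0, 1, 0], [1, -4, 1], [0, 1, 0]],
}
names = list(kernels)

# Rearrange to the [height, width, in_channels, out_channels] layout of W_conv1
init = [[[[float(kernels[n][i][j]) for n in names]] for j in range(3)] for i in range(3)]

shape = (len(init), len(init[0]), len(init[0][0]), len(init[0][0][0]))
print(shape)
print(init[1][1][0])   # centre value of each of the six kernels
```

Comparing the kernels before and after training then shows which of these hand-made filters the network keeps and which it reshapes.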

In [None]:
### Code here