# TensorFlow tricks

In the [previous notebook](./sgd.ipynb), we saw how useful the automatic gradient computation provided by TensorFlow can be. We now look at further pieces of TensorFlow that help us reduce boilerplate code.

In [5]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe

# Enable eager execution (TF 1.x contrib API), so that operations run immediately
tfe.enable_eager_execution()

## Build models with Keras

In an [earlier notebook](./mlp.ipynb), we explicitly defined the neural network in terms of matrix multiplications and activation functions. Proceeding in this way becomes very tedious once we move to deeper models. With `Keras`, which is available inside ``tensorflow`` as `tf.keras`, we can remove much of this boilerplate code, so that building deep networks becomes like playing with Lego!

`Keras` is built on the concept of `Layer`s, which can be combined in a flexible way. For us, a layer is essentially a matrix multiplication followed by a non-linearity. Later in the course, we will meet more refined examples. In the example below, `Dense(1)` is used without an activation, so the model is a plain linear regression with a bias term.

In [6]:
input_dim = 5

x = tf.keras.layers.Input(shape = (input_dim,))
y = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(x,y)
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 5)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 6         
=================================================================
Total params: 6
Trainable params: 6
Non-trainable params: 0
_________________________________________________________________
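
The parameter count in the summary can be checked by hand: a dense layer with 5 inputs and 1 output has a 5 × 1 kernel plus one bias, so 6 parameters. The following is a minimal NumPy sketch of what such a layer computes, assuming the usual affine form `x @ W + b` followed by an optional activation. The helper `dense_forward` and the variable names are illustrative and not part of Keras.

```python
import numpy as np

def dense_forward(x, W, b, activation=None):
    """Hypothetical helper: affine map followed by an optional non-linearity."""
    z = x @ W + b
    return z if activation is None else activation(z)

input_dim, units = 5, 1
rng = np.random.default_rng(0)
W = rng.normal(size=(input_dim, units))  # kernel, shape (5, 1)
b = np.zeros(units)                      # bias, shape (1,)

x_batch = rng.uniform(-1, 1, size=(3, input_dim))  # a batch of 3 inputs
y_batch = dense_forward(x_batch, W, b)             # no activation, as in Dense(1)

n_params = W.size + b.size  # 5 kernel entries + 1 bias
print(y_batch.shape, n_params)
```

The leading `None` in the output shapes of the summary plays the role of the batch dimension, which is 3 in this sketch.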


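As before, we generate simple training data: the inputs are drawn uniformly from [-1, 1], and the targets are a linear function of the inputs plus a small amount of uniform noise.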

In [7]:
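tf.set_random_seed(42)
input_size = int(1e6)

# True weights as a column vector of shape (5, 1)
w_true = tf.transpose([[1., 2., 3., 4., 5.]])
# Inputs uniform in [-1, 1]; targets are linear in x plus uniform noise in [-0.1, 0.1]
x = tf.random_uniform((input_size, input_dim), -1, 1)
y = tf.matmul(x, w_true) + tf.random_uniform((input_size, 1), -.1, .1)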

Next, we `compile` the model by specifying the loss function and the optimizer. We use the mean-squared error and plain SGD. After that, we `fit` the model to the data. The parameter `epochs` determines how many passes we make over the data, and `batch_size` sets the size of the mini-batches used for stochastic gradient descent.

In [8]:
batch_size = 128

model.compile(loss='mse', optimizer=tf.train.GradientDescentOptimizer(learning_rate = .01))
model.fit(x, y, epochs = 1, batch_size = batch_size)

Epoch 1/1


<tensorflow.python.keras._impl.keras.callbacks.History at 0x7fa7db928898>
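
Conceptually, `fit` with these settings performs one pass of mini-batch SGD on the mean-squared error. With 10^6 samples and a batch size of 128, that is roughly 7,800 parameter updates. The NumPy sketch below mimics such a pass on a smaller data set (200,000 samples, to keep it quick). It illustrates the algorithm under these assumptions and is not how Keras implements it internally: the function `sgd_epoch` is a made-up name, and the weights start at zero rather than at the Keras default initialisation.

```python
import numpy as np

def sgd_epoch(x, y, w, b, learning_rate=0.01, batch_size=128, rng=None):
    """One pass of mini-batch SGD on the mean-squared error of x @ w + b."""
    rng = rng or np.random.default_rng()
    order = rng.permutation(len(x))  # visit the samples in random order
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = x[idx], y[idx]
        residual = xb @ w + b - yb  # shape (batch, 1)
        # Gradients of mean(residual ** 2) with respect to w and b
        grad_w = 2.0 * xb.T @ residual / len(idx)
        grad_b = 2.0 * residual.mean(axis=0)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

rng = np.random.default_rng(42)
n, input_dim = 200_000, 5
w_true = np.array([[1.], [2.], [3.], [4.], [5.]])
x = rng.uniform(-1, 1, size=(n, input_dim))
y = x @ w_true + rng.uniform(-0.1, 0.1, size=(n, 1))

w, b = np.zeros((input_dim, 1)), np.zeros(1)
w, b = sgd_epoch(x, y, w, b, rng=rng)
print(w.ravel(), b)
```

After this single pass, `w` should be close to `w_true` and `b` close to zero, which mirrors what we expect from the Keras model.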

To check whether training has converged to the correct parameters, we peek inside the weight matrix. First, note that the model has two layers: a formal input layer and the actual dense layer.

In [9]:
model.layers

[<tensorflow.python.keras._impl.keras.engine.input_layer.InputLayer at 0x7fa7dc197ac8>,
 <tensorflow.python.keras._impl.keras.layers.core.Dense at 0x7fa7dc197b38>]

From the dense layer, we can now extract the weights and thereby verify the convergence.

In [10]:
model.layers[1].weights

[<tf.Variable 'dense/kernel:0' shape=(5, 1) dtype=float32, numpy=
 array([[0.9998297],
        [1.9995269],
        [2.9989161],
        [4.0008774],
        [5.000411 ]], dtype=float32)>,
 <tf.Variable 'dense/bias:0' shape=(1,) dtype=float32, numpy=array([-0.00014079], dtype=float32)>]
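
To make the comparison explicit, the printed values can be checked against the true weights. This is a small NumPy sketch with the numbers copied from the output above. The tolerance of `2e-3` is an arbitrary choice made for illustration.

```python
import numpy as np

# Values copied from the printed weights above
kernel = np.array([[0.9998297], [1.9995269], [2.9989161], [4.0008774], [5.000411]])
bias = np.array([-0.00014079])
w_true = np.array([[1.], [2.], [3.], [4.], [5.]])

max_err = np.max(np.abs(kernel - w_true))  # largest deviation of any weight
# The tolerance below is an assumption chosen for this illustration
converged = max_err < 2e-3 and abs(bias[0]) < 2e-3
print(max_err, converged)
```

In the notebook itself, `model.layers[1].get_weights()` should give the kernel and bias as NumPy arrays, so the same check can be run without copying numbers by hand.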

## Homework

Build and train a Keras MLP for the Iris dataset. What is a suitable loss function?