In [1]:
from keras import models
from keras import layers

network = models.Sequential()

Using TensorFlow backend.


This initializes an empty network that we can start adding layers to.
We proceed as follows to build a network that can learn a function which we already know should be close to<br>
>   <x,y> --> <x+y,x-y>

First an input layer; see the (highly incomplete, sic!) documentation at https://keras.io/layers/core/

In [2]:
network.add(layers.Dense(2, activation='sigmoid', use_bias=True, input_shape=(2,)))

A Dense layer is a standard, fully connected layer: each of its nodes is connected to all nodes in the layer before and after it (here including the implicit input layer).<br>
The first argument, here given as 2, indicates the number of nodes (or neurons) in the hidden layer.
The input_shape argument is more cryptic. For now, we are satisfied by knowing that
> input_shape=(2,)

specifies that this layer, and thus this network, takes 2 inputs (the apparently misplaced comma is correct Python syntax: <tt>(2,)</tt> is a tuple with a single element).<br>
Notice that in the documentation and the textbook, the word "kernel" is used for the collection (more precisely: matrix) of weights. Setting use_bias=True adds an additional constant weight, as we have seen in the lecture (and the excerpt of Mitchell's book).<br>
A list of activation functions can be found here: https://keras.io/activations/<br>
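
To make the roles of kernel, bias and activation concrete, here is a minimal sketch in plain Python of what one Dense layer computes. It is not Keras code: the function names and the weight values are made up for illustration, and the only assumption taken from Keras is that the kernel has one row per input and one column per node.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, kernel, bias, activation):
    """One fully connected layer.

    kernel[i][j] is the weight from input i to node j;
    bias[j] is the constant weight of node j.
    """
    outputs = []
    for j in range(len(bias)):
        z = bias[j] + sum(inputs[i] * kernel[i][j] for i in range(len(inputs)))
        outputs.append(activation(z))
    return outputs

# Hypothetical weights for a layer with 2 inputs and 2 nodes.
kernel = [[0.5, -0.5],
          [0.25, 0.75]]
bias = [0.0, 0.1]

hidden = dense([1.0, 2.0], kernel, bias, sigmoid)

# Number of trainable weights: one per connection plus one bias per node.
n_weights = len(kernel) * len(kernel[0]) + len(bias)
print(hidden, n_weights)
```

With 2 inputs and 2 nodes this gives 2*2 kernel weights plus 2 bias weights; with use_bias=False only the kernel weights would remain.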
<br>
So far we have built a network which, according to standard terminology, has two input nodes and two hidden nodes.<br>
The last added layer will be the output layer, so to complete the network with an output layer with two nodes, we do as follows.

In [3]:
network.add(layers.Dense(2, activation='linear'))

The name 'linear' for this activation function is a misnomer: it refers to the identity function, in other words it means no activation function.<br>
Now we have completed our network, and the next step is to configure the training, which is done in the "compile" function. It takes three arguments:
> <i>optimizer</i>: corresponds to the backpropagation algorithm, of which Keras offers a collection of variants; a stub of documentation is available at https://keras.io/optimizers/; see the example below of how an optimizer is created and used<br>
> <i>loss</i>: what we have referred to as the error function; whatever it is called, it is the function that training attempts to minimize<br>
> <i>metrics</i>: has no influence on the training process itself; a metric is a measurement that is computed and reported during training and evaluation. Most examples in the textbook use <tt>metrics=['accuracy']</tt>, so we do that here.
Unfortunately, the textbook and the Keras documentation do not give a definition of accuracy (sic!).

In [4]:
from keras import optimizers
sgd = optimizers.SGD(lr=0.01, momentum=0.1)
network.compile(optimizer=sgd, loss='mean_squared_error', metrics=['accuracy'])
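
The loss chosen above, 'mean_squared_error', is the average of the squared differences between expected and predicted outputs. A minimal sketch in plain Python follows; the function is our own stand-in rather than the Keras implementation, and the predictions are made-up numbers.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences over all samples and outputs."""
    total, count = 0.0, 0
    for target_row, predicted_row in zip(targets, predictions):
        for t, p in zip(target_row, predicted_row):
            total += (t - p) ** 2
            count += 1
    return total / count

targets = [[-1.7, 0.1], [0.96, 0.1]]
predictions = [[-1.5, 0.0], [1.0, 0.3]]   # hypothetical network outputs
print(mean_squared_error(targets, predictions))
```

A perfect prediction gives a loss of 0, and large errors are penalised more than proportionally because of the squaring.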

The <tt>SGD</tt> optimizer (stochastic gradient descent) seems to be closest to traditional backpropagation; its <tt>lr</tt> parameter stands for learning rate, and <tt>momentum</tt> is what it says (the documentation mentions other possible arguments, some more obvious than others).<p>
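
As far as the Keras documentation describes it, the update with momentum keeps a "velocity" that remembers a fraction of the previous step. The sketch below applies this rule to a single weight; the function name <tt>sgd_step</tt> and the function being minimised are our own choices for illustration, not part of Keras.

```python
def sgd_step(weight, velocity, gradient, lr=0.01, momentum=0.1):
    """One update of a single weight by gradient descent with momentum."""
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for step in range(1000):
    w, v = sgd_step(w, v, gradient=2 * (w - 3))
print(w)
```

With momentum=0 the rule reduces to the plain step "weight minus learning rate times gradient".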
Now comes the difficult part, reading in the input data from a file (sic!)<br>
The following assumes that we have the file backPropTraining.csv in the working directory.
With a bit of guessing and arbitrary searching, the following code was put together:

In [5]:
from numpy import genfromtxt
all_data = genfromtxt('backPropTraining.csv', delimiter=',')
inputs = all_data[:,(0,1)]
outputs = all_data[:,(2,3)]
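
What the cell above does can also be sketched with the standard library alone. The example assumes that the file is comma-separated without a header line, and it uses an in-memory string holding the first three rows of the data shown below as a stand-in for the file. It also checks that the last two columns are indeed close to the sum and the difference of the first two.

```python
import csv
import io

# Stand-in for the file: the first three rows of the data.
csv_text = """-0.8,-0.9,-1.7,0.1
0.53,0.43,0.96,0.1
0.63,-0.96,-0.33,1.59
"""

rows = [[float(field) for field in record]
        for record in csv.reader(io.StringIO(csv_text))]

# Columns 0 and 1 are the inputs, columns 2 and 3 the expected outputs.
sample_inputs = [row[0:2] for row in rows]
sample_outputs = [row[2:4] for row in rows]

# Each expected output should be close to <x+y, x-y>.
for (x, y), (total, difference) in zip(sample_inputs, sample_outputs):
    assert abs((x + y) - total) < 1e-9
    assert abs((x - y) - difference) < 1e-9

print(sample_inputs)
print(sample_outputs)
```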

Try the following to test that things look right

In [6]:
all_data

array([[-0.8 , -0.9 , -1.7 ,  0.1 ],
       [ 0.53,  0.43,  0.96,  0.1 ],
       [ 0.63, -0.96, -0.33,  1.59],
       ...,
       [-0.4 , -0.17, -0.57, -0.23],
       [ 0.21,  0.83,  1.04, -0.62],
       [ 0.01,  0.72,  0.73, -0.71]])

In [7]:
inputs

array([[-0.8 , -0.9 ],
       [ 0.53,  0.43],
       [ 0.63, -0.96],
       ...,
       [-0.4 , -0.17],
       [ 0.21,  0.83],
       [ 0.01,  0.72]])

In [8]:
outputs

array([[-1.7 ,  0.1 ],
       [ 0.96,  0.1 ],
       [-0.33,  1.59],
       ...,
       [-0.57, -0.23],
       [ 1.04, -0.62],
       [ 0.73, -0.71]])

To follow good practice, we split the data set into two parts, training and validation:

In [9]:
# Rows 0-699 are used for training, all remaining rows for validation.
training_inputs = inputs[:700, :]
validation_inputs = inputs[700:, :]
training_outputs = outputs[:700, :]
validation_outputs = outputs[700:, :]
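
Python slices exclude their end index, which is easy to get wrong when splitting at a fixed row. The sketch below illustrates this on a stand-in list; the size of 1000 rows is an assumption made for the example only.

```python
rows = list(range(1000))   # stand-in for 1000 data rows (an assumed size)

# An explicit end index is not included in the slice.
print(len(rows[0:699]), rows[0:699][-1])
print(len(rows[700:999]), rows[700:999][-1])

# Leaving out the bounds splits the rows without gaps or overlap.
training, validation = rows[:700], rows[700:]
print(len(training), len(validation))
```

The same slicing rules apply to the rows of a NumPy array.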

In [10]:
# Train on the training part only, so that the validation part stays unseen.
network.fit(training_inputs, training_outputs, epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch 81/100
Epoch 82/100
Epoch 83/100
Epoch 84/100
Epoch 85/100
Epoch 86/100
Epoch 87/100
Epoch 88/100
Epoch 89/100
Epoch 90/100
Epoch 91/100
Epoch 92/100
Epoch 93/100
Epoch 94/100
Epoch 95/100
Epoch 96/100
Epoch 97/100
Epoch 98/100
Epoch 99/100
Epoch 100/100


<keras.callbacks.callbacks.History at 0xea04dc8>

Continue from here on your own:
<ul>
	<li>Test the trained model on the validation data; can you see any signs of overtraining?</li>
    <li>Generate plots that show how fast the training converges; you may copy-paste some lines from the 3.5-classifying-movie-reviewsCORRECTED-BY-HC.ipynb notebook</li>
	<li>Try to vary the training set-up in different ways to see the effect on the results (loss, accuracy, overtraining, speed of convergence):
       <ul>
	     <li>Different number of nodes in the hidden layer</li>
	     <li>With and without bias and with momentum of different magnitudes (for each of the different number of nodes in the hidden layer)</li>
	     <li>Try to add more hidden layers to see the effect</li>
	     <li>Try out different activation functions as well as different optimizers and loss functions
           (you will probably not know exactly what they mean, but try them anyhow to see if they are useful and perhaps try to find out more about them)</li>
       </ul>
    </li>
	<li>Find a mathematical expression that defines the sort of "accuracy" that Keras (and our textbook) applies. Your teacher's searches for a good answer resulted in useless references into some weird Python code.
    </li>

</ul>
