# Python for deep learning

## Cell 7

```python
from keras.models import Sequential
from keras.layers import Dense
from IPython.display import SVG
from keras.utils import model_to_dot

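# N is defined in an earlier cell of the notebook; the text below uses N = 3.
N = 3

model = Sequential()
# Hidden layer: N*N-1 neurons, each connected to all N*N inputs.
model.add(Dense(N*N-1, input_dim=(N*N), activation='relu'))
# Output layer: a single sigmoid neuron for binary classification.
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
# Render the layer graph as an SVG image inside the notebook.
SVG(model_to_dot(model, show_shapes=True, show_layer_names=True, dpi=65).create(prog='dot', format='svg'))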
```

We create our neural network using a sequential model. A sequential model consists of a stack of layers of neurons, where the neurons in layer $k$ are connected to the neurons in layer $k+1$.

An alternative way of building a neural network is the functional API, which allows us to create the network as an arbitrary directed acyclic graph. However, we will only use the sequential model here.
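Conceptually, a sequential model is a chain of functions: the output of one layer becomes the input of the next. The following is a minimal pure-Python sketch of that idea, not Keras code; the names `run_sequential`, `double` and `add_one` are hypothetical and only stand in for real layers.

```python
def run_sequential(layers, x):
    """Feed x through the layers in order; each output is the next input."""
    for layer in layers:
        x = layer(x)
    return x

# Two toy "layers" that operate element-wise on a list of numbers.
def double(values):
    return [2 * v for v in values]

def add_one(values):
    return [v + 1 for v in values]

result = run_sequential([double, add_one], [1, 2, 3])
print(result)  # [3, 5, 7]
```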

## Creating the model

We create a sequential model by instantiating the Keras class `Sequential`. We can then add the layers of neurons one at a time. Keras makes this easy for us.

In [5]:
from keras.models import Sequential
model = Sequential()

The network is empty at the beginning.

## Adding the first layer

We do not explicitly add a layer of input neurons. Instead, we specify the size of the input when we add the first inner (hidden) layer of neurons.

In [7]:
N = 3
model.add(Dense(N*N-1, input_dim=(N*N), activation='relu'))
model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 8)                 80        
Total params: 80
Trainable params: 80
Non-trainable params: 0
_________________________________________________________________


We create the first hidden layer as a densely connected layer and add it to the model. The layer is created by instantiating the Keras class `Dense`. The parameters are the number of neurons in the layer, the size of the input vector, and the name of the activation function for the neurons in the layer.

We choose an input vector of size 9 ($N \cdot N$ with $N = 3$). The first layer has 8 neurons. Densely connected means that each of the 9 inputs is connected to each of the 8 neurons in the layer. That means we have $9\cdot 8 = 72$ connections between the input neurons and the neurons in the first layer. Each connection has an associated weight. Initially the weights are chosen randomly. The weights are part of the parameters of the network that will be modified during the training phase.
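To make the counting concrete, the sketch below builds a 9×8 weight matrix in plain Python. It assumes the Keras default initializer for `Dense`, Glorot uniform, which draws each weight uniformly from $[-l, l]$ with $l = \sqrt{6/(n_{in}+n_{out})}$. The helper name `glorot_uniform_matrix` is hypothetical, and the random values will not match those Keras produces.

```python
import math
import random

def glorot_uniform_matrix(n_inputs, n_units, seed=0):
    """Return an n_inputs x n_units matrix of weights drawn uniformly
    from [-limit, limit], with limit = sqrt(6 / (n_inputs + n_units))."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (n_inputs + n_units))
    return [[rng.uniform(-limit, limit) for _ in range(n_units)]
            for _ in range(n_inputs)]

weights = glorot_uniform_matrix(9, 8)
n_connections = sum(len(row) for row in weights)
print(n_connections)  # 72: one weight per (input, neuron) pair
```

For 9 inputs and 8 neurons the limit is about 0.594, which is consistent with the magnitudes of the weights in the `model.weights` output printed below.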

In [10]:
model.weights

[<tf.Variable 'dense_1/kernel:0' shape=(9, 8) dtype=float32, numpy=
 array([[-8.6144090e-02, -2.4059239e-01,  4.0054047e-01, -3.6419776e-01,
         -2.8079641e-01, -2.1960348e-01, -3.5815239e-03, -5.8612972e-01],
        [-4.6627033e-01, -1.5683523e-01,  1.2064022e-01,  4.1032016e-01,
         -5.1545614e-01, -5.2719396e-01,  3.5565704e-01, -5.5124265e-01],
        [ 1.0242170e-01,  1.1665982e-01, -4.7613895e-01, -4.4029039e-01,
         -5.6886828e-01,  2.3656553e-01, -2.9338965e-01,  5.0669265e-01],
        [ 3.9589250e-01, -2.5064698e-01, -2.0996365e-01,  2.6267689e-01,
          4.5887136e-01, -3.3204722e-01,  2.4469316e-02,  1.8672305e-01],
        [-2.3420095e-01, -7.0637167e-02, -2.3495024e-01,  4.3944383e-01,
          5.6600571e-01,  6.1010897e-02, -5.5117929e-01, -6.7953169e-02],
        [ 2.2388858e-01,  1.1389434e-02, -2.0871052e-01, -3.6625922e-01,
          5.8868706e-02, -3.7787765e-01, -3.9690399e-01,  3.3117896e-01],
        [-5.6672508e-01, -1.7338607e-01,  2.564564

The model summary reports 80 parameters. Of these, 72 are the weights of the connections. What are the remaining 8 parameters? Each neuron has an additional parameter, the bias. The output of a neuron is computed by multiplying each input value by the corresponding weight, summing the results, adding the bias, and applying the activation function to that sum. The full output of `model.weights` shows that the biases of the eight neurons are initially 0. Like the weights, the biases will be modified in the training phase.
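The computation of a single neuron and the parameter count can be sketched in a few lines of plain Python. This is an illustration, not how Keras implements the layer; `neuron_output` and `dense_param_count` are hypothetical helper names, and the input values are made up for the example.

```python
def neuron_output(inputs, weights, bias, activation):
    """Return activation(sum_i inputs[i] * weights[i] + bias)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

def dense_param_count(n_inputs, n_units):
    """One weight per (input, neuron) pair plus one bias per neuron."""
    return n_inputs * n_units + n_units

def rectify(z):
    # The activation used by the hidden layer in this model.
    return max(z, 0.0)

print(dense_param_count(9, 8))                                 # 80
print(neuron_output([1.0, 2.0], [0.5, 1.0], 0.25, rectify))    # 2.75
print(neuron_output([1.0, 2.0], [0.5, -1.0], 0.0, rectify))    # 0.0
```

In the last call the weighted sum is $-1.5$, so the activation clips the output to 0.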

As the activation function we selected ``relu``, which stands for `Rectified Linear Unit`. The ``relu`` function is defined as:

  $\mathrm{relu}(x) = \max(x, 0)$
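The definition translates directly into Python. This is a plain-Python sketch for single numbers, not the Keras implementation, which operates on whole tensors.

```python
def relu(x):
    """Rectified Linear Unit: negative inputs become 0, others pass through."""
    return max(x, 0.0)

samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([relu(x) for x in samples])  # [0.0, 0.0, 0.0, 0.5, 2.0]
```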

## A minimal network example