The following sections demonstrate the function and behavior of some Keras layers. 

By building a single-layer model and running a forward pass (e.g. by calling `predict()`), you can inspect the behavior of a layer in isolation.

For more information see [Keras layers](https://keras.io/layers/about-keras-layers/).

## Dense layer

A dense layer performs the computation `output = activation(dot(input, W) + b)`.

A dense layer takes a tensor of shape (batch_size, input_size) as input and returns a tensor of shape (batch_size, output_size).

 * `W` is an (input_size, output_size) weight matrix
 * `b` is an output_size-dimensional bias vector

Some frequently used [activation functions](https://keras.io/activations) are:
 * `linear`: identity function, i.e. no activation is applied
 * `relu`: rectified linear unit, `max(x, 0)`
 * `sigmoid`: sigmoid function, commonly used in binary classification
 * `softmax`: softmax function, used in multi-class classification
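The element-wise activation functions listed above can be sketched in plain NumPy. This is only an illustration of the math, not the Keras implementation; the function names and the input values in `z` are made up for the example. Softmax is left out here because it has its own section further down.

```python
import numpy as np

def linear(x):
    # Identity: the pre-activation values pass through unchanged.
    return x

def relu(x):
    # Negative values are clipped to zero, positive values are kept.
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Squashes each value independently into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])  # hypothetical pre-activation values
print('linear: ', linear(z))
print('relu:   ', relu(z))
print('sigmoid:', sigmoid(z))
```

Each of these functions acts on every element separately, so the output has the same shape as the input.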


In [18]:
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

W = np.array([
    [1,2,3,4,5],
    [1,2,3,4,5]])
b = np.array([0,0,0,0,0])
weights_and_bias = (W, b)

inputs = Input(shape=(2,))
outputs = Dense(5, activation='linear', weights=weights_and_bias)(inputs) 
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='sgd', loss='mse')

print('Input shape', model.input.shape)
print('Output shape', model.output.shape)

x = np.array([1,2])
print('Output:', model.predict(np.expand_dims(x, 0)))

np_result = np.dot(x, W) + b
print('Numpy result:', np_result)

Input shape (?, 2)
Output shape (?, 5)
Output: [[ 3.  6.  9. 12. 15.]]
Numpy result: [ 3  6  9 12 15]


## Softmax activation

For a vector $x = [x_0,...,x_n]$ the softmax function calculates:

$ softmax(x)_i = \frac{e^{x_i}}{\sum_{j=0}^{n} e^{x_j}} $

**Note:** In Keras, any [activation function](https://keras.io/activations/) can be applied either through a separate `Activation` layer or through the `activation` argument of a layer constructor.
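As a cross-check of the formula, here is a minimal NumPy sketch of softmax. It subtracts the row maximum before exponentiating, a common trick that leaves the result unchanged but avoids overflow; whether Keras does exactly this internally is not claimed here. The input values match the Keras example in this section, and the helper name `softmax_np` is chosen for this sketch.

```python
import numpy as np

def softmax_np(x):
    # Shifting by the maximum does not change the result, because the
    # common factor cancels between numerator and denominator.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

logits = np.array([[8.0, 2.0, 9.0, 3.0]])
np_probs = softmax_np(logits)
print('probabilities:', np_probs)
print('predicted class:', np.argmax(np_probs))
print('sum of probabilities:', np_probs.sum())
```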

In [12]:
import numpy as np
from keras.layers import Input, Activation
from keras.models import Model

nb_classes = 4
inputs = Input(shape=(nb_classes,), dtype='float32')
softmax = Activation('softmax')(inputs)
model = Model(inputs=inputs, outputs=softmax)
model.compile(optimizer='sgd', loss='mse')

activation = np.array([[8.0, 2.0, 9.0, 3.0]])
probs = model.predict(activation)
print('probabilities:', probs)
print('predicted class:', np.argmax(probs))
print('sum of probabilities:', probs.sum())

probabilities: [[2.6827645e-01 6.6499080e-04 7.2925097e-01 1.8076325e-03]]
predicted class: 2
sum of probabilities: 1.0000001


## Cross Entropy loss

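The prediction passed to the cross entropy function **must** be a probability distribution over the classes, for example the output of a softmax. For a single example whose correct class has index `y_true`, the loss is the negative log of the probability assigned to that class:

$ cross\_entropy(y\_true, y\_pred) = -\log(y\_pred_{y\_true}) $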

There are multiple implementations of cross entropy:
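 * categorical vs. binary: more than two classes vs. exactly two classes
 * sparse vs. one-hot encoded: labels given as integer class indices vs. one-hot vectors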
 
In TensorFlow, you usually use optimized functions that combine softmax and cross entropy in a single operation.
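The variants above can be illustrated with a small NumPy sketch. The three helper functions are hypothetical names written for this example, not Keras or TensorFlow APIs. The last one shows the idea behind a combined softmax and cross entropy: it works on raw scores (logits) and computes the log-probabilities directly with the log-sum-exp trick.

```python
import numpy as np

def sparse_cross_entropy(y_true, y_pred):
    # y_true: integer class indices, shape (batch,)
    # y_pred: probability distributions, shape (batch, nb_classes)
    rows = np.arange(len(y_true))
    return -np.log(y_pred[rows, y_true])

def one_hot_cross_entropy(y_true, y_pred):
    # y_true: one-hot encoded labels, shape (batch, nb_classes)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

def cross_entropy_from_logits(y_true, logits):
    # log(softmax(logits)) computed without forming the softmax itself.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    rows = np.arange(len(y_true))
    return -log_probs[rows, y_true]

y_pred = np.array([[0.01, 0.01, 0.96, 0.01, 0.01]])
labels = np.array([1])        # sparse encoding: class index
one_hot = np.eye(5)[labels]   # one-hot encoding of the same label

loss_sparse = sparse_cross_entropy(labels, y_pred)
loss_one_hot = one_hot_cross_entropy(one_hot, y_pred)
# log(y_pred) serves as logits here because softmax(log(p)) == p
# whenever p already sums to 1.
loss_logits = cross_entropy_from_logits(labels, np.log(y_pred))
print(loss_sparse, loss_one_hot, loss_logits)
```

All three calls compute `-log(0.01)` for this input, since the correct class (index 1) was assigned a probability of 0.01.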

In [26]:
import numpy as np
from keras import backend as K
from keras.losses import sparse_categorical_crossentropy

# y_true holds the index of the correct class (sparse encoding);
# y_pred is a probability distribution over the 5 classes.
y_true = K.variable(value=np.array([1]), dtype='float32')
y_pred = K.variable(value=np.array([[0.01, 0.01, 0.96, 0.01, 0.01]]), dtype='float32')
loss_tensor = sparse_categorical_crossentropy(y_true, y_pred)

# Evaluate the symbolic loss tensor; the expected value is -log(0.01).
loss = K.eval(loss_tensor)
print('cross entropy loss:', loss[0])

cross entropy loss: 4.6051702


## Embedding layer

In [None]:
import numpy as np
from keras.layers import Input, Embedding
from keras.models import Model

max_sequence = 5
embedding_dims = 10

# Size of the vocabulary. The assumption is that indexing starts at 0
# and is consecutive.
vocab_size = 3

inputs = Input(shape=(max_sequence,), dtype='int32') # each X_i is a sequence of 'max_sequence' integers
outputs = Embedding(vocab_size, embedding_dims, input_length=max_sequence)(inputs)

model = Model([inputs], [outputs])
model.compile(optimizer='sgd', loss='mse')

model.predict( np.array([[0,1,1,1,2]]) )

# An embedding layer is like a lookup table. The values in the input 
# vector are used as indices in the internal weights matrix.
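The lookup-table behavior described above can be sketched with NumPy indexing. The weight matrix here is filled with arbitrary random values standing in for learned embeddings; the seed and value range are assumptions made for the example.

```python
import numpy as np

vocab_size = 3
embedding_dims = 10

# One row per vocabulary entry. The random values are placeholders
# for the weights a real embedding layer would learn.
rng = np.random.default_rng(0)
weights = rng.uniform(-0.05, 0.05, size=(vocab_size, embedding_dims))

# Each integer in the input selects the matching row of the matrix.
token_ids = np.array([[0, 1, 1, 1, 2]])
embedded = weights[token_ids]

print(embedded.shape)
```

The result has shape (batch_size, max_sequence, embedding_dims), and positions that hold the same index receive identical vectors.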

## Lambda layer
A Lambda layer is a simple way to add custom functionality to a model. It is best suited to stateless functions; for stateful operations it is better to implement a separate layer.

In [5]:
import numpy as np
from keras.layers import Input, Lambda
from keras.models import Model


inputs = Input(shape=(4,4), dtype='float32')
first_row = Lambda(lambda x: x[:,0,:])(inputs)
first_column = Lambda(lambda x: x[:,:,0])(inputs)
doubled = Lambda(lambda x: x*2.0)(inputs)

model = Model(inputs=inputs, outputs=[first_row, first_column, doubled])
model.compile(optimizer='sgd', loss='mse')

x = np.array([
    [
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3],
    [4,4,4,4],        
    ]
], dtype=np.float32)

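# Use separate names for the outputs so the input array x is not overwritten.
first_row_val, first_column_val, doubled_val = model.predict(x)
print(first_row_val)
print(first_column_val)
print(doubled_val)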

[[1. 1. 1. 1.]]
[[1. 2. 3. 4.]]
[[[2. 2. 2. 2.]
  [4. 4. 4. 4.]
  [6. 6. 6. 6.]
  [8. 8. 8. 8.]]]


## Merge layer: concatenate, average and sum

In [2]:
import numpy as np
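from keras.layers import Input, Concatenate, Lambda, Average, Add
from keras.models import Model

win_size = 4  # number of rows sliced from each input, matching the output below

inputs = Input(shape=(win_size, 4), dtype='float32')
# Bind i as a default argument so that each Lambda keeps its own row index
# even if the function is called again later.
word_vector_rows = [Lambda(lambda x, i=i: x[:,i,:])(inputs) for i in range(win_size)]
concat_out = Concatenate()(word_vector_rows)
avg_out = Average()(word_vector_rows)
sum_out = Add()(word_vector_rows)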


model = Model(inputs=inputs, outputs=[concat_out, avg_out, sum_out])

model.compile(optimizer='sgd', loss='mse')

x = np.array([
    [
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3],
    [4,4,4,4],        
    ]
], dtype=np.float32)

print(x.shape)

concat_out_val, avg_out_val, sum_out_val = model.predict(x)
print('concat_out', concat_out_val)
print('avg_out', avg_out_val)
print('sum_out', sum_out_val)

(1, 4, 4)
concat_out [[1. 1. 1. 1. 2. 2. 2. 2. 3. 3. 3. 3. 4. 4. 4. 4.]]
avg_out [[2.5 2.5 2.5 2.5]]
sum_out [[10. 10. 10. 10.]]
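
For comparison, the three merge operations can be reproduced in NumPy on the same input. This is a sketch of the arithmetic only; the variable names are chosen for this example.

```python
import numpy as np

x = np.array([
    [
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3],
    [4,4,4,4],
    ]
], dtype=np.float32)

# Slice the input into one (batch_size, 4) array per row.
rows = [x[:, i, :] for i in range(x.shape[1])]

np_concat = np.concatenate(rows, axis=-1)  # join along the feature axis
np_avg = np.mean(rows, axis=0)             # element-wise mean over the rows
np_sum = np.sum(rows, axis=0)              # element-wise sum over the rows

print('concat', np_concat)
print('avg', np_avg)
print('sum', np_sum)
```

Concatenation grows the last axis from 4 to 16 values, while averaging and summing keep the shape of a single row.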
