# Express Deep Learning in Python

## Advanced Layers

The `Dense` layer is only one of the core layers available in Keras. `Dense` is a *forward* layer: a layer that takes an input and applies a transformation to it (in this case, a matrix multiplication).

Other important layers to consider are: activation layers, regularization layers, dropout layers, convolutional layers, pooling layers, recurrent layers, normalization layers, embedding layers, noise layers, etc.

For this tutorial we will focus on some layers to aid in the tuning of the network: activations, regularizers and dropout; as well as the layers needed to design convolutional neural networks: convolutional and pooling layers.

At the end of this tutorial we will point to other tutorials and examples that cover the remaining kinds of layers.

In [None]:
from keras import backend as K
from keras.layers import Activation, Dense
from keras.models import Sequential

## Activation Functions

A neural network classifier with only linear activations has no more *representational* power than a logistic regression classifier, because a composition of linear transformations is itself a linear transformation. To express non-linearity, the model needs a non-linear function as the activation function of its neurons.
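The claim about linear activations can be checked by hand on a tiny example. The sketch below uses plain Python lists, made-up weights and inputs, and omits biases for brevity; `matvec` and `matmul` are hypothetical helpers written for this illustration, not Keras functions. It shows that two stacked linear layers give the same output as one layer whose weight matrix is the product of the two.

```python
# Two stacked layers with linear (identity) activations, using plain lists.
# All weights and inputs are made-up illustrative values; biases are omitted.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # first layer: 2 inputs -> 3 units
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]    # second layer: 3 units -> 2 outputs
x = [0.5, -2.0]

two_layers = matvec(W2, matvec(W1, x))      # x passed through both layers
one_layer = matvec(matmul(W2, W1), x)       # one layer with weights W2 * W1

print(two_layers)  # [-4.5, 5.5]
print(one_layer)   # [-4.5, 5.5]
```

Placing a non-linear function between the two layers breaks this equivalence, which is what lets a deeper network represent functions that a single linear layer cannot.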

One simple activation function is the **sigmoid (or logistic) function**, the same one used in logistic regression, which restricts the output value to lie between zero and one. It was one of the most common non-linearities used as the activation function in some of the *first versions* of neural networks. There are other possibilities, however. All of the following are available in Keras, and more can be adapted:

* rectified linear unit (ReLU)
* tanh
* hard sigmoid
* softsign
* softplus
* exponential linear unit (ELU)
* scaled exponential linear unit (SELU)
* leaky rectified linear unit (Leaky ReLU)
* parametric rectified linear unit (PReLU)

Of these, the one most used in current state-of-the-art neural network classifiers is the **ReLU**, because it typically learns much faster in networks with many layers [1].
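To make the list more concrete, here is a minimal sketch of a few of these functions applied to a single scalar, using only the standard library. It is meant to show the shape of each function and is not how Keras implements them. The `alpha` values are illustrative assumptions and may differ from the Keras defaults.

```python
import math

def sigmoid(x):
    """Logistic function: squashes x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but with a small slope alpha for negative inputs."""
    return x if x > 0 else alpha * x

def elu(x, alpha=1.0):
    """Exponential linear unit: smooth negative branch that saturates at -alpha."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def softplus(x):
    """Smooth approximation of ReLU: log(1 + exp(x))."""
    return math.log1p(math.exp(x))

def softsign(x):
    """Squashes x into (-1, 1) with polynomial rather than exponential tails."""
    return x / (1.0 + abs(x))

# Compare the functions on a few sample inputs.
for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), leaky_relu(value), round(sigmoid(value), 4))
```

Note that this scalar `sigmoid` overflows for very large negative inputs; library implementations work on whole tensors and handle such cases more carefully.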

Another activation is the **SoftMax** activation. It is generally used as the last activation layer, i.e. as the output of the network. This function, also known as the *normalized exponential function*, is a generalization of the logistic function that "squashes" a $K$-dimensional vector $\mathbf{z}$ of arbitrary real values to a $K$-dimensional vector $\sigma(\mathbf{z})$ of real values in the range (0, 1) that add up to 1.
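As a rough illustration, softmax can be written in a few lines of standard-library Python. This is a sketch, not the Keras implementation. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result. The input values are made up.

```python
import math

def softmax(z):
    """Map a list of real scores to positive values that sum to 1."""
    m = max(z)                            # shift by the max to avoid overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)        # larger scores receive larger probabilities
print(sum(probs))   # approximately 1.0

# With two classes and the second score fixed at 0, softmax reduces to
# the logistic function of the first score.
x = 0.7
two_class = softmax([x, 0.0])[0]
logistic = 1.0 / (1.0 + math.exp(-x))
```

The two-class case at the end is the sense in which softmax generalizes the logistic function: `two_class` and `logistic` agree up to floating-point error.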

### Activation Functions in Keras

Keras provides two ways to attach an activation function to a layer. Both are equally valid.

#### Activation as a parameter of a forward layer

In [None]:
model = Sequential()
model.add(Dense(64, input_shape=(784,), activation='relu'))
model.add(Dense(10, activation='softmax'))

#### Activation as a layer

In [None]:
model = Sequential()
model.add(Dense(64, input_shape=(784,)))
model.add(Activation('tanh'))
model.add(Dense(10))
model.add(Activation('softmax'))

#### Activation from a TensorFlow function

In the previous examples we referred by name to some of the activation functions available in the Keras library.

We can also pass an element-wise TensorFlow function as the activation, here accessed through the Keras backend module `K`.

In [None]:
model = Sequential()
model.add(Dense(64, input_shape=(784,), activation=K.sigmoid))
model.add(Dense(10, activation='softmax'))

## References

[1] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521, no. 7553 (2015): 436-444.