# Chapter 2: Keras in Action

- Tensor: an n-dimensional array (a generalization of vectors and matrices).
- A single neuron takes several inputs, multiplies each input by its corresponding weight, sums the results, and passes the sum through an activation function to produce one output value.
- Activation Function:
  - Sigmoid Activation Function
    Defined as $\frac{1}{1 + e^{-z}}$, which squashes the output to a value between 0 and 1. Low influence -> low output, and higher influence -> high output. Keras: keras.activations.sigmoid(x).
  - ReLU Activation Function
    Uses the function $f(z) = \max(0, z)$. Keras: keras.activations.relu(x, alpha=0.0, max_value=None). When z is negative, the output is zero (a horizontal line with derivative zero), so the weights feeding that neuron are not easily updated. Leaky ReLU addresses this: for negative values it outputs a slightly slanted line instead of a horizontal one, which keeps a nonzero gradient for backpropagation. $f(z) = z$ when $z > 0$ and $f(z) = az$ otherwise, where a is a small constant, say 0.005. Keras: available as a layer, keras.layers.LeakyReLU(alpha=0.3).
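The neuron and the three activation functions above can be sketched in plain Python. This is only an illustration, not how Keras implements them; the helper names (`neuron`, `leaky_relu`, and so on), the example inputs, and the `bias` term are assumptions introduced here.

```python
import math

def sigmoid(z):
    """Squash z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def leaky_relu(z, a=0.005):
    """Like ReLU, but negative inputs keep a small slope a."""
    return z if z > 0 else a * z

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# hypothetical example values: z = 0.5 - 2.0 + 0.75 = -0.75
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]
bias = 0.0

out_sigmoid = neuron(inputs, weights, bias, sigmoid)
out_relu = neuron(inputs, weights, bias, relu)
out_leaky = neuron(inputs, weights, bias, leaky_relu)
print(out_sigmoid, out_relu, out_leaky)
```

With a negative weighted sum, ReLU returns exactly zero while Leaky ReLU returns a small negative value, which is what keeps its gradient from vanishing.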



## Model

The easiest way to define a model is by using the sequential model, which allows easy creation of a linear stack of layers. In the following code, the layer has 10 neurons, receives an input with 15 features, and is activated with the ReLU activation function.


In [0]:
# from keras.models import Sequential
# from keras.layers import Dense, Activation

# model = Sequential()
# model.add(Dense(10, input_dim=15))
# model.add(Activation('relu'))

## Layers

- A layer in a DNN is a group of neurons.

### Core Layers

- Dense Layer: a regular DNN layer that connects every neuron in the defined layer to every neuron in the previous layer. For instance, if Layer 1 has 5 neurons and Layer 2 (dense layer) has 3 neurons, the total number of connections between Layer 1 and Layer 2 would be 15 (5 × 3). Since it accommodates every possible connection between the layers, it is called a “dense” layer.
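The connection count can be checked with a few lines of Python. The function names below are hypothetical helpers, not part of Keras. The second function also counts one bias per neuron, which is what a dense layer adds when `use_bias=True`.

```python
def dense_connections(n_inputs, n_units):
    """Number of weights (connections) between two fully connected layers."""
    return n_inputs * n_units

def dense_param_count(n_inputs, n_units, use_bias=True):
    """Trainable parameters: one weight per connection plus one bias per unit."""
    return n_inputs * n_units + (n_units if use_bias else 0)

connections = dense_connections(5, 3)   # Layer 1 has 5 neurons, Layer 2 has 3
params = dense_param_count(5, 3)        # same layer, biases included
print(connections, params)
```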

In [0]:
keras.layers.Dense(units, 
                   activation=None, 
                   use_bias=True,
                   kernel_initializer='glorot_uniform',
                   bias_initializer='zeros',
                   kernel_regularizer=None,
                   bias_regularizer=None,
                   activity_regularizer=None,
                   kernel_constraint=None,
                   bias_constraint=None)

Example: one hidden layer of 5 neurons with 10 features and single output layer (binary classification)

In [0]:
# model = Sequential()
# model.add(Dense(5, input_dim=10, activation='sigmoid'))
# model.add(Dense(1, activation='sigmoid'))

### Dropout Layer

Dropout reduces overfitting by introducing regularization, which improves the model's ability to generalize. In action, it randomly drops a fraction of the neurons (sets their outputs to zero) at each training step.

In [0]:
# keras.layers.Dropout(rate, noise_shape=None, seed=None)

Add the dropout layer

In [0]:
# model = Sequential()
# model.add(Dense(5, input_dim=10, activation='sigmoid'))
# model.add(Dropout(rate=0.1, seed=100))
# model.add(Dense(1, activation='sigmoid'))
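What the dropout layer does to a layer's outputs can be sketched without Keras. This is a minimal illustration of the common "inverted dropout" scheme, in which the surviving values are scaled by 1 / (1 - rate) during training so their expected sum is unchanged. The function name, the `rng` argument, and the example values are assumptions made for the sketch.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; scale the rest by 1/(1-rate).

    Outside of training the activations are returned unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(100)          # seeded so the run is repeatable
hidden = [1.0] * 10               # hypothetical outputs of a hidden layer

dropped = dropout(hidden, rate=0.5, rng=rng)
unchanged = dropout(hidden, rate=0.5, rng=rng, training=False)
print(dropped)
print(unchanged)
```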

## Other Important Layers:


- Layer for image feature extraction: Convolutional Layer
- Layer for NLP: Recurrent neural network (RNN)
- Embedding layer
- Pooling layer
- Merge layer
- Normalization layer



## The Loss Function

The error that the machine learns to minimize.

Some loss functions for regression:

In [0]:
# Mean Squared Error
# keras.losses.mean_squared_error(y_actual, y_pred)

# Mean Absolute Error
# keras.losses.mean_absolute_error(y_actual, y_pred)

# Mean Absolute Percentage Error
# keras.losses.mean_absolute_percentage_error(y_actual, y_pred)

# Mean Squared Logarithmic Error
# keras.losses.mean_squared_logarithmic_error(y_actual, y_pred)
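To make the four regression losses concrete, here is a plain-Python sketch of the formulas. These are not the Keras implementations (which operate on tensors and guard against division by zero); the example values are made up, and the MAPE version below assumes no target is zero.

```python
import math

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_percentage_error(y_true, y_pred):
    # assumes every target is non-zero
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_logarithmic_error(y_true, y_pred):
    # assumes targets and predictions are greater than -1
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)

# hypothetical targets and predictions
y_true = [10.0, 20.0, 40.0]
y_pred = [12.0, 18.0, 40.0]

mse = mean_squared_error(y_true, y_pred)                # (4 + 4 + 0) / 3
mae = mean_absolute_error(y_true, y_pred)               # (2 + 2 + 0) / 3
mape = mean_absolute_percentage_error(y_true, y_pred)   # 100 * (0.2 + 0.1 + 0) / 3
msle = mean_squared_logarithmic_error(y_true, y_pred)
print(mse, mae, mape, msle)
```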

Some loss functions for classification:

In [0]:
# Binary cross-entropy: defines the loss when the
# categorical outcome is a binary variable, that is, one with
# two possible outcomes: (Pass/Fail) or (Yes/No)
# keras.losses.binary_crossentropy(y_actual, y_pred)

# Categorical cross-entropy: defines the loss when the
# categorical outcome is nonbinary, that is, one with >2 possible
# outcomes: (Yes/No/Maybe) or (Type 1/ Type 2/… Type n)
# keras.losses.categorical_crossentropy(y_actual, y_pred)
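The two cross-entropy losses can be written out in a few lines of plain Python. This is an illustrative sketch of the formulas rather than the Keras code; the clipping constant `eps` and the sample values are assumptions.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over all samples."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # keep log() away from 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def categorical_crossentropy(y_true_onehot, y_pred_probs, eps=1e-7):
    """Mean of -sum(t_k * log(p_k)) over all samples, with one-hot targets."""
    total = 0.0
    for t_row, p_row in zip(y_true_onehot, y_pred_probs):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true_onehot)

# a model that always answers 0.5 scores ln(2), whatever the labels are
uninformed = binary_crossentropy([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
# confident and correct predictions score much lower
confident = binary_crossentropy([1, 0], [0.9, 0.1])
# three classes, true class is the second one
multi = categorical_crossentropy([[0, 1, 0]], [[0.2, 0.7, 0.1]])
print(uninformed, confident, multi)
```

The value ln(2) ≈ 0.693 is a useful reference point: a binary classifier whose loss sits near it is doing no better than guessing 0.5 for every sample.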

## Optimizers

- The optimizer gives feedback to the model: it decides how the weights change in response to the loss.
- Backpropagation computes the gradient of the loss with respect to each weight; the optimizer uses these gradients to update the weights.
- The weights are random at the beginning.
- The weights, which determine the influence of a neuron on the next neuron or on the final output, are updated by the network during the learning process.
- The computation of one training sample from the input layer to the output layer is called a pass.
- A batch is a collection of training samples drawn from the entire input.
- Processing all the training samples in the input data, with batch-by-batch weight updates, is called an epoch.

### Stochastic Gradient Descent (SGD)

- Performs an iteration with each training sample (after the pass of each training sample, it calculates the loss and updates the weights).
- Weights = Weights - learning rate * gradient of the loss with respect to the weights

In [0]:
# keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)

- To reduce the fluctuations in the loss, update the weights once per mini-batch instead of once per sample. Batch sizes are usually powers of 2 (4, 8, 16, ...).
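The per-sample update rule can be illustrated on the smallest possible model, a single weight w in y = w * x with a squared-error loss. This is a sketch under those assumptions, not the Keras optimizer; the function name and the toy data are invented for the example.

```python
def sgd_fit(samples, lr=0.01, epochs=50, w=0.0):
    """Fit y = w * x with one weight update per training sample."""
    for _ in range(epochs):
        for x, y in samples:
            # loss = (w*x - y)^2, so d(loss)/dw = 2 * (w*x - y) * x
            grad = 2.0 * (w * x - y) * x
            w = w - lr * grad
    return w

# toy data generated by y = 2 * x
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = sgd_fit(samples)
print(w)  # close to 2.0
```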

### Adam

- Adaptive Moment Estimation
- Computes an adaptive learning rate for each parameter.
- It tracks the momentum (a running mean) and the variance (a running mean of the squares) of the gradient of the loss and leverages their combined effect to update the weight parameters. Together they help smooth the learning curve and effectively improve the learning process.
- Weights = Weights - learning rate * momentum / (sqrt(variance) + epsilon)

In [0]:
# keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999,
# epsilon=None, decay=0.0, amsgrad=False)

The parameters beta_1 and beta_2 are used in computing the momentum and variance respectively.
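The role of beta_1 and beta_2 is easier to see in code. The sketch below follows the standard Adam update (with bias correction) for a single scalar weight. It is an illustration, not the Keras implementation, and the objective f(w) = (w - 3)^2, the learning rate of 0.1, and the function name are assumptions chosen for the demo.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; t is the 1-based step number."""
    m = beta_1 * m + (1 - beta_1) * grad          # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)                 # bias correction for the zero start
    v_hat = v / (1 - beta_2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
history = []
for t in range(1, 1001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
    history.append(w)

print(history[0], w)
```

On the very first step the corrected momentum equals the gradient and the corrected variance equals its square, so the weight moves by almost exactly the learning rate, regardless of how large the gradient is.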

### Other Important Optimizers

- Adagrad
- Adadelta
- RMSProp
- Adamax
- Nadam

## Metrics

- A metric is a function used to judge the performance of the model on unseen data (the validation set).
- Unlike the loss function, the results from metrics are not used to train the model; they play no part in optimization.
- Metrics are also used to validate the test results while reporting.


In [0]:
# Binary accuracy
# keras.metrics.binary_accuracy

# Categorical accuracy
# keras.metrics.categorical_accuracy

# Sparse categorical accuracy
# keras.metrics.sparse_categorical_accuracy
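The difference between the categorical and sparse categorical variants lies only in how the true labels are encoded: one-hot vectors versus integer class ids. A plain-Python sketch (not the Keras code; the helper names and sample values are assumptions):

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def categorical_accuracy(y_true_onehot, y_pred_probs):
    """True labels are one-hot vectors, e.g. [0, 0, 1]."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true_onehot, y_pred_probs))
    return hits / len(y_true_onehot)

def sparse_categorical_accuracy(y_true_ids, y_pred_probs):
    """True labels are integer class ids, e.g. 2."""
    hits = sum(t == argmax(p) for t, p in zip(y_true_ids, y_pred_probs))
    return hits / len(y_true_ids)

# hypothetical predicted probabilities for four samples and three classes
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1], [0.2, 0.2, 0.6]]
onehot = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
ids = [0, 2, 0, 2]

acc_onehot = categorical_accuracy(onehot, probs)
acc_sparse = sparse_categorical_accuracy(ids, probs)
print(acc_onehot, acc_sparse)
```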

## Model Configuration

Once you have designed your network, Keras provides you with an
easy one-step model configuration process with the ‘compile’ command.
To compile a model, we need to provide three parameters: an optimization
function, a loss function, and a metric for the model to measure
performance on the validation dataset.

The following example builds a DNN with two hidden layers of 32 and 16
neurons, respectively, each with a ReLU activation function. The output
layer produces a single value for binary classification using a sigmoid
activation. We compile the model with the Adam optimizer and define
binary cross-entropy as the loss function and “accuracy” as the metric for
validation.

In [0]:
# from keras.models import Sequential
# from keras.layers import Dense

# model = Sequential()
# # first hidden layer: 32 neurons, 10 input features
# model.add(Dense(32, input_dim=10, activation='relu'))
# # second hidden layer: 16 neurons
# model.add(Dense(16, activation='relu'))
# # output layer: binary classification
# model.add(Dense(1, activation='sigmoid'))

# model.compile(optimizer='Adam', loss='binary_crossentropy',
#               metrics=['accuracy'])

## Model Training

- While training, it is always a good practice to provide a validation dataset for us to evaluate whether the model is performing as desired after each epoch.
- The performance on the validation dataset is a good cue for the overall performance.
- Common practice: 60% for training, 20% for validation, and 20% for testing.
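A 60/20/20 split can be sketched with the standard library alone. The function name, the seed, and the stand-in data below are assumptions for illustration; in practice a library helper such as scikit-learn's `train_test_split` is a common choice.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=2019):
    """Shuffle row indices, then cut them into train, validation, and test parts."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)

    n_val = int(round(len(rows) * val_frac))
    n_test = int(round(len(rows) * test_frac))

    val_idx = indices[:n_val]
    test_idx = indices[n_val:n_val + n_test]
    train_idx = indices[n_val + n_test:]

    def pick(idx):
        return [rows[i] for i in idx]

    return pick(train_idx), pick(val_idx), pick(test_idx)

rows = list(range(10000))   # stand-in for 10,000 samples
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))
```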

In [0]:
# if we already have the model defined
# model.fit(x_train, y_train, batch_size=64, epochs=3, validation_data=(x_val, y_val))

We have a model being trained on a training dataset named x_train with
the actual labels in y_train. We choose a batch size of 64. Therefore, if there
were 500 training samples, the model would take in and process 64 samples
at a time in a batch before it updates the model weights. The last batch may
have fewer than 64 training samples if not enough remain. We have set the
number of epochs to three; therefore, the whole process of training on 500
samples in batches of 64 will be repeated three times. Also, we have provided
the validation dataset as x_val and y_val. At the end of each epoch, the model
uses the validation data to make predictions and compute the performance
metrics defined in the metrics parameter of the model configuration.
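The batch arithmetic in this paragraph can be verified with a short sketch. The helper name is hypothetical; the numbers are the ones used in the text.

```python
import math

def batch_sizes(n_samples, batch_size):
    """Sizes of the consecutive batches that make up one epoch."""
    n_batches = math.ceil(n_samples / batch_size)
    return [min(batch_size, n_samples - i * batch_size) for i in range(n_batches)]

sizes = batch_sizes(500, 64)          # seven full batches and one of 52
updates_per_epoch = len(sizes)        # one weight update per batch
total_updates = updates_per_epoch * 3 # three epochs
print(sizes, updates_per_epoch, total_updates)
```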

In [0]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

# generate dummy training dataset
np.random.seed(2019)
x_train = np.random.random((6000, 10))
y_train = np.random.randint(2, size=(6000, 1))

# generate dummy validation dataset
x_val = np.random.random((2000, 10))
y_val = np.random.randint(2, size=(2000, 1))

# generate dummy test dataset
x_test = np.random.random((2000, 10))
y_test = np.random.randint(2, size=(2000, 1))


# define the model architecture
model = Sequential()
model.add(Dense(64, input_dim=10, activation='relu')) # layer 1
model.add(Dense(32, activation='relu')) # layer 2
model.add(Dense(16, activation='relu')) # layer 3
model.add(Dense(8, activation='relu')) # layer 4
model.add(Dense(4, activation='relu')) # layer 5
model.add(Dense(1, activation='sigmoid')) # output layer

#configure the model
model.compile(optimizer='Adam', loss='binary_crossentropy', 
              metrics=['accuracy'])

# train the model
model.fit(x_train, y_train, batch_size=64, epochs=3, 
          validation_data=(x_val, y_val))

Using TensorFlow backend.

Train on 6000 samples, validate on 2000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7fad50d98240>

## Model Evaluation

- Evaluation tells you how effectively your model performs on an unseen test dataset.
- The Keras model object has a built-in evaluate function, plus a predict function to predict the outcome for a test dataset.

In [0]:
# evaluate(x=None, y=None, batch_size=None, verbose=1, 
#          sample_weight=None, steps=None)    

In [0]:
print(model.evaluate(x_test, y_test))

[0.6932605152130127, 0.494]


In [0]:
print(model.metrics_names)

['loss', 'acc']


In [0]:
# make predictions on the test dataset and print the first 10 predictions
pred = model.predict(x_test)
pred[:10]

array([[0.49437454],
       [0.49437454],
       [0.49437454],
       [0.49437454],
       [0.49437454],
       [0.49437454],
       [0.49437454],
       [0.49437454],
       [0.49437454],
       [0.49437454]], dtype=float32)

This output can be used to make even more refined final predictions.
A simple example is that the model would use 0.5 as the threshold for the
predictions. Therefore, any predicted value above 0.5 is classified as 1 (say,
Pass), and others as 0 (Fail).
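The thresholding step can be sketched as follows. The function name, the label mapping, and the sample probabilities are assumptions for illustration; the threshold is left as a parameter because 0.5 is only the default choice.

```python
def to_labels(probabilities, threshold=0.5):
    """Map each predicted probability to 1 if it is above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]

# hypothetical predicted probabilities
probs = [0.49437454, 0.81, 0.5, 0.12]

labels = to_labels(probs)
names = [{1: "Pass", 0: "Fail"}[label] for label in labels]
strict = to_labels(probs, threshold=0.9)
print(labels, names, strict)
```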

## Putting All the Building Blocks Together

In [0]:
# download the data using keras
from keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

#Explore the data structure using basic python commands
print("Type of the Dataset:",type(y_train))
print("Shape of training data :",x_train.shape)
print("Shape of training labels :",y_train.shape)
print("Shape of testing data :",x_test.shape)
print("Shape of testing labels :",y_test.shape)

Downloading data from https://s3.amazonaws.com/keras-datasets/boston_housing.npz
Type of the Dataset: <class 'numpy.ndarray'>
Shape of training data : (404, 13)
Shape of training labels : (404,)
Shape of testing data : (102, 13)
Shape of testing labels : (102,)


In [0]:
x_train[:3, :]

array([[1.23247e+00, 0.00000e+00, 8.14000e+00, 0.00000e+00, 5.38000e-01,
        6.14200e+00, 9.17000e+01, 3.97690e+00, 4.00000e+00, 3.07000e+02,
        2.10000e+01, 3.96900e+02, 1.87200e+01],
       [2.17700e-02, 8.25000e+01, 2.03000e+00, 0.00000e+00, 4.15000e-01,
        7.61000e+00, 1.57000e+01, 6.27000e+00, 2.00000e+00, 3.48000e+02,
        1.47000e+01, 3.95380e+02, 3.11000e+00],
       [4.89822e+00, 0.00000e+00, 1.81000e+01, 0.00000e+00, 6.31000e-01,
        4.97000e+00, 1.00000e+02, 1.33250e+00, 2.40000e+01, 6.66000e+02,
        2.02000e+01, 3.75520e+02, 3.26000e+00]])

In [0]:
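# take the rows from index 300 onward (104 rows) of the training data as the validation dataset
# note: these rows also stay in x_train, so the validation set overlaps the training set
x_val = x_train[300:,]
y_val = y_train[300:,]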

In [0]:
# define the model architecture
model = Sequential()
model.add(Dense(13, input_dim=13, kernel_initializer='normal', 
                activation='relu'))
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))

# compile model
model.compile(loss='mean_squared_error', optimizer='adam', 
              metrics=['mean_absolute_percentage_error'])

# train the model
model.fit(x_train, y_train, batch_size=32, epochs=3,
          validation_data=(x_val, y_val))

Train on 404 samples, validate on 104 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7fad4b9625c0>

In [0]:
results = model.evaluate(x_test, y_test)

for i in range(len(model.metrics_names)):
  print(model.metrics_names[i], ' : ', results[i])

loss  :  502.7818053002451
mean_absolute_percentage_error  :  84.62569517247817


In [0]:
results

[502.7818053002451, 84.62569517247817]

In [0]:
model.metrics_names

['loss', 'mean_absolute_percentage_error']

In DL, the model updates its weights after every iteration and evaluates
after every epoch. Since each update is quite small, it usually takes a fairly
large number of epochs for a generic model to learn appropriately. To
test the performance once again, let’s increase the number of epochs to 30
instead of 3. This would increase the computation significantly and might
take a while to execute. But since this is a fairly small dataset, training with
30 epochs should not be a problem. It should execute in ~1 min on your
system.

In [0]:
# train the model
model.fit(x_train, y_train, batch_size=32, epochs=30, 
          validation_data=(x_val,y_val))

Train on 404 samples, validate on 104 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7fad4b416390>

In [0]:
results = model.evaluate(x_test, y_test)

for i in range(len(model.metrics_names)):
  print(model.metrics_names[i], ' : ', results[i])

loss  :  67.16364034016927
mean_absolute_percentage_error  :  32.16827949823118


As discussed earlier, a gap between the metrics reported during training
and those measured on the test dataset is an indicator that the model has
overfit, or in simple terms, has overcomplicated the process of learning.

---

## Experimentation

In [0]:
# import the tools
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.datasets import boston_housing

# download the dataset
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

from sklearn.model_selection import train_test_split
# make validation data
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.15, random_state=21)

# define the model
model = Sequential()
model.add(Dense(13, input_dim=13, activation='relu'))
model.add(Dense(11, activation='relu'))
model.add(Dense(7, activation='relu'))
model.add(Dense(3, activation='relu'))
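# note: sigmoid bounds the output to (0, 1), far below the house-price targets;
# a linear output, Dense(1), is the usual choice for regression
model.add(Dense(1, activation='sigmoid'))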

# compile the model
model.compile(optimizer='Adam', loss='mean_squared_error', metrics=['mean_absolute_error'])

# train the model
model.fit(x_train, y_train, batch_size=50, epochs=50, validation_data=(x_val, y_val))

Train on 343 samples, validate on 61 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x7fad429c4b38>

In [0]:
results = model.evaluate(x_test, y_test)
print(results)

[570.7010079178156, 22.078431858735925]
