# Keras concept

# 1. Model

# Sequential vs. Functional, which to use and when?
The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.

## Example cases of functional APIs

### Multi-input and multi-output models

https://keras.io/getting-started/functional-api-guide/



In [None]:
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# Headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# Note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

# This embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

# An LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)







Here we insert the auxiliary loss, allowing the LSTM and Embedding layer to be trained smoothly even though the main loss will be much higher in the model.

In [None]:
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

At this point, we feed into the model our auxiliary input data by concatenating it with the LSTM output:

In [None]:
auxiliary_input = Input(shape=(5,), name='aux_input')
x = keras.layers.concatenate([lstm_out, auxiliary_input])

# We stack a deep densely-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# And finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

This defines a model with two inputs and two outputs:

In [None]:
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])

We compile the model and assign a weight of 0.2 to the auxiliary loss. To specify different loss_weights or loss for each different output, you can use a list or a dictionary. Here we pass a single loss as the loss argument, so the same loss will be used on all outputs.

In [None]:
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])
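
To make the effect of loss_weights concrete: the total loss that gets minimized is the weighted sum of the per-output losses. The following is a minimal sketch of that arithmetic in plain Python, not Keras code; the function name `combined_loss` and the example loss values are hypothetical and only serve as an illustration.

```python
def combined_loss(losses, loss_weights):
    """Weighted sum of per-output losses (one weight per output)."""
    if len(losses) != len(loss_weights):
        raise ValueError("need exactly one weight per output loss")
    return sum(w * l for w, l in zip(loss_weights, losses))


# Hypothetical per-output losses: main_output = 0.5, aux_output = 1.0
total = combined_loss([0.5, 1.0], [1.0, 0.2])
```

With a weight of 0.2, the auxiliary loss contributes only a fifth of its value to the total, so the main output dominates the gradient signal.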

We can train the model by passing it lists of input arrays and target arrays:

In [None]:
model.fit([headline_data, additional_data], [labels, labels],
          epochs=50, batch_size=32)

Since our inputs and outputs are named (we passed them a "name" argument), we could also have compiled the model via:

In [None]:
model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

# And trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          epochs=50, batch_size=32)

### Shared Layers
https://keras.io/getting-started/functional-api-guide/

### The concept of layer "node"
https://keras.io/getting-started/functional-api-guide/

### Inception module --> Concatenate
https://keras.io/getting-started/functional-api-guide/

### ResNet--> layers.add
https://keras.io/getting-started/functional-api-guide/

## Shared vision model
This model reuses the same image-processing module on two inputs, to classify whether two MNIST digits are the same digit or different digits.


In [None]:
import keras
from keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten
from keras.models import Model

# First, define the vision modules
digit_input = Input(shape=(27, 27, 1))
x = Conv2D(64, (3, 3))(digit_input)
x = Conv2D(64, (3, 3))(x)
x = MaxPooling2D((2, 2))(x)
out = Flatten()(x)

vision_model = Model(digit_input, out)

# Then define the tell-digits-apart model
digit_a = Input(shape=(27, 27, 1))
digit_b = Input(shape=(27, 27, 1))

# The vision model will be shared, weights and all
out_a = vision_model(digit_a)
out_b = vision_model(digit_b)

concatenated = keras.layers.concatenate([out_a, out_b])
out = Dense(1, activation='sigmoid')(concatenated)

classification_model = Model([digit_a, digit_b], out)

## Visual question answering model
This model can select the correct one-word answer when asked a natural-language question about a picture.

It works by encoding the question into a vector, encoding the image into a vector, concatenating the two, and training on top a logistic regression over some vocabulary of potential answers.

In [None]:
import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model, Sequential

# First, let's define a vision model using a Sequential model.
# This model will encode an image into a vector.
vision_model = Sequential()
vision_model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
vision_model.add(Conv2D(64, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(128, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Flatten())

# Now let's get a tensor with the output of our vision model:
image_input = Input(shape=(224, 224, 3))
encoded_image = vision_model(image_input)

# Next, let's define a language model to encode the question into a vector.
# Each question will be at most 100 words long,
# and we will index words as integers from 1 to 9999.
question_input = Input(shape=(100,), dtype='int32')
embedded_question = Embedding(input_dim=10000, output_dim=256, input_length=100)(question_input)
encoded_question = LSTM(256)(embedded_question)

# Let's concatenate the question vector and the image vector:
merged = keras.layers.concatenate([encoded_question, encoded_image])

# And let's train a logistic regression over 1000 words on top:
output = Dense(1000, activation='softmax')(merged)

# This is our final model:
vqa_model = Model(inputs=[image_input, question_input], outputs=output)

# The next stage would be training this model on actual data.

## Video question answering model
Now that we have trained our image QA model, we can quickly turn it into a video QA model. With appropriate training, you will be able to show it a short video (e.g. 100-frame human action) and ask a natural language question about the video (e.g. "what sport is the boy playing?" -> "football").

In [None]:
from keras.layers import TimeDistributed

video_input = Input(shape=(100, 224, 224, 3))
# This is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# This is a model-level representation of the question encoder, reusing the same weights as before:
question_encoder = Model(inputs=question_input, outputs=encoded_question)

# Let's use it to encode the question:
video_question_input = Input(shape=(100,), dtype='int32')
encoded_video_question = question_encoder(video_question_input)

# And this is our video question answering model:
merged = keras.layers.concatenate([encoded_video, encoded_video_question])
output = Dense(1000, activation='softmax')(merged)
video_qa_model = Model(inputs=[video_input, video_question_input], outputs=output)

# How can I use Keras with datasets that don't fit in memory?
1. Read batches from disk yourself and train on each one with model.train_on_batch(x, y):
   https://keras.io/models/sequential/
2. Write a generator that yields batches of training data and pass it to model.fit_generator(data_generator, steps_per_epoch, epochs).
   CIFAR example: https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py

In [None]:
def generate_arrays_from_file(path):
    while True:
        with open(path) as f:
            for line in f:
                # create numpy arrays of input data
                # and labels, from each line in the file
                x1, x2, y = process_line(line)
                yield ({'input_1': x1, 'input_2': x2}, {'output': y})

model.fit_generator(generate_arrays_from_file('/my_file.txt'),
                    steps_per_epoch=10000, epochs=10)

Similarly, the generator can come from Keras's standard ImageDataGenerator (for example, via its flow() method).
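
The key property of a generator passed to fit_generator is that it loops forever and yields one (inputs, targets) batch at a time. Below is a minimal sketch of such a generator over in-memory Python lists; `batch_generator` and the toy data are hypothetical names used for illustration, and a real generator would read each batch from disk instead.

```python
from itertools import islice


def batch_generator(samples, targets, batch_size):
    """Yield (inputs, targets) batches forever, restarting after each pass."""
    n = len(samples)
    while True:  # the generator is expected to loop indefinitely
        for start in range(0, n, batch_size):
            end = start + batch_size
            yield samples[start:end], targets[start:end]


# Toy data: 5 samples with batch_size=2 give 3 batches per pass (2, 2 and 1 samples),
# so steps_per_epoch would be 3 here.
gen = batch_generator([10, 11, 12, 13, 14], [0, 1, 0, 1, 0], batch_size=2)
first_four = list(islice(gen, 4))
```

The fourth batch is the first batch again, which is why steps_per_epoch has to tell Keras where one epoch ends.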

# How can I interrupt training when the validation loss isn't decreasing anymore?
You can use an EarlyStopping callback:

In [None]:
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit(x, y, validation_split=0.2, callbacks=[early_stopping])
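
The role of the patience argument can be illustrated with a simplified sketch of the stopping rule in plain Python. This is not the Keras implementation (it ignores options such as min_delta), and `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: the best value (0.8) occurs at epoch 1,
# then two epochs pass without improvement.
stopped_at = early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.7], patience=2)
```

Note that with patience=2 training stops before the later improvement to 0.7 is ever seen; a larger patience trades extra epochs for robustness against such plateaus.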

# How is the validation split computed?
If you set the validation_split argument in model.fit to e.g. 0.1, then the validation data used will be the last 10% of the data. If you set it to 0.25, it will be the last 25% of the data, etc. Note that the data isn't shuffled before extracting the validation split, so the validation is literally just the last x% of samples in the input you passed.

The same validation set is used for all epochs (within the same call to fit).
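
The split described above can be sketched in a few lines of plain Python. The helper name `split_validation` is hypothetical; the point is only that the tail of the data is cut off without shuffling.

```python
def split_validation(samples, validation_split):
    """Split off the last `validation_split` fraction of `samples`, without shuffling."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


train, val = split_validation(list(range(10)), validation_split=0.2)
```

If the data is sorted (for example by class), the validation set will therefore not be representative, so shuffle the arrays yourself before calling fit.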

# Is the data shuffled during training?
Yes, if the shuffle argument in model.fit is set to True (which is the default), the training data will be randomly shuffled at each epoch.

Validation data is never shuffled.

# How can I record the training / validation loss / accuracy at each epoch?
The model.fit method returns a History callback, which has a history attribute containing the lists of successive losses and other metrics.

In [None]:
hist = model.fit(x, y, validation_split=0.2)
print(hist.history)

# How can I "freeze" Keras layers?
To "freeze" a layer means to exclude it from training, i.e. its weights will never be updated. This is useful in the context of fine-tuning a model, or using fixed embeddings for a text input.

You can pass a trainable argument (boolean) to a layer constructor to set a layer to be non-trainable:

In [None]:
frozen_layer = Dense(32, trainable=False)

Additionally, you can set the trainable property of a layer to True or False after instantiation. For this to take effect, you will need to call compile() on your model after modifying the trainable property. Here's an example:

In [None]:
x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)

frozen_model = Model(x, y)
# in the model below, the weights of `layer` will not be updated during training
frozen_model.compile(optimizer='rmsprop', loss='mse')

layer.trainable = True
trainable_model = Model(x, y)
# with this model the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)
trainable_model.compile(optimizer='rmsprop', loss='mse')

frozen_model.fit(data, labels)  # this does NOT update the weights of `layer`
trainable_model.fit(data, labels)  # this updates the weights of `layer`

# How can I use stateful RNNs?
Making an RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.

When using stateful RNNs, it is therefore assumed that:

- All batches have the same number of samples.

- If x1 and x2 are successive batches of samples, then x2[i] is the follow-up sequence to x1[i], for every i.
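
One way to satisfy both assumptions is to cut every long sequence into consecutive windows and group the windows by position. The sketch below does this with plain Python lists; `stateful_batches` is a hypothetical helper, not part of Keras.

```python
def stateful_batches(sequences, window):
    """Cut each long sequence into consecutive windows and group them by position.

    Batch k holds the k-th window of every sequence, so row i of batch k + 1
    is the follow-up of row i in batch k.
    """
    n_windows = min(len(seq) for seq in sequences) // window
    return [
        [seq[k * window:(k + 1) * window] for seq in sequences]
        for k in range(n_windows)
    ]


# Two toy sequences of 6 timesteps, cut into windows of 3 timesteps
x1, x2 = stateful_batches([[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]], window=3)
```

Here x2[0] continues x1[0] and x2[1] continues x1[1], which is exactly the ordering a stateful layer relies on.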

To use statefulness in RNNs, you need to:

- explicitly specify the batch size you are using, by passing a batch_size argument to the first layer in your model. E.g.  batch_size=32 for a 32-samples batch of sequences of 10 timesteps with 16 features per timestep.

- set stateful=True in your RNN layer(s).

- specify shuffle=False when calling fit().

To reset the states accumulated:

- use model.reset_states() to reset the states of all layers in the model
- use layer.reset_states() to reset the states of a specific stateful RNN layer

In [None]:
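import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# x is our input data, of shape (32, 21, 16); random values stand in for real data here
x = np.random.random((32, 21, 16))
# we will feed it to our model in sequences of length 10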

model = Sequential()
model.add(LSTM(32, input_shape=(10, 16), batch_size=32, stateful=True))
model.add(Dense(16, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# we train the network to predict the 11th timestep given the first 10:
model.train_on_batch(x[:, :10, :], np.reshape(x[:, 10, :], (32, 16)))

# the state of the network has changed. We can feed the follow-up sequences:
model.train_on_batch(x[:, 10:20, :], np.reshape(x[:, 20, :], (32, 16)))

# let's reset the states of the LSTM layer:
model.reset_states()

# another way to do it in this case:
model.layers[0].reset_states()

Note that the methods predict, fit, train_on_batch, predict_classes, etc. will all update the states of the stateful layers in a model. 

This allows you to do not only stateful training, but also stateful prediction.

# Where is the Keras configuration file stored?
The default directory where all Keras data is stored is: $HOME/.keras/
    
Note that Windows users should replace $HOME with %USERPROFILE%. In case Keras cannot create the above directory (e.g. due to permission issues), /tmp/.keras/ is used as a backup.

The Keras configuration file is a JSON file stored at $HOME/.keras/keras.json. The default configuration file looks like this:




    
    

In [3]:

{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}

It contains the following fields:

- The image data format to be used as default by image processing layers and utilities (either channels_last or channels_first).

- The epsilon numerical fuzz factor to be used to prevent division by zero in some operations.
 
- The default float data type.

- The default backend. See the backend documentation.


Likewise, cached dataset files, such as those downloaded with get_file(), are stored by default in  $HOME/.keras/datasets/.
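
Because the configuration is plain JSON, it can be inspected with the standard library alone. The following is a minimal sketch, assuming the default location and the default values shown above; `read_keras_config` is a hypothetical helper and does not reproduce how Keras itself loads the file.

```python
import json
import os

# Default values, copied from the configuration shown above
DEFAULT_CONFIG = {
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow",
}


def read_keras_config(path=None):
    """Return the settings in keras.json, falling back to the defaults if the file is absent."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    config = dict(DEFAULT_CONFIG)
    if os.path.exists(path):
        with open(path) as f:
            config.update(json.load(f))
    return config
```

Calling read_keras_config() with no argument reads from the home directory; passing an explicit path is handy for checking a configuration file before putting it in place.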

# How can I obtain reproducible results using Keras during development?
During development of a model, sometimes it is useful to be able to obtain reproducible results from run to run in order to determine if a change in performance is due to an actual model or data modification, or merely a result of a new random sample. 


The below snippet of code provides an example of how to obtain reproducible results - this is geared towards a TensorFlow backend for a Python 3 environment.

In [None]:
import numpy as np
import tensorflow as tf
import random as rn

# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/keras-team/keras/issues/2280#issuecomment-306959926

import os
os.environ['PYTHONHASHSEED'] = '0'

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.

np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.

rn.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)

from keras import backend as K

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed

tf.set_random_seed(1234)

sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

# Rest of code follows ...

## Sequential in the constructor

https://keras.io/getting-started/sequential-model-guide/

The Sequential model is a linear stack of layers.

You can create a Sequential model by passing a list of layer instances to the constructor:


In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

## Sequential as a stack (.add)
You can also simply add layers via the .add() method:

In [None]:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

# Keras API conventions

The model needs to know what input shape it should expect. For this reason, the first layer in a Sequential model (and only the first, because following layers can do automatic shape inference) needs to receive information about its input shape. There are several possible ways to do this:

- Pass an input_shape argument to the first layer. This is a shape tuple (a tuple of integers or None entries, where None indicates that any positive integer may be expected). In input_shape, the batch dimension is not included.

- Some 2D layers, such as Dense, support the specification of their input shape via the argument input_dim, and some 3D temporal layers support the arguments input_dim and input_length.

- If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a batch_size argument to a layer. If you pass both batch_size=32 and input_shape=(6, 8) to a layer, it will then expect every batch of inputs to have the batch shape (32, 6, 8).

As such, the following snippets are strictly equivalent:

In [None]:
model = Sequential()
model.add(Dense(32, input_shape=(784,)))

In [None]:
model = Sequential()
model.add(Dense(32, input_dim=784))

#### Input shape

nD tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).

#### Output shape

nD tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape  (batch_size, input_dim), the output would have shape (batch_size, units).
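
The shape conventions above amount to simple tuple arithmetic. The sketch below expresses them in plain Python; both helper names are hypothetical and only mirror the rules stated in the text (the batch dimension is prepended, and a Dense layer replaces the last dimension with units).

```python
def batch_input_shape(input_shape, batch_size=None):
    """Prepend the batch dimension (None when no fixed batch size is given)."""
    return (batch_size,) + tuple(input_shape)


def dense_output_shape(batch_shape, units):
    """A Dense layer keeps every dimension except the last, which becomes `units`."""
    return tuple(batch_shape[:-1]) + (units,)


fixed = batch_input_shape((6, 8), batch_size=32)
flexible = batch_input_shape((784,))
dense_out = dense_output_shape(flexible, units=32)
```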

# channels_first in conv.

Many layers, especially convolutional ones (Conv2D, Flatten, etc.), take a data_format argument:

A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs.

The purpose of this argument is to preserve weight ordering when switching a model from one data format to another. channels_last corresponds to inputs with shape  (batch, ..., channels) while channels_first corresponds to inputs with shape (batch, channels, ...). 

It defaults to the image_data_format value found in your Keras config file at  ~/.keras/keras.json. If you never set it, then it will be "channels_last".
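
The two layouts differ only in where the channel axis sits. As a small illustration, the sketch below converts a shape tuple from one convention to the other; the helper names are hypothetical and operate on shapes only, not on actual tensors.

```python
def to_channels_first(shape):
    """(batch, ..., channels) -> (batch, channels, ...)."""
    return (shape[0], shape[-1]) + tuple(shape[1:-1])


def to_channels_last(shape):
    """(batch, channels, ...) -> (batch, ..., channels)."""
    return (shape[0],) + tuple(shape[2:]) + (shape[1],)


nchw = to_channels_first((None, 224, 224, 3))
nhwc = to_channels_last(nchw)
```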

When using the TF backend, channels_last (NHWC) can be faster on CPU, while channels_first (NCHW) is generally faster on GPU.

Quote from https://www.tensorflow.org/performance/performance_models:
"Most TensorFlow operations used by a CNN support both NHWC and NCHW data format. On GPU, NCHW is faster. But on CPU, NHWC is sometimes faster."

# Special layers

In a Functional model, the input to a Keras layer must be the output of another Keras layer (or an Input). This is similar to tensors in a TensorFlow graph: the idea is to have one symbolic graph.

For this reason, some simple matrix/tensor operations must be performed using special Keras layers, such as Lambda (e.g. for slicing), Cropping, UpSampling, and ZeroPadding.

## Cropping
https://keras.io/layers/convolutional/

If we try to crop manually by slicing the input tensor, we get the error:
    Output tensors to a Model must be Keras tensors.

In [14]:
from keras.layers import Input, Cropping1D
from keras.losses import categorical_crossentropy
from keras.models import Model

inputs = Input(shape=(100,))
# Crop 10 from start and 10 from end
out = inputs[10:90]

model = Model(inputs=[inputs], outputs=[out])

model.compile(optimizer='adadelta', loss=categorical_crossentropy, metrics=[categorical_crossentropy])

model.summary()

TypeError: Output tensors to a Model must be Keras tensors. Found: Tensor("strided_slice:0", shape=(?, 100), dtype=float32)

In [18]:
from keras.layers import Input, Cropping1D
from keras.losses import categorical_crossentropy
from keras.models import Model

inputs = Input(shape=(100,1))
# Crop 10 from start and 10 from end
out = Cropping1D(cropping=(10,10))(inputs)

model = Model(inputs=[inputs], outputs=[out])

model.compile(optimizer='adadelta', loss=categorical_crossentropy, metrics=[categorical_crossentropy])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_9 (InputLayer)         (None, 100, 1)            0         
_________________________________________________________________
cropping1d_3 (Cropping1D)    (None, 80, 1)             0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________


# Reshape
Imagine the input is 1D with shape [100,], while the Keras API expects [100,1].

There are three things we could try; only the first one works:

1. Reshape

In [21]:
from keras.layers import Input, Cropping1D, Reshape
from keras.losses import categorical_crossentropy
from keras.models import Model
#import keras.backend as K

inputs = Input(shape=(100,))
main_l = Reshape(target_shape=(100,1))(inputs)
#main_l = K.expand_dims()
# Crop 10 from start and 10 from end
out = Cropping1D(cropping=(10,10))(main_l)

model = Model(inputs=[inputs], outputs=[out])

model.compile(optimizer='adadelta', loss=categorical_crossentropy, metrics=[categorical_crossentropy])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_12 (InputLayer)        (None, 100)               0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 100, 1)            0         
_________________________________________________________________
cropping1d_4 (Cropping1D)    (None, 80, 1)             0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________


2. expand_dims as in NumPy

If we do it with NumPy, we get the error:

    Layer cropping1d_6 was called with an input that isn't a symbolic tensor

In [23]:
from keras.layers import Input, Cropping1D, Reshape
from keras.losses import categorical_crossentropy
from keras.models import Model
import numpy as np


inputs = Input(shape=(100,))
#main_l = Reshape(target_shape=(100,1))(inputs)
main_l = np.expand_dims(inputs, axis=-1)

# Crop 10 from start and 10 from end
out = Cropping1D(cropping=(10,10))(main_l)

model = Model(inputs=[inputs], outputs=[out])

model.compile(optimizer='adadelta', loss=categorical_crossentropy, metrics=[categorical_crossentropy])

model.summary()

ValueError: Layer cropping1d_6 was called with an input that isn't a symbolic tensor. Received type: <class 'numpy.ndarray'>. Full input: [array([<tf.Tensor 'input_14:0' shape=(?, 100) dtype=float32>], dtype=object)]. All inputs to the layer should be tensors.

3. Keras backend expand_dims

## But Keras backend does not provide Keras layer output!

It's not enough to have a symbolic tensor to build a Keras model; the tensor must also be the output of a Keras layer.

The symptom is the following error:

'Tensor' object has no attribute '_keras_history'

We can print _keras_history after each step to see which layer produced the tensor.

In [30]:
from keras import backend as K
from keras.layers import Input, Cropping1D, Reshape
from keras.losses import categorical_crossentropy
from keras.models import Model
import numpy as np


inputs = Input(shape=(100,))
print(inputs._keras_history)
#main_l = Reshape(target_shape=(100,1))(inputs)
main_l = K.expand_dims(inputs, axis=-1)
print(main_l._keras_history)# 'Tensor' object has no attribute '_keras_history'


# Crop 10 from start and 10 from end
out = Cropping1D(cropping=(10,10))(main_l)

model = Model(inputs=[inputs], outputs=[out])

model.compile(optimizer='adadelta', loss=categorical_crossentropy, metrics=[categorical_crossentropy])

model.summary()

(<keras.engine.topology.InputLayer object at 0x0000024B4F889E80>, 0, 0)


AttributeError: 'Tensor' object has no attribute '_keras_history'

# So the best option is the Reshape layer!


# But what is the use of Keras backend?

Keras is a model-level library, providing high-level building blocks for developing deep learning models. 

It does not itself handle low-level operations such as tensor products, convolutions and so on.

Instead, it relies on a specialized, well-optimized tensor manipulation library to do so, serving as the "backend engine" of Keras. 

Rather than picking one single tensor library and making the implementation of Keras tied to that library, Keras handles the problem in a modular way, and several different backend engines can be plugged seamlessly into Keras.

## Tensor operations in backend

In [None]:
from keras import backend as K

# Initializing Tensors with Random Numbers
b = K.random_uniform_variable(shape=(3, 4), low=0, high=1) # Uniform distribution
c = K.random_normal_variable(shape=(3, 4), mean=0, scale=1) # Gaussian distribution
d = K.random_normal_variable(shape=(3, 4), mean=0, scale=1)

# Tensor Arithmetic
a = b + c * K.abs(d)
c = K.dot(a, K.transpose(b))
a = K.sum(b, axis=1)
a = K.softmax(b)
a = K.concatenate([b, c], axis=-1)

# But those tensors cannot be used in a Functional model: they are not outputs of a Keras layer, and hence do not have _keras_history

## Permute

In [None]:
from keras.models import Sequential
from keras.layers import Permute

model = Sequential()
model.add(Permute((2, 1), input_shape=(10, 64)))
# now: model.output_shape == (None, 64, 10)
# note: `None` is the batch dimension


## Lambda
Can be used to apply a lambda expression to a layer's output while still producing the output of a Keras layer.
Useful for slicing inputs.

In [None]:
from keras.layers import Input, Lambda, Embedding

# Inputs
input_l = Input(shape=[5,1])
print(input_l._keras_history)
print(input_l.shape)

#category0 = Input(shape=[1])
#category0 = input_l[:,0]  # --> This is invalid, since category0 is no longer the output of a Keras layer, so it lacks Keras tensor attributes such as _keras_history
#print(category0.shape)  # --> (?,) while it should be (?,1) so that the Embedding layer can be indexed
category0 = Lambda(lambda x: x[:,0])(input_l)
print(category0._keras_history)
print(category0.shape)
category1 = Lambda(lambda x: x[:,1])(input_l)
print(category1.shape)
category2 = Lambda(lambda x: x[:,2])(input_l)
category3 = Lambda(lambda x: x[:,3])(input_l)
category4 = Lambda(lambda x: x[:,4])(input_l)

#Embeddings layers
max_feature_nominal_value = 3
embedding_vector_size = 50
emb_category0 = Embedding(max_feature_nominal_value, embedding_vector_size)(category0)
print(emb_category0.shape)
emb_category1 = Embedding(max_feature_nominal_value, embedding_vector_size)(category1)
emb_category2 = Embedding(max_feature_nominal_value, embedding_vector_size)(category2)
emb_category3 = Embedding(max_feature_nominal_value, embedding_vector_size)(category3)
emb_category4 = Embedding(max_feature_nominal_value, embedding_vector_size)(category4)

## Masking
keras.layers.Masking(mask_value=0.0)

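Useful for skipping timesteps of a sequence: a timestep is masked when all of its features are equal to mask_value.
Skipping means that downstream layers that support masking do not process the timestep at all, which is not the same as feeding them zeros.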

Example

Consider a Numpy data array x of shape (samples, timesteps, features), to be fed to an LSTM layer. You want to mask timestep #3 and #5 because you lack data for these timesteps. You can:

- set x[:, 3, :] = 0. and x[:, 5, :] = 0.
- insert a Masking layer with mask_value=0. before the LSTM layer:

In [None]:
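from keras.models import Sequential
from keras.layers import Masking, LSTM
# timesteps and features are the corresponding dimensions of x
model = Sequential()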
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(32))
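
To make the masking rule explicit, here is a minimal sketch in plain Python of how a per-timestep mask could be derived for a single sample. `compute_mask` is a hypothetical helper for illustration, not the Keras implementation, and the sample values are made up.

```python
def compute_mask(sample, mask_value=0.0):
    """Return one boolean per timestep: True = keep, False = masked.

    `sample` is a list of timesteps, each a list of feature values. A timestep
    is masked only if every one of its features equals `mask_value`.
    """
    return [any(v != mask_value for v in timestep) for timestep in sample]


# One sample with 4 timesteps and 2 features; timestep 1 is all zeros.
# Timestep 2 contains a zero but is not masked, because its other feature is non-zero.
sample = [[0.5, 1.0], [0.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
mask = compute_mask(sample)
```

This also shows why mask_value should be a value that cannot occur as a full legitimate timestep in the data.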

## Upsampling
https://keras.io/layers/convolutional/

## ZeroPadding
https://keras.io/layers/convolutional/

# Convolution layers

# Conv1D
1D convolution, also called temporal convolution.

## Input shape

3D tensor with shape: (batch_size, steps, input_dim) --> e.g. (batch_sz, num_words_per_sent, emb_sz)

## Output shape

3D tensor with shape: (batch_size, new_steps, filters).

steps value might have changed due to padding or strides.
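
How new_steps depends on padding and strides can be written down as a small function. This is a sketch of the usual convolution arithmetic, assuming a dilation rate of 1; `conv_output_length` is a hypothetical helper defined here, not a Keras call.

```python
import math


def conv_output_length(steps, kernel_size, strides=1, padding="valid"):
    """Number of output steps of a 1D convolution (dilation rate 1 assumed)."""
    if padding == "same":
        return math.ceil(steps / strides)
    if padding == "valid":
        return (steps - kernel_size) // strides + 1
    raise ValueError("padding must be 'valid' or 'same'")


valid_steps = conv_output_length(100, kernel_size=3)
same_steps = conv_output_length(100, kernel_size=3, padding="same")
strided_steps = conv_output_length(100, kernel_size=3, strides=2)
```

With padding="same" and a stride of 1 the number of steps is preserved, while "valid" padding shrinks it by kernel_size - 1.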

# Conv1D vs. Conv2D 

Conv2D is spatial while Conv1D is temporal conv.

## Input shape

4D tensor with shape: (samples, channels, rows, cols) if data_format is "channels_first" or 4D tensor with shape: (samples, rows, cols, channels) if data_format is "channels_last".

## Output shape

4D tensor with shape: (samples, filters, new_rows, new_cols) if data_format is "channels_first" or 4D tensor with shape: (samples, new_rows, new_cols, filters) if data_format is "channels_last". rows and  cols values might have changed due to padding.

# Separable convolution
https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d

## SeparableConv1D
## SeparableConv2D

Separable convolutions consist in first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes together the resulting output channels. The  depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step.

### Size reduction
For an input with C_in channels, C_out output channels and a K_h x K_w kernel (ignoring biases, and with depth_multiplier = 1):

Conv2D params = K_h x K_w x C_in x C_out

SeparableConv2D params = K_h x K_w x C_in (depthwise) + C_in x C_out (1x1 pointwise conv)
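
The size reduction is easiest to see by counting parameters. The sketch below assumes the standard decomposition: a depthwise kernel of kernel height x kernel width weights per input channel (times depth_multiplier), followed by a 1x1 pointwise convolution that mixes the channels, plus one bias per output channel. The helper names are hypothetical.

```python
def conv2d_params(c_in, c_out, k_h, k_w, use_bias=True):
    """Parameter count of a standard 2D convolution."""
    return k_h * k_w * c_in * c_out + (c_out if use_bias else 0)


def separable_conv2d_params(c_in, c_out, k_h, k_w, depth_multiplier=1, use_bias=True):
    """Parameter count of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = k_h * k_w * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise + (c_out if use_bias else 0)


# 3x3 kernel, 32 input channels, 64 output channels
standard = conv2d_params(32, 64, 3, 3)
separable = separable_conv2d_params(32, 64, 3, 3)
```

As a sanity check, conv2d_params(3, 1, 3, 3) and conv2d_params(1, 64, 3, 3) give 28 and 640, which agree with the Conv2D parameter counts in the model.summary() output shown later in this notebook.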

# Conv2DTranspose (NOT Deconvolution)
https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d

A transposed convolution is somewhat similar because it produces the same spatial resolution a hypothetical deconvolutional layer would. However, the actual mathematical operation that’s being performed on the values is different. A transposed convolutional layer carries out a regular convolution but reverts its spatial transformation.

Example:
An image of 5x5 is fed into a convolutional layer. The stride is set to 2, the padding is deactivated and the kernel is 3x3. This results in a 2x2 image.

If we wanted to reverse this process, we’d need the inverse mathematical operation so that 9 values are generated from each pixel we input. Afterward, we traverse the output image with a stride of 2. This would be a deconvolution.

A transposed convolution does not do that. The only thing in common is it guarantees that the output will be a 5x5 image as well, while still performing a normal convolution operation. To achieve this, we need to perform some fancy padding on the input.
As you can imagine now, this step will not reverse the process from above. At least not concerning the numeric values.
It merely reconstructs the spatial resolution from before and performs a convolution.
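
The spatial sizes in this example can be checked with a little arithmetic. The sketch below uses the standard formulas for an unpadded convolution and an unpadded transposed convolution; the helper names are hypothetical, and frameworks may add extra options (such as output padding) that are not modeled here.

```python
def conv_length(size, kernel, stride):
    """Output size of a convolution without padding."""
    return (size - kernel) // stride + 1


def transposed_conv_length(size, kernel, stride):
    """Output size of a transposed convolution without padding."""
    return (size - 1) * stride + kernel


down = conv_length(5, kernel=3, stride=2)              # 5x5 image -> 2x2
up = transposed_conv_length(down, kernel=3, stride=2)  # 2x2 -> back to 5x5
```

Only the resolution is recovered: the values of the 5x5 output are produced by a new convolution and are not the original pixels.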


## Pure Deconv operation
https://distill.pub/2016/deconv-checkerboard/

In [12]:
from keras.layers import Input, Conv2D, Conv2DTranspose, Activation, concatenate
#from keras.activations import Activation
from keras.losses import categorical_crossentropy
from keras.models import Model

inputs = Input(shape=(64, 64, 3))

conv_1 = Conv2D(1, (3, 3), strides=(1, 1), padding='same')(inputs)
act_1 = Activation('relu')(conv_1)


conv_2 = Conv2D(64, (3, 3), strides=(1, 1), padding='same')(act_1)
act_2 = Activation('relu')(conv_2)

deconv_1 = Conv2DTranspose(64, (3, 3), strides=(1, 1), padding='same')(act_2)
act_3 = Activation('relu')(deconv_1)

merge_1 = concatenate([act_3, act_1], axis=3)

deconv_2 = Conv2DTranspose(1, (3, 3), strides=(1, 1), padding='same')(merge_1)
act_4 = Activation('relu')(deconv_2)

model = Model(inputs=[inputs], outputs=[act_4])

model.compile(optimizer='adadelta', loss=categorical_crossentropy, metrics=[categorical_crossentropy])

model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_4 (InputLayer)             (None, 64, 64, 3)     0                                            
____________________________________________________________________________________________________
conv2d_7 (Conv2D)                (None, 64, 64, 1)     28          input_4[0][0]                    
____________________________________________________________________________________________________
activation_12 (Activation)       (None, 64, 64, 1)     0           conv2d_7[0][0]                   
____________________________________________________________________________________________________
conv2d_8 (Conv2D)                (None, 64, 64, 64)    640         activation_12[0][0]              
___________________________________________________________________________________________

# Sequential vs. Functional model revisited

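Two differences we can see:
    1. In a Sequential model, model.output_shape can be inspected after each added layer.
    2. Special layers such as concatenate are only available in the functional API.

Let's debug the output shape. For this we need to switch to the Sequential model; otherwise we would need to build a model at each step.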

In [43]:
from keras.layers import Input, Conv2D, Conv2DTranspose, Activation, concatenate
#from keras.activations import Activation
from keras.losses import categorical_crossentropy
from keras.models import Sequential


#inputs = Input(shape=(64, 64, 3))
model = Sequential()
model.add(Conv2D(1, (3, 3), strides=(1, 1), padding='same', input_shape=(64, 64, 3)))
model.add(Activation('relu'))
print(model.output_shape)

model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='same'))
model.add(Activation('relu'))
print(model.output_shape)
model.add(Conv2DTranspose(64, (3, 3), strides=(1, 1), padding='same'))
model.add(Activation('relu'))
print(model.output_shape)

# merge_1 = concatenate([act_3, act_1], axis=3)  # not possible in a Sequential model

model.add(Conv2DTranspose(1, (3, 3), strides=(1, 1), padding='same'))
model.add(Activation('relu'))
print(model.output_shape)



model.compile(optimizer='adadelta', loss=categorical_crossentropy, metrics=[categorical_crossentropy])

model.summary()

(None, 64, 64, 1)
(None, 64, 64, 64)
(None, 64, 64, 64)
(None, 64, 64, 1)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_17 (Conv2D)           (None, 64, 64, 1)         28        
_________________________________________________________________
activation_46 (Activation)   (None, 64, 64, 1)         0         
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 64, 64, 64)        640       
_________________________________________________________________
activation_47 (Activation)   (None, 64, 64, 64)        0         
_________________________________________________________________
conv2d_transpose_22 (Conv2DT (None, 64, 64, 64)        36928     
_________________________________________________________________
activation_48 (Activation)   (None, 64, 64, 64)        0         
_________________________________________________________________
co

## SeparableConv revisited
See the reduction in the number of parameters when using SeparableConv2D.
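
The parameter counts can also be estimated by hand. This is a minimal sketch, assuming a square kernel, one bias per output filter, and a depth multiplier of 1 for the separable layer; the helper names are hypothetical.

```python
def conv2d_params(k, c_in, c_out):
    """Weights of a regular k x k convolution plus one bias per filter."""
    return k * k * c_in * c_out + c_out


def separable_conv2d_params(k, c_in, c_out, depth_multiplier=1):
    """Depthwise k x k kernel, then a 1 x 1 pointwise kernel, plus biases."""
    depthwise = k * k * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise + c_out


for c_in, c_out in [(3, 1), (1, 64), (64, 64)]:
    print(c_in, c_out,
          conv2d_params(3, c_in, c_out),
          separable_conv2d_params(3, c_in, c_out))
```

The saving grows with the number of channels: for 64 input and 64 output channels the separable layer needs 4736 parameters instead of 36928, while for the tiny 3-to-1 layer it is slightly larger (31 vs. 28).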

In [34]:
from keras.layers import Input, Conv2D, Conv2DTranspose, Activation, concatenate, SeparableConv2D
#from keras.activations import Activation
from keras.losses import categorical_crossentropy
from keras.models import Model

inputs = Input(shape=(64, 64, 3))

conv_1 = SeparableConv2D(1, (3, 3), strides=(1, 1), padding='same')(inputs)
act_1 = Activation('relu')(conv_1)
print(model.output_shape)

conv_2 = SeparableConv2D(64, (3, 3), strides=(1, 1), padding='same')(act_1)
act_2 = Activation('relu')(conv_2)
print(model.output_shape)

deconv_1 = Conv2DTranspose(64, (3, 3), strides=(1, 1), padding='same')(act_2)
act_3 = Activation('relu')(deconv_1)

merge_1 = concatenate([act_3, act_1], axis=3)

deconv_2 = Conv2DTranspose(1, (3, 3), strides=(1, 1), padding='same')(merge_1)
act_4 = Activation('relu')(deconv_2)

model = Model(inputs=[inputs], outputs=[act_4])

model.compile(optimizer='adadelta', loss=categorical_crossentropy, metrics=[categorical_crossentropy])

model.summary()

(None, 64, 64, 1)


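AttributeError: 'Tensor' object has no attribute 'output_shape'

The error arises because the functional API passes tensors between layers, and a tensor has no output_shape attribute. The shape of an intermediate tensor can be inspected with keras.backend.int_shape(act_1) instead.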

## UpSampling2D vs Conv2DTranspose
UpSampling2D: Repeats the rows and columns of the data by size[0] and size[1] respectively.

Conv2DTranspose: Multiplies a kernel by a padded version of the input to achieve a desired output shape.
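
To make the first definition concrete, here is a plain-Python sketch of the row and column repetition on a nested list. It only mimics the behaviour described above and is not the Keras implementation; `upsample_2d` is a hypothetical helper.

```python
def upsample_2d(image, size=(2, 2)):
    """Repeat each row size[0] times and each column size[1] times."""
    rows, cols = size
    out = []
    for row in image:
        # Repeat every value along the column axis.
        widened = [value for value in row for _ in range(cols)]
        # Repeat the widened row along the row axis.
        out.extend(list(widened) for _ in range(rows))
    return out


image = [[1, 2],
         [3, 4]]
upsampled = upsample_2d(image)
for row in upsampled:
    print(row)
```

This repetition has no trainable weights, whereas Conv2DTranspose learns its kernel during training.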

# MaxPooling vs. GlobalMaxPooling and AveragePooling vs. GlobalAveragePooling

For example, if the input of the max pooling layer is 0,1,2,2,5,1,2, global max pooling outputs 5, whereas an ordinary max pooling layer with a pool size of 3 outputs 2,2,5,5,5 (assuming stride=1).

That's why MaxPooling1D takes a pool_size argument (pool_length in Keras 1), whereas GlobalMaxPooling1D does not.
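
The example above can be reproduced with a few lines of plain Python. This is only an illustrative sketch with hypothetical helper names; it uses stride 1 to match the text, whereas the Keras layer defaults to a stride equal to the pool size.

```python
def max_pool_1d(values, pool_size, stride=1):
    """Maximum of every window of length pool_size (no padding)."""
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, stride)]


def global_max_pool_1d(values):
    """Maximum over the whole sequence; there is no window to configure."""
    return max(values)


values = [0, 1, 2, 2, 5, 1, 2]
print(max_pool_1d(values, pool_size=3))  # [2, 2, 5, 5, 5]
print(global_max_pool_1d(values))        # 5
```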


# Locally connected layers
The LocallyConnected1D layer works similarly to the Conv1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.

The trade-off is a much higher number of parameters, which gives more model capacity but also a higher risk of overfitting.
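
A rough parameter count shows how quickly unshared weights add up. The sketch assumes no padding and one bias per filter at each output position for the locally connected layer; the helper names and the example sizes are made up for illustration.

```python
def conv1d_params(kernel_size, c_in, filters):
    """One shared kernel and one bias per filter."""
    return kernel_size * c_in * filters + filters


def locally_connected1d_params(input_length, kernel_size, c_in, filters,
                               stride=1):
    """A separate kernel and bias for every output position."""
    output_length = (input_length - kernel_size) // stride + 1
    weights = output_length * kernel_size * c_in * filters
    biases = output_length * filters
    return weights + biases


shared = conv1d_params(kernel_size=3, c_in=4, filters=8)
unshared = locally_connected1d_params(input_length=10, kernel_size=3,
                                      c_in=4, filters=8)
print(shared, unshared)
```

With 8 output positions the unshared layer has 8 times as many parameters (832 vs. 104), and the factor grows with the input length.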

# 2. Optimization (compile)
Before training a model, you need to configure the learning process, which is done via the compile method. It receives three arguments:

An optimizer. This could be the string identifier of an existing optimizer (such as rmsprop or adagrad), or an instance of the Optimizer class. See: optimizers.

A loss function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as categorical_crossentropy or mse), or it can be an objective function. See: losses.

A list of metrics. For any classification problem you will want to set this to metrics=['accuracy']. A metric could be the string identifier of an existing metric or a custom metric function.

In [None]:
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])

# 3. Data and Training
Keras models are trained on Numpy arrays of input data and labels. For training a model, you will typically use the fit function.

In [None]:
# For a single-input model with 10 classes (categorical classification):
import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))

# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)
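
The one-hot conversion used above turns each integer label into a vector with a single 1 at the label's index. A plain-Python sketch of the idea follows; `to_one_hot` is a hypothetical helper, not the Keras function.

```python
def to_one_hot(labels, num_classes):
    """Encode integer class labels as one-hot rows."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of the true class
        encoded.append(row)
    return encoded


one_hot = to_one_hot([0, 2, 1], num_classes=3)
for row in one_hot:
    print(row)
```

The resulting rows have the same length as the softmax output, which is what the categorical_crossentropy loss expects.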

# Model.summary()

# Embedding layer

# Stateful LSTM

A stateful recurrent model is one for which the internal states (memories) obtained after processing a batch of samples are reused as initial states for the samples of the next batch. This makes it possible to process longer sequences while keeping computational complexity manageable.

### Note that we have to provide the full batch_input_shape since the network is stateful.

In [None]:
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
num_classes = 10
batch_size = 32

# Expected input batch shape: (batch_size, timesteps, data_dim)
# Note that we have to provide the full batch_input_shape since the network is stateful.
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((batch_size * 10, timesteps, data_dim))
y_train = np.random.random((batch_size * 10, num_classes))

# Generate dummy validation data
x_val = np.random.random((batch_size * 3, timesteps, data_dim))
y_val = np.random.random((batch_size * 3, num_classes))

model.fit(x_train, y_train,
          batch_size=batch_size, epochs=5, shuffle=False,
          validation_data=(x_val, y_val))
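
The dummy data above is random, so the batch ordering does not matter there. With real sequences, the batches have to be laid out so that sample i of batch k continues sample i of batch k-1, which is also why shuffle=False is passed. The following plain-Python sketch shows one way to cut long sequences into such batches; `make_stateful_batches` is a hypothetical helper and assumes all sequences have the same length.

```python
def make_stateful_batches(sequences, timesteps):
    """Cut equal-length sequences into consecutive windows.

    Batch k holds window k of every sequence, so row i of batch k
    is the follow-up of row i of batch k-1.
    """
    n_windows = len(sequences[0]) // timesteps
    batches = []
    for k in range(n_windows):
        start = k * timesteps
        batches.append([seq[start:start + timesteps] for seq in sequences])
    return batches


# Two long sequences, i.e. a batch size of 2.
sequences = [list(range(0, 8)), list(range(100, 108))]
batches = make_stateful_batches(sequences, timesteps=4)
for batch in batches:
    print(batch)
```

Here row 0 of the second batch ([4, 5, 6, 7]) continues row 0 of the first batch ([0, 1, 2, 3]), so the state carried over by the LSTM belongs to the same sequence.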

## TF Backend

# Why must all tensors come from a Keras layer?
(Mercari, slicing Lambda)

# Keras for ML

# Categorical input handling

# Classification targets handling

# Keras integration with tf (tf.contrib.keras)

# Keras from TF (tf.keras)

# Keras callbacks

## EarlyStopping

## Save best model

## TensorBoard

# Sequence handling with Keras

## Variable sequence length RNN

## Multi stream architectures and input slicing (Lambda layers)

# Keras pipeline (from_directory and from_csv)

### Data augmentation through Keras

# Pre-trained models and Transfer learning (Dogs vs. Cats)

## Advanced: Adding custom layers to Keras