## Keras tutorial

[Keras](https://keras.io/)

[Original repo](https://github.com/fchollet/keras)

[Fork with MXNet support](https://github.com/dmlc/keras)

[FAQ](https://keras.io/getting-started/faq/)

### Installation

#### Backend

- [TensorFlow installation instructions](https://www.tensorflow.org/install/)
- Keras is also now part of TensorFlow as the [tf.keras module](https://www.tensorflow.org/api_docs/python/tf/keras)
- [Theano installation instructions](http://deeplearning.net/software/theano/install.html#install)
- [CNTK installation instructions](https://docs.microsoft.com/en-us/cognitive-toolkit/setup-cntk-on-your-machine)
- [Keras with MXNet instructions](https://github.com/dmlc/keras/wiki/Installation)

#### Dependencies

- cuDNN (recommended if you plan on running Keras on GPU).
- HDF5 and h5py (required if you plan on saving Keras models to disk).
- graphviz and pydot (used by visualization utilities to plot model graphs).


#### Install Keras from PyPI (recommended):

``` bash
pip install keras
```

#### Alternatively: install Keras from the GitHub source:

``` bash
git clone https://github.com/fchollet/keras.git
cd keras
sudo python setup.py install
```

### Config

Keras directory:

```
$HOME/.keras/                # Linux
%USERPROFILE%/.keras/        # Windows
$HOME/.keras/keras.json      # configuration file
```

Default configuration file:

```
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
```
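As a minimal sketch of how these settings could be inspected from Python, the snippet below reads `keras.json` and falls back to the defaults listed above when the file is missing. The helper name `read_keras_config` is hypothetical and not part of Keras; Keras itself loads this file on import.

```python
import json
import os

# Defaults copied from the configuration file shown above.
DEFAULT_CONFIG = {
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow",
}


def read_keras_config(path=None):
    """Return the settings found in keras.json, merged over the defaults.

    Hypothetical helper for illustration only.
    """
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    config = dict(DEFAULT_CONFIG)
    if os.path.exists(path):
        with open(path) as f:
            config.update(json.load(f))
    return config


print(read_keras_config()["backend"])
```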

### Layers

1. Core layers
    - Dense
    - Activation
    - Dropout
    - Flatten
    - ...
2. Convolutional layers
    - Conv2D
    - Upsampling2D
    - ZeroPadding2D
    - Cropping2D
    - ...
3. Pooling layers
    - MaxPooling2D
    - AveragePooling2D
    - Global* (for example GlobalMaxPooling2D, GlobalAveragePooling2D)
    - ...
4. Recurrent Layers
    - RNN
    - LSTM
    - GRU
    - ...
5. Other
    - Embedding layers
    - Merge layers
    - Advanced activations layers
    - Normalization layers
    - Noise layers
6. Custom layers: https://keras.io/layers/writing-your-own-keras-layers/

### Activations

1. Core
    - softmax
    - tanh
    - sigmoid
    - relu
    - selu
    - linear
2. Advanced activations layers
3. Other
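
To make the core activations concrete, here is a NumPy sketch of what they compute element-wise (softmax works per row). This is only an illustration of the math, not the Keras implementation; `linear` is simply the identity, and the function names are chosen for this example.

```python
import numpy as np


def relu(x):
    # max(x, 0) applied element-wise
    return np.maximum(x, 0.0)


def sigmoid(x):
    # squashes values into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    # subtract the row maximum first for numerical stability
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)


x = np.array([[-1.0, 0.0, 2.0]])
print(relu(x))
print(np.tanh(x))
print(sigmoid(x))
print(softmax(x))  # each row sums to 1
```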

In [None]:
from keras.models import Sequential
from keras.layers import Activation, Dense

# Empty model, used only to demonstrate the three options below
model = Sequential()

# 1. Activation layer
model.add(Dense(64))
model.add(Activation('tanh'))

# 2. Dense (or other) layer param
model.add(Dense(64, activation='tanh'))

# 3. Backend activation
from keras import backend as K

model.add(Dense(64, activation=K.tanh))
model.add(Activation(K.tanh))

### Sequential API

Easy to use, but a Sequential model is always a linear stack of layers.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation

# Option 1: pass a list of layers to the constructor
model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

# Option 2: build the same architecture layer by layer with add()
model = Sequential()
model.add(Dense(32, input_shape=(784,), activation='relu'))
model.add(Dense(10, activation='softmax'))

### Functional API

- More flexible
- Non-sequential models
    - U-Net
    - ResNet
    - Multi-input and multi-output models
    - Shared layers
    - More examples: https://keras.io/getting-started/functional-api-guide/


- A layer instance is callable (on a tensor), and it returns a tensor
- All models are callable, just like layers
- Input tensor(s) and output tensor(s) can then be used to define a Model
- Such a model can be trained just like Keras Sequential models.

In [None]:
from keras.layers import Input, Dense
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)

In [None]:
from keras.layers import TimeDistributed

# Input tensor for sequences of 20 timesteps,
# each containing a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# This applies our previous model to every timestep in the input sequences.
# the output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)

### Scikit-learn API

**Not recommended**

There are two wrappers available:

- keras.wrappers.scikit_learn.KerasClassifier(build_fn=None, **sk_params)
- keras.wrappers.scikit_learn.KerasRegressor(build_fn=None, **sk_params)


Arguments

- build_fn: callable function or class instance
- sk_params: model parameters & fitting parameters

More info: https://keras.io/scikit-learn-api/

### Compilation

Before training a model, we need to configure the learning process, which is done via the *compile()* method.

#### compile() arguments:

- optimizer
    - the string identifier of an existing optimizer (such as rmsprop or adagrad)
    - or an instance of the Optimizer class
- loss
    - This is the objective that the model will try to minimize
    - the string identifier of an existing loss function (such as categorical_crossentropy or mse)
    - or it can be an objective function
- metrics 
    - List of metrics
    - Each metric is:
        - the string identifier of an existing metric
        - or a custom metric function

In [None]:
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

In [None]:
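# Custom metric and loss
from keras import backend as K
from keras.losses import binary_crossentropy
from keras.optimizers import Adam

def dice_loss(y_true, y_pred):
    # Despite its name, this returns the Dice coefficient (higher is better).
    # It is used directly as a metric and as `1 - dice_loss` inside the loss.
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def bce_dice_loss(y_true, y_pred):
    # binary cross-entropy plus (1 - Dice coefficient)
    return binary_crossentropy(y_true, y_pred) + (1 - dice_loss(y_true, y_pred))

model.compile(optimizer=Adam(1e-3), loss=bce_dice_loss, metrics=[dice_loss])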
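
The function named `dice_loss` in the cell above computes a smoothed Dice coefficient. As a sanity check, the same formula can be sketched in NumPy, assuming binary masks as inputs; the name `dice_coefficient` is chosen for this example only.

```python
import numpy as np


def dice_coefficient(y_true, y_pred, smooth=1.0):
    # Same formula as above, with NumPy in place of the Keras backend
    y_true_f = np.ravel(y_true).astype(np.float64)
    y_pred_f = np.ravel(y_pred).astype(np.float64)
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (
        np.sum(y_true_f) + np.sum(y_pred_f) + smooth)


mask = np.array([[1, 1], [0, 0]])
print(dice_coefficient(mask, mask))      # identical masks -> 1.0
print(dice_coefficient(mask, 1 - mask))  # disjoint masks -> 0.2
```

A perfect prediction gives 1.0, which is why the combined loss uses `1 - dice` so that a better overlap lowers the loss.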

### Training

The Model class has three versions of the fit method (and of the corresponding evaluate and predict methods):

- **fit** - trains the model for a fixed number of epochs (iterations on a dataset)
- **fit_generator** - fits the model on data yielded batch-by-batch by a Python generator
    - The generator is run in parallel to the model, for efficiency. For instance, this allows you to do real-time data augmentation on images on CPU in parallel to training your model on GPU.
    - The use of keras.utils.Sequence guarantees the ordering and guarantees the single use of every input per epoch when using use_multiprocessing=True.
- **train_on_batch** - runs a single gradient update on a single batch of data.

More info: https://keras.io/models/model/
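
The contract for `fit_generator` is that the generator loops forever and that `steps_per_epoch` tells Keras how many batches make up one epoch. A pure-Python sketch of that contract, with hypothetical helper names and no Keras dependency:

```python
import math


def batch_generator(samples, batch_size):
    """Yield successive batches forever; the last batch may be smaller."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


def steps_per_epoch(n_samples, batch_size):
    # number of batches needed to see every sample once
    return int(math.ceil(n_samples / float(batch_size)))


samples = list(range(10))
gen = batch_generator(samples, batch_size=4)
steps = steps_per_epoch(len(samples), 4)

# One epoch = `steps` calls to next() on the generator
epoch = [next(gen) for _ in range(steps)]
print(steps, epoch)  # 3 [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```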

#### fit() examples

In [None]:
# For a single-input model with 2 classes (binary classification):

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)

In [None]:
# For a single-input model with 10 classes (categorical classification):

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import keras
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))

# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)
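
For intuition, one-hot encoding turns each integer label into a row with a single 1 at the label's index. A NumPy sketch of the idea follows; it is an illustration with a hypothetical `one_hot` helper, not the `to_categorical` source.

```python
import numpy as np


def one_hot(labels, num_classes):
    # Flatten labels of shape (n,) or (n, 1) to a 1-D integer array
    labels = np.asarray(labels, dtype=int).ravel()
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Set column `label` to 1 in each row
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


print(one_hot([0, 2, 1], num_classes=3))
```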

#### fit_generator() example

In [None]:
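import os

import cv2
import numpy as np
import scipy.misc as scm  # scm.imread needs SciPy < 1.2 (it was removed in 1.2)

# TRAIN_DIR, MASK_DIR, ids_train_split, ids_val_split and batch_size
# are assumed to be defined elsewhere.
def train_generator(ids_train_split, batch_size):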
    while True:
        for start in range(0, len(ids_train_split), batch_size):
            x_batch = []
            y_batch = []
            end = min(start + batch_size, len(ids_train_split))
            ids_train_batch = ids_train_split[start:end]
            for id in ids_train_batch.values:
                img = cv2.imread(os.path.join(TRAIN_DIR, '{}.jpg'.format(id)))
                mask = scm.imread(os.path.join(MASK_DIR, '{}_mask.gif'.format(id)), mode='L')
                x_batch.append(img)
                y_batch.append(mask)
            x_batch = np.array(x_batch, np.float32) / 255
            y_batch = np.array(y_batch, np.float32) / 255
            yield x_batch, y_batch

def val_generator(ids_val_split, batch_size):
    while True:
        for start in range(0, len(ids_val_split), batch_size):
            x_batch = []
            y_batch = []
            end = min(start + batch_size, len(ids_val_split))
            ids_valid_batch = ids_val_split[start:end]
            for id in ids_valid_batch.values:
                img = cv2.imread(os.path.join(TRAIN_DIR, '{}.jpg'.format(id)))
                mask = scm.imread(os.path.join(MASK_DIR, '{}_mask.gif'.format(id)), mode='L')
                x_batch.append(img)
                y_batch.append(mask)
            x_batch = np.array(x_batch, np.float32) / 255
            y_batch = np.array(y_batch, np.float32) / 255
            yield x_batch, y_batch

# steps are converted to int because np.ceil returns a float
model.fit_generator(generator=train_generator(ids_train_split, batch_size),
        steps_per_epoch=int(np.ceil(float(len(ids_train_split)) / float(batch_size))),
        validation_data=val_generator(ids_val_split, batch_size),
        validation_steps=int(np.ceil(float(len(ids_val_split)) / float(batch_size))))

### Preprocessing

#### Images

https://keras.io/preprocessing/image/

In [None]:
keras.preprocessing.image.ImageDataGenerator(featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    zca_whitening=False,
    zca_epsilon=1e-6,
    rotation_range=0.,
    width_shift_range=0.,
    height_shift_range=0.,
    shear_range=0.,
    zoom_range=0.,
    channel_shift_range=0.,
    fill_mode='nearest',
    cval=0.,
    horizontal_flip=False,
    vertical_flip=False,
    rescale=None,
    preprocessing_function=None,
    data_format=K.image_data_format())

[imgaug](https://github.com/aleju/imgaug) - cool image augmentation library

#### Text

https://keras.io/preprocessing/text/

In [None]:
keras.preprocessing.text.text_to_word_sequence(text,
                                               filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                                               lower=True,
                                               split=" ")

keras.preprocessing.text.one_hot(text,
                                 n,
                                 filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                                 lower=True,
                                 split=" ")

keras.preprocessing.text.hashing_trick(text, 
                                       n,
                                       hash_function=None,
                                       filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                                       lower=True,
                                       split=' ')

keras.preprocessing.text.Tokenizer(num_words=None,
                                   filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                                   lower=True,
                                   split=" ",
                                   char_level=False)
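
As a rough sketch of what `text_to_word_sequence` does with the default arguments shown above: lowercase the text, replace every filter character with the split character, split, and drop empty tokens. The function below is a pure-Python approximation written for this example, not the Keras source.

```python
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def to_word_sequence(text, filters=FILTERS, lower=True, split=" "):
    if lower:
        text = text.lower()
    # Map every filtered character to the split character
    table = str.maketrans({ch: split for ch in filters})
    # Split and drop the empty strings left by consecutive separators
    return [w for w in text.translate(table).split(split) if w]


print(to_word_sequence("Hello, world! It's Keras."))
# ['hello', 'world', "it's", 'keras']
```

Note that the apostrophe is not in the default filter string, so `it's` stays a single token.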

#### Sequences

https://keras.io/preprocessing/sequence/

In [None]:
keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32',
    padding='pre', truncating='pre', value=0.)
    
keras.preprocessing.sequence.skipgrams(sequence, vocabulary_size,
    window_size=4, negative_samples=1., shuffle=True,
    categorical=False, sampling_table=None)
    
keras.preprocessing.sequence.make_sampling_table(size, sampling_factor=1e-5)
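
The defaults `padding='pre'` and `truncating='pre'` mean that short sequences are padded at the front and long ones lose their first elements. A pure-Python sketch of that behaviour, with a hypothetical `pad_pre` helper that returns lists rather than a NumPy array:

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Pad at the front and truncate from the front, like the defaults above."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        s = list(s)
        # keep only the last `maxlen` items when the sequence is too long
        kept = s[len(s) - maxlen:] if len(s) > maxlen else s
        # fill the front with `value` when the sequence is too short
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


print(pad_pre([[1], [2, 3], [4, 5, 6]]))
# [[0, 0, 1], [0, 2, 3], [4, 5, 6]]
print(pad_pre([[1], [2, 3], [4, 5, 6]], maxlen=2))
# [[0, 1], [2, 3], [5, 6]]
```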

### Save and load models

#### Saving/loading whole models (architecture + weights + optimizer state)

Using pickle or cPickle to save a Keras model is not recommended; use the methods below instead.

In [None]:
from keras.models import load_model

# HDF5 file contains:
#     - the architecture of the model, allowing to re-create the model
#     - the weights of the model
#     - the training configuration (loss, optimizer)
#     - the state of the optimizer, allowing to resume training exactly where you left off.
model.save('my_model.h5')  # creates a HDF5 file 'my_model.h5'

# returns a compiled model identical to the previous one
# load_model will also take care of compiling the model using the saved training configuration
# (unless the model was never compiled in the first place)
model = load_model('my_model.h5')

#### Saving/loading only a model's architecture

In [None]:
# save as JSON
json_string = model.to_json()

# save as YAML
yaml_string = model.to_yaml()

# The generated JSON / YAML files are human-readable and can be manually edited if needed.


# model reconstruction from JSON:
from keras.models import model_from_json
model = model_from_json(json_string)

# model reconstruction from YAML
from keras.models import model_from_yaml
model = model_from_yaml(yaml_string)

#### Saving/loading only a model's weights

In [None]:
model.save_weights('my_model_weights.h5')

# Assuming you have code for instantiating your model,
# you can then load the weights you saved into a model with the same architecture:
model.load_weights('my_model_weights.h5')

# If you need to load weights into a different architecture (with some layers in common),
# for instance for fine-tuning or transfer-learning, you can load weights by layer name:
# # #
# Assuming the original model looks like this:
#     model = Sequential()
#     model.add(Dense(2, input_dim=3, name='dense_1'))
#     model.add(Dense(3, name='dense_2'))
#     ...
#     model.save_weights(fname)
# # #

# new model
model = Sequential()
model.add(Dense(2, input_dim=3, name='dense_1'))  # will be loaded
model.add(Dense(10, name='new_dense'))  # will not be loaded

# load weights from first model; will only affect the first layer, dense_1.
model.load_weights(fname, by_name=True)

#### Handling custom layers (or other custom objects) in saved models

In [None]:
from keras.models import load_model
# Assuming your model includes instance of an "AttentionLayer" class
model = load_model('my_model.h5', custom_objects={'AttentionLayer': AttentionLayer, 'dice_loss': dice_loss})

# Alternatively, you can use a custom object scope:
from keras.utils import CustomObjectScope

with CustomObjectScope({'AttentionLayer': AttentionLayer}):
    model = load_model('my_model.h5')

# Custom objects handling works the same way for load_model, model_from_json, model_from_yaml:
from keras.models import model_from_json
model = model_from_json(json_string, custom_objects={'AttentionLayer': AttentionLayer})

### Callbacks

In [None]:
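from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.callbacks import (EarlyStopping, ModelCheckpoint,
                             ReduceLROnPlateau, TensorBoard)

model = Sequential()
model.add(Dense(10, input_dim=784, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

callbacks = [
    # stop training once the validation loss has not improved for 5 epochs
    # (a loss should decrease, so the mode is 'min')
    EarlyStopping(monitor='val_loss',
                        patience=5,
                        verbose=1,
                        min_delta=1e-6,
                        mode='min'),
    # multiply the learning rate by 0.2 when the validation loss plateaus
    ReduceLROnPlateau(monitor='val_loss',
                        factor=0.2,
                        patience=5,
                        min_lr=0.001),
    # save the model weights after each epoch if the validation loss decreased
    ModelCheckpoint(filepath='/tmp/weights.hdf5',
                        verbose=1,
                        save_best_only=True,
                        save_weights_only=True),
    # write logs that TensorBoard can display
    TensorBoard(log_dir='/tmp/')
]

# x_train, y_train, x_test and y_test are assumed to be defined elsewhere
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0,
          validation_data=(x_test, y_test), callbacks=callbacks)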
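
The `patience` and `min_delta` arguments of `EarlyStopping` can be understood with a simplified pure-Python model of the rule for a monitored loss (lower is better). This is a sketch of the logic under that assumption, with a hypothetical function name, not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=1e-6):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # improvement larger than min_delta: remember it, reset the counter
            best = loss
            wait = 0
        else:
            # no improvement: stop after `patience` such epochs in a row
            wait += 1
            if wait >= patience:
                return epoch
    return None


print(early_stopping_epoch([1.0, 0.9, 0.95, 0.96, 0.97], patience=2))  # 3
print(early_stopping_epoch([1.0, 0.9, 0.8], patience=2))               # None
```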

### [Model ZOO](https://keras.io/applications/)

- Xception (only TF)
- VGG16
- VGG19
- ResNet50
- InceptionV3
- InceptionResNetV2
- MobileNet (only TF)

More models: just google "[framework name] [model name]" e.g. "keras densenet"

In [None]:
from keras.applications.xception import Xception
from keras.applications.vgg16 import VGG16
from keras.applications.vgg19 import VGG19
from keras.applications.resnet50 import ResNet50
from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.applications.mobilenet import MobileNet

model = VGG16(weights='imagenet', include_top=True)

### Fine-tuning

#### Freeze layers

To "freeze" a layer means to exclude it from training: its weights will not be updated.

In [None]:
from keras.layers import Input, Dense
from keras.models import Model

# 1. pass a trainable argument (boolean) to a layer constructor:

frozen_layer = Dense(32, trainable=False)

# 2. set the trainable property of a layer after instantiation
# need to call compile() after modifying the trainable property

x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)

frozen_model = Model(x, y)
# in the model below, the weights of `layer` will not be updated during training
frozen_model.compile(optimizer='rmsprop', loss='mse')

layer.trainable = True
trainable_model = Model(x, y)
# with this model the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)
trainable_model.compile(optimizer='rmsprop', loss='mse')

# `data` and `labels` are assumed to be arrays of shape (n_samples, 32)
frozen_model.fit(data, labels)  # this does NOT update the weights of `layer`
trainable_model.fit(data, labels)  # this updates the weights of `layer`

#### Fine-tune InceptionV3 on a new set of classes

In [None]:
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K

# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)

# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False

# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

In [None]:
# train the model on the new data for a few epochs
model.fit_generator(...)

# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.

In [None]:
# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 249 layers and unfreeze the rest:
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')

In [None]:
# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
model.fit_generator(...)