## Keras: a Deep Learning Framework Based on Theano
- [github](https://github.com/fchollet/keras)
- [website](http://keras.io/)
- [examples showing off modelling capability](http://keras.io/examples/)
- [example code for image/text data](https://github.com/fchollet/keras/tree/master/examples)

### Philosophy
- fast prototyping with flexible, minimal configuration: a Torch-like interface in Python that also supports an sklearn-like prediction interface, e.g., `fit`, `train_on_batch`, `evaluate`, `predict_classes`, `predict_proba`
- runs on both CPU and GPU
- supports both convolutional and recurrent networks
- easy to extend

### Basic Usage
Like most deep learning APIs, the main components of Keras are (1) different types of layers, (2) a model (net) consisting of layers and a loss function, and (3) optimizers, plus optional data-processing facilities, e.g. for image/text/sequence data.

### Main APIs

#### A. Data Processing - mostly helper functions and helper processors
- packages: 
    - `keras.preprocessing.sequence` for sequence data
    - `keras.preprocessing.text` for text data
    - `keras.preprocessing.image` for image data

#### B. Layers
- packages:
    - `keras.layers.core` for core layers
    - `keras.layers.convolutional` for convolution/pooling layers
    - `keras.layers.recurrent` for recurrent layers
    - `keras.layers.advanced_activations` for advanced activation functions
    - `keras.layers.normalization` for normalizations
    - `keras.layers.embeddings` for text embedding (vector representation)
    - `keras.layers.noise` for noise-adding
    - `keras.layers.containers` for ensemble/composite layers, e.g. sequentially stacked multilayers
- activation functions: the activation of a layer can be specified either (1) via a separate activation layer or (2) through the `activation` argument supported by all forward layers. Existing activations are
    - softmax: expects the input shape to be either (nsamples, ntimesteps, ndims) or (nsamples, ndims)
    - softplus
    - relu
    - tanh
    - sigmoid
    - hard_sigmoid
    - linear
- initialization of layer weights can be specified by the `init` param in the layer constructor; out-of-box initializations include
    - uniform
    - lecun_uniform (uniform initialization whose range shrinks with the square root of the number of inputs, fan-in)
    - normal
    - identity
    - orthogonal
    - zero
    - glorot_normal (Gaussian initialization whose variance shrinks with fan-in + fan-out)
    - glorot_uniform
    - he_normal
    - he_uniform
- regularization: penalties can be applied to layer weights, layer biases, and/or layer activations. These are set via three parameters of a layer, each of which takes a regularizer instance from the `keras.regularizers` package.
    - `W_regularizer`: l1(l=0.01), l2(l=0.01), l1l2(l1=0.01, l2=0.01)
    - `b_regularizer`: l1(l=0.01), l2(l=0.01), l1l2(l1=0.01, l2=0.01)
    - `activity_regularizer`: activity_l1(l=0.01), activity_l2(l=0.01), activity_l1l2(l1=0.01, l2=0.01)
- constraints: some layers need constraints, see [doc](http://keras.io/constraints/) for details
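The initialization schemes listed above differ mainly in how they scale the random weights by the layer's fan-in and fan-out. The following is a minimal sketch in plain Python, not Keras code: the helper names (`uniform_limit`, `normal_std`, `init_weights`) are hypothetical, and the formulas are the commonly cited LeCun, Glorot and He scalings, assumed here rather than copied from the Keras source.

```python
import math
import random


def uniform_limit(scheme, fan_in, fan_out):
    """Half-width s of the U(-s, s) range for a uniform scheme (assumed formulas)."""
    if scheme == "lecun_uniform":
        return math.sqrt(3.0 / fan_in)
    if scheme == "glorot_uniform":
        return math.sqrt(6.0 / (fan_in + fan_out))
    if scheme == "he_uniform":
        return math.sqrt(6.0 / fan_in)
    raise ValueError("unknown uniform scheme: %s" % scheme)


def normal_std(scheme, fan_in, fan_out):
    """Standard deviation of the zero-mean Gaussian for a normal scheme (assumed formulas)."""
    if scheme == "glorot_normal":
        return math.sqrt(2.0 / (fan_in + fan_out))
    if scheme == "he_normal":
        return math.sqrt(2.0 / fan_in)
    raise ValueError("unknown normal scheme: %s" % scheme)


def init_weights(scheme, fan_in, fan_out, rng=None):
    """Return a fan_in x fan_out weight matrix as a list of lists."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    if scheme.endswith("_uniform"):
        s = uniform_limit(scheme, fan_in, fan_out)
        return [[rng.uniform(-s, s) for _ in range(fan_out)] for _ in range(fan_in)]
    std = normal_std(scheme, fan_in, fan_out)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


# Example: a 4 x 2 weight matrix drawn with the Glorot uniform scaling.
W = init_weights("glorot_uniform", fan_in=4, fan_out=2)
```

Wider layers get smaller initial weights under all of these schemes, which keeps the scale of activations roughly stable from layer to layer.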

#### C. Objective Functions
Objective functions can be specified either by name (see the out-of-box objective function names below) or as a Theano symbolic function that returns a scalar for each data point; examples can be found in the [source code](https://github.com/fchollet/keras/blob/master/keras/objectives.py). Available functions include:
- mean_squared_error / mse
- mean_absolute_error / mae
- mean_absolute_percentage_error / mape
- mean_squared_logarithmic_error / msle
- squared_hinge: only for binary classification
- hinge: only for binary classification
- binary_crossentropy: Also known as logloss.
- categorical_crossentropy: also known as multiclass logloss, typically paired with a softmax output. ***It requires labels in one-hot encoding, i.e., binary arrays of shape (nsamples, nclasses)***
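To make the "one scalar per data point" contract and the one-hot label requirement concrete, here is a small sketch in plain Python rather than Theano. The helper names (`to_one_hot`, `softmax`, and the list-based loss functions) are illustrative assumptions, and the clipping constant `eps` is a choice made for this sketch, not a documented Keras value.

```python
import math


def to_one_hot(labels, nclasses):
    """Turn integer class labels into binary rows of shape (nsamples, nclasses)."""
    return [[1.0 if k == y else 0.0 for k in range(nclasses)] for y in labels]


def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """One loss value per sample; y_true must be one-hot rows."""
    return [
        -sum(t * math.log(max(p, eps)) for t, p in zip(ts, ps))
        for ts, ps in zip(y_true, y_pred)
    ]


def mean_squared_error(y_true, y_pred):
    """One loss value per sample, averaged over the output dimensions."""
    return [
        sum((t - p) ** 2 for t, p in zip(ts, ps)) / len(ts)
        for ts, ps in zip(y_true, y_pred)
    ]


# Two samples, three classes: integer labels must be converted first.
labels = [0, 2]
y_true = to_one_hot(labels, nclasses=3)
y_pred = [softmax([2.0, 1.0, 0.1]), softmax([0.0, 0.0, 0.0])]
losses = categorical_crossentropy(y_true, y_pred)  # list with one entry per sample
```

The second sample receives a uniform prediction, so its loss equals log(3), the loss of guessing uniformly among three classes.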

#### D. Optimizers 
Existing optimizers and their parameters can be found in the [doc](http://keras.io/optimizers/)

#### E. Callback functors
Callback functors are subclasses of `keras.callbacks.Callback` with specific event slots such as `on_train_begin/end(logs={})`, `on_epoch_begin/end(epoch, logs={})`, `on_batch_begin/end(batch, logs={})`. The commonly used out-of-box callbacks are
- `ModelCheckpoint(filepath, verbose=0, save_best_only=False)`: saves the model after every epoch. If `save_best_only=True`, the latest best model according to the validation loss will not be overwritten.
- `EarlyStopping(monitor='val_loss', patience=0, verbose=0)`: stops training once the metric named by `monitor` has shown no improvement for `patience` epochs. The value of `monitor` is a key in the `logs` dictionary passed into the event listeners.
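The event-slot mechanism and the `patience` rule can be sketched without Keras. The class below is a hypothetical stand-in, not `keras.callbacks.EarlyStopping` itself, and `run_fake_training` is an invented driver that replaces `model.fit`; the exact way real Keras counts `patience` may differ between versions.

```python
class EarlyStoppingSketch:
    """Hypothetical stand-in that mimics the on_epoch_end(epoch, logs) slot."""

    def __init__(self, monitor="val_loss", patience=0):
        self.monitor = monitor
        self.patience = patience
        self.best = float("inf")
        self.wait = 0
        self.stop_training = False

    def on_epoch_end(self, epoch, logs=None):
        current = (logs or {}).get(self.monitor)
        if current is None:
            return  # monitored key missing from logs: nothing to compare
        if current < self.best:
            # Improvement: remember it and reset the waiting counter.
            self.best = current
            self.wait = 0
        elif self.wait >= self.patience:
            # No improvement and patience is exhausted.
            self.stop_training = True
        else:
            self.wait += 1


def run_fake_training(val_losses, callback):
    """Invented driver: feeds one validation loss per epoch to the callback.

    Returns the epoch at which training stopped, or None if it ran to the end.
    """
    for epoch, loss in enumerate(val_losses):
        callback.on_epoch_end(epoch, logs={"val_loss": loss})
        if callback.stop_training:
            return epoch
    return None


history = [1.0, 0.8, 0.9, 0.7, 0.75, 0.76]
stopped_at = run_fake_training(history, EarlyStoppingSketch(patience=1))
```

With `patience=1` the single bad epoch at index 2 is tolerated, and training stops only after two consecutive epochs without improvement.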

#### F. Models
- the model is the main access point for training and evaluation.
- it assembles the other components such as layers, objective functions and optimizers, e.g.,
    - add a layer with `model.add`
    - set the loss function and optimizer in `model.compile`
    - set callback functions in `model.fit`
- callback functions can be specified for different stages of training (training, epoch and batch begin/end)


```python
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
```