[Keras](https://keras.io/) is an API that provides high-level building blocks for developing machine learning models.

Keras does not itself implement low-level operations such as tensor manipulation and differentiation; instead, it delegates them to a backend engine. Several backend engines can be plugged into Keras:

 * [TensorFlow (Google)](https://www.tensorflow.org/)
 * [Theano (MILA lab, Université de Montréal)](http://deeplearning.net/software/theano/)
 * [Microsoft Cognitive Toolkit (CNTK)](https://github.com/Microsoft/CNTK)

Keras models can be run with any of these backends without having to change the code.
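As a rough sketch of how a backend can be selected, the snippet below sets the `KERAS_BACKEND` environment variable before importing Keras and then asks Keras which backend it loaded. The value `tensorflow` is only an example; which backend names are accepted depends on the installed Keras version and on which backends are installed on the machine.

```python
import os

# Choose the backend before Keras is imported; "tensorflow" is only an example value.
# setdefault keeps any backend that is already configured in the environment.
os.environ.setdefault("KERAS_BACKEND", "tensorflow")

import keras

# Ask Keras which backend engine it actually loaded.
active_backend = keras.backend.backend()
print("Keras is using the", active_backend, "backend")
```

The model code that follows such a setup stays the same regardless of the backend that was picked.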

Keras is able to run seamlessly on both CPUs and GPUs.

A typical deployment stack looks like this: 
 
<img src="images/keras_stack.png" height="250" width="400"/> 

 * [CUDA](https://developer.nvidia.com/cuda-toolkit) is NVIDIA's parallel computing platform and API for its GPUs
 * [cuDNN (Deep Neural Network library)](https://developer.nvidia.com/cudnn) is a library that provides GPU-accelerated primitives for deep neural networks
 * [BLAS (Basic Linear Algebra Subprograms)](http://www.netlib.org/blas/) is a library with basic vector and matrix operations
 * [Eigen](http://eigen.tuxfamily.org/index.php?title=Main_Page) is a library for linear algebra
 


## Anatomy of a Keras model

Training a Keras model involves the following objects:

 * Layers, which are combined into a model
 * The input data and labels
 * The loss function, which defines the feedback signal used for learning
 * The optimizer, which determines how learning proceeds

<img src="images/keras_model.png" height="250" width="400"/> 
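The four ingredients listed above can be seen together in a minimal sketch. The data here is random and purely illustrative, and the layer sizes, the `rmsprop` optimizer and the `binary_crossentropy` loss are assumptions chosen for a toy binary classification task, not recommendations.

```python
import numpy as np
from keras import layers, models

# Input data and labels: 100 random samples with 8 features each (made up for illustration).
x = np.random.rand(100, 8).astype("float32")
y = np.random.randint(0, 2, size=(100, 1)).astype("float32")

# Layers, combined into a model. Recent Keras versions may warn about `input_shape`
# and suggest an explicit Input object instead; the argument is still accepted.
model = models.Sequential([
    layers.Dense(16, activation="relu", input_shape=(8,)),
    layers.Dense(1, activation="sigmoid"),
])

# The loss function provides the feedback signal; the optimizer decides how to act on it.
model.compile(optimizer="rmsprop", loss="binary_crossentropy")

# Learning: the optimizer updates the layer weights to reduce the loss on (x, y).
history = model.fit(x, y, epochs=2, batch_size=32, verbose=0)
print(history.history["loss"])
```

Because the labels are random, the loss values are not meaningful here; the point is only how the pieces fit together.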

## Layers

A layer is a function that takes one or more tensors as input and returns one or more tensors as output.

Some layers are stateless, but more frequently layers have a state: the layer’s weights, one or several tensors learned with stochastic gradient descent.

Examples of stateless layers:

 * Dropout: a regularization technique for reducing overfitting
 * Merge layers: concatenate, sum, mean, min, max etc.

Examples of stateful layers:

 * Dense layers
 * Recurrent layers
 * Convolution layers
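The difference between stateless and stateful layers can be checked by counting trainable weights. This is a small sketch; the layer sizes and the input width of 3 are arbitrary assumptions.

```python
from keras import layers

dense = layers.Dense(4)        # stateful: has a kernel and a bias
dropout = layers.Dropout(0.5)  # stateless: nothing to learn

# Weights are only created once the layer knows its input shape.
dense.build((None, 3))

dense_weight_count = len(dense.trainable_weights)
dropout_weight_count = len(dropout.trainable_weights)

print("Dense trainable weights:", dense_weight_count)
print("Dropout trainable weights:", dropout_weight_count)
```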

Different layers are appropriate for different types of data processing:

 * Vector data, stored in 2D tensors of shape (batch_size, features), is usually processed by dense layers
 * Sequence data, stored in 3D tensors of shape (batch_size, timesteps, features), is usually processed by recurrent layers
 * Image data, stored in 4D tensors of shape (batch_size, height, width, channels), is usually processed by convolution layers
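To make the three tensor shapes concrete, the sketch below feeds zero-filled NumPy arrays through one layer of each kind. It assumes a Keras installation that can call layers eagerly on arrays (for example TensorFlow 2 or later) and the default channels-last image layout; all sizes are arbitrary.

```python
import numpy as np
from keras import layers

batch_size = 32

# One dummy batch per data type; the values do not matter, only the shapes.
vectors = np.zeros((batch_size, 20), dtype="float32")         # (batch_size, features)
sequences = np.zeros((batch_size, 10, 20), dtype="float32")   # (batch_size, timesteps, features)
images = np.zeros((batch_size, 28, 28, 3), dtype="float32")   # (batch_size, height, width, color channels)

dense_out = layers.Dense(8)(vectors)
rnn_out = layers.LSTM(8)(sequences)
conv_out = layers.Conv2D(8, kernel_size=3)(images)

print(tuple(dense_out.shape))
print(tuple(rnn_out.shape))
print(tuple(conv_out.shape))
```

With the default `valid` padding, the 3x3 convolution shrinks each spatial dimension by 2, which is why the convolution output is 26 pixels wide and high rather than 28.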

You can think of layers as LEGO bricks.

Models are built by clipping together compatible layers to form useful data-transformation pipelines.

Layer compatibility here means that every layer accepts only input tensors of a certain shape and returns output tensors of a certain shape.
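A short sketch of what incompatibility looks like in practice: a dense layer happily accepts a 2D tensor, while a 2D convolution layer expects a 4D tensor and rejects the same input. The sizes are arbitrary, and the exact wording of the error message varies between Keras versions.

```python
from keras import layers

# Symbolic 2D input of shape (batch_size, 20).
inputs = layers.Input(shape=(20,))

# Compatible: Dense accepts (batch_size, 20) and returns (batch_size, 32).
hidden = layers.Dense(32)(inputs)

# Incompatible: Conv2D expects a 4D tensor (batch_size, height, width, channels).
try:
    layers.Conv2D(8, kernel_size=3)(hidden)
    compatible = True
except ValueError as err:
    compatible = False
    error_message = str(err)

print("Conv2D accepted the 2D tensor:", compatible)
```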

A model is a directed acyclic graph of layers. The most common instance is a linear stack of layers that maps a single input to a single output. More complex models have multiple inputs or outputs, or shortcut connections.
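A model that is not a plain linear stack can be sketched with the Keras functional API. The example below adds a shortcut connection that routes the input around two dense layers; the layer sizes are arbitrary assumptions.

```python
from keras import layers, models

inputs = layers.Input(shape=(16,))

# Main branch: two dense layers applied one after the other.
x = layers.Dense(16, activation="relu")(inputs)
x = layers.Dense(16, activation="relu")(x)

# Shortcut connection: the input skips the main branch and is added to its output.
# Both tensors must have the same shape, here (batch_size, 16).
merged = layers.add([inputs, x])

outputs = layers.Dense(1)(merged)
model = models.Model(inputs=inputs, outputs=outputs)

model.summary()
```

The graph stays acyclic: data flows from `inputs` to `outputs` along two paths, but never loops back.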

For each problem class, there usually exist one or more standard model architectures. It is generally a good idea to start with one of these models.

In general, picking the right model architecture is more an art than a science.
