# Introduction to Keras
Last week we built a neural network from scratch using nothing but raw Python and the matrix library NumPy. While that is a great way to understand the inner workings of neural networks, it is not very practical to always implement your own learning algorithms from scratch. In fact, much of the progress in machine learning in recent years was achieved because reliable, high-performance and easy-to-use libraries were created. For the rest of the course we will be using [Keras](https://keras.io/). Keras is a high-level neural network API that works on top of other deep learning libraries. We will be using Keras in combination with Google's [TensorFlow](https://www.tensorflow.org/), a very popular deep learning library. You can think of Keras as a front end that you as a developer use, while TensorFlow handles all the math in the background. This setup allows us to harness the high performance of TensorFlow while iterating quickly with an easy-to-use API.

## MNIST with Keras
Perhaps the best way to understand how Keras works is to just get started with it. Last week's challenge was the MNIST dataset, a collection of handwritten digits. In this introduction we are going to use the same dataset to get to know Keras.

In [1]:
from keras.models import Sequential

Using TensorFlow backend.



Keras offers two basic ways to build models: the [sequential model](https://keras.io/getting-started/sequential-model-guide/), in which layers are simply stacked on top of each other, and the [functional API](https://keras.io/getting-started/functional-api-guide/), which allows you to create more complex structures. For most of the course we will be using the sequential model. As you can see from the output of the import statement, Keras is using TensorFlow as its back end. Next we need to import the modules we use to create our network:

In [2]:
from keras.layers import Dense, Activation

We just imported the `Dense` layer and the `Activation` layer. A dense layer is simply a layer in which every node is connected to all nodes of the previous layer. This was the case in all neural networks we have built so far, but there are other possibilities, too. We will explore them later. Keras also provides a utility to directly load some common machine learning datasets.

In [14]:
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

For one-hot encoding we will continue to use scikit-learn:

In [15]:
from sklearn.preprocessing import OneHotEncoder

# Reshape the 1-D label array into a column vector, as the encoder expects
y_train = y_train.reshape(y_train.shape[0], 1)
# Fit the encoder on the training labels and one-hot encode them
enc = OneHotEncoder()
onehot = enc.fit_transform(y_train)
# Convert the sparse matrix into a dense NumPy array
y_train = onehot.toarray()

# Reshape the test labels into a column vector as well
y_test = y_test.reshape(y_test.shape[0], 1)
# Reuse the encoder fitted on the training labels so both sets share one mapping
onehot = enc.transform(y_test)
# Convert the sparse matrix into a dense NumPy array
y_test = onehot.toarray()
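
To make the result of the encoding above more concrete, here is a minimal NumPy-only sketch of the same idea. The `labels` array is a made-up stand-in for a few MNIST labels, and indexing an identity matrix is just one possible way to build one-hot vectors, not what scikit-learn does internally.

```python
import numpy as np

# Hypothetical stand-in for a few MNIST labels (digits 0-9).
labels = np.array([5, 0, 4, 1, 9])

# Row i of the 10 x 10 identity matrix is the one-hot vector for digit i,
# so indexing the identity matrix with the labels encodes all of them at once.
num_classes = 10
onehot = np.eye(num_classes)[labels]

print(onehot.shape)  # (5, 10)
print(onehot[0])     # one-hot vector for the digit 5
```

Each row contains a single 1 at the position of the digit it encodes, which is the target format the softmax output layer is trained against.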

We also have to reshape the input X. In the raw data it is a stack of 28 x 28 pixel matrices, which we flatten into a stack of vectors with 784 entries each.

In [16]:
X_train = X_train.reshape(X_train.shape[0],X_train.shape[1] * X_train.shape[2])
X_test = X_test.reshape(X_test.shape[0],X_test.shape[1] * X_test.shape[2])
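
The effect of this reshape can be checked on a small dummy array. The sketch below uses a hypothetical `images` array of six fake 28 x 28 images in place of the real data, and passes `-1` so that NumPy infers the flattened length itself.

```python
import numpy as np

# Hypothetical stand-in for the raw MNIST images: 6 images of 28 x 28 pixels.
images = np.arange(6 * 28 * 28).reshape(6, 28, 28)

# Keep the first axis (one row per image) and flatten the remaining two.
# Passing -1 lets NumPy infer 28 * 28 = 784 on its own.
flat = images.reshape(images.shape[0], -1)

print(images.shape)  # (6, 28, 28)
print(flat.shape)    # (6, 784)

# The rows of each image are laid end to end: pixel (row 1, column 0) of an
# image lands at position 28 of its flattened vector.
assert flat[2, 28] == images[2, 1, 0]
```

No pixel values change in this step; only the layout does, so that each image matches the `input_dim=784` expected by the first dense layer.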

Now it is time to build our model! We initialize the model building process like this:

In [6]:
model = Sequential()

Adding layers can now be done with a simple ```.add()```:

In [10]:
# For the first layer we have to specify the input dimensions
model.add(Dense(units=320, input_dim=784, activation='tanh'))

model.add(Dense(units=160, activation='tanh'))

model.add(Dense(units=10, activation='softmax'))

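
To connect this model to last week's NumPy implementation, the following sketch reproduces what the three stacked dense layers compute in a forward pass and counts their parameters. The random weight initialisation, the helper names `softmax` and `forward`, and the fake input batch are assumptions made for illustration; Keras uses its own initialisers and internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Layer sizes taken from the model above: 784 -> 320 -> 160 -> 10.
sizes = [784, 320, 160, 10]

# Random weights and zero biases; this initialisation is an assumption
# and not necessarily what Keras uses.
weights = [rng.normal(0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    # Two tanh layers followed by a softmax output layer.
    a = np.tanh(x @ weights[0] + biases[0])
    a = np.tanh(a @ weights[1] + biases[1])
    return softmax(a @ weights[2] + biases[2])

x = rng.random((4, 784))  # a batch of 4 fake flattened images
probs = forward(x)
print(probs.shape)        # (4, 10)

# Each dense layer has n_in * n_out weights plus n_out biases.
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(n_params)           # 304170
```

Calling `model.summary()` on the Keras model should report the same total number of parameters, which is a quick way to check that the layers were stacked as intended.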

Now we have to compile the model, turning it into a [static graph TensorFlow can execute](https://stackoverflow.com/questions/46154189/what-is-the-difference-of-static-computational-graphs-in-tensorflow-and-dynamic). In the compile statement we also get to specify things like the learning rate or whether we want to use some more advanced optimizer. If we do not specify a learning rate, Keras will choose a default value for us. We can also specify which metrics we want to track.

In [11]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
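
As a rough illustration of what the `loss` and `optimizer` arguments stand for, the sketch below computes the categorical cross-entropy for two made-up predictions and applies one plain gradient descent step. The function names, the example numbers, and the learning rate of 0.01 are assumptions chosen for illustration, not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so that log(0) never occurs.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # For one-hot targets this picks out -log(p) of the correct class,
    # then averages over the batch.
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Two made-up examples with three classes each.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])

loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # roughly 0.29

def sgd_step(w, grad, learning_rate=0.01):
    # Plain stochastic gradient descent: move against the gradient.
    return w - learning_rate * grad

w = np.array([0.5, -0.3])
grad = np.array([1.0, -2.0])
print(sgd_step(w, grad))
```

The loss shrinks as the probability assigned to the correct class approaches 1, and the learning rate controls how far each update moves the weights.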

Now there is only the training left to be done.

In [12]:
# X_train and y_train are NumPy arrays, just like in the scikit-learn API.
model.fit(X_train, y_train, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1257ac550>
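
With `batch_size=32` and `epochs=10`, the training data is processed in mini-batches, with one weight update per batch. A quick back-of-the-envelope sketch, assuming the 60,000 training images that the Keras MNIST loader provides, shows how many updates that amounts to:

```python
import math

n_samples = 60000  # size of the MNIST training set
batch_size = 32
epochs = 10

# One weight update per mini-batch; the last batch may be smaller.
updates_per_epoch = math.ceil(n_samples / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch)  # 1875
print(total_updates)      # 18750
```

A larger batch size means fewer but less noisy updates per epoch, while a smaller one means more frequent updates.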

You will probably have noticed that this runs quite a bit faster than when we implemented our own neural network in NumPy. That is because TensorFlow, which handles all the math operations, is optimized for exactly these kinds of operations. Another advantage is that TensorFlow can run on a graphics processing unit (GPU). GPUs were originally invented to render computer game graphics, but it turned out that their architecture is ideal for deep learning. Much of deep learning's recent progress is owed to the fact that powerful GPUs, and tools to use them for things other than graphics, came on the market.

To conclude this chapter we are going to evaluate our model with Keras's `evaluate` function. It returns a list of all metrics that are being tracked, in our case the loss followed by the accuracy.

In [18]:
model.evaluate(x=X_test,y=y_test)



[0.31037911518812178, 0.90590000000000004]
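
The second number is the accuracy, the fraction of test images whose most probable predicted class matches the true label. The sketch below shows that calculation on made-up predictions. In the real setting the probabilities would come from something like `model.predict(X_test)`, and the arrays here are hypothetical.

```python
import numpy as np

# Made-up predicted probabilities for 4 examples and 3 classes.
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])
# Matching one-hot labels; the last example is misclassified.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

# The predicted class is the one with the highest probability.
predicted = y_pred.argmax(axis=1)
actual = y_true.argmax(axis=1)

accuracy = np.mean(predicted == actual)
print(accuracy)  # 0.75
```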

## Summary
This concludes our brief introduction to Keras. To get more used to its sequential model, try implementing a different model for MNIST. Good luck!