# Advanced Theano in Keras

To illustrate how Keras lets us build advanced Theano models concisely, let's build both a fully-connected and a deep convolutional neural network that classify handwritten MNIST digits into their ten classes (0-9).

Let's first start by building a fully-connected network:

In [1]:
from keras.models import Model
from keras.layers import Input, Flatten, Dense

# define model architecture
input_ = Input(shape=(28, 28))
flat_input = Flatten()(input_)
hidden = Dense(100, activation='relu')(flat_input)
output = Dense(10, activation='softmax')(hidden)

# build model and compile
model = Model(input=[input_], output=[output])
model.compile(
    optimizer='sgd',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

Using Theano backend.


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_1 (InputLayer)             (None, 28, 28)        0                                            
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 784)           0           input_1[0][0]                    
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 100)           78500       flatten_1[0][0]                  
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 10)            1010        dense_1[0][0]                    
Total params: 79510
____________________________________________________________________________________________________
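The parameter counts in this summary can be checked by hand: a `Dense` layer stores one weight per (input, unit) pair plus one bias per unit. The helper `dense_params` below is our own illustration, not part of Keras:

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully-connected layer: weight matrix plus bias vector."""
    return n_inputs * n_units + n_units


# Flatten turns each 28x28 image into a 784-dimensional vector
flat_dim = 28 * 28
hidden_params = dense_params(flat_dim, 100)  # dense_1
output_params = dense_params(100, 10)        # dense_2

print(hidden_params, output_params, hidden_params + output_params)
# -> 78500 1010 79510
```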

Now let's load the data and preprocess it.

In [2]:
import numpy as np
from keras.datasets import mnist
from keras.utils.np_utils import to_categorical


def preprocess_data(x, y):
    # convert pixels to float32 in [0, 1] and labels to one-hot vectors
    x = np.asarray(x, dtype='float32') / 255.
    y = to_categorical(y, nb_classes=10)

    return x, y


# load the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# preprocess the data
x_train, y_train = preprocess_data(x_train, y_train)
x_test, y_test = preprocess_data(x_test, y_test)
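For reference, here is a NumPy-only sketch of the same two preprocessing steps, assuming integer labels in 0-9. The helper names `one_hot` and `scale_pixels` are our own; indexing rows of an identity matrix with `np.eye(...)[labels]` yields the same one-hot layout that `to_categorical` produces, and dividing by 255 (the largest 8-bit pixel value) maps pixels exactly into [0, 1].

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Return a float32 one-hot matrix of shape (len(labels), num_classes)."""
    labels = np.asarray(labels, dtype='int64')
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype='float32')[labels]


def scale_pixels(images):
    """Convert uint8 pixel values (0-255) to float32 values in [0, 1]."""
    return np.asarray(images, dtype='float32') / 255.


# Tiny made-up example: two 2x2 "images" and their labels
images = np.array([[[0, 255], [128, 64]],
                   [[255, 255], [0, 0]]], dtype='uint8')
labels = [3, 7]

x = scale_pixels(images)
y = one_hot(labels)
print(x.min(), x.max(), y.shape, y[0].argmax())
# -> 0.0 1.0 (2, 10) 3
```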

Next, we fit the model to the training data, holding out 30% of it for validation (`validation_split=0.3`).

In [3]:
# fit model to the data
hist = model.fit(
    x=x_train, y=y_train, batch_size=32,
    nb_epoch=10, verbose=1, validation_split=0.3
)

Train on 42000 samples, validate on 18000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
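The `Train on 42000 samples, validate on 18000 samples` line follows from `validation_split=0.3`: Keras holds out the last fraction of the samples in `x` and `y`, before any shuffling, as validation data. The hypothetical helper `split_indices` below only mimics that arithmetic; it is an illustration, not Keras's own code.

```python
def split_indices(n_samples, validation_split):
    """Mimic validation_split: the first (1 - validation_split) fraction
    of samples is used for training, the remaining tail for validation."""
    split_at = int(n_samples * (1. - validation_split))
    train_idx = range(0, split_at)
    val_idx = range(split_at, n_samples)
    return train_idx, val_idx


train_idx, val_idx = split_indices(60000, 0.3)  # MNIST has 60000 training images
print(len(train_idx), len(val_idx))
# -> 42000 18000
```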


In [4]:
import json

# save the training history (per-epoch loss and accuracy) as JSON;
# the with-block closes the file automatically
with open('model_history.txt', 'w') as save_file:
    json.dump(hist.history, save_file)

# save model architecture, weights, and config
model.save('my_model.h5')
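Because the history is stored as plain JSON, it can be reloaded later without Keras. The sketch below (helper names are our own) reads the file and finds the best epoch for a given metric. It assumes the Keras 1 key names such as `'val_acc'` and `'val_loss'` that `metrics=['accuracy']` produces; the saved model itself can be restored separately with `keras.models.load_model('my_model.h5')`.

```python
import json


def load_history(path):
    """Read a Keras history dict that was saved with json.dump."""
    with open(path) as f:
        return json.load(f)


def best_epoch(history, key='val_acc', larger_is_better=True):
    """Return (1-based epoch, value) of the best entry in history[key]."""
    values = history[key]
    pick = max if larger_is_better else min
    best = pick(range(len(values)), key=lambda i: values[i])
    return best + 1, values[best]


# Possible usage after the training cell has written model_history.txt:
# history = load_history('model_history.txt')
# epoch, acc = best_epoch(history)
# print('best validation accuracy %.4f at epoch %d' % (acc, epoch))
```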

Even with this very simple model, we get reasonable results on MNIST classification.

As a comparison and to introduce some more complex layers, we can also build a deep convolutional neural network with relative ease. 

In [5]:
from keras.layers import Convolution2D, MaxPooling2D, BatchNormalization, Activation, GlobalAveragePooling2D

# define model architecture
input_ = Input(shape=(1, 28, 28))

net = Convolution2D(32, 5, 5, activation='linear')(input_)
net = BatchNormalization(mode=0, axis=1)(net)
net = Activation('relu')(net)
net = MaxPooling2D(pool_size=(2, 2))(net)

net = Convolution2D(64, 3, 3, activation='linear')(net)
net = BatchNormalization(mode=0, axis=1)(net)
net = Activation('relu')(net)
net = MaxPooling2D(pool_size=(2, 2))(net)

net = Convolution2D(10, 1, 1, activation='linear')(net)
net = BatchNormalization(mode=0, axis=1)(net)  # axis=1 is the channel axis in channels-first data
net = Activation('relu')(net)

net = GlobalAveragePooling2D()(net)
output = Activation('softmax')(net)


# build model and compile
model = Model(input=[input_], output=[output])
model.compile(
    optimizer='nadam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_2 (InputLayer)             (None, 1, 28, 28)     0                                            
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D)  (None, 32, 24, 24)    832         input_2[0][0]                    
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 32, 24, 24)    64          convolution2d_1[0][0]            
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 32, 24, 24)    0           batchnormalization_1[0][0]       
___________________________________________________________________________________________
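The shapes and parameter counts of the convolutional model can be checked by hand. With the default `'valid'` border mode and stride 1, a k×k convolution shrinks each spatial side by k - 1, and a 2×2 max-pool halves it (rounding down). A convolution layer has k·k·(input channels) weights per filter plus one bias per filter. The helpers below are our own, written only to trace these numbers; the first value, 832, matches `convolution2d_1` in the summary, and the others follow from the same formula.

```python
def conv_output_side(side, kernel):
    """Spatial size after a 'valid' (no padding), stride-1 convolution."""
    return side - kernel + 1


def pool_output_side(side, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return side // pool


def conv_params(in_channels, n_filters, kernel):
    """kernel * kernel * in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * n_filters + n_filters


side = 28
side = conv_output_side(side, 5)   # 24
side = pool_output_side(side)      # 12
side = conv_output_side(side, 3)   # 10
side = pool_output_side(side)      # 5
side = conv_output_side(side, 1)   # 5, the 1x1 convolution keeps the size

params = [conv_params(1, 32, 5), conv_params(32, 64, 3), conv_params(64, 10, 1)]
print(side, params)
# -> 5 [832, 18496, 650]
```

`GlobalAveragePooling2D` then averages each of the 10 remaining 5×5 feature maps to a single number, giving one score per digit class for the final softmax.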

Now let's fit this model to the data and compare its performance to that of the fully-connected model.
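The convolutional model was declared with channels-first input `(1, 28, 28)`, and the summary's `(None, 32, 24, 24)` shape confirms that the channel axis comes first. The MNIST arrays are `(samples, 28, 28)`, so a singleton channel axis has to be inserted at position 1. A NumPy-only illustration on a small made-up array:

```python
import numpy as np

# Made-up stand-in for x_train: 3 grayscale 28x28 images
images = np.zeros((3, 28, 28), dtype='float32')

# Channels-first layout (samples, channels, rows, cols), as used by the
# convolutional model above
channels_first = np.expand_dims(images, axis=1)

# Channels-last layout (samples, rows, cols, channels), which a model
# configured with TensorFlow-style ordering would expect instead
channels_last = np.expand_dims(images, axis=-1)

print(channels_first.shape, channels_last.shape)
# -> (3, 1, 28, 28) (3, 28, 28, 1)
```

The same layout is why the `BatchNormalization` layers should use `axis=1`: they normalize per channel, and in this layout the channel axis is 1.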

In [6]:
# expand channel dimension of data
x_train = np.expand_dims(x_train, axis=1)
x_test = np.expand_dims(x_test, axis=1)

# fit model to the data
model.fit(
    x=x_train, y=y_train, batch_size=32,
    nb_epoch=5, verbose=1, validation_split=0.3
)

Train on 42000 samples, validate on 18000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1122cd8d0>
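To actually compare the two models, one option is to score both on the held-out test set. Keras's `model.evaluate(x_test, y_test)` returns the loss and compiled metrics; the sketch below instead computes accuracy directly from predicted class probabilities (for example, the output of `model.predict(x_test)`), using the same argmax comparison that the `'accuracy'` metric applies to one-hot targets. The helper name `categorical_accuracy` is our own, and the commented usage assumes the fully-connected model is reloaded from `my_model.h5`, since `model` now refers to the convolutional network.

```python
import numpy as np


def categorical_accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class matches the one-hot label."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))


# Made-up probabilities for 4 samples over 3 classes (not real model output)
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.5, 0.3, 0.2],   # wrong: predicts class 0, label is 2
                   [0.1, 0.6, 0.3]])
print(categorical_accuracy(y_true, y_prob))
# -> 0.75

# Possible usage with the models from this notebook (requires Keras):
# from keras.models import load_model
# mlp = load_model('my_model.h5')  # the fully-connected model saved earlier
# mlp_acc = categorical_accuracy(y_test, mlp.predict(x_test[:, 0]))  # drop channel axis
# cnn_acc = categorical_accuracy(y_test, model.predict(x_test))
```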