In [1]:
from keras.models import Sequential
from keras.layers import Dense, Activation


Using TensorFlow backend.


In [2]:
#Sequential is the most basic type of model - a linear stack of layers
model = Sequential()

#Layers are added to the model with .add()
model.add(Dense(units=64, input_dim=100))
model.add(Activation('relu'))
model.add(Dense(units=10))
model.add(Activation('softmax'))

Let's take a second to review what those layers are doing. This comes from https://keras.io/layers/core/:


"Dense" means it's a densely-connected NN layer.  It implements the following operation:
output = activation(dot(input, kernel) + bias)
Where 'activation' is the element-wise activation function, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (if use_bias is True).

'units' sets the dimensionality of the layer's output, so in the above example the first layer outputs an array of shape (\*, 64).
'input_dim' tells the model how many features each input sample has, so in this case the input has shape (\*, 100).  Note that input_dim only covers this 2D case of (batch, features); for inputs with more dimensions you use input_shape.

You don't have to tell the model the input dimensions for future layers - it knows that based on the output dimensions of previous layers.

Also, instead of 'units=64', you can just do Dense(64)
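
To make the Dense formula concrete, here is a minimal NumPy sketch of the same operation. It is not how Keras implements the layer: dense_forward is a made-up helper and the weights are random, so only the shapes are meaningful.

```python
import numpy as np

def dense_forward(inputs, kernel, bias, activation=None):
    """Hypothetical helper: output = activation(dot(input, kernel) + bias)."""
    output = np.dot(inputs, kernel) + bias
    return activation(output) if activation is not None else output

rng = np.random.RandomState(0)
batch = rng.random_sample((5, 100))    # 5 samples, input_dim=100
kernel = rng.random_sample((100, 64))  # weights matrix: (input_dim, units)
bias = np.zeros(64)                    # one bias value per unit

out = dense_forward(batch, kernel, bias)
print(out.shape)  # (5, 64)
```

The leading dimension (5 here) is the batch size, which is the \* in the shapes above.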


'Activation' applies an activation function to the output of the previous layer, using the corresponding TensorFlow or Theano operation.  This is the mathematical function applied to each neuron's output.  There are a bunch of options for which function you want to use; the above uses 'relu' (Rectified Linear Unit, which is basically just the array result of [max(0, x) for x in array]) after the first layer and then 'softmax' after the second layer, which is much harder to type out on here.  Details on softmax here: https://en.wikipedia.org/wiki/Softmax_function
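
As a rough sketch of what these two activations compute, here they are in plain NumPy. These are my own minimal versions for illustration, not the functions Keras runs on the backend.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negatives become 0, the shape is unchanged
    return np.maximum(0, x)

def softmax(x):
    # Subtract the row max for numerical stability, then normalise each row
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

scores = np.array([[-2.0, 0.0, 3.0]])
print(relu(scores))     # the negative entry is clipped to 0
print(softmax(scores))  # one row of probabilities that sums to 1
```

Note that relu keeps every element (it does not filter any out), and softmax turns each row into values between 0 and 1 that sum to 1, which is why it sits on the output layer of a classifier.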

In [3]:
#Configures the learning process
model.compile(loss='categorical_crossentropy',
             optimizer='sgd',
             metrics=['accuracy'])

The Keras documentation notes you can further configure the optimizer with something like the following (this needs the top-level package, i.e. import keras):

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True))

In [10]:
'''We don't have training data yet, so this cell will error at this point'''

# x_train, y_train, x_batch, y_batch, x_test and y_test are NumPy arrays

# Training on batches can be done multiple ways - letting fit() iterate:
model.fit(x_train, y_train, epochs=5, batch_size=32)

# Or manually, one batch at a time
model.train_on_batch(x_batch, y_batch)

#Performance evaluation
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)

#Generate predictions on new data
classes = model.predict(x_test, batch_size=128)
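
To picture what batch_size=32 means, here is a rough sketch of slicing arrays into consecutive batches. iterate_minibatches is a hypothetical helper of mine, not a Keras function; the real fit() also shuffles the samples by default and runs a gradient update on each batch.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size):
    """Hypothetical helper: yield consecutive (x_batch, y_batch) slices."""
    for start in range(0, len(x), batch_size):
        end = start + batch_size
        yield x[start:end], y[start:end]

x_demo = np.random.random((100, 100))
y_demo = np.random.randint(2, size=(100, 1))

batch_sizes = [len(xb) for xb, yb in iterate_minibatches(x_demo, y_demo, 32)]
print(batch_sizes)  # [32, 32, 32, 4]
```

The last batch is smaller whenever the number of samples is not a multiple of the batch size.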

You can also simply pass the layers to Sequential() as a list:

In [4]:
model = Sequential([
    Dense(32, input_dim=784),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

In [5]:
#Now we have to compile, which takes three arguments:

# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])

So the above just has several examples of compile() calls - let's look at the parameters it takes:


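optimizer - The algorithm that updates the weights from the gradients of the loss (some optimizers, such as rmsprop, also adapt the learning rate as they go).  You can specify it by name as we did above, or create an optimizer instance to change the library defaults (the full list of available optimizers by name is at: https://keras.io/optimizers/).  Use something like the following if modifying from the library: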

In [6]:
from keras import optimizers

sgd = optimizers.SGD(lr=0.01, clipvalue=0.5)
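
For intuition about what lr and clipvalue do, here is a sketch of one plain SGD step in NumPy. sgd_step is a made-up helper, and it leaves out momentum, decay and everything else the Keras optimizer handles.

```python
import numpy as np

def sgd_step(weights, grads, lr=0.01, clipvalue=None):
    """Hypothetical helper: one plain SGD update, w <- w - lr * grad."""
    if clipvalue is not None:
        # Clip each gradient element into [-clipvalue, clipvalue]
        grads = np.clip(grads, -clipvalue, clipvalue)
    return weights - lr * grads

w = np.array([1.0, -1.0])
g = np.array([2.0, -0.1])  # the first gradient exceeds clipvalue and gets clipped
w_new = sgd_step(w, g, lr=0.01, clipvalue=0.5)
print(w_new)  # approximately [0.995, -0.999]
```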

loss - just your loss function, which can be given by name or pulled from the Keras losses module, similar to above.  The available loss functions are at https://keras.io/losses/.  To pull from the module, use something like the following:

In [7]:
from keras import losses

model.compile(loss=losses.mean_squared_error, optimizer='sgd')
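
As a sanity check on what two of these losses measure, here are minimal NumPy versions. They are sketches for a single batch and assume y_pred already holds probabilities; the eps clipping value is my own choice, not taken from the Keras source.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average squared difference per sample
    return np.mean((y_true - y_pred) ** 2, axis=-1)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); y_true is one-hot, y_pred holds probabilities
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0.0, 1.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1]])
print(mean_squared_error(y_true, y_pred))        # about 0.02
print(categorical_crossentropy(y_true, y_pred))  # -log(0.8), about 0.223
```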

metrics - how you score results.  For classification problems this should be set to ['accuracy'].
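
For one-hot labels, accuracy boils down to comparing the argmax of each prediction with the argmax of each label. A minimal sketch, where categorical_accuracy is my own helper and the numbers are made up:

```python
import numpy as np

def categorical_accuracy(y_true, y_pred):
    """Fraction of rows where the predicted class matches the true class."""
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))

y_true = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]])
y_pred = np.array([[0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3],   # wrong: predicts class 1, truth is class 0
                   [0.1, 0.2, 0.7],
                   [0.3, 0.4, 0.3]])
acc = categorical_accuracy(y_true, y_pred)
print(acc)  # 0.75
```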

### Training

In [8]:
#2 Class classification, single input model

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f020c370490>

Interesting - the first epoch took a long time, while the others were almost instant.  I'm going to run the exact same thing again just to see what the performance is like.

In [9]:
#2 Class classification, single input model

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f02081aaad0>

That's what I thought: TensorFlow had to initialize my GPU on the first run, which increased the duration, but the second run took almost no time.  So there's a small one-time startup cost that disappears after the first run. Cool.

Now let's try it on a problem with more classes.

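Note that so far we have only imported sub-modules of Keras, so to use keras.utils we have to add an import line to what's on their site.  Also, I increased the number of epochs from 10 to 400 and we went from 15.5% accuracy to 98.6% accuracy.  This is a little confounding given that we're dealing with random numbers, but the accuracy reported here is measured on the training data itself: with enough epochs the network can simply memorize the 1000 random samples (overfitting).  On fresh random data it should do no better than chance, which is about 10% for 10 classes.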

In [13]:
#What we had to add:
import keras

# For a single-input model with 10 classes (categorical classification):

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))

# Convert labels to categorical one-hot encoding
binary_labels = keras.utils.to_categorical(labels, num_classes=10)

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, binary_labels, epochs=400, batch_size=32)

Epoch 1/400
...
Epoch 78

<keras.callbacks.History at 0x7f020812fa90>
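
The one-hot conversion step in the cell above can be pictured with a few lines of NumPy. one_hot here is a hypothetical stand-in I wrote for illustration, not the actual to_categorical code.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical stand-in for to_categorical: integer labels -> one-hot rows."""
    labels = np.asarray(labels).ravel()
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0  # set one 1 per row
    return encoded

demo_labels = np.array([[3], [0], [9]])
encoded = one_hot(demo_labels, num_classes=10)
print(encoded.shape)  # (3, 10)
print(encoded[0])     # a 1 at index 3, zeros elsewhere
```

Each row then has the same shape as the 10-unit softmax output, which is what categorical_crossentropy expects.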

### Stacked LSTM

The following stacks 3 LSTM layers, the first two of which retain the temporal dimension while the final discards it on output, resulting in a single vector.

In [19]:
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
import time

data_dim = 16
timesteps = 8
num_classes = 10

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, num_classes))

# Generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, num_classes))

start = time.time()

model.fit(x_train, y_train,
          batch_size=64, epochs=5,
          validation_data=(x_val, y_val))

stop = time.time()
print("Completed in %s seconds" % (stop - start))  # parenthesized form works on Python 2 and 3

Train on 1000 samples, validate on 100 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Completed in 6.15553593636 seconds
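
To see what return_sequences changes, here is a shape-only sketch. It uses a toy tanh RNN rather than a real LSTM (no gates, random weights), and simple_rnn is a made-up helper, so treat it as an illustration of the tensor shapes and nothing more.

```python
import numpy as np

def simple_rnn(x, units, return_sequences=False, seed=0):
    """Toy tanh RNN (not an LSTM), only to illustrate output shapes."""
    rng = np.random.RandomState(seed)
    batch, timesteps, data_dim = x.shape
    w_in = rng.randn(data_dim, units) * 0.1
    w_rec = rng.randn(units, units) * 0.1
    h = np.zeros((batch, units))
    outputs = []
    for t in range(timesteps):
        h = np.tanh(np.dot(x[:, t, :], w_in) + np.dot(h, w_rec))
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return h                              # (batch, units): last step only

x = np.random.random((4, 8, 16))                          # (batch, timesteps, data_dim)
seq = simple_rnn(x, 32, return_sequences=True)            # keeps the time axis
seq = simple_rnn(seq, 32, return_sequences=True, seed=1)  # keeps the time axis
last = simple_rnn(seq, 32, seed=2)                        # drops the time axis
print(seq.shape)   # (4, 8, 32)
print(last.shape)  # (4, 32)
```

The first two layers need return_sequences=True because the next recurrent layer expects a 3D input; the last one hands a single vector per sample to the Dense layer.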


Ok, one more - a stateful LSTM model, where the internal states (not the weights) computed for one batch are kept and reused as the initial states for the next batch.  Per the Keras docs, this lets you process longer sequences while keeping computational complexity manageable.

In [18]:
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
import time  # used for the timing at the end of this cell

data_dim = 16
timesteps = 8
num_classes = 10
batch_size = 32

# Expected input batch shape: (batch_size, timesteps, data_dim)
# Note that we have to provide the full batch_input_shape since the network is stateful.
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((batch_size * 10, timesteps, data_dim))
y_train = np.random.random((batch_size * 10, num_classes))

# Generate dummy validation data
x_val = np.random.random((batch_size * 3, timesteps, data_dim))
y_val = np.random.random((batch_size * 3, num_classes))

start = time.time()

model.fit(x_train, y_train,
          batch_size=batch_size, epochs=5, shuffle=False,
          validation_data=(x_val, y_val))

stop = time.time()
print("Completed in %s seconds" % (stop - start))  # parenthesized form works on Python 2 and 3

Train on 320 samples, validate on 96 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Completed in 4.40588402748 seconds
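
The difference between stateless and stateful processing can be sketched with the same kind of toy tanh RNN (again not an LSTM, with random weights and a made-up rnn_batch helper). The only thing that changes is where the initial state of each batch comes from.

```python
import numpy as np

def rnn_batch(x, w_in, w_rec, initial_state):
    """Toy tanh RNN loop; returns the final hidden state for the batch."""
    h = initial_state
    for t in range(x.shape[1]):
        h = np.tanh(np.dot(x[:, t, :], w_in) + np.dot(h, w_rec))
    return h

rng = np.random.RandomState(0)
batch_size, timesteps, data_dim, units = 32, 8, 16, 32
w_in = rng.randn(data_dim, units) * 0.1
w_rec = rng.randn(units, units) * 0.2
batches = [rng.random_sample((batch_size, timesteps, data_dim)) for _ in range(3)]

# Stateless: every batch starts from a zero state
zeros = np.zeros((batch_size, units))
stateless = [rnn_batch(b, w_in, w_rec, zeros) for b in batches]

# Stateful: row i of batch k starts from the final state of row i of batch k-1
state = np.zeros((batch_size, units))
stateful = []
for b in batches:
    state = rnn_batch(b, w_in, w_rec, state)
    stateful.append(state)

print(np.allclose(stateless[0], stateful[0]))  # first batch: both start from zeros
print(np.allclose(stateless[1], stateful[1]))  # later batches see the carried state
```

This is also why the stateful cell above fixes the batch size and passes shuffle=False: the state for row i only makes sense if row i of the next batch really is the continuation of the same sequence.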


That's it for the basic examples - let's jump into the advanced examples, which start in the next IPython notebook.