Here we work with the IMDB dataset that ships with Keras: 50,000 movie reviews from the Internet Movie Database, split into 25,000 reviews for training and 25,000 for testing.

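In the imdb.load_data() call, we pass num_words=10000 to keep only the 10,000 most frequent words in the training data. Rarer words are replaced by an out-of-vocabulary placeholder, so no word index exceeds 9,999 and the input vectors stay a manageable size.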
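
As a rough illustration of what the cap does, the sketch below mimics it on made-up sequences. It assumes, as with the Keras IMDB loader's defaults, that word indices are ranked by frequency and that index 2 stands for an out-of-vocabulary word. cap_vocabulary and toy_reviews are hypothetical names for this example, not part of Keras.

```python
def cap_vocabulary(sequences, num_words, oov_index=2):
    """Replace every word index >= num_words with oov_index (hypothetical helper)."""
    return [[w if w < num_words else oov_index for w in seq] for seq in sequences]

# Made-up reviews; 10000 and 48213 fall outside a 10,000-word vocabulary
toy_reviews = [[1, 14, 22, 9999, 10000], [1, 530, 48213, 7]]

capped = cap_vocabulary(toy_reviews, num_words=10000)
largest_index = max(max(seq) for seq in capped)

print(capped)         # [[1, 14, 22, 9999, 2], [1, 530, 2, 7]]
print(largest_index)  # 9999
```

Because the largest surviving index is 9,999, a 10,000-dimensional vector is enough to represent any review.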

Next, we use the vectorize_sequences function to turn each review (a list of integers) into a fixed-size tensor. Each review becomes a 10,000-dimensional vector initialized to all zeros; for every word index that appears in the review, the corresponding position is set to 1. Strictly speaking this is a multi-hot encoding: it records which words occur, but not their order or how often they occur.
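
To make the encoding concrete, here is a small sketch that applies the same function to two made-up sequences with dimension=8 instead of 10,000. The toy input is an assumption chosen only so the result is easy to read.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # One row per sequence, all zeros to start with
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Set every position listed in the sequence to 1
        results[i, sequence] = 1
    return results

# Made-up sequences; index 3 appears twice in the second one
toy = [[3, 5], [0, 3, 3, 7]]
encoded = vectorize_sequences(toy, dimension=8)

print(encoded)
# [[0. 0. 0. 1. 0. 1. 0. 0.]
#  [1. 0. 0. 1. 0. 0. 0. 1.]]
```

Note that the repeated index 3 still yields a single 1, and that the order of the words is lost.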

In [2]:
from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

## Encoding the integer sequences into a binary matrix

import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

# Also vectorizing the labels:
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')

At this point, our data is ready to be fed into a neural network...

In [3]:
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

To monitor, during training, how the model performs on data it has not seen, we set aside 10,000 samples from the training data as a validation set.

We then train the model for 20 epochs (20 passes over all samples in the partial_x_train and partial_y_train tensors), in mini-batches of 512 samples.

At the end of each epoch, Keras also computes the loss and accuracy on the 10,000 samples that we set aside. This is what the validation_data=(x_val, y_val) argument in the model.fit() call below specifies.
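
To get a feel for these numbers, the short calculation below works out how many weight updates the settings above imply. It assumes the 25,000 training reviews minus the 10,000 held out for validation, which matches the 15,000 training samples reported in the log.

```python
import math

num_train = 25000 - 10000   # samples left after holding out the validation set
batch_size = 512
epochs = 20

# The last batch of an epoch is smaller when batch_size does not divide num_train
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch = num_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)  # 30 152 600
```

So each epoch consists of 29 full batches plus one batch of 152 samples, for 600 gradient updates in total.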

In [None]:
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))

results = model.evaluate(x_test, y_test)

import matplotlib.pyplot as plt

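# Older Keras releases log accuracy as 'acc'; newer ones use 'accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
acc = history.history[acc_key]
val_acc = history.history['val_' + acc_key]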
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) +1)

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.clf() # clear figure

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

Train on 15000 samples, validate on 10000 samples

Next, we build a network that classifies Reuters newswires into 46 mutually exclusive topics. Because each newswire belongs to exactly one topic, this is known as a single-label, multiclass classification problem. Each of the 46 topics has at least 10 examples in the training set.

To vectorize the data, we can use the same method as in the previous example.

In [None]:
from keras.datasets import reuters

(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

To vectorize the labels, we use one-hot encoding: each label is represented as an all-zero vector with a 1 at the position of the label index.

In [None]:
# keras.utils.np_utils is not available in newer Keras releases; import from keras.utils
from keras.utils import to_categorical

one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)
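
For intuition, the effect of to_categorical on integer labels can be approximated by hand in NumPy. This is a minimal sketch with made-up labels; to_one_hot and toy_labels are names chosen for the example, and the dimension of 46 is passed explicitly rather than inferred from the data.

```python
import numpy as np

def to_one_hot(labels, dimension=46):
    # One row per label, all zeros to start with
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        # Mark the position of the label index
        results[i, label] = 1.0
    return results

# Made-up topic labels in the range 0..45
toy_labels = np.array([3, 0, 45])
one_hot = to_one_hot(toy_labels)

print(one_hot.shape)                    # (3, 46)
print(one_hot.argmax(axis=1).tolist())  # [3, 0, 45]
```

Taking the argmax of each row recovers the original integer label, so no information is lost by the encoding.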

Next, we build the model. Because we now have to separate 46 classes, the intermediate layers need to be wider: a 16-unit layer could act as an information bottleneck and drop information that later layers cannot recover. We use 64 units instead. The network ends with a 46-unit softmax layer, so for each input it outputs a probability distribution over the 46 topics.
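
To illustrate what a 46-way softmax output looks like, here is a minimal NumPy sketch. The scores are random numbers standing in for the activations of the last layer for one newswire; they are not taken from the trained model.

```python
import numpy as np

def softmax(logits):
    # Subtracting the maximum keeps the exponentials numerically stable
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=46)   # made-up scores, one per topic
probs = softmax(logits)

print(probs.shape)                   # (46,)
print(np.isclose(probs.sum(), 1.0))  # True
predicted_topic = int(np.argmax(probs))
print(predicted_topic)               # index of the most probable topic
```

The 46 values are all positive and sum to 1, and the predicted topic is simply the index with the highest probability.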

After this, we compile the model as in the previous example, this time with categorical_crossentropy as the loss because the labels are one-hot vectors. The results variable holds the loss and accuracy that model.evaluate() returns for the test data.

In [1]:
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

x_val = x_train[:1000]
partial_x_train = x_train[1000:]

y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))

results = model.evaluate(x_test, one_hot_test_labels)

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()

plt.clf()
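# Older Keras releases log accuracy as 'acc'; newer ones use 'accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
acc = history.history[acc_key]
val_acc = history.history['val_' + acc_key]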

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()



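Note: this cell depends on x_train, x_test, one_hot_train_labels, one_hot_test_labels, and plt from the earlier cells. Run in a fresh kernel before those cells, it fails with NameError: name 'x_train' is not defined.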