# Reuters Multiclass Classification

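This notebook follows on from the IMDB classification notebook, where we classified reviews into two mutually exclusive classes. Here we'll explore how to classify inputs into more than two classes: each Reuters newswire belongs to exactly one of 46 topics, which makes this a single-label, multiclass classification problem.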

The Reuters dataset is a set of short newswires and their topics, published by Reuters in 1986. It is a simple, widely used toy dataset for text classification with 46 different topics. Some topics are more represented than others, but each topic has at least 10 examples in the training set. The dataset comes packaged with Keras.
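To check the class imbalance described above, one option is to count examples per topic with `collections.Counter`. This is a minimal sketch: the helper name `topic_counts` and the toy label list are hypothetical stand-ins, and on the real data you would pass `train_labels` from `reuters.load_data`.

```python
from collections import Counter

def topic_counts(labels):
    """Return (topic, count) pairs sorted from most to least frequent."""
    return Counter(int(label) for label in labels).most_common()

# Hypothetical toy labels standing in for train_labels
toy_labels = [3, 4, 3, 19, 3, 4, 1]
counts = topic_counts(toy_labels)
print(counts)                      # [(3, 3), (4, 2), (19, 1), (1, 1)]
print(min(c for _, c in counts))   # size of the rarest topic: 1
```

On the real training labels, the smallest count should be at least 10, matching the description above.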

---
## Importing Libraries & Loading Dataset

In [1]:
import copy
import numpy as np
from keras import models
from keras import layers
import matplotlib.pyplot as plt
from keras.datasets import reuters
from keras.utils import to_categorical  # keras.utils.np_utils was removed in later Keras releases

%matplotlib inline

Using TensorFlow backend.


---
## Setting Up The Data
Following the steps taken in the IMDB classification notebook, we will restrict the data to the 10,000 most frequently occurring words.

In [2]:
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
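The `num_words=10000` argument keeps only word indices below 10,000. Per the Keras documentation for `reuters.load_data`, words cut out this way are replaced by the out-of-vocabulary index (`oov_char`, 2 by default) rather than dropped. The plain-Python sketch below mimics that rule on a single sequence; `restrict_vocabulary` is a hypothetical helper, not part of Keras, and it ignores the `skip_top` option.

```python
def restrict_vocabulary(sequence, num_words=10000, oov_char=2):
    """Replace every word index >= num_words with the out-of-vocabulary index."""
    return [index if index < num_words else oov_char for index in sequence]

# Hypothetical sequence: 12345 falls outside the 10,000-word vocabulary
print(restrict_vocabulary([1, 245, 12345, 4329]))  # [1, 245, 2, 4329]
```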

In [3]:
len(train_data)

8982

In [4]:
len(test_data)

2246

In [5]:
train_data[10]

[1, 245, 273, 207, 156, 53, 74, 160, 26, 14, 46, 296, 26, 39, 74, 2979, 3554, 14, 46, 4689, 4329, 86, 61, 3499, 4795, 14, 61, 451, 4329, 17, 12]

## Decoding Newswires Back to Text


In [12]:
word_index = reuters.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}
# Indices are offset by 3 because the first 3 indices are reserved for 'padding', 'start of sequence', and 'unknown'
decoded_newswire = ' '.join([reverse_word_index.get(i-3, '?') for i in train_data[0]])
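The offset described in the comment above is easy to get wrong, so here is the same lookup as a self-contained function run on a tiny hypothetical vocabulary. The function name `decode` and `toy_word_index` are illustrative only; the real dictionary comes from `reuters.get_word_index()`.

```python
def decode(sequence, word_index, offset=3):
    """Turn a list of word indices back into text.

    Indices 0, 1 and 2 are reserved (padding, start of sequence, unknown),
    so dataset index i corresponds to word_index rank i - offset.
    """
    reverse_word_index = {rank: word for word, rank in word_index.items()}
    return ' '.join(reverse_word_index.get(i - offset, '?') for i in sequence)

# Hypothetical two-word vocabulary
toy_word_index = {'the': 1, 'dollar': 2}
print(decode([1, 4, 5, 2], toy_word_index))  # ? the dollar ?
```

Reserved indices (and any word missing from the vocabulary) come back as `?`, which is why decoded newswires typically start with a `?`.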

## Encoding The Data

In [14]:
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
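`vectorize_sequences` produces a multi-hot encoding: row `i` has a 1 at every word index that occurs in sequence `i` and 0 everywhere else. Because `results[i, sequence] = 1` only sets entries, a word that appears several times still contributes a single 1, so word counts and word order are discarded. The toy run below repeats the function and uses `dimension=6` purely for readability.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # One row per sequence, one column per word index
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Fancy indexing sets every listed column of row i to 1
        results[i, sequence] = 1
    return results

toy = vectorize_sequences([[1, 3], [0, 3, 3, 5]], dimension=6)
print(toy)
# [[0. 1. 0. 1. 0. 0.]
#  [1. 0. 0. 1. 0. 1.]]
```

Note that the repeated index `3` in the second sequence still yields a single 1.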

In [15]:
def to_one_hot(labels, dimension=46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results
one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)
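The loop in `to_one_hot` is equivalent to indexing an identity matrix with the label array, since row `k` of the identity matrix is exactly the one-hot vector for class `k`. Keras also provides this transformation as `to_categorical` (imported at the top). The sketch below checks the identity-matrix version against the loop on a few hypothetical labels; `to_one_hot_eye` is an illustrative name and only NumPy is assumed.

```python
import numpy as np

def to_one_hot(labels, dimension=46):
    # Loop version from the notebook
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results

def to_one_hot_eye(labels, dimension=46):
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(dimension)[np.asarray(labels)]

toy_labels = [3, 0, 45]
print(np.array_equal(to_one_hot(toy_labels), to_one_hot_eye(toy_labels)))  # True
```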

## Building The Network

In [16]:
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
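A quick sanity check on this architecture is to count its trainable weights: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The arithmetic sketch below (the helper `dense_params` is illustrative) gives the per-layer counts that `model.summary()` should report for this model.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [10000, 64, 64, 46]  # input dimension followed by each Dense layer's units
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [640064, 4160, 2990]
print(sum(per_layer))  # 647214
```

Almost all of the weights sit in the first layer, because each of the 10,000 input dimensions connects to every one of its 64 units.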


In [17]:
# Compiling The Model
model.compile(optimizer='rmsprop',
             loss='categorical_crossentropy',
             metrics=['accuracy'])
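The final `softmax` layer turns the 46 raw scores into a probability distribution, and `categorical_crossentropy` compares it with the one-hot target: for one sample it is `-sum(y_true * log(y_pred))`, which reduces to minus the log-probability assigned to the correct topic. The NumPy sketch below illustrates both on a hypothetical 3-class example. The small `eps` clip is added here to avoid `log(0)`; Keras guards against this in a similar way, but this is not its exact implementation.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability; each row then sums to 1
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over the batch of -sum_k y_true[k] * log(y_pred[k])
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Hypothetical 3-class example (the real model has 46 classes)
probs = softmax(np.array([[2.0, 1.0, 0.1]]))
target = np.array([[1.0, 0.0, 0.0]])
print(probs.sum())                              # 1.0, up to floating-point rounding
print(categorical_crossentropy(target, probs))  # equals -log(probs[0, 0])
```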

In [18]:
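# Setting aside a validation set of 1,000 samples
# (x_train only has 8,982 samples, so slicing at 10,000 would leave nothing to train on)
x_val = x_train[:1000]
partial_x_train = x_train[1000:]

y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]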
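The training set has only 8,982 samples, so a validation slice of 10,000 would swallow the whole set. NumPy slicing past the end of an array silently truncates instead of raising an error, which leaves an empty training slice. The sketch below demonstrates this on a zero-filled dummy array with the same number of rows as `x_train`; the array itself is a stand-in.

```python
import numpy as np

x_dummy = np.zeros((8982, 1))  # stand-in with the same number of rows as x_train

# Slicing past the end does not raise; it just truncates
print(len(x_dummy[:10000]), len(x_dummy[10000:]))  # 8982 0

# A 1,000-sample holdout leaves most of the data for training
print(len(x_dummy[:1000]), len(x_dummy[1000:]))    # 1000 7982
```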

In [21]:
# Pass the method itself: calling model.fit() with no data raises a ValueError
help(model.fit)

In [22]:
history = model.fit(partial_x_train,
                    partial_y_train,
                    batch_size=512,
                    epochs=20,
                    validation_data=(x_val, y_val))

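When the validation slice was 10,000 samples, this cell printed `Train on 0 samples, validate on 8982 samples` and then failed during epoch 1 with `UnboundLocalError: local variable 'batch_index' referenced before assignment`. All 8,982 training samples had ended up in `x_val`, leaving `partial_x_train` empty.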
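With `batch_size=512`, each epoch performs one gradient update per mini-batch, i.e. `ceil(n_samples / batch_size)` updates, where the last batch may be smaller than the rest. Assuming a 1,000-sample validation split (so 7,982 training samples), the arithmetic works out as follows; `steps_per_epoch` here is an illustrative helper, not the Keras argument of the same name.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batches (gradient updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)

n_train = 8982 - 1000  # assumes 1,000 samples held out for validation
print(steps_per_epoch(n_train, 512))       # 16
print(steps_per_epoch(n_train, 512) * 20)  # 320 updates over 20 epochs
```

Fifteen full batches cover 7,680 samples, so the sixteenth batch holds the remaining 302.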

---
## Plotting Model Loss And Accuracy

In [None]:
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'bo', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

In [None]:
plt.clf()

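# The history key depends on the Keras version: 'acc' in older releases, 'accuracy' in newer ones
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
acc = history.history[acc_key]
val_acc = history.history['val_' + acc_key]

plt.plot(epochs, acc, 'bo', label='Training Accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation Accuracy')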
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
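The model retrained in the next section stops at 9 epochs. A number like that is presumably read off the validation-loss curve: it is the epoch where validation loss is lowest, before it starts rising again as the network overfits. A small sketch of picking that epoch programmatically; `best_epoch` is an illustrative helper and the loss values are made up, not results from this notebook.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=lambda i: val_loss[i]) + 1

# Made-up validation losses, not results from this notebook
toy_val_loss = [1.9, 1.3, 1.1, 1.0, 0.95, 0.97, 1.02, 1.1]
print(best_epoch(toy_val_loss))  # 5
```

On a real run you would call `best_epoch(history.history['val_loss'])`.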

---
## Creating A New Model From Scratch

In [None]:
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))  # input_shape must be a tuple: (10000,) not (10000)
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))

model.compile(optimizer='rmsprop',
             loss='categorical_crossentropy',
             metrics=['accuracy'])
model.fit(partial_x_train,
         partial_y_train,
         epochs=9,
         batch_size=512,
         validation_data=(x_val, y_val))
results = model.evaluate(x_test, one_hot_test_labels)
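To judge the test accuracy stored in `results`, it helps to compare it with a random classifier. One common approach, sketched below, shuffles a copy of the labels and measures how often the shuffled copy agrees with the original; its expected value is the sum of the squared class frequencies, so it accounts for class imbalance. The helper name and the toy labels are hypothetical; on the real data you would pass `test_labels`.

```python
import numpy as np

def random_baseline_accuracy(labels, seed=0):
    """Accuracy of guessing by shuffling the labels (chance level that respects class frequencies)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(labels)
    return float(np.mean(shuffled == labels))

# Hypothetical imbalanced labels standing in for test_labels
toy_labels = [3] * 60 + [4] * 30 + [19] * 10
print(random_baseline_accuracy(toy_labels))  # close to 0.6**2 + 0.3**2 + 0.1**2 = 0.46
```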

---
## Generating Predictions On New Data

In [None]:
predictions = model.predict(x_test)

In [None]:
predictions[0].shape

In [None]:
np.sum(predictions[0])

In [None]:
np.argmax(predictions[0])
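Each row of `predictions` is a probability distribution over the 46 topics (hence the sum of 1), and `np.argmax` picks the most likely topic. Applying it along `axis=1` turns the whole prediction matrix into class labels, which can then be compared with the integer test labels to recompute accuracy by hand. A sketch on a hypothetical 3-sample, 4-class probability matrix:

```python
import numpy as np

# Hypothetical softmax outputs: 3 samples, 4 classes (the real model has 46)
toy_predictions = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.80, 0.10],
    [0.25, 0.40, 0.20, 0.15],
])
toy_labels = np.array([0, 2, 3])

predicted_classes = np.argmax(toy_predictions, axis=1)  # most likely class per row
accuracy = np.mean(predicted_classes == toy_labels)     # fraction of correct rows
print(predicted_classes)  # [0 2 1]
print(accuracy)
```

On the real data, `np.mean(np.argmax(predictions, axis=1) == test_labels)` should match the accuracy reported by `model.evaluate`.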

In [None]:
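# Alternative label encoding: keep the labels as integer arrays instead of one-hot vectors
y_train = np.array(train_labels)
y_test = np.array(test_labels)

In [None]:
# With integer labels, the loss becomes sparse_categorical_crossentropy
model.compile(optimizer='rmsprop',
             loss='sparse_categorical_crossentropy',
             metrics=['acc'])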
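`sparse_categorical_crossentropy` computes the same quantity as `categorical_crossentropy`. It takes the integer label directly instead of a one-hot vector and reads off `-log(y_pred[label])`. The NumPy sketch below checks that equivalence on hypothetical values; it illustrates the math and is not the Keras implementation.

```python
import numpy as np

def categorical_ce(one_hot, probs):
    # Per-sample loss with one-hot targets
    return -np.sum(one_hot * np.log(probs), axis=1)

def sparse_categorical_ce(labels, probs):
    # Pick the probability of the true class in each row
    return -np.log(probs[np.arange(len(labels)), labels])

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])
one_hot = np.eye(3)[labels]
print(np.allclose(categorical_ce(one_hot, probs), sparse_categorical_ce(labels, probs)))  # True
```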

---
## Creating A Model With A Bottleneck

In [None]:
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
# Intermediate layer with only 4 units: a bottleneck much narrower than the 46 output classes
model.add(layers.Dense(4, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))

model.compile(optimizer='rmsprop',
             loss='categorical_crossentropy',
             metrics=['accuracy'])
model.fit(partial_x_train,
         partial_y_train,
         epochs=20,
         batch_size=128,
         validation_data=(x_val, y_val))
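Squeezing the representation through a 4-unit layer limits what the 46-way output can express. The final `Dense(46)` computes `h @ W + b` from a 4-dimensional `h`, so across any batch the pre-softmax scores span at most 5 dimensions (4 from `h @ W`, 1 from the bias). The NumPy sketch below checks that bound; `h`, `W` and `b` are random stand-ins, not the trained model's activations or weights.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 200

# Random stand-ins for the bottleneck activations and the final layer's weights
h = np.maximum(rng.normal(size=(batch, 4)), 0.0)  # ReLU output of the 4-unit layer
W = rng.normal(size=(4, 46))
b = rng.normal(size=(46,))

logits = h @ W + b  # pre-softmax scores, shape (200, 46)
print(logits.shape, np.linalg.matrix_rank(logits))  # rank is at most 5
```

With the 64-unit hidden layer used earlier, the same bound would be 65, which exceeds the 46 classes, so that layer does not constrain the output in this way.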