## Multi-class classification: Reuters newswire

This example is based on the [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python-second-edition) book.

We will build a network to classify Reuters newswires into 46 different mutually-exclusive topics. This is an instance of *multi-class classification*, and since each data point should be classified into only one category, the problem is more specifically an instance of *single-label, multi-class classification*. 


In [None]:
# importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from keras.datasets import reuters
from keras import models
from keras import layers

### Loading the dataset

We will be working with the [Keras Reuters dataset](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/reuters). It consists of 11,228 newswires labeled over 46 topics, originally generated by parsing and preprocessing the [classic Reuters-21578 dataset](https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection) used for text categorization.

There are 46 different topics; some topics are more represented than others, but each topic has at least 10 examples in the training set.


In [None]:
# loading training and testing data with the top-10,000 most frequent words
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)


We have 8,982 training examples and 2,246 test examples:

In [None]:
print(f'Training data instances: {len(train_data)}\nTesting data instances: {len(test_data)}')

### Data exploration and pre-processing

As with the IMDB reviews, each newswire is encoded as a list of word indexes (integers). For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer `3` encodes the 3rd most frequent word in the data. This allows for quick filtering operations such as: *only consider the top 10,000 most common words, but eliminate the top 20 most common words*.

With the default `load_data` arguments, the indices `0`, `1`, and `2` do not stand for specific words: they are reserved for padding, start of sequence, and unknown (out-of-vocabulary) words, respectively.
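
The frequency-based filtering described above can be sketched in plain Python. This is a minimal illustration, not the Keras implementation: the helper `filter_by_rank` and its parameters are hypothetical, and using `2` as the unknown-word marker is an assumption made for this example.

```python
def filter_by_rank(sequence, num_words=10000, skip_top=20, unknown_index=2):
    """Keep indices in [skip_top, num_words); replace all others with unknown_index."""
    return [w if skip_top <= w < num_words else unknown_index for w in sequence]

# Hypothetical word indices, where a smaller index means a more frequent word
ranks = [5, 42, 9999, 12000, 20]
filtered = filter_by_rank(ranks)
print(filtered)  # [2, 42, 9999, 2, 20]
```

Because words are indexed by frequency, both filters reduce to simple integer comparisons, with no need to look up the words themselves.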

In [None]:
train_data[10]

In [None]:
train_data.shape

The label associated with an example is an integer between 0 and 45: a topic index.

For reference, [here is a list of classes and labels](https://martin-thoma.com/nlp-reuters/#classes-and-labels), although it does not necessarily match the label indices used by the Keras Reuters dataset.

`topic 3`, for instance, is `money-fx`.

In [None]:
train_labels[10]

Decoding some words (for exploratory analysis).

In [None]:
# word_index is a dictionary mapping words to an integer index
word_index = reuters.get_word_index()
# We reverse it, mapping integer indices to words
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
# We decode the newswire; note that our indices were offset by 3
# because 0, 1 and 2 are reserved indices for "padding", "start of sequence", and "unknown".
decoded_newswire = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[10]])

In [None]:
decoded_newswire
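
To make the offset of 3 concrete, here is a small self-contained sketch with a made-up vocabulary. The names `toy_word_index`, `toy_reverse`, and `decode` are hypothetical and used only for illustration; the real index comes from `reuters.get_word_index()`.

```python
# Toy word index (hypothetical): word -> frequency rank, starting at 1
toy_word_index = {"the": 1, "of": 2, "oil": 3, "prices": 4}
toy_reverse = {rank: word for word, rank in toy_word_index.items()}

def decode(sequence, reverse_index, offset=3):
    # Encoded values 0, 1 and 2 map to non-positive ranks, so they fall back to '?'
    return " ".join(reverse_index.get(i - offset, "?") for i in sequence)

encoded = [1, 6, 7, 2, 4]
decoded = decode(encoded, toy_reverse)
print(decoded)  # ? oil prices ? the
```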

**Vectorizing the data**.

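We multi-hot encode each newswire: every sequence of word indices becomes a 10,000-dimensional vector with 1s at the positions of the indices it contains and 0s everywhere else. The same encoding is applied to the training and the testing data.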

In [None]:
# custom function for data vectorization
def vectorize_sequences(sequences, dimension=10000):
    # create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set specific indices of results[i] to 1s
    return results

# vectorized training data
x_train = vectorize_sequences(train_data)
# vectorized testing data
x_test = vectorize_sequences(test_data)

In [None]:
x_train[0]
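
The effect of `vectorize_sequences` is easier to see on a tiny input. The sketch below repeats the function from the cell above and applies it to two made-up sequences with `dimension=5`; the toy data is an assumption for illustration only.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # one row per sequence, one column per possible word index
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0  # mark every index that occurs in the sequence
    return results

toy = [[1, 3], [0, 3, 3]]
encoded = vectorize_sequences(toy, dimension=5)
print(encoded)
# [[0. 1. 0. 1. 0.]
#  [1. 0. 0. 1. 0.]]
```

Note that the repeated index `3` in the second sequence still produces a single `1`: this encoding records which words occur, not how often or in what order.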

To vectorize the labels, there are two possibilities: we could cast the label list as an integer tensor, or we could use a **one-hot encoding** (also called *categorical encoding*), which is a widely used format for categorical data.

In our case, one-hot encoding of the labels consists of representing each label as an all-zero vector with a 1 in the place of the label index.
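
The two label formats carry the same information. As a minimal NumPy sketch (with made-up probabilities and labels, not values from this dataset), the cross-entropy computed from one-hot labels equals the one computed by indexing directly with integer labels. In Keras, the integer format is paired with the `sparse_categorical_crossentropy` loss instead of `categorical_crossentropy`.

```python
import numpy as np

# Hypothetical predicted probabilities for 2 samples over 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
int_labels = np.array([0, 2])        # integer label format
one_hot = np.eye(3)[int_labels]      # one-hot label format

# Cross-entropy with one-hot labels: -sum(y * log(p)) over classes
loss_one_hot = -np.sum(one_hot * np.log(probs), axis=1)
# Cross-entropy with integer labels: -log(p[true class])
loss_sparse = -np.log(probs[np.arange(len(int_labels)), int_labels])

print(np.allclose(loss_one_hot, loss_sparse))  # True
```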

In [None]:
def to_one_hot(labels, dimension=46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results

# vectorized training labels
one_hot_train_labels = to_one_hot(train_labels)
# vectorized testing labels
one_hot_test_labels = to_one_hot(test_labels)

In [None]:
# alternative approach to previous cell
#from keras.utils import to_categorical
#one_hot_train_labels = to_categorical(train_labels)
#one_hot_test_labels = to_categorical(test_labels)

In [None]:
one_hot_train_labels[0]

**Validation data**

We will set apart 1,000 samples from the training data to be used as *validation data*.

In [None]:
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

### Model definition

When defining a model, two important aspects are related to **the number of layers to use** and **the dimensionality of each layer**.

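Remember that our model needs to generate outputs for all 46 topics (classes); i.e., it needs to learn how to separate 46 different classes. This calls for reasonably large intermediate layers: a layer with far fewer than 46 units could act as an information bottleneck and drop information needed to tell the classes apart. In our case, we will use 64 hidden units per layer.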

In [None]:
# Sequential model
model1 = models.Sequential()
model1.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model1.add(layers.Dense(64, activation='relu'))
# last layer with softmax activation function for generating a probability distribution over the 46 classes
model1.add(layers.Dense(46, activation='softmax'))

model1.summary()
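
The parameter counts reported by `model1.summary()` can be checked by hand: a `Dense` layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this sketch.

```python
def dense_params(n_inputs, n_units):
    # weights (n_inputs * n_units) plus one bias per unit
    return n_inputs * n_units + n_units

# input dimension followed by the size of each Dense layer in model1
layer_sizes = [10000, 64, 64, 46]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [640064, 4160, 2990] 647214
```

Almost all parameters sit in the first layer, because it connects the 10,000-dimensional input to the first 64 hidden units.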

**Model hyperparameters**



In [None]:
model1.compile(optimizer='rmsprop',
              # for multi-class classification problems 
              # (with labels provided in a one-hot representation)
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

### Training the model

We will train the model for 20 epochs, using mini-batches of 512 samples drawn from the `partial_x_train` and `partial_y_train` arrays. At the end of each epoch, the model is validated on `x_val` and `y_val`.

In [None]:
# train the model and keep track (history) of loss and accuracy
history = model1.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))
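
As a quick sanity check on what these settings mean, the sketch below computes how many weight updates the training run performs. It assumes the final, smaller batch of each epoch is kept rather than dropped; the variable names are chosen for this example.

```python
import math

n_train = 8982 - 1000   # training examples left after the validation split
batch_size = 512
n_epochs = 20

steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch is smaller
total_updates = steps_per_epoch * n_epochs
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, total_updates, last_batch_size)  # 16 320 302
```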

### Visualising the results

In [None]:
# checking which keys we have in the `history` dictionary
history_dict = history.history
history_dict.keys()

In [None]:
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
plt.clf()   # clear figure

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

**What can we conclude from the observed loss and accuracy?**

Do you notice any overfitting in this model? If yes, from what epoch?
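
One simple way to back up a visual reading of the curves is to find the epoch with the lowest validation loss; after that point, a falling training loss with a rising validation loss suggests overfitting. The sketch below uses made-up loss values, not results from this model, and `best_epoch` is a hypothetical helper. In the notebook, the same function could be applied to `history.history['val_loss']`.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Made-up validation losses, for illustration only
toy_val_loss = [1.80, 1.30, 1.10, 1.00, 0.95, 0.97, 1.02, 1.08]
print(best_epoch(toy_val_loss))  # 5
```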

---

**GROUP DISCUSSION**

Play with different hyperparameters to improve the model's performance. You can change the *number of epochs*, the *optimizer*, and the *number of hidden units*.

A good starting point, if you noticed overfitting in the previous results, is to reduce the number of epochs to a value just before the overfitting begins.

In [None]:
# second model
model2 = models.Sequential()
model2.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model2.add(layers.Dense(64, activation='relu'))
model2.add(layers.Dense(46, activation='softmax'))

# replace @OPT by any available optimizer
# https://keras.io/api/optimizers/
model2.compile(optimizer='@OPT',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# replace @EPO by the number of epochs you want to try
history2 = model2.fit(partial_x_train,
                     partial_y_train,
                     epochs=@EPO,
                     batch_size=512,
                     validation_data=(x_val, y_val))

# evaluate the model and get the results
results = model2.evaluate(x_test, one_hot_test_labels)

In [None]:
loss = history2.history['loss']
val_loss = history2.history['val_loss']
epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
plt.clf()   # clear figure

acc = history2.history['accuracy']
val_acc = history2.history['val_accuracy']

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

In [None]:
# Checking the final results (loss, accuracy)
results

In [None]:
# OPTIONAL - comparing the model with a random baseline
#import copy

#test_labels_copy = copy.copy(test_labels)
#np.random.shuffle(test_labels_copy)
#float(np.sum(np.array(test_labels) == np.array(test_labels_copy))) / len(test_labels)
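
The shuffling baseline above gives a slightly different number on every run. Its expected value can also be computed directly: when labels are randomly permuted, the chance that a position keeps a label of class `c` is the frequency of `c`, so the expected accuracy is the sum of squared class frequencies. The sketch below uses toy labels, and `shuffled_baseline_accuracy` is a hypothetical helper name.

```python
from collections import Counter

def shuffled_baseline_accuracy(labels):
    """Expected accuracy when predictions are a random permutation of the labels."""
    n = len(labels)
    counts = Counter(labels)
    return sum((c / n) ** 2 for c in counts.values())

# Toy, imbalanced labels for illustration only
toy_labels = [0, 0, 0, 1, 1, 2]
baseline = shuffled_baseline_accuracy(toy_labels)
print(round(baseline, 4))  # 0.3889
```

Because the classes are imbalanced, this baseline is higher than the 1/3 that uniform guessing over three classes would give.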

### Making predictions

We can compare both models (`model1` and `model2`).

In [None]:
# asking predictions from both models
predictions1 = model1.predict(x_test)
predictions2 = model2.predict(x_test)

In [None]:
predictions1[0].shape

In [None]:
np.sum(predictions1[0])

In [None]:
np.argmax(predictions1[0])

In [None]:
predictions1[0]
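
The cells above rely on two properties of the softmax output: each prediction is a vector of non-negative values that sums to 1, and the predicted class is the index of the largest value. A minimal NumPy sketch of softmax on made-up scores (not actual model outputs) shows both:

```python
import numpy as np

def softmax(logits):
    # subtract the maximum before exponentiating, for numerical stability
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

logits = np.array([2.0, 1.0, 0.1, -1.0])  # hypothetical raw scores for 4 classes
probs = softmax(logits)

print(probs.shape)                    # (4,)
print(np.isclose(probs.sum(), 1.0))   # True
print(int(np.argmax(probs)))          # 0
```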

In [None]:
# show the inputs and predicted outputs
for i in range(3):
    print("X=%s, Predicted=%s" % (x_test[i], predictions1[i]))

**What happens if we opt for intermediate layers with only a few hidden units?**

In [None]:
model3 = models.Sequential()
model3.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model3.add(layers.Dense(4, activation='relu'))
model3.add(layers.Dense(46, activation='softmax'))

model3.compile(optimizer='rmsprop',
               loss='categorical_crossentropy',
               metrics=['accuracy'])

model3.fit(partial_x_train,
           partial_y_train,
           epochs=20,
           batch_size=128,
           validation_data=(x_val, y_val))

In [None]:
results = model3.evaluate(x_test, one_hot_test_labels)