## Classifying newswires: a multiclass classification example

* We now know how to classify vector inputs into two mutually exclusive classes using a densely connected neural network.
* Here, we will build a network to classify Reuters newswires into 46 mutually exclusive topics.
* Because there are more than two classes, this problem is an instance of *multiclass classification*.
* More precisely, each newswire belongs to exactly one topic, so this is *single-label, multiclass classification*. If a sample could belong to several topics at once, it would be *multilabel, multiclass classification*.
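* The difference between the two settings shows up in the shape of the targets. The sketch below is only an illustration: `num_classes = 5` and the chosen indices are made up and are not part of the Reuters data.

```python
import numpy as np

num_classes = 5  # hypothetical small number of topics, for illustration only

# Single-label: each sample belongs to exactly one class (one-hot target)
single_label = np.zeros(num_classes)
single_label[2] = 1.

# Multilabel: a sample may belong to several classes at once (multi-hot target)
multi_label = np.zeros(num_classes)
multi_label[[0, 2, 4]] = 1.

print(single_label, single_label.sum())
print(multi_label, multi_label.sum())
```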

> ### The Reuters dataset

* A set of short newswires and their topics, published by Reuters in 1986

In [None]:
from tensorflow.keras.datasets import reuters

# Like IMDB, the argument num_words restricts the data to 
# the 10,000 most frequently occurring words 
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)

In [None]:
len(train_data)

In [None]:
len(test_data)

In [None]:
train_data[10]

In [None]:
# Decoding a newswire back to text
word_index = reuters.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
# Indices are offset by 3 because 0, 1, and 2 are reserved for
# "padding", "start of sequence", and "unknown"
decoded_newswire = ' '.join([reverse_word_index.get(i-3, '?') for i in train_data[0]])

In [None]:
print(decoded_newswire)
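
* The `i-3` in the decoding step is easier to follow on a toy example. The vocabulary below is hypothetical (the real one comes from `reuters.get_word_index()`); it only illustrates that stored indices are shifted by 3 and that reserved indices decode to `'?'`.

```python
# Hypothetical miniature vocabulary, standing in for reuters.get_word_index()
toy_word_index = {'the': 1, 'oil': 2, 'prices': 3, 'rose': 4}
toy_reverse_index = {index: word for word, index in toy_word_index.items()}

# Stored form: 1 marks the start of the sequence, and every word index is shifted by 3
encoded = [1, 4, 5, 6, 7]

decoded = ' '.join(toy_reverse_index.get(i - 3, '?') for i in encoded)
print(decoded)  # ? the oil prices rose
```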

In [None]:
train_labels[10]

In [None]:
train_labels

In [None]:
len(train_labels)

> ### Preparing the data

In [None]:
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
  results = np.zeros((len(sequences), dimension))
  for i, sequence in enumerate(sequences):
    results[i, sequence] = 1.
  return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

In [None]:
x_train.shape

* To vectorize the labels, we can use one-hot encoding.
* One-hot encoding of the labels consists of embedding each label as an all-zero vector with a 1 in the place of the label index.

In [None]:
def to_one_hot(labels, dimension=46):
  results = np.zeros((len(labels), dimension))
  for i, label in enumerate(labels):
    results[i, label] = 1.
  return results

one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)

In [None]:
train_labels[100]

In [None]:
one_hot_train_labels[100]

In [None]:
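# Keras provides a built-in function that does the same thing
from tensorflow.keras.utils import to_categorical

# num_classes is passed explicitly so that both arrays have 46 columns
one_hot_train_labels = to_categorical(train_labels, num_classes=46)
one_hot_test_labels = to_categorical(test_labels, num_classes=46)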

> ### Building the network

In [None]:
from tensorflow.keras import models
from tensorflow.keras import layers

model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))

* `softmax` activation in the last layer
  * The network will output a *probability distribution* over the 46 classes.
  * For every input sample, the network will produce a 46-dimensional output vector, where `output[i]` is the probability that the sample belongs to class `i`.
  * The sum of `output[i]` for all `i` will be 1.
  
* `categorical_crossentropy` loss
  * It measures the distance between two probability distributions.
  * Here, between the probability distribution output by the network and the true distribution of the labels
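* Both ideas can be checked by hand with NumPy. This is a minimal sketch, not the Keras implementation: the helper functions, the three-class setup, and the raw scores are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max improves numerical stability and does not change the result
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is one-hot, so only the log-probability of the true class contributes
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for 3 classes
probs = softmax(logits)
y_true = np.array([1.0, 0.0, 0.0])   # the true class is class 0

print(probs, probs.sum())            # the probabilities sum to 1
print(categorical_crossentropy(y_true, probs))
```

* The loss gets smaller as the probability assigned to the true class gets closer to 1.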

In [None]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

> ### Validation

* Set apart the first 1,000 training samples as a validation set.

In [None]:
x_val = x_train[:1000]
partial_x_train = x_train[1000:]

y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

In [None]:
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))

* Plotting the training and validation loss

In [None]:
import matplotlib.pyplot as plt

loss = history.history['loss'] 
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'bo', label='Training loss') 
plt.plot(epochs, val_loss, 'b', label='Validation loss') 
plt.title('Training and validation loss') 
plt.xlabel('Epochs') 
plt.ylabel('Loss') 
plt.legend()

plt.show()

* Plotting the training and validation accuracy

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()
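
* Besides reading it off the plots, the epoch at which the validation loss is lowest can be located programmatically. The values below are made up for illustration; in this notebook the list would be `history.history['val_loss']`.

```python
import numpy as np

# Hypothetical validation losses, one per epoch
val_loss = [1.8, 1.3, 1.1, 1.0, 0.95, 0.93, 0.94, 0.97, 1.02]

best_epoch = int(np.argmin(val_loss)) + 1   # epochs are counted from 1
print(best_epoch)  # 6
```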

* We can observe that the network begins to overfit after nine epochs.
* Let's retrain a new model from scratch for nine epochs and then evaluate it on the test set.

In [None]:
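# Rebuild the model so that training really starts from scratch
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(partial_x_train,
          partial_y_train,
          epochs=9,
          batch_size=512,
          validation_data=(x_val, y_val))

results = model.evaluate(x_test, one_hot_test_labels)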

In [None]:
print(results)

* Retraining a model from scratch is not a good idea if we have a large-scale training set.
* In this case, we can use the callbacks functionality in Keras to save the model during training.
  * https://keras.io/callbacks/
  
* First, we need to mount Google Drive storage in our Colab instance.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
%cd /content/gdrive

In [None]:
!ls

In [None]:
%cd 'My Drive'/exp

In [None]:
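from tensorflow.keras.callbacks import ModelCheckpoint

# Start from a freshly initialized model so that the epoch numbers in the
# checkpoint file names count from the beginning of training
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Save the model at the end of every epoch; {epoch:02d} becomes the epoch number
filepath = '/content/gdrive/My Drive/exp/model.{epoch:02d}.hdf5'
modelckpt = ModelCheckpoint(filepath=filepath)

model.fit(partial_x_train,
          partial_y_train,
          epochs=20,
          batch_size=512,
          validation_data=(x_val, y_val),
          callbacks=[modelckpt])

results = model.evaluate(x_test, one_hot_test_labels)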

* Load the model that was saved at the end of epoch 9.

In [None]:
best_model_path = '/content/gdrive/My Drive/exp/model.09.hdf5'
best_model = models.load_model(best_model_path)

In [None]:
results = best_model.evaluate(x_test, one_hot_test_labels)
print(results)

> ### Generating predictions on new data

In [None]:
predictions = model.predict(x_test)

In [None]:
predictions.shape

In [None]:
predictions[0]

In [None]:
np.sum(predictions[0])

In [None]:
np.argmax(predictions[0])
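
* The same `argmax` can be applied to every row at once to get one predicted class per sample. The sketch below uses made-up probabilities and labels (three samples, four classes) rather than the real `predictions` array.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples over 4 classes (each row sums to 1)
toy_predictions = np.array([[0.10, 0.70, 0.10, 0.10],
                            [0.60, 0.20, 0.10, 0.10],
                            [0.05, 0.05, 0.10, 0.80]])
toy_labels = np.array([1, 0, 2])   # hypothetical integer ground-truth labels

predicted_classes = np.argmax(toy_predictions, axis=1)   # most probable class per row
accuracy = np.mean(predicted_classes == toy_labels)      # fraction of correct predictions
print(predicted_classes, accuracy)
```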

> ### A different way to handle the labels and the loss

In [None]:
# Another way to encode the labels: keep them as an integer tensor
y_train = np.array(train_labels)
y_test = np.array(test_labels)

In [None]:
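# With integer labels, use sparse_categorical_crossentropy; it is mathematically
# the same loss as categorical_crossentropy, only the label format differs
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])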
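
* A minimal NumPy sketch, using made-up predictions, of why integer labels with the sparse loss and one-hot labels with `categorical_crossentropy` give the same value. The arrays and the hand-written formulas are assumptions for illustration, not the Keras internals.

```python
import numpy as np

# Hypothetical predicted distributions for 2 samples over 3 classes
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
int_labels = np.array([0, 2])            # integer labels
one_hot_labels = np.eye(3)[int_labels]   # the same labels, one-hot encoded

# Categorical crossentropy: sum over classes of -y_true * log(y_pred), averaged over samples
cce = np.mean(-np.sum(one_hot_labels * np.log(y_pred), axis=1))
# Sparse version: pick the log-probability of the true class directly
scce = np.mean(-np.log(y_pred[np.arange(len(int_labels)), int_labels]))

print(cce, scce)
```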

> ### The importance of having sufficiently large hidden layers

In [None]:
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
# Information bottleneck: only 4 units, far fewer than the 46 output classes
model.add(layers.Dense(4, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(partial_x_train,
          partial_y_train,
          epochs=20,
          batch_size=128,
          validation_data=(x_val, y_val))