##### Deep Learning - Neural Network Multi-Class Classifier with Keras
Exercise from: Pan, Chao. *Deep Learning with Python: Step by Step Guide with Keras and PyTorch*

A multi-class classification task involves more than two classes, and each sample belongs to exactly one class (single-label classification).
We will use a document classification dataset from the Reuters news agency.
The dataset contains short newswires annotated by topic, and these topics serve as the labels of the documents. Because each newswire tells a story in natural language, this is a natural language understanding task. The Reuters dataset ships with Keras alongside several other datasets, so we can load it directly from the datasets module and experiment easily. Let us import Keras together with NumPy, an efficient numerical computation library that we will use for several tasks.


In [7]:
import tensorflow as tf
import keras

In [8]:
import numpy as np
np.random.seed(123)

Import newswire data from Reuters

In [9]:
from keras.datasets import reuters

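Load the dataset into memory, separating it into data and labels for the training and test sets respectively. Passing num_words=10000 keeps only the 10,000 most frequent words; rarer words are replaced by an "unknown" index.
We have 8,982 samples in the training set and 2,246 samples in the test set.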

In [13]:
(train_data, train_labels),(test_data, test_labels)=reuters.load_data(num_words=10000)
print(len(train_data),'train sequences')
print(len(test_data),'test sequences')

(8982, 'train sequences')
(2246, 'test sequences')


In [14]:
print(train_data[10])

[1, 245, 273, 207, 156, 53, 74, 160, 26, 14, 46, 296, 26, 39, 74, 2979, 3554, 14, 46, 4689, 4329, 86, 61, 3499, 4795, 14, 61, 451, 4329, 17, 12]


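Each sample is a list of integers representing word indices. We can decode a sample back into text, here the first training newswire, using the following code. The indices are offset by 3 because indices 0, 1 and 2 are reserved for "padding", "start of sequence" and "unknown"; any index without a matching word is printed as "?".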

In [16]:
word_index = reuters.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

In [17]:
decoded_newswire = ' '.join([reverse_word_index.get(i-3, '?') for i in train_data[0]])
print(decoded_newswire)

? ? ? said as a result of its december acquisition of space co it expects earnings per share in 1987 of 1 15 to 1 30 dlrs per share up from 70 cts in 1986 the company said pretax net should rise to nine to 10 mln dlrs from six mln dlrs in 1986 and rental operation revenues to 19 to 22 mln dlrs from 12 5 mln dlrs it said cash flow per share this year should be 2 50 to three dlrs reuter 3
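To make the offset concrete, here is a self-contained sketch with a made-up four-word vocabulary. The vocabulary, ranks and encoded sequence are hypothetical, not taken from the dataset; the sketch assumes the Keras defaults start_char=1, oov_char=2 and index_from=3.

```python
# Hypothetical toy vocabulary: word -> frequency rank (1 = most frequent),
# shaped like the dict returned by reuters.get_word_index().
toy_word_index = {"the": 1, "of": 2, "said": 3, "dlrs": 4}

# Encoded indices are rank + 3, because 0, 1 and 2 are reserved for
# padding, start-of-sequence and out-of-vocabulary tokens.
INDEX_FROM = 3
toy_reverse_index = {rank: word for word, rank in toy_word_index.items()}

def decode(sequence):
    """Map encoded indices back to words; reserved indices become '?'."""
    return " ".join(toy_reverse_index.get(i - INDEX_FROM, "?") for i in sequence)

# Made-up encoded sequence: 1 = start marker, 2 = out-of-vocabulary word.
encoded = [1, 6, 4, 2, 7]
print(decode(encoded))  # → ? said the ? dlrs
```

This is why the decoded Reuters newswire above begins with "?": the first index is the start-of-sequence marker.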


Let us check the number of classes we need to predict. The labels are stored as integers starting from 0, so the number of classes is the largest label plus one.

In [18]:
num_classes = np.max(train_labels) + 1
print(num_classes, 'classes')

(46, 'classes')


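There are 46 distinct newswire topics. The next step is data preparation: we need to convert the data into a form the network can consume. We first convert the features, currently stored as lists of word indices, into 10,000-dimensional multi-hot vectors using the sequences_to_matrix method of the Tokenizer class from the preprocessing module. With mode='binary', each vector has a 1 at the index of every word that occurs in the newswire and 0 everywhere else.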

In [19]:
from keras.preprocessing.text import Tokenizer  # available in Keras 2.x

tokenizer = Tokenizer(num_words=10000)
# mode='binary' sets a 1 for every word index present in a newswire (multi-hot)
train_data = tokenizer.sequences_to_matrix(train_data, mode='binary')
test_data = tokenizer.sequences_to_matrix(test_data, mode='binary')
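For intuition, the binary mode of sequences_to_matrix can be sketched in plain NumPy. The helper name vectorize_sequences is hypothetical, and the sketch assumes every index is below dimension (which num_words=10000 in load_data guarantees here); it is not the Tokenizer's actual implementation.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    """Multi-hot encode: row i gets 1.0 at every index that occurs in sequence i."""
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0  # repeated indices still give 1.0, not a count
    return results

toy = [[1, 3, 3], [0, 4]]
print(vectorize_sequences(toy, dimension=5))
# [[0. 1. 0. 1. 0.]
#  [1. 0. 0. 0. 1.]]
```

Word order and word counts are discarded, so each newswire becomes a fixed-length bag-of-words vector that a Dense layer can take as input.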

We also convert the labels to one-hot representations, sometimes called categorical encodings, using the to_categorical utility from Keras.

In [21]:
# importing from keras.utils works in recent Keras versions, where np_utils may be absent
from keras.utils import to_categorical

one_hot_train_labels = to_categorical(train_labels, num_classes)
one_hot_test_labels = to_categorical(test_labels, num_classes)
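A minimal NumPy sketch of what one-hot encoding produces, assuming integer labels in the range 0 to num_classes - 1. The function one_hot is a hypothetical stand-in for illustration, not the Keras source; it returns float32 because to_categorical defaults to that dtype.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0  # row i, column labels[i]
    return encoded

print(one_hot([3, 0, 2], num_classes=4))
# [[0. 0. 0. 1.]
#  [1. 0. 0. 0.]
#  [0. 0. 1. 0.]]
```

Each row has exactly one 1, which reflects the single-label setting: every newswire belongs to exactly one of the 46 topics.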

We describe the model as a stack of layers using the Sequential class in Keras.

In [22]:
from keras import models 
from keras import layers 

model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))  # one output unit per topic
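As a sanity check on the layer sizes, the number of trainable parameters can be worked out by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below does this arithmetic in plain Python; model.summary() on the model above should report the same total.

```python
def dense_params(inputs, units):
    """Weights (inputs x units) plus one bias per unit."""
    return inputs * units + units

layer_sizes = [10000, 64, 64, 46]  # input width, hidden, hidden, output
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # → [640064, 4160, 2990]
print(sum(per_layer))  # → 647214
```

Almost all of the parameters sit in the first layer, because it connects the 10,000-dimensional input to the first hidden layer.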


The model is a feedforward neural network with three Dense layers: two hidden layers of 64 ReLU units each and a 46-unit softmax output layer, one unit per class. The softmax turns the outputs into a probability distribution over the 46 topics.
Next we compile the model by specifying an optimizer, a loss function and a metric to monitor.


In [24]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

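We use categorical_crossentropy as the loss function because this is a multi-class classification problem with a softmax output. This loss expects each target to be a one-hot vector of length 46, so we must train on the one-hot labels rather than the raw integer labels.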
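A small NumPy sketch of the categorical cross-entropy for a single sample. The probabilities are hypothetical, and the epsilon clipping is an assumption that mirrors the kind of guard Keras applies against log(0). Because the target is one-hot, only the predicted probability of the true class contributes to the loss.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0]])         # one-hot target: class 1
confident = np.array([[0.05, 0.90, 0.05]])   # softmax output favouring class 1
uncertain = np.array([[0.40, 0.20, 0.40]])
print(categorical_crossentropy(y_true, confident))  # about 0.105, i.e. -ln(0.9)
print(categorical_crossentropy(y_true, uncertain))  # about 1.609, i.e. -ln(0.2)
```

If you prefer to keep integer labels, Keras also offers the sparse_categorical_crossentropy loss, which computes the same quantity without requiring one-hot targets.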
Next, we train the model on the training set, specifying that 10% of training data should be used for validation.


In [28]:
# pass the one-hot labels: the 46-unit softmax output needs targets of shape (46,)
history = model.fit(train_data, one_hot_train_labels,
                    batch_size=512,
                    epochs=20,
                    verbose=1,
                    validation_split=0.1)
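The Keras documentation states that validation_split holds out the last fraction of the provided samples, taken before any shuffling. The sketch below imitates that slicing with NumPy stand-in arrays; split_like_validation_split is a hypothetical helper, not Keras code, and it assumes the split point is rounded down.

```python
import numpy as np

def split_like_validation_split(x, y, validation_split):
    """Keep the leading rows for training and hold out the trailing fraction."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

rows = np.arange(8982)     # stand-in for the 8,982 training rows
targets = np.arange(8982)  # stand-in for their labels
(x_tr, y_tr), (x_val, y_val) = split_like_validation_split(rows, targets, 0.1)
print(len(x_tr), len(x_val))  # → 8083 899
```

Since the validation rows are simply the tail of the training set, it is worth making sure the data is not ordered by topic before relying on validation_split.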

We achieve a validation accuracy of 77%. Let us now plot the training and validation losses using the code below.


In [30]:
# Jupyter magic for inline plots; delete the next line when running as a plain Python script
%matplotlib inline
import matplotlib.pyplot as plt 

loss = history.history['loss'] 
val_loss = history.history['val_loss'] 

epochs = range( 1, len( loss) + 1) 

plt.plot( epochs, loss, 'bo', label = 'Training loss') 
plt.plot( epochs, val_loss, 'b', label = 'Validation loss') 
plt.title('Training and validation loss') 
plt.xlabel('Epochs') 
plt.ylabel('Loss') 
plt.legend()
plt.show()



We can observe that the loss initially drops rapidly on both the training and validation splits,
before the validation loss starts rising as the model begins to overfit.
Next we plot the accuracy of the train and validation sets. 


In [31]:
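plt.clf()  # clear figure

# Keras 2.3+ records 'accuracy'/'val_accuracy'; older versions use 'acc'/'val_acc'
acc_key = 'accuracy' if 'accuracy' in history.history else 'acc'
acc = history.history[acc_key]
val_acc = history.history['val_' + acc_key]
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()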




From the plot above, the training accuracy keeps increasing before plateauing,
as the model overfits the training set, whereas the validation accuracy stagnates much earlier.
This suggests that we should train the model for fewer epochs, or employ regularization
techniques, to avoid overfitting.
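One simple way to pick a shorter training run is to take the epoch with the lowest validation loss. The sketch below uses hypothetical loss values for illustration only; with the real run you would pass history.history['val_loss'] instead. Keras can also automate this with the keras.callbacks.EarlyStopping callback (for example monitor='val_loss' with a small patience).

```python
import numpy as np

def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return int(np.argmin(val_losses)) + 1

# Hypothetical per-epoch validation losses, not results from this notebook.
example_val_loss = [2.1, 1.4, 1.1, 0.98, 0.95, 0.97, 1.02, 1.10]
print(best_epoch(example_val_loss))  # → 5
```

Retraining from scratch with epochs set to that value is the most direct way to act on the curves above.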

Finally, we evaluate the model on the test set by calling the evaluate method on the trained
model instance.


In [32]:
# evaluate also needs the one-hot test labels; it returns [loss, accuracy]
score = model.evaluate(test_data, one_hot_test_labels,
                       batch_size=512, verbose=1)
print('Test accuracy:', score[1])

We achieve an accuracy of 0.78 on the test set, meaning that the trained model correctly predicted the topic of 78% of the newswires in the test set.
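With a softmax output and one-hot targets, accuracy is the fraction of samples whose most probable predicted class matches the true class. The sketch below computes it with NumPy on hypothetical probabilities; with the real model you could pass model.predict(test_data) and one_hot_test_labels.

```python
import numpy as np

def accuracy(probabilities, one_hot_labels):
    """Fraction of rows whose most probable class matches the true class."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))

# Hypothetical softmax outputs for 4 samples over 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0]])
print(accuracy(probs, labels))  # → 0.75
```

The third sample is counted as wrong even though the true class received 30% probability, because only the top-ranked class matters for this metric.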
