# Lesson 10 Assignment - Keras LSTM


## Dataset

The Keras Reuters newswire topics classification dataset. This dataset contains 11,228 newswires from Reuters, labeled over 46 topics.

## Instructions

Using the <a href='https://keras.io/datasets/#reuters-newswire-topics-classification'>Keras</a> dataset, perform each of the following data preparation tasks and answer the related questions:

1. Read the Reuters dataset into training and testing sets.
2. Prepare the dataset.
3. Build and compile 3 different models using Keras LSTM, ideally improving the model at each iteration.
4. Describe and explain your findings.

In [1]:
import tensorflow as tf
from tensorflow import keras

import numpy as np

## Download the Reuters dataset

In [4]:
data = tf.keras.datasets.reuters

In [3]:
np.__version__

'1.16.3'

> ### IMPORTANT if the NumPy version is 1.16.3
>
> Edit the `reuters.py` file to fix ```ValueError: Object arrays cannot be loaded when allow_pickle=False```, which arises with NumPy version '1.16.3'.
>
> On my Mac, the file is located in:
> ```Macintosh HD ▸ anaconda3 ▸ lib ▸ python3.6 ▸ site-packages ▸ tensorflow ▸ python ▸ keras ▸ datasets```
>
> Change line 83 as per the diff:
>
>    ```python
>    -  with np.load(path) as f:
>    +  with np.load(path, allow_pickle=True) as f:
>    ```
> For alternative solutions, see <a href='https://stackoverflow.com/questions/55890813/how-to-fix-object-arrays-cannot-be-loaded-when-allow-pickle-false-for-imdb-loa'>https://stackoverflow.com/questions/55890813/how-to-fix-object-arrays-cannot-be-loaded-when-allow-pickle-false-for-imdb-loa</a>
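If editing an installed library file is not desirable, a common alternative is to patch `np.load` temporarily so that it defaults to `allow_pickle=True` while the dataset is loaded. The following is a minimal sketch of that idea. The helper name `allow_pickle_np_load` is hypothetical (it is not part of NumPy or Keras), and the demonstration uses a small temporary file rather than the Reuters archive.

```python
import contextlib
import os
import tempfile

import numpy as np


@contextlib.contextmanager
def allow_pickle_np_load():
    """Temporarily make np.load default to allow_pickle=True (hypothetical helper)."""
    original_load = np.load

    def patched_load(*args, **kwargs):
        # Only set a default; an explicit allow_pickle keyword argument still wins.
        kwargs.setdefault("allow_pickle", True)
        return original_load(*args, **kwargs)

    np.load = patched_load
    try:
        yield
    finally:
        # Always restore the original function, even if loading fails.
        np.load = original_load


# Demonstration with a small object array that mimics variable-length newswires.
ragged = np.empty(2, dtype=object)
ragged[0] = [1, 2, 3]
ragged[1] = [4, 5]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.npy")
    np.save(demo_path, ragged)
    with allow_pickle_np_load():
        loaded = np.load(demo_path)

print(loaded[0], loaded[1])
```

With such a helper, the `data.load_data(num_words=num_of_words)` call could be placed inside a `with allow_pickle_np_load():` block instead of editing `reuters.py`.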

In [5]:
num_of_words=10000  # keep the top 10,000 most frequently occurring words in the training data
(x_train, y_train), (x_test, y_test) = data.load_data(num_words=num_of_words)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters.npz


## Explore the data

In [6]:
print("Training entries: {}, labels: {}".format(len(x_train), len(y_train)))

Training entries: 8982, labels: 8982


In [7]:
print('Number of words in the first and second newswires')
len(x_train[0]), len(x_train[1])

Number of words in the first and second newswires


(87, 56)

In [8]:
print('What the first newswire looks like:')
print(x_train[0])

What the first newswire looks like:
[1, 2, 2, 8, 43, 10, 447, 5, 25, 207, 270, 5, 3095, 111, 16, 369, 186, 90, 67, 7, 89, 5, 19, 102, 6, 19, 124, 15, 90, 67, 84, 22, 482, 26, 7, 48, 4, 49, 8, 864, 39, 209, 154, 6, 151, 6, 83, 11, 15, 22, 155, 11, 15, 7, 48, 9, 4579, 1005, 504, 6, 258, 6, 272, 11, 15, 22, 134, 44, 11, 15, 16, 8, 197, 1245, 90, 67, 52, 29, 209, 30, 32, 132, 6, 109, 15, 17, 12]


In [27]:
print("Testing entries: {}, labels: {}".format(len(x_test), len(y_test)))

Testing entries: 2246, labels: 2246
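Before fixing a maximum sequence length, it can help to look at how long the newswires actually are. The sketch below summarises sequence lengths. `length_summary` is a hypothetical helper, and the toy `sequences` list stands in for `x_train` so that the example runs without downloading the dataset.

```python
import numpy as np


def length_summary(sequences, max_length):
    """Summarise token counts and the share of sequences longer than max_length."""
    lengths = np.array([len(seq) for seq in sequences])
    return {
        "min": int(lengths.min()),
        "median": float(np.median(lengths)),
        "max": int(lengths.max()),
        "share_longer": float(np.mean(lengths > max_length)),
    }


# Toy stand-in for x_train: three "newswires" of different lengths.
sequences = [[1, 2, 3], [1, 2, 3, 4, 5, 6], [1, 2]]
summary = length_summary(sequences, max_length=4)
print(summary)
```

Calling `length_summary(x_train, 256)` on the unpadded training data would show what share of newswires exceed a candidate limit of 256 tokens.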


## Convert the integers back to words

In [9]:
# A dictionary mapping words to an integer index
word_index = tf.keras.datasets.reuters.get_word_index()

# The first indices are reserved
word_index = {k:(v+3) for k,v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3

reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters_word_index.json


In [10]:
decode_review(x_train[0])

'<START> <UNK> <UNK> said as a result of its december acquisition of space co it expects earnings per share in 1987 of 1 15 to 1 30 dlrs per share up from 70 cts in 1986 the company said pretax net should rise to nine to 10 mln dlrs from six mln dlrs in 1986 and rental operation revenues to 19 to 22 mln dlrs from 12 5 mln dlrs it said cash flow per share this year should be 2 50 to three dlrs reuter 3'

## Prepare the data

In [20]:
# Pad or truncate each newswire to 256 tokens. By default pad_sequences pads and
# truncates at the start, so longer newswires keep their last 256 tokens.
max_newswire_length = 256
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_newswire_length)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_newswire_length)

In [21]:
print('Number of words in the first and second newswires:')
len(x_train[0]), len(x_train[1])

Number of words in the first and second newswires:


(256, 256)

In [22]:
print('Padded first newswire:')
print(x_train[0])

Padded first newswire:
[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    1    2    2    8   43   10  447    5   25  207  270    5 3095
  111   16  369  186   90   67    7   89    5   19  102
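The leading zeros above come from the defaults of `pad_sequences`, which pads at the front (`padding='pre'`) and, for sequences longer than `maxlen`, also drops tokens from the front (`truncating='pre'`). The following NumPy sketch mimics that default behaviour on toy data. `pad_pre` is a hypothetical re-implementation for illustration, not the Keras function itself.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last maxlen tokens of long ones."""
    padded = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # nothing to copy; the row stays all padding
        kept = seq[-maxlen:]  # 'pre' truncation drops tokens from the start
        padded[row, -len(kept):] = kept  # 'pre' padding fills the left side
    return padded


toy = [[1, 5, 9], [1, 2, 3, 4, 5, 6]]
result = pad_pre(toy, maxlen=4)
print(result)
# [[0 1 5 9]
#  [3 4 5 6]]
```

To keep the beginning of each newswire instead, `pad_sequences` accepts `truncating='post'`.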

## Build the models

In [32]:
# Model 0
embedding_vector_length = 32
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=num_of_words,
                                 output_dim=embedding_vector_length,
                                 input_length=max_newswire_length))
model.add(keras.layers.LSTM(100))
model.add(keras.layers.Dense(46, activation='softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 256, 32)           320000    
_________________________________________________________________
lstm_2 (LSTM)                (None, 100)               53200     
_________________________________________________________________
dense_2 (Dense)              (None, 46)                4646      
Total params: 377,846
Trainable params: 377,846
Non-trainable params: 0
_________________________________________________________________
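The parameter counts in the summary can be reproduced by hand. An embedding layer stores one vector per vocabulary entry, a standard LSTM layer has four gates that each hold an input kernel, a recurrent kernel, and a bias, and a dense layer has a weight matrix plus a bias. A small sketch of that arithmetic (the helper names are hypothetical):

```python
def embedding_params(vocab_size, embedding_dim):
    """One embedding vector per vocabulary entry."""
    return vocab_size * embedding_dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per output unit."""
    return input_dim * units + units


# Model 0: Embedding(10000 -> 32), LSTM(100), Dense(46)
model_0_counts = [
    embedding_params(10000, 32),
    lstm_params(32, 100),
    dense_params(100, 46),
]
print(model_0_counts, sum(model_0_counts))
# [320000, 53200, 4646] 377846
```

The same helpers reproduce the counts for the stacked models, for example `lstm_params(32, 64)` gives 24832 and `lstm_params(64, 64)` gives 33024.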


In [41]:
model.compile(loss='sparse_categorical_crossentropy',  # expects integer targets
              optimizer='adam',
              metrics=['accuracy'])

In [76]:
# Model 1
model_1 = keras.models.Sequential()
model_1.add(keras.layers.Embedding(input_dim=num_of_words, # vocabulary size
                                   output_dim=32,
                                   input_length=max_newswire_length)
           )
model_1.add(keras.layers.LSTM(64, return_sequences=True))
model_1.add(keras.layers.LSTM(64))
model_1.add(keras.layers.Dense(46, activation='sigmoid'))

model_1.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, 256, 32)           320000    
_________________________________________________________________
lstm_8 (LSTM)                (None, 256, 64)           24832     
_________________________________________________________________
lstm_9 (LSTM)                (None, 64)                33024     
_________________________________________________________________
dense_5 (Dense)              (None, 46)                2990      
Total params: 380,846
Trainable params: 380,846
Non-trainable params: 0
_________________________________________________________________


In [80]:
model_1.compile(loss='sparse_categorical_crossentropy',
                optimizer='adam',  # rmsprop Loss: 1.9, Accuracy: 0.5
                metrics=['accuracy'])

In [84]:
# Model 2
num_classes = np.max(y_train) + 1  # 46
# One-hot encode the labels for use with loss='categorical_crossentropy'
binary_y_train = keras.utils.to_categorical(y_train, num_classes)
binary_y_test = keras.utils.to_categorical(y_test, num_classes)
print('binary_y_train shape:', binary_y_train.shape)
print('binary_y_test shape:', binary_y_test.shape)

model_2 = keras.models.Sequential()
model_2.add(keras.layers.Embedding(input_dim=num_of_words, # vocabulary size
                                   output_dim=32,
                                   input_length=max_newswire_length)
           )
model_2.add(keras.layers.LSTM(128, return_sequences=True))
model_2.add(keras.layers.Dropout(0.5))
model_2.add(keras.layers.LSTM(64))
model_2.add(keras.layers.Dense(46, activation='sigmoid'))

model_2.summary()

binary_y_train shape: (8982, 46)
binary_y_test shape: (2246, 46)
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_7 (Embedding)      (None, 256, 32)           320000    
_________________________________________________________________
lstm_10 (LSTM)               (None, 256, 128)          82432     
_________________________________________________________________
dropout (Dropout)            (None, 256, 128)          0         
_________________________________________________________________
lstm_11 (LSTM)               (None, 64)                49408     
_________________________________________________________________
dense_6 (Dense)              (None, 46)                2990      
Total params: 454,830
Trainable params: 454,830
Non-trainable params: 0
_________________________________________________________________

In [85]:
model_2.compile(loss='categorical_crossentropy',
                optimizer='adam',
                metrics=['accuracy'])
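Models 0 and 1 use `sparse_categorical_crossentropy` with integer labels, while Model 2 uses `categorical_crossentropy` with one-hot labels. For the same predicted probabilities the two losses compute the same quantity, and only the label encoding differs. The NumPy sketch below illustrates this with made-up probabilities for three classes. It is a simplified illustration, not the Keras implementation, which also handles details such as clipping.

```python
import numpy as np

# Made-up predicted probabilities for two samples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

# Integer labels, as used with the sparse loss.
labels = np.array([0, 2])

# One-hot labels, comparable to what keras.utils.to_categorical produces.
one_hot = np.eye(3)[labels]

# Sparse form: pick the probability of the true class directly.
sparse_loss = -np.log(probs[np.arange(len(labels)), labels])

# Dense form: sum over classes, where only the true class contributes.
dense_loss = -np.sum(one_hot * np.log(probs), axis=1)

print(sparse_loss)
print(dense_loss)
```

Both prints show the same per-sample losses, so the choice between the two is mainly about label format and memory use rather than about the objective.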

## Train the models

In [65]:
# Model 0
model.fit(x_train,
          y_train,
          epochs=4,
          batch_size=32,
          validation_split=0.1,  # Do not use test data for validation
          verbose=1)

Train on 8083 samples, validate on 899 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<tensorflow.python.keras.callbacks.History at 0xb352c40b8>
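With `validation_split=0.1`, Keras holds out the last 10% of the samples it is given (taken before any shuffling) for validation and trains on the rest, which is where the 8083/899 split in the log comes from. A sketch of the equivalent slicing, assuming the 8982 training samples reported earlier; the `samples` array is a placeholder for the real data:

```python
import numpy as np

num_samples = 8982
validation_split = 0.1

# Placeholder for the training data: one index per sample.
samples = np.arange(num_samples)

# The split point is rounded down, so the validation part gets the remainder.
split_at = int(num_samples * (1.0 - validation_split))
train_part, val_part = samples[:split_at], samples[split_at:]

print(len(train_part), len(val_part))
# 8083 899
```

Because the held-out samples are simply the tail of the array, any ordering in the data carries over into the validation set.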

In [81]:
# Model 1
model_1.fit(x_train,
            y_train,
            epochs=4,
            batch_size=64,
            validation_split=0.1,  # Do not use test data for validation
            verbose=1)

Train on 8083 samples, validate on 899 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<tensorflow.python.keras.callbacks.History at 0xb3af5c128>

In [89]:
# Model 2
model_2.fit(x_train,
            binary_y_train,
            epochs=4,
            batch_size=64,
            validation_split=0.1,  # Do not use test data for validation
            verbose=1)

Train on 8083 samples, validate on 899 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<tensorflow.python.keras.callbacks.History at 0xb3da00400>

## Evaluate the models

In [67]:
# Use the test data only once for the final evaluation
model0_res = model.evaluate(x_test, y_test)

print('Model 0:\nLoss: {}, Accuracy: {}'.format(*model0_res))

Model 0:
Loss: 1.6141657850515174, Accuracy: 0.5979518890380859


In [79]:
model_1_res = model_1.evaluate(x_test, y_test)

print('Model 1:\nLoss: {}, Accuracy: {}'.format(*model_1_res))

Model 1:
Loss: 1.904869069186896, Accuracy: 0.5066785216331482


In [91]:
model_2_res = model_2.evaluate(x_test, binary_y_test)

print('Model 2:\nLoss: {}, Accuracy: {}'.format(*model_2_res))

Model 2:
Loss: 2.417198654380316, Accuracy: 0.36197686195373535


## Summary

I built 3 models with arbitrarily chosen parameters. The simplest model, of roughly the following form:


         +-------------------+   +----------------+   +------------+
    x -->| Embedding         |-->| LSTM (100)     |-->| DenseLayer |--> y
         | (out_shape=256,32)|   |                |   | (softmax)  |
         +-------------------+   +----------------+   +------------+

with `loss='sparse_categorical_crossentropy'`, `optimizer='adam'`, `batch_size=32`, and `epochs=4`, has a test <u><b>accuracy</b></u> of approximately <u><b>0.6</b></u>.


Then I made random modifications to this model to produce two more models of the following forms:

         +-------------------+   +----------------+   +----------------+   +------------+
    x -->| Embedding         |-->| LSTM (256, 64) |-->| LSTM (64)      |-->| DenseLayer |--> y
         | (out_shape=256,32)|   |                |   |                |   | (sigmoid)  |
         +-------------------+   +----------------+   +----------------+   +------------+

and


         +-------------------+   +----------------+   +----------------+   +------------+
    x -->| Embedding         |-->| LSTM (256, 128)|-->| LSTM (64)      |-->| DenseLayer |--> y
         | (out_shape=256,32)|   | Dropout (0.5)  |   |                |   | (sigmoid)  |
         +-------------------+   +----------------+   +----------------+   +------------+

which have test accuracy scores of approximately 0.5 and 0.4, respectively.
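One of the differences between the three models is the output activation: softmax in the first model, sigmoid in the other two. The sketch below uses made-up scores for three classes to show how the two activations differ. Softmax yields a probability distribution over mutually exclusive classes, whereas sigmoid scores each class independently, so its outputs need not sum to 1. The sketch makes no claim about how much this contributes to the accuracy gap.

```python
import numpy as np


def softmax(z):
    """Normalise scores into a probability distribution."""
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()


def sigmoid(z):
    """Squash each score into (0, 1) independently of the others."""
    return 1.0 / (1.0 + np.exp(-z))


# Made-up pre-activation scores for three classes.
logits = np.array([2.0, 1.0, -1.0])

softmax_out = softmax(logits)
sigmoid_out = sigmoid(logits)

print(softmax_out, softmax_out.sum())
print(sigmoid_out, sigmoid_out.sum())
```

For single-label problems such as this one, where each newswire has exactly one topic, softmax is the more conventional choice for the output layer.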

My conclusion is that the simplest model works best. It can probably be improved, but not by modifying its parameters at random without experience or intuition about what might work better. A more fruitful approach might be a grid search over the parameters, but that would take an enormous amount of time, so I decided against it.
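To give a sense of what such a grid search involves, here is a minimal sketch using `itertools.product`. The grid values and the `train_and_score` function are hypothetical placeholders: in practice that function would build, train, and validate a Keras model for the given settings, which is where the time goes. Here it returns a made-up score so that the loop itself can be run.

```python
import itertools

# Hypothetical search space; the values are illustrative only.
param_grid = {
    "lstm_units": [64, 100, 128],
    "batch_size": [32, 64],
    "dropout": [0.0, 0.5],
}


def train_and_score(lstm_units, batch_size, dropout):
    """Placeholder: a real version would train a model and return validation accuracy."""
    # Made-up formula, used only to keep the example runnable.
    return 1.0 / (1.0 + abs(lstm_units - 100) / 100 + dropout + batch_size / 1000)


# Enumerate every combination of the grid values.
names = list(param_grid)
combinations = [
    dict(zip(names, values))
    for values in itertools.product(*param_grid.values())
]

# Score each combination and keep the best one.
scored = [(train_and_score(**params), params) for params in combinations]
best_score, best_params = max(scored, key=lambda item: item[0])

print(len(combinations), best_params)
```

Even this small grid has 12 combinations, each requiring a full training run, and the count multiplies with every parameter added.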