## Data

This dataset consists of 11,228 newswires from the Reuters news agency. Each wire is encoded as a sequence of word indexes, just like the IMDB data we encountered in lecture 5 of this series. Each wire is also categorised into one of 46 topics, which serves as its label. The dataset is available through the Keras API.

## Goal

The goal is to build a multi-layer perceptron (MLP) in Keras and train it to classify each newswire into one of the 46 topics.

In [1]:
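import importlib
import subprocess
import sys

# Install keras and h5py if they are missing. pip.main() was removed in pip 10,
# so pip is invoked as a subprocess of the current interpreter instead.
for package in ('keras', 'h5py'):
    try:
        importlib.import_module(package)
    except ImportError:
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', package])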

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import to_categorical

seed = 1337
np.random.seed(seed)

Using TensorFlow backend.


In [2]:
from keras.datasets import reuters

max_words = 1000
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=max_words,
                                                         test_split=0.2,
                                                         seed=seed)
num_classes = np.max(y_train) + 1  # labels are 0-based integers, so this gives 46

Downloading data from https://s3.amazonaws.com/text-datasets/reuters.npz

In [3]:
from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')
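
To make the `mode='binary'` step concrete, here is a minimal NumPy sketch of the same idea: each wire becomes a fixed-length vector with a 1 in every column whose word index occurs in the wire. The helper `sequences_to_binary_matrix` and the toy data are hypothetical and only illustrate the encoding; skipping out-of-range indices is an assumption about how such indices should be handled, not a statement about the Keras internals.

```python
import numpy as np

def sequences_to_binary_matrix(sequences, num_words):
    """Multi-hot encode: row i has a 1 in column j if word index j occurs in sequence i."""
    matrix = np.zeros((len(sequences), num_words), dtype=np.float32)
    for row, seq in enumerate(sequences):
        for idx in seq:
            if 0 <= idx < num_words:  # assumption: ignore indices outside the vocabulary
                matrix[row, idx] = 1.0
    return matrix

# Two toy "wires" over a vocabulary of 6 word indexes (hypothetical data).
toy_sequences = [[1, 4, 4, 2], [3, 1]]
toy_matrix = sequences_to_binary_matrix(toy_sequences, num_words=6)
print(toy_matrix)
```

Word order and repeat counts are discarded by this encoding: the repeated index 4 in the first toy wire still produces a single 1.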

In [4]:
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
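
The label conversion above is a one-hot encoding: an integer topic id becomes a vector of length `num_classes` with a single 1. A rough NumPy equivalent, using a hypothetical helper `one_hot` and made-up labels, could look like this:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows of length num_classes."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set exactly one entry per row: the column given by the label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

toy_labels = [3, 0, 45]  # hypothetical topic ids in the range 0..45
toy_encoded = one_hot(toy_labels, num_classes=46)
print(toy_encoded.shape)
```

Taking `argmax` along each row recovers the original integer labels.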

In [5]:
model = Sequential()
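model.add(Dense(512, activation='relu', input_shape=(max_words,)))  # hidden layer
model.add(Dropout(0.5))  # randomly zero half of the hidden units during training
model.add(Dense(num_classes, activation='softmax'))  # one probability per topic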
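
As an illustration of what this architecture computes, the sketch below runs an inference-time forward pass in plain NumPy and counts the trainable parameters. The weights are random placeholders, not the trained Keras weights, and the names `W1`, `b1`, `W2`, `b2` and `forward` are hypothetical. Dropout is left out because it is only active during training.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 1000, 512, 46

# Random placeholder weights (assumption: for illustration only).
W1 = rng.normal(scale=0.01, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

def forward(x):
    """Dense(512, relu) followed by Dense(46, softmax), without dropout."""
    hidden = np.maximum(0.0, x @ W1 + b1)                   # ReLU
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)             # softmax

n_params = W1.size + b1.size + W2.size + b2.size
x_batch = rng.integers(0, 2, size=(4, n_inputs)).astype(np.float32)
probs = forward(x_batch)
print(n_params, probs.shape)
```

The parameter count is (1000 × 512 + 512) + (512 × 46 + 46) = 536,110, almost all of it in the first layer.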

In [11]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
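
For intuition about the loss chosen here, categorical cross-entropy for a one-hot target reduces to the negative log of the probability assigned to the true class. The following is a minimal NumPy sketch with made-up predictions; the function and the clipping constant `eps` are assumptions for illustration, not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Toy example with 3 classes (hypothetical values).
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.5, 0.25, 0.25]])
losses = categorical_crossentropy(y_true, y_pred)
mean_loss = losses.mean()
print(losses, mean_loss)
```

A confident correct prediction (0.8 on the true class) is penalised less than an uncertain one (0.5).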

In [12]:
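from keras import backend as K

# Restrict TensorFlow to a single thread to reduce run-to-run variation.
# This relies on the TensorFlow 1.x session API exposed through the Keras backend.
session_config = K.tf.ConfigProto(intra_op_parallelism_threads=1,
                                  inter_op_parallelism_threads=1)
K.set_session(K.tf.Session(config=session_config))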


In [13]:
batch_size = 32
model.fit(x_train, y_train, batch_size=batch_size, epochs=5)
score = model.evaluate(x_test, y_test, batch_size=batch_size)  # [loss, accuracy]

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
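
As a quick sanity check on the training setup, the arithmetic below estimates how many wires land in each split and how many mini-batches make up one epoch. It assumes the split point is computed by truncating `n_total * (1 - test_split)` to an integer; that is an assumption about the loader, not something shown in the output above.

```python
import math

n_total = 11228      # wires in the dataset
test_split = 0.2
batch_size = 32

# Assumption: the training set size is the truncated 80% share.
n_train = int(n_total * (1 - test_split))
n_test = n_total - n_train

# The last batch of an epoch may be smaller than batch_size, hence the ceiling.
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_test, steps_per_epoch)
```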

In [14]:
score[1]

0.80365093499554763
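
The accuracy reported here is the fraction of test wires whose highest-probability topic matches the true topic. A small NumPy sketch of that computation on made-up predictions follows; the `accuracy` helper is hypothetical. The last two lines convert the reported score into a count of correct wires, assuming the test set holds 2,246 wires (20% of 11,228).

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    """Fraction of rows where the predicted argmax equals the true class."""
    return float(np.mean(y_true_one_hot.argmax(axis=1) == y_prob.argmax(axis=1)))

# Toy example: 4 samples, 3 classes (hypothetical values).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.6, 0.3],
                   [0.5, 0.3, 0.2],   # wrong: predicts class 0, truth is class 2
                   [0.2, 0.5, 0.3]])
toy_accuracy = accuracy(y_true, y_prob)

n_test = 2246  # assumption: size of the 20% test split
n_correct = round(0.80365093499554763 * n_test)
print(toy_accuracy, n_correct)
```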