## Classifying Newswire Topics
*A multiclass classification example*
### Loading the Dataset
Load the Reuters dataset, which contains 11,228 newswires labeled with 46 mutually exclusive topics. The vocabulary is restricted to the 10,000 most frequent words.
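The 11,228 newswires are divided into a training and a test set by `load_data`'s default `test_split=0.2`. The sketch below reproduces the resulting sizes, assuming the cut point is computed as `int(len(xs) * (1 - test_split))`, which is our reading of the Keras source. `split_sizes` is a hypothetical helper and is not part of Keras.

```python
def split_sizes(n_samples, test_split=0.2):
    """Hypothetical helper: train/test sizes for a single cut point.

    Assumes the cut index is int(n_samples * (1 - test_split)),
    which is how we understand Keras splits the Reuters data.
    """
    cut = int(n_samples * (1 - test_split))
    return cut, n_samples - cut

train_size, test_size = split_sizes(11228)
print(train_size, test_size)  # 8982 2246
print(f"{train_size / 11228:.1%} / {test_size / 11228:.1%}")  # 80.0% / 20.0%
```

Because `int()` truncates, the 0.4 of a sample left over from 11,228 × 0.8 = 8,982.4 goes to the test set.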

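```python
# (optional) force CPU-only execution, e.g. on a laptop without a usable GPU;
# set these before TensorFlow is imported so they take effect
import os

# enumerate GPUs in PCI bus order (consistent with nvidia-smi)
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# hide all GPUs so TensorFlow falls back to the CPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```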
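These variables are read when CUDA is initialized, so setting them after TensorFlow (or Keras with the TensorFlow backend) has already started using the GPU may have no effect. A minimal sketch of a guard for that ordering, assuming the backend's module name is `tensorflow`; `force_cpu` is a hypothetical helper:

```python
import os
import sys
import warnings

def force_cpu():
    """Hypothetical helper: hide all CUDA devices from TensorFlow.

    Call it before TensorFlow is imported; afterwards the GPU may
    already be initialized and the setting can be ignored.
    """
    if "tensorflow" in sys.modules:
        warnings.warn("TensorFlow is already imported; "
                      "CUDA_VISIBLE_DEVICES may have no effect.")
    # enumerate GPUs in PCI bus order so IDs match nvidia-smi
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # "-1" matches no device, so no GPU is visible
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

force_cpu()
print(os.environ["CUDA_VISIBLE_DEVICES"])  # -1
```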

```python
from keras.datasets import reuters

# load data, keeping only the 10,000 most frequent words; the default
# test_split=0.2 gives 8,982 training (~80%) and 2,246 test (~20%) newswires
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)

# invert the word -> index mapping into index -> word
word_index = reuters.get_word_index()
reverse_word_index = {val: key for key, val in word_index.items()}

# decode a newswire back to text; indices are offset by 3 because
# 0, 1 and 2 are reserved for padding, start of sequence and unknown words
def decode_newswire(newswire):
    return " ".join(reverse_word_index.get(i - 3, "?") for i in newswire)

print(decode_newswire(train_data[0]))
```

```
? ? ? said as a result of its december acquisition of space co it expects earnings per share in 1987 of 1 15 to 1 30 dlrs per share up from 70 cts in 1986 the company said pretax net should rise to nine to 10 mln dlrs from six mln dlrs in 1986 and rental operation revenues to 19 to 22 mln dlrs from 12 5 mln dlrs it said cash flow per share this year should be 2 50 to three dlrs reuter 3
```
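The `i - 3` offset in `decode_newswire` follows from `load_data`'s defaults (`start_char=1`, `oov_char=2`, `index_from=3`): word ranks are shifted by 3 so that 0 (padding), 1 (start of sequence) and 2 (out-of-vocabulary) stay reserved, which is why reserved tokens decode to `?`. The sketch replays that decoding on a made-up three-word index (the toy `word_index` is hypothetical, not Reuters data), so it runs without downloading anything:

```python
# Toy word index (hypothetical): word -> frequency rank, starting at 1
word_index = {"the": 1, "said": 2, "company": 3}
# Invert the mapping: rank -> word
reverse_word_index = {rank: word for word, rank in word_index.items()}

INDEX_FROM = 3  # assumed to match load_data's default index_from

def decode_newswire(newswire):
    # Shift back by INDEX_FROM; reserved indices 0-2 become negative
    # keys that are missing from the dict, so they decode to "?"
    return " ".join(reverse_word_index.get(i - INDEX_FROM, "?") for i in newswire)

# 1 = start marker, 4/6/5 = "the"/"company"/"said", 2 = out-of-vocabulary
encoded = [1, 4, 6, 5, 2]
print(decode_newswire(encoded))  # ? the company said ?
```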


### Preparing the Data

```python
import numpy as np

# multi-hot encode: each sequence becomes a vector of length `dimension`
# with 1.0 at every word index it contains (word order and counts are lost)
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1
    return results

# vectorize training and test newswires
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
```
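Since `num_words=10000` keeps every word index below 10,000, a 10,000-wide vector per newswire is enough for this multi-hot encoding. A minimal sketch of a loop-free variant on toy sequences, assuming NumPy; `vectorize_sequences_fast` is a hypothetical name, and `float32` is chosen here only to halve memory relative to the default `float64`:

```python
import numpy as np

def vectorize_sequences_fast(sequences, dimension=10000):
    """Hypothetical loop-free multi-hot encoder (same 0/1 pattern as the loop)."""
    results = np.zeros((len(sequences), dimension), dtype=np.float32)
    if len(sequences) == 0:
        return results
    # Row index for every token, aligned with the flattened token list
    rows = np.repeat(np.arange(len(sequences)), [len(s) for s in sequences])
    cols = np.concatenate([np.asarray(s, dtype=np.int64) for s in sequences])
    results[rows, cols] = 1.0  # repeated words just set 1.0 again
    return results

toy = [[1, 3, 3], [0, 4]]
print(vectorize_sequences_fast(toy, dimension=5))
# [[0. 1. 0. 1. 0.]
#  [1. 0. 0. 0. 1.]]
```

Paired row/column index arrays let NumPy's advanced indexing write all ones in a single assignment instead of one Python iteration per newswire.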

```python
# one-hot encode labels: 1.0 at the topic ID (0-45), 0.0 elsewhere
def encode_one_hot(labels, dimension=46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1
    return results

# one-hot encode training and test labels
one_hot_train_labels = encode_one_hot(train_labels)
one_hot_test_labels = encode_one_hot(test_labels)
```
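`encode_one_hot` places exactly one 1 per row, at the topic ID. The same matrix can be built by selecting rows of an identity matrix, and `argmax` recovers the integer labels, which is also how a predicted topic is read off a softmax output. A minimal NumPy sketch on toy labels; `one_hot` is a hypothetical helper, and Keras also provides `keras.utils.to_categorical` for this task:

```python
import numpy as np

def one_hot(labels, dimension=46):
    """Hypothetical helper: row i of the identity matrix encodes class i."""
    return np.eye(dimension, dtype=np.float32)[np.asarray(labels)]

labels = [3, 0, 45]
encoded = one_hot(labels)
print(encoded.shape)           # (3, 46)
print(encoded.argmax(axis=1))  # [ 3  0 45]
```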