# Ch. 18 - Temporal Order Matters

In language, the order of words matters. The sentences 'The dog lies on the couch' and 'The couch lies on the dog' contain exactly the same words, yet they describe two very different situations. Our previous model did not take word order into account. In this chapter we will look at two methods that give your model access to the information carried by word order.

## 1D Convolutions
You might remember convolutional neural networks from computer vision week. In computer vision, convolutional filters slide over the image in two dimensions. There is also a version of convolutional filters that slides over a sequence in one dimension. The output is another sequence, much like the output of a two-dimensional convolution was another 'image'. Apart from the number of dimensions the filter slides over, 1D convolutions work the same way as 2D convolutions: each filter has a fixed window size, reuses the same weights at every position, and produces one output channel.
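To make the sliding-window idea concrete, here is a minimal sketch of a single 1D filter in plain Python. It assumes one input channel, no padding, and a stride of 1; the function name `conv1d_valid` and the toy values are made up for illustration and are not part of Keras. A real ``Conv1D`` layer additionally sums over all input channels, adds a bias, and learns its weights.

```python
def conv1d_valid(sequence, kernel):
    """Slide `kernel` over `sequence` (no padding, stride 1) and
    return the dot product at every position."""
    window = len(kernel)
    return [
        sum(x * w for x, w in zip(sequence[start:start + window], kernel))
        for start in range(len(sequence) - window + 1)
    ]


signal = [1, 2, 3, 4, 5, 6]   # toy sequence of length 6
kernel = [1, 0, -1]           # hypothetical filter with window size 3

output = conv1d_valid(signal, kernel)
print(output)       # [-2, -2, -2, -2]
print(len(output))  # 4, i.e. 6 - 3 + 1
```

The output is again a sequence, shorter than the input by the window size minus one.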

To make it a bit easier we can download the IMDB dataset directly through Keras with tokenization already done:

In [3]:
from keras.datasets import imdb
from keras.preprocessing import sequence

max_words = 10000  # Our vocabulary of 10K words
max_len = 500  # Limit each text to 500 words

# Get data from Keras, keeping only the max_words most frequent words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_words)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

25000 train sequences
25000 test sequences


In [4]:
# Pad sequences
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

x_train shape: (25000, 500)
x_test shape: (25000, 500)
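With its default settings, ``pad_sequences`` pads short sequences with zeros at the front and truncates long sequences from the front as well, so the last ``maxlen`` tokens are kept. The sketch below mimics that default behavior for a single sequence in plain Python; `pad_pre` is a hypothetical helper written for illustration, not a Keras function, and the token ids are made up.

```python
def pad_pre(tokens, maxlen, value=0):
    """Hypothetical helper: mimic the default behavior of pad_sequences
    for one sequence (pad at the front, truncate at the front)."""
    tokens = tokens[-maxlen:]                          # keep the last maxlen tokens
    return [value] * (maxlen - len(tokens)) + tokens   # left-pad with `value`


short_review = [11, 12, 13]                   # made-up token ids
long_review = [21, 22, 23, 24, 25, 26, 27]

print(pad_pre(short_review, 5))  # [0, 0, 11, 12, 13]
print(pad_pre(long_review, 5))   # [23, 24, 25, 26, 27]
```

After padding, every sequence has the same length, which is why ``x_train`` can be stored as a single array of shape ``(25000, 500)``.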


## Building the conv model

Now we build our convolutional model. You will notice a couple of new layers alongside ``Conv1D``:

- [``MaxPooling1D``](https://keras.io/layers/pooling/#maxpooling1d) works exactly like ``MaxPooling2D``, which we used earlier. It takes a window of the sequence with a specified length and returns the maximum element in that window, much like it returned the maximum element of a small window in 2D convolutional networks. Note that max pooling is applied to each channel separately, so it returns one maximum per channel.
- [``GlobalMaxPooling1D``](https://keras.io/layers/pooling/#globalmaxpooling1d) returns the maximum over the entire sequence, again for each channel.
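The two pooling operations can be sketched for a single channel in plain Python. This is an illustration under simplifying assumptions, not the Keras implementation: the function names are made up, the windows do not overlap, and leftover elements at the end that do not fill a whole window are dropped, which matches the default settings of ``MaxPooling1D``.

```python
def max_pool_1d(sequence, pool_size):
    """Maximum of each non-overlapping window of length `pool_size`.
    Elements at the end that do not fill a whole window are dropped."""
    n_windows = len(sequence) // pool_size
    return [
        max(sequence[i * pool_size:(i + 1) * pool_size])
        for i in range(n_windows)
    ]


def global_max_pool_1d(sequence):
    """Maximum over the entire sequence: the temporal dimension disappears."""
    return max(sequence)


channel = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]   # one channel, 11 time steps

print(max_pool_1d(channel, 5))       # [5, 9]  (still a sequence, but shorter)
print(global_max_pool_1d(channel))   # 9       (a single number per channel)
```

In a layer with 32 channels, the same operation runs on each channel independently, so global pooling leaves a vector of 32 values.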

You can see the difference between the two in the model summary below. While ``MaxPooling1D`` significantly shortens the sequence, ``GlobalMaxPooling1D`` removes the temporal dimension entirely:

In [6]:
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, GlobalMaxPooling1D, Dense

model = Sequential()
model.add(Embedding(max_words, 100, input_length=max_len)) # We train our own embeddings
model.add(Conv1D(32, 7, activation='relu')) # 1D Convolution, 32 channels, window size 7
model.add(MaxPooling1D(5)) # Pool windows of size 5
model.add(Conv1D(32, 7, activation='relu')) # Another 1D Convolution, 32 channels, window size 7
model.add(GlobalMaxPooling1D()) # Global Pooling
model.add(Dense(1, activation='sigmoid')) # Final Output Layer, sigmoid gives a probability for binary_crossentropy

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 500, 100)          1000000   
_________________________________________________________________
conv1d_5 (Conv1D)            (None, 494, 32)           22432     
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 98, 32)            0         
_________________________________________________________________
conv1d_6 (Conv1D)            (None, 92, 32)            7200      
_________________________________________________________________
global_max_pooling1d_3 (Glob (None, 32)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 33        
Total params: 1,029,665
Trainable params: 1,029,665
Non-trainable params: 0
_________________________________________________________________
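The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes what the layers above use by default: no padding and stride 1 for the convolutions, and non-overlapping pooling windows. The helper names are made up for illustration.

```python
def conv_out_len(length, kernel_size):
    """Sequence length after a convolution with no padding and stride 1."""
    return length - kernel_size + 1


def pool_out_len(length, pool_size):
    """Sequence length after non-overlapping max pooling."""
    return length // pool_size


def conv_params(kernel_size, in_channels, filters):
    """Weights (kernel_size * in_channels per filter) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


after_conv1 = conv_out_len(500, 7)          # 494
after_pool = pool_out_len(after_conv1, 5)   # 98
after_conv2 = conv_out_len(after_pool, 7)   # 92
print(after_conv1, after_pool, after_conv2)

param_counts = [
    10000 * 100,               # embedding: vocabulary size * embedding size
    conv_params(7, 100, 32),   # first Conv1D: 22432
    conv_params(7, 32, 32),    # second Conv1D: 7200
    32 * 1 + 1,                # dense: 32 weights + 1 bias
]
print(param_counts, sum(param_counts))      # total: 1029665
```

Almost all of the parameters sit in the embedding layer; the convolutional layers themselves are comparatively small.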


In [7]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['acc'])

In [8]:
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)

Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10

KeyboardInterrupt: 