In [1]:
import numpy as np
import matplotlib.pyplot as plt
from jupyterthemes import jtplot
jtplot.style()
%matplotlib notebook

# One Hot Encoding of Words

The purpose of this technique is to transform a list of documents into a tensor that can then be fed into a neural net. The output of this process can be either a 2D matrix or a 3D tensor. Both are binary, i.e., their elements are only 0s and 1s. The *hyperparameters* of this process are:

* The tokenization scheme (single words, 2-grams, 3-grams, etc.)
* Maximum size of the vocabulary (*v*)
* [Optionally] The maximum document length (in number of tokens) (*n*)

#### Create the vocabulary
Create a frequency distribution of tokens. Assign each token a unique id or index. Typically, the tokens are given indexes depending on their frequency in the corpus, so the most frequent token will have index 1, the next most frequent token will have index 2, and so on. The output of this step is two maps: `tokens`, which, given an index, gives the token at that index, i.e., `tokens[index] = token_at_index`, and `indexes`, which, given a token, gives the index of that token, i.e., `indexes[token] = index_of_token`. Note that index 0 is not assigned to any token.

The final vocabulary is supposed to contain the top *v* tokens. One way to implement this is to only consider tokens whose indexes are less than *v* and drop everything else. This is, however, not done at this step.

#### Transform documents into integer vectors
Replace each token in the document with its index. Then drop the indexes that are not part of the vocabulary. Each document vector will potentially have a different length. 

#### Transform documents into a 3D tensor
Each document is first truncated to the first *n* indexes. Then each index is replaced with a one-hot vector that is *v* in size. All elements of this vector are 0 except the element at the position given by the token's index, which is 1. Think of this as transforming each document into a matrix of size *n* x *v*. Each row of this document matrix is the one-hot vector of the word at that position. If a word occurs multiple times (say at positions *i* and *j*), the same one-hot vector will be repeated at rows *i* and *j*. This is done for all *m* documents, resulting in a tensor that is *m x n x v* in size.
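
As a small illustration of the shapes involved, here is a minimal sketch for a single made-up document vector. The values of `n`, `v`, and `doc_vec` are illustrative assumptions, and indexing into `np.eye(v)` is just one convenient way to build the one-hot rows.

```python
import numpy as np

n, v = 4, 5                      # illustrative sizes, not the ones used later
doc_vec = [1, 3, 1, 2, 4, 2]     # hypothetical document, already mapped to indexes

truncated = doc_vec[:n]          # keep only the first n indexes
doc_mat = np.eye(v)[truncated]   # row i is the one-hot vector of the i-th token

print(doc_mat.shape)             # (4, 5)
print(doc_mat)
```

Rows 0 and 2 come out identical because the same index (1) appears at both positions. Documents shorter than *n* are not handled in this sketch.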

#### Transform documents into a 2D matrix
As an alternate representation, instead of a 3D tensor, the output is a 2D matrix. Here there is no need to truncate the document. Each document is represented as a vector of size *v* where all the elements are 0 except those at indexes that appear in that document, which are set to 1. In this representation, neither the position of a word nor its frequency is preserved.
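
To make the loss of position and frequency concrete, here is a minimal sketch with two made-up document vectors. `multi_hot`, `doc_a`, and `doc_b` are hypothetical names introduced only for this illustration.

```python
import numpy as np

v = 5
doc_a = [1, 3, 1, 2]   # hypothetical document vectors
doc_b = [2, 1, 3, 3]   # same tokens, different order and counts

def multi_hot(doc_vec, v):
    """Return a vector of size v with 1s at every index present in doc_vec."""
    row = np.zeros(v)
    row[doc_vec] = 1   # repeated indexes just set the same element again
    return row

row_a = multi_hot(doc_a, v)
row_b = multi_hot(doc_b, v)
print(row_a.tolist())                 # [0.0, 1.0, 1.0, 1.0, 0.0]
print(np.array_equal(row_a, row_b))   # True
```

Both documents map to the same row even though their word order and word counts differ.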

## Creating sample documents
In order to better understand how word embeddings work, it is nice to have a synthetic set of documents with specific words and frequencies. Let us decide on having 3 documents and 10 words in the entire corpus. Let the 10 words be *pedantic, fruit, ornament, magic, laptop, ipad, book, console, piano, hugs*. Let the frequency of each word in each of the three documents be as shown in the code cell below.

In [2]:
# Create sample documents
word_freqs = {
    'pedantic': (3, 2, 4),
    'fruit': (0, 3, 5),
    'ornament': (2, 0, 6),
    'magic': (2, 0, 5),
    'laptop': (2, 3, 0),
    'ipad': (3, 1, 0),
    'book': (1, 1, 2),
    'console': (1, 1, 2),
    'piano': (2, 0, 0),
    'hugs': (1, 0, 0),
}

raw_docs = [[], [], []]
for i in range(len(raw_docs)):
    for word, freqs in word_freqs.items():
        words = [word] * freqs[i]
        raw_docs[i] += words

docs = []
for raw_doc in raw_docs:
    np.random.shuffle(raw_doc)
    doc = ' '.join(raw_doc)
    docs.append(doc)

# Print each document followed by its length in tokens
for doc in docs:
    print(doc, len(doc.split()), end='\n\n')

pedantic ornament magic pedantic pedantic piano book ipad piano hugs laptop ipad magic console ornament laptop ipad 17

console fruit laptop ipad fruit fruit book laptop pedantic laptop pedantic 11

magic magic pedantic console book fruit pedantic fruit ornament pedantic magic fruit ornament console ornament magic pedantic fruit ornament book magic ornament ornament fruit 24



In [3]:
# Create the vocabulary
from collections import Counter

def create_vocab(docs):
    all_tokens = []
    for doc in docs:
        for word in doc.split():
            all_tokens.append(word)
    
    word_freqs = Counter(all_tokens)
    tokens = []
    for word, freq in word_freqs.most_common():
        tokens.append(word)
    tokens = [None] + tokens
    
    indexes = {token: index for index, token in enumerate(tokens)}
    del indexes[None]
    
    return tokens, indexes

tokens, indexes = create_vocab(docs)
print(tokens)
print(indexes)

[None, 'pedantic', 'ornament', 'fruit', 'magic', 'laptop', 'book', 'ipad', 'console', 'piano', 'hugs']
{'pedantic': 1, 'ornament': 2, 'fruit': 3, 'magic': 4, 'laptop': 5, 'book': 6, 'ipad': 7, 'console': 8, 'piano': 9, 'hugs': 10}


In [4]:
# Transform documents to vectors
raw_vecs = [[indexes[token] for token in doc.split()] for doc in docs]

v = 7
vecs = [[indexes[token] for token in doc.split() if indexes[token] < v] for doc in docs]

for doc, raw_vec, vec in zip(docs, raw_vecs, vecs):
    print('\n')
    print(doc)
    print(raw_vec)
    print(vec, len(vec))



pedantic ornament magic pedantic pedantic piano book ipad piano hugs laptop ipad magic console ornament laptop ipad
[1, 2, 4, 1, 1, 9, 6, 7, 9, 10, 5, 7, 4, 8, 2, 5, 7]
[1, 2, 4, 1, 1, 6, 5, 4, 2, 5] 10


console fruit laptop ipad fruit fruit book laptop pedantic laptop pedantic
[8, 3, 5, 7, 3, 3, 6, 5, 1, 5, 1]
[3, 5, 3, 3, 6, 5, 1, 5, 1] 9


magic magic pedantic console book fruit pedantic fruit ornament pedantic magic fruit ornament console ornament magic pedantic fruit ornament book magic ornament ornament fruit
[4, 4, 1, 8, 6, 3, 1, 3, 2, 1, 4, 3, 2, 8, 2, 4, 1, 3, 2, 6, 4, 2, 2, 3]
[4, 4, 1, 6, 3, 1, 3, 2, 1, 4, 3, 2, 2, 4, 1, 3, 2, 6, 4, 2, 2, 3] 22


In [5]:
# Create 3D tensor
m = len(docs)
n = 15
X_3d = np.zeros((m, n, v))
for i, vec in enumerate(vecs):
    for j, index in enumerate(vec[:n]):
        X_3d[i, j, index] = 1
print(X_3d)

[[[ 0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  1.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  1.]
  [ 0.  0.  0.  0.  0.  1.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.]
  [ 0.  0.  1.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  1.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]]

 [[ 0.  0.  0.  1.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  1.  0.]
  [ 0.  0.  0.  1.  0.  0.  0.]
  [ 0.  0.  0.  1.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  1.]
  [ 0.  0.  0.  0.  0.  1.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  1.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.  1.  0.  0.]
  [ 

In [6]:
# Create 2D matrix
X_2d = np.zeros((m, v))
for i, vec in enumerate(vecs):
    for index in vec:
        X_2d[i, index] = 1
print(X_2d)

[[ 0.  1.  1.  0.  1.  1.  1.]
 [ 0.  1.  0.  1.  0.  1.  1.]
 [ 0.  1.  1.  1.  1.  0.  1.]]


## Using Keras for one-hot encoding

### 2D matrix representation
Keras has in-built support for this.

In [7]:
from keras.preprocessing.text import Tokenizer

# del indexes
# del tokens
# del vecs
# del raw_vecs
# del X_2d

tokenizer = Tokenizer(num_words=v)
tokenizer.fit_on_texts(docs)

# Keras only has the indexes map, so create the tokens map by hand
indexes_k = tokenizer.word_index
tokens_k = [None] * (len(indexes_k) + 1)
for word, index in indexes_k.items():
    tokens_k[index] = word
print(indexes_k)
print(tokens_k)

raw_vecs_k = [[indexes_k[token] for token in doc.split()] for doc in docs]
vecs_k = tokenizer.texts_to_sequences(docs)
for doc, raw_vec, vec in zip(docs, raw_vecs_k, vecs_k):
    print('\n')
    print(doc)
    print(raw_vec)
    print(vec)

X_2d_k = tokenizer.texts_to_matrix(docs)
print('\n', X_2d_k)
print(np.array_equal(X_2d, X_2d_k))

Using TensorFlow backend.


{'pedantic': 1, 'ornament': 2, 'fruit': 3, 'magic': 4, 'laptop': 5, 'book': 6, 'ipad': 7, 'console': 8, 'piano': 9, 'hugs': 10}
[None, 'pedantic', 'ornament', 'fruit', 'magic', 'laptop', 'book', 'ipad', 'console', 'piano', 'hugs']


pedantic ornament magic pedantic pedantic piano book ipad piano hugs laptop ipad magic console ornament laptop ipad
[1, 2, 4, 1, 1, 9, 6, 7, 9, 10, 5, 7, 4, 8, 2, 5, 7]
[1, 2, 4, 1, 1, 6, 5, 4, 2, 5]


console fruit laptop ipad fruit fruit book laptop pedantic laptop pedantic
[8, 3, 5, 7, 3, 3, 6, 5, 1, 5, 1]
[3, 5, 3, 3, 6, 5, 1, 5, 1]


magic magic pedantic console book fruit pedantic fruit ornament pedantic magic fruit ornament console ornament magic pedantic fruit ornament book magic ornament ornament fruit
[4, 4, 1, 8, 6, 3, 1, 3, 2, 1, 4, 3, 2, 8, 2, 4, 1, 3, 2, 6, 4, 2, 2, 3]
[4, 4, 1, 6, 3, 1, 3, 2, 1, 4, 3, 2, 2, 4, 1, 3, 2, 6, 4, 2, 2, 3]

 [[ 0.  1.  1.  0.  1.  1.  1.]
 [ 0.  1.  0.  1.  0.  1.  1.]
 [ 0.  1.  1.  1.  1.  0.  1.]]
True


### Note
The astute reader will have observed that there is an off-by-one bug in the code above. Even though I wanted the top 7 words in the vocabulary, I only get the top 6, because index 0 of the `tokens` list is always `None`. I deliberately introduced this bug in cell 4 to keep things consistent with Keras. To get the top 7 words, the less-than sign in the line that builds `vecs` should be replaced with a less-than-or-equal-to sign:
```
vecs = [[indexes[token] for token in doc.split() if indexes[token] <= v] for doc in docs]
```
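
The effect of the two comparisons can be checked on a toy map. This is a minimal sketch: `indexes_demo` and `v_demo` are made-up stand-ins for the notebook's `indexes` and `v`.

```python
# Hypothetical indexes map with 10 tokens, numbered from 1 as in the notebook
indexes_demo = {f'tok{i}': i for i in range(1, 11)}
v_demo = 7

kept_lt = [t for t, i in indexes_demo.items() if i < v_demo]    # indexes 1..6
kept_le = [t for t, i in indexes_demo.items() if i <= v_demo]   # indexes 1..7

print(len(kept_lt))  # 6
print(len(kept_le))  # 7
```

If the less-than-or-equal-to form were used, the one-hot vectors would presumably need *v* + 1 elements, since index *v* does not fit in an array of size *v*.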

### 3D tensor representation
Keras does not have in-built support for this, so we need to do some pre-processing. First, convert all document vectors to the same length *n*. Then turn each same-length document vector into a 2D matrix of size *n* x *v*. Do this for all *m* documents to get a tensor of size *m* x *n* x *v*.

Our documents are of unequal lengths (17, 11, and 24 tokens), and after dropping the out-of-vocabulary tokens (using *v* = 7) we end up with document vectors of size 10, 9, and 22. Let's choose *n* to be 15. This means that the first and second documents will have to be padded and the third will have to be truncated. By default the Keras padder pads and truncates from the front. To show equivalence with the earlier 3D tensor, we will ask Keras to pad and truncate from the end.
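
The difference between working from the front and from the end can be sketched in plain Python. `pad_or_truncate` is a hypothetical helper written for this illustration, not part of Keras; it folds padding and truncation into a single `where` argument and assumes *n* > 0.

```python
def pad_or_truncate(vec, n, where='post', pad_value=0):
    """Return a list of exactly n items.

    'post' keeps the first n items and pads at the end;
    'pre' keeps the last n items and pads at the front.
    """
    if where == 'post':
        kept = vec[:n]
        return kept + [pad_value] * (n - len(kept))
    kept = vec[-n:]
    return [pad_value] * (n - len(kept)) + kept

short_vec, long_vec = [3, 5, 3], [4, 4, 1, 6, 3, 1]
print(pad_or_truncate(short_vec, 5, 'post'))  # [3, 5, 3, 0, 0]
print(pad_or_truncate(long_vec, 5, 'post'))   # [4, 4, 1, 6, 3]
print(pad_or_truncate(short_vec, 5, 'pre'))   # [0, 0, 3, 5, 3]
print(pad_or_truncate(long_vec, 5, 'pre'))    # [4, 1, 6, 3, 1]
```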

In [8]:
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer

v = 7
tokenizer = Tokenizer(num_words=v)
tokenizer.fit_on_texts(docs)

indexes = tokenizer.word_index
tokens = {index: word for word, index in indexes.items()}

vecs = tokenizer.texts_to_sequences(docs)
for vec in vecs:
    print(len(vec))
padded_vecs = pad_sequences(vecs, maxlen=15, padding='post', truncating='post')
for padded_vec, vec in zip(padded_vecs, vecs):
    print(np.array(vec))  # Converting to a NumPy array for better printing
    print(padded_vec)
    print('\n')

10
9
22
[1 2 4 1 1 6 5 4 2 5]
[1 2 4 1 1 6 5 4 2 5 0 0 0 0 0]


[3 5 3 3 6 5 1 5 1]
[3 5 3 3 6 5 1 5 1 0 0 0 0 0 0]


[4 4 1 6 3 1 3 2 1 4 3 2 2 4 1 3 2 6 4 2 2 3]
[4 4 1 6 3 1 3 2 1 4 3 2 2 4 1]




Now take each same-length document vector and turn it into a 2D matrix. Do this for all documents.

In [9]:
n = 15  # All vecs are of this length
v = 7
def vec2mat(vec):
    mat = []
    for index in vec:
        one_hot_vec = np.zeros(v)
        if index > 0:
            one_hot_vec[index] = 1
        mat.append(one_hot_vec)
    return np.array(mat)

X_3d_k = []
for padded_vec in padded_vecs:
    mat = vec2mat(padded_vec)
    X_3d_k.append(mat)
X_3d_k = np.array(X_3d_k)
print(X_3d_k)
print(np.array_equal(X_3d, X_3d_k))

[[[ 0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  1.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  1.]
  [ 0.  0.  0.  0.  0.  1.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.]
  [ 0.  0.  1.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  1.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]]

 [[ 0.  0.  0.  1.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  1.  0.]
  [ 0.  0.  0.  1.  0.  0.  0.]
  [ 0.  0.  0.  1.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  1.]
  [ 0.  0.  0.  0.  0.  1.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  1.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.  1.  0.  0.]
  [ 

# Embeddings

In [10]:
import numpy as np
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Flatten, Dense, Embedding

## Preprocess the data
Take the top 10,000 words/tokens as part of the vocabulary. Note that the IMDB dataset does not discard words that are not in the vocabulary. Instead, it replaces them with the special *unknown* token.

Take the first 30 words in each review. If a review has fewer than 30 words, pad the rest. *Unknown* tokens count towards the 30: if the word vector contains an *unknown* token within its first 30 positions, it is kept. Remember that Keras by default pads/truncates from the front, so I have to explicitly ask it to pad/truncate from the end in order to take the first 30 words instead of the last 30. Either choice is valid, but because I have been examining this dataset from the front, I'll continue to pad/truncate from the end.

In the first sample, the index of the *unknown* token (2) is included in the padded sample.

In [11]:
v = 10000
n = 30
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=v)

In [15]:
x_train_padded = pad_sequences(x_train, maxlen=n, padding='post', truncating='post')
x_test_padded = pad_sequences(x_test, maxlen=n, padding='post', truncating='post')
print(np.array(x_train[0])[:30])
print(x_train_padded[0])

[   1   14   22   16   43  530  973 1622 1385   65  458 4468   66 3941    4
  173   36  256    5   25  100   43  838  112   50  670    2    9   35  480]
[   1   14   22   16   43  530  973 1622 1385   65  458 4468   66 3941    4
  173   36  256    5   25  100   43  838  112   50  670    2    9   35  480]


## Learn the embeddings from the input data
The Embedding layer converts a document vector (or any vector of integers) of size *n* into a dense 2D matrix of size *n* x *d*. Here *n* is the fixed size of the document, and *d* is the dimensionality of the embedding, i.e., each **word (index)** will be converted into a dense float vector of size *d*. Each element of the incoming document vector is a token index in the range [0, *v*), with 0 being the padding value. So all the *m* documents will be converted into a 3D tensor of size *m* x *n* x *d*. Notice the similarity between embeddings and the 3D tensor representation of one-hot vectors.
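
The similarity can be made concrete with a minimal NumPy sketch: looking up rows of a weight matrix gives the same result as multiplying the one-hot rows by that matrix. The sizes, the random matrix `E`, and `doc_vec` are illustrative assumptions, not values taken from the Keras layer.

```python
import numpy as np

rng = np.random.default_rng(0)
v, d, n = 6, 3, 4                     # illustrative sizes
E = rng.normal(size=(v, d))           # stand-in for the learned embedding weights
doc_vec = np.array([1, 4, 1, 0])      # hypothetical padded document vector

looked_up = E[doc_vec]                # row lookup per index: shape (n, d)
one_hot = np.eye(v)[doc_vec]          # one-hot rows: shape (n, v)
via_matmul = one_hot @ E              # multiplying by one-hot rows selects rows of E

print(looked_up.shape)                        # (4, 3)
print(np.allclose(looked_up, via_matmul))     # True
```

The lookup avoids materializing the *n* x *v* one-hot matrix, which matters when *v* is large.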

The Embedding layer needs two mandatory arguments: the size of the vocabulary *v* and the output dimensionality *d*. If the Embedding layer is the first layer in the network (as it usually is), we need to provide the `input_shape` of the incoming sample. The Embedding layer also accepts a named argument called `input_length` instead of `input_shape` if the input is going to be 1D. So `input_length=n` is equivalent to `input_shape=(n,)`. I prefer to use `input_shape` to keep things consistent.

Each 2D document of size *n* x *d* then needs to be flattened. The Flatten layer will take each row, starting from the first row, and lay them side-by-side to create one long row vector. The output of the Flatten layer is a row vector whose size is the product *n* · *d*, i.e., 30 · 8 = 240 for the model below.
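
The row-by-row layout can be seen on a tiny array. This is a minimal NumPy sketch of the same idea, with made-up sizes; it does not call the Keras Flatten layer itself.

```python
import numpy as np

n, d = 3, 2
doc_mat = np.arange(n * d).reshape(n, d)   # stand-in for one embedded document
flat = doc_mat.reshape(-1)                 # row-major: row 0, then row 1, ...

print(doc_mat.tolist())  # [[0, 1], [2, 3], [4, 5]]
print(flat.tolist())     # [0, 1, 2, 3, 4, 5]
```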

Finally, add a Dense single unit classification layer with sigmoid activation so the network will learn to classify a review as either 0 or 1.
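
As a rough sketch of what this last layer computes, and of where the parameter counts in the model summary below come from, consider the following NumPy version. The weights `w`, the bias `b`, and the 0.5 decision threshold are assumptions made for illustration.

```python
import numpy as np

v, n, d = 10000, 30, 8                 # sizes used in the model below
rng = np.random.default_rng(0)

flat = rng.normal(size=n * d)          # stand-in for one flattened document
w = rng.normal(size=n * d) * 0.01      # hypothetical Dense weights
b = 0.0                                # hypothetical Dense bias

prob = 1.0 / (1.0 + np.exp(-(flat @ w + b)))   # sigmoid of a weighted sum
label = int(prob >= 0.5)               # thresholding at 0.5 is an assumption

embedding_params = v * d               # one d-sized vector per vocabulary index
dense_params = n * d + 1               # one weight per input plus a bias
print(embedding_params, dense_params)  # 80000 241
```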

In [16]:
model = Sequential()
model.add(Embedding(v, 8, input_shape=(n,)))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 30, 8)             80000     
_________________________________________________________________
flatten_2 (Flatten)          (None, 240)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 241       
Total params: 80,241
Trainable params: 80,241
Non-trainable params: 0
_________________________________________________________________


In [17]:
history = model.fit(x_train_padded, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.2)

Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


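I get a final validation accuracy of 73%. Interestingly enough, if I pad/truncate from the start (instead of from the end, as I have done so far), the validation accuracy improves to around 76%. Note that cell 19 calls `fit` on the same `model` object that was trained in cell 17, so the second run starts from already-trained weights rather than from scratch.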

In [18]:
x_train_padded_pre = pad_sequences(x_train, maxlen=n)
x_test_padded_pre = pad_sequences(x_test, maxlen=n)

In [19]:
history = model.fit(x_train_padded_pre, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.2)

Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Use external embeddings

### Create the embeddings matrix
Download the GloVe embeddings from http://nlp.stanford.edu/data/glove.6B.zip and extract them in /data/learn-keras. The dataset covers 400,000 words with embedding dimensions of 50, 100, 200, and 300, each in a separate file. For example, the file glove.6B.100d.txt contains the embeddings of all 400,000 words, with each embedding vector being 100 in size.

We need to create the embeddings matrix from this data to be used in Keras. The embeddings matrix is a matrix of size *v x d*, where *v* is the vocabulary size and *d* is the dimensionality of the embeddings. Each row of the matrix contains the embedding for the token whose index equals the row number, e.g., `mat[i] = [embedding vector of token with index i]`.

In [20]:
indexes = imdb.get_word_index()
embeddings_indexes = {}
with open('/data/learn-keras/glove.6B.100d.txt', 'rt') as f:
    for line in f:
        flds = line.split()
        word = flds[0]
        embeddings_vec = np.array(flds[1:], dtype=np.float32)
        embeddings_indexes[word] = embeddings_vec

v = 10000
d = 100
embeddings_matrix = np.zeros((v, d))
words_no_embeddings = []
for word, index in indexes.items():
    if index < v:
        if word in embeddings_indexes:
            embeddings_vec = embeddings_indexes[word]
            embeddings_matrix[index] = embeddings_vec
        else:
            words_no_embeddings.append(word)
print(f'Unable to find embeddings for {len(words_no_embeddings)} words')
print(words_no_embeddings[:10])

Unable to find embeddings for 203 words
["else's", "miyazaki's", "victoria's", "paul's", "chan's", "show's", "wife's", "character's", "hadn't", "isn't"]
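
One detail worth checking here: the sequences returned by `imdb.load_data` are, by default, shifted by `index_from=3`, with 0, 1, and 2 reserved for padding, the start marker, and the *unknown* token, whereas `imdb.get_word_index()` returns the unshifted ranks. So 'the' is 1 in the map but appears as 4 in `x_train`. The sketch below uses a tiny made-up word index and embedding table (`toy_word_index` and `toy_embeddings` are hypothetical) to show how rows could be aligned with the shifted indexes. It illustrates the idea and is not a drop-in replacement for the cell above.

```python
import numpy as np

index_from = 3                                   # default offset used by imdb.load_data
toy_word_index = {'the': 1, 'and': 2, 'a': 3}    # unshifted ranks, like get_word_index()
toy_embeddings = {'the': np.array([0.1, 0.2]),
                  'and': np.array([0.3, 0.4]),
                  'a': np.array([0.5, 0.6])}

v, d = 6, 2
aligned = np.zeros((v, d))
for word, rank in toy_word_index.items():
    row = rank + index_from                      # index the word has in the sequences
    if row < v and word in toy_embeddings:
        aligned[row] = toy_embeddings[word]

print(aligned[4])   # embedding of 'the'; rows 0-2 stay zero for the reserved indexes
```

With *v* = 6, the word 'a' would land at row 6 and is dropped, the same way out-of-vocabulary words are skipped in the cell above.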


In [21]:
sorted(indexes.items(), key=lambda kv: kv[1])[:10]

[('the', 1),
 ('and', 2),
 ('a', 3),
 ('of', 4),
 ('to', 5),
 ('is', 6),
 ('br', 7),
 ('in', 8),
 ('it', 9),
 ('i', 10)]

In [22]:
np.array_equal(embeddings_matrix[1], embeddings_indexes['the'])

True

In [23]:
embeddings_matrix[indexes['the']]

array([-0.038194  , -0.24487001,  0.72812003, -0.39961001,  0.083172  ,
        0.043953  , -0.39140999,  0.3344    , -0.57545   ,  0.087459  ,
        0.28786999, -0.06731   ,  0.30906001, -0.26383999, -0.13231   ,
       -0.20757   ,  0.33395001, -0.33848   , -0.31742999, -0.48335999,
        0.1464    , -0.37303999,  0.34577   ,  0.052041  ,  0.44946   ,
       -0.46970999,  0.02628   , -0.54154998, -0.15518001, -0.14106999,
       -0.039722  ,  0.28277001,  0.14393   ,  0.23464   , -0.31020999,
        0.086173  ,  0.20397   ,  0.52623999,  0.17163999, -0.082378  ,
       -0.71787   , -0.41531   ,  0.20334999, -0.12763   ,  0.41367   ,
        0.55186999,  0.57907999, -0.33476999, -0.36559001, -0.54856998,
       -0.062892  ,  0.26583999,  0.30204999,  0.99774998, -0.80480999,
       -3.0243001 ,  0.01254   , -0.36941999,  2.21670008,  0.72201002,
       -0.24978   ,  0.92136002,  0.034514  ,  0.46744999,  1.10790002,
       -0.19358   , -0.074575  ,  0.23353   , -0.052062  , -0.22

### Define the model
Define the model with the embeddings layer as usual. But then similar to transfer learning, set the weights of the embeddings layer manually and then freeze it so they won't change during training time.

In [24]:
v = 10000
n = 30
d = 100
model = Sequential()
model.add(Embedding(v, d, input_shape=(n,)))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.layers[0].set_weights([embeddings_matrix])
model.layers[0].trainable = False
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 30, 100)           1000000   
_________________________________________________________________
flatten_3 (Flatten)          (None, 3000)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 3001      
Total params: 1,003,001
Trainable params: 1,003,001
Non-trainable params: 0
_________________________________________________________________


In [26]:
history = model.fit(x_train_padded_pre, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.2)

Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
