In [1]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=UserWarning)

import numpy as np
import os


In [10]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten,Dense,Embedding
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

tf.__version__

'2.3.0'

### Embedding layer in TensorFlow

The Embedding layer takes at least two arguments:
- The size of the vocabulary (`input_dim`), i.e. the number of distinct token (word) indices that can appear in the input
- The dimensionality (length) of each embedding vector (`output_dim`)

The layer takes a 2D integer tensor of shape (num_samples, sequence_length) and returns a 3D floating-point tensor of shape (num_samples, sequence_length, embedding_dimensionality).

When this layer is instantiated, its weights (the embedding vectors) are random. During training, these word vectors are gradually adjusted via backpropagation, structuring the embedding space into a representation that downstream layers can use.
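Conceptually, an embedding layer is a lookup table: a weight matrix of shape (input_dim, output_dim) indexed by the integer token ids. The minimal NumPy sketch below illustrates this with small hypothetical sizes (`vocab_size=10`, `embed_dim=4`) and uniform random initial weights in [-0.05, 0.05]; the sizes and the initializer range are assumptions for illustration, not taken from the model in this notebook.

```python
import numpy as np

# Hypothetical sizes for illustration (the model below uses 10,000 and 8).
vocab_size = 10   # input_dim: number of distinct token ids
embed_dim = 4     # output_dim: length of each embedding vector

rng = np.random.default_rng(0)
# The layer's only weights: one row per token id, randomly initialized
# (assumed uniform in [-0.05, 0.05] for this sketch).
embedding_matrix = rng.uniform(-0.05, 0.05, size=(vocab_size, embed_dim))

# A batch of 2 sequences of length 3: a 2D integer tensor.
token_ids = np.array([[1, 5, 9],
                      [0, 5, 2]])

# Embedding lookup is plain row indexing: (2, 3) -> (2, 3, 4).
embedded = embedding_matrix[token_ids]
print(embedded.shape)  # (2, 3, 4)

# Equivalent view: one-hot encode the ids, then multiply by the weight matrix.
one_hot = np.eye(vocab_size)[token_ids]  # shape (2, 3, 10)
assert np.allclose(one_hot @ embedding_matrix, embedded)
```

Token 5 appears in both sequences and maps to the same row, which is why backpropagation updates to that row affect every occurrence of the word.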



### Sentiment prediction using IMDB Movie Reviews

https://www.tensorflow.org/datasets/catalog/imdb_reviews

The vocabulary is restricted to the 10,000 most common words in the reviews.

Each review is cut to exactly 20 words: longer reviews keep only their last 20 words, and shorter ones are zero-padded at the front.

The model will consist of:

- An 8-dimensional embedding for each of the 10,000 words
- A Flatten layer that reshapes the 3D embedding tensor to 2D, and
- A single Dense layer to classify a review as favorable or not.

In [11]:
max_features = 10000 # Number of words to consider as features

maxlen = 20 # Cut text after maxlen (among top max_features most common words)

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
type(X_train[0]),len(X_train[0]),max(X_train[0])

X_train[0][0:10]

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
(25000,) (25000,) (25000,) (25000,)


[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65]

In [12]:
len(X_train[1])

189

#### Create 2D integer tensor of shape (samples, maxlen)

In [13]:
# Zero-pad shorter reviews and truncate longer ones to exactly maxlen words
X_train = pad_sequences(X_train, maxlen=maxlen)
X_test = pad_sequences(X_test, maxlen=maxlen)
print(X_train.shape,X_test.shape)
X_train[0,:]

(25000, 20) (25000, 20)


array([  65,   16,   38, 1334,   88,   12,   16,  283,    5,   16, 4472,
        113,  103,   32,   15,   16, 5345,   19,  178,   32], dtype=int32)
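`pad_sequences` defaults to `padding='pre'` and `truncating='pre'`, so a long review keeps its last `maxlen` tokens, consistent with the padded first review above no longer starting with the tokens `[1, 14, 22, ...]` printed earlier. The function `pad_pre` below is a hypothetical, simplified stand-in that reproduces only those defaults (with padding value 0) to make the behavior concrete; it is not the Keras implementation.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Simplified stand-in for pad_sequences with padding='pre' and
    truncating='pre': keep the last `maxlen` items of each sequence and
    left-pad shorter sequences with `value`."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        kept = seq[-maxlen:]              # 'pre' truncation drops items from the start
        if len(kept):                     # guard: empty sequences stay all padding
            out[i, -len(kept):] = kept    # 'pre' padding fills values on the left
    return out

batch = [[1, 2, 3, 4, 5, 6], [7, 8]]
print(pad_pre(batch, maxlen=4))
# [[3 4 5 6]
#  [0 0 7 8]]
```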

#### Create Model


In [14]:
model = Sequential()

model.add(Embedding(max_features, 8, input_length=maxlen))
# Output shape: (samples, maxlen, 8)
model.add(Flatten()) # Flatten into a 2D tensor of shape (samples, maxlen * 8)
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 20, 8)             80000     
_________________________________________________________________
flatten (Flatten)            (None, 160)               0         
_________________________________________________________________
dense (Dense)                (None, 1)                 161       
Total params: 80,161
Trainable params: 80,161
Non-trainable params: 0
_________________________________________________________________
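The parameter counts in the summary follow from simple arithmetic on the layer sizes used above (10,000 words, 8 dimensions, 20 words per review):

```python
vocab_size, embed_dim, maxlen = 10000, 8, 20

embedding_params = vocab_size * embed_dim   # one 8-d vector per word: 80,000
flatten_params = 0                          # Flatten only reshapes, no weights
flat_units = maxlen * embed_dim             # 20 * 8 = 160 inputs to Dense
dense_params = flat_units * 1 + 1           # 160 weights + 1 bias = 161

total = embedding_params + flatten_params + dense_params
print(embedding_params, flat_units, dense_params, total)  # 80000 160 161 80161
```

Almost all of the capacity sits in the embedding table; the classifier itself has only 161 weights.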


#### Compile and fit

In [15]:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

history = model.fit(X_train, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.2)
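With a sigmoid output unit, `binary_crossentropy` is the mean negative log-likelihood of the labels: $L = -\frac{1}{N}\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$. The NumPy sketch below computes that formula directly; it is not Keras's code, and the clipping value `eps=1e-7` is an assumption chosen to keep `log()` finite. The logits and labels are made-up example values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; p holds sigmoid outputs in (0, 1)."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

logits = np.array([2.0, -1.0, 0.0])  # hypothetical pre-sigmoid outputs
y_true = np.array([1, 0, 1])
loss = binary_crossentropy(y_true, sigmoid(logits))
print(loss)  # about 0.378
```

A confident correct prediction (first example) costs little, while the uncertain prediction `p = 0.5` costs `log 2`, about 0.693.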

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### Compute classification accuracy

In [16]:
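def accuracy(x, y):
    # Predicted probabilities from the trained model, shape (samples, 1)
    y_hat = model.predict(x)
    # Threshold at 0.5 to get hard 0/1 labels
    y_pred = (y_hat[:, 0] > 0.5).astype(int)
    # Fraction of predictions that match the true labels
    return np.mean(y_pred == y)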


In [17]:
accuracy(X_train,y_train),accuracy(X_test,y_test)

(0.85516, 0.76136)

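This model, with a single Dense layer on top of the flattened embeddings, scores each word of the input sequence independently: each word, at its position, adds its own term to the output regardless of which words surround it. Inter-word relationships and sentence structure are not taken into account.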
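The independence claim can be checked numerically: the pre-sigmoid logit of Embedding, Flatten, Dense is $\sum_t \mathbf{w}_t \cdot \mathbf{e}(x_t) + b$, a sum of one term per position. The NumPy sketch below uses random weights and a small hypothetical vocabulary of 50 words (assumptions for illustration) and assumes row-major flattening, which is how `(maxlen, embed_dim)` is reshaped to `maxlen * embed_dim`.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, embed_dim, maxlen = 50, 8, 20        # small hypothetical vocabulary

E = rng.normal(size=(vocab_size, embed_dim))     # embedding matrix
W = rng.normal(size=(maxlen * embed_dim,))       # Dense kernel (160 weights)
b = 0.1                                          # Dense bias

x = rng.integers(0, vocab_size, size=maxlen)     # one padded review of token ids

# Forward pass as in the model: embed -> flatten -> dense (pre-sigmoid logit).
logit = E[x].reshape(-1) @ W + b

# The same logit as a sum of independent per-position word contributions.
W_pos = W.reshape(maxlen, embed_dim)             # Dense weights for each position
contributions = np.array([E[x[t]] @ W_pos[t] for t in range(maxlen)])
assert np.isclose(logit, contributions.sum() + b)
```

Changing the word at one position changes only that position's term; no term depends on two words jointly.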

We could add recurrent layers or 1D convolutional layers on top of the embedded sequences to learn features that take each sequence into account as a whole.
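A 1D convolution slides a window over consecutive word embeddings, so each output feature is computed from several neighboring words at once. After a nonlinearity such as ReLU and a pooling step, the model can respond to word combinations rather than to each word in isolation. The NumPy sketch below shows the shapes involved. In Keras this would roughly correspond to `Conv1D(16, 3, activation='relu')` followed by `GlobalMaxPooling1D()` after the Embedding layer. The filter count (16), window width (3), and random weights are hypothetical choices, not part of this notebook.

```python
import numpy as np

def conv1d_valid(seq, kernels, bias):
    """Minimal 'valid' 1D convolution (cross-correlation, as in deep-learning
    libraries) over an embedded sequence.
    seq: (steps, embed_dim); kernels: (width, embed_dim, filters); bias: (filters,)
    Returns an array of shape (steps - width + 1, filters)."""
    width, steps = kernels.shape[0], seq.shape[0]
    out = np.empty((steps - width + 1, kernels.shape[2]))
    for t in range(steps - width + 1):
        window = seq[t:t + width]  # `width` consecutive word embeddings
        out[t] = np.einsum('we,wef->f', window, kernels) + bias
    return out

rng = np.random.default_rng(2)
embedded_review = rng.normal(size=(20, 8))  # (maxlen, embedding dim)
kernels = rng.normal(size=(3, 8, 16))       # hypothetical: width 3, 16 filters

conv = conv1d_valid(embedded_review, kernels, np.zeros(16))
features = np.maximum(conv, 0)              # ReLU nonlinearity
pooled = features.max(axis=0)               # global max pooling over positions
print(conv.shape, pooled.shape)             # (18, 16) (16,)
```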