***Word Embedding Layers in Keras***
Word embeddings provide a dense representation of words and their relative meanings.
They are an improvement over the sparse representations used in simpler bag-of-words models.
Word embeddings can be learned from text data and reused across projects. They can also be learned
as part of fitting a neural network on text data.
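
The difference between a sparse and a dense representation can be sketched with NumPy alone. This is only an illustration: the five-word vocabulary, the 3-dimensional embedding size, and the random matrix (standing in for weights that training would normally learn) are all assumptions made for the example, not part of the notebook.

```python
import numpy as np

# Hypothetical toy vocabulary, chosen only for this illustration.
vocab = ["good", "great", "poor", "weak", "work"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot_vector(word):
    """Sparse representation: one slot per vocabulary word, with a single 1."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec


# Dense representation: each word is a row of real values in a small matrix.
# Random values stand in for weights that would normally be learned.
rng = np.random.default_rng(0)
embedding_dim = 3
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))


def embed(word):
    """Dense representation: look up the word's row in the embedding matrix."""
    return embedding_matrix[word_to_index[word]]


sparse_good = one_hot_vector("good")
dense_good = embed("good")

# The sparse vector grows with the vocabulary; the dense one does not.
print(sparse_good.shape, dense_good.shape)

# A row lookup is equivalent to multiplying the one-hot vector by the matrix.
print(np.allclose(sparse_good @ embedding_matrix, dense_good))

# Distinct one-hot vectors are always orthogonal, so they encode no similarity.
print(sparse_good @ one_hot_vector("great"))
```

Because every pair of distinct one-hot vectors has a dot product of zero, a sparse encoding cannot express that "good" is closer to "great" than to "poor". Dense vectors can, once their values have been learned.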

In [1]:
from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Embedding


# define documents
docs = ['Well done!','Good work','Great effort','nice work','Excellent!','Weak','Poor effort!',
'not good','poor work','Could have done better.']

# define class labels
labels = array([1,1,1,1,1,0,0,0,0,0])

# integer encode the documents
# Each document is encoded as a sequence of integers, which is the input the Embedding layer
# expects. one_hot hashes each word to an integer below vocab_size, so two different words
# can share an index (in the output, 'could' and 'done' both map to 24). We could also
# experiment with other bag-of-words encodings such as counts or TF-IDF.

vocab_size = 50
encoded_docs = [one_hot(d, vocab_size) for d in docs]
print(encoded_docs)

# The sequences have different lengths, but Keras expects all inputs to have the same
# length, so we pad each sequence with trailing zeros up to max_length.
max_length = 4
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
print(padded_docs)

# The Embedding layer has a vocabulary size of 50 and an input length of 4. We choose a small
# embedding space of 8 dimensions.
# The output of the Embedding layer is 4 vectors of 8 dimensions each, one for each word.
# We flatten this into a single 32-element vector to pass on to the Dense output layer.
model = Sequential()
model.add(Embedding(vocab_size, 8, input_length=max_length))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
print(model.summary())

model.fit(padded_docs, labels, epochs=50, verbose=0)
loss, accuracy = model.evaluate(padded_docs, labels, verbose=0)
print('Accuracy: %f' % (accuracy*100))

Using TensorFlow backend.


[[6, 24], [11, 43], [17, 30], [4, 43], [32], [26], [34, 30], [39, 11], [34, 43], [24, 28, 24, 4]]
[[ 6 24  0  0]
 [11 43  0  0]
 [17 30  0  0]
 [ 4 43  0  0]
 [32  0  0  0]
 [26  0  0  0]
 [34 30  0  0]
 [39 11  0  0]
 [34 43  0  0]
 [24 28 24  4]]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 4, 8)              400       
_________________________________________________________________
flatten_1 (Flatten)          (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33        
=================================================================
Total params: 433
Trainable params: 433
Non-trainable params: 0
_________________________________________________________________
None
Accuracy: 89.999998
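
The shapes and parameter counts in the model summary above can be checked by hand. The following is a minimal NumPy sketch of the same forward pass, not the Keras implementation: the weights are random stand-ins (assumed for illustration) rather than the trained values, and the two input rows are copied from the printed `padded_docs`.

```python
import numpy as np

vocab_size, embedding_dim, max_length = 50, 8, 4

# Random stand-ins for the trained weights; only the shapes matter here.
rng = np.random.default_rng(0)
embedding_weights = rng.normal(size=(vocab_size, embedding_dim))  # 50 * 8 = 400
dense_kernel = rng.normal(size=(max_length * embedding_dim, 1))   # 32 weights
dense_bias = np.zeros(1)                                          # 1 bias

# Two padded documents taken from the printed padded_docs.
batch = np.array([[6, 24, 0, 0],
                  [24, 28, 24, 4]])

# Embedding: replace every integer with its row of the weight matrix.
embedded = embedding_weights[batch]             # shape (2, 4, 8)

# Flatten: concatenate the 4 word vectors of each document.
flattened = embedded.reshape(len(batch), -1)    # shape (2, 32)

# Dense with sigmoid: one probability per document.
logits = flattened @ dense_kernel + dense_bias  # shape (2, 1)
probabilities = 1.0 / (1.0 + np.exp(-logits))

n_params = embedding_weights.size + dense_kernel.size + dense_bias.size
print(embedded.shape, flattened.shape, probabilities.shape)
print(n_params)
```

The total of 400 + 32 + 1 parameters matches the 433 reported in the summary. Note that index 0, used for padding, is looked up like any other index here, which mirrors the notebook's model since it does not set `mask_zero` on the Embedding layer.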
