<a href="https://colab.research.google.com/github/cagBRT/SentimentTextAnalysis/blob/master/3b_EmbeddingLayeripynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Embedding layers map integer word indices to dense vectors and are a common first layer in neural networks that process text.
In this notebook we will look at a simple example of an embedding layer.

In [None]:
from keras.models import Sequential
from keras.layers import Embedding
import numpy as np

Embedding layers are mainly used in NLP.
With an embedding layer you can either load pre-trained embeddings (such as GloVe) or train your own.

The embedding layer in this notebook is configured with three parameters:<br>

input_dim : Size of the vocabulary, that is, the number of distinct word indices<br>
output_dim : Length of the vector that represents each word<br>
input_length : Maximum length of an input sequence<br>

In [None]:
model = Sequential()
# Vocabulary of 10 word indices, 4 values per word vector, sequences of 2 words
embedding_layer = Embedding(input_dim=10, output_dim=4, input_length=2)
model.add(embedding_layer)
model.compile('adam', 'mse')

An embedding layer converts each word into a fixed-length vector of a chosen size. The resulting vector is dense: it holds real values rather than just 0s and 1s. Compared with one-hot vectors, these fixed-length vectors need far fewer dimensions and give a more useful representation of each word.

This lets us think of the embedding layer as a lookup table: the words are the keys, and the dense word vectors are the values.

As the output below shows, each word (1 and 2) is represented by a vector of length 4. Because the model has not been trained yet, the values are simply the layer's random initial weights.

In [None]:
input_data = np.array([[1,2]])
pred = model.predict(input_data)
print(input_data.shape)
print(pred)
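
The lookup-table view can be mimicked without Keras. The sketch below is only an illustration: the names table and embed are hypothetical, and the random values stand in for the layer's weights. It shows that embedding a batch amounts to replacing each word index with its row in the table.

```python
import random

# Sizes chosen to match the layer above: 10 word indices, 4 values per word.
input_dim, output_dim = 10, 4

# The "lookup table": one row of small random floats per word index.
# (Stand-in values; a real layer would hold its own initial or trained weights.)
rng = random.Random(0)
table = [[rng.uniform(-0.05, 0.05) for _ in range(output_dim)]
         for _ in range(input_dim)]

def embed(batch):
    """Replace every word index in every sequence with its row in the table."""
    return [[table[idx] for idx in seq] for seq in batch]

batch = [[1, 2]]          # one sequence containing the word indices 1 and 2
vectors = embed(batch)

# Nesting of the result: 1 sequence x 2 words x 4 values per word.
print(len(vectors), len(vectors[0]), len(vectors[0][0]))
# → 1 2 4
```

The nesting 1 x 2 x 4 mirrors the shape of pred above: one input sequence, two words, four values per word.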



---



An example with a small dataset

In [None]:
from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Flatten,Embedding,Dense

In this small dataset we have 10 reviews: 5 positive and 5 negative.


In [None]:
# Define 10 restaurant reviews
reviews =[
          'Never coming back!',
          'horrible service',
          'rude waitress',
          'cold food',
          'horrible food!',
          'awesome',
          'awesome services!',
          'rocks',
          'poor work',
          'couldn\'t have done better'
]
# Define labels: 1 = negative review, 0 = positive review
labels = array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

First we encode each review as a list of integers, one per word, using Keras one_hot with a vocabulary size of 50.

In [None]:
Vocab_size = 50
encoded_reviews = [one_hot(d,Vocab_size) for d in reviews]
print(f'encoded reviews: {encoded_reviews}')

Here you can see that the length of each encoded review equals the number of words in that review. Despite its name, Keras one_hot does not build one-hot vectors: it hashes each word to an integer index between 1 and Vocab_size - 1, so two different words can occasionally share an index. Next we need to pad the encoded reviews so that they all have the same length. Let's use 4 as the maximum length and pad the encoded reviews with 0s at the end.
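
Before padding, it may help to see roughly how the hashing step assigns indices. The function sketch_one_hot below is a hypothetical stand-in, not the Keras implementation: it assumes an md5-based hash so the result is repeatable, whereas Keras one_hot defaults to Python's built-in hash, so the actual indices will differ.

```python
import hashlib
import string

def sketch_one_hot(text, n):
    """Hash each word of `text` to an integer in the range 1 .. n - 1."""
    # Lower-case, turn punctuation (except the apostrophe) into spaces, then split.
    punctuation = string.punctuation.replace("'", "")
    cleaned = text.lower().translate(str.maketrans(punctuation, " " * len(punctuation)))
    words = cleaned.split()
    # md5 is an assumption made for repeatability; index 0 is never produced.
    return [int(hashlib.md5(w.encode()).hexdigest(), 16) % (n - 1) + 1 for w in words]

samples = ['horrible service', 'horrible food!', "couldn't have done better"]
encoded = [sketch_one_hot(s, 50) for s in samples]
print(encoded)
```

The same word ('horrible') always receives the same index, and index 0 is never assigned, which leaves it free to serve as the padding value.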

In [None]:
max_length = 4
padded_reviews = pad_sequences(encoded_reviews,maxlen=max_length,padding='post')
print(padded_reviews)
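
To make the padding step concrete, here is a minimal plain-Python sketch of what padding='post' does. The helper pad_post and the example sequences are hypothetical; the sketch also assumes the pad_sequences default of dropping items from the front when a sequence is longer than maxlen.

```python
def pad_post(sequences, maxlen, value=0):
    """Right-pad each sequence with `value`; keep only the last `maxlen` items if too long."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                      # drop items from the front when too long
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

example = [[7, 12, 3], [25], [4, 9, 9, 1, 30]]         # made-up encoded reviews
print(pad_post(example, maxlen=4))
# → [[7, 12, 3, 0], [25, 0, 0, 0], [9, 9, 1, 30]]
```

None of the reviews in this notebook has more than 4 words, so only the padding branch matters here.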

Create the model

In [None]:
model = Sequential()
embedding_layer = Embedding(input_dim=Vocab_size,output_dim=8,input_length=max_length)
model.add(embedding_layer)
model.add(Flatten())
model.add(Dense(1,activation='sigmoid'))
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['acc'])
model.summary()  # summary() prints the table itself; wrapping it in print() would also print "None"
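
The parameter counts reported in the summary can be checked by hand. The short calculation below is a sketch that assumes the sizes used above (vocabulary of 50, vectors of length 8, sequences of length 4); the variable names are only for illustration.

```python
vocab_size, embed_dim, max_length = 50, 8, 4

embedding_params = vocab_size * embed_dim     # one 8-value row per vocabulary index
flatten_outputs = max_length * embed_dim      # Flatten has no weights; it reshapes (4, 8) into (32,)
dense_params = flatten_outputs * 1 + 1        # 32 weights plus 1 bias for the single sigmoid unit

total = embedding_params + dense_params
print(embedding_params, dense_params, total)
# → 400 33 433
```

Almost all of the trainable parameters sit in the embedding matrix, which is the part we inspect after training.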

Train the model

Once training is complete, the embedding layer has learned its weights, which are the vector representations of the words. Let's train the model and then check the shape of the weight matrix.

In [None]:
model.fit(padded_reviews,labels,epochs=100,verbose=0)

The embedding matrix is essentially a lookup table with 50 rows (one per vocabulary index) and 8 columns (one per vector dimension), as the shape printed below confirms.



In [None]:
print(embedding_layer.get_weights()[0].shape)

In [None]:
print(embedding_layer.embeddings)