<center><img src="https://github.com/insaid2018/Term-1/blob/master/Images/INSAID_Full%20Logo.png?raw=true" width="360" height="160" /></center>


## Embedding Layer in Neural Network 

## Table of Contents

1. [What is word embedding?](#section1)<br>
     1.1 [Why You Need to Start Using Embedding Layers](#section101)<br>
     1.2 [How to obtain word embedding](#section102)<br>
2. [Learning Word Embeddings for IMDB Movie Review Data](#section2)<br>
     2.1 [IMDB Movie Review data](#section201)<br>
      - 2.1.1 [Overview](#section20101)<br>
      - 2.1.2 [Dataset](#section20102)<br>
3. [Instantiating an Embedding layer](#section3)<br>
     - 3.1 [Loading the IMDB data for use with an Embedding layer](#section301)<br>
     - 3.2 [Let's Train the model](#section302)<br>
     
4. [Conclusion](#section4)<br>

<a id="section1"></a>
## 1. What is word embedding?

A __word embedding__ is a learned representation for text where words that have the same meaning have a similar representation. __Word embeddings__ provide a dense representation of words and their relative meanings.

With word embeddings, words that appear in similar contexts end up close to each other in the vector space.
<img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/Word-Vectors.png"/>
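
To make "close to each other" concrete, here is a minimal sketch that compares word vectors with cosine similarity. The three 4-dimensional vectors are invented for illustration and are not learned embeddings; `cosine_similarity` is a hypothetical helper.

```python
import math

# Hypothetical 4-dimensional "embeddings"; the numbers are invented for illustration.
toy_vectors = {
    "good":  [0.9, 0.8, 0.1, 0.0],
    "great": [0.8, 0.9, 0.2, 0.1],
    "awful": [-0.7, -0.9, 0.1, 0.2],
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_good_great = cosine_similarity(toy_vectors["good"], toy_vectors["great"])
sim_good_awful = cosine_similarity(toy_vectors["good"], toy_vectors["awful"])
print(f"good vs great: {sim_good_great:.3f}")
print(f"good vs awful: {sim_good_awful:.3f}")
```

With these toy values, the two words of similar meaning score close to 1, while the opposite pair scores below 0.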


### 1.1 Why You Need to Start Using Embedding Layers?

So why should you use an __embedding layer__? Here are the two main reasons:

- __One-hot encoded vectors__ are high-dimensional and sparse. Let’s assume that we are doing __Natural Language Processing (NLP)__ and have a dictionary of __2000 words__. With __one-hot encoding__, each word is represented by a vector of __2000 integers__, of which __1999__ are __zeros__. On a big dataset this approach is not computationally efficient.

- The vectors of each __embedding__ get updated while training the neural network, so words used in similar ways end up with similar vectors. As the image above shows, this lets you find __similarities__ between words in a __multi-dimensional space__. This allows us to visualize __relationships between words__, but also between everything that can be turned into a __vector__ through an __embedding layer__.
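
The first reason can be sketched in a few lines of plain Python. The dictionary size of 2000 comes from the example above; the embedding size of 8, the word index 45, and the placeholder table values are assumptions made only for illustration.

```python
vocab_size = 2000      # dictionary size from the example above
embedding_dim = 8      # assumed dense size, chosen only for illustration

def one_hot(index, size):
    """Return a list of `size` integers with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

word_index = 45                       # hypothetical index of some word
sparse = one_hot(word_index, vocab_size)
num_zeros = sparse.count(0)

# An embedding table stores one short dense row per word.
# Values here are placeholders; a real layer starts random and is trained.
embedding_table = [[0.01 * (i % 7)] * embedding_dim for i in range(vocab_size)]
dense = embedding_table[word_index]   # lookup by index, no multiplication needed

# Multiplying the one-hot vector by the table selects the same row,
# but touches all 2000 rows to do it.
projected = [
    sum(s * row[j] for s, row in zip(sparse, embedding_table))
    for j in range(embedding_dim)
]

print(len(sparse), num_zeros)   # size of the one-hot vector and its zero count
print(len(dense), projected == dense)
```

The lookup and the one-hot multiplication give the same vector, which is why an embedding layer can skip the sparse representation entirely.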

### 1.2 How to obtain word embedding?

There are multiple ways to obtain __word embeddings__. Here are a few:

- Learn __word embeddings__ jointly with the main task you care about (such as document __classification or sentiment prediction__). In this setup, you start with __random word vectors__ and then learn the __word vectors__ in the same way you learn the __weights__ of a __neural network__.
- Load into your model __word embeddings__ that were precomputed using a different machine-learning task than the one you're trying to solve. These are called __pretrained word embeddings__.
    Two popular pretrained word embeddings are:
    - __GloVe__ 
    - __fastText__
    
- Train word embeddings with a library such as __Gensim__
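
As a rough illustration of the pretrained route, the sketch below parses vectors from the plain-text layout that GloVe downloads use (a word followed by space-separated numbers on each line). The two lines of data are invented, and `load_text_embeddings` is a hypothetical helper rather than a library function.

```python
import io

# Two made-up lines in the "word v1 v2 ..." text layout; a real file would be
# opened from disk and contain many more words and dimensions.
fake_glove_file = io.StringIO(
    "movie 0.12 -0.40 0.33\n"
    "film 0.10 -0.38 0.35\n"
)

def load_text_embeddings(handle):
    """Parse 'word v1 v2 ...' lines into a dict of word -> list of floats."""
    table = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        table[parts[0]] = [float(x) for x in parts[1:]]
    return table

pretrained = load_text_embeddings(fake_glove_file)
print(sorted(pretrained))
print(len(pretrained["movie"]))
```

The resulting dictionary could then be used to fill the weight matrix of an embedding layer instead of starting from random vectors.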

<a id="section2"></a>
## 2. Learning Word Embeddings for IMDB Movie Review Data

<a id="section201"></a>
### 2.1 IMDB Movie Review data

<a id="section20101"></a>
#### 2.1.1 Overview
This dataset contains __movie reviews__ along with their associated binary __sentiment polarity labels__. It is intended to serve as a benchmark for __sentiment classification__. This document outlines how the dataset was gathered, and how to use the files provided.



<a id="section20102"></a>
#### 2.1.2 Dataset
The core dataset contains __50,000 reviews__ split evenly into __25k train and 25k test sets__. The overall distribution of labels is balanced (__25k pos__ and __25k neg__).

In the entire collection, no more than __30 reviews__ are allowed for any given movie because reviews for the same movie tend to have __correlated ratings__. Further, the __train and test__ sets contain disjoint sets of movies, so no significant performance gain is obtained by memorizing movie-unique terms and their association with observed labels. In the labeled __train/test__ sets, a __negative review__ has a __score <= 4__ out of __10__, and a __positive review__ has a __score >= 7__ out of __10__. Thus reviews with more __neutral ratings__ are not included in the __train/test__ sets. In the unsupervised set, reviews of any rating are included, and there is an equal number of reviews with scores > 5 and <= 5.
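
The labeling rule described above can be written as a small function. This is only a sketch: `sentiment_label` is a hypothetical helper, and the 0/1 encoding of negative/positive is an assumption that matches how the labels are commonly stored.

```python
def sentiment_label(score):
    """Map a 1-10 rating to 0 (negative), 1 (positive), or None (excluded)."""
    if score <= 4:
        return 0
    if score >= 7:
        return 1
    return None  # ratings 5 and 6 are left out of the labeled sets

ratings = [1, 4, 5, 6, 7, 10]
labels = [sentiment_label(r) for r in ratings]
print(labels)
```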

<a id="section3"></a>
## 3.  Instantiating an Embedding layer

In [0]:
# Import tensorflow 2.x
# This code block will only work in Google Colab.
try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

TensorFlow 2.x selected.


In [0]:
from tensorflow.keras.layers import Embedding
embedding_layer = Embedding(1000, 64)

<a id="section301"></a>
### 3.1 Loading the IMDB data for use with an Embedding layer

Some __numpy__ releases are not __compatible__ with the __keras__ IMDB loader, so you may first need to __install__ a version that works.

If loading the data fails, try __downgrading__ your numpy version to __1.16.1__ (the version in the commented-out install command below). It seems to solve the problem.

If your current version of numpy works properly with the upcoming code, then there's no need to downgrade it.

In [0]:
# !pip install numpy==1.16.1
import numpy as np

In [0]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras import preprocessing

 - We will consider only the __10,000 most common words__

In [0]:
max_features = 10000  

 - We will keep only __20 words__ from each __movie review__ (with its default settings, `pad_sequences` below keeps the last 20 words of a longer review)


In [0]:
maxlen = 20

- Load the data as __lists of integers__. Keras ships the reviews already encoded this way, with each word replaced by an integer index

In [0]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
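
To give a feel for what "lists of integers" means, here is a simplified sketch of frequency-based word encoding on a toy corpus. It is not the Keras loader: the reviews are invented, `encode` is a hypothetical helper, and the out-of-vocabulary index of 0 is an arbitrary choice (the real loader also reserves a few low indices for special markers).

```python
from collections import Counter

toy_reviews = [
    "the movie was great",
    "the movie was awful",
    "great acting and a great script",
]

# Rank words by frequency: rank 1 is the most common word.
counts = Counter(word for review in toy_reviews for word in review.split())
ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
word_to_rank = {word: rank for rank, word in enumerate(ranked, start=1)}

def encode(review, num_words, oov_index=0):
    """Replace each word by its rank; ranks >= num_words become oov_index."""
    return [
        word_to_rank[w] if word_to_rank[w] < num_words else oov_index
        for w in review.split()
    ]

encoded = encode("the movie was great", num_words=4)
print(encoded)
```

Capping the vocabulary with `num_words` is what keeps the embedding table small: rarer words all collapse to one shared index.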

- Transform the lists of integers into a __2D integer tensor__ of shape (__samples, maxlen__)

In [0]:
x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)  
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)
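
A minimal pure-Python sketch of what `pad_sequences` does to a single review, assuming its default `padding='pre'` and `truncating='pre'` settings. `pad_or_truncate` is a hypothetical helper written for illustration, not part of Keras.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Mimic padding='pre', truncating='pre' for a single sequence."""
    if len(seq) >= maxlen:
        return seq[-maxlen:]                      # drop tokens from the front
    return [value] * (maxlen - len(seq)) + seq    # add zeros at the front

short_review = [11, 12, 13]
long_review = [1, 2, 3, 4, 5, 6, 7]

padded_short = pad_or_truncate(short_review, maxlen=5)
padded_long = pad_or_truncate(long_review, maxlen=5)
print(padded_short)
print(padded_long)
```

Under these defaults a long review loses its beginning, so the model sees the last `maxlen` words. Passing `truncating='post'` to `pad_sequences` would keep the first words instead.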

- Print the __shape__ for both __train__ and __test__ data

In [0]:
print(x_train.shape)

print(x_test.shape)

(25000, 20)
(25000, 20)


<a id="section302"></a>
### 3.2 Let's Train the model

- Import __keras sequential api__ to create models __layer-by-layer__

In [0]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
model = Sequential()

- Specify the maximum input length for the __Embedding layer__ so you can later flatten the embedded inputs. After the __Embedding layer__, the activations have shape __(samples, maxlen, 8)__.


In [0]:
model.add(Embedding(10000, 8, input_length=maxlen))

- Add a __Flatten__ layer, which turns the 3D tensor of embeddings into a 2D tensor of shape __(samples, maxlen * 8)__

In [0]:
model.add(Flatten())
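
The shape change performed by flattening can be traced with nested lists. This is a sketch with all-zero placeholder values and a batch of 2 samples chosen arbitrarily; it imitates the reshaping only, not the Keras layer itself.

```python
samples, maxlen, embedding_dim = 2, 20, 8

# Stand-in for the Embedding output: a nested list of shape (samples, maxlen, 8).
embedded = [[[0.0] * embedding_dim for _ in range(maxlen)] for _ in range(samples)]

# Flattening keeps the sample axis and concatenates everything else.
flattened = [[value for token in review for value in token] for review in embedded]

embedded_shape = (len(embedded), len(embedded[0]), len(embedded[0][0]))
flattened_shape = (len(flattened), len(flattened[0]))
print(embedded_shape, flattened_shape)
```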

- Add the __classifier__ on top: a single sigmoid unit that outputs the probability that a review is positive

In [0]:
model.add(Dense(1, activation='sigmoid'))

#### What does compile do?

- __Compile__ defines the __loss function__, the optimizer and the metrics. 

In [0]:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
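
To show what the chosen loss and metric measure, here is a plain-Python sketch of a sigmoid output, binary cross-entropy, and accuracy on four invented examples. The helper functions, the clipping constant, and the 0.5 threshold are assumptions for illustration and may differ in detail from the Keras implementations.

```python
import math

def sigmoid(z):
    """Squash a real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

logits = [2.0, -1.0, 0.5, -3.0]     # hypothetical pre-activation outputs
targets = [1, 0, 0, 0]              # hypothetical true labels
probs = [sigmoid(z) for z in logits]
loss = binary_crossentropy(targets, probs)
acc = accuracy(targets, probs)
print(f"loss={loss:.4f} acc={acc:.2f}")
```

The third example is predicted positive but labeled negative, so it lowers the accuracy and contributes the largest share of the loss.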

- Print the __summary representation__ of your model.

In [0]:
model.summary()

- Train the model on the __IMDB training data__, holding out 20% of it for validation.

In [0]:
history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 20, 8)             80000     
_________________________________________________________________
flatten_1 (Flatten)          (None, 160)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 161       
Total params: 80,161
Trainable params: 80,161
Non-trainable params: 0
_________________________________________________________________
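
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the layer sizes used in this notebook (10000 words, 8 dimensions, 20 tokens per review).

```python
vocab_size = 10000   # rows in the embedding table (first Embedding argument)
embedding_dim = 8    # columns in the embedding table (second argument)
maxlen = 20          # tokens per review after padding

embedding_params = vocab_size * embedding_dim      # one vector per word
flatten_units = maxlen * embedding_dim             # Flatten has no weights
dense_params = flatten_units * 1 + 1               # weights plus one bias
total_params = embedding_params + dense_params
print(embedding_params, flatten_units, dense_params, total_params)
```

Almost all of the trainable parameters sit in the embedding table; the classifier itself has only 161.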





Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
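
The "Train on 20000 samples, validate on 5000 samples" line follows from `validation_split=0.2`. Here is a small sketch of that arithmetic, assuming (as the Keras documentation describes) that the validation data is taken from the end of the arrays before any shuffling.

```python
import math

num_samples = 25000
validation_split = 0.2
batch_size = 32

# The held-out fraction is taken from the end of the arrays.
num_val = int(num_samples * validation_split)
split_at = num_samples - num_val
train_indices = range(0, split_at)
val_indices = range(split_at, num_samples)

# Number of gradient updates in one pass over the training part.
batches_per_epoch = math.ceil(len(train_indices) / batch_size)
print(len(train_indices), len(val_indices), batches_per_epoch)
```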


<a id="section4"></a>
## 4. Conclusion

- You get to a __validation accuracy__ of __~76%__, which is pretty good considering that you’re only looking at __20 words__ from every review.

- But note that merely flattening the __embedded sequences__ and training a __single Dense layer__ on top leads to a model that treats each word in the input sequence separately, without considering __inter-word relationships__ and __sentence structure__ (for example, this model would likely treat both “this movie is a bomb” and “__this movie is the bomb__” as being __negative reviews__). 
- It’s much better to add __recurrent layers__ or __1D convolutional layers__ on top of the embedded sequences to learn features that take into account each sequence as a whole.
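
The limitation of flattening followed by a single dense unit can be seen in a small sketch: the score before the sigmoid is just a sum of independent per-position contributions. All parameter values below are random stand-ins rather than trained weights, the bias is omitted for brevity, and `logit` and `swap_effect` are hypothetical helpers.

```python
import random

random.seed(0)
embedding_dim, maxlen = 2, 5
vocab = ["this", "movie", "is", "a", "the", "bomb"]

# Random stand-ins for trained parameters (hypothetical values).
embedding = {w: [random.uniform(-1, 1) for _ in range(embedding_dim)] for w in vocab}
dense_w = [[random.uniform(-1, 1) for _ in range(embedding_dim)] for _ in range(maxlen)]

def logit(words):
    """Flatten + Dense(1) before the sigmoid: a sum of per-position dot products."""
    return sum(
        sum(w * e for w, e in zip(dense_w[pos], embedding[word]))
        for pos, word in enumerate(words)
    )

def swap_effect(context_word):
    """Change in logit when position 3 switches from 'a' to 'the'."""
    base = ["this", "movie", "is", "a", context_word]
    alt = ["this", "movie", "is", "the", context_word]
    return logit(alt) - logit(base)

effect_with_bomb = swap_effect("bomb")
effect_with_movie = swap_effect("movie")
print(effect_with_bomb, effect_with_movie)
```

Swapping "a" for "the" shifts the score by the same amount whatever word follows, so this architecture cannot learn that the swap matters specifically before "bomb". Recurrent or convolutional layers can combine neighbouring words and so capture that kind of interaction.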