<center><img src="https://github.com/insaid2018/Term-1/blob/master/Images/INSAID_Full%20Logo.png?raw=true" width="240" height="100" /></center>

<center><h1>Embedding Layer in Neural Network</h1></center> 

---
# **Table of Contents**
---

**1.** [**What is word embedding?**](#section1)<br>
  - **1.1** [**Why You Need to Start Using Embedding Layers**](#section101)
  - **1.2** [**How to obtain word embedding**](#section102)

**2.** [**Learning Word Embeddings for IMDB Movie Review Data**](#section2)<br>
  - **2.1** [**IMDB Movie Review Data**](#section201)
    - **2.1.1** [**Overview**](#section20101)
    - **2.1.2** [**Dataset**](#section20102)

**3.** [**Instantiating an Embedding layer**](#section3)<br>
  - **3.1** [**Loading the IMDB data for use with an Embedding layer**](#section301)
  - **3.2** [**Let's Train the model**](#section302)
     
**4.** [**Conclusion**](#section4)<br>

---
<a name = Section1></a>
# **1. What is word embedding?**
---

- A **word embedding** is a learned representation for text in which words with similar meanings have similar **representations**.

- It provides a **dense representation** of words and their relative meanings.

- In the embedding space, words that appear in similar **contexts** lie close to **each** **other**.

<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/Word-Vectors.png" width="840" height="300" /></center>
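Closeness between word vectors is commonly measured with cosine similarity. The sketch below uses made-up 4-dimensional vectors for three words (hypothetical values, not taken from any trained model) purely to show the computation.

```python
import numpy as np

# Hypothetical 4-dimensional embeddings, invented for illustration only.
# Real embeddings are learned and usually have tens to hundreds of dimensions.
embeddings = {
    "movie":  np.array([0.90, 0.10, 0.30, 0.00]),
    "film":   np.array([0.85, 0.15, 0.35, 0.05]),
    "banana": np.array([0.00, 0.90, 0.10, 0.60]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words used in similar contexts should score higher than unrelated words.
print(cosine_similarity(embeddings["movie"], embeddings["film"]))
print(cosine_similarity(embeddings["movie"], embeddings["banana"]))
```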


<a name = Section11></a>
### **1.1 Why You Need to Start Using Embedding Layers?**

- **One-hot encoded** vectors are **high-dimensional** and **sparse**.

- Let’s **assume** that we are doing Natural Language Processing (NLP) and have a dictionary of **2000** words. 

- With **one-hot encoding**, each word is then represented by a **vector** of **2000 integers**, **1999** of which are **zeros**.

- For a large vocabulary, this **approach** wastes memory and computation, because the vector length grows with the vocabulary size and almost every entry is zero.

- An **embedding layer** instead maps each word to a short **dense** vector of fixed size (for example 8 or 64 values), and these vectors get updated while **training** the neural network. 

- The **image** above shows how **similarities** between words can be found in a **multi-dimensional** space. 

- This lets us **visualize** relationships not only between words but between anything that can be **turned** into a **vector** through an **embedding layer**.
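A minimal NumPy sketch of the comparison above, assuming the 2,000-word dictionary and a hypothetical 8-dimensional embedding filled with random values in place of learned ones. It also shows that multiplying a one-hot vector by the embedding matrix selects a single row, which is why an embedding layer can be implemented as a plain table lookup.

```python
import numpy as np

vocab_size = 2000     # dictionary size from the example above
embedding_dim = 8     # hypothetical embedding size, chosen for illustration
word_index = 42       # hypothetical integer index of some word

# One-hot: a 2000-long vector containing a single 1.
one_hot = np.zeros(vocab_size, dtype=np.int32)
one_hot[word_index] = 1

# Embedding: a (vocab_size, embedding_dim) matrix; row i is the dense vector of word i.
# Random values stand in for weights that would normally be learned during training.
rng = np.random.RandomState(0)
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim)).astype(np.float32)
dense = embedding_matrix[word_index]

print(one_hot.shape, int((one_hot == 0).sum()))  # (2000,) 1999
print(dense.shape)                               # (8,)

# one_hot @ embedding_matrix picks out exactly the same row as the lookup.
assert np.allclose(one_hot @ embedding_matrix, dense)
```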

<a name = Section12></a>
### **1.2 How to obtain word embedding?**

- Learn word embeddings **jointly** with the main task you care about (such as document **classification** or **sentiment prediction**). 

- In this setup, you start with **random** word vectors and then **learn** them in the same way you learn the **weights** of a **neural network**.

- Alternatively, load into your model **word embeddings** that were **pre-computed** on a different machine-learning task than the one you're trying to solve. These are called **pretrained** word embeddings.

- Two **popular** sets of pretrained word embeddings are:
    - **GloVe** 
    
    - **fastText**
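Whichever pretrained embeddings you use, a common pattern is to build a matrix whose row *i* holds the vector of the word with index *i* in your own vocabulary. The sketch below assumes the plain-text GloVe layout (a word followed by its vector components on each line) and uses a tiny made-up in-memory file and a hypothetical `word_index` instead of a real download.

```python
import io
import numpy as np

# Hypothetical stand-in for a pretrained embedding file in the GloVe text format:
# one word per line, followed by its vector components separated by spaces.
fake_glove_file = io.StringIO(
    "movie 0.10 0.20 0.30\n"
    "film 0.12 0.18 0.33\n"
    "bomb -0.50 0.40 0.00\n"
)
embedding_dim = 3

# Hypothetical word index for our own vocabulary; index 0 is reserved for padding.
word_index = {"movie": 1, "film": 2, "great": 3}
vocab_size = len(word_index) + 1

# 1. Parse the file into a {word: vector} dictionary.
pretrained = {}
for line in fake_glove_file:
    parts = line.split()
    if parts:
        pretrained[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

# 2. Row i of the matrix holds the pretrained vector of the word with index i.
embedding_matrix = np.zeros((vocab_size, embedding_dim), dtype=np.float32)
for word, i in word_index.items():
    vector = pretrained.get(word)
    if vector is not None:
        embedding_matrix[i] = vector  # words missing from the file (here "great") stay all-zero

print(embedding_matrix)
```

In `tf.keras`, such a matrix can then initialize an `Embedding` layer, for example with `embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix)` and `trainable=False` to keep the pretrained vectors frozen during training.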

---
<a name = Section2></a>
# **2. Learning Word Embeddings for IMDB Movie Review Data**
---

<a id="section201"></a>
### **2.1 IMDB Movie Review Data**

<a id="section20101"></a>
#### **2.1.1 Overview**

- This dataset contains movie reviews along with their associated binary **sentiment polarity labels**. 

- It is intended to serve as a benchmark for **sentiment classification**. 

- The dataset's accompanying **README** outlines how the dataset was gathered and how to use the files **provided**.



<a id="section20102"></a>
#### **2.1.2 Dataset**

- The core dataset contains **50,000** reviews split evenly into **25k** train and 25k test sets. 

- The overall **distribution** of labels is balanced (**25k pos** and **25k neg**).

- In the entire **collection**, no more than **30** reviews are allowed for any given movie because **reviews** for the same movie tend to have **correlated ratings**.

- Further, the train and test sets contain **disjoint** sets of **movies**, so no significant **performance** gain can be obtained by **memorizing** movie-unique terms and their **association** with **observed** labels.

- In the labeled **train/test** sets, a **negative** review has a **score <= 4** out of 10, and a **positive** review has a **score >= 7** out of 10. 

- Reviews with **neutral ratings** (5 or 6) are therefore not included in the **train/test** sets.

- In the **unsupervised** set, reviews of any **rating** are included and there are an even number of **reviews** > 5 and <= 5.
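The labeling rule above can be summarized as a small function. `imdb_label` is a hypothetical helper written for illustration (it is not part of Keras or the dataset's tooling); it uses 0 for negative and 1 for positive, the same encoding as the labels returned by `imdb.load_data`.

```python
def imdb_label(score):
    """Hypothetical helper: map a 1-10 review score to a train/test label.

    Returns 0 (negative) for scores <= 4, 1 (positive) for scores >= 7,
    and None for neutral scores (5 or 6), which are left out of the labeled sets.
    """
    if score <= 4:
        return 0
    if score >= 7:
        return 1
    return None

print([imdb_label(s) for s in range(1, 11)])
# [0, 0, 0, 0, None, None, 1, 1, 1, 1]
```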

---
<a name = Section3></a>
# **3. Instantiating an Embedding layer**
---

In [None]:
# Select TensorFlow 2.x.
# The %tensorflow_version magic is specific to Google Colab; in other notebook
# environments it raises an error, which the try/except below ignores.
try:
    %tensorflow_version 2.x
except Exception:
    pass

from tensorflow.keras.layers import Embedding, Flatten, Dense
from tensorflow.keras.models import Sequential

TensorFlow 2.x selected.


In [None]:
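from tensorflow.keras.datasets import imdb
from tensorflow.keras import preprocessing

# Embedding(input_dim, output_dim): a lookup table for 1,000 possible token
# indices (0-999), each mapped to a 64-dimensional dense vector.
embedding_layer = Embedding(1000, 64)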
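Conceptually, `Embedding(1000, 64)` holds a 1000 × 64 weight matrix and replaces every integer index in its input with the corresponding row. The NumPy sketch below imitates that lookup with a randomly initialized table (a stand-in for the layer's weights, not the Keras implementation) to show how a 2D batch of indices becomes a 3D batch of vectors.

```python
import numpy as np

input_dim, output_dim = 1000, 64  # same sizes as Embedding(1000, 64) above

# Stand-in for the layer's weights: one row of length output_dim per token index.
rng = np.random.RandomState(0)
table = rng.uniform(-0.05, 0.05, size=(input_dim, output_dim)).astype(np.float32)

# A batch of 2 sequences, each 5 token indices long (every index must be < input_dim).
batch = np.array([[4, 20, 7, 0, 0],
                  [999, 3, 3, 1, 2]])

embedded = table[batch]  # row lookup via integer-array indexing
print(batch.shape, "->", embedded.shape)  # (2, 5) -> (2, 5, 64)

# The same index always maps to the same vector, wherever it appears.
assert np.array_equal(embedded[1, 1], embedded[1, 2])
```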

- Older Keras releases can fail inside `imdb.load_data` with newer NumPy versions, raising `ValueError: Object arrays cannot be loaded when allow_pickle=False`. The error appeared when NumPy **1.16.3** changed the default of `np.load` to `allow_pickle=False`.

- If you hit this error, try **downgrading** NumPy to **1.16.2**, the release just before that change.

- If your **current** NumPy version works **properly** with the code below, there's no need to **downgrade** it.

In [None]:
# !pip install numpy==1.16.2
import numpy as np

<a id="section301"></a>
### **3.1 Loading the IMDB data for use with an Embedding layer**

- We will consider only the **10,000** most common words in the dataset.

In [None]:
max_features = 10000  # number of words to consider as features

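- We will keep only **20** words per movie review. Note that `pad_sequences` truncates from the start by default (`truncating='pre'`), so it actually keeps the **last** 20 words of longer reviews; pass `truncating='post'` to keep the first 20.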


In [None]:
maxlen = 20

- Load the data as **lists of integers**. Keras ships the IMDB reviews already preprocessed, with each review encoded as a sequence of word indices.

In [None]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

- Transform the lists of integers into a **2D integer** tensor of shape (samples, maxlen)

In [None]:
x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)  
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)
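Because the truncation direction matters here, the following simplified re-implementation reproduces the padding and truncation behaviour of `pad_sequences` on two toy sequences. `pad_sequences_sketch` is a hypothetical name, not a Keras function, and it only covers the `padding`, `truncating` and `value` options.

```python
import numpy as np

def pad_sequences_sketch(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Simplified illustration of Keras-style padding/truncation (not the Keras code)."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        seq = list(seq)
        # 'pre' drops tokens from the start (keeps the last maxlen), 'post' from the end.
        kept = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        if not kept:
            continue
        # 'pre' puts the padding value before the tokens, 'post' after them.
        if padding == "pre":
            out[i, maxlen - len(kept):] = kept
        else:
            out[i, :len(kept)] = kept
    return out

reviews = [[5, 6, 7, 8, 9, 10], [11, 12]]  # toy index sequences

# Default behaviour: the long review keeps its LAST 4 tokens, the short one is left-padded.
print(pad_sequences_sketch(reviews, maxlen=4))
# truncating="post" keeps the FIRST 4 tokens instead.
print(pad_sequences_sketch(reviews, maxlen=4, truncating="post"))
```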

- Print the **shape** for both **train** and **test** data

In [None]:
print(x_train.shape)

print(x_test.shape)

(25000, 20)
(25000, 20)


<a id="section302"></a>
### **3.2 Let's Train the model**

- Create a Keras **Sequential** model, which lets you build a model **layer-by-layer**.

In [None]:
model = Sequential()

- `input_length=maxlen` tells the **Embedding layer** the length of each input sequence, so that the **embedded** inputs can later be flattened.

- After the **Embedding layer**, the activations have **shape** (samples, maxlen, 8).

In [None]:
model.add(Embedding(10000, 8, input_length=maxlen))

- **Flatten** the 3D tensor of embeddings into a 2D tensor of shape (samples, maxlen * 8).

In [None]:
model.add(Flatten())

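- Add the **classifier** on top: a single **Dense** unit with a sigmoid activation, which outputs the probability that a review is positive.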

In [None]:
model.add(Dense(1, activation='sigmoid'))
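Putting the three layers together, the model computes roughly the following. This NumPy forward pass is a sketch with randomly initialized stand-in weights (not the trained Keras weights, and not Keras's own initializers), meant only to make the shape after each layer concrete.

```python
import numpy as np

rng = np.random.RandomState(0)
vocab_size, embedding_dim, maxlen = 10000, 8, 20  # same sizes as the model above

# Random stand-ins for the model's trainable weights.
embedding_table = rng.uniform(-0.05, 0.05, size=(vocab_size, embedding_dim))
dense_kernel = rng.uniform(-0.05, 0.05, size=(maxlen * embedding_dim, 1))
dense_bias = np.zeros(1)

def forward(x):
    """x: int array of shape (samples, maxlen) -> probabilities of shape (samples, 1)."""
    embedded = embedding_table[x]              # Embedding: (samples, 20, 8)
    flat = embedded.reshape(len(x), -1)        # Flatten:   (samples, 160)
    logits = flat @ dense_kernel + dense_bias  # Dense(1):  (samples, 1)
    return 1.0 / (1.0 + np.exp(-logits))       # sigmoid activation

x = rng.randint(0, vocab_size, size=(3, maxlen))  # 3 fake padded reviews
probs = forward(x)
print(probs.shape)  # (3, 1)
```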

#### What does compile do?

- **`compile`** configures the model for training: it sets the **optimizer**, the **loss function** and the **metrics** to monitor. 

In [None]:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
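For reference, the `binary_crossentropy` loss and the accuracy metric can be written out in NumPy as below. This is an illustrative sketch, not Keras's implementation; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; y_pred are sigmoid outputs in (0, 1)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((y_pred > threshold) == y_true))

y_true = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.8, 0.2])
confident_wrong = np.array([0.1, 0.9, 0.2, 0.8])

# Confident correct predictions give a low loss; confident wrong ones a high loss.
print(binary_crossentropy(y_true, confident_right), binary_accuracy(y_true, confident_right))
print(binary_crossentropy(y_true, confident_wrong), binary_accuracy(y_true, confident_wrong))
```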

In [None]:
model.summary()  # Print a summary of the layers, output shapes and parameter counts.

- Train the model on the **IMDB training data**, holding out **20%** of it for validation.

In [None]:
history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 20, 8)             80000     
_________________________________________________________________
flatten_1 (Flatten)          (None, 160)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 161       
Total params: 80,161
Trainable params: 80,161
Non-trainable params: 0
_________________________________________________________________
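The parameter counts in the summary follow directly from the layer sizes; the arithmetic below reproduces them.

```python
# Reproduce the "Param #" column of the summary above by hand.
vocab_size, embedding_dim, maxlen = 10000, 8, 20

embedding_params = vocab_size * embedding_dim  # one 8-dim vector per word index
flatten_params = 0                             # Flatten only reshapes, no weights
dense_inputs = maxlen * embedding_dim          # width of the Flatten output
dense_params = dense_inputs * 1 + 1            # (160 x 1) kernel plus one bias

total = embedding_params + flatten_params + dense_params
print(embedding_params, dense_inputs, dense_params, total)  # 80000 160 161 80161
```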


Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
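The line `Train on 20000 samples, validate on 5000 samples` comes from `validation_split=0.2`: Keras takes the validation data from the **last** 20% of the samples passed to `fit`, before any shuffling. The sketch below mimics that split on arrays shaped like `x_train`; `split_like_validation_split` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def split_like_validation_split(x, y, validation_split):
    """Illustrative sketch: hold out the LAST fraction of samples for validation."""
    n_val = int(len(x) * validation_split)
    split_at = len(x) - n_val
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.zeros((25000, 20), dtype=np.int32)  # same shape as x_train above
y = np.zeros(25000, dtype=np.int32)
(x_tr, y_tr), (x_val, y_val) = split_like_validation_split(x, y, 0.2)
print(len(x_tr), len(x_val))  # 20000 5000
```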


---
<a name = Section4></a>
# **4. Conclusion**
---

- You get to a validation accuracy of **~76%**, which is pretty good considering that you’re only looking at **20** words of every review. 

- But note that merely **flattening** the **embedded sequences** and training a **single Dense layer** on top leads to a model that **treats** each word in the input sequence **separately**, without considering **inter-word** relationships and **sentence structure**. For example, this model would likely treat both **`this movie is a bomb`** and **`this movie is the bomb`** as **negative reviews**.
- It’s much better to add **recurrent layers** or **1D convolutional layers** on top of the **embedded** sequences to **learn** features that take into account each **sequence** as a whole.