# Word Embedding

1. Word embedding is a core technique in Natural Language Processing (NLP) in which each word is represented as a dense numerical vector.
2. It is crucial for NLP tasks because it lets a model work with word meanings and the relationships between words, rather than with raw strings.

Significance of word embedding:

1. Word embeddings capture semantic relationships, making them a key component of NLP applications such as sentiment analysis and machine translation.
2. They are much lower-dimensional than one-hot vectors (this notebook uses 4 dimensions per word instead of one dimension per vocabulary entry), which keeps models small and makes large text corpora practical to process.
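To make "semantic relationships" concrete, here is a minimal sketch that compares word vectors with cosine similarity. The three vectors are made up by hand for illustration; they are not learned embeddings, and `cosine_similarity` is a helper defined only for this example.

```python
import numpy as np

# Hand-picked 3-dimensional vectors, for illustration only (not learned).
vectors = {
    "good": np.array([0.9, 0.1, 0.3]),
    "great": np.array([0.8, 0.2, 0.4]),
    "bad": np.array([-0.7, 0.1, 0.2]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_good_great = cosine_similarity(vectors["good"], vectors["great"])
sim_good_bad = cosine_similarity(vectors["good"], vectors["bad"])

# With these toy vectors, "good" is closer to "great" than to "bad".
print(sim_good_great > sim_good_bad)  # True
```

In a trained model the vectors come from the embedding layer's weights, and the same comparison can be applied to them.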

### Import relevant libraries

In [38]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Embedding

### Data

In [39]:
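# Use a list, not a set: a set has no guaranteed order, so the reviews
# would not stay aligned with the labels in `sentiments`.
reviews = [
    'very nice food',
    'amazing restaurant',
    'very bad',
    'too good',
    'just loved it',
    'will go again',
    'horrible food',
    'never go there',
    'poor service',
    'poor quality',
    'very good',
    'needs improvement'
]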
sentiments = np.array([1,1,0,1,1,1,0,0,0,0,1,0])

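### Convert reviews to integer sequences

1. Despite its name, `one_hot` does not return one-hot vectors: `encoded_review` contains one list of integers per review, where each integer stands for a word.
2. The vocabulary size is set to 30, so every word is hashed to an integer between 1 and 29. Two different words can collide on the same integer, so the mapping is not guaranteed to be unique.
3. The integer assigned to each word is determined by the hashing function used by `one_hot`. By default this is Python's built-in `hash`, which is not stable across interpreter sessions, so the values shown below come from one particular run.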

In [40]:
from tensorflow.keras.preprocessing.text import one_hot

vocabulary_size = 30
encoded_review = [one_hot(sentence, vocabulary_size) for sentence in reviews]
encoded_review

[[20, 9, 27],
 [15, 5],
 [14, 19, 28],
 [14, 16],
 [24, 15],
 [8, 20, 24],
 [1, 24],
 [17, 19, 26],
 [8, 7],
 [12, 14],
 [12, 29],
 [8, 16]]
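Because the default hash is not reproducible between sessions, it can help to see the hashing idea with a stable hash. The sketch below is a plain-Python illustration of the hashing trick using MD5; `stable_word_indices` is a hypothetical helper written for this example, not a Keras function. Keras offers a comparable option through `hashing_trick(..., hash_function='md5')`, though its tokenisation may differ from the simple `split()` assumed here.

```python
import hashlib

def stable_word_indices(text, vocabulary_size):
    """Map each word to an integer in [1, vocabulary_size - 1] using MD5."""
    indices = []
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        # Index 0 is left unused so it can serve as the padding value.
        indices.append(int(digest, 16) % (vocabulary_size - 1) + 1)
    return indices

first = stable_word_indices("very nice food", 30)
second = stable_word_indices("very good", 30)

# The same word always maps to the same integer, in every run.
print(first, second)
```

Both reviews start with "very", so `first[0]` and `second[0]` are equal. Collisions between different words remain possible with only 29 available indices.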

### Padding
1. Some reviews are 2 words long and others are 3.
2. Padding the shorter sequences with zeros brings every sequence to the same length (`max_length = 3`), which the model requires.

In [41]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_length = 3
padded_reviews = pad_sequences(encoded_review, maxlen = max_length, padding = 'post')
padded_reviews

array([[20,  9, 27],
       [15,  5,  0],
       [14, 19, 28],
       [14, 16,  0],
       [24, 15,  0],
       [ 8, 20, 24],
       [ 1, 24,  0],
       [17, 19, 26],
       [ 8,  7,  0],
       [12, 14,  0],
       [12, 29,  0],
       [ 8, 16,  0]])
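What `padding = 'post'` does can be sketched in a few lines of NumPy. `pad_post` is a hypothetical helper for illustration only. It is simplified: over-long sequences are cut at the end here, whereas `pad_sequences` by default truncates from the front.

```python
import numpy as np

def pad_post(sequences, max_length, value=0):
    """Right-pad (or cut) each sequence to exactly max_length entries."""
    padded = np.full((len(sequences), max_length), value, dtype=int)
    for row, seq in enumerate(sequences):
        trimmed = seq[:max_length]            # simplification: keep the first entries
        padded[row, :len(trimmed)] = trimmed  # remaining slots stay at `value`
    return padded

example = pad_post([[20, 9, 27], [15, 5]], max_length=3)
print(example)
# [[20  9 27]
#  [15  5  0]]
```

The padding value 0 is safe because the hashing step never assigns index 0 to a word.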

### Model


In [42]:
embedded_vector_size = 4
train = padded_reviews
targets = sentiments

model = tf.keras.Sequential([
    Embedding(vocabulary_size, embedded_vector_size, input_length = max_length, name = 'embedding'),
    Flatten(),
    Dense(1, activation = 'sigmoid')
])

model.compile('adam', loss = 'binary_crossentropy', metrics =['accuracy'])
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 3, 4)              120       
                                                                 
 flatten_3 (Flatten)         (None, 12)                0         
                                                                 
 dense_3 (Dense)             (None, 1)                 13        
                                                                 
Total params: 133
Trainable params: 133
Non-trainable params: 0
_________________________________________________________________
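The shapes and parameter counts in the summary can be reproduced by hand. The NumPy sketch below mimics the three layers with randomly initialised arrays. It illustrates the computation under the sizes used in this notebook and is not how Keras implements it internally. The names `embedding_table`, `dense_weights`, and `dense_bias` are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocabulary_size, embedded_vector_size, max_length = 30, 4, 3

# Embedding: one 4-dimensional row per vocabulary index -> 30 * 4 = 120 parameters.
embedding_table = rng.normal(size=(vocabulary_size, embedded_vector_size))
# Dense: 12 weights plus 1 bias -> 13 parameters.
dense_weights = rng.normal(size=(max_length * embedded_vector_size, 1))
dense_bias = np.zeros(1)

batch = np.array([[20, 9, 27], [15, 5, 0]])    # two padded reviews
embedded = embedding_table[batch]              # lookup by index: (2, 3, 4)
flattened = embedded.reshape(len(batch), -1)   # (2, 12)
logits = flattened @ dense_weights + dense_bias
probabilities = 1.0 / (1.0 + np.exp(-logits))  # sigmoid: (2, 1)

total_params = embedding_table.size + dense_weights.size + dense_bias.size
print(embedded.shape, flattened.shape, probabilities.shape, total_params)
# (2, 3, 4) (2, 12) (2, 1) 133
```

The embedding layer is therefore just a trainable lookup table: each integer in the input selects one row of the table.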


In [43]:
# model.fit has not been called yet, so this scores the randomly initialised model.
model.evaluate(train, targets)



[0.6918697357177734, 0.5833333134651184]
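The loss reported above is very close to ln 2 ≈ 0.693. That is the binary cross-entropy obtained when every prediction is 0.5, which is roughly what a model gives before any training. The sketch below checks this baseline in NumPy; `binary_crossentropy` is a small helper written for this example, not the Keras loss object.

```python
import numpy as np

sentiments = np.array([1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0])

def binary_crossentropy(targets, probabilities):
    """Mean binary cross-entropy, with clipping to avoid log(0)."""
    probabilities = np.clip(probabilities, 1e-7, 1 - 1e-7)
    return float(-np.mean(targets * np.log(probabilities)
                          + (1 - targets) * np.log(1 - probabilities)))

# A model that outputs 0.5 for every review carries no information.
chance_loss = binary_crossentropy(sentiments, np.full(len(sentiments), 0.5))
print(round(chance_loss, 4))  # 0.6931
```

A loss near this value indicates that the embeddings have not yet learned anything about sentiment. The usual next step would be to call `model.fit(train, targets, ...)` before evaluating; the number of epochs to use is not specified in this notebook.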