Method: take a supervised NLP problem and train a model to solve it.
The word embeddings are learned as a **side effect** of that training.

Ref: https://towardsdatascience.com/hands-on-nlp-deep-learning-model-preparation-in-tensorflow-2-x-2e8c9f3c7633

In [2]:
import numpy as np
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Embedding

reviews = ['nice food',
        'amazing restaurant',
        'too good',
        'just loved it!',
        'will go again',
        'horrible food',
        'never go there',
        'poor service',
        'poor quality',
        'needs improvement']

sentiment = np.array([1,1,1,1,1,0,0,0,0,0])  # target variable: 1 = positive review, 0 = negative review
# the word embeddings are learned through this supervised task

In [3]:
one_hot("amazing restaurant",30)  # despite the name, this hashes each word to an integer in [1, 30)
# the first integer belongs to the first word and the second integer to the second word

[18, 9]
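Despite its name, `one_hot` returns integer indices rather than one-hot vectors: each word is hashed into the range 1 to n-1. The sketch below mimics that idea in plain Python. The helper `hash_encode` is hypothetical and uses MD5 so that results are repeatable between runs; it illustrates the hashing trick and is not Keras's exact implementation, so its integers will not match the output above.

```python
import hashlib
import re


def hash_encode(text, n):
    """Map each word of `text` to an integer in [1, n - 1] using a stable hash.

    Hypothetical helper for illustration only (not part of Keras).
    """
    # Lowercase and drop punctuation, roughly what a simple tokenizer would do.
    words = re.findall(r"[a-z0-9']+", text.lower())
    codes = []
    for word in words:
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        # Index 0 is never produced, so it stays free for padding later on.
        codes.append(int(digest, 16) % (n - 1) + 1)
    return codes


codes = hash_encode("amazing restaurant", 30)
print(codes)  # two integers, one per word, each between 1 and 29
```

Because the mapping is a hash and not a lookup table, two different words can receive the same integer.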

In [4]:
vocab_size = 30
encoded_reviews = [one_hot(e, vocab_size) for e in reviews]  #encoding all the reviews
print(encoded_reviews)


[[2, 7], [18, 9], [27, 11], [20, 2, 28], [12, 23, 7], [26, 7], [12, 23, 20], [13, 23], [13, 20], [17, 28]]
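The printed encoding contains repeated integers for different words, which is the expected cost of hashing into only 29 slots. A minimal sketch for listing those collisions, using the reviews and the encoded output copied from above. It assumes the tokenizer lowercases the text and strips punctuation, so that each word lines up with one integer; the variable names are illustrative.

```python
import re
from collections import defaultdict

reviews = ['nice food', 'amazing restaurant', 'too good', 'just loved it!',
           'will go again', 'horrible food', 'never go there', 'poor service',
           'poor quality', 'needs improvement']

# Copied from the printed output of encoded_reviews above.
encoded_reviews = [[2, 7], [18, 9], [27, 11], [20, 2, 28], [12, 23, 7],
                   [26, 7], [12, 23, 20], [13, 23], [13, 20], [17, 28]]

# Collect every word that was assigned to each integer index.
words_by_index = defaultdict(set)
for review, codes in zip(reviews, encoded_reviews):
    words = re.findall(r"[a-z']+", review.lower())  # assumed tokenization
    for word, code in zip(words, codes):
        words_by_index[code].add(word)

# An index shared by more than one distinct word is a collision.
collisions = {code: sorted(words)
              for code, words in words_by_index.items() if len(words) > 1}

for code in sorted(collisions):
    print(code, collisions[code])
# 2 ['loved', 'nice']
# 7 ['again', 'food']
# 12 ['never', 'will']
# 20 ['just', 'quality', 'there']
# 23 ['go', 'service']
# 28 ['improvement', 'it']
```

Words that collide share a single embedding row, so the model cannot tell them apart. A larger `vocab_size` makes collisions less likely.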


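**The Embedding layer uses each integer in "encoded_reviews" to look up one row of its weight matrix (equivalent to multiplying a one-hot vector by that matrix). The resulting word vectors feed the rest of the network, the loss is computed against the target, and backpropagation then adjusts the weights, which are the word vectors.**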
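One way to read the statement above: multiplying a one-hot row by the weight matrix simply selects one row of that matrix, which is why an embedding layer can be implemented as a lookup. A small NumPy sketch of that equivalence; the matrix here is random stand-in data, not the trained weights.

```python
import numpy as np

vocab_size, embedding_dim = 30, 4
rng = np.random.default_rng(0)
# Stand-in for the embedding weights (random values, for illustration only).
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

review = np.array([18, 9])  # "amazing restaurant" as encoded above

# Route 1: pick the rows directly by index.
looked_up = embedding_matrix[review]

# Route 2: build one-hot rows and multiply them by the matrix.
one_hot_rows = np.eye(vocab_size)[review]        # shape (2, 30)
multiplied = one_hot_rows @ embedding_matrix     # shape (2, 4)

print(looked_up.shape)
print(np.allclose(looked_up, multiplied))
```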

## Word-Embeddings
- Words are represented by dense vectors, where each vector is the projection of a word into a continuous vector space.

- The position of a word within the vector space is learned from text and is based on the words that surround the word when it is used.

- The position of a word in the learned vector space is referred to as its embedding.

![Word%20Embeddings.png](attachment:Word%20Embeddings.png)

In [5]:
max_length = 3   # the longest review has 3 words
# padding='post' appends zeros at the end of each shorter sequence
padded_reviews = pad_sequences(encoded_reviews, maxlen=max_length, padding='post')
print(padded_reviews)

[[ 2  7  0]
 [18  9  0]
 [27 11  0]
 [20  2 28]
 [12 23  7]
 [26  7  0]
 [12 23 20]
 [13 23  0]
 [13 20  0]
 [17 28  0]]
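What post-padding does can be sketched in a few lines of plain Python. The function `pad_post` is a hypothetical stand-in for this one configuration of `pad_sequences`; it assumes `maxlen >= 1` and, like the Keras default, drops items from the front when a sequence is too long.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad each sequence with `value` at the end until it has `maxlen` items.

    Hypothetical helper for illustration; assumes maxlen >= 1.
    """
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # too long: keep only the last `maxlen` items
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded


print(pad_post([[2, 7], [20, 2, 28]], maxlen=3))
# [[2, 7, 0], [20, 2, 28]]
```

The padding value 0 is safe to use because the word indices start at 1.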


### KERAS EMBEDDING LAYER
It requires that the input data be integer encoded, so that each word is represented by a unique integer. This data preparation step can be performed using the Tokenizer API also provided with Keras.
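To illustrate what "each word is represented by a unique integer" means, here is a simplified plain-Python sketch of a frequency-ranked vocabulary, roughly the idea behind a tokenizer's word index. The names `tokenize` and `word_index` are chosen for this example, and the regex tokenization is an assumption; this is not the Keras Tokenizer itself.

```python
import re
from collections import Counter

reviews = ['nice food', 'amazing restaurant', 'too good', 'just loved it!',
           'will go again', 'horrible food', 'never go there', 'poor service',
           'poor quality', 'needs improvement']


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


counts = Counter(word for review in reviews for word in tokenize(review))

# The most frequent word gets index 1; index 0 stays reserved for padding.
word_index = {word: i
              for i, (word, _) in enumerate(counts.most_common(), start=1)}

encoded = [[word_index[w] for w in tokenize(r)] for r in reviews]

print(len(word_index))  # 20 distinct words, each with its own index
print(encoded[0])
```

Unlike hashing, this mapping cannot produce collisions, at the price of having to store the vocabulary.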

The Embedding layer is initialized with random weights and will learn an embedding for all of the words in the training dataset. **Once trained, two word vectors can be compared with cosine similarity, which measures the cosine of the angle between two vectors of an inner product space.**

In [6]:
embeded_vector_size = 4   # length of each word vector (the example figure above shows 4 rows per word)

model = Sequential()
model.add(Embedding(vocab_size, embeded_vector_size, input_length=max_length,name="embedding")) #name parameter is used later
model.add(Flatten())   #converts a multi-dimensional array to one long array -> check model.summary()
model.add(Dense(1, activation='sigmoid'))
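To make the data flow through these three layers concrete, the sketch below reproduces the forward pass in NumPy: lookup, flatten, then a single sigmoid unit. All weights are random stand-ins chosen for this example, so the probabilities are meaningless; only the shapes matter.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim, max_length = 30, 4, 3

# Random stand-ins for the layer weights (illustration only).
embedding = rng.uniform(-0.05, 0.05, size=(vocab_size, embedding_dim))
dense_w = rng.normal(scale=0.1, size=(max_length * embedding_dim, 1))
dense_b = np.zeros(1)

batch = np.array([[2, 7, 0], [18, 9, 0]])   # two padded reviews

embedded = embedding[batch]                  # Embedding: (2, 3, 4)
flat = embedded.reshape(len(batch), -1)      # Flatten:   (2, 12)
logits = flat @ dense_w + dense_b            # Dense:     (2, 1)
probs = 1.0 / (1.0 + np.exp(-logits))        # sigmoid activation

print(embedded.shape, flat.shape, probs.shape)
```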

In [7]:
X = padded_reviews
y = sentiment

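This model has a single sigmoid output, so it is compiled with `binary_crossentropy`. For multi-class problems, see the difference between **sparse and categorical crossentropy**: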
ref: https://stackoverflow.com/questions/58565394/what-is-the-difference-between-sparse-categorical-crossentropy-and-categorical-c
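The two multi-class losses compute the same quantity and differ only in the label format: integer class ids versus one-hot rows. A NumPy sketch with made-up probabilities (all numbers here are illustrative), followed by the binary form that fits a single sigmoid output:

```python
import numpy as np

# Made-up predicted class probabilities for 2 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

sparse_labels = np.array([0, 2])            # integer class ids
one_hot_labels = np.eye(3)[sparse_labels]   # the same labels as one-hot rows

# Sparse form: pick the probability of the true class by index.
rows = np.arange(len(sparse_labels))
sparse_loss = -np.log(probs[rows, sparse_labels]).mean()

# Categorical form: the one-hot row masks out every other class.
categorical_loss = -(one_hot_labels * np.log(probs)).sum(axis=1).mean()

# Binary form: one probability per sample and 0/1 targets.
y_true = np.array([1.0, 0.0])
y_prob = np.array([0.9, 0.2])
binary_loss = -(y_true * np.log(y_prob)
                + (1 - y_true) * np.log(1 - y_prob)).mean()

print(np.isclose(sparse_loss, categorical_loss))
print(binary_loss)
```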

In [8]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 3, 4)              120       
_________________________________________________________________
flatten (Flatten)            (None, 12)                0         
_________________________________________________________________
dense (Dense)                (None, 1)                 13        
Total params: 133
Trainable params: 133
Non-trainable params: 0
_________________________________________________________________
None
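The parameter counts in the summary can be checked by hand. The arithmetic below uses the values set earlier in the notebook (`vocab_size = 30`, `embeded_vector_size = 4`, `max_length = 3`); the variable names are chosen for this example.

```python
vocab_size, embedding_dim, max_length = 30, 4, 3

# Embedding: one vector of length 4 for each of the 30 indices.
embedding_params = vocab_size * embedding_dim      # 30 * 4 = 120

# Flatten has no weights; it turns (3, 4) into 12 values per review.
flatten_units = max_length * embedding_dim         # 3 * 4 = 12

# Dense(1): one weight per input value plus one bias.
dense_params = flatten_units * 1 + 1               # 12 + 1 = 13

total_params = embedding_params + dense_params     # 120 + 13 = 133
print(embedding_params, dense_params, total_params)
```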


In [26]:
model.fit(X, y, epochs=20, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x7fab40d54ed0>

In [27]:
# evaluate the model on the same 10 reviews it was trained on (this is training accuracy)
loss, accuracy = model.evaluate(X, y)
accuracy



1.0

In [43]:
weights = model.get_layer('embedding').get_weights()[0]
weights

array([[ 4.44393642e-02, -4.77505699e-02, -1.65100209e-02,
         3.01998518e-02],
       [-9.31416824e-03,  4.68207337e-02,  4.93119694e-02,
        -4.42053005e-03],
       [-4.02335636e-02, -4.38875444e-02, -2.57445499e-03,
         2.61601694e-02],
       [ 1.12099275e-02,  4.80826534e-02,  2.72949375e-02,
         3.43712904e-02],
       [ 2.45679170e-04, -2.05160510e-02,  9.22038406e-03,
         7.83105940e-03],
       [-2.46209148e-02,  1.52280219e-02,  5.23287058e-03,
         4.66066264e-02],
       [ 3.47434394e-02, -1.67060979e-02,  4.60567363e-02,
         6.03913143e-03],
       [-4.80580330e-02, -2.57457737e-02, -5.28658554e-03,
         4.06263582e-02],
       [ 8.23104382e-03, -9.40399244e-03, -1.06810108e-02,
         4.43654694e-02],
       [ 2.17006914e-02,  2.84026600e-02, -1.89636480e-02,
         4.58294153e-03],
       [-3.50370184e-02,  9.63791460e-03,  1.70732476e-02,
         4.23573144e-02],
       [ 3.19525860e-02, -1.85184591e-02, -2.33719945e-02,
      

In [50]:
weights[24]  # the word vector (length 4) stored in row 24 of the embedding matrix

array([-0.04079238,  0.01222418,  0.02516855,  0.00234531], dtype=float32)

In [51]:
weights[1]

array([-0.00931417,  0.04682073,  0.04931197, -0.00442053], dtype=float32)

In [52]:
weights[16]

array([ 0.02437563, -0.03933533, -0.03430591, -0.01619029], dtype=float32)
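The cosine similarity mentioned earlier can be applied to the three rows printed above. This is a minimal sketch with the values copied from those outputs; `cosine_similarity` is a helper written for this example. Note that indices 1, 16 and 24 do not appear in the `encoded_reviews` output shown earlier, so these rows may not correspond to any word in the reviews.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Values copied from the printed outputs of weights[24], weights[1], weights[16].
w24 = np.array([-0.04079238, 0.01222418, 0.02516855, 0.00234531])
w1 = np.array([-0.00931417, 0.04682073, 0.04931197, -0.00442053])
w16 = np.array([0.02437563, -0.03933533, -0.03430591, -0.01619029])

print(cosine_similarity(w24, w1))
print(cosine_similarity(w24, w16))
print(cosine_similarity(w1, w16))
```

To compare actual words, index `weights` with the integers that `one_hot` assigned to them.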