***Cosine Similarity and Cosine Distance:***<br>
Cosine similarity and cosine distance are widely used in recommendation systems. Suppose you have two words represented as vectors p and q. As the cosine distance between the two words increases, the similarity between them decreases. Hence, **cosine similarity and cosine distance move in opposite directions: when one goes up, the other goes down by the same amount.**
### [1-(Cosine Similarity)] = (Cosine Distance)<br>
*Where, Cosine Similarity = Cos(angle between the vectors p and q)*<br>
Hence, when the angle between the word vectors is 90 degrees, the cosine similarity is 0 and there is no similarity between the words. If the angle is 0 degrees, the similarity is 1 and the words are similar to the maximum extent. If the words are contrasting, the angle is 180 degrees and the value is -1. Hence, the value of the cosine similarity lies between -1 and +1, and the cosine distance lies between 0 and 2.<br>
Ex: Consider that we are building a Movie Recommendation System. We have watched one movie, and based on it our model will recommend new movies to us. If our first movie was Avengers (Genre: Action), its feature vector will lean towards action. Now, consider a new movie of the comedy genre. Its vector will lean towards comedy. Since the angle between the action and comedy features is 90 degrees, the cosine similarity is 0 and the second movie would not be recommended to us. If the second movie were also an action movie, its vector would also lean towards the action genre, so the angle would be zero and the cosine similarity would be 1. Hence, this movie would be recommended to us.<br>
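
The movie example can be sketched in a few lines of plain Python. The two-feature vectors `[action, comedy]` and their values are made up for illustration; a real recommender would use learned vectors with many more features.

```python
import math

def cosine_similarity(p, q):
    """Return cos(angle) between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

def cosine_distance(p, q):
    """Cosine distance is defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(p, q)

# Hypothetical feature vectors: [action, comedy]
avengers = [1.0, 0.0]
comedy_movie = [0.0, 1.0]
other_action = [2.0, 0.0]   # same direction as avengers, different length
opposite = [-1.0, 0.0]      # points the opposite way

print(cosine_similarity(avengers, comedy_movie))  # 0.0  (90 degrees)
print(cosine_similarity(avengers, other_action))  # 1.0  (0 degrees)
print(cosine_similarity(avengers, opposite))      # -1.0 (180 degrees)
print(cosine_distance(avengers, comedy_movie))    # 1.0
```

Note that only the direction of the vectors matters: `other_action` is twice as long as `avengers`, yet the similarity is still 1.
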
<img src='b.jpg'>

***Word Embedding:***<br>
To overcome the disadvantages of Bag of Words and TF-IDF, we use a concept called Word Embedding. Two widely used types of Word Embedding are:<br>
1. Word2Vec
2. GloVe

BOW (Count Vectorizer) and TF-IDF cannot show the similarity between words when they convert the words into vectors. Like one-hot encoding, they give every vocabulary word its own dimension, so the resulting matrix is sparse. There is also the problem of high dimensionality. Suppose there are about 10,000 words (**here, vocabulary size=10,000**): each vector then has 10,000 dimensions, which is very large and makes the training process difficult. To overcome these issues, we use the Word Embedding technique.<br>
Word Embedding relies on an idea called Feature Representation. In Feature Representation, we create the vector of each word from a set of features. Words that are similar w.r.t. a feature will have similar values for that feature.
<br><img src='a.jpg'><br>
In the above image, you can see that Boy is given '-1' and Girl is given '+1' w.r.t. the feature 'Gender'. This is because boy and girl are contrasting words with respect to gender. Similarly, King and Queen are given similar values when the feature is 'Royal'. Suppose we have around 300 features. Since the number of words is generally much greater than the number of features, each vector needs far fewer dimensions than the vocabulary size. Hence, the Word Embedding matrix is **dense with low dimensions rather than sparse with high dimensions.** The vectors are built from these features.
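
To see why dense feature vectors can express similarity while one-hot style vectors cannot, here is a small sketch. The four-word vocabulary and the `[gender, royal]` values are hypothetical numbers chosen for illustration; they are not read from the image above.

```python
import math

def cosine_similarity(p, q):
    """Return cos(angle) between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

vocab = ["boy", "girl", "king", "queen"]

# Sparse representation: one dimension per vocabulary word.
one_hot_vectors = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Dense representation: [gender, royal]. Values are invented for illustration.
dense_vectors = {
    "boy":   [-1.00, 0.01],
    "girl":  [ 1.00, 0.02],
    "king":  [-0.95, 0.95],
    "queen": [ 0.97, 0.96],
}

# Any two different one-hot vectors are orthogonal, so similarity is always 0.
print(cosine_similarity(one_hot_vectors["king"], one_hot_vectors["queen"]))  # 0.0

# Dense vectors can rank pairs: "king" is closer to "boy" than to "girl".
print(cosine_similarity(dense_vectors["king"], dense_vectors["boy"]))
print(cosine_similarity(dense_vectors["king"], dense_vectors["girl"]))
```

With only two features the ranking is crude; the point is that the dense vectors carry some notion of closeness at all, whereas the sparse ones carry none.
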

When you reduce these 300 dimensions to 2 dimensions (say, to plot the vectors on a graph), you will observe that words that are similar have their vectors near each other and hence can be grouped accordingly.

**While embedding the words, the vocabulary size is an important parameter. Despite its name, the Keras one_hot method does not build a one-hot matrix: it hashes each word to an integer index from 1 up to, but not including, the vocabulary size. Conceptually, that index is the position that would hold the value 1 in a one-hot vector whose length is the vocabulary size, with all other positions zero. Hence, the output for a sentence is the list of indices of the words in that sentence. Because the indices come from a hash, two different words can occasionally receive the same index. Next, we pass this output into an Embedding layer. Here, we specify the number of features we want to take into consideration (known as dimensions). For example, gender, age, food, and so on were the features in our previous example, and there were 300 of them. Hence, the Embedding layer creates a feature representation of our words. Both the Embedding layer and the one_hot method are available in Keras.**
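
The idea of turning words into indices by hashing can be sketched as follows. This is not the Keras implementation: the helper `word_indices` is hypothetical, and it uses MD5 as a stand-in hash so that the result is repeatable, which means the indices will not match the ones shown in the notebook cells.

```python
import hashlib

def word_indices(text, voc_size):
    """Map each word to an integer in [1, voc_size - 1] using a stable hash."""
    indices = []
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        # Index 0 is left free so it can be used for padding later.
        indices.append(int(digest, 16) % (voc_size - 1) + 1)
    return indices

voc_size = 10000
a = word_indices("the glass of milk", voc_size)
b = word_indices("the glass of juice", voc_size)
print(a)
print(b)
# The first three indices are the same in both lists, because the same
# word always hashes to the same index.
```

A smaller `voc_size` makes collisions (two words sharing one index) more likely, which is one reason to choose it comfortably larger than the number of distinct words.
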

***Steps to follow to create Word Embeddings:***<br>
1. Decide the Vocabulary size
2. Convert the sentences to lists of word indices using the one_hot method from keras.preprocessing.text.
3. Pad the index lists so that all sentences are of equal length. Padding is needed because each list is only as long as its sentence, while the index values themselves depend on the vocabulary size. We can use pad_sequences from keras.preprocessing.sequence. If you set the padding parameter to 'pre', the zeroes are added at the front.
4. Create an Embedding layer and set the number of features you want (dimensions). Then use the predict method to convert the padded sentences into feature vectors. Until the layer is trained, these vectors are just its random initial weights.
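
Step 3 can be illustrated without Keras. The function `pad_pre` below is a hypothetical helper that mimics 'pre' padding for the simple case used here (zeroes added at the front, longer sequences cut from the front); it is a sketch, not a replacement for pad_sequences.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad each sequence with `value` up to `maxlen` (assumes maxlen >= 1)."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep only the last `maxlen` items if too long
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

# Two of the index lists from the notebook: one with 4 words, one with 5.
oh_sample = [
    [3991, 3707, 7145, 4038],
    [3288, 890, 5514, 4039, 6797],
]

for row in pad_pre(oh_sample, maxlen=8):
    print(row)
# [0, 0, 0, 0, 3991, 3707, 7145, 4038]
# [0, 0, 0, 3288, 890, 5514, 4039, 6797]
```
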

In [1]:
# Example sentences
sent = [
    'the glass of milk',
    'the glass of juice',
    'the cup of tea',
    'I am a good boy',
    'I am a good developer',
    'understand the meaning of words',
    'your videos are good',
]

In [2]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import one_hot

In [3]:
# Our first step is to initialize the vocabulary size. Vocabulary size is basically the size of a dictionary.
voc_size=10000

In [4]:
one_hot('I am Sagar',voc_size)

[3288, 890, 5095]

In [5]:
# Hence, you pass the text and the vocabulary size to the one_hot method.
# After initializing the vocabulary size, our next step is to convert each sentence into a list of word indices.
oh_representation=[one_hot(sentence,voc_size) for sentence in sent]
oh_representation

[[3991, 3707, 7145, 4038],
 [3991, 3707, 7145, 7488],
 [3991, 110, 7145, 9504],
 [3288, 890, 5514, 4039, 6797],
 [3288, 890, 5514, 4039, 2781],
 [4797, 3991, 1354, 7145, 131],
 [495, 6142, 9481, 4039]]

In [6]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [7]:
import numpy as np

In [8]:
# We give the maximum length of the sentence while padding
sent_length=8
embeded_docs=pad_sequences(oh_representation,padding='pre',maxlen=sent_length)
print(embeded_docs)

[[   0    0    0    0 3991 3707 7145 4038]
 [   0    0    0    0 3991 3707 7145 7488]
 [   0    0    0    0 3991  110 7145 9504]
 [   0    0    0 3288  890 5514 4039 6797]
 [   0    0    0 3288  890 5514 4039 2781]
 [   0    0    0 4797 3991 1354 7145  131]
 [   0    0    0    0  495 6142 9481 4039]]


In [9]:
# You can see that the zeroes are now at the start of each array and every array has length 8.

In [10]:
dim=10
model=tf.keras.Sequential([
    tf.keras.layers.Embedding(voc_size,dim,input_length=sent_length,)
])
model.compile('adam','mse')

In [11]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 8, 10)             100000    
Total params: 100,000
Trainable params: 100,000
Non-trainable params: 0
_________________________________________________________________


In [12]:
feature_represented_vectors=model.predict(embeded_docs)

In [13]:
feature_represented_vectors

array([[[ 1.9293156e-02,  3.8332593e-02,  2.2653710e-02,  2.6543166e-02,
         -4.9123466e-02,  2.0348396e-02, -2.6128352e-02,  3.3831704e-02,
          2.9787149e-02,  4.4139709e-02],
        [ 1.9293156e-02,  3.8332593e-02,  2.2653710e-02,  2.6543166e-02,
         -4.9123466e-02,  2.0348396e-02, -2.6128352e-02,  3.3831704e-02,
          2.9787149e-02,  4.4139709e-02],
        [ 1.9293156e-02,  3.8332593e-02,  2.2653710e-02,  2.6543166e-02,
         -4.9123466e-02,  2.0348396e-02, -2.6128352e-02,  3.3831704e-02,
          2.9787149e-02,  4.4139709e-02],
        [ 1.9293156e-02,  3.8332593e-02,  2.2653710e-02,  2.6543166e-02,
         -4.9123466e-02,  2.0348396e-02, -2.6128352e-02,  3.3831704e-02,
          2.9787149e-02,  4.4139709e-02],
        [ 1.0768436e-02, -3.4895707e-02,  1.4324192e-02,  3.4971882e-02,
         -3.1082774e-02,  1.1571575e-02, -3.3895634e-02, -8.2213059e-03,
          4.5744363e-02,  1.9421551e-02],
        [ 2.4482731e-02, -2.4074649e-02, -4.4010174e-02, -2.

In [15]:
embeded_docs[0]

array([   0,    0,    0,    0, 3991, 3707, 7145, 4038])

In [14]:
feature_represented_vectors[0]

array([[ 0.01929316,  0.03833259,  0.02265371,  0.02654317, -0.04912347,
         0.0203484 , -0.02612835,  0.0338317 ,  0.02978715,  0.04413971],
       [ 0.01929316,  0.03833259,  0.02265371,  0.02654317, -0.04912347,
         0.0203484 , -0.02612835,  0.0338317 ,  0.02978715,  0.04413971],
       [ 0.01929316,  0.03833259,  0.02265371,  0.02654317, -0.04912347,
         0.0203484 , -0.02612835,  0.0338317 ,  0.02978715,  0.04413971],
       [ 0.01929316,  0.03833259,  0.02265371,  0.02654317, -0.04912347,
         0.0203484 , -0.02612835,  0.0338317 ,  0.02978715,  0.04413971],
       [ 0.01076844, -0.03489571,  0.01432419,  0.03497188, -0.03108277,
         0.01157157, -0.03389563, -0.00822131,  0.04574436,  0.01942155],
       [ 0.02448273, -0.02407465, -0.04401017, -0.02465328, -0.01108985,
        -0.01455438,  0.02049061, -0.00998353, -0.02761997, -0.02756898],
       [ 0.03553217, -0.00067854, -0.00828873,  0.04081236,  0.02427158,
        -0.04688884,  0.02112849, -0.00976638

In [16]:
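# You can see that each word is converted into a vector of 10 values, one per feature.
# The layer has not been trained, so these values are its random initial weights.
# The first four rows are identical because they all correspond to the padding index 0.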

In [17]:
# Since each padded sentence has 8 positions, we get 8 vectors for each sentence.

In [18]:
feature_represented_vectors.shape

(7, 8, 10)

In [19]:
# 7: sentences, 8: positions (words plus padding) per sentence, 10: features
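
The parameter count and the output shape above both follow from treating the Embedding layer as a lookup table with one row per index. The NumPy sketch below assumes that view; the random matrix is only a stand-in for the layer's initial weights, so the numbers will differ from the notebook output.

```python
import numpy as np

voc_size, dim, sent_length = 10000, 10, 8

# One row of `dim` values for every possible word index.
rng = np.random.default_rng(0)
embedding_matrix = rng.uniform(-0.05, 0.05, size=(voc_size, dim))
print(embedding_matrix.size)  # 100000, i.e. voc_size * dim parameters

# Two padded sentences taken from the notebook output.
padded = np.array([
    [0, 0, 0, 0, 3991, 3707, 7145, 4038],
    [0, 0, 0, 3288, 890, 5514, 4039, 6797],
])

# Looking up each index replaces it with its row of the matrix.
vectors = embedding_matrix[padded]
print(vectors.shape)  # (2, 8, 10): sentences, positions, features

# Padding positions all use index 0, so they share the same vector.
print(np.array_equal(vectors[0, 0], vectors[0, 3]))  # True
```

With all 7 sentences instead of 2, the same lookup gives the shape (7, 8, 10) shown above.
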

[How to Use Word Embedding Layers for Deep Learning with Keras (Machine Learning Mastery)](https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/)