# What is a Word Embedding? 

> Word embedding is a technique in which individual words are transformed into a numerical representation (a vector). Each word is mapped to one vector, and that vector is learned in a way that resembles training a neural network. The vectors try to capture various characteristics of the word with regard to the overall text, such as its semantic relationships, definitions, and context. With these numerical representations you can do many things, such as measuring the similarity or dissimilarity between words.

>> Limitations: However, simple embeddings (for example, one-hot or count-based encodings) have several limitations: they do not capture such characteristics of a word, and they can be quite large, depending on the size of the corpus and hence its vocabulary.
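
To make these limitations concrete, here is a minimal sketch that assumes one-hot encoding as the "simple embedding" (the text above does not name a specific scheme, so this choice, the toy corpus, and the helper names `one_hot` and `dot` are illustrative assumptions). It shows that each vector is as long as the vocabulary and that any two different words look completely unrelated.

```python
# Toy corpus, chosen only for illustration.
corpus = "have a great day have a nice day".split()
vocab = sorted(set(corpus))  # unique words, in a fixed order

def one_hot(word):
    """Return a vector with a 1 at the word's vocabulary position and 0 elsewhere."""
    return [1 if w == word else 0 for w in vocab]

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

great, nice = one_hot("great"), one_hot("nice")

print(len(vocab), len(great))  # the vector length grows with the vocabulary
print(dot(great, nice))        # 0: two different words share no information
print(dot(great, great))       # 1: a word only matches itself
```

Even though "great" and "nice" are used in almost the same way in this corpus, their one-hot vectors have a dot product of zero, which is the gap that learned embeddings try to close.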

# Word2Vec



# Word2Vec Architecture
> The effectiveness of Word2Vec comes from its ability to group together the vectors of similar words. Given a large enough dataset, Word2Vec can make strong estimates about a word's meaning based on its occurrences in the text. These estimates yield associations with other words in the corpus. For example, words like “King” and “Queen” would be very similar to one another. Algebraic operations on word embeddings can also approximate relationships between words. For example, the two-dimensional embedding vector of "king" minus the two-dimensional embedding vector of "man" plus the two-dimensional embedding vector of "woman" yields a vector very close to the embedding vector of "queen". Note that the values in such a two-dimensional illustration are chosen arbitrarily.
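
The vector arithmetic can be tried out on a tiny scale. The sketch below uses hypothetical two-dimensional vectors with arbitrary values (they are assumptions made up for this illustration, not values from any trained model), arranged so that the first coordinate loosely tracks "royalty" and the second "gender".

```python
import math

# Hypothetical 2-D embeddings; the numbers are arbitrary.
toy_vectors = {
    "king":  (4.0, 3.0),
    "queen": (4.0, 1.0),
    "man":   (1.0, 3.0),
    "woman": (1.0, 1.0),
}

def cosine(u, v):
    """Cosine of the angle between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# king - man + woman, computed coordinate by coordinate.
target = tuple(
    k - m + w
    for k, m, w in zip(toy_vectors["king"], toy_vectors["man"], toy_vectors["woman"])
)

# Rank every word by how closely its vector points in the same direction as `target`.
ranked = sorted(toy_vectors, key=lambda w: cosine(target, toy_vectors[w]), reverse=True)

print(target)     # (4.0, 1.0)
print(ranked[0])  # queen
```

With real embeddings the match is only approximate, because the learned vectors are not laid out on such clean axes.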

>> Two main architectures underlie the success of Word2Vec: \
    1. CBOW (Continuous Bag of Words) and \
    2. Skip-gram.

# CBOW (Continuous Bag of Words)
>This architecture is very similar to a feed-forward neural network. \
The model tries to predict a target word from a list of context words. The intuition is simple: given the phrase "Have a great day", we choose the target word to be “a” and the context words to be [“have”, “great”, “day”]. The model then takes the distributed representations of the context words and uses them to predict the target word.
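
The prediction step described above can be sketched in plain Python. This is a toy forward pass under stated assumptions, not Gensim's implementation: the weights are random and untrained, the embedding size `dim` is tiny, and the names `W_in`, `W_out` and `cbow_forward` are made up for this illustration.

```python
import math
import random

random.seed(0)  # reproducible toy weights

vocab = ["have", "a", "great", "day"]
word_to_id = {w: i for i, w in enumerate(vocab)}
dim = 3  # embedding size, tiny on purpose

# Input (embedding) and output weight matrices with small random values.
W_in = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]

def cbow_forward(context_words):
    """Return one probability per vocabulary word of it being the target."""
    # 1. Look up the context embeddings and average them (the hidden layer).
    rows = [W_in[word_to_id[w]] for w in context_words]
    hidden = [sum(col) / len(rows) for col in zip(*rows)]
    # 2. Score every vocabulary word against the hidden vector.
    scores = [sum(h * o for h, o in zip(hidden, out_row)) for out_row in W_out]
    # 3. Softmax turns the scores into probabilities that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = cbow_forward(["have", "great", "day"])
for word, p in zip(vocab, probs):
    print(f"{word:>5}: {p:.3f}")
```

Because the weights are untrained, the probabilities are close to uniform. Training would adjust `W_in` and `W_out` so that the probability of the true target word “a” goes up for this context.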



<img  src="CBOW.png">

<img src="WordTwoVec.png">

#  Skip-Gram Model
>The skip-gram model is a simple neural network with one hidden layer, trained to predict the probability of a given word being present when an input word is present. Intuitively, you can think of the skip-gram model as the opposite of the CBOW model: it takes the current word as input and tries to predict the words before and after it. In other words, the model learns to predict the context words around the specified input word. Experiments assessing this model found that increasing the range of context words improves the quality of the resulting word vectors, but it also increases the computational complexity. The process can be described visually as seen below.

<img src='word1.png'>
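
As a rough sketch of what the skip-gram model is trained on, the function below builds (input word, context word) pairs from a sentence. The helper name `skipgram_pairs` and its `window` parameter are assumptions for this illustration; real implementations add refinements on top of this basic idea.

```python
def skipgram_pairs(tokens, window=2):
    """Return (input_word, context_word) pairs within `window` positions of each word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "have a great day".split()
pairs = skipgram_pairs(sentence, window=1)
for input_word, context_word in pairs:
    print(input_word, "->", context_word)
```

A larger `window` produces more pairs per word, which gives the model more context to learn from but also means more training work.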

### Gensim
>>Gensim is an open-source Python library for natural language processing, developed and maintained by the Czech natural language processing researcher Radim Řehůřek. Gensim lets us build word embeddings by training our own Word2Vec models on a custom corpus with either the CBOW or the skip-gram algorithm.

In [1]:
import gensim

In [2]:
from gensim.models import word2vec, KeyedVectors

In [3]:
import gensim.downloader as api
wv = api.load('word2vec-google-news-300')  # pretrained 300-dimensional Word2Vec vectors (Google News dataset)
vec_king = wv['king']  # look up the vector stored for the key 'king'

In [4]:
vec_king

array([ 1.25976562e-01,  2.97851562e-02,  8.60595703e-03,  1.39648438e-01,
       -2.56347656e-02, -3.61328125e-02,  1.11816406e-01, -1.98242188e-01,
        5.12695312e-02,  3.63281250e-01, -2.42187500e-01, -3.02734375e-01,
       -1.77734375e-01, -2.49023438e-02, -1.67968750e-01, -1.69921875e-01,
        3.46679688e-02,  5.21850586e-03,  4.63867188e-02,  1.28906250e-01,
        1.36718750e-01,  1.12792969e-01,  5.95703125e-02,  1.36718750e-01,
        1.01074219e-01, -1.76757812e-01, -2.51953125e-01,  5.98144531e-02,
        3.41796875e-01, -3.11279297e-02,  1.04492188e-01,  6.17675781e-02,
        1.24511719e-01,  4.00390625e-01, -3.22265625e-01,  8.39843750e-02,
        3.90625000e-02,  5.85937500e-03,  7.03125000e-02,  1.72851562e-01,
        1.38671875e-01, -2.31445312e-01,  2.83203125e-01,  1.42578125e-01,
        3.41796875e-01, -2.39257812e-02, -1.09863281e-01,  3.32031250e-02,
       -5.46875000e-02,  1.53198242e-02, -1.62109375e-01,  1.58203125e-01,
       -2.59765625e-01,  

In [6]:
type(wv)

gensim.models.keyedvectors.KeyedVectors

In [7]:
vec_king.shape

(300,)

In [8]:
wv

<gensim.models.keyedvectors.KeyedVectors at 0x2511f3e60a0>

In [None]:
wv['man']

array([ 0.32617188,  0.13085938,  0.03466797, -0.08300781,  0.08984375,
       -0.04125977, -0.19824219,  0.00689697,  0.14355469,  0.0019455 ,
        0.02880859, -0.25      , -0.08398438, -0.15136719, -0.10205078,
        0.04077148, -0.09765625,  0.05932617,  0.02978516, -0.10058594,
       -0.13085938,  0.001297  ,  0.02612305, -0.27148438,  0.06396484,
       -0.19140625, -0.078125  ,  0.25976562,  0.375     , -0.04541016,
        0.16210938,  0.13671875, -0.06396484, -0.02062988, -0.09667969,
        0.25390625,  0.24804688, -0.12695312,  0.07177734,  0.3203125 ,
        0.03149414, -0.03857422,  0.21191406, -0.00811768,  0.22265625,
       -0.13476562, -0.07617188,  0.01049805, -0.05175781,  0.03808594,
       -0.13378906,  0.125     ,  0.0559082 , -0.18261719,  0.08154297,
       -0.08447266, -0.07763672, -0.04345703,  0.08105469, -0.01092529,
        0.17480469,  0.30664062, -0.04321289, -0.01416016,  0.09082031,
       -0.00927734, -0.03442383, -0.11523438,  0.12451172, -0.02

In [None]:
# Find the top-N most similar keys.
# Positive keys contribute positively towards the similarity, negative keys negatively.
#
# This method computes cosine similarity between a simple mean of the projection
# weight vectors of the given keys and the vectors for each key in the model.
# The method corresponds to the `word-analogy` and `distance` scripts in the original
# word2vec implementation.
wv.most_similar('man')

[('woman', 0.7664012908935547),
 ('boy', 0.6824870109558105),
 ('teenager', 0.6586930155754089),
 ('teenage_girl', 0.6147903800010681),
 ('girl', 0.5921714305877686),
 ('suspected_purse_snatcher', 0.5716364979743958),
 ('robber', 0.5585119128227234),
 ('Robbery_suspect', 0.5584409236907959),
 ('teen_ager', 0.5549196600914001),
 ('men', 0.5489763021469116)]

In [9]:
vec = wv['king'] - wv['man'] + wv['woman']  # king - man + woman
vec

array([ 4.29687500e-02, -1.78222656e-01, -1.29089355e-01,  1.15234375e-01,
        2.68554688e-03, -1.02294922e-01,  1.95800781e-01, -1.79504395e-01,
        1.95312500e-02,  4.09919739e-01, -3.68164062e-01, -3.96484375e-01,
       -1.56738281e-01,  1.46484375e-03, -9.30175781e-02, -1.16455078e-01,
       -5.51757812e-02, -1.07574463e-01,  7.91015625e-02,  1.98974609e-01,
        2.38525391e-01,  6.34002686e-02, -2.17285156e-02,  0.00000000e+00,
        4.72412109e-02, -2.17773438e-01, -3.44726562e-01,  6.37207031e-02,
        3.16406250e-01, -1.97631836e-01,  8.59375000e-02, -8.11767578e-02,
       -3.71093750e-02,  3.15551758e-01, -3.41796875e-01, -4.68750000e-02,
        9.76562500e-02,  8.39843750e-02, -9.71679688e-02,  5.17578125e-02,
       -5.00488281e-02, -2.20947266e-01,  2.29492188e-01,  1.26403809e-01,
        2.49023438e-01,  2.09960938e-02, -1.09863281e-01,  5.81054688e-02,
       -3.35693359e-02,  1.29577637e-01,  2.41699219e-02,  3.48129272e-02,
       -2.60009766e-01,  

In [None]:
wv.most_similar('cricket')  # semantic meaning is captured

[('cricketing', 0.8372225165367126),
 ('cricketers', 0.8165745735168457),
 ('Test_cricket', 0.8094818592071533),
 ('Twenty##_cricket', 0.8068488240242004),
 ('Twenty##', 0.7624266147613525),
 ('Cricket', 0.7541396617889404),
 ('cricketer', 0.7372579574584961),
 ('twenty##', 0.7316356897354126),
 ('T##_cricket', 0.7304614782333374),
 ('West_Indies_cricket', 0.698798656463623)]

# Cosine similarity
> Cosine similarity is a metric that measures how similar two documents are, irrespective of their size. Mathematically, it is the cosine of the angle between two vectors in a multi-dimensional space.

>Cosine similarity is advantageous because even if two similar documents are far apart by Euclidean distance (because one is much longer than the other), they may still be oriented close together. The smaller the angle, the higher the cosine similarity.

<img src='cos_sim.png'>
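
The point about document size can be checked with a small sketch. The three vectors below are hypothetical term counts invented for this illustration, and the helper names are assumptions. `long_doc` has the same proportions as `short_doc` but is ten times longer.

```python
import math

def cosine_similarity(u, v):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between the two vector end points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

short_doc = [1.0, 2.0, 0.0]    # hypothetical term counts
long_doc = [10.0, 20.0, 0.0]   # same proportions, ten times longer
other_doc = [0.0, 1.0, 3.0]    # a different mix of terms

print(cosine_similarity(short_doc, long_doc), euclidean_distance(short_doc, long_doc))
print(cosine_similarity(short_doc, other_doc), euclidean_distance(short_doc, other_doc))
```

By Euclidean distance, `long_doc` is much farther from `short_doc` than `other_doc` is, yet its cosine similarity to `short_doc` is essentially 1 because both vectors point in the same direction.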

### Compute cosine similarity between two keys. 

In [12]:
wv.similarity('cricket','hocky') # how similar is 'cricket' to 'hocky'? (the key queried is 'hocky' as typed, not 'hockey')

0.20070238

In [None]:
Conclusions based on cosine similarity:
    1. The closer the cosine value is to 1, the more similar the two keys are.
    2. The closer the cosine value is to 0, the less similar the two keys are.
    
    E.g., in the example above, the keys 'cricket' and 'hocky' have a cosine similarity of about 0.20.

In [11]:
# Cosine distance between the two keys: 1 - cosine similarity
dist = 1 - 0.20070238
dist

0.7992976199999999

In [None]:
wv.similarity('cricket','sports') # 'cricket' and 'sports' have a cosine similarity of about 0.40

0.40087253

In [None]:
wv.similarity('cricket','football') # 'cricket' and 'football' have a cosine similarity of about 0.46

0.45974636

In [None]:
vec = wv['king'] - wv['man'] + wv['woman']   # same arithmetic as in In [9]: king - man + woman

In [15]:
wv.most_similar([vec])

[('king', 0.8449392318725586),
 ('queen', 0.7300517559051514),
 ('monarch', 0.645466148853302),
 ('princess', 0.6156251430511475),
 ('crown_prince', 0.5818676352500916),
 ('prince', 0.5777117609977722),
 ('kings', 0.5613663792610168),
 ('sultan', 0.5376775860786438),
 ('Queen_Consort', 0.5344247817993164),
 ('queens', 0.5289887189865112)]
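
In the output above, 'king' itself comes first and 'queen' second. A query made with a raw vector cannot know which words were used to build it, so the input words stay in the ranking. The sketch below reproduces that effect with hypothetical two-dimensional vectors (arbitrary values, not taken from the Google News model) and a made-up helper `nearest` that can leave chosen keys out.

```python
import math

# Hypothetical 2-D vectors with arbitrary values.
toy = {
    "king":  (4.0, 3.0),
    "queen": (3.5, 1.0),
    "man":   (1.0, 3.0),
    "woman": (1.0, 2.5),
}

def cosine(u, v):
    """Cosine of the angle between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def nearest(query, vectors, exclude=()):
    """Rank keys by cosine similarity to `query`, skipping any key in `exclude`."""
    candidates = [k for k in vectors if k not in exclude]
    return sorted(candidates, key=lambda k: cosine(query, vectors[k]), reverse=True)

# king - man + woman, coordinate by coordinate.
query = tuple(k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"]))

print(nearest(query, toy))                                    # 'king' ranks first
print(nearest(query, toy, exclude=("king", "man", "woman")))  # only 'queen' remains
```

Gensim's `most_similar` also accepts `positive` and `negative` key lists, for example `wv.most_similar(positive=['king', 'woman'], negative=['man'])`. As far as I know, that form leaves the input keys out of the results, so the scores and ranking can differ from the raw-vector query shown above.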

### Thank you for reading

##### Additional resources, including the ones I read:
https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa \
https://arxiv.org/pdf/1411.2738.pdf  \
https://towardsdatascience.com/a-beginners-guide-to-word-embedding-with-gensim-word2vec-model-5970fa56cc92 \
https://www.machinelearningplus.com/nlp/cosine-similarity/

                                                                            Best regards,
                                                                            Pankaj Kumar Barman, MSc.CSMI
                                                                            Ramakrishna Mission Vidyamandira, howrah, belur
                                                                                                