## <b style="color:green">Word2Vec</b>
- ### <b style="color:red">Word Embeddings</b>
  - In natural language processing, word embedding is a term used for the representation of words for text analysis, typically in the form of a real-valued vector that encodes the meaning of the word such that the words that are closer in the vector space are expected to be similar in meaning.
- ### <b style="color:red">Types of Word Embedding</b>
  1. __Frequency Based__
     - Bag of Words
     - TF-IDF
     - GloVe : Global Vectors (matrix factorization)
  2. __Prediction Based__
     - Word2Vec 
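
To make the frequency-based family concrete, here is a minimal Bag of Words sketch in plain Python. The second document and the helper name `bag_of_words` are made up for illustration. Each document becomes a vector of word counts over the vocabulary, so the vector length grows with the vocabulary and most entries are zero on a real corpus.

```python
from collections import Counter

# Toy documents (the second one is a made-up example)
documents = [
    "watch campusx for data science",
    "watch data science videos",
]

# Vocabulary: every distinct word, in sorted order
vocabulary = sorted({word for doc in documents for word in doc.split()})

def bag_of_words(doc, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

vectors = [bag_of_words(doc, vocabulary) for doc in documents]

for doc, vec in zip(documents, vectors):
    print(doc, "->", vec)
```

The counts say nothing about meaning: two different words always get unrelated positions, which is the gap that prediction-based embeddings such as Word2Vec try to close.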

### <b style="color:blue">Word2Vec</b>
- Created by Google in 2013.
- It is a word embedding technique. It converts a given word into a vector (a collection of numbers).
- **Advantage :**
  - It captures the semantic meaning of words, so similar words get similar vectors. eg : happy ≈ joy
  - The dimension of the vector is much smaller than with other vectorization techniques. Dense vector.
  - Low sparsity, which makes downstream models less prone to overfitting.
- Neural-network-based technique (the network itself is shallow).
- Two ways to use Word2Vec
  1. Pre-trained Model
  2. Self-trained
- **Pre-Trained Model :**  \
  We will use the pre-trained `word2vec` weights that were trained on the __Google News__ corpus of about 100 billion words. This model consists of 300-dimensional vectors for 3 million words and phrases.  \
  We use the `gensim` library.

In [1]:
import gensim
from gensim.models import Word2Vec, KeyedVectors

In [2]:
!pip install wget



In [3]:
import gensim.downloader as api

# Download the Word2Vec model
model = api.load("word2vec-google-news-300")


In [4]:
model['man'].shape

(300,)

In [5]:
model['man']

array([ 0.32617188,  0.13085938,  0.03466797, -0.08300781,  0.08984375,
       -0.04125977, -0.19824219,  0.00689697,  0.14355469,  0.0019455 ,
        0.02880859, -0.25      , -0.08398438, -0.15136719, -0.10205078,
        0.04077148, -0.09765625,  0.05932617,  0.02978516, -0.10058594,
       -0.13085938,  0.001297  ,  0.02612305, -0.27148438,  0.06396484,
       -0.19140625, -0.078125  ,  0.25976562,  0.375     , -0.04541016,
        0.16210938,  0.13671875, -0.06396484, -0.02062988, -0.09667969,
        0.25390625,  0.24804688, -0.12695312,  0.07177734,  0.3203125 ,
        0.03149414, -0.03857422,  0.21191406, -0.00811768,  0.22265625,
       -0.13476562, -0.07617188,  0.01049805, -0.05175781,  0.03808594,
       -0.13378906,  0.125     ,  0.0559082 , -0.18261719,  0.08154297,
       -0.08447266, -0.07763672, -0.04345703,  0.08105469, -0.01092529,
        0.17480469,  0.30664062, -0.04321289, -0.01416016,  0.09082031,
       -0.00927734, -0.03442383, -0.11523438,  0.12451172, -0.02

In [6]:
model['bottle']

array([-0.02941895,  0.11767578, -0.15039062,  0.00192261, -0.11230469,
        0.30273438,  0.38476562, -0.2734375 ,  0.23828125,  0.29492188,
       -0.08056641, -0.48242188,  0.05664062, -0.01623535, -0.31054688,
        0.14941406, -0.16308594,  0.26757812,  0.09521484, -0.109375  ,
        0.24414062,  0.09228516, -0.02404785, -0.15136719, -0.13769531,
        0.18066406,  0.02270508,  0.26171875,  0.11230469,  0.12695312,
        0.12060547, -0.00159454, -0.24511719,  0.04833984, -0.08935547,
        0.00787354,  0.14257812, -0.04736328, -0.15039062, -0.00081253,
       -0.33984375,  0.109375  ,  0.34570312, -0.11669922,  0.10253906,
       -0.15234375, -0.02868652,  0.07324219,  0.11669922, -0.25390625,
       -0.38867188, -0.10693359,  0.02441406, -0.31054688,  0.0534668 ,
       -0.07275391,  0.14453125,  0.15332031, -0.06225586, -0.07177734,
       -0.27929688,  0.05224609,  0.13964844, -0.02282715, -0.08642578,
       -0.11572266,  0.05932617, -0.3359375 ,  0.34179688, -0.03

In [7]:
# 300-dimensional vector representation of the word 'cricket'
model['cricket']

array([-3.67187500e-01, -1.21582031e-01,  2.85156250e-01,  8.15429688e-02,
        3.19824219e-02, -3.19824219e-02,  1.34765625e-01, -2.73437500e-01,
        9.46044922e-03, -1.07421875e-01,  2.48046875e-01, -6.05468750e-01,
        5.02929688e-02,  2.98828125e-01,  9.57031250e-02,  1.39648438e-01,
       -5.41992188e-02,  2.91015625e-01,  2.85156250e-01,  1.51367188e-01,
       -2.89062500e-01, -3.46679688e-02,  1.81884766e-02, -3.92578125e-01,
        2.46093750e-01,  2.51953125e-01, -9.86328125e-02,  3.22265625e-01,
        4.49218750e-01, -1.36718750e-01, -2.34375000e-01,  4.12597656e-02,
       -2.15820312e-01,  1.69921875e-01,  2.56347656e-02,  1.50146484e-02,
       -3.75976562e-02,  6.95800781e-03,  4.00390625e-01,  2.09960938e-01,
        1.17675781e-01, -4.19921875e-02,  2.34375000e-01,  2.03125000e-01,
       -1.86523438e-01, -2.46093750e-01,  3.12500000e-01, -2.59765625e-01,
       -1.06933594e-01,  1.04003906e-01, -1.79687500e-01,  5.71289062e-02,
       -7.41577148e-03, -

- A number such as `3.19824219e-02` is one component of a word's vector. The model holds vectors for 3 million (30 lakh) words and phrases, and each vector is built from 300 numbers, so the dimension of every word is 300.

In [8]:
# words in the vocabulary most similar to 'man'
model.most_similar('man')

[('woman', 0.7664011716842651),
 ('boy', 0.6824870109558105),
 ('teenager', 0.6586930155754089),
 ('teenage_girl', 0.6147903203964233),
 ('girl', 0.5921714305877686),
 ('suspected_purse_snatcher', 0.571636438369751),
 ('robber', 0.5585119128227234),
 ('Robbery_suspect', 0.5584409832954407),
 ('teen_ager', 0.5549196600914001),
 ('men', 0.5489763021469116)]

In [9]:
model.most_similar('cricket')

[('cricketing', 0.8372224569320679),
 ('cricketers', 0.8165745735168457),
 ('Test_cricket', 0.8094818592071533),
 ('Twenty##_cricket', 0.8068488240242004),
 ('Twenty##', 0.7624266147613525),
 ('Cricket', 0.75413978099823),
 ('cricketer', 0.7372578382492065),
 ('twenty##', 0.7316356301307678),
 ('T##_cricket', 0.7304613590240479),
 ('West_Indies_cricket', 0.6987985968589783)]

In [10]:
model.most_similar('facebook')

[('Facebook', 0.7563533186912537),
 ('FaceBook', 0.7076998949050903),
 ('twitter', 0.6988552212715149),
 ('myspace', 0.6941817998886108),
 ('Twitter', 0.664244532585144),
 ('twitter_facebook', 0.6572229266166687),
 ('Facebook.com', 0.6529868841171265),
 ('myspace_facebook', 0.6370643973350525),
 ('facebook_twitter', 0.6367618441581726),
 ('linkedin', 0.6356593370437622)]

In [11]:
# how similar two words are to each other >--->>> cosine similarity, cos(theta)
# note: this compares 'man' with the plural 'women', not 'woman'
model.similarity('man', 'women')

0.2883053
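
The cos(theta) mentioned in the comment above can be computed by hand. This is a minimal sketch in plain Python on made-up 2-dimensional vectors (not the real 300-dimensional ones); the function name `cosine_similarity` is chosen for illustration.

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|)"""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # vectors at 90 degrees
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # vectors pointing apart

print(same_direction, orthogonal, opposite)
```

The score depends only on the angle between the vectors, not on their length: close to 1 means the words are used in similar contexts, close to 0 means they are unrelated.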

In [12]:
# find the odd one out
model.doesnt_match(['python', 'java', 'monkey'])

'java'

In [13]:
# find the odd one out
model.doesnt_match(['c++', 'java', 'php', 'monkey'])

'monkey'
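
A rough sketch of the idea behind picking the odd one out, assuming it works by comparing every word with the average of the group: normalize each vector, take the mean, and return the word whose vector is least similar to that mean. The 2-dimensional vectors and the names `unit` and `odd_one_out` are made up for illustration; this is not the library's own code.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def odd_one_out(vectors):
    """Return the word whose unit vector is least similar to the group mean."""
    units = {word: unit(vec) for word, vec in vectors.items()}
    dims = len(next(iter(units.values())))
    mean = unit([sum(u[d] for u in units.values()) / len(units) for d in range(dims)])
    # Dot product of two unit vectors is their cosine similarity
    scores = {word: sum(a * b for a, b in zip(u, mean)) for word, u in units.items()}
    return min(scores, key=scores.get)

# Made-up toy vectors: three "programming language" words and one animal
toy_vectors = {
    "c++": [1.0, 0.1],
    "java": [0.9, 0.2],
    "php": [0.95, 0.15],
    "monkey": [0.1, 1.0],
}

odd = odd_one_out(toy_vectors)
print(odd)
```

With only three words, as in `['python', 'java', 'monkey']`, the mean is easily pulled around by a word with several senses ('python' is also a snake), which may be why that cell returns 'java'.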

In [14]:
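# arithmetic operations with word vectors
# note: 'women' (plural) is used here, and a raw query vector does not exclude 'king' from the results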
vec = model['king'] - model['man'] + model['women']
model.most_similar([vec])

[('king', 0.6478991508483887),
 ('queen', 0.5354938507080078),
 ('women', 0.52336585521698),
 ('kings', 0.5162314176559448),
 ('queens', 0.499536395072937),
 ('kumaris', 0.4923847019672394),
 ('princes', 0.46233269572257996),
 ('monarch', 0.45280295610427856),
 ('monarchy', 0.4293173551559448),
 ('kings_princes', 0.42342403531074524)]

In [15]:
vec = model['INR'] - model['India'] + model['England']
model.most_similar([vec])

[('INR', 0.6442340612411499),
 ('GBP', 0.5040826797485352),
 ('£_##.###m', 0.45408377051353455),
 ('England', 0.44649267196655273),
 ('£', 0.43341001868247986),
 ('Â_£', 0.4307198226451874),
 ('stg###', 0.4299262464046478),
 ('£_#.##m', 0.42561304569244385),
 ('Pounds_Sterling', 0.42512622475624084),
 ('GBP##', 0.42464494705200195)]

### <b style="color:blue">Intuition</b>
- Words : King, Queen, Man, Women, Monkey   \
  Features : gender, wealth, power, weight, speak   \
  Features are decided by the programmer.  
  <pre>
    ______________________________________________________________ 
    | <b>features</b> |  King  |  Queen  |   Man   |  Women  |  Monkey  |
    |__________|________|_________|_________|_________|__________|
    | gender   |   1    |    0    |    1    |    0    |    1     |
    |__________|________|_________|_________|_________|__________|
    | wealth   |   1    |    1    |   0.3   |   0.3   |    0     |
    |__________|________|_________|_________|_________|__________|
    | power    |   1    |   0.7   |   0.2   |   0.2   |    0     |
    |__________|________|_________|_________|_________|__________|
    | weight   |  0.8   |   0.4   |   0.6   |   0.5   |   0.3    |
    |__________|________|_________|_________|_________|__________|
    | speak    |   1    |    1    |    1    |    1    |    0     |
    |__________|________|_________|_________|_________|__________|
  </pre>   
- King = [1, 1, 1, 0.8, 1]   \
  Monkey = [1, 0, 0, 0.3, 0]  \
  Man = [1, 0.3, 0.2, 0.6, 1]
- King - Man + Women   
  <pre>
      _____________________________________________
      |           |  King - Man + Women |    =    |          
      |___________|_____________________|_________|
      |  gender   |    1 - 1 + 0        |    0    |
      |___________|_____________________|_________|
      |  wealth   |    1 - 0.3 + 0.3    |    1    |
      |___________|_____________________|_________|
      |  power    |    1 - 0.2 + 0.2    |    1    |
      |___________|_____________________|_________|
      |  weight   |    0.8 - 0.6 + 0.5  |   0.7   |
      |___________|_____________________|_________|
      |  speak    |    1 - 1 + 1        |    1    |
      |___________|_____________________|_________|   
  </pre>  
  Queen = [0, 1, 0.7, 0.4, 1]   \
  King - Man + Women = [0, 1, 1, 0.7, 1]   \
  3 features match (gender, wealth, speak) and 2 differ (power, weight).
- We use a __Neural Network__ to create the __Features__.  \
  The features will not be interpretable like _'gender'_, _'power'_ and _'speak'_; they will be anonymous, like _'f1'_, _'f2'_, _'f3'_ etc.
- How are the features decided?  \
  The underlying assumption of `Word2Vec` is that two words sharing similar contexts also share a similar meaning and consequently a similar vector representation from the model.
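
The hand-made feature table can be checked in a few lines of Python. This sketch copies the five vectors from the table (including the `Women` column), computes King - Man + Women, and ranks every word by cosine similarity to the result. The helper name `cosine` is chosen for illustration.

```python
import math

# Feature order: gender, wealth, power, weight, speak (values from the table)
words = {
    "King":   [1, 1, 1, 0.8, 1],
    "Queen":  [0, 1, 0.7, 0.4, 1],
    "Man":    [1, 0.3, 0.2, 0.6, 1],
    "Women":  [0, 0.3, 0.2, 0.5, 1],
    "Monkey": [1, 0, 0, 0.3, 0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# King - Man + Women, feature by feature
result = [k - m + w for k, m, w in zip(words["King"], words["Man"], words["Women"])]

# Rank all words from most to least similar to the result
ranking = sorted(words, key=lambda name: cosine(result, words[name]), reverse=True)

print([round(x, 2) for x in result])
print(ranking)
```

Even though only 3 of the 5 features match exactly, Queen comes out as the closest vector, which is the same effect the pre-trained model shows with learned features instead of hand-picked ones.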


### <b style="color:blue">Types of Word2Vec</b>
1. __CBOW__ : Continuous Bag of Words
2. __Skip-gram__
- Both are shallow neural networks.

### **CBOW** : Continuous Bag of Words
- Set up a dummy (fake) prediction problem >---> solve it >---> get the word vectors as a by-product.
- Convert every word to a one-hot encoding (OHE).
- Decide the window size.
- Example : \
  statement = watch campusx for data science     \
  corpus = [watch, campusx, for, data, science]  \
  vocabulary = [watch, campusx, for, data, science] 
  <pre>
      <b>OHE</b> = watch : [1,0,0,0,0]
            campusx : [0,1,0,0,0]
            for : [0,0,1,0,0]
            data : [0,0,0,1,0]
            science : [0,0,0,0,1]
      <b>Size of Window</b> = 3, stride = 1
         [watch campusx for]
      <b>context word</b> = watch, for
      <b>target word</b> = campusx
      <b>Training Data : </b>
                          _______________________________
                         |      X          |      Y      |
                         |_________________|_____________|
                         |  watch, for     |  campusx    |
                         |_________________|_____________|
                         |  campusx, data  |     for     |
                         |_________________|_____________|
                         |  for, science   |     data    |
                         |_________________|_____________|

                          _____________________________________________________________
                         |                  X                    |          Y          |
                         |_______________________________________|_____________________|
                         | watch[1,0,0,0,0], for[0,0,1,0,0]      | campusx[0,1,0,0,0]  |
                         |_______________________________________|_____________________|
                         |  campusx[0,1,0,0,0], data[0,0,0,1,0]  |     for[0,0,1,0,0]  |
                         |_______________________________________|_____________________|
                         |  for[0,0,1,0,0], science[0,0,0,0,1]   |     data[0,0,0,1,0] |
                         |_______________________________________|_____________________|
  </pre>
- Architecture of __CBOW__. 
  <pre>
      context
        ___
    w1 |___|
    w2 |___| watch [1,0,0,0,0]
    w3 |___|\
    w4 |___| \                        target  
    w5 |___|  \       window              ___
             w \ b     ___               |___|
          (5x3) \   w1|___|  w, b (3x5)  |___| predicted [0, 0.3, 0.1, 0, 0] >----->>> Calculate Loss
                 \  w2|___| -----------> |___| campusx [0, 1, 0, 0, 0]
                 /  w3|___|  embedding   |___| 
                /                        |___|
        ___    /
    w1 |___|  /  w, b (5x3)
    w2 |___| /
    w3 |___|/
    w4 |___| for [0,0,1,0,0]
    w5 |___|
   >-------------------forward propagation--------------------->
   <---------------------back propagation----------------------<
  </pre>
- The vector representation of a word, e.g. `watch = [w1, w2, w3]`, is read from the embedding (hidden) layer.
- It is a fully connected neural network.
- Start with random weights w.
- Calculate the __Loss__, use ___backpropagation___ to update the weights, then repeat ___forward propagation___ with the updated weights.
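
The CBOW steps above can be sketched end to end in plain Python. This is a toy version under stated assumptions, not a reference implementation: the two context vectors are averaged in the hidden layer, biases are left out, the loss is softmax cross-entropy, and the learning rate, seed and names (`cbow_pairs`, `train_step`, `W_in`, `W_out`) are chosen for illustration.

```python
import math
import random

sentence = "watch campusx for data science".split()
vocabulary = sentence  # every word is unique in this toy sentence
word_to_index = {word: i for i, word in enumerate(vocabulary)}
V, D = len(vocabulary), 3  # vocabulary size 5, embedding size 3

def cbow_pairs(tokens):
    """Window of 3, stride 1: the two neighbours (X) predict the middle word (Y)."""
    return [([tokens[i - 1], tokens[i + 1]], tokens[i]) for i in range(1, len(tokens) - 1)]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]   # 5x3
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(D)]  # 3x5

def train_step(context, target, lr=0.1):
    ctx = [word_to_index[w] for w in context]
    t = word_to_index[target]
    # Forward: a one-hot vector times W_in selects one row; average the context rows
    h = [sum(W_in[c][d] for c in ctx) / len(ctx) for d in range(D)]
    probs = softmax([sum(h[d] * W_out[d][v] for d in range(D)) for v in range(V)])
    loss = -math.log(probs[t])
    # Backward: gradient of the loss w.r.t. the scores is (probs - one_hot(target))
    d_scores = [p - (1.0 if v == t else 0.0) for v, p in enumerate(probs)]
    d_h = [sum(W_out[d][v] * d_scores[v] for v in range(V)) for d in range(D)]
    for d in range(D):
        for v in range(V):
            W_out[d][v] -= lr * h[d] * d_scores[v]
        for c in ctx:
            W_in[c][d] -= lr * d_h[d] / len(ctx)
    return loss

pairs = cbow_pairs(sentence)
losses = []
for epoch in range(200):
    losses.append(sum(train_step(x, y) for x, y in pairs) / len(pairs))

# The learned word vectors are the rows of W_in
embedding = {word: W_in[i] for word, i in word_to_index.items()}
print(pairs)
print(round(losses[0], 3), "->", round(losses[-1], 3))
print("watch =", [round(w, 2) for w in embedding["watch"]])
```

The prediction task is thrown away after training; only the rows of `W_in` are kept, which is the "vector as a by-product" idea.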


### **Skip-gram**
- Statement = watch campusx for data science
- Convert every word to OHE.
- Decide the window size.
- Example : \
  statement = watch campusx for data science      \
  corpus = [watch, campusx, for, data, science]   \
  vocabulary = [watch, campusx, for, data, science] 
  <pre>
      <b>OHE</b> = watch : [1,0,0,0,0]
            campusx : [0,1,0,0,0]
            for : [0,0,1,0,0]
            data : [0,0,0,1,0]
            science : [0,0,0,0,1]
      <b>Size of Window</b> = 3, stride = 1
         [watch campusx for]
      <b>input (center) word</b> = campusx
      <b>words to predict (context)</b> = watch, for
      <b>Training Data : </b>
                          ______________________________________________________________
                         |           X          |                   Y                   |
                         |______________________|_______________________________________|
                         |  campusx[0,1,0,0,0]  |   watch[1,0,0,0,0], for[0,0,1,0,0]    |
                         |______________________|_______________________________________|
                         |    for[0,0,1,0,0]    |  campusx[0,1,0,0,0], data[0,0,0,1,0]  |
                         |______________________|_______________________________________|
                         |    data[0,0,0,1,0]   |  for[0,0,1,0,0], science[0,0,0,0,1]   |
                         |______________________|_______________________________________|
  </pre>
- Architecture of __Skip-gram__.
  <pre>
                                         ___
                                        |___|  
                                      / |___|
                               w, b  /  |___|   predicted >---> Calculate Loss >---> update w(weight)
      context                 (3x5) /   |___|
        ___             embedding  /    |___|
    w1 |___|                ___   /
    w2 |___|  w, b (5x3) w1|___| /
    w3 |___| ----------> w2|___| \  
    w4 |___|             w3|___|  \      ___
    w5 |___|                       \    |___|
                              w, b  \   |___|
                              (3x5)  \  |___|   predicted >---> Calculate Loss  >---> update w(weight)
                                      \ |___|
                                        |___|

    >---------------------------forward propagation--------------------------------->
    <-----------------------------back propagation----------------------------------<
  </pre>
- The vector representation of a word, e.g. `watch = [w1, w2, w3]`, is read from the embedding (hidden) layer.
- It is a fully connected neural network.
- Start with random weights w.
- Calculate the __Loss__, use ___backpropagation___ to update the weights, then repeat ___forward propagation___ with the updated weights.
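
The Skip-gram training data can be generated with a short sketch in plain Python. It follows the example above (window of 3, stride 1, full windows only) and flattens each row of the table into one (input, output) pair per neighbour, which is an assumption about how the rows are fed to the network; the names `one_hot` and `skipgram_pairs` are made up for illustration.

```python
sentence = "watch campusx for data science".split()
vocabulary = sentence  # every word is unique in this toy sentence

def one_hot(word):
    """One-hot encode a word over the vocabulary."""
    return [1 if word == v else 0 for v in vocabulary]

def skipgram_pairs(tokens):
    """Window of 3, stride 1: the middle word (X) predicts each neighbour (Y)."""
    pairs = []
    for i in range(1, len(tokens) - 1):
        center = tokens[i]
        for neighbour in (tokens[i - 1], tokens[i + 1]):
            pairs.append((center, neighbour))
    return pairs

pairs = skipgram_pairs(sentence)
encoded = [(one_hot(center), one_hot(neighbour)) for center, neighbour in pairs]

for (center, neighbour), (x, y) in zip(pairs, encoded):
    print(center, x, "->", neighbour, y)
```

Compared with CBOW, the roles of X and Y are swapped: one input word produces several training examples, one per surrounding word.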


### When to use what?
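- Research suggests the following rules of thumb; it is worth validating both on your own data.
- Use CBOW for a small dataset. It trains fast and gives reasonably accurate results on a small dataset.
- Use Skip-gram for a large dataset. It is slower to train but tends to represent rare words better.
- Ways to improve the quality of Word2Vec embeddings :
  - Increase the training data.
  - Increase the dimension of the hidden layer (the vector size).
  - Increase the window size.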
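
For the self-trained route, these choices map onto arguments of `gensim.models.Word2Vec`. The sketch below assumes gensim 4.x parameter names (`sg`, `vector_size`, `window`, `min_count`); the helper names `word2vec_params` and `train_word2vec` and the default values are made up for illustration. Note that gensim's `window` counts words on each side of the current word, so the window of 3 used in the examples above corresponds to `window=1`.

```python
def word2vec_params(use_skipgram, vector_size=100, window=5):
    """Collect the settings discussed above as keyword arguments for gensim's Word2Vec."""
    return {
        "sg": 1 if use_skipgram else 0,  # 1 = Skip-gram, 0 = CBOW
        "vector_size": vector_size,      # dimension of the hidden (embedding) layer
        "window": window,                # words considered on each side of the current word
        "min_count": 1,                  # keep every word; only sensible for tiny corpora
    }

def train_word2vec(sentences, **params):
    """Train a model on tokenized sentences (a list of lists of words)."""
    from gensim.models import Word2Vec  # imported here so the helper above works without gensim
    return Word2Vec(sentences=sentences, **params)

cbow_params = word2vec_params(use_skipgram=False, vector_size=50, window=1)
skipgram_params = word2vec_params(use_skipgram=True, vector_size=300, window=5)
print(cbow_params)
print(skipgram_params)

# Usage, once gensim is installed:
# model = train_word2vec([["watch", "campusx", "for", "data", "science"]], **cbow_params)
# model.wv["campusx"]  # the learned vector for 'campusx'
```

A single five-word sentence is far too little data to learn anything meaningful; the usage lines only show where the settings go.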