 <div style="background-color: #99CD4E; text-align:center; vertical-align: middle; padding:40px 0;"> 
  <h1 style="color: white;"> *Incorporating semantics in Vector Space Models (VSM)* </h1>
 </div>
 
 Word vectors are also called word embeddings.
 
 # References
 - [Natural Language Processing with Deep Learning (Stanford University) Lecture 2](http://web.stanford.edu/class/cs224n/lectures/lecture2.pdf)
 - [An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec](https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/)
 - [Vector Representations of Words](https://www.tensorflow.org/tutorials/word2vec)

<div style="background-color: #99CD4E; padding:5px 0;"> 
  <h2 style="color: white;"> Challenges </h2>
 </div>
 
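- Thesaurus- or dictionary-based approaches such as WordNet miss the nuances of word meaning, and they have to be created and updated manually.
- Raw count-based representations (one-hot or bag-of-words vectors) have no notion of similarity between words: the one-hot vectors of any two distinct words are orthogonal.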
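A minimal sketch of the second point, assuming NumPy and a hypothetical four-word vocabulary: with one-hot vectors, the cosine similarity between any two distinct words is zero, so "king" comes out no closer to "queen" than to "sword".

```python
import numpy as np

# Hypothetical toy vocabulary, for illustration only
vocab = ["king", "queen", "castle", "sword"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return the one-hot vector of `word` over the toy vocabulary."""
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A related pair and an unrelated pair score exactly the same.
print(cosine(one_hot("king"), one_hot("queen")))  # 0.0
print(cosine(one_hot("king"), one_hot("sword")))  # 0.0
```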


<div style="background-color: #99CD4E; padding:5px 0;"> 
  <h2 style="color: white;"> Distributional Semantics </h2>
 </div>
 
 A word’s meaning is given by the words that frequently appear close by.
 
 > “You shall know a word by the company it keeps” (J. R. Firth 1957: 11)
 
- When a word w appears in a text, its context is the set of words that appear nearby (within a fixed-size window).
- Use the many contexts of w to build up a representation of w.
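The two steps above can be sketched in plain Python. `contexts` is a hypothetical helper, not a library function: it yields the fixed-size window around every position, and the loop then tallies all contexts of each word into a simple count profile, the raw material a distributional representation is built from.

```python
from collections import Counter, defaultdict

def contexts(tokens, window=2):
    """Yield (center, context_words) for every position t in `tokens`,
    taking up to `window` words on each side (clipped at the edges)."""
    for t, center in enumerate(tokens):
        left = tokens[max(0, t - window):t]
        right = tokens[t + 1:t + 1 + window]
        yield center, left + right

tokens = "the king is in the castle".split()

# Build up a representation of each word from all of its contexts:
# here simply the counts of the words seen in its windows.
profile = defaultdict(Counter)
for center, ctx in contexts(tokens, window=2):
    profile[center].update(ctx)

print(profile["the"])  # Counter({'is': 2, 'king': 1, 'in': 1, 'castle': 1})
```

"the" occurs twice, so its profile merges both windows; with a real corpus these profiles become long, sparse count vectors.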

<div style="background-color: #99CD4E; padding:5px 0;"> 
  <h2 style="color: white;"> Taxonomy of distributional approaches </h2>
 </div>
 
 - Count-based methods compute statistics of how often each word co-occurs with its neighbouring words in a large text corpus, and then map these count statistics down to a small, dense vector for each word, e.g., [GloVe](https://nlp.stanford.edu/projects/glove/).
    - Dimensionality-reduction techniques such as Principal Component Analysis (PCA) can be used for this mapping; t-SNE is mainly used to project the resulting vectors to 2-D for visualization.
 
 
 - Predictive models directly try to predict a word from its neighbors in terms of learned small, dense embedding vectors (considered parameters of the model).
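To illustrate the count-based route, here is a sketch on a hypothetical toy corpus: it counts word-word co-occurrences within a window of one word and compresses the matrix with a truncated SVD, the linear-algebra step underlying PCA. This is not GloVe, which fits a weighted least-squares model to co-occurrence counts; it only demonstrates the "count, then reduce" idea.

```python
import numpy as np

# Hypothetical toy corpus, already tokenized
corpus = [
    "the king rules the castle".split(),
    "the queen rules the castle".split(),
    "the knight guards the king".split(),
    "the knight guards the queen".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# 1. Count co-occurrences within a symmetric window of one word.
window = 1
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for t, w in enumerate(sent):
        for j in range(max(0, t - window), min(len(sent), t + window + 1)):
            if j != t:
                X[idx[w], idx[sent[j]]] += 1

# 2. Map the sparse count rows down to small dense vectors (truncated SVD).
U, S, Vt = np.linalg.svd(X)
k = 2
embeddings = U[:, :k] * S[:k]  # one k-dimensional vector per word

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings[idx["king"]], embeddings[idx["queen"]]))
```

Because "king" and "queen" appear in identical contexts in this toy corpus, their count rows match, so their dense vectors coincide and the printed similarity is 1 up to floating-point error.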

<div style="background-color: #99CD4E; padding:5px 0;"> 
  <h2 style="color: white;"> Word2Vec </h2>
 </div>

Word2vec [Mikolov et al. 2013](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) is a framework for learning word vectors. The idea:
- We have a large corpus of text
- Every word in a fixed vocabulary is represented by a vector
- Go through each position t in the text, which has a center word c and context (“outside”) words o
- Use the similarity of the word vectors for c and o to calculate the probability of o given c (or vice versa)
- Keep adjusting the word vectors to maximize this probability
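In the skip-gram variant, the probability in the last two steps is a softmax over dot products, $P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w} \exp(u_w^\top v_c)}$, where every word has a center vector $v$ and an outside vector $u$. The sketch below uses random toy vectors and made-up word indices; it computes this probability and takes one gradient step on $v_c$ so that an observed (center, outside) pair becomes more likely. It assumes the naive full softmax and updates only $v_c$; gensim's `Word2Vec` trains with negative sampling (the default) or hierarchical softmax and also updates the outside vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 6, 8  # hypothetical toy vocabulary size and vector dimension
center_vecs = rng.normal(scale=0.1, size=(V, d))   # v_w: word w as center word
outside_vecs = rng.normal(scale=0.1, size=(V, d))  # u_w: word w as context word

def prob_outside_given_center(c):
    """P(o | c) for every word o: softmax over the dot products u_o . v_c."""
    scores = outside_vecs @ center_vecs[c]
    scores -= scores.max()  # subtract the max for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

def loss(c, o):
    """Negative log-likelihood of seeing outside word o around center word c."""
    return -np.log(prob_outside_given_center(c)[o])

def sgd_step(c, o, lr=0.1):
    """One gradient-descent step on v_c to increase P(o | c).
    d loss / d v_c = -u_o + sum_w P(w | c) u_w"""
    p = prob_outside_given_center(c)
    grad = -outside_vecs[o] + p @ outside_vecs
    center_vecs[c] -= lr * grad  # in-place update of the center vector

c, o = 2, 4  # made-up indices of a (center, outside) pair from the text
before = loss(c, o)
sgd_step(c, o)
after = loss(c, o)
print(before, after)  # the loss goes down after the step
```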


# Train a Word2Vec model on Shakespeare

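```python
import nltk
from gensim.models import Word2Vec
from nltk.corpus import gutenberg

nltk.download('gutenberg')  # fetch the Project Gutenberg sample corpus if it is missing
```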

```python
# The three plays used as training data
books = ['shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt']
```

```python
# Concatenate the tokenized sentences (lists of word tokens) of the three plays
sents = gutenberg.sents(books[0]) + gutenberg.sents(books[1]) + gutenberg.sents(books[2])
type(sents)
```

```
nltk.util.LazyConcatenation
```

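```python
# 100-dimensional vectors, 5-word window, ignore words seen fewer than 5 times.
# gensim >= 4.0 calls the dimension `vector_size`; older releases used `size`.
bard = Word2Vec(sents, vector_size=100, window=5, min_count=5, workers=4)
```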

```python
# Vocabulary size (gensim >= 4.0; older releases used len(bard.wv.vocab))
len(bard.wv.key_to_index)
```

```
1588
```

# Get the vector of a word

```python
bard.wv['Macbeth']
```

```
array([-0.10742793, -0.01956181,  0.13975601,  0.07485832, -0.17751361,
       -0.06080489,  0.1121189 , -0.04487086, -0.24221551, -0.14671983,
        0.11934487,  0.02752467, -0.35037538, -0.17839876, -0.36048526,
       -0.00681653, -0.2306423 , -0.06548589,  0.37666312,  0.08077748,
       -0.34400108, -0.32902193, -0.3022477 ,  0.11491542, -0.2893151 ,
        0.21328667, -0.26541919,  0.56787801, -0.06792197, -0.05740376,
        0.13355957, -0.45053831,  0.18514419, -0.31926861, -0.1611011 ,
        0.10380171,  0.24726281, -0.19478726, -0.18347853,  0.31361559,
        0.26796308, -0.12502605,  0.11905139, -0.41982856, -0.35766694,
       -0.02638668,  0.09754109,  0.22023383, -0.05532344, -0.09578989,
       -0.06329821,  0.13999194,  0.1025404 , -0.1021321 ,  0.01795489,
        0.15535243,  0.04837716,  0.10837361, -0.32653791, -0.10438953,
       -0.04949269,  0.48600715, -0.57529658,  0.07215577, -0.18409459,
        0.54216701, -0.05354522,  0.01323699, -0.4829801 ,  0.19
```

# Exercise

- How would you get a vector representation of a whole sentence?

# Similarity

```python
# gensim >= 4.0 removed Word2Vec.most_similar; query the word vectors in `bard.wv`
bard.wv.most_similar('Macbeth')
```

```
[('Hamlet', 0.9997541904449463),
 ('Antony', 0.999751091003418),
 ('Queene', 0.9997130632400513),
 ('Banquo', 0.9997089505195618),
 ('Horatio', 0.9997077584266663),
 ('Oh', 0.999681293964386),
 ('&', 0.9996806979179382),
 ('Cassius', 0.999677836894989),
 ('In', 0.9996768832206726),
 ('three', 0.999654233455658)]
```

```python
bard.wv.most_similar(positive=['Macbeth'], negative=['woman'])
```

```
[('.', 0.17482200264930725),
 ('Len', 0.14525464177131653),
 ('Messa', 0.0918571725487709),
 ('Ment', 0.09179973602294922),
 ('Cin', 0.08558960258960724),
 ('Osr', 0.08130502700805664),
 ('Tit', 0.06718209385871887),
 ('Hora', 0.06625272333621979),
 ('Cask', 0.060132596641778946),
 ('Murth', 0.04870552569627762)]
```
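Both queries rank words by cosine similarity. As gensim's documentation describes it, `most_similar` averages the unit-normalized vectors of the query words (positive words with weight +1, negative words with weight -1) and compares that mean with every vector in the vocabulary. The function below is a hypothetical re-implementation of that idea on made-up 2-D vectors, not gensim's actual code.

```python
import numpy as np

def most_similar(vectors, positive, negative=(), topn=3):
    """Rank words by cosine similarity to a query built from `positive`
    and `negative` words. `vectors` maps word -> 1-D NumPy array."""
    unit = {w: v / np.linalg.norm(v) for w, v in vectors.items()}
    # Query: mean of unit vectors, +1 weight for positive, -1 for negative words
    terms = [unit[w] for w in positive] + [-unit[w] for w in negative]
    query = np.mean(terms, axis=0)
    query /= np.linalg.norm(query)
    # Score every other word; the query words themselves are excluded
    scores = [(w, float(u @ query)) for w, u in unit.items()
              if w not in positive and w not in negative]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical toy 2-D vectors
toy = {
    "king": np.array([0.9, 0.1]),
    "queen": np.array([0.8, 0.3]),
    "woman": np.array([0.1, 0.9]),
    "sword": np.array([0.7, -0.6]),
}
print(most_similar(toy, positive=["king"]))
print(most_similar(toy, positive=["king"], negative=["woman"]))
```

Subtracting "woman" pushes the query away from that direction, so in this toy example "sword" overtakes "queen" as the nearest neighbour.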

 <div style="background-color: #99CD4E; text-align:center; vertical-align: middle; padding:40px 0;"> 
  <h1 style="color: white;"> *The End* </h1>
 </div>