
## Word2Vec word embedding tutorial

[tutorial link](https://adventuresinmachinelearning.com/word2vec-tutorial-tensorflow/)

### Why do we need Word2Vec?

If we want to feed words into machine learning models, unless we are using tree-based methods, we need to convert the words into some set of numeric vectors. A straightforward way of doing this is the “one-hot” method, which converts each word into a sparse vector with exactly one element set to 1 and all the others set to 0.

So, for the sentence “the cat sat on the mat” we would have the following vector representation:

\begin{equation} 
\begin{pmatrix} 
\text{the} \\ 
\text{cat} \\ 
\text{sat} \\ 
\text{on} \\ 
\text{the} \\ 
\text{mat} \\ 
\end{pmatrix} 
= 
\begin{pmatrix} 
1 & 0 & 0 & 0 & 0 \\ 
0 & 1 & 0 & 0 & 0 \\ 
0 & 0 & 1 & 0 & 0 \\ 
0 & 0 & 0 & 1 & 0 \\ 
1 & 0 & 0 & 0 & 0 \\ 
0 & 0 & 0 & 0 & 1 
\end{pmatrix} 
\end{equation}


Here we have transformed a six-word sentence into a 6×5 matrix, where 5 is the size of the vocabulary (“the” is repeated, so both occurrences map to the same one-hot row). In practical applications, however, we want machine and deep learning models to learn from very large vocabularies, i.e. 10,000 words or more. This exposes the efficiency issue of “one-hot” representations: the input layer of any neural network attempting to model such a vocabulary would need at least 10,000 nodes. Not only that, this method strips away any local context of the words; in other words, it discards information about which words commonly appear close together in sentences (or between sentences).
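To make the matrix above concrete, here is a minimal plain-Python sketch that builds the same 6×5 one-hot matrix. The assumption is that vocabulary columns are ordered by first appearance in the sentence, which matches the column order used in the equation.

```python
sentence = "the cat sat on the mat".split()

# Vocabulary in order of first appearance: the, cat, sat, on, mat
vocab = list(dict.fromkeys(sentence))
word_to_idx = {word: i for i, word in enumerate(vocab)}

# One row per token in the sentence, one column per vocabulary word
one_hot = [[0] * len(vocab) for _ in sentence]
for row, word in enumerate(sentence):
    one_hot[row][word_to_idx[word]] = 1

for word, vec in zip(sentence, one_hot):
    print(f"{word:>4} {vec}")
```

Each row contains a single 1, so the width of every row grows with the vocabulary. With a 10,000-word vocabulary, each row would have 10,000 entries.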

For instance, we might expect “United” and “States” to appear close together, or “Soviet” and “Union”, or “food” and “eat”. One-hot encoding loses all such information, and that is a large omission if we are trying to model natural language. We therefore need a representation of text data that is efficient and also conserves information about local word context. This is where the Word2Vec methodology comes in.
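The loss of context is easy to check numerically. With one-hot vectors, every pair of distinct words has a dot product (and therefore a cosine similarity) of exactly 0. As a result, “united” and “states” look just as unrelated as “united” and “food”. The four-word vocabulary below is an illustrative assumption.

```python
from itertools import combinations

words = ["united", "states", "food", "eat"]


def one_hot(word, vocab):
    """One-hot vector for `word` over `vocab`."""
    return [1 if w == word else 0 for w in vocab]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


vectors = {w: one_hot(w, words) for w in words}

# Dot product for every pair of distinct words: all of them are 0,
# so one-hot vectors carry no notion of which words are related.
similarities = {(a, b): dot(vectors[a], vectors[b]) for a, b in combinations(words, 2)}
print(similarities)
```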

### The Word2Vec methodology

As mentioned previously, the Word2Vec methodology has two components. The first is the mapping of a high-dimensional, one-hot style representation of words to a lower-dimensional vector. **This might involve transforming a 10,000-column matrix into a 300-column matrix, for instance. This process is called word embedding.** The second is to do this while still maintaining word context and therefore, to some extent, meaning. **One approach to achieving these two goals in the Word2Vec methodology is to take an input word and then attempt to estimate the probability of other words appearing close to that word. This is called the skip-gram approach.** The alternative method, called Continuous Bag of Words (CBOW), does the opposite: it takes some context words as input and tries to find the single word that has the highest probability of fitting that context. In this tutorial, we will concentrate on the skip-gram method.
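The 10,000-column to 300-column mapping can be written as a matrix product with a weight (embedding) matrix of shape vocabulary size × embedding size. Each one-hot row contains a single 1, so the product simply selects one row of that matrix. The sketch below checks this using small stand-in sizes (10 words, 4 dimensions) and random weights. The sizes, seed, and helper names are illustrative assumptions and are not taken from the tutorial.

```python
import random

random.seed(0)
vocab_size, embed_dim = 10, 4  # small stand-ins for 10,000 and 300

# Embedding matrix W: one row of `embed_dim` numbers per vocabulary word
W = [[random.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(vocab_size)]


def one_hot(index, size):
    vec = [0] * size
    vec[index] = 1
    return vec


def matvec_row(x, M):
    """Row vector x @ M, where x has len(M) entries."""
    return [sum(x[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


# Multiplying a one-hot vector by W just picks out the matching row of W
word_index = 7
dense = matvec_row(one_hot(word_index, vocab_size), W)
assert dense == W[word_index]

# A whole "sentence" of one-hot rows (3 x vocab_size) becomes 3 x embed_dim
indices = [2, 7, 2]
batch = [matvec_row(one_hot(i, vocab_size), W) for i in indices]
assert batch == [W[i] for i in indices]
```

For this reason, practical implementations store `W` and index into it directly (an "embedding lookup") rather than materialising the sparse one-hot matrix.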

What’s a gram? A gram is a group of n words, where n is the gram window size. So for the sentence “The cat sat on the mat”, a 3-gram representation would be “The cat sat”, “cat sat on”, “sat on the”, “on the mat”. The “skip” part refers to the number of times an input word is repeated in the dataset with different context words (more on this later). These grams are fed into the Word2Vec context prediction system. For instance, assume the input word is “cat”: Word2Vec tries to predict the context (“the”, “sat”) from this supplied input word. **The Word2Vec system will move through all the supplied grams and input words and attempt to learn appropriate mapping vectors (embeddings) which produce high probabilities for the right context given the input words.**
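The windowing described above can be sketched in a few lines of plain Python. `ngrams` reproduces the 3-gram list for the example sentence. `skipgram_pairs` is a hypothetical helper that assumes a window of one word on each side. It produces the (input, context) pairs that a skip-gram model would train on. Note that “cat” appears as an input once for each of its context words.

```python
def ngrams(tokens, n):
    """All contiguous runs of n tokens, joined back into strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgram_pairs(tokens, window=1):
    """(input word, context word) pairs within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "The cat sat on the mat".split()
grams = ngrams(tokens, 3)
pairs = skipgram_pairs(tokens, window=1)

print(grams)
print([ctx for word, ctx in pairs if word == "cat"])
```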

> What is this Word2Vec prediction system?  Nothing other than a neural network.
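As a rough sketch of that network, assume a plain softmax output layer. The forward pass is then an embedding lookup, followed by a dot product with every output vector and a softmax, which gives P(context | input) over the whole vocabulary. Real implementations often replace the full softmax with cheaper approximations such as negative sampling. The vocabulary, sizes, learning rate, and function names below are illustrative assumptions. The weights start out random, so the probabilities mean nothing until `train_step` has nudged them towards observed (input, context) pairs.

```python
import math
import random

random.seed(1)
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 3  # vocabulary size, embedding size (illustrative)

# Input (embedding) weights V x D, output weights D x V
W_in = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(D)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(i):
    """Hidden layer = embedding row; output = softmax over vocabulary scores."""
    hidden = W_in[i]
    scores = [sum(hidden[d] * W_out[d][k] for d in range(D)) for k in range(V)]
    return hidden, softmax(scores)


def context_probabilities(input_word):
    _, p = forward(word_to_idx[input_word])
    return dict(zip(vocab, p))


def train_step(input_word, context_word, lr=0.1):
    """One gradient-descent step on the cross-entropy loss -log P(context | input)."""
    i, t = word_to_idx[input_word], word_to_idx[context_word]
    hidden, p = forward(i)
    err = [p[k] - (1.0 if k == t else 0.0) for k in range(V)]  # dLoss/dscores
    grad_hidden = [sum(W_out[d][k] * err[k] for k in range(V)) for d in range(D)]
    for d in range(D):
        for k in range(V):
            W_out[d][k] -= lr * hidden[d] * err[k]
    for d in range(D):
        W_in[i][d] -= lr * grad_hidden[d]
    return -math.log(p[t])


before = context_probabilities("cat")["sat"]
for _ in range(100):
    train_step("cat", "sat")
after = context_probabilities("cat")["sat"]
print(f"P(sat | cat): {before:.3f} -> {after:.3f}")
```

After training, the rows of `W_in` are the word embeddings. The output layer is needed only to define the prediction task that shapes them.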





