### Word Embeddings
- Word embeddings are a way to represent words as dense numerical vectors so that words with similar meanings have similar numerical representations.
- Word embeddings convert words into numbers that capture meaning and relationships.

### Vector with One Hot Encoding
- "cat"  = [0, 1, 0, 0]
- "dog"  = [0, 0, 1, 0]
- Problem with OHE: the vectors carry no relationship between "cat" and "dog". Every pair of distinct words looks equally unrelated.

### Vector with Word Embeddings
- "cat"  → [0.12, -0.98, 0.45, ...]
- "dog"  → [0.11, -0.95, 0.47, ...]
- Word embeddings solve the OHE problem of no relationship between "cat" and "dog": ***similar words have similar vectors***.

- <image src='./resources/images/word_embeddings_1.png' width="600" height="400"/>
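
The contrast between the two representations can be checked numerically. The sketch below is only illustrative: it uses the one-hot vectors from the example above and just the first three components of the embedding vectors (the remaining components are elided in the text), and `cosine_similarity` is a helper written for this note, not a library function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# One-hot vectors from the example above
cat_ohe = [0, 1, 0, 0]
dog_ohe = [0, 0, 1, 0]

# Assumption: only the first three embedding components are used,
# because the rest are not given in the text
cat_emb = [0.12, -0.98, 0.45]
dog_emb = [0.11, -0.95, 0.47]

ohe_similarity = cosine_similarity(cat_ohe, dog_ohe)
emb_similarity = cosine_similarity(cat_emb, dog_emb)

print(ohe_similarity)   # 0.0, the one-hot vectors share no information
print(emb_similarity)   # very close to 1, the embedding vectors nearly coincide
```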

### Common Word Embedding Models
- Word2Vec
    - CBOW - Architecture #1 used in Word2Vec
    - Skip-Gram - Architecture #2 used in Word2Vec
- GloVe
- FastText
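- Gensim (a Python library for training and loading models such as Word2Vec and FastText, rather than an embedding model itself)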

- <image src='./resources/images/word_2_vec_architecture_types.png' width="500" height="200"/>
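
The two Word2Vec architectures differ in what they treat as input and what they predict: CBOW predicts the center word from its surrounding context words, while Skip-Gram predicts each context word from the center word. The sketch below only builds the training examples each architecture would see; it does not train a network. The function names, the sample tokens, and the window size of 1 are assumptions made for illustration.

```python
def context_window(tokens, index, window):
    """Words within `window` positions of tokens[index], excluding the word itself."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right

def cbow_examples(tokens, window=1):
    """CBOW: the context words are the input, the center word is the target."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]

def skip_gram_examples(tokens, window=1):
    """Skip-Gram: the center word is the input, each context word is a target."""
    return [(tokens[i], context_word)
            for i in range(len(tokens))
            for context_word in context_window(tokens, i, window)]

tokens = ["free", "prize", "inside"]  # hypothetical sample sentence

print(cbow_examples(tokens))
# [(['prize'], 'free'), (['free', 'inside'], 'prize'), (['prize'], 'inside')]

print(skip_gram_examples(tokens))
# [('free', 'prize'), ('prize', 'free'), ('prize', 'inside'), ('inside', 'prize')]
```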


### Dataset
- The dataset used here has two columns
- Input column = Email message
- Output column = spam/ham:
    - spam = spam email message
    - ham = non-spam email message

### Word2Vec 
- Word2Vec is a technique for learning word embeddings: numerical vector representations of words that capture semantic meaning and relationships between words.
- It was introduced by Google in 2013.
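- Word2Vec input: corpus / documents / words
- Word2Vec output: a vector matrix in which the vectors of similar words lie close together in the vector space, and the vectors of dissimilar words lie far apart.
- Google has published a pre-trained model (about 1.5 GB compressed) trained on roughly 100 billion words of Google News text. It covers about 3 million words and phrases and gives us a vector matrix that is ready to use.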
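- In an illustrative matrix like the example below, positive and negative values for the same feature indicate opposite relationships, e.g. boy and girl, or king and queen.
- Values close to zero mean no relationship or a weak relationship.
- Values close to one mean a very strong relationship.
- This reading applies to hand-labeled illustrative matrices. The individual dimensions of a trained Word2Vec model generally have no such direct interpretation.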
- Word2Vec example Vector Matrix
- <image src='./resources/images/word_2_vec_example_vector_matrix.png' width="500" height="200"/>

### Cosine similarity and Word2Vec
- 1) Cosine Similarity of similar vectors
- <image src='./resources/images/cosine_similarity_of_similar_vectors.png' width="500" height="200"/>
<br/>

- 2) Cosine Similarity of dissimilar vectors
- <image src='./resources/images/cosine_similarity_of_disimilar_vectors.png' width="500" height="200"/>
<br/>

- 3) Cosine Similarity of same vectors
- <image src='./resources/images/cosine_similarity_of_same_vectors.png' width="500" height="200"/>
<br/>

- 4) Real life example (using movies) of similar vectors
- <image src='./resources/images/movie_example_of_similar_vectors.png' width="500" height="200"/>
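
The three cases above can be reproduced with a few lines of code. This is a minimal sketch with made-up 2-dimension vectors (they are not taken from the images), where "dissimilar" is modelled as two perpendicular vectors. The helper names are assumptions.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def angle_in_degrees(a, b):
    """Angle between two vectors; the cosine is clamped to [-1, 1] to absorb rounding error."""
    cosine = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cosine))

# Hypothetical vectors chosen to illustrate each case
similar = cosine_similarity([1.0, 1.0], [1.0, 0.9])      # small angle between the vectors
dissimilar = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # perpendicular vectors
same = cosine_similarity([0.5, 0.8], [0.5, 0.8])         # identical vectors

print(similar)      # close to 1
print(dissimilar)   # 0.0
print(same)         # 1.0, up to floating point rounding
print(angle_in_degrees([1.0, 0.0], [0.0, 1.0]))  # 90.0
```

The closer the similarity is to 1, the smaller the angle between the two vectors, which is why cosine similarity is a common way to compare word vectors regardless of their length.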


### Advantages of Word2Vec
- 1) Dense Vector Matrix
    - With Word2Vec we get a dense matrix instead of a sparse matrix.
    - ML algorithms generally train better on a dense vector matrix.
- 2) Semantic information is captured
    - We can compute the cosine similarity between any two word vectors.
    - Using this cosine similarity we can find similar words (e.g. honest and good) and also find dissimilar words (e.g. sad and happy).
- 3) Vector Matrix Size is Manageable
    - Earlier (e.g. with one-hot encoding), a huge vocabulary meant a huge vector size.
    - With Google's pre-trained Word2Vec model, every word vector has a fixed size of 300 dimensions, regardless of the vocabulary size.
- 4) Fewer Out of Vocabulary (OOV) words
    - Google's pre-trained Word2Vec model covers a very large vocabulary, so there is only a small chance of meeting a new Out of Vocabulary word. OOV words can still occur, e.g. rare names or misspellings.
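
Even with a large pre-trained vocabulary it can be worth measuring how many tokens are actually covered. The sketch below is a hypothetical check: `split_known_and_oov` is a helper written for this note, and the tiny `vocabulary` set stands in for the vocabulary of a real model.

```python
def split_known_and_oov(tokens, vocabulary):
    """Separate tokens found in the vocabulary from Out of Vocabulary tokens."""
    known = [token for token in tokens if token in vocabulary]
    oov = [token for token in tokens if token not in vocabulary]
    return known, oov

# Hypothetical stand-in for a model vocabulary
vocabulary = {"win", "a", "free", "prize", "now"}

# "priize" is a deliberate misspelling, so it is not in the vocabulary
tokens = ["win", "a", "free", "priize", "now"]

known, oov = split_known_and_oov(tokens, vocabulary)
coverage = len(known) / len(tokens)

print(known)     # ['win', 'a', 'free', 'now']
print(oov)       # ['priize']
print(coverage)  # 0.8
```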

### AvgWord2Vec
- AvgWord2Vec (Average Word2Vec) is a simple technique for converting an entire sentence or document into a single vector by averaging the Word2Vec embeddings of all words in that text.
- Example
    - Sentence = "good food tasty"
    - Word embeddings = 
        - "good"  → [0.4, 0.6, 0.8]
        - "food"  → [0.2, 0.5, 0.7]
        - "tasty" → [0.6, 0.7, 0.9]
    - Average = ([0.4, 0.6, 0.8] + [0.2, 0.5, 0.7] + [0.6, 0.7, 0.9]) / 3 = [0.4, 0.6, 0.8]
    - The resulting vector represents the ***entire sentence/document***.
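
The arithmetic in the example above can be reproduced directly. This is a minimal sketch using the three example vectors; `average_vectors` is a helper name chosen for this note.

```python
def average_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]

# Word embeddings from the example above
embeddings = {
    "good": [0.4, 0.6, 0.8],
    "food": [0.2, 0.5, 0.7],
    "tasty": [0.6, 0.7, 0.9],
}

sentence = "good food tasty"
word_vectors = [embeddings[word] for word in sentence.split()]
sentence_vector = average_vectors(word_vectors)

# Rounded to hide floating point noise
print([round(value, 2) for value in sentence_vector])  # [0.4, 0.6, 0.8]
```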

### Google's Word2Vec and AvgWord2Vec
- Google's Word2Vec converts ***each word*** in the document into a 300-dimension vector.
- A document with N words therefore becomes N vectors, and N differs from document to document, whereas most classical ML algorithms expect one fixed-length feature vector per document.
- So we take the average of the vectors of all the words in each sentence/document and get one vector per sentence/document as output.

***Vector creation WITHOUT Average Word2Vec (using Google's Word2Vec)***
- One 300 Dimension Vector for every word in the Sentence/Document
- <image src='./resources/images/vector_without_average_word_to_vec_1.png' width="500" height="200"/>
 
***Vector creation WITH Average Word2Vec (using Google's Word2Vec)***
- One 300 Dimension Vector for every Sentence/Document in the Corpus
- <image src='./resources/images/vector_with_word_to_vec_1.png' width="500" height="200"/>
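
The step from "one vector per word" to "one vector per document" can be sketched as below. Several things here are assumptions: the 3-dimension toy vectors stand in for the 300-dimension vectors of the Google model, `document_vector` is a hypothetical helper, and returning an all-zero vector for a document with no known words is just one possible design choice.

```python
def document_vector(tokens, word_vectors, dim):
    """Average the vectors of the known tokens; return all zeros if none are known."""
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(components) / len(known) for components in zip(*known)]

# Hypothetical 3-dimension vectors; a real model would supply 300 values per word
word_vectors = {
    "free": [0.9, 0.1, 0.0],
    "prize": [0.7, 0.3, 0.2],
    "meeting": [0.0, 0.8, 0.6],
    "tomorrow": [0.2, 0.6, 0.4],
}

corpus = [
    ["free", "prize", "now"],   # "now" has no vector here, so it is skipped
    ["meeting", "tomorrow"],
    ["zzz"],                    # no known words at all
]

# One fixed-length row per document, however many words the document has
features = [document_vector(doc, word_vectors, dim=3) for doc in corpus]

for row in features:
    print([round(value, 2) for value in row])
# [0.8, 0.2, 0.1]
# [0.1, 0.7, 0.5]
# [0.0, 0.0, 0.0]
```

The function only needs `token in word_vectors` and `word_vectors[token]`, so any mapping-like object that supports those two operations could be passed in place of the dictionary.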


### Process to follow for Word2Vec
- Step-1) Load the dataset
- Step-2) Feature Engineering
    - Perform cleaning and text pre-processing of your corpus.
- Step-3) Train Test Split
    - Do a Train Test Split before you apply Bag of Words (BOW) and/or TF-IDF.
    - If you fit BOW and/or TF-IDF before splitting, the vocabulary and weights are learned from all of the data, test and train alike.
    - This causes ***Data Leakage***: information from the test data influences training, so the test score no longer reflects performance on unseen data.
    - Doing the Train Test Split before BOW and/or TF-IDF is a ***Best Practice***.
- Step-4) Apply Bag of Words and/or TF-IDF (using Bag of Words here)
- Step-5) Train the model and make predictions using an ML algorithm
- Step-6) Test the performance of the model (using `classification_report` here)
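
Steps 2 to 4 can be sketched in plain Python to show where the split has to happen. This is only an illustration under several assumptions: the five rows are invented to match the message/label layout described under Dataset, and `clean`, `fit_vocabulary`, and `bag_of_words` are hypothetical helpers written for this note. Model training and evaluation (Steps 5 and 6) are left out.

```python
import random
import re

def clean(text):
    """Step 2: lowercase, replace runs of non-letters with a space, split into tokens."""
    return re.sub(r"[^a-z]+", " ", text.lower()).split()

def fit_vocabulary(train_docs):
    """Learn the word -> column index mapping from TRAINING documents only."""
    words = sorted({word for doc in train_docs for word in doc})
    return {word: index for index, word in enumerate(words)}

def bag_of_words(doc, vocabulary):
    """Count vector for one document; words outside the vocabulary are ignored."""
    counts = [0] * len(vocabulary)
    for word in doc:
        if word in vocabulary:
            counts[vocabulary[word]] += 1
    return counts

# Hypothetical rows: (email message, spam/ham label)
rows = [
    ("WIN a FREE prize now!!!", "spam"),
    ("Are we meeting tomorrow?", "ham"),
    ("Free entry, claim your prize", "spam"),
    ("See you at lunch", "ham"),
    ("Claim your free ticket now", "spam"),
]

docs = [(clean(message), label) for message, label in rows]

# Step 3: split BEFORE any vectorizer sees the data
random.Random(0).shuffle(docs)
train, test = docs[:4], docs[4:]

# Step 4: fit on the training split only, then apply the same mapping to both splits
vocabulary = fit_vocabulary([doc for doc, _ in train])
X_train = [bag_of_words(doc, vocabulary) for doc, _ in train]
X_test = [bag_of_words(doc, vocabulary) for doc, _ in test]

print(len(vocabulary), len(X_train), len(X_test))
```

Because the vocabulary is built from `train` alone, a word that appears only in the test split gets no column and cannot influence training. Libraries with separate fit and transform steps follow the same pattern: fit on the training split, then transform both splits.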

