# 8.2.3 Word2Vec

## Introduction

**Word2Vec** is a widely used word embedding technique introduced by researchers at Google in 2013. It represents each word as a dense vector in a continuous space, so that words appearing in similar contexts are mapped to nearby points. The resulting vectors can capture linguistic relationships between words, such as synonymy and analogy. **Word2Vec** learns these representations by training a shallow neural network on a large text corpus. There are two main architectures for training Word2Vec:

- **Continuous Bag-of-Words (CBOW)**: Predicts the current word based on the context (surrounding words).
- **Skip-gram**: Predicts the context (surrounding words) based on the current word.
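
To make the difference between the two architectures concrete, the sketch below builds the training examples each one would see for a single sentence. It is a simplified illustration, not the gensim implementation: the helper names (`context_window`, `cbow_examples`, `skipgram_examples`) and the toy sentence are assumptions introduced here, and real implementations add details that this sketch omits.

```python
def context_window(tokens, i, window):
    """Return the words within `window` positions of tokens[i], excluding tokens[i]."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens, window=2):
    """CBOW: each example maps the list of context words to the current word."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens, window=2):
    """Skip-gram: each example maps the current word to one context word."""
    return [(tokens[i], context_word)
            for i in range(len(tokens))
            for context_word in context_window(tokens, i, window)]


# Toy sentence, for illustration only
sentence = ["the", "king", "rules", "the", "land"]

for context, target in cbow_examples(sentence, window=1):
    print(f"CBOW:      {context} -> {target}")

for center, context_word in skipgram_examples(sentence, window=1):
    print(f"Skip-gram: {center} -> {context_word}")
```

CBOW produces one example per word position, while Skip-gram produces one example per (word, context word) pair, so Skip-gram sees more training examples from the same text.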

## Benefits of Word2Vec
- **Captures Semantics**: Words used in similar contexts receive similar vectors, so the embeddings reflect word meaning and the relationships between words.
- **Efficient**: Can handle large vocabularies and corpora.
- **Versatile**: Useful in various NLP tasks, such as text classification and machine translation.
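
The relationships mentioned above are often illustrated with vector arithmetic: subtracting one word vector from another and adding a third can land near a fourth, related word. The sketch below shows the mechanics only. The 2-dimensional vectors in `toy_vectors` are hand-made for illustration and are not learned from any data, the `analogy` helper is hypothetical, and Euclidean distance is used for simplicity (gensim ranks candidates by cosine similarity instead).

```python
import math

# Hypothetical 2-D embeddings, hand-made for illustration (NOT learned from data).
# Dimension 0 loosely stands for "royalty", dimension 1 for "gender".
toy_vectors = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.0],
}


def analogy(a, b, c, vectors):
    """Answer "a is to b as c is to ?" with the word nearest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words themselves, as is common practice for analogy queries
    candidates = [word for word in vectors if word not in {a, b, c}]
    return min(candidates, key=lambda word: math.dist(vectors[word], target))


print(analogy("man", "king", "woman", toy_vectors))
```

With vectors learned from a sufficiently large corpus, the same kind of query can be expressed in gensim through the `positive` and `negative` arguments of `most_similar`.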

___
___
### **Readings**:
- [Word2Vec For Word Embeddings -A Beginner’s Guide](https://www.analyticsvidhya.com/blog/2021/07/word2vec-for-word-embeddings-a-beginners-guide/)
- [The Illustrated Word2vec](https://jalammar.github.io/illustrated-word2vec/)
- [Word2Vec Explained](https://readmedium.com/en/https:/towardsdatascience.com/word2vec-explained-49c52b4ccb71)
- [A Dummy’s Guide to Word2Vec](https://medium.com/@manansuri/a-dummys-guide-to-word2vec-456444f3c673)
- [Introduction to Word Embedding and Word2Vec](https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa)
___
___

In [1]:
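import nltk
from gensim.models import Word2Vec
from nltk.corpus import brown

# Download the Brown corpus if it is not already available locally
nltk.download('brown', quiet=True)

# Load the Brown corpus as a sequence of tokenized sentences
sentences = brown.sents()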

# Train the Word2Vec model using the CBOW architecture
model_cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4, sg=0)

# Train the Word2Vec model using the Skip-gram architecture
model_skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4, sg=1)

# Example: find the 10 words most similar to 'king'.
# most_similar returns (word, cosine similarity) pairs, best match first.
similar_words_cbow = model_cbow.wv.most_similar('king')
similar_words_skipgram = model_skipgram.wv.most_similar('king')

print("CBOW model - Words similar to 'king':\n")
for word_and_score in similar_words_cbow:
    print(word_and_score)
print("\nSkip-gram model - Words similar to 'king':\n")
for word_and_score in similar_words_skipgram:
    print(word_and_score)

CBOW model - Words similar to 'king':

('Birmingham', 0.9642075300216675)
('master', 0.9562150239944458)
('graceful', 0.9525375366210938)
('rousing', 0.9525105953216553)
('Vienna', 0.9507602453231812)
('singing', 0.9505473971366882)
('tail', 0.9495171904563904)
('guitar', 0.9481744170188904)
('skin', 0.947297990322113)
('Model', 0.9471037983894348)

Skip-gram model - Words similar to 'king':

('aunt', 0.9303045272827148)
('tease', 0.9267407655715942)
('comrades', 0.9239540100097656)
('Alvin', 0.9182977080345154)
('adventurous', 0.9182565808296204)
('numbered', 0.9167513847351074)
('embarrassed', 0.9160632491111755)
('Dick', 0.9150828123092651)
('Ritter', 0.9150470495223999)
('followers', 0.9146966338157654)
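
Each score in the output above is the cosine similarity between the vector for 'king' and the vector for the listed word. Most of the listed neighbors are not obviously related to 'king', and nearly every score is above 0.9. A plausible explanation is that the Brown corpus, at roughly one million words, is small for Word2Vec training, so many vectors end up pointing in similar directions. Results also vary from run to run. The sketch below shows how cosine similarity is computed, using plain Python and hypothetical 3-dimensional vectors rather than the trained models.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1 = same direction, 0 = orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical vectors, for illustration only
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

print("Same direction:", same_direction)
print("Orthogonal:    ", orthogonal)
```

Because cosine similarity ignores vector length and looks only at direction, a score near 1 means two word vectors point the same way, not that the words are necessarily related in meaning.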


___
___
## Conclusion

Word2Vec transforms words into dense vector representations that reflect how the words are used in text. By capturing semantic relationships between words, it supports natural language processing tasks that depend on word meaning. The CBOW and Skip-gram architectures offer two complementary training objectives: predicting a word from its context, or the context from a word. Word2Vec's efficiency and versatility make it useful in a range of applications, from text classification to machine translation.