# Word Embedding and Word2Vec

## What is Word Embedding?

Word embedding is a technique in natural language processing (NLP) that transforms words or phrases into numerical vectors, allowing computers to process human language more effectively. Instead of representing words as isolated symbols, word embeddings map each word to a point in a high-dimensional space, where the position of each word reflects its meaning and relationship to other words. This approach enables models to capture semantic similarities: words with similar meanings are located close to each other in this vector space.

## What is Word2Vec?

Word2Vec is a popular technique for learning word embeddings, and it offers two main model architectures: **CBOW (Continuous Bag of Words)** and **Skip-Gram**. Both models are designed to capture the relationships between words based on their context in a sentence, but they do so in opposite ways.

## CBOW (Continuous Bag of Words)

### How CBOW Works

- **Goal:** Predict the target word (the word in the middle) using the surrounding context words.
- **Input:** The context words around a missing or target word within a specified window size.
- **Output:** The model tries to guess the target word that fits best in the given context.

### Example

Suppose you have the sentence:  
*"The cat sat on the mat."*

If your window size is 2 and the target word is "sat", the context words are ["the", "cat", "on", "the"]. The CBOW model takes these context words as input and tries to predict "sat" as the output.
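
The pairing described above can be sketched in a few lines of plain Python. The helper name `cbow_pairs` is hypothetical and is not part of any library; it only shows how (context, target) training examples could be built from a tokenized sentence, assuming a symmetric window that is cut short at sentence edges.

```python
def cbow_pairs(tokens, window=2):
    """Return a (context_words, target_word) pair for every position in tokens."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]        # up to `window` words before the target
        right = tokens[i + 1:i + 1 + window]       # up to `window` words after the target
        pairs.append((left + right, target))
    return pairs


tokens = "the cat sat on the mat".split()
for context, target in cbow_pairs(tokens, window=2):
    print(context, "->", target)
```

For the position of "sat", this yields the context `['the', 'cat', 'on', 'the']`, matching the example above. Words near the start or end of the sentence simply get fewer context words.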

### Visual Explanation

In the first figure, you see several context words as input nodes (e.g., w(t-2), w(t-1), w(t+1), w(t+2)). These are combined (projected and summed) to predict the target word w(t) in the output. The model learns to associate groups of context words with the most likely target word that appears in the middle.
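
As a rough illustration of that projection step, the sketch below runs a single CBOW forward pass in plain Python. Everything in it is an assumption made for illustration: the five-word vocabulary, the 3-dimensional vectors, the random initial weights, and the use of a full softmax. Context vectors are averaged here (summing is the other common variant), and practical implementations typically replace the full softmax with negative sampling or hierarchical softmax.

```python
import math
import random

random.seed(0)  # fixed seed so the toy run is repeatable

vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}
dim = 3  # tiny embedding size, chosen only for illustration

# Input (projection) and output weight matrices with small random values.
W_in = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]


def cbow_forward(context_words):
    """Average the context vectors, then return a probability for every vocabulary word."""
    ids = [word_to_id[w] for w in context_words]
    hidden = [sum(W_in[i][d] for i in ids) / len(ids) for d in range(dim)]
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in W_out]
    # Numerically stable softmax over the whole vocabulary.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = cbow_forward(["the", "cat", "on", "the"])
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
```

With untrained weights the probabilities are close to uniform; training would push the probability of "sat" up for this context.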

## Skip-Gram

### How Skip-Gram Works

- **Goal:** Predict the surrounding context words given a single target word.
- **Input:** The current (target) word.
- **Output:** The model tries to predict each of the context words that appear around the target word within a specified window.

### Example

Using the same sentence:  
*"The cat sat on the mat."*

If the target word is "sat" and the window size is 2, the Skip-Gram model takes "sat" as input and tries to predict the context words ["the", "cat", "on", "the"].
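
A minimal sketch of how such training pairs could be generated is shown below. The function name `skipgram_pairs` is hypothetical; it assumes a symmetric window that is truncated at the sentence boundaries and emits one (target, context) pair per context word.

```python
def skipgram_pairs(tokens, window=2):
    """Return one (target_word, context_word) pair per context position."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # the target is not its own context
                pairs.append((target, tokens[j]))
    return pairs


tokens = "the cat sat on the mat".split()
sat_pairs = [pair for pair in skipgram_pairs(tokens) if pair[0] == "sat"]
print(sat_pairs)
```

For "sat" this produces four pairs, one for each of "the", "cat", "on", and "the", so a single target word contributes several training examples.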

### Visual Explanation

In the second figure, the input is a single word (w(t)), and the model projects this word to predict multiple output words (w(t-2), w(t-1), w(t+1), w(t+2)). The model learns to use the target word to generate the most likely context words that surround it.


## Key Differences

| Feature         | CBOW                                      | Skip-Gram                                  |
|-----------------|-------------------------------------------|--------------------------------------------|
| Input           | Context words                             | Target word                                |
| Output          | Target word                               | Context words                              |
| Use Case        | Works well with frequent words            | Better for rare words and small datasets   |
| Training Speed  | Generally faster                          | Slower, but more accurate for rare words   |

## Why Use Word2Vec?

Word2Vec is widely used in NLP for several reasons:

- **Semantic Representation:** It captures the meaning and relationships between words, so similar words have similar vectors.
- **Distributional Semantics:** Based on the idea that words used in similar contexts have similar meanings, Word2Vec learns from the distribution of words in large text datasets.
- **Vector Arithmetic:** The learned vectors can be combined using arithmetic operations to reveal relationships. For example, the vector for "king" minus "man" plus "woman" results in a vector close to "queen".
- **Efficiency:** Word2Vec is computationally efficient, making it feasible to train on large datasets with extensive vocabularies.
- **Transfer Learning:** Pre-trained Word2Vec models can be used as a starting point for various NLP tasks, saving time and resources.
- **Wide Applications:** Word2Vec embeddings are used in tasks like text classification, sentiment analysis, information retrieval, machine translation, and question answering.
- **Scalability:** The method can handle very large text corpora, which is essential for modern NLP applications.
- **Open Source:** Libraries like Gensim provide easy-to-use implementations of Word2Vec, making it accessible for both research and industry.
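
The vector arithmetic point can be illustrated with a toy example. The vectors below are hand-made and hypothetical (axes loosely meaning royalty, gender, and "fruitiness"); they are not learned by Word2Vec, and the helper `analogy` is written only for this sketch. With a trained Gensim model, the comparable query would go through `most_similar(positive=[...], negative=[...])` on the model's `wv` attribute.

```python
import math

# Hand-made 3-D vectors, NOT learned embeddings: (royalty, gender, fruitiness).
vectors = {
    "king":  [1.0,  1.0, 0.0],
    "queen": [1.0, -1.0, 0.0],
    "man":   [0.0,  1.0, 0.0],
    "woman": [0.0, -1.0, 0.0],
    "apple": [0.0,  0.0, 1.0],
}


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(positive, negative):
    """Return the word closest to sum(positive) - sum(negative), excluding the inputs."""
    query = [0.0, 0.0, 0.0]
    for word in positive:
        query = [q + v for q, v in zip(query, vectors[word])]
    for word in negative:
        query = [q - v for q, v in zip(query, vectors[word])]
    candidates = [w for w in vectors if w not in positive + negative]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


print(analogy(positive=["king", "woman"], negative=["man"]))  # prints "queen"
```

Because the toy vectors were constructed so that "king" and "queen" differ only along the gender axis, the result is exact here. Learned embeddings only approximate this behavior.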

## How Does Word2Vec Work?

Word2Vec learns word vectors by training a neural network on a large text corpus. The network is trained to perform one of two tasks, depending on the chosen architecture:

- **CBOW:** The model receives several context words as input and tries to predict the target word that appears in the middle of those context words. For example, given the context words "the," "cat," "on," and "the," the model tries to predict "sat."
- **Skip-Gram:** The model receives a single word as input and tries to predict the words that appear around it within a certain window size. For example, given the word "sat," the model tries to predict words like "the," "cat," and "on."

During training, the model adjusts the word vectors so that words appearing in similar contexts end up with similar vectors. After training, these vectors can be used to measure the similarity between words, find related words, or serve as input features for other machine learning models.
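
To make the "adjusts the word vectors" step concrete, here is a deliberately simplified Skip-Gram training loop in plain Python. It is a sketch under stated assumptions, not how Gensim implements training: it uses a full softmax over a five-word vocabulary, plain stochastic gradient descent, and arbitrary toy settings (`dim`, `window`, `lr`). Practical implementations typically rely on negative sampling or hierarchical softmax instead.

```python
import math
import random

random.seed(0)  # fixed seed so the toy run is repeatable

tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
word_to_id = {w: i for i, w in enumerate(vocab)}
dim, window, lr = 8, 2, 0.1  # toy settings chosen for illustration

W_in = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]

# (center id, context id) pairs within the window.
pairs = [(word_to_id[tokens[i]], word_to_id[tokens[j]])
         for i in range(len(tokens))
         for j in range(max(0, i - window), min(len(tokens), i + window + 1))
         if j != i]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_epoch():
    """Run one SGD pass over all pairs and return the mean cross-entropy loss."""
    total_loss = 0.0
    for center, context in pairs:
        hidden = W_in[center][:]  # copy of the center word's vector
        probs = softmax([sum(h * w for h, w in zip(hidden, row)) for row in W_out])
        total_loss += -math.log(probs[context])
        grad_hidden = [0.0] * dim
        for k, row in enumerate(W_out):
            err = probs[k] - (1.0 if k == context else 0.0)
            for d in range(dim):
                grad_hidden[d] += err * row[d]   # uses the output weight before its update
                row[d] -= lr * err * hidden[d]   # update the output vector
        for d in range(dim):
            W_in[center][d] -= lr * grad_hidden[d]  # update the input (word) vector
    return total_loss / len(pairs)


losses = [train_epoch() for _ in range(50)]
print(f"first epoch loss: {losses[0]:.3f}, last epoch loss: {losses[-1]:.3f}")
```

After training, the rows of `W_in` play the role of the word vectors. The loss cannot reach zero on this sentence because the same target word legitimately has several different context words.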

## Example: Training Word2Vec in Python

Training a Word2Vec model in Python involves several clear steps, from preparing your text data to evaluating the semantic similarity between words. Below is a detailed, practical guide using the full text of "Alice's Adventures in Wonderland" from Project Gutenberg, available at [this link](https://www.gutenberg.org/files/11/11-0.txt).

### Step 1: Download and Prepare the Text

- **Download the Text File (Optional):**  
  Use the plain text version of "Alice's Adventures in Wonderland" from Project Gutenberg:  
  https://www.gutenberg.org/files/11/11-0.txt.

- **Text Preprocessing:**  
  - Read the text file into Python.
  - Replace newline characters (`\n`) with spaces so that hard line wraps in the file do not break sentences apart.
  - Split the text into sentences, then tokenize each sentence into lowercase words for consistency.
  - This step ensures the data is in the right format for training the Word2Vec model.
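
The preprocessing steps can be sketched without any external dependency, as shown below. This is a simplified stand-in that uses regular expressions, not the NLTK tokenizers used in the full example later; the function name `preprocess` is hypothetical, and unlike `word_tokenize` this version drops punctuation entirely.

```python
import re


def preprocess(text):
    """Split raw text into lowercase token lists, one list per sentence."""
    text = re.sub(r"\s+", " ", text).strip()        # collapse newlines and repeated spaces
    sentences = re.split(r"(?<=[.!?])\s+", text)    # naive split after ., ! or ?
    data = []
    for sentence in sentences:
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        if words:
            data.append(words)
    return data


sample = "Alice was beginning to get very tired.\nShe had nothing to do!"
print(preprocess(sample))
```

The result is a list of token lists, which is the input shape that Gensim's `Word2Vec` expects for its sentences argument.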

### Step 2: Train the Word2Vec Models

- **CBOW Model (Continuous Bag of Words):**  
  - Trains the model to predict a target word based on its surrounding context words.
  - Useful for larger datasets and more frequent words.

- **Skip-Gram Model:**  
  - Trains the model to predict the context words given a single target word.
  - Especially effective for learning representations of rare words.

- **Implementation Example:**

```python
import nltk
import requests
from gensim.models import Word2Vec
from nltk.tokenize import sent_tokenize, word_tokenize

# Tokenizer data required by sent_tokenize and word_tokenize
nltk.download('punkt_tab')

# Step 1: Download the text directly from the URL
url = "https://www.gutenberg.org/files/11/11-0.txt"
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed
text = response.text

# Step 2: Preprocess the text into lowercase tokens, one list per sentence
text = text.replace("\n", " ")
data = []
for sentence in sent_tokenize(text):
    words = [word.lower() for word in word_tokenize(sentence)]
    data.append(words)

# Step 3: Train the CBOW model (sg=0 is the default architecture)
cbow_model = Word2Vec(data, min_count=1, vector_size=100, window=5, sg=0)

# Step 4: Train the Skip-Gram model (sg=1)
skipgram_model = Word2Vec(data, min_count=1, vector_size=100, window=5, sg=1)

# Step 5: Compute cosine similarities
# (similarity() raises KeyError if a word is missing from the vocabulary)
print("Cosine similarity between 'alice' and 'wonderland' - CBOW:",
      cbow_model.wv.similarity('alice', 'wonderland'))
print("Cosine similarity between 'alice' and 'machines' - CBOW:",
      cbow_model.wv.similarity('alice', 'machines'))

print("Cosine similarity between 'alice' and 'wonderland' - Skip-Gram:",
      skipgram_model.wv.similarity('alice', 'wonderland'))
print("Cosine similarity between 'alice' and 'machines' - Skip-Gram:",
      skipgram_model.wv.similarity('alice', 'machines'))
```

### Step 3: Evaluate Word Similarity

After training your Word2Vec models, you can measure how closely related two words are by calculating the cosine similarity between their vector representations. Cosine similarity ranges from -1 (vectors pointing in opposite directions) through 0 (orthogonal, unrelated vectors) to 1 (vectors pointing in the same direction). Values closer to 1 indicate a stronger relationship between the words in the embedding space.
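
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand on made-up 2-D vectors; the function name `cosine_similarity` is chosen for this illustration, and with a trained Gensim model the same quantity is returned by `model.wv.similarity(word_a, word_b)`.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # same direction: close to 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # orthogonal: 0
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # opposite direction: close to -1
```

Note that the measure depends only on direction: scaling a vector by a positive constant does not change its similarity to other vectors.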

**Sample Output** (exact values vary from run to run):

| Word Pair                | CBOW Similarity | Skip-Gram Similarity |
|--------------------------|-----------------|----------------------|
| 'alice' & 'wonderland'   | 0.9866          | 0.8666               |
| 'alice' & 'machines'     | 0.9544          | 0.8534               |

#### Interpretation

- **'alice' & 'wonderland':**
  - **CBOW Similarity (0.9866):** According to the CBOW model, "alice" and "wonderland" have very similar vectors, which indicates that they appear in similar contexts throughout the text.
  - **Skip-Gram Similarity (0.8666):** This is also a high value, though lower than the CBOW value. The Skip-Gram model still places the two words close together, but its different training objective produces a different vector geometry.

- **'alice' & 'machines':**
  - **CBOW Similarity (0.9544):** While still high, this value is lower than the similarity between "alice" and "wonderland," suggesting that "alice" and "machines" share fewer contexts and are less related in the story.
  - **Skip-Gram Similarity (0.8534):** This is the lowest of the four values, which is consistent with "machines" not being a central theme in "Alice's Adventures in Wonderland".

**Summary:**  
Higher cosine similarity values indicate stronger relationships in the embedding space. In both models, "alice" is closer to "wonderland" than to "machines," which aligns with the content and themes of the book. The gaps are small, however (0.9866 vs. 0.9544 for CBOW and 0.8666 vs. 0.8534 for Skip-Gram), and all four scores are high, so the ranking of the pairs is more informative than the absolute values.

### Experimentation

- **Change Model Parameters:** You can experiment by adjusting parameters like `vector_size` (the number of dimensions in the embedding), `window` (the context window size), or `min_count` (minimum word frequency) to see how these affect the similarity scores and the quality of the learned relationships.
- **Try Different Word Pairs:** Test other word pairs to explore how the model captures various relationships in the text. For example, compare "queen" and "king," or "rabbit" and "hole" to see if the model reflects their narrative connections.
- **Observe Effects:** Changing these parameters or word pairs can help you understand how Word2Vec models learn and represent semantic meaning from text data.

## Applications of Word Embedding

Word embeddings like those produced by Word2Vec are used in many NLP tasks:

- **Text Classification:** Improve the accuracy of categorizing documents or detecting sentiment by providing rich word representations.
- **Named Entity Recognition (NER):** Help identify names, locations, and other entities by leveraging semantic context.
- **Information Retrieval:** Enable more accurate search results by matching documents based on semantic similarity rather than just keyword matching.
- **Machine Translation:** Facilitate translation by capturing relationships between words in different languages.
- **Question Answering:** Enhance the ability of systems to understand and answer questions by providing context-aware word representations.
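
Many of these applications need one fixed-length vector per document rather than one per word. A common baseline, sketched below, is to average the vectors of the words in the document. The embeddings here are hypothetical 3-D values standing in for vectors learned by Word2Vec, and the helper `document_vector` is written only for this illustration.

```python
# Hypothetical 3-D embeddings standing in for vectors learned by Word2Vec.
embeddings = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad":   [-0.9, 0.1, 0.0],
    "movie": [0.0, 0.9, 0.3],
}


def document_vector(tokens, embeddings):
    """Average the vectors of known tokens; tokens without a vector are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None  # no token in the document has an embedding
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


features = document_vector("a great movie".split(), embeddings)
print(features)
```

The resulting list can be passed as a feature vector to a downstream classifier or compared against other document vectors with cosine similarity. Averaging ignores word order, which is the main limitation of this baseline.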