# Understanding Contextualized Token Embeddings: A Step-by-Step Guide

This guide walks through static and contextualized **word embeddings** using **GloVe, BERT, and CrossEncoder models**, with a code snippet for each step.

---

## **1. Import Required Libraries**
```python
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
import torch

from transformers import BertTokenizer, BertModel
from sklearn.metrics.pairwise import cosine_similarity
```
### **Explanation**
- `warnings.filterwarnings('ignore')`: Suppresses all Python warnings to keep the output clean. Remove this line when debugging, since it also hides useful warnings.
- `numpy`: Used for numerical operations.
- `matplotlib.pyplot`: Used for visualization.
- `sklearn.decomposition.PCA`: Used for reducing dimensions in vector embeddings.
- `torch`: PyTorch library for deep learning models.
- `transformers`: Contains pre-trained NLP models like BERT.
- `sklearn.metrics.pairwise.cosine_similarity`: Measures the similarity between vectors.

---

## **2. GloVe Word Embeddings**
```python
import gensim.downloader as api
word_vectors = api.load('glove-wiki-gigaword-100')
#word_vectors = api.load('word2vec-google-news-300')
```
### **Explanation**
- `api.load('glove-wiki-gigaword-100')`: Downloads (on first use) and loads 100-dimensional **GloVe embeddings**; `api` is the alias for `gensim.downloader`.
- `word2vec-google-news-300`: A commented-out alternative (300-dimensional Word2Vec embeddings).

```python
print(word_vectors['king'].shape)  # (100,)
print(word_vectors['king'][:20])   # first 20 values of the 'king' embedding
```
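- Each word maps to a single fixed **vector of numbers** (100 values here). Words used in similar contexts end up with similar vectors, which is how the vector captures meaning.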
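
To make "capturing its meaning" concrete: related words tend to have vectors that point in similar directions, and cosine similarity measures exactly that. The sketch below uses made-up 3-dimensional vectors, not real GloVe values; `toy_vectors` and `cosine` are hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical 3-d vectors invented for illustration
# (the real GloVe vectors loaded above have 100 dimensions).
toy_vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "ocean": np.array([0.1, 0.0, 0.9]),
}

def cosine(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_royal = cosine(toy_vectors["king"], toy_vectors["queen"])
sim_unrelated = cosine(toy_vectors["king"], toy_vectors["ocean"])

print(f"king vs queen: {sim_royal:.3f}")
print(f"king vs ocean: {sim_unrelated:.3f}")
```

With the loaded model, `word_vectors.similarity('king', 'queen')` computes the same quantity on the real vectors.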

---

## **3. Visualizing Word Embeddings with PCA**
```python
words = ["king", "princess", "monarch", "throne", "crown",
         "mountain", "ocean", "tv", "rainbow", "cloud", "queen"]

vectors = np.array([word_vectors[word] for word in words])
```
### **Explanation**
- Defines a list of 11 words mixing royalty terms with unrelated ones.
- Looks up the embedding of each word and stacks them into an array of shape `(11, 100)`.

```python
pca = PCA(n_components=2)
vectors_pca = pca.fit_transform(vectors)
```
- `PCA(n_components=2)`: Projects the 100-dimensional embeddings onto their two directions of greatest variance so they can be plotted in 2D.
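
As a rough illustration of what the projection involves, the sketch below reproduces the core of PCA with plain NumPy: center the data, take an SVD, and keep the top two directions. Random data stands in for the word vectors, and the axis signs may be flipped relative to scikit-learn's output, so treat it as a sketch rather than a drop-in replacement.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(11, 100))   # stand-in for 11 word vectors of 100 dims

# PCA operates on mean-centered data
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, ordered by singular value
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt[:2]                # top-2 directions, shape (2, 100)
X_2d = X_centered @ components.T   # projected coordinates, shape (11, 2)

# Fraction of the total variance retained by each of the two axes
explained = (S[:2] ** 2) / np.sum(S ** 2)

print(X_2d.shape)
print(explained)
```

The `explained` values correspond to scikit-learn's `pca.explained_variance_ratio_`; a low total is a reminder that a 2D plot discards most of the structure in the original vectors.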

```python
fig, axes = plt.subplots(1, 1, figsize=(5, 5))
axes.scatter(vectors_pca[:, 0], vectors_pca[:, 1])
for i, word in enumerate(words):
    axes.annotate(word, (vectors_pca[i, 0]+.02, vectors_pca[i, 1]+.02))
axes.set_title('PCA of Word Embeddings')
plt.show()
```
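- Plots each word at its **2D** PCA coordinates and labels it. Related words (for example the royalty terms) are expected to sit closer together than unrelated ones.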

---

## **4. Word Vector Algebra (Word Analogies)**
```python
result = word_vectors.most_similar(positive=['king', 'woman'],
                                   negative=['man'], topn=1)
```
- Finds the word whose vector is closest to **"king - man + woman"**, which is expected to be **"queen"**. The input words themselves are excluded from the results.

```python
print(f"""
    The word closest to 'king' - 'man' + 'woman' is: '{result[0][0]}'
    with a similarity score of {result[0][1]}
""")
```
- Displays the result with similarity score.
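
The arithmetic behind the analogy can be sketched by hand. The example below uses invented 2-dimensional vectors and a hypothetical `analogy` helper: it adds and subtracts unit-normalized vectors, then ranks the remaining words by cosine similarity. This is an approximation of the idea, assuming that is roughly how `most_similar` behaves, not a copy of gensim's implementation.

```python
import numpy as np

# Hypothetical 2-d vectors: axis 0 ~ "royalty", axis 1 ~ "gender" (invented values).
toy = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}

def unit(v):
    """Scale v to length 1 so that dot products become cosine similarities."""
    return v / np.linalg.norm(v)

def analogy(positive, negative, vectors):
    """Return the (word, score) closest to sum(positive) - sum(negative)."""
    target = sum(unit(vectors[w]) for w in positive) \
           - sum(unit(vectors[w]) for w in negative)
    exclude = set(positive) | set(negative)   # never return an input word
    scores = {
        w: float(np.dot(unit(target), unit(v)))
        for w, v in vectors.items() if w not in exclude
    }
    return max(scores.items(), key=lambda kv: kv[1])

best_word, best_score = analogy(["king", "woman"], ["man"], toy)
print(best_word, round(best_score, 3))
```

Excluding the input words matters: without it, the nearest vector to the target is often one of the inputs themselves.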

---

## **5. GloVe vs. BERT: Words in Context**
```python
tokenizer = BertTokenizer.from_pretrained('./models/bert-base-uncased')
model = BertModel.from_pretrained('./models/bert-base-uncased')
```
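- Loads the **BERT tokenizer and model** from the local directory `./models/bert-base-uncased`; this assumes the model files were saved there beforehand.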

```python
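def get_bert_embeddings(sentence, word):
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():  # inference only, so skip gradient tracking
        outputs = model(**inputs)
    last_hidden_states = outputs.last_hidden_state  # (1, seq_len, hidden_size)
    # tokenize() omits special tokens, so positions are off by one
    word_tokens = tokenizer.tokenize(sentence)
    # Raises ValueError if `word` was split into sub-word pieces
    word_index = word_tokens.index(word)
    word_embedding = last_hidden_states[0, word_index + 1, :]  # +1 skips [CLS]
    return word_embedding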
```
### **Explanation**
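- Tokenizes the sentence and runs it through BERT.
- Extracts the **last hidden state**, which holds one vector per input token.
- Finds the position of the specified word (`bat`) and returns its vector. The `+ 1` offset accounts for the `[CLS]` token that the tokenizer prepends to the input.
- The lookup only works when the word is a single lowercase token; a word that gets split into sub-word pieces will not be found.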
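
The index lookup is the fragile part of this function, so here it is in isolation. The sketch below uses a hypothetical helper, `find_token_position`, and hand-written token lists standing in for `tokenizer.tokenize(...)` output (the sub-word split shown is an assumed example, not generated by the tokenizer).

```python
def find_token_position(tokens, word):
    """Return the position of `word` in the model input, accounting for [CLS].

    `tokens` is the token list without special tokens, such as the
    output of tokenizer.tokenize(sentence).
    """
    if word not in tokens:
        raise ValueError(
            f"{word!r} is not a single token here; it may have been split "
            f"into sub-word pieces: {tokens}"
        )
    return tokens.index(word) + 1  # +1 skips the leading [CLS] token

# Hand-written token lists (assumed for illustration).
tokens_ok = ["the", "bat", "flew", "out", "of", "the", "cave", "at", "night", "."]
tokens_split = ["the", "em", "##bed", "##ding", "##s", "are", "dense"]

position = find_token_position(tokens_ok, "bat")
print(position)

try:
    find_token_position(tokens_split, "embeddings")
except ValueError as err:
    print("Lookup failed:", err)
```

For words that are split, one common workaround is to average the vectors of all their pieces, but that goes beyond what the function above does.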

```python
sentence1 = "The bat flew out of the cave at night."
sentence2 = "He swung the bat and hit a home run."
word = "bat"

bert_embedding1 = get_bert_embeddings(sentence1, word).detach().numpy()
bert_embedding2 = get_bert_embeddings(sentence2, word).detach().numpy()
word_embedding = word_vectors[word]
```
- Gets the BERT embedding of **"bat"** in each sentence (the animal versus the baseball bat), plus the single GloVe vector for the same word.

```python
print("BERT Embedding for 'bat' in sentence 1:", bert_embedding1[:5])
print("BERT Embedding for 'bat' in sentence 2:", bert_embedding2[:5])
print("GloVe Embedding for 'bat':", word_embedding[:5])
```
- Displays first **5 values** of embeddings.

```python
bert_similarity = cosine_similarity([bert_embedding1], [bert_embedding2])[0][0]
word_embedding_similarity = cosine_similarity([word_embedding], [word_embedding])[0][0]
```
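- Computes the **cosine similarity** between the two BERT embeddings. The GloVe line compares the vector with itself, so it always yields 1.0: GloVe has only one vector for "bat", whatever the sentence.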

```python
print(f"Cosine Similarity between BERT embeddings in different contexts: {bert_similarity}")
print(f"Cosine Similarity between GloVe embeddings: {word_embedding_similarity}")
```
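- Compares **BERT vs. GloVe**: the BERT similarity falls below 1.0 because each vector reflects the sentence it came from, while the GloVe value cannot change with context.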
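
The GloVe result can be reproduced without any model. In the sketch below, `static_table` and `static_embedding` are hypothetical stand-ins for a static embedding lookup, with invented values; the point is only that the sentence never enters the computation.

```python
import numpy as np

# Hypothetical static table: one fixed vector per word (values invented).
static_table = {"bat": np.array([0.2, -0.5, 0.7])}

def static_embedding(sentence, word):
    # The sentence is ignored: a static model stores one vector per word.
    return static_table[word]

v1 = static_embedding("The bat flew out of the cave at night.", "bat")
v2 = static_embedding("He swung the bat and hit a home run.", "bat")

static_similarity = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(static_similarity)
```

A contextual model differs precisely in that its equivalent of `static_embedding` does use the sentence, which is why the two BERT vectors for "bat" are not identical.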

---

## **6. Cross Encoder for Sentence Similarity**
```python
from sentence_transformers import CrossEncoder
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2', max_length=512,
                     default_activation_function=torch.nn.Sigmoid())
```
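- Loads a **CrossEncoder** trained on MS MARCO passage ranking, which scores how relevant a passage is to a query. The sigmoid activation maps each score into the range 0 to 1.
- Note that this rebinds `model`, replacing the BERT model loaded in section 5.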

```python
question = "Where is the capital of France?"
answers = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain."
]
```
- Defines a **question** and **candidate answers**.

```python
scores = model.predict([(question, answers[0]), (question, answers[1]),
                        (question, answers[2])])
```
- Computes a **relevance score** for each (question, answer) pair. The cross-encoder reads both texts together rather than embedding them separately.
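
The `torch.nn.Sigmoid()` passed when loading the model is what keeps the scores between 0 and 1. The sketch below applies the same function to invented raw outputs (`raw_logits` is hypothetical, not real model output) to show that it rescales the values without changing their order.

```python
import numpy as np

def sigmoid(x):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw model outputs for three (question, answer) pairs.
raw_logits = np.array([8.0, -4.0, -6.0])
squashed = sigmoid(raw_logits)

print(squashed)
print(np.argmax(raw_logits) == np.argmax(squashed))
```

Because the sigmoid is monotonic, the ranking of the answers would be the same with or without it; it mainly makes the scores easier to read and to threshold.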

```python
most_relevant_idx = torch.argmax(torch.tensor(scores)).item()
print(f"The most relevant passage is: {answers[most_relevant_idx]}")
```
- Identifies the most **relevant answer**.
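
Picking only the top answer discards the rest of the ranking. As a small extension, the sketch below sorts all candidates by score; the `example_scores` values are invented for illustration and are not output from the model.

```python
import numpy as np

candidate_answers = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
]
# Hypothetical scores in the same order as candidate_answers.
example_scores = np.array([0.99, 0.02, 0.01])

# argsort is ascending, so reverse it to get the best candidate first
order = np.argsort(example_scores)[::-1]
ranked = [(candidate_answers[i], float(example_scores[i])) for i in order]

for rank, (text, score) in enumerate(ranked, start=1):
    print(f"{rank}. {score:.2f}  {text}")
```

In a retrieval setting this is the usual role of a cross-encoder: reranking a short list of candidates produced by a cheaper first-stage search.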

---

## **Summary**
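- **GloVe**: Provides fixed (static) word embeddings, with one vector per word regardless of context.
- **BERT**: Provides **contextualized** embeddings, so the same word gets a different vector in each sentence.
- **PCA**: Reduces embeddings to 2D for visualization.
- **CrossEncoder**: Scores each question-answer pair jointly and picks the most relevant answer.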
