## GloVe

**GloVe (Global Vectors for Word Representation)** is a **word embedding** method that represents each word as a **numeric vector** by learning from **global word co-occurrence statistics** across a large text corpus.

## Core Idea

- Words that **appear in similar contexts** tend to have **similar meanings**.
- **GloVe** aggregates **word co-occurrence counts over the entire corpus**, rather than learning from one local context window at a time.

## How GloVe Learns (Intuition)

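- Builds a **co-occurrence matrix**  
  (how often **word A** appears near **word B** across the corpus).
- Learns word vectors such that:
  - The **dot product** of two word vectors approximates the **logarithm of their co-occurrence count**.
  - Words with **similar co-occurrence patterns** end up **closer together** in vector space.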

This allows GloVe to capture **global semantic relationships** between words.
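The two steps above can be sketched in plain Python. This is a simplified illustration, not the reference implementation: the toy corpus, the function names, and the window size are assumptions made for the example. It also uses raw counts and a single set of vectors, whereas the original method weights counts by distance and trains separate word and context vectors. The `x_max=100` and `alpha=0.75` defaults follow the values reported in the GloVe paper.

```python
import math
from collections import defaultdict

def build_cooccurrence(sentences, window=2):
    """Count how often each word pair appears within `window` tokens."""
    counts = defaultdict(float)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Scan neighbors to the right and record both directions,
            # so the resulting matrix is symmetric.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                counts[(word, tokens[j])] += 1.0
                counts[(tokens[j], word)] += 1.0
    return counts

def weight(x, x_max=100.0, alpha=0.75):
    """GloVe-style weighting: damp rare pairs, cap frequent pairs at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def pair_loss(w_i, w_j, b_i, b_j, x_ij):
    """Weighted squared error between the vector score and log(count)."""
    score = sum(a * b for a, b in zip(w_i, w_j)) + b_i + b_j
    return weight(x_ij) * (score - math.log(x_ij)) ** 2

# Toy corpus (hypothetical); a real corpus has billions of tokens.
corpus = [
    "the king rules the land",
    "the queen rules the land",
]
counts = build_cooccurrence(corpus, window=2)
print(counts[("king", "rules")])   # pair seen in one sentence
print(counts[("rules", "land")])   # pair seen in both sentences
```

Training then amounts to adjusting the vectors and biases so that the sum of `pair_loss` over all non-zero entries of the matrix becomes small.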

## Example

If the words in each pair appear in similar contexts across the corpus:

- **king ↔ queen**
- **man ↔ woman**

### What GloVe Learns
GloVe can capture relationships like:
**king − man + woman ≈ queen**


The offset between **king** and **man** roughly matches the offset between **queen** and **woman**, so simple vector arithmetic recovers the **semantic relationship**.
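The analogy arithmetic can be made concrete with a small sketch. The 2-D vectors below are hypothetical and hand-picked for illustration (one axis loosely standing for "royalty", the other for "gender"); they are not real GloVe values, and the `analogy` helper is an assumed name rather than a library function.

```python
import math

# Hypothetical toy vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
toy_vectors = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy("king", "man", "woman", toy_vectors)
print(result)  # queen
```

With real pretrained vectors the match is approximate rather than exact, which is why the relationship is written with ≈.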


## Key Characteristics
- One **vector per word**
- Uses **global co-occurrence statistics**
- **Context-free** representation


## Main Limitation (Important)

**GloVe does not model context.**

So the word **“bank”** has the same vector in all cases:

- bank (money)
- bank (river)

The same word always gets **one fixed vector**, regardless of its meaning in a sentence.
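A minimal sketch of why this happens: a static embedding is just a lookup table keyed by the word, so the surrounding sentence never enters the computation. The table values and the `embed` helper below are hypothetical, chosen only to illustrate the lookup; real GloVe vectors have many more dimensions.

```python
# Hypothetical 3-D embedding table standing in for pretrained vectors.
embedding_table = {
    "bank":  [0.7, -0.1, 0.6],
    "river": [-0.3, 0.8, 0.1],
    "money": [0.9, -0.2, 0.4],
}

def embed(sentence, table):
    """Map each known token to its vector; the context plays no role."""
    return [(tok, table[tok]) for tok in sentence.lower().split() if tok in table]

finance = dict(embed("she deposited money at the bank", embedding_table))
nature = dict(embed("they sat on the river bank", embedding_table))

# Both sentences yield exactly the same vector for "bank".
print(finance["bank"] == nature["bank"])  # True
```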

## Why GloVe Is Not Used in LLMs

- Modern NLP systems require **context-aware representations**  
  (understanding **sentence meaning** and **user intent**).
- **GloVe** produces **context-free word embeddings**.
- It cannot change a word’s meaning based on **sentence context**.

### Result
- GloVe has been largely **replaced by contextual embeddings** produced by **Transformer-based models**.

This is why **GloVe is rarely used in modern LLM or RAG systems**, which rely on contextual embeddings instead.

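```python
import gensim.downloader as api
```

```python
# Downloads the pretrained 50-dimensional GloVe vectors on first use, then loads them
glove_model = api.load("glove-wiki-gigaword-50")
```

```python
# Look up the single, fixed vector stored for "bank"
vector = glove_model["bank"]

print("Vector length:", len(vector))
print("First 10 values:")
print(vector[:10])
```

```
Vector length: 50
First 10 values:
[ 0.66488 -0.11391  0.67844  0.17951  0.6828  -0.47787 -0.30761  0.17489
 -0.70512 -0.55022]
```

```python
# Nearest neighbors of "bank" by cosine similarity
glove_model.most_similar("bank", topn=6)
```

```
[('banks', 0.869862973690033),
 ('securities', 0.7996813654899597),
 ('banking', 0.7965160012245178),
 ('investment', 0.7849708199501038),
 ('exchange', 0.7808825969696045),
 ('financial', 0.7670274972915649)]
```

All six neighbors relate to the financial sense of **bank**; the river sense does not appear, because the word has only one fixed vector.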