### What is GloVe?

GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm developed by researchers at Stanford University, including Jeffrey Pennington, Richard Socher, and Christopher Manning. It is used to generate dense vector representations of words, capturing semantic relationships between them. Unlike traditional one-hot encoding, which creates sparse vectors, GloVe produces low-dimensional, continuous vectors that encode meaningful information about words and their contexts.

### Key Concepts of GloVe

1. **Co-occurrence Matrix**: GloVe starts by constructing a co-occurrence matrix from a large corpus of text. This matrix counts how often words appear together within a given context window. Each entry in the matrix represents the frequency of co-occurrence between two words.

2. **Weighted Least Squares**: GloVe fits word vectors to the logarithm of these co-occurrence counts with a weighted least-squares objective, which amounts to a low-rank factorization of the log co-occurrence matrix. Words that occur in similar contexts therefore end up with similar vector representations.

3. **Weighting Function**: Each squared error is scaled by a weighting function of the co-occurrence count. The weight grows with the count but is capped at 1, so rare, noisy co-occurrences contribute little while very frequent ones cannot dominate the optimization process.

4. **Training Objective**: Training minimizes the weighted squared difference between the dot product of a word vector and a context vector (plus two bias terms) and the logarithm of the pair's co-occurrence count.

### GloVe Training Objective

The objective function of GloVe is designed to capture the statistical information encoded in the co-occurrence matrix. For a word $i$ with vector $w_i$ and a context word $j$ with vector $\tilde{w}_j$, the objective function can be expressed as:

$$ 
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2 
$$

where:
- $ V $ is the size of the vocabulary.
- $ X_{ij} $ is the co-occurrence count of words $i$ and $j$.
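- $ f(X_{ij}) $ is the weighting function. Because $ f(0) = 0 $, pairs that never co-occur drop out of the sum, so $ \log X_{ij} $ is only evaluated for $ X_{ij} > 0 $.
- $ w_i $ is the word vector for word $i$, and $ \tilde{w}_j $ is the separate context vector for word $j$.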
- $ b_i $ and $ \tilde{b}_j $ are bias terms.

The weighting function $ f(X_{ij}) $ is defined as:

$$
f(X_{ij}) =
\begin{cases}
\left( \dfrac{X_{ij}}{X_{\text{max}}} \right)^\alpha & \text{if } X_{ij} < X_{\text{max}} \\
1 & \text{otherwise}
\end{cases}
$$

where:
- $ X_{\text{max}} $ and $ \alpha $ are hyperparameters. The original paper uses $ X_{\text{max}} = 100 $ and $ \alpha = 3/4 $.
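
The weighting function and objective above can be evaluated directly with NumPy. The following is a minimal sketch, not a reference implementation: the function names `weighting` and `glove_loss` and the toy inputs are assumptions made for illustration, and the sum runs only over non-zero entries of $X$ so that $\log X_{ij}$ is always defined.

```python
import numpy as np

def weighting(X, x_max=100.0, alpha=0.75):
    """f(X) = (X / x_max) ** alpha below x_max, and 1 at or above it."""
    X = np.asarray(X, dtype=float)
    return np.where(X < x_max, (X / x_max) ** alpha, 1.0)

def glove_loss(W, W_tilde, b, b_tilde, X, x_max=100.0, alpha=0.75):
    """Evaluate J over the non-zero entries of the co-occurrence matrix X."""
    i, j = np.nonzero(X)                       # index pairs with X_ij > 0
    x = X[i, j]
    # w_i . w~_j + b_i + b~_j for every non-zero pair at once
    pred = np.sum(W[i] * W_tilde[j], axis=1) + b[i] + b_tilde[j]
    return float(np.sum(weighting(x, x_max, alpha) * (pred - np.log(x)) ** 2))

# Toy inputs (hypothetical): two words that co-occur twice, all parameters zero.
X_toy = np.array([[0.0, 2.0], [2.0, 0.0]])
W0 = np.zeros((2, 3))
W0_tilde = np.zeros((2, 3))
b0 = np.zeros(2)
b0_tilde = np.zeros(2)

loss = glove_loss(W0, W0_tilde, b0, b0_tilde, X_toy)
print(loss)
```

With the default hyperparameters, a pair seen once gets a weight of about 0.03, while any pair seen 100 times or more gets a weight of exactly 1.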


### Advantages of GloVe

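1. **Efficient Training**: GloVe trains on a co-occurrence matrix that is built once from the whole corpus, so it uses global corpus statistics directly, and each training pass costs time proportional to the number of non-zero matrix entries rather than to the length of the corpus.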
2. **Meaningful Vectors**: The resulting word vectors capture semantic relationships, such as word analogies (e.g., "king" - "man" + "woman" ≈ "queen").
3. **Flexibility**: GloVe can be applied to various tasks, including word similarity, named entity recognition, and machine translation.
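
The analogy arithmetic mentioned above can be illustrated with cosine similarity. This sketch uses hand-made 2-D vectors rather than trained GloVe output, so it shows only the mechanics; the `analogy` helper and the `toy_vectors` table are assumptions introduced for this example.

```python
import numpy as np

# Hand-made 2-D vectors (axes: royalty, gender). These are NOT trained embeddings.
toy_vectors = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman", toy_vectors)
print(result)  # queen
```

Excluding the three query words from the candidates is the usual convention, since the nearest vector to the target is otherwise often one of the inputs.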

### Applications

- **Natural Language Processing (NLP)**: GloVe vectors are used as input features for various NLP tasks, such as sentiment analysis, named entity recognition, and question answering.
- **Information Retrieval**: Word vectors help in improving search engine algorithms by providing better semantic matching of queries and documents.
- **Machine Translation**: GloVe embeddings can be used to initialize the embedding layers of machine translation models. Embeddings trained separately on each language do not share a vector space, so comparing words across languages requires an additional alignment step.
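
One simple way to turn word vectors into input features for a downstream model is to average the vectors of the tokens in a sentence. The sketch below assumes a small hypothetical embedding table (`toy_embeddings`) and a helper named `sentence_vector`; neither comes from a GloVe library, and real systems often use more elaborate pooling.

```python
import numpy as np

def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

# Hypothetical 3-D embeddings, chosen by hand for illustration only.
toy_embeddings = {
    "good":  np.array([0.9, 0.1, 0.0]),
    "movie": np.array([0.1, 0.8, 0.1]),
}

# "a" is out of vocabulary and is skipped.
features = sentence_vector(["a", "good", "movie"], toy_embeddings, dim=3)
print(features)
```

The resulting fixed-length vector can be passed to any classifier, for example for sentiment analysis.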

### Conclusion

GloVe is a powerful technique for generating word embeddings that capture semantic relationships between words. Its ability to learn meaningful representations from large corpora makes it a valuable tool in various NLP and machine learning applications.

In [6]:
import numpy as np
import pandas as pd
from collections import Counter
import itertools
import string
from sklearn.preprocessing import normalize
import matplotlib.pyplot as plt

In [2]:
def preprocess(text):
    """
    Preprocess the text by removing punctuation, converting to lowercase, and splitting into words.
    
    Args:
    text (str): Input text string.
    
    Returns:
    list: List of words (tokens).
    """
    # Remove punctuation and convert to lowercase
    text = text.translate(str.maketrans('', '', string.punctuation)).lower()
    # Split into words (tokens)
    tokens = text.split()
    return tokens

# Example text corpus
corpus = ["I love NLP", "NLP is a fascinating field", "Natural language processing with GloVe"]

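# Preprocess the corpus and flatten it into a single token stream.
# Note: because the sentences are concatenated, context windows built from
# this list will span sentence boundaries.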
tokens = list(itertools.chain(*[preprocess(sentence) for sentence in corpus]))

print(tokens)

['i', 'love', 'nlp', 'nlp', 'is', 'a', 'fascinating', 'field', 'natural', 'language', 'processing', 'with', 'glove']


In [3]:
def build_vocab(tokens):
    """
    Build vocabulary from tokens and create a word-to-index mapping.
    
    Args:
    tokens (list): List of words (tokens).
    
    Returns:
    dict: Word-to-index mapping.
    """
    vocab = Counter(tokens)
    word_to_index = {word: i for i, word in enumerate(vocab)}
    return word_to_index

def build_cooccurrence_matrix(tokens, word_to_index, window_size=2):
    """
    Build the co-occurrence matrix from tokens using a specified window size.
    
    Args:
    tokens (list): List of words (tokens).
    word_to_index (dict): Word-to-index mapping.
    window_size (int): Context window size.
    
    Returns:
    np.ndarray: Co-occurrence matrix.
    """
    vocab_size = len(word_to_index)
    cooccurrence_matrix = np.zeros((vocab_size, vocab_size))
    
    for i, word in enumerate(tokens):
        word_index = word_to_index[word]
        context_start = max(0, i - window_size)
        context_end = min(len(tokens), i + window_size + 1)
        
        for j in range(context_start, context_end):
            if i != j:
                context_word = tokens[j]
                context_word_index = word_to_index[context_word]
                cooccurrence_matrix[word_index, context_word_index] += 1
    
    return cooccurrence_matrix

# Build vocabulary and co-occurrence matrix
word_to_index = build_vocab(tokens)
cooccurrence_matrix = build_cooccurrence_matrix(tokens, word_to_index)

print(cooccurrence_matrix)

[[0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 2. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 2. 2. 2. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 2. 0. 1. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 0. 1. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 0. 1. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 0. 1. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 1. 0. 1. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0.]]
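
The matrix above counts every pair inside the window equally. The original GloVe paper instead lets a pair that is $d$ positions apart contribute $1/d$ to the count, so closer words weigh more. The following is a minimal sketch of that variant; the function name `build_weighted_cooccurrence` and the three-token input are assumptions made for this example.

```python
import numpy as np

def build_weighted_cooccurrence(tokens, word_to_index, window_size=2):
    """Co-occurrence matrix where a pair d positions apart contributes 1/d."""
    vocab_size = len(word_to_index)
    X = np.zeros((vocab_size, vocab_size))
    for i, word in enumerate(tokens):
        start = max(0, i - window_size)
        end = min(len(tokens), i + window_size + 1)
        for j in range(start, end):
            if i != j:
                # Distance-based weight: adjacent words add 1, words two apart add 0.5
                X[word_to_index[word], word_to_index[tokens[j]]] += 1.0 / abs(i - j)
    return X

# Tiny hypothetical input, separate from the corpus used in the cells above.
toy_tokens = ["i", "love", "nlp"]
toy_index = {w: k for k, w in enumerate(toy_tokens)}
X_weighted = build_weighted_cooccurrence(toy_tokens, toy_index)
print(X_weighted)
```

Because the counts are no longer integers, the rest of the pipeline is unchanged only if it treats the matrix entries as floats, which the NumPy code in this notebook already does.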


In [4]:
class GloVe:
    def __init__(self, vocab_size, embedding_dim=50, x_max=100, alpha=0.75):
        """
        Initialize the GloVe model with the specified parameters.
        
        Args:
        vocab_size (int): Size of the vocabulary.
        embedding_dim (int): Dimensionality of the word vectors.
        x_max (int): Co-occurrence count at which the weighting function saturates at 1.
        alpha (float): Exponent for the weighting function.
        """
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.x_max = x_max
        self.alpha = alpha
        self.W = np.random.rand(vocab_size, embedding_dim)
        self.W_tilde = np.random.rand(vocab_size, embedding_dim)
        self.b = np.random.rand(vocab_size)
        self.b_tilde = np.random.rand(vocab_size)
        self.gradsq_W = np.ones((vocab_size, embedding_dim))
        self.gradsq_W_tilde = np.ones((vocab_size, embedding_dim))
        self.gradsq_b = np.ones(vocab_size)
        self.gradsq_b_tilde = np.ones(vocab_size)
    
    def weighting_function(self, x):
        """
        Weighting function for the GloVe model.
        
        Args:
        x (float): Co-occurrence value.
        
        Returns:
        float: Weight.
        """
        if x < self.x_max:
            return (x / self.x_max) ** self.alpha
        return 1.0
    
    def train(self, cooccurrence_matrix, epochs=100, learning_rate=0.05):
        """
        Train the GloVe model using the specified co-occurrence matrix.
        
        Args:
        cooccurrence_matrix (np.ndarray): Co-occurrence matrix.
        epochs (int): Number of training epochs.
        learning_rate (float): Learning rate for gradient updates.
        """
        for epoch in range(epochs):
            total_cost = 0
            for i in range(self.vocab_size):
                for j in range(self.vocab_size):
                    if cooccurrence_matrix[i, j] == 0:
                        continue
                    X_ij = cooccurrence_matrix[i, j]
                    weight = self.weighting_function(X_ij)
                    # Prediction error for this pair: w_i . w~_j + b_i + b~_j - log(X_ij)
                    diff = np.dot(self.W[i], self.W_tilde[j]) + self.b[i] + self.b_tilde[j] - np.log(X_ij)
                    cost = weight * diff ** 2
                    total_cost += cost
                    
                    # Compute gradients (the constant factor 2 is folded into the learning rate)
                    grad_common = weight * diff
                    grad_W = grad_common * self.W_tilde[j]
                    grad_W_tilde = grad_common * self.W[i]
                    grad_b = grad_common
                    grad_b_tilde = grad_common
                    
                    # Update parameters
                    self.W[i] -= learning_rate * grad_W / np.sqrt(self.gradsq_W[i])
                    self.W_tilde[j] -= learning_rate * grad_W_tilde / np.sqrt(self.gradsq_W_tilde[j])
                    self.b[i] -= learning_rate * grad_b / np.sqrt(self.gradsq_b[i])
                    self.b_tilde[j] -= learning_rate * grad_b_tilde / np.sqrt(self.gradsq_b_tilde[j])
                    
                    # Accumulate squared gradients for the AdaGrad step-size scaling
                    self.gradsq_W[i] += grad_W ** 2
                    self.gradsq_W_tilde[j] += grad_W_tilde ** 2
                    self.gradsq_b[i] += grad_b ** 2
                    self.gradsq_b_tilde[j] += grad_b_tilde ** 2
            
            if epoch % 10 == 0:
                print(f'Epoch: {epoch}, Cost: {total_cost}')
    
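    def get_word_vector(self, word, word_to_index):
        """
        Get the word vector for the specified word.
        
        Args:
        word (str): Input word.
        word_to_index (dict): Word-to-index mapping used to build the co-occurrence matrix.
        
        Returns:
        np.ndarray or None: Word vector, or None if the word is not in the vocabulary.
        """
        if word in word_to_index:
            word_index = word_to_index[word]
            return self.W[word_index]
        return None

# Initialize and train the GloVe model
glove = GloVe(vocab_size=len(word_to_index), embedding_dim=50)
glove.train(cooccurrence_matrix, epochs=100)

# Get the word vector for 'nlp' (the mapping is passed in explicitly
# instead of being read from a global variable)
word_vector = glove.get_word_vector('nlp', word_to_index)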
print(word_vector)

Epoch: 0, Cost: 212.64408285952362
Epoch: 10, Cost: 33.981420342290654
Epoch: 20, Cost: 11.90269477883924
Epoch: 30, Cost: 5.175079400732769
Epoch: 40, Cost: 2.5389622248442816
Epoch: 50, Cost: 1.3609105311592509
Epoch: 60, Cost: 0.7840870656955667
Epoch: 70, Cost: 0.47969041286440744
Epoch: 80, Cost: 0.3082160979317296
Epoch: 90, Cost: 0.20596709468705968
[ 0.44207137  0.76081516 -0.24090223  0.4042175   0.2024028   0.35407782
  0.21832759  0.0165039  -0.03980088 -0.15351597  0.34953039  0.12826313
 -0.30591215 -0.13168328 -0.14439805 -0.05113714  0.1194515   0.43186841
  0.29130757 -0.03181844 -0.24558297  0.18159045  0.14357545  0.17335863
  0.37875114  0.28158238 -0.24018128  0.02932703 -0.28275963 -0.42715426
 -0.15118422 -0.29932214 -0.18210339  0.47112105  0.3413519   0.41564289
  0.17038835  0.11816887  0.60066647  0.30423619 -0.33602663 -0.39995275
  0.33467779  0.1667833   0.35331386  0.32621722  0.38033372 -0.32044388
  0.01662089 -0.12178101]
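
A raw vector is hard to interpret on its own; a common next step is to rank other words by cosine similarity. The sketch below assumes a helper named `most_similar` and uses small hand-made matrices as stand-ins for the learned parameters. It also sums the word and context matrices, which is what the original paper uses as the final word vectors.

```python
import numpy as np

def most_similar(word, vectors, word_to_index, topn=3):
    """Rank the other vocabulary words by cosine similarity to `word`.

    Assumes no row of `vectors` is all zeros.
    """
    index_to_word = {i: w for w, i in word_to_index.items()}
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = word_to_index[word]
    sims = unit @ unit[query]
    order = [int(i) for i in np.argsort(-sims) if int(i) != query]
    return [(index_to_word[i], float(sims[i])) for i in order[:topn]]

# Hand-made stand-ins for the learned matrices (illustration only).
toy_index = {"nlp": 0, "language": 1, "field": 2}
toy_W = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
toy_W_tilde = np.array([[1.0, 0.2], [0.6, 0.8], [0.2, 1.0]])

combined = toy_W + toy_W_tilde          # final vectors as word + context vectors
neighbours = most_similar("nlp", combined, toy_index, topn=2)
print(neighbours)
```

With the model trained above, the equivalent call would be `most_similar('nlp', glove.W + glove.W_tilde, word_to_index)`. On a corpus of only 13 tokens the resulting rankings should not be expected to be meaningful.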


### Explanation
1. **Preprocessing**: Clean and tokenize the text.
2. **Building Vocabulary**: Create a vocabulary and word-to-index mapping.
3. **Co-occurrence Matrix**: Build a co-occurrence matrix using a sliding window approach.
4. **GloVe Model**:
    - **Initialization**: Initialize the parameters and hyperparameters.
    - **Weighting Function**: Define a function to weight the co-occurrences.
    - **Training**: Train the model using gradient descent with AdaGrad for optimization.
    - **Get Word Vector**: Retrieve the word vector for a given word from the learned embeddings.
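
The gradients used in the training loop follow from differentiating a single term of the objective. As a sketch, write $d_{ij} = w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}$ for the prediction error of one pair. The symbols $d_{ij}$, $g$, $G$, $\theta$ and $\eta$ are notation introduced here and do not appear elsewhere in this document. The contribution of the pair $(i, j)$ to each gradient is:

$$
\frac{\partial J}{\partial w_i} = 2 f(X_{ij}) \, d_{ij} \, \tilde{w}_j, \qquad
\frac{\partial J}{\partial \tilde{w}_j} = 2 f(X_{ij}) \, d_{ij} \, w_i, \qquad
\frac{\partial J}{\partial b_i} = \frac{\partial J}{\partial \tilde{b}_j} = 2 f(X_{ij}) \, d_{ij}
$$

The code computes `grad_common` as $f(X_{ij}) \, d_{ij}$, so the constant factor 2 is effectively absorbed into the learning rate. Each parameter $\theta$ with gradient $g$ is then updated element-wise with an AdaGrad-style rule, where $\eta$ is the learning rate and $G$ is the running sum of squared gradients (initialized to 1 in the code):

$$
\theta \leftarrow \theta - \eta \, \frac{g}{\sqrt{G}}, \qquad G \leftarrow G + g^2
$$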
    
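This code provides a basic implementation of the GloVe model. It loops over a dense co-occurrence matrix in pure Python, so it is only practical for toy corpora. For a more comprehensive implementation, consider using libraries like TensorFlow or PyTorch, which offer vectorized operations, automatic differentiation, and built-in optimizers.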