# Understanding GloVe 
GloVe, or Global Vectors for Word Representation, is another technique for training word embeddings in an unsupervised way. It was proposed by three Stanford University researchers: Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe is often described as a more principled approach than Word2Vec because its objective is derived directly from the global co-occurrence statistics of the corpus.

In [None]:
## Author: Sunil Patel
## Copyright: Copyright 2018-2019, Packt Publishing Limited
## Version: 0.0.1
## Maintainer: Sunil Patel
## Email: snlpatel01213@hotmail.com
## Linkedin: https://www.linkedin.com/in/linus1/
## Contributor : {if you debug, append your name here}
## Contributor Email : {if you debug, append your email here}
## Status: active

In [None]:
import matplotlib.pyplot as plt
import nltk
import numpy as np
import torch
import torch.optim as optim
from nltk.tokenize import word_tokenize
from tensorboardX import SummaryWriter
from torch.autograd import Variable
from tqdm import tqdm

%matplotlib inline

# Download the NLTK data needed by word_tokenize
nltk.download('popular')

# TensorBoard writer, used later to export the embeddings
writer = SummaryWriter()


# Set parameters


In [None]:
context_size = 3
embed_size = 50
xmax = 2
alpha = 0.75
batch_size = 20
l_rate = 0.001
num_epochs = 10

# Open and read in text
Only the first 1,000,000 characters are read because of memory constraints.

In [None]:
# The context manager closes the file even if reading fails
with open('data/testdata_en.txt', 'r') as text_file:
    text = text_file.read()[:1000000].lower()

# Create vocabulary and word lists


In [None]:

word_list = word_tokenize(text)
vocab = np.unique(word_list)
w_list_size = len(word_list)
vocab_size = len(vocab)


In [None]:
vocab_size

## Create word to index mapping

In [None]:
w_to_i = {word: ind for ind, word in enumerate(vocab)}

# Construct co-occurrence matrix
A key difference between GloVe and Word2Vec lies in how they consume the corpus. Word2Vec streams over sentences and learns from local context windows, whereas GloVe first builds a global co-occurrence matrix and then fits vectors to it. The GloVe loss is therefore based on co-occurrence counts. Although the two approaches differ, their end results are often similar: they generate vectors of comparable quality, with GloVe doing better in some cases and Word2Vec in others.

In GloVe, we start by building the co-occurrence matrix, which we refer to as $ X $. Each element $ X_{ij} $ represents how many times token $ j $ appears in the context of token $ i $. With a symmetric context window, such a matrix is symmetric, that is, $ X_{ij} = X_{ji} $. The co-occurrence matrix is constructed by sliding a window of fixed size over the text. Unlike in the SkipGram technique, we don't give a constant weight to all words in the window. In GloVe, less weight is given to distant words. The weight is defined by the following formula:

$$ \text{decay} = \frac{1}{\text{offset}} $$

Offset is the distance of the context word from the target word. The weight is inversely proportional to the offset, so a context word two positions away contributes half as much as an adjacent one.
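
To make the decay concrete, here is a minimal sketch on a toy sentence. The token list, the window size of 2, and the name `cooc` are assumptions made only for this illustration; they are not part of the recipe's data or code.

```python
from collections import defaultdict

# Toy corpus and window size, chosen only for illustration
tokens = ["the", "cat", "sat", "on", "the", "mat"]
window = 2

# Sparse co-occurrence counts keyed by (target word, context word)
cooc = defaultdict(float)
for i, word in enumerate(tokens):
    for offset in range(1, window + 1):
        # Look `offset` positions to the left and to the right of the target
        for k in (i - offset, i + offset):
            if 0 <= k < len(tokens):
                cooc[(word, tokens[k])] += 1.0 / offset

print(cooc[("cat", "sat")])  # adjacent words, offset 1 -> 1.0
print(cooc[("cat", "on")])   # two positions apart, offset 2 -> 0.5
```

The dense NumPy matrix used in the next cell stores the same quantities, indexed by vocabulary position instead of by word pair.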

In [None]:
cooccurrence_matrix = np.zeros((vocab_size, vocab_size))
for i in range(w_list_size):
    ind = w_to_i[word_list[i]]
    for j in range(1, context_size + 1):
        # Left context word at distance j (>= 0 so the first token is included)
        if i - j >= 0:
            lind = w_to_i[word_list[i - j]]
            cooccurrence_matrix[ind, lind] += 1.0 / j
        # Right context word at distance j
        if i + j < w_list_size:
            rind = w_to_i[word_list[i + j]]
            cooccurrence_matrix[ind, rind] += 1.0 / j

In [None]:
# Non-zero co-occurrences
nonzero_occurrence_matrix = np.transpose(np.nonzero(cooccurrence_matrix))

# GloVe Model
Observe how the critical elements of GloVe, namely the loss function, the weighting function, and the parameter updates, are defined.
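
For reference, the objective minimized by GloVe can be written as follows. The mapping to the variable names used in the cells below is our reading of the code, not notation from the recipe itself: $ w_i $ and $ \tilde{w}_j $ correspond to `left_weights` and `right_weights`, $ b_i $ and $ \tilde{b}_j $ to `left_biases` and `right_biases`, and $ f $ to `weight_function`.

$$ J = \sum_{i,j:\, X_{ij} > 0} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2 $$

$$ f(x) = \begin{cases} (x / x_{max})^{\alpha} & \text{if } x < x_{max} \\ 1 & \text{otherwise} \end{cases} $$

Only the non-zero entries of $ X $ enter the sum, since $ \log X_{ij} $ is undefined for zero counts.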

In [None]:
# Weight function
def weight_function(x):
	if x < xmax:
		return (x/xmax)**alpha
	return 1

When plotted, the function looks as shown below. The weight grows as $ (x/x_{max})^{\alpha} $ until $ x $ reaches $ x_{max} $. Beyond that point it is capped at 1, so all frequent co-occurrences receive the same weight.
![](figures/Weighting_Function.png)
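
The shape of the curve can also be checked numerically. The sketch below re-implements the weighting function in a standalone form; the name `glove_weight` and its default arguments are assumptions for this illustration, mirroring the `xmax = 2` and `alpha = 0.75` set earlier.

```python
def glove_weight(x, x_max=2.0, alpha=0.75):
    """Return (x / x_max) ** alpha below x_max and 1.0 from x_max upward."""
    if x < x_max:
        return (x / x_max) ** alpha
    return 1.0

# Weights for a few co-occurrence values on either side of x_max
for count in (0.5, 1.0, 2.0, 10.0):
    print(count, round(glove_weight(count), 3))
# 0.5 0.354
# 1.0 0.595
# 2.0 1.0
# 10.0 1.0
```

Rare pairs are down-weighted, while every pair at or above `x_max` contributes with the same weight of 1.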


In [None]:
# Set up word vectors and biases
left_weights, right_weights = [
	[Variable(torch.from_numpy(np.random.normal(0, 0.01, (embed_size, 1))),
		requires_grad = True) for j in range(vocab_size)] for i in range(2)]
left_biases, right_biases = [
	[Variable(torch.from_numpy(np.random.normal(0, 0.01, 1)), 
		requires_grad = True) for j in range(vocab_size)] for i in range(2)]

# Set up optimizer
optimizer = optim.Adam(left_weights + right_weights + left_biases + right_biases, lr = l_rate)

In [None]:
# Batch sampling function
def gen_batch():	
	sample = np.random.choice(np.arange(len(nonzero_occurrence_matrix)), size=batch_size, replace=False)
	l_vecs, r_vecs, covals, l_v_bias, r_v_bias = [], [], [], [], []
	for chosen in sample:
		ind = tuple(nonzero_occurrence_matrix[chosen])
		l_vecs.append(left_weights[ind[0]])
		r_vecs.append(right_weights[ind[1]])
		covals.append(cooccurrence_matrix[ind])
		l_v_bias.append(left_biases[ind[0]])
		r_v_bias.append(right_biases[ind[1]])
	return l_vecs, r_vecs, covals, l_v_bias, r_v_bias



# Train model


In [None]:
for epoch in range(num_epochs):
    # One epoch is defined here as w_list_size // batch_size randomly sampled batches
    num_batches = w_list_size // batch_size
    avg_loss = 0.0
    for batch in tqdm(range(num_batches)):
        optimizer.zero_grad()
        l_vecs, r_vecs, covals, l_v_bias, r_v_bias = gen_batch()
        # Weighted squared error between the model score and the log co-occurrence
        loss = sum([torch.mul((torch.dot(l_vecs[i].view(-1), r_vecs[i].view(-1)) +
                               l_v_bias[i] + r_v_bias[i] - np.log(covals[i]))**2,
                              weight_function(covals[i])) for i in range(batch_size)])
        # .item() returns a plain Python float, so avg_loss stays a number
        avg_loss += loss.item() / num_batches
        loss.backward()
        optimizer.step()
    print("Average loss for epoch " + str(epoch + 1) + ": ", avg_loss)
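
The term summed inside the training loop can be worked through by hand for a single pair. This is a minimal sketch in plain Python; the function name `pair_loss` and the 3-dimensional vectors are made up for illustration and are not taken from the trained model.

```python
import math

def pair_loss(w_left, w_right, b_left, b_right, count, x_max=2.0, alpha=0.75):
    """Weighted squared error for one (word, context) pair with co-occurrence `count`."""
    # Model score: dot product of the two vectors plus both biases
    dot = sum(a * b for a, b in zip(w_left, w_right))
    # Same capped weighting as weight_function above
    weight = (count / x_max) ** alpha if count < x_max else 1.0
    return weight * (dot + b_left + b_right - math.log(count)) ** 2

# Hypothetical vectors: dot product is 0.1, log(1.0) is 0, weight is 0.5 ** 0.75
loss = pair_loss([0.1, 0.2, 0.3], [0.3, 0.2, 0.1], 0.0, 0.0, count=1.0)
print(loss)  # roughly 0.0059
```

The training cell computes the same quantity with PyTorch tensors so that gradients can flow back to the vectors and biases.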

# Writing to TensorBoard

In [None]:
word_array = []
embed_array = []
# Sample up to 1000 distinct words for visualization
word_inds = np.random.choice(np.arange(len(vocab)), size=min(1000, len(vocab)), replace=False)
for word_ind in word_inds:
    # Final embedding: sum of the left and right vectors of the word
    w_embed = (left_weights[word_ind].data + right_weights[word_ind].data).numpy()
    word_array.append(vocab[word_ind])
    embed_array.append(torch.transpose(torch.Tensor(w_embed),0, 1).numpy())
writer.add_embedding(np.asarray(embed_array).reshape(-1, embed_size), metadata=word_array)
writer.export_scalars_to_json("./all_scalars.json")
writer.close()

When plotted in TensorBoard, the vectors look as shown below.
![](figures/glove_tensorbord.png)

This is just a basic implementation. There are many optimized implementations available for training GloVe with constrained memory. Please refer to the references given in the recipe.