## Finding Similar Words Using GloVe Embeddings ##
The purpose of this assignment is to find the most similar words using pre-trained GloVe embeddings. Whereas a CBOW model learns embeddings by predicting a target word from its context words, this project trains nothing itself: it loads pre-trained GloVe word vectors and compares them directly.
## Project Overview ##
This implementation:
* Uses pre-trained GloVe embeddings that capture semantic relationships between words.
* Applies a vector similarity metric (cosine similarity) to identify words with similar meanings.
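
Cosine similarity measures the angle between two vectors: cos(θ) = (u · v) / (‖u‖ ‖v‖). It ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). The following is a minimal NumPy sketch of that formula, not the code used in the notebook; the helper name `cosine_sim` and the choice to return 0.0 for a zero-length vector are assumptions made for illustration.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine of the angle between two 1-D vectors."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    # Assumption: the cosine is undefined for a zero-length vector,
    # so this sketch returns 0.0 in that case.
    if denom == 0.0:
        return 0.0
    return float(np.dot(u, v) / denom)

same_direction = cosine_sim([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_sim([1.0, 0.0], [0.0, 1.0])       # perpendicular vectors
opposite = cosine_sim([1.0, 2.0], [-1.0, -2.0])       # antiparallel vectors
print(same_direction, orthogonal, opposite)
```

Because the norms are divided out, only the direction of each vector matters and its length does not.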



In [1]:
# Import required libraries
import torch
import torchtext
from torchtext.vocab import GloVe as GloVeVectors
from scipy.spatial.distance import cosine
import numpy as np
import pandas as pd

In [2]:
# Load pre-trained GloVe embeddings (6B-token corpus, 50-dimensional)
glove = GloVeVectors(name="6B", dim=50)

In [12]:
# Look up the embedding vector for a word
def words_vectors(word):
    # Make sure the word is in the vocabulary
    if word in glove.stoi:
        # Get the index of the word in the vocabulary
        index = glove.stoi[word]
        return glove.vectors[index].numpy()
    else:
        print(word, "is not in the vocabulary")
        # Return a zero vector with the same shape and dtype as the embeddings
        return np.zeros_like(glove.vectors[0].numpy())

# Compute the cosine similarity between two words
def cosine_similarity(word1, word2):
    vec1 = words_vectors(word1)
    vec2 = words_vectors(word2)
    # Convert NumPy arrays to PyTorch tensors
    vec1_tensor = torch.tensor(vec1)
    vec2_tensor = torch.tensor(vec2)
    # Add a batch dimension: shape (dim,) -> (1, dim)
    vec1_tensor = vec1_tensor.unsqueeze(0)
    vec2_tensor = vec2_tensor.unsqueeze(0)
    # Use PyTorch's built-in cosine similarity (computed along dim=1 by default)
    cos_sim = torch.cosine_similarity(vec1_tensor, vec2_tensor)
    return cos_sim.item()
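
The function above scores one pair of words at a time. To find the words most similar to a query, one option is to score the query against every row of the embedding matrix at once. The following is a minimal sketch under stated assumptions: the helper `most_similar` is hypothetical, and the toy 3-dimensional vectors are made-up stand-ins for the GloVe vocabulary and vectors so that the example runs without downloading anything.

```python
import numpy as np

# Toy stand-ins for the vocabulary and embedding matrix (values are made up).
toy_words = ["dog", "cat", "whale", "before"]
toy_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.5, 0.5, 0.0],
    [0.0, 0.1, 0.9],
], dtype=np.float32)

def most_similar(query, words, vectors, top_k=2):
    """Return the top_k (word, cosine similarity) pairs for `query`."""
    if query not in words:
        raise KeyError(f"{query!r} is not in the vocabulary")
    query_idx = words.index(query)
    # Normalize rows to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    scores = unit @ unit[query_idx]
    # Sort from most to least similar and skip the query word itself.
    order = [i for i in np.argsort(-scores) if i != query_idx]
    return [(words[i], float(scores[i])) for i in order[:top_k]]

neighbors = most_similar("dog", toy_words, toy_vectors)
for word, score in neighbors:
    print(word, round(score, 3))
```

With the real embeddings, the same idea would presumably apply to `glove.vectors.numpy()`, using `glove.stoi` for the index lookup in place of the linear `words.index` search.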
    


In [13]:
# words for testing
words = ['dog', 'whale', 'before', 'however', 'fabricate']
# Print the cosine similarity for every ordered pair of distinct words
for i, word1 in enumerate(words):
    for j, word2 in enumerate(words):
        if i != j:
            similarity = cosine_similarity(word1, word2)
            print("Cosine similarity: ", word1, " and ", word2, "is: ", similarity)
        

Cosine similarity:  dog  and  whale is:  0.6506090760231018
Cosine similarity:  dog  and  before is:  0.419243723154068
Cosine similarity:  dog  and  however is:  0.36520859599113464
Cosine similarity:  dog  and  fabricate is:  -0.02227747067809105
Cosine similarity:  whale  and  dog is:  0.6506090760231018
Cosine similarity:  whale  and  before is:  0.20935004949569702
Cosine similarity:  whale  and  however is:  0.17823582887649536
Cosine similarity:  whale  and  fabricate is:  -0.09092274308204651
Cosine similarity:  before  and  dog is:  0.419243723154068
Cosine similarity:  before  and  whale is:  0.20935004949569702
Cosine similarity:  before  and  however is:  0.78252112865448
Cosine similarity:  before  and  fabricate is:  -0.1622106283903122
Cosine similarity:  however  and  dog is:  0.36520859599113464
Cosine similarity:  however  and  whale is:  0.17823582887649536
Cosine similarity:  however  and  before is:  0.78252112865448
Cosine similarity:  however  and  fabricate is: 

## The highest cosine similarity in this word set is between "before" and "however", at approximately 0.783 ##
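
Cosine similarity is symmetric, so each pair appears twice in the output above. The following is a minimal sketch of one way to check the conclusion using unique pairs only. It assumes the printed values are copied by hand, rounded to four decimals, into a hypothetical `printed` dictionary. The pair whose value is cut off in the output is left out, not guessed.

```python
from itertools import combinations

# Similarities copied from the printed output (rounded to 4 decimals).
# The "however"/"fabricate" value is not shown in the output, so it is omitted.
printed = {
    ("dog", "whale"): 0.6506,
    ("dog", "before"): 0.4192,
    ("dog", "however"): 0.3652,
    ("dog", "fabricate"): -0.0223,
    ("whale", "before"): 0.2094,
    ("whale", "however"): 0.1782,
    ("whale", "fabricate"): -0.0909,
    ("before", "however"): 0.7825,
    ("before", "fabricate"): -0.1622,
}

words = ["dog", "whale", "before", "however", "fabricate"]

# combinations() yields each unordered pair once, in list order.
unique_pairs = list(combinations(words, 2))

# Keep only the pairs for which a value is available.
scored = [(pair, printed[pair]) for pair in unique_pairs if pair in printed]

best_pair, best_score = max(scored, key=lambda item: item[1])
print(best_pair, best_score)
```

Iterating over `combinations(words, 2)` in the notebook loop would likewise halve the number of similarity computations.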