## **Assignment: Week 4**



## **1. Business Understanding**
The goal of this assignment is to understand the use of pretrained word embeddings and gain a deeper understanding of word representations. Specifically, the objective is to analyze word vectors using GloVe embeddings, compute a vector operation that combines semantic properties of words, and interpret the results.


## **2. Data Understanding**
### Dataset:
**GloVe (Global Vectors for Word Representation)**: `glove.6B.100d.txt`, a file of 100-dimensional pretrained word vectors trained on Wikipedia 2014 and Gigaword 5 (6 billion tokens, 400,000-word uncased vocabulary).

GloVe is an unsupervised method that learns word vectors from global word-word co-occurrence statistics of a large corpus. The resulting vectors place words with related meanings close together, so some semantic and syntactic relationships appear as roughly consistent vector offsets.
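
Each line of the file holds one token followed by its vector components, separated by spaces. The following is a minimal sketch of parsing one such line. The sample line and its values are invented for illustration and shortened to 4 dimensions; they are not copied from the real file.

```python
import numpy as np

# Hypothetical sample line in the GloVe text format: token, then space-separated floats.
# The numbers are invented and truncated to 4 dimensions for readability.
sample_line = "king 0.1 -0.2 0.3 0.4\n"

parts = sample_line.split()
word = parts[0]                                    # first field is the token
vector = np.asarray(parts[1:], dtype=np.float32)   # remaining fields are the components

print(word, vector.shape)
```

In the real file every line would yield a vector of shape `(100,)`.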


## **3. Data Preparation**

### Code to Load and Prepare the Data:

In [1]:
import numpy as np

# Load GloVe embeddings from a text file into a dict: word -> vector
def load_glove_model(file_path):
    embeddings = {}
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            # Each line has the form "<word> <v1> <v2> ... <v100>"
            values = line.split()
            word = values[0]
            vector = np.asarray(values[1:], dtype="float32")
            embeddings[word] = vector
    return embeddings

# Path to the GloVe file
glove_path = "glove.6B.100d.txt"
glove_embeddings = load_glove_model(glove_path)

# Check if target words are in the vocabulary
words_to_check = ["man", "woman", "king"]
for word in words_to_check:
    if word in glove_embeddings:
        print(f"Found: {word}")
    else:
        print(f"Not found: {word}")

Found: man
Found: woman
Found: king
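
Because the `glove.6B` vocabulary is uncased, a lookup for "King" would fail even though "king" is present. The sketch below shows one way to normalize case and report missing words clearly. `get_vector` is a hypothetical helper name, and the toy dictionary with invented values stands in for `glove_embeddings`.

```python
import numpy as np

def get_vector(embeddings, word):
    """Return the vector for `word`, lowercasing it first (hypothetical helper)."""
    key = word.lower()
    if key not in embeddings:
        raise KeyError(f"'{word}' is not in the embedding vocabulary")
    return embeddings[key]

# Toy stand-in for glove_embeddings; the values are invented.
toy_embeddings = {"king": np.array([0.5, 0.1], dtype=np.float32)}

print(get_vector(toy_embeddings, "King"))
```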


## **4. Modeling**
### Explanation:
The vector operation vec("woman") - vec("man") + vec("king") adds the offset between "woman" and "man" to "king". The nearest words to the resulting vector are then found using cosine distance (1 minus cosine similarity) as the metric, so smaller scores mean closer words.
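
To make the metric concrete: cosine distance is one minus the cosine of the angle between two vectors, so it depends only on direction and not on length. The following is a minimal NumPy sketch on invented 2-D vectors. `cosine_distance` is an illustrative helper written to follow the same definition as `scipy.spatial.distance.cosine`.

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cos(angle between u and v); illustrative helper."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])   # same direction as a, different length
c = np.array([0.0, 3.0])   # orthogonal to a

print(cosine_distance(a, b))  # 0.0: direction matches, length is ignored
print(cosine_distance(a, c))  # 1.0: orthogonal vectors
```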

In [2]:
from scipy.spatial.distance import cosine

# Compute the resulting vector
vec_woman = glove_embeddings["woman"]
vec_man = glove_embeddings["man"]
vec_king = glove_embeddings["king"]

result_vector = vec_woman - vec_man + vec_king

# Function to find the closest vectors using cosine distance (1 - cosine similarity).
# Note: the input words themselves are not excluded from the ranking.
def find_closest_vectors(embeddings, target_vector, top_n=5):
    distances = {}
    for word, vector in embeddings.items():
        distances[word] = cosine(target_vector, vector)
    # Smaller distance means more similar, so sort ascending
    closest = sorted(distances.items(), key=lambda x: x[1])[:top_n]
    return closest

# Find the nearest words
nearest_words = find_closest_vectors(glove_embeddings, result_vector)

print("Closest words to the resulting vector:")
for word, score in nearest_words:
    print(f"{word}: {score:.4f}")


Closest words to the resulting vector:
king: 0.1448
queen: 0.2166
monarch: 0.3066
throne: 0.3167
daughter: 0.3191
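
The top-ranked word in this output is `king`, one of the query words. A common convention in analogy tasks is to skip the query words when ranking. The sketch below does this with a vectorized search instead of a Python loop. `analogy` is a hypothetical helper, and the toy 3-D embeddings are invented so that the example runs without the GloVe file.

```python
import numpy as np

def analogy(embeddings, a, b, c, top_n=3):
    """Rank words by cosine distance to vec(b) - vec(a) + vec(c), skipping a, b, c."""
    words = list(embeddings)
    matrix = np.stack([embeddings[w] for w in words])
    target = embeddings[b] - embeddings[a] + embeddings[c]

    # Cosine distance from the target to every row of the matrix at once
    sims = matrix @ target / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(target))
    dists = 1.0 - sims

    ranked = [(words[i], float(dists[i])) for i in np.argsort(dists)]
    return [(w, d) for w, d in ranked if w not in {a, b, c}][:top_n]

# Invented toy vectors; the axes loosely stand for (royalty, male, female).
toy = {
    "man":   np.array([0.1, 1.0, 0.0]),
    "woman": np.array([0.1, 0.0, 1.0]),
    "king":  np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "apple": np.array([0.0, 0.1, 0.1]),
}

print(analogy(toy, "man", "woman", "king", top_n=1))
```

With the real data the call would be `analogy(glove_embeddings, "man", "woman", "king")`. Given the output above, `queen` would then rank first, since it is the closest word that is not a query word.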


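## **5. Evaluation**
### Analysis of Results:
**What is the closest word?**
The closest word to the computed vector is "king" itself (cosine distance 0.1448). This is likely because the search does not exclude the input words and the offset vec("woman") - vec("man") moves the result only a short way from vec("king"). The closest word that is not one of the inputs is "queen" (0.2166), which matches the expected concept of a female ruler. The remaining neighbors, "monarch", "throne", and "daughter", are also related to royalty or to female family roles.

### Interpretation of the Vector Operation:
The operation vec("woman") - vec("man") + vec("king") shifts vec("king") along the direction that separates "woman" from "man", while leaving the royalty-related components largely intact. That "queen" is the nearest non-input word suggests that GloVe encodes this gender relationship as a roughly consistent vector offset. A single analogy is only an illustration, so it does not by itself show how well the embeddings capture semantic or syntactic relationships in general.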