# Semantic Projections Assignment 📘

This notebook will guide you through an exercise on **semantic projections** using pre-trained word embeddings.  
We will:
1. Download pre-trained embeddings (GloVe).
2. Define a semantic direction (e.g., gender).
3. Project words onto this direction.
4. Visualize the results in 2D using PCA.
5. Reflect on the meaning of these projections.

---

### Wherever a code comment says **TODO**, complete the code so that the cell runs.

## 1. Setup
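We start by loading the 100-dimensional GloVe embeddings (trained on Wikipedia + Gigaword) through `gensim`'s downloader API. The first call downloads and caches the model, so it needs an internet connection. The later cells also use `numpy`, `scikit-learn`, and `matplotlib`, so make sure they are installed.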

In [None]:
import gensim.downloader as api

# Load pretrained GloVe (100d)
embeddings_index = api.load("glove-wiki-gigaword-100")
print(f"Loaded {len(embeddings_index.key_to_index)} word vectors.")

## 2. Defining a Semantic Direction
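To explore projections, we need a **semantic direction**: a vector pointing from one pole of a concept to the other, here obtained as the difference between two word vectors.
For this example, let's define the **gender direction**:
$$ d_{\text{gender}} = \vec{\text{he}} - \vec{\text{she}} $$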

In [None]:
import numpy as np

# Define the gender direction
gender_direction = embeddings_index["he"] - embeddings_index["she"]
print("Gender direction vector created (shape):", gender_direction.shape)
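The raw difference vector has an arbitrary length, so comparisons are often easier with a unit-length direction and cosine similarity. The sketch below shows both. The helpers `unit` and `cosine` and the 4-d `toy` vectors are hypothetical stand-ins made up for illustration; they are not GloVe values. With the real model you would pass vectors from `embeddings_index` instead.

```python
import numpy as np

# Hypothetical 4-d toy vectors standing in for GloVe embeddings (made up for illustration).
toy = {
    "he": np.array([1.0, 0.2, 0.0, 0.5]),
    "she": np.array([-1.0, 0.2, 0.0, 0.5]),
    "doctor": np.array([0.3, 0.9, 0.1, 0.2]),
}

def unit(v):
    """Return v scaled to length 1 (raises on a zero vector)."""
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm

def cosine(a, b):
    """Cosine similarity: dot product of the two unit-length vectors."""
    return float(np.dot(unit(a), unit(b)))

d = unit(toy["he"] - toy["she"])   # unit-length gender direction
print(np.linalg.norm(d))            # length 1 (up to floating-point rounding)
print(cosine(toy["doctor"], d))     # how strongly "doctor" points along d
```

Normalizing does not change which words lean toward *he* or *she*; it only puts scores from different directions on a comparable scale.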

In [None]:
# Helper function to create a semantic direction between any two words
def semantic_direction(word1, word2, embeddings):
    return embeddings[word1] - embeddings[word2]

# Example: royalty direction
royalty_direction = semantic_direction("king", "queen", embeddings_index)
print("Royalty direction vector created (shape):", royalty_direction.shape)
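A direction built from a single word pair can be noisy. One simple variation, which is an assumption here and not the method this notebook uses, averages the unit-normalized differences of several pairs. The helper `averaged_direction` and the 3-d `toy` vectors below are hypothetical, made up for illustration.

```python
import numpy as np

def semantic_direction(word1, word2, embeddings):
    # Same idea as the helper above: difference of two word vectors.
    return embeddings[word1] - embeddings[word2]

def averaged_direction(pairs, embeddings):
    """Hypothetical helper: mean of unit-normalized pair differences, rescaled to unit length."""
    diffs = []
    for w1, w2 in pairs:
        diff = semantic_direction(w1, w2, embeddings)
        diffs.append(diff / np.linalg.norm(diff))  # each pair gets equal weight
    mean = np.mean(diffs, axis=0)
    return mean / np.linalg.norm(mean)

# Hypothetical toy vectors (not GloVe values).
toy = {
    "he": np.array([1.0, 0.0, 0.5]),
    "she": np.array([-1.0, 0.0, 0.5]),
    "man": np.array([1.0, 1.0, 0.0]),
    "woman": np.array([-1.0, 1.0, 0.0]),
    "king": np.array([1.0, 0.0, 1.0]),
    "queen": np.array([-1.0, 0.0, 1.2]),
}

pairs = [("he", "she"), ("man", "woman"), ("king", "queen")]
d = averaged_direction(pairs, toy)
print(d)  # dominated by the first axis, with a small contribution from the noisy king/queen pair
```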

## 3. Projecting Words
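The (vector) projection of a word embedding $w$ onto a semantic direction $d$ is given by:
$$ \text{proj}_{d}(w) = \frac{w \cdot d}{\|d\|^2} \, d $$
Its signed length, the scalar projection $\frac{w \cdot d}{\|d\|}$, gives a one-number score: positive values lean toward the first word of the pair (*he*), negative values toward the second (*she*).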

In [None]:
# Function to compute projection of a word onto a direction
def project_word(word, direction, embeddings):
    vec = embeddings[word]
    projection = (np.dot(vec, direction) / np.dot(direction, direction)) * direction
    return projection

# Test with some words.
# The score is the scalar projection w·d / ||d||, i.e. the signed length of proj_d(w).
words_to_test = ["man", "woman", "king", "queen", "doctor", "nurse"]
for w in words_to_test:
    score = np.dot(embeddings_index[w], gender_direction) / np.linalg.norm(gender_direction)
    print(f"{w}: projection score along gender axis = {score:.4f}")
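The projection splits a vector into a part along $d$ and a part orthogonal to it, and the scalar projection is the signed length of the first part. A raw dot product $w \cdot d$ grows with $\|d\|$, while the scalar projection does not. The small worked example below makes this concrete; the vectors `w` and `d` are hypothetical numbers chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

def project(vec, direction):
    # Vector projection: (vec·d / d·d) * d
    return (np.dot(vec, direction) / np.dot(direction, direction)) * direction

def scalar_projection(vec, direction):
    # Signed length of the projection: vec·d / ||d||
    return float(np.dot(vec, direction) / np.linalg.norm(direction))

w = np.array([3.0, 4.0, 1.0])   # hypothetical word vector
d = np.array([2.0, 0.0, 0.0])   # hypothetical direction, deliberately not unit length

parallel = project(w, d)        # component along d
orthogonal = w - parallel       # component perpendicular to d

print(parallel)                 # [3. 0. 0.]
print(scalar_projection(w, d))  # 3.0
print(np.dot(orthogonal, d))    # 0.0: the remainder is orthogonal to d
print(np.dot(w, d))             # 6.0: the raw dot product is scaled by ||d|| = 2
```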

## 4. Visualization in 2D (PCA)
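We reduce the original embeddings and their projections onto the direction to **2D** with PCA so they can be plotted together. PCA is fitted on both sets at once, so the words (blue dots) and their projections (red crosses) share one coordinate system, and a dashed line links each word to its projection. Because all projections lie on a single line through the origin in the original space, and the PCA transform is an affine map, the red crosses also fall on a straight line in the plot.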
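Before reading too much into the 2D picture, it can help to check how much of the total variance the two components keep. The sketch below computes this from the SVD of the mean-centered data; it should match what scikit-learn's `PCA` reports as `explained_variance_ratio_` for the same input. The helper name `explained_variance_ratio` and the four toy points are hypothetical.

```python
import numpy as np

def explained_variance_ratio(points, n_components=2):
    """Fraction of total variance captured by the top principal components."""
    X = np.asarray(points, dtype=float)
    X = X - X.mean(axis=0)                   # PCA centers the data first
    s = np.linalg.svd(X, compute_uv=False)   # singular values, largest first
    var = s ** 2                             # proportional to each component's variance
    return var[:n_components] / var.sum()

# Hypothetical toy points: spread of 2 along one axis, 1 along another, none along the third.
points = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, -2.0, 0.0]]
ratio = explained_variance_ratio(points)
print(ratio)        # [0.8 0.2]
print(ratio.sum())  # 1.0: two components capture everything here
```

For the 100-dimensional GloVe vectors the sum will typically be well below 1, which means the 2D plot discards part of the structure and distances in it should be read with care.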

In [None]:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

def visualize_projections(words, direction, embeddings, title):
    vectors = [embeddings[w] for w in words]
    projections = [project_word(w, direction, embeddings) for w in words]
    
    pca = PCA(n_components=2)
    all_points = np.vstack([vectors, projections])
    reduced = pca.fit_transform(all_points)
    
    n = len(words)
    orig_points = reduced[:n]
    proj_points = reduced[n:]
    
    plt.figure(figsize=(8,6))
    for i, word in enumerate(words):
        plt.scatter(orig_points[i,0], orig_points[i,1], color='blue')
        plt.text(orig_points[i,0]+0.02, orig_points[i,1], word, fontsize=10)
        plt.scatter(proj_points[i,0], proj_points[i,1], color='red', marker='x')
        plt.plot([orig_points[i,0], proj_points[i,0]], [orig_points[i,1], proj_points[i,1]], 'k--', alpha=0.5)
    
    plt.title(title)
    plt.xlabel("PCA 1")
    plt.ylabel("PCA 2")
    plt.show()

# Example visualization: gender axis
visualize_projections(words_to_test, gender_direction, embeddings_index, "Word Embeddings and Projections onto Gender Direction")
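Since every projection onto a single direction lies on one line, a single number per word (the scalar projection onto the unit direction) already carries the same information as the red crosses. A sorted list is often easier to read than the PCA plot. The helper `rank_along_direction` and the 2-d `toy` vectors are hypothetical and made up; they do not reflect GloVe scores.

```python
import numpy as np

def rank_along_direction(words, direction, embeddings):
    """Hypothetical helper: sort words by their scalar projection onto direction (largest first)."""
    unit_d = direction / np.linalg.norm(direction)
    scores = {w: float(np.dot(embeddings[w], unit_d)) for w in words}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy vectors (not GloVe values).
toy = {
    "he": np.array([1.0, 0.0]),
    "she": np.array([-1.0, 0.0]),
    "doctor": np.array([0.4, 1.0]),
    "nurse": np.array([-0.6, 1.0]),
}
d = toy["he"] - toy["she"]

ranking = rank_along_direction(["doctor", "nurse", "he", "she"], d, toy)
for word, score in ranking:
    print(f"{word:>8}: {score:+.2f}")
```

With the real model, calling `rank_along_direction(words_to_test, gender_direction, embeddings_index)` would rank the test words along the gender axis.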

## 5. Reflection
- What do you observe about the position of words like *king/queen* and *man/woman*?
- Do professions like *doctor* or *nurse* show bias when projected?
- How could you use this technique to **debias embeddings**?
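One way to use projections for debiasing is to subtract a word's component along the bias direction, so that its projection becomes zero. This corresponds to the "neutralize" step of hard debiasing; a full method would also handle definitional pairs (e.g., *he/she*) differently, which this sketch does not attempt. The helper `neutralize` and the vectors below are hypothetical.

```python
import numpy as np

def neutralize(vec, direction):
    """Remove the component of vec that lies along direction: vec - proj_d(vec)."""
    unit_d = direction / np.linalg.norm(direction)
    return vec - np.dot(vec, unit_d) * unit_d

d = np.array([1.0, -1.0, 0.0])       # hypothetical bias direction
nurse = np.array([-0.5, 0.3, 0.9])   # hypothetical word vector

nurse_debiased = neutralize(nurse, d)
print(nurse_debiased)                # about [-0.1, -0.1, 0.9]
print(np.dot(nurse_debiased, d))     # close to 0, up to floating-point rounding
```

Only words that should be neutral (such as professions) would be neutralized; applying it to *he* or *she* would erase meaningful information.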
---
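## ✅ Final Task
Extend this notebook by trying a different semantic direction, e.g.,
$$ d_{\text{royalty}} = \vec{\text{king}} - \vec{\text{queen}} $$
and analyze the projections. Keep in mind that *king* and *queen* differ mainly in gender (the classic analogy king - man + woman ≈ queen), so this direction may largely reproduce the gender axis; contrasting it with $\vec{\text{king}} - \vec{\text{man}}$ helps isolate royalty.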

In [None]:
# Example with royalty direction
royalty_words = ["king", "queen", "prince", "princess", "man", "woman"]
visualize_projections(royalty_words, royalty_direction, embeddings_index, "Word Embeddings and Projections onto Royalty Direction")
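To check how much a candidate direction overlaps with the gender axis, you can compare the directions themselves with cosine similarity. In the sketch below the `toy` vectors are hypothetical and built by construction (each word is a sum of made-up "base", "gender", and "royal" components), so the printed values only illustrate the method; with the real model, swap `toy` for `embeddings_index` to get actual GloVe numbers.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical components used to build toy word vectors.
gender = np.array([1.0, 0.0, 0.0])
royal = np.array([0.0, 1.0, 0.0])
base = np.array([0.0, 0.0, 1.0])

toy = {
    "he": 0.5 * base + gender,
    "she": 0.5 * base - gender,
    "man": base + gender,
    "woman": base - gender,
    "king": base + gender + royal,
    "queen": base - gender + royal,
}

gender_dir = toy["he"] - toy["she"]
candidates = {
    "king - queen": toy["king"] - toy["queen"],
    "king - man": toy["king"] - toy["man"],
}
for name, vec in candidates.items():
    print(f"cos({name}, he - she) = {cosine(vec, gender_dir):.2f}")
# By construction this prints 1.00 for king - queen and 0.00 for king - man.
```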