<a href="https://colab.research.google.com/github/Neverlost0311/nlp-word-embeddings-lab/blob/main/05-word-analogies/lab5_word_analogies.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 5: Word Analogies with Embeddings

## Objective

In this lab, we will explore how **word embeddings capture relationships** between words using **vector arithmetic**.

We will demonstrate analogies like:

- king - man + woman ≈ queen
- Paris - France + Germany ≈ Berlin
- walking - walk + swim ≈ swimming

When the result of such arithmetic lands near the expected word, it suggests that embeddings encode **semantic relationships**, not just similarity.
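To make the arithmetic concrete before calling a real model, here is a minimal sketch using hand-made 2-D vectors. The `toy` vectors and the `cosine` helper are assumptions invented for illustration (axis 0 stands for "royalty", axis 1 for "gender"); they are not real embeddings.

```python
import numpy as np

# Hypothetical toy embeddings: axis 0 = "royalty", axis 1 = "gender".
toy = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# king - man + woman: removes the "male" offset and adds the "female" one.
target = toy["king"] - toy["man"] + toy["woman"]

# Rank every toy word by cosine similarity to the target vector.
scores = {word: cosine(target, vec) for word, vec in toy.items()}
best = max(scores, key=scores.get)

print("target vector:", target)
print("closest word:", best)
```

In this idealized setup the target coincides exactly with the `queen` vector. Real embeddings are high-dimensional and noisy, so the match is only approximate.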


In [1]:
# ================================
# Cell 1: Install & Import Libraries
# ================================

!pip install -q google-genai numpy scikit-learn

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from google import genai
import os

print("✅ Libraries installed and imported")


✅ Libraries installed and imported


In [2]:
# ================================
# Cell 2: Setup Gemini API Key
# ================================

from getpass import getpass

API_KEY = getpass("Enter your Gemini API Key: ")

os.environ["GEMINI_API_KEY"] = API_KEY
client = genai.Client(api_key=API_KEY)

print("✅ Gemini client created")


Enter your Gemini API Key: ··········
✅ Gemini client created


In [3]:
# ================================
# Cell 3: Embedding Helper
# ================================

MODEL_NAME = "models/text-embedding-004"

def get_embedding(text):
    result = client.models.embed_content(
        model=MODEL_NAME,
        contents=[text]
    )
    return np.array(result.embeddings[0].values)

print("✅ Embedding function ready")


✅ Embedding function ready
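Because the tests in this lab pass some words both as inputs and as candidates (for example "king", "man", and "woman" in Test 1), the same text is embedded more than once. One optional way to avoid repeated API calls is to wrap the helper in a small cache. The sketch below is an assumption, not part of the original lab: `make_cached_embedder` and `fake_embedding` are hypothetical names, and `fake_embedding` stands in for `get_embedding` so the example runs offline.

```python
import numpy as np

def make_cached_embedder(embed_fn):
    """Wrap an embedding function so each distinct text is embedded only once."""
    cache = {}

    def cached(text):
        if text not in cache:
            cache[text] = np.asarray(embed_fn(text), dtype=float)
        return cache[text]

    return cached

# Hypothetical stand-in for get_embedding; it records every call it receives.
calls = []

def fake_embedding(text):
    calls.append(text)
    return [float(len(text)), 1.0]

cached_embedding = make_cached_embedder(fake_embedding)
cached_embedding("king")
cached_embedding("king")   # served from the cache, no second call
cached_embedding("queen")

print(calls)
```

In the notebook, the same wrapper could be applied as `get_embedding = make_cached_embedder(get_embedding)` after Cell 3.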


In [5]:
# ================================
# Cell 4: Analogy Solver
# ================================

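def solve_analogy(a, b, c, candidates):
    """
    Solves: a - b + c ≈ ?

    Returns the candidate whose embedding has the highest cosine
    similarity to embedding(a) - embedding(b) + embedding(c).
    """
    print(f"\n🔍 Solving: {a} - {b} + {c} = ?")

    # Embed the three input words and build the target vector
    ea = get_embedding(a)
    eb = get_embedding(b)
    ec = get_embedding(c)

    target = ea - eb + ec

    best_word = None
    best_score = -np.inf  # lower than any possible cosine similarity

    # Score every candidate (input words included) against the target
    for word in candidates:
        ew = get_embedding(word)
        score = cosine_similarity([target], [ew])[0][0]

        print(f"Similarity with {word}: {score:.4f}")

        if score > best_score:
            best_score = score
            best_word = word

    print(f"\n✅ Best match: {best_word}")
    return best_word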
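
The solver above scores every candidate, including the input words themselves. A common convention in analogy evaluation is to skip the three input words, since `a - b + c` tends to stay close to them. The following is a minimal sketch of that variant, not the lab's own method: `solve_analogy_excluding_inputs` is a hypothetical name, the `embed` parameter is an assumed callable (in the notebook it would be `get_embedding`), and the `toy` vectors are invented so the example runs without an API key.

```python
import numpy as np

def solve_analogy_excluding_inputs(a, b, c, candidates, embed):
    """Return (best_word, scores) for a - b + c, ignoring a, b, and c as answers."""
    target = embed(a) - embed(b) + embed(c)
    inputs = {a.lower(), b.lower(), c.lower()}

    scores = {}
    for word in candidates:
        if word.lower() in inputs:
            continue  # an input word is not allowed to be the answer
        vec = embed(word)
        scores[word] = float(
            np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        )

    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores

# Hypothetical 2-D toy vectors (not real embeddings).
toy = {
    "king":     np.array([1.0,  1.0]),
    "queen":    np.array([1.0, -1.0]),
    "princess": np.array([0.5, -1.0]),
    "man":      np.array([0.0,  1.0]),
    "woman":    np.array([0.0, -1.0]),
}

best, scores = solve_analogy_excluding_inputs(
    "king", "man", "woman",
    ["queen", "princess", "man", "woman", "king"],
    embed=lambda word: toy[word],
)
print(best, scores)
```

Passing `embed` as a parameter keeps the function testable offline and lets the same logic run against any embedding source.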


In [6]:
# ================================
# Cell 5: Test 1: king - man + woman
# ================================

candidates = ["queen", "princess", "man", "woman", "king", "prince"]

solve_analogy("king", "man", "woman", candidates)



🔍 Solving: king - man + woman = ?
Similarity with queen: 0.6000
Similarity with princess: 0.5823
Similarity with man: 0.0621
Similarity with woman: 0.6003
Similarity with king: 0.7232
Similarity with prince: 0.4080

✅ Best match: king


'king'

In [7]:
# ================================
# Cell 6: Test 2: Paris - France + Germany
# ================================

candidates = ["Berlin", "Munich", "Paris", "Rome", "Madrid", "Germany"]

solve_analogy("Paris", "France", "Germany", candidates)



🔍 Solving: Paris - France + Germany = ?
Similarity with Berlin: 0.7613
Similarity with Munich: 0.6708
Similarity with Paris: 0.6336
Similarity with Rome: 0.4595
Similarity with Madrid: 0.5197
Similarity with Germany: 0.8125

✅ Best match: Germany


'Germany'

In [8]:
# ================================
# Cell 7: Test 3: walking - walk + swim
# ================================

candidates = ["swimming", "swim", "walk", "running", "jumping"]

solve_analogy("walking", "walk", "swim", candidates)



🔍 Solving: walking - walk + swim = ?
Similarity with swimming: 0.9474
Similarity with swim: 0.9527
Similarity with walk: 0.4535
Similarity with running: 0.5242
Similarity with jumping: 0.5140

✅ Best match: swim


'swim'
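
In all three tests the top-scoring candidate is one of the analogy's own input words. As a quick check, the scores printed above can be re-ranked after dropping those inputs. The sketch below only reuses the numbers from the recorded outputs and makes no API calls; `recorded` and `best_non_input` are hypothetical names introduced for illustration.

```python
# Scores copied from the printed outputs of Tests 1 to 3.
recorded = {
    ("king", "man", "woman"): {
        "queen": 0.6000, "princess": 0.5823, "man": 0.0621,
        "woman": 0.6003, "king": 0.7232, "prince": 0.4080,
    },
    ("Paris", "France", "Germany"): {
        "Berlin": 0.7613, "Munich": 0.6708, "Paris": 0.6336,
        "Rome": 0.4595, "Madrid": 0.5197, "Germany": 0.8125,
    },
    ("walking", "walk", "swim"): {
        "swimming": 0.9474, "swim": 0.9527, "walk": 0.4535,
        "running": 0.5242, "jumping": 0.5140,
    },
}

def best_non_input(inputs, scores):
    """Highest-scoring candidate that is not one of the analogy's input words."""
    filtered = {word: s for word, s in scores.items() if word not in inputs}
    return max(filtered, key=filtered.get)

results = {inputs: best_non_input(inputs, scores) for inputs, scores in recorded.items()}

for (a, b, c), best in results.items():
    print(f"{a} - {b} + {c}: best non-input candidate = {best}")
```

With the input words removed, the expected answers (queen, Berlin, swimming) come out on top for these recorded scores.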

## Conclusion

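We observed that:

- Embeddings support **vector arithmetic**: `a - b + c` produces a vector that can be compared against candidate words.
- In all three tests, the top-scoring candidate was one of the input words (king, Germany, swim), so the solver as written did not return the expected answer.
- Among the candidates that were not input words, the expected answer scored highest each time:
  - gender (king → queen): queen 0.6000 vs. princess 0.5823
  - geography (Paris → Berlin): Berlin 0.7613 vs. Munich 0.6708
  - verb form (walk → walking): swimming 0.9474 vs. running 0.5242

These results suggest that **modern embeddings capture semantic relationships**, not just similarity, but the effect only becomes visible once the input words are excluded from the candidates.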
