<a href="https://colab.research.google.com/github/Neverlost0311/nlp-word-embeddings-lab/blob/main/03-contextual-embeddings/lab3_contextual_embeddings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 3: Contextual Embeddings

## Objective

In this lab, we explore **contextual embeddings**: how the **same word** can carry **different meanings in different sentences**, and how modern embedding models capture that difference.

We will:
- Generate sentence embeddings using Gemini
- Compare sentences with the same word in different contexts
- Measure similarity using cosine similarity
- Observe how meaning changes with context
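
As a minimal sketch of the metric used throughout this lab: cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction rather than magnitude. The helper name `cosine_sim` and the toy vectors below are made up for illustration; they are not real embeddings.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors (hypothetical, not real embeddings)
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([2.0, 0.0, 0.0])  # same direction as v1, different length
v3 = np.array([0.0, 1.0, 0.0])  # perpendicular to v1

print(cosine_sim(v1, v2))  # 1.0: same direction
print(cosine_sim(v1, v3))  # 0.0: perpendicular
```

The cells below use `sklearn.metrics.pairwise.cosine_similarity`, which computes the same quantity for rows of 2-D arrays.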


In [1]:
# ================================
# Cell 1: Install & Import Libraries
# ================================

!pip install -q google-genai numpy scikit-learn

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from google import genai
import os

print("✅ Libraries installed and imported")


✅ Libraries installed and imported


In [2]:
# ================================
# Cell 2: Setup Gemini API
# ================================

from getpass import getpass

API_KEY = getpass("Enter your Gemini API Key: ")
os.environ["GEMINI_API_KEY"] = API_KEY

client = genai.Client(api_key=API_KEY)

print("✅ Gemini client ready")


Enter your Gemini API Key: ··········
✅ Gemini client ready


In [3]:
# ================================
# Cell 3: Embedding Helper Function
# ================================

MODEL_NAME = "models/text-embedding-004"

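def get_embedding(text):
    """Return the Gemini embedding of `text` as a 1-D NumPy array."""
    result = client.models.embed_content(
        model=MODEL_NAME,
        contents=[text]  # one-item batch, so one embedding comes back
    )
    # The response holds one embedding per input; take the first (and only) one
    return np.array(result.embeddings[0].values)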

print("✅ Embedding function ready")


✅ Embedding function ready
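
The helper in Cell 3 sends one request per sentence. A possible variant, sketched here on the assumption that `embed_content` returns one embedding per item of `contents` in the same order, embeds several sentences in a single call. The name `get_embeddings` and the explicit `client` parameter are illustrative additions, not part of the original lab.

```python
import numpy as np

def get_embeddings(client, texts, model="models/text-embedding-004"):
    """Embed several texts in one request.

    Returns an array of shape (len(texts), embedding_dim).
    Assumes the response lists embeddings in input order.
    """
    result = client.models.embed_content(model=model, contents=list(texts))
    return np.array([e.values for e in result.embeddings])
```

With the client from Cell 2, `get_embeddings(client, [s1, s2])` would return both sentence vectors as rows of one array.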


In [4]:
# ================================
# Cell 4: Same word, different meaning (bank)
# ================================

s1 = "I went to the bank to deposit money."
s2 = "The river bank was full of grass."

e1 = get_embedding(s1)
e2 = get_embedding(s2)

sim = cosine_similarity([e1], [e2])[0][0]

print("Sentence 1:", s1)
print("Sentence 2:", s2)
print("Cosine similarity:", sim)


Sentence 1: I went to the bank to deposit money.
Sentence 2: The river bank was full of grass.
Cosine similarity: 0.49659033064605607


In [5]:
# ================================
# Cell 5: Same word, different meaning (bat)
# ================================

s1 = "He hit the ball with a bat."
s2 = "A bat is flying in the sky."

e1 = get_embedding(s1)
e2 = get_embedding(s2)

sim = cosine_similarity([e1], [e2])[0][0]

print("Sentence 1:", s1)
print("Sentence 2:", s2)
print("Cosine similarity:", sim)


Sentence 1: He hit the ball with a bat.
Sentence 2: A bat is flying in the sky.
Cosine similarity: 0.6623180786917049


In [6]:
# ================================
# Cell 6: Same meaning sentences
# ================================

s1 = "The cat is sleeping on the bed."
s2 = "A cat is lying on the bed."

e1 = get_embedding(s1)
e2 = get_embedding(s2)

sim = cosine_similarity([e1], [e2])[0][0]

print("Sentence 1:", s1)
print("Sentence 2:", s2)
print("Cosine similarity:", sim)


Sentence 1: The cat is sleeping on the bed.
Sentence 2: A cat is lying on the bed.
Cosine similarity: 0.9386673649059654
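
Cells 4 to 6 repeat the same embed, compare, and print steps. One way to factor that out is a small helper that takes the embedding function as an argument. The names `compare` and `toy_embed` are hypothetical; `toy_embed` is a letter-count stand-in so the sketch runs without an API key, and it is deliberately not contextual.

```python
import numpy as np

def compare(s1, s2, embed_fn):
    """Embed two sentences with embed_fn, print and return their cosine similarity."""
    e1 = np.asarray(embed_fn(s1), dtype=float)
    e2 = np.asarray(embed_fn(s2), dtype=float)
    sim = float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))
    print("Sentence 1:", s1)
    print("Sentence 2:", s2)
    print("Cosine similarity:", sim)
    return sim

def toy_embed(text):
    """Stand-in embedding: counts of the letters a-z. Not a real model."""
    counts = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    return counts

score = compare("The cat is sleeping on the bed.", "A cat is lying on the bed.", toy_embed)
```

In the notebook, `compare(s1, s2, get_embedding)` would use the Gemini-backed helper from Cell 3 instead of the stand-in.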


## Conclusion

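We observed that:

- Sentence pairs that share a word used in **different senses** scored **lower** (about 0.50 for *bank* and 0.66 for *bat*)
- The pair with the **same meaning** scored **higher** (about 0.94)
- Each embedding was computed for a whole sentence, so these scores suggest that **modern embeddings are contextual**, not just word-based; three pairs illustrate the effect rather than prove it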
