# 📘 Day 2 – LLMs as Semantic Instruments

This notebook explores how language models represent and measure sentence meaning. It contrasts an interpretive approach (asking Gemini to explain a difference in words) with a quantitative one (comparing Hugging Face sentence embeddings numerically).

> Core goal: Understand how sentence meaning is computed, visualized, and contrasted.

**Sections:**
1. Gemini Meaning Probes
2. Hugging Face Sentence Embeddings
3. Meaning Matrix Heatmap
4. Semantic Drift Demo
5. Annotator Disagreement Simulation (optional)
6. Recap & What’s Next

## 1️⃣ Gemini Meaning Probes

Ask Google Gemini, through its API, to describe in words how the meanings of two sentences differ.

➡️ Requires a free [Gemini API key](https://makersuite.google.com/app/apikey).

In [None]:
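import os
import google.generativeai as genai

# Prefer an environment variable over pasting the key into the notebook.
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY", "YOUR_API_KEY_HERE"))

# "gemini-pro" has been retired; use a current model name (genai.list_models() shows what your key can access).
# A separate variable name keeps the Hugging Face `model` defined later from overwriting this one.
gemini_model = genai.GenerativeModel("gemini-2.0-flash")

prompt = "Compare the meaning of: 'The minister supported the bill.' and 'The minister opposed the bill.'"
response = gemini_model.generate_content(prompt)
print(response.text)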
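
The probe above returns free text only. To line Gemini's judgment up against the cosine scores later in the notebook, one option is to ask for a number as well. The sketch below assumes a reply format of our own choosing (a final `SCORE: x` line). `build_probe_prompt` and `parse_score` are hypothetical helpers, not part of the Gemini SDK, and the model is not guaranteed to follow the format, so `parse_score` returns `None` when it does not. The example reply is made up so the block runs without an API call.

```python
import re
from typing import Optional


def build_probe_prompt(sentence_a: str, sentence_b: str) -> str:
    """Build a prompt asking for a short explanation plus a 0-1 similarity score.

    The 'SCORE: x' convention is an assumption of this sketch, not a Gemini feature.
    """
    return (
        "Compare the meaning of these two sentences.\n"
        f"A: {sentence_a}\n"
        f"B: {sentence_b}\n"
        "Explain the difference in one or two sentences, then finish with a line "
        "'SCORE: x', where x is a number from 0 (unrelated) to 1 (same meaning)."
    )


def parse_score(reply: str) -> Optional[float]:
    """Return the last 'SCORE: x' value in the reply, or None if missing or outside [0, 1]."""
    matches = re.findall(r"SCORE:\s*(\d+(?:\.\d+)?)", reply)
    if not matches:
        return None
    score = float(matches[-1])
    return score if 0.0 <= score <= 1.0 else None


# Offline check with a made-up reply (no API call is made here).
example_reply = "A endorses the bill and B rejects it.\nSCORE: 0.3"
print(parse_score(example_reply))
```

To use it live, pass `build_probe_prompt(...)` to the same `generate_content` call used in the cell above and hand `response.text` to `parse_score`.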

## 2️⃣ Hugging Face Sentence Embeddings

We now compute vector representations using `sentence-transformers`.


> 🔍 **Why `all-MiniLM-L6-v2`?**
>
> - It is **small and fast** (6 transformer layers), which suits live demos and classroom settings.
> - It was **fine-tuned for sentence similarity** with a contrastive objective on over a billion sentence pairs, so its vectors are designed to be compared directly.
> - It outputs **384-dimensional vectors**, a compact size that keeps encoding and comparison cheap.
> - It is distributed through the `sentence-transformers` library (documented at SBERT.net) and the Hugging Face Hub.
> - Its training pairs come from **varied sources** (for example, Reddit comments, question-answer sites, and scientific paper titles and abstracts), which helps it generalize beyond a single domain.
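
How does the model turn a sentence into a single 384-dimensional vector? According to its Hugging Face model card, all-MiniLM-L6-v2 averages the transformer's per-token vectors while ignoring padding (mean pooling) and then scales the result to unit length. The NumPy sketch below reproduces only that pooling step, using random stand-in token vectors rather than real model outputs; `mean_pool_and_normalize` is a hypothetical name.

```python
import numpy as np


def mean_pool_and_normalize(token_embeddings: np.ndarray,
                            attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, skipping padding, then scale to unit length.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # real tokens per sentence
    pooled = summed / counts
    norms = np.clip(np.linalg.norm(pooled, axis=1, keepdims=True), 1e-12, None)
    return pooled / norms


# Random stand-ins for token vectors: 2 sentences, 5 positions, 384 dimensions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))
mask = np.array([[1, 1, 1, 1, 1],
                 [1, 1, 1, 0, 0]])  # the second sentence has two padding positions
sentence_vecs = mean_pool_and_normalize(tokens, mask)
print(sentence_vecs.shape)
```

For unit-length vectors like these, cosine similarity reduces to a plain dot product.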


In [None]:
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = [
    "The minister supported the bill.",
    "The minister opposed the bill.",
    "The bill was popular among voters.",
    "Many citizens disagreed with the proposal."
]

# encode() returns a NumPy array with one 384-dimensional row per sentence.
embeddings = model.encode(sentences)
print("Embedding shape:", embeddings.shape)  # (4, 384)

## 3️⃣ Meaning Matrix Heatmap

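We compute the cosine similarity between every pair of sentence embeddings and plot the resulting matrix as a heatmap. Sentences that share a topic and most of their words can score high even when their stances are opposite, so read the cell for the first two sentences (supported vs. opposed) with that in mind.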

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
import seaborn as sns
import matplotlib.pyplot as plt

sim_matrix = cosine_similarity(embeddings)

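plt.figure(figsize=(8, 6))
sns.heatmap(sim_matrix, xticklabels=sentences, yticklabels=sentences, annot=True, fmt=".2f", cmap="coolwarm")
plt.xticks(rotation=45, ha="right")  # long sentence labels overlap without rotation
plt.title("Cosine Similarity Between Sentences")
plt.tight_layout()
plt.show()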
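
`cosine_similarity` divides the dot product of two vectors by the product of their lengths, cos(u, v) = u·v / (‖u‖ ‖v‖), so scores range from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). A minimal NumPy version, written only to make the formula concrete; `cosine_matrix` is a hypothetical helper, checked here on three hand-picked 2-D vectors rather than on the notebook's embeddings.

```python
import numpy as np


def cosine_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)  # scale each row to length 1
    return unit @ unit.T                    # dot products of unit vectors are cosines


X = np.array([[1.0, 0.0],    # points right
              [1.0, 1.0],    # 45 degrees from the first row
              [-1.0, 0.0]])  # points left, opposite to the first row
S = cosine_matrix(X)
print(np.round(S, 3))
```

Applied to `embeddings`, it should match `sim_matrix` up to floating-point rounding.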

## 4️⃣ Semantic Drift Demo

Here the wording and framing of similar events change (a peaceful protest vs. a violent riot), and the heatmap shows how far those choices move the sentence vectors apart.

In [None]:
drift_sentences = [
    "The protest was peaceful.",
    "The riot turned violent.",
    "The demonstration gathered thousands.",
    "The violent clash disrupted the city."
]

drift_embeddings = model.encode(drift_sentences)
drift_sim = cosine_similarity(drift_embeddings)

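plt.figure(figsize=(8, 6))
sns.heatmap(drift_sim, xticklabels=drift_sentences, yticklabels=drift_sentences, annot=True, fmt=".2f", cmap="YlGnBu")
plt.xticks(rotation=45, ha="right")  # long sentence labels overlap without rotation
plt.title("Semantic Drift – Framing Differences")
plt.tight_layout()
plt.show()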
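
The heatmap compares four different sentences, so it mixes changes in wording, event, and framing. To isolate the effect of small edits, one option is to hold a base sentence fixed and score edited variants against it. `drift_from_base` below is a hypothetical helper that accepts any encoder function; `toy_encode` is a word-count stand-in (not a semantic model) used only so the block runs without downloading anything.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np


def drift_from_base(encode: Callable[[List[str]], np.ndarray],
                    base: str, variants: Sequence[str]) -> List[Tuple[str, float]]:
    """Cosine similarity between a base sentence and each edited variant of it."""
    vecs = np.asarray(encode([base, *variants]), dtype=float)
    vecs = vecs / np.clip(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12, None)
    sims = vecs[1:] @ vecs[0]  # one score per variant
    return list(zip(variants, sims.tolist()))


# Stand-in encoder for an offline check: counts a few fixed words (NOT a semantic model).
VOCAB = ["the", "protest", "riot", "was", "peaceful", "violent"]


def toy_encode(sentences: List[str]) -> np.ndarray:
    return np.array([[s.lower().rstrip(".").split().count(w) for w in VOCAB]
                     for s in sentences], dtype=float)


toy_drift = drift_from_base(toy_encode, "The protest was peaceful.",
                            ["The protest was peaceful.", "The protest was violent."])
for sentence, score in toy_drift:
    print(f"{score:.2f}  {sentence}")
```

With the real model, `drift_from_base(model.encode, "The protest was peaceful.", ["The protest was calm.", "The protest was violent."])` would show how far a one-word edit moves the vector.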

## 5️⃣ Annotator Disagreement Simulation *(Optional)*

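Simulate two annotators who each describe the same four items in their own words (item *i* of `coder_A` pairs with item *i* of `coder_B`), then check whether their wordings land close together in embedding space.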

In [None]:
coder_A = [
    "The project was successful.",
    "The project had issues.",
    "The plan worked well.",
    "The initiative was flawed."
]

coder_B = [
    "The plan was a success.",
    "There were serious flaws.",
    "The outcome was positive.",
    "The result was problematic."
]

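from sklearn.decomposition import PCA

# Embed each annotator's wording with the same model.
emb_A = model.encode(coder_A)
emb_B = model.encode(coder_B)

# Fit PCA on both sets together so they share one 2-D coordinate system.
combined = np.vstack([emb_A, emb_B])
pca = PCA(n_components=2)
reduced = pca.fit_transform(combined)
print("Share of variance kept in 2-D:", round(float(pca.explained_variance_ratio_.sum()), 3))
red_A, red_B = reduced[:len(coder_A)], reduced[len(coder_A):]

plt.figure(figsize=(6, 5))
plt.scatter(red_A[:, 0], red_A[:, 1], color="tab:blue", label="Annotator A")
plt.scatter(red_B[:, 0], red_B[:, 1], color="tab:orange", label="Annotator B")
for i, (a, b) in enumerate(zip(red_A, red_B), start=1):
    # A dashed line joins the two annotators' wordings of the same item.
    plt.plot([a[0], b[0]], [a[1], b[1]], color="gray", linestyle="--", linewidth=0.8)
    plt.annotate(f"A{i}", a, textcoords="offset points", xytext=(4, 4))
    plt.annotate(f"B{i}", b, textcoords="offset points", xytext=(4, 4))
plt.title("Annotator Meaning Space (PCA of sentence embeddings)")
plt.legend()
plt.show()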
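
The PCA plot is a visual impression, and two dimensions can hide distances that exist in the full 384-dimensional space. A more direct number is the cosine similarity between each matched pair of wordings (item *i* from A vs. item *i* from B). The sketch below computes it with a hypothetical `paired_cosine` helper; the 0.5 cutoff in `flag_low_agreement` is an arbitrary illustration, not a validated threshold. It runs on toy 2-D vectors here.

```python
from typing import List

import numpy as np


def paired_cosine(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Cosine similarity between row i of emb_a and row i of emb_b, for every i."""
    if emb_a.shape != emb_b.shape:
        raise ValueError("Both annotators must describe the same number of items.")
    a = emb_a / np.clip(np.linalg.norm(emb_a, axis=1, keepdims=True), 1e-12, None)
    b = emb_b / np.clip(np.linalg.norm(emb_b, axis=1, keepdims=True), 1e-12, None)
    return np.sum(a * b, axis=1)  # row-wise dot products of unit vectors


def flag_low_agreement(scores: np.ndarray, threshold: float = 0.5) -> List[int]:
    """1-based item numbers whose paired similarity falls below the (arbitrary) threshold."""
    return [i + 1 for i, s in enumerate(scores) if s < threshold]


# Toy check: the annotators agree on item 1 and diverge completely on item 2.
toy_a = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_b = np.array([[2.0, 0.0], [1.0, 0.0]])
toy_scores = paired_cosine(toy_a, toy_b)
print(toy_scores, flag_low_agreement(toy_scores))
```

In the notebook, `paired_cosine(emb_A, emb_B)` gives one agreement score per item.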

## 🔁 Recap & What’s Next

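- Gemini gives a qualitative, free-text interpretation of meaning.
- Hugging Face `sentence-transformers` gives numerical vectors (384 dimensions for all-MiniLM-L6-v2).
- Cosine similarity quantifies how close two sentences are; PCA projects the vectors to 2-D for a quick, lossy visual check of their structure.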

➡️ In Session 2, we’ll use these vectors for classification, clustering, and retrieval.

---