# 📌 Embeddings & Similarity Score Demo
**Objectives:**
- Understand what sentence embeddings are and how they work
- Use pre-trained HuggingFace `sentence-transformers`
- Compute cosine similarity between sentence pairs
- Visualize semantic similarity for interpretation

📍 *Note: We use **sentence embeddings** (not token-level embeddings) because they are trained to capture the overall meaning of a sentence in a single vector, which suits social science tasks like stance detection, interview analysis, and topic clustering.*
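
To make the sentence-versus-token distinction concrete, here is a minimal sketch of mean pooling, one common way to collapse per-token vectors into a single sentence vector. The token vectors and the `mean_pool` helper are made up for illustration; a real model computes these vectors internally and uses hundreds of dimensions.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sentence, ignoring padding positions."""
    mask = np.asarray(attention_mask, dtype=float)[:, None]  # shape (tokens, 1)
    summed = (np.asarray(token_embeddings, dtype=float) * mask).sum(axis=0)
    count = max(mask.sum(), 1.0)  # guard against an all-padding input
    return summed / count

# Toy input: 4 token positions, 3 dimensions; the last position is padding.
token_embeddings = np.array([
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [9.0, 9.0, 9.0],  # padding, masked out below
])
attention_mask = [1, 1, 1, 0]

sentence_vector = mean_pool(token_embeddings, attention_mask)
print(sentence_vector)  # one vector for the whole sentence
```

The output has one entry per dimension regardless of how many tokens the sentence contains, which is what makes sentences of different lengths directly comparable.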

In [None]:
# ✅ Setup
!pip install -q sentence-transformers scikit-learn pandas seaborn matplotlib

In [None]:
# ✅ Imports
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import pandas as pd

In [None]:
# ✅ Load pre-trained embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

In [None]:
# ✏️ Define your sentences here
sentences = [
    "The government passed a new education bill.",
    "Parliament approved a law about education.",
    "Cats are wonderful pets.",
    "Dogs make great companions."
]

In [None]:
# ✅ Generate embeddings
# Each sentence becomes one fixed-length vector (384 dimensions for this model)
embeddings = model.encode(sentences)

In [None]:
# ✅ Compute pairwise cosine similarity
similarity_matrix = cosine_similarity(embeddings)
pd.DataFrame(similarity_matrix, index=sentences, columns=sentences)
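
For intuition about what `cosine_similarity` computes: the score for two vectors is their dot product divided by the product of their lengths. Below is a minimal NumPy sketch on made-up 2-D vectors. `cosine_matrix` is a hypothetical helper written for this illustration, not part of scikit-learn, and it assumes no row is all zeros.

```python
import numpy as np

def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)  # length of each row
    unit = X / norms                                   # scale rows to length 1
    return unit @ unit.T                               # dot products of unit vectors

toy = np.array([
    [1.0, 0.0],
    [2.0, 0.0],   # same direction as the first row, different length
    [0.0, 3.0],   # perpendicular to the first row
    [-1.0, 0.0],  # opposite direction to the first row
])
toy_sim = cosine_matrix(toy)
print(np.round(toy_sim, 2))
```

Rows pointing the same way score 1 regardless of length, perpendicular rows score 0, and opposite rows score -1, so the measure compares direction rather than magnitude.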

In [None]:
# 📊 Visualize the similarity matrix
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
sns.heatmap(similarity_matrix, xticklabels=sentences, yticklabels=sentences, annot=True, cmap='coolwarm')
plt.title('Cosine Similarity Between Sentences')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

### 🧠 Interpretation:
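- Scores near 1.0 → the model places the sentences close in meaning
- Scores near 0.0 (or negative) → unrelated meaning
- You can use this for paraphrase detection and, with care, for comparing stance or sentiment; similarity largely reflects shared topic, so two sentences taking opposite positions on the same issue can still score high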
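
One way to read a similarity matrix systematically is to list every unique sentence pair from most to least similar. The sketch below uses a small hand-written matrix and placeholder labels (not real model output) so that it runs on its own; in this notebook you would pass `similarity_matrix` and `sentences` instead. `rank_pairs` is a hypothetical helper.

```python
import numpy as np

def rank_pairs(sim, labels):
    """Return (label_a, label_b, score) for each unique pair, highest score first."""
    sim = np.asarray(sim, dtype=float)
    rows, cols = np.triu_indices(len(labels), k=1)  # upper triangle, no diagonal
    order = np.argsort(-sim[rows, cols])            # descending by score
    return [
        (labels[rows[i]], labels[cols[i]], float(sim[rows[i], cols[i]]))
        for i in order
    ]

# Hand-written toy values, not model output.
toy_labels = ["A", "B", "C"]
toy_matrix = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])

ranked = rank_pairs(toy_matrix, toy_labels)
for a, b, score in ranked:
    print(f"{score:.2f}  {a} <-> {b}")
```

Skipping the diagonal matters here: every sentence has similarity 1.0 with itself, which would otherwise crowd out the informative pairs.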

In [None]:
# 🔁 Try your own!
# Add more sentences or tweak them to see how similarity changes
more_sentences = [
    "Healthcare reform was discussed in Parliament.",
    "I love eating fresh mangoes."
]
more_embeddings = model.encode(more_sentences)
more_similarity = cosine_similarity(more_embeddings, embeddings)
pd.DataFrame(more_similarity, index=more_sentences, columns=sentences)
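
To summarize a table like this, you can pick, for each new sentence, the original sentence with the highest score. This is a minimal sketch with hand-written numbers and placeholder labels (not real model output); in this notebook, `more_similarity`, `more_sentences`, and `sentences` would take their place. `best_matches` is a hypothetical helper.

```python
import numpy as np

def best_matches(sim, row_labels, col_labels):
    """For each row, return (row_label, best_col_label, score)."""
    sim = np.asarray(sim, dtype=float)
    best = sim.argmax(axis=1)  # column index of the top score in each row
    return [
        (row_labels[i], col_labels[j], float(sim[i, j]))
        for i, j in enumerate(best)
    ]

# Hand-written toy values, not model output.
toy_rows = ["new 1", "new 2"]
toy_cols = ["old 1", "old 2", "old 3"]
toy_scores = np.array([
    [0.55, 0.40, 0.05],
    [0.02, 0.10, 0.20],
])

matches = best_matches(toy_scores, toy_rows, toy_cols)
for new, old, score in matches:
    print(f"{new} -> {old} ({score:.2f})")
```

A top match is not necessarily a meaningful one: the second toy row's best score is still low, so it may be worth applying a minimum threshold before treating a pair as related.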