# Sentence Similarity with Transformers

This notebook demonstrates how to use the `sentence-transformers` library to calculate similarity between sentences using embeddings.

## 1. Setup and Model Loading

In [None]:
from sentence_transformers import SentenceTransformer
import numpy as np

# Load the pre-trained model
model = SentenceTransformer("all-MiniLM-L6-v2")

## 2. Define Sentences

The sentences fall into five groups: AI, Food, Sports, Tech, and a Random group of unrelated sentences.

In [None]:
sentences = [
    # AI
    "Artificial intelligence is transforming the world.",
    "Machine learning enables computers to learn from data.",
    "Deep learning uses neural networks.",

    # Food
    "Pizza is my favorite food.",
    "I love eating pasta.",
    "Burgers taste delicious.",

    # Sports
    "Cricket is very popular in Pakistan.",
    "Football is played worldwide.",
    "Lionel Messi is a famous footballer.",

    # Tech
    "Cloud computing is scalable.",
    "Azure provides AI services.",
    "Kubernetes manages containers.",

    # Random
    "The sky is blue.",
    "I enjoy reading books.",
    "Dogs are loyal animals."
]

## 3. Generate Embeddings

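The model converts each sentence into a fixed-length dense vector (embedding). For `all-MiniLM-L6-v2` each vector has 384 dimensions, so the 15 sentences above give an array of shape `(15, 384)`.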

In [None]:
embeddings = model.encode(sentences)
print(f"Embeddings shape: {embeddings.shape}")

## 4. Similarity Calculation

We define a function for cosine similarity and calculate the similarity matrix for all pairs of sentences.

In [None]:
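def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

n = len(embeddings)
similarity_matrix = np.zeros((n, n))

# Fill every (i, j) entry; the matrix is symmetric, with ~1.0 on the diagonal
for i in range(n):
    for j in range(n):
        similarity_matrix[i, j] = cosine_similarity(embeddings[i], embeddings[j])

print("Similarity matrix calculated.")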
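
The pairwise loop above can also be written as a single matrix product. The following is a minimal sketch, not part of the original notebook: `cosine_similarity_matrix` is a hypothetical helper name, and the 2-D `toy` vectors are made-up stand-ins for real embeddings so the result can be checked by hand.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Return the (n, n) matrix of pairwise cosine similarities for the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    # Scale each row to unit length; the dot product of two unit vectors is their cosine.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    return unit @ unit.T

# Toy 2-D vectors standing in for real embeddings (illustrative values only).
toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_similarity = cosine_similarity_matrix(toy)
print(np.round(toy_similarity, 3))
```

Passing `embeddings` instead of `toy` should give the same values as the loop, up to floating-point rounding. The sketch assumes no row is the zero vector, since that would divide by zero.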

## 5. Semantic Search Query

We can now score every sentence against a new query; the sentence with the highest score is the most relevant one.

In [None]:
query = "I enjoy playing football."
query_embedding = model.encode([query])[0]

scores = []
for emb in embeddings:
    scores.append(cosine_similarity(query_embedding, emb))

print(f"Query: {query}\n")
for i, score in enumerate(scores):
    print(f"{sentences[i]} → {score:.3f}")
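
The cell above prints the scores in the original sentence order. To turn that into a ranked search result, one option is to sort by score and keep the top few. The sketch below is an illustration under assumptions, not the notebook's own method: `top_k` is a hypothetical helper, and `toy_docs` and `toy_query` are made-up 2-D vectors used in place of real embeddings.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k rows of `doc_vecs` most similar to `query_vec`."""
    query_vec = np.asarray(query_vec, dtype=float)
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity of the query against every row at once.
    scores = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    # argsort is ascending, so reverse it and keep the first k indices.
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for sentence and query embeddings (illustrative values only).
toy_docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_query = np.array([1.0, 0.2])
top_matches = top_k(toy_query, toy_docs, k=2)
for idx, score in top_matches:
    print(f"doc {idx}: {score:.3f}")
```

With the notebook's variables, `top_k(query_embedding, embeddings)` would return indices into `sentences`, so each result can be printed as `sentences[idx]` alongside its score.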