
# Lab-05 — Embeddings (Sentence Similarity & Mini Search)

Welcome! In this lab you will learn how **embeddings** capture **meaning** in numbers. You will compare **sentence similarity** and build a mini semantic search engine.


**AI Demystified: Decoding Models, Compute, and Connectivity**

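**Idea.** An *embedding* turns text into a fixed-length vector of numbers. Texts with similar meanings get vectors that point in similar directions, so we can measure **semantic similarity** numerically.  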
In this lab we’ll:
- Convert sentences to embeddings with a small pretrained model (`all-MiniLM-L6-v2`)
- Compare meanings via **cosine similarity**
- Do a **mini semantic search** over a tiny corpus

*That’s all it takes!* (No API keys needed; runs on CPU. The first run downloads the model weights, so it needs internet access.)


## 1) Setup

In [None]:
!pip -q install -U sentence-transformers

In [None]:
from sentence_transformers import SentenceTransformer

In [None]:
import numpy as np

In [None]:
import torch

In [None]:
print(torch.__version__)

## 2) Load a small embedding model

In [None]:
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

In [None]:
print(model.get_sentence_embedding_dimension())
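
How does a whole sentence become one vector of `get_sentence_embedding_dimension()` numbers? The Hugging Face model card for all-MiniLM-L6-v2 describes averaging (mean-pooling) the transformer's per-token vectors while skipping padding tokens. Below is a minimal NumPy sketch of that pooling step on made-up 3-dimensional token vectors; `mean_pool` is a hypothetical helper written for illustration, not the library's internal code.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors of each sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) per-token vectors
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    returns:          (batch, dim), one fixed-length vector per sentence
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # padding positions contribute 0
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # real-token count, avoids divide-by-zero
    return summed / counts

# Made-up token vectors: 2 "sentences" padded to 4 tokens, dim = 3.
# Padding positions hold 9s to show that the mask really excludes them.
tokens = np.array([
    [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [9.0, 9.0, 9.0], [9.0, 9.0, 9.0]],  # 2 real tokens
    [[0.0, 2.0, 0.0], [0.0, 4.0, 0.0], [0.0, 6.0, 0.0], [9.0, 9.0, 9.0]],  # 3 real tokens
])
mask = np.array([[1, 1, 0, 0],
                 [1, 1, 1, 0]])

sentence_vectors = mean_pool(tokens, mask)
print(sentence_vectors.shape)  # (2, 3): same length for both sentences
print(sentence_vectors)        # rows are [2, 0, 0] and [0, 4, 0]
```

Whatever the sentence length, the output has one row of `dim` numbers per sentence, which is why every embedding from the real model has the same size.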

## 3) Encode a few sentences

In [None]:

sentences = [
    "buy cat food online",                  # s0
    "purchase pet supplies on the web",     # s1 (similar to s0)
    "apply for a car loan"                  # s2 (different topic)
]


In [None]:
emb = model.encode(sentences, convert_to_numpy=True, normalize_embeddings=True)

In [None]:
print(type(emb))

In [None]:
print(emb.shape)

In [None]:
print(emb[0].shape)
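
`emb` holds one row per sentence, and `normalize_embeddings=True` rescaled each row to length 1 (unit L2 norm). The NumPy sketch below reproduces that idea on small made-up vectors; `l2_normalize` is a hypothetical helper for illustration, not the sentence-transformers implementation.

```python
import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 length (rows of all zeros stay zero)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)  # (n, 1) row lengths
    return vectors / np.maximum(norms, 1e-12)               # guard against divide-by-zero

raw = np.array([[3.0, 4.0],
                [1.0, 1.0]])
unit = l2_normalize(raw)
print(unit[0])                       # [0.6 0.8]
print(np.linalg.norm(unit, axis=1))  # every row now has length 1
```

Normalizing keeps each vector's direction and discards its length, which is the only part cosine similarity cares about.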

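## 4) Cosine similarity (higher = more similar)

Because the embeddings were L2-normalized (`normalize_embeddings=True`), every vector has length 1, so the plain dot product `np.dot(a, b)` equals their cosine similarity. Scores range from -1 to 1; sentences with closer meanings score higher.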

In [None]:
sim_0_1 = float(np.dot(emb[0], emb[1]))

In [None]:
print(sim_0_1)

In [None]:
sim_0_2 = float(np.dot(emb[0], emb[2]))

In [None]:
print(sim_0_2)
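
The dot-product shortcut works only because the vectors are unit length. In general, cosine similarity is $\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}$. The sketch below implements that formula directly, checks that it matches the plain dot product once the vectors are normalized, and computes every pairwise score at once with a matrix product. The vectors are made-up 3-dimensional stand-ins for `emb`, not real model output, and `cosine_similarity` is a hypothetical helper name.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); works whether or not a and b are normalized."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up, unnormalized vectors standing in for three sentence embeddings.
vecs = np.array([[2.0, 1.0, 0.0],
                 [4.0, 2.5, 0.5],
                 [0.0, 0.5, 3.0]])

print(cosine_similarity(vecs[0], vecs[1]))  # about 0.99: nearly the same direction
print(cosine_similarity(vecs[0], vecs[2]))  # about 0.07: very different direction

# After L2-normalizing the rows, the plain dot product gives the same number.
unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
assert np.isclose(np.dot(unit[0], unit[1]), cosine_similarity(vecs[0], vecs[1]))

# All pairwise similarities at once: entry [i, j] compares row i with row j.
sim_matrix = unit @ unit.T
print(np.round(sim_matrix, 3))
```

With the lab's real embeddings, `emb @ emb.T` would likewise give a 3 x 3 matrix whose entries `[0, 1]` and `[0, 2]` match `sim_0_1` and `sim_0_2`.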

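## 5) Mini semantic search (top-k)

Encode a small corpus and a query with the same model, score every corpus entry against the query, and return the *k* entries with the highest scores.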

In [None]:

corpus = [
    "cat food 2kg bag",
    "kitten toys assorted pack",
    "dog leash and harness",
    "best mortgage interest rates",
    "how to refinance a car loan",
    "fast delivery pet supplies"
]


In [None]:
query = "order pet food online"

In [None]:
C = model.encode(corpus, convert_to_numpy=True, normalize_embeddings=True)

In [None]:
q = model.encode([query], convert_to_numpy=True, normalize_embeddings=True)[0]  # encode a one-item list, keep its single vector

In [None]:
print(C.shape)

In [None]:
print(q.shape)

In [None]:
scores = C @ q

In [None]:
print(scores.shape)

In [None]:
print(scores)
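
`scores = C @ q` is a matrix-vector product: each entry is the dot product of one corpus row with the query, which is one cosine similarity per document because both sides are normalized. The sketch below checks that equivalence on tiny made-up unit vectors (`C_toy`, `q_toy`, and `Q_toy` are illustrative names, not from the lab) and shows that stacking several queries produces a whole score matrix in a single product.

```python
import numpy as np

# Made-up unit vectors: 3 "documents" and 1 "query" in 2 dimensions.
C_toy = np.array([[1.0, 0.0],
                  [0.6, 0.8],
                  [0.0, 1.0]])
q_toy = np.array([0.8, 0.6])

# Matrix-vector product: one dot product per corpus row, shape (n_docs,).
scores_matmul = C_toy @ q_toy

# The same scores, computed one row at a time.
scores_loop = np.array([np.dot(row, q_toy) for row in C_toy])

print(scores_matmul)                            # approximately [0.8, 0.96, 0.6]
print(np.allclose(scores_matmul, scores_loop))  # True

# Several queries at once: rows of Q_toy are queries, result is (n_queries, n_docs).
Q_toy = np.array([[0.8, 0.6],
                  [0.0, 1.0]])
print((Q_toy @ C_toy.T).shape)                  # (2, 3)
```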

In [None]:

topk = 3
idx = np.argsort(-scores)[:topk]                        # indices by descending score, keep the best k
results = [(corpus[i], float(scores[i])) for i in idx]  # (text, score) pairs, best match first


In [None]:
print(results[0])

In [None]:
print(results[1])

In [None]:
print(results[2])
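
Sorting every score with `np.argsort`, as the top-k cell does, is perfectly fine for six documents. For a much larger corpus, a common alternative is `np.argpartition`, which selects the k best without fully sorting; only those k candidates are then ordered. A minimal sketch on made-up scores, with `top_k_indices` as a hypothetical helper name:

```python
import numpy as np

def top_k_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k highest scores, best first.

    np.argpartition places the k largest scores first in linear time
    (in no particular order); only those k candidates are then sorted.
    """
    k = min(k, scores.shape[0])
    if k <= 0:
        return np.array([], dtype=int)
    candidates = np.argpartition(-scores, k - 1)[:k]    # unordered top k
    return candidates[np.argsort(-scores[candidates])]  # sort just those k

scores_toy = np.array([0.12, 0.87, 0.45, 0.91, 0.30, 0.66])
print(top_k_indices(scores_toy, 3))  # [3 1 5]
print(np.argsort(-scores_toy)[:3])   # [3 1 5], same answer via a full sort
```

Both approaches return the same ranking; the partition-based one only pays for a full sort of the k winners rather than the whole corpus.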