# Sentence Embeddings + Cosine Similarity
### Authored by Aida Mustafanova

In [2]:
import pandas as pd
df = pd.read_csv('evaluation_cases.csv')

In [3]:
# !pip install sentence-transformers


The 'all-MiniLM-L6-v2' encoder is widely used for semantic similarity tasks because it offers a good trade-off between accuracy and computational cost. It maps each input sentence to a 384-dimensional dense vector, so the semantic relatedness of two sentences can be quantified with the cosine similarity of their vectors.

In [4]:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')


Each sentence is encoded into an embedding vector using the model above. Cosine similarity is the cosine of the angle between the two vectors, so it ranges from -1 to 1:
values near 1 indicate closely related meanings, values near 0 indicate unrelated sentences, and negative values are possible when the vectors point in opposing directions.
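
To make the measure concrete, here is a minimal NumPy sketch of the cosine similarity formula, dot(u, v) / (|u| |v|), applied to small hand-made vectors rather than real sentence embeddings. The helper name `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two 1-D vectors (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors chosen to show the three characteristic cases.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])         # parallel: close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # perpendicular: 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])   # opposite: close to -1

print(same, orthogonal, opposite)
```

Because the norms divide out the vector lengths, only the direction of each embedding matters, not its magnitude.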

In [5]:
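def cosine_sim(a, b):
    """Return the cosine similarity between the embeddings of sentences a and b."""
    emb1 = model.encode(a, convert_to_tensor=True)
    emb2 = model.encode(b, convert_to_tensor=True)
    # util.cos_sim returns a 1x1 tensor for a single pair; extract the scalar.
    return util.cos_sim(emb1, emb2).item()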
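
`cosine_sim` encodes one sentence pair per call, which is fine for a handful of rows. For larger frames it is usually faster to encode each column in one batch. The sketch below is one possible way to do that; the function name `batch_cosine_sim` is hypothetical, and it assumes the encoder exposes a SentenceTransformer-style `encode(sentences, normalize_embeddings=True)` that returns an `(n, dim)` array.

```python
import numpy as np

def batch_cosine_sim(encoder, sents_a, sents_b):
    """Row-wise cosine similarity for two equal-length lists of sentences.

    Assumption: `encoder.encode(list, normalize_embeddings=True)` returns
    unit-length embeddings with shape (n, dim), as SentenceTransformer does.
    """
    if len(sents_a) != len(sents_b):
        raise ValueError("both lists must have the same length")
    emb_a = np.asarray(encoder.encode(list(sents_a), normalize_embeddings=True))
    emb_b = np.asarray(encoder.encode(list(sents_b), normalize_embeddings=True))
    # For unit-length vectors the dot product equals the cosine similarity,
    # so a row-wise multiply-and-sum gives one score per sentence pair.
    return np.sum(emb_a * emb_b, axis=1)
```

With the notebook's objects this could be called as `batch_cosine_sim(model, df["sent1"].tolist(), df["sent2"].tolist())`, and the returned array assigned to a new column.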


In [6]:
df["cosine_similarity"] = df.apply(
    lambda row: cosine_sim(row["sent1"], row["sent2"]),
    axis=1
)

df


| Unnamed: 0 | sent1 | sent2 | 04_score | cosine_similarity |
|---|---|---|---|---|
| 0 | the cat sat on the mat | a feline rested atop a rug | 0.9 | 0.562432 |
| 1 | he ran quickly to the store | he ran quickly to the store | 1.0 | 1.0 |
| 2 | domestic unrest | political instability in the country | 0.75 | 0.645155 |
| 3 | turn left at the traffic light | photosynthesis occurs in plant cells | 0.0 | 0.00485 |
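
One way to read these results is to compare the `cosine_similarity` column against the `04_score` column. The sketch below assumes `04_score` is a reference similarity rating on a 0-to-1 scale, which the notebook does not state, and it re-enters the four rows by hand so that it runs without the CSV file or the model. The names `results`, `abs_diff`, and `rank_corr` are illustrative.

```python
import pandas as pd

# Values copied from the output table; the meaning of 04_score is an assumption.
results = pd.DataFrame({
    "04_score": [0.9, 1.0, 0.75, 0.0],
    "cosine_similarity": [0.562432, 1.0, 0.645155, 0.00485],
})

# Absolute gap between the reference score and the model's similarity, per row.
results["abs_diff"] = (results["04_score"] - results["cosine_similarity"]).abs()

# Spearman rank correlation, computed as the Pearson correlation of the ranks.
rank_corr = results["04_score"].rank().corr(results["cosine_similarity"].rank())

print(results)
print(f"rank correlation: {rank_corr:.2f}")
```

The largest gap is on the cat/feline paraphrase in row 0, where the reference score is 0.9 and the cosine similarity is about 0.56. With only four rows, the correlation is a rough indication at best.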
