
Types of similarity metrics

1. Cosine Similarity:
Measures the cosine of the angle between two vectors.

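Values range from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). Normalizing the vectors does not change this range, because cosine similarity already ignores magnitude. In RAG, scores between text embeddings often fall between 0 and 1 in practice, but negative values are possible.

2. Dot Product: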
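Sums the element-wise products of two vectors, which equals the product of their magnitudes and the cosine of the angle between them.

Larger values mean greater similarity, but the score also grows with vector magnitude. For unit-length vectors it equals cosine similarity.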

3. Euclidean Distance:
Measures the straight-line distance between two points in space.

Lower values mean greater similarity; a distance of 0 means the two vectors are identical.
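
For reference, the three metrics can be written as below. The symbols are notation chosen here rather than taken from the notebook: $a$ and $b$ are two $n$-dimensional vectors and $\theta$ is the angle between them.

$$
\begin{aligned}
\text{cosine}(a, b) &= \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \cos\theta \\
a \cdot b &= \sum_{i=1}^{n} a_i b_i = \lVert a \rVert \, \lVert b \rVert \cos\theta \\
d(a, b) &= \lVert a - b \rVert = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
\end{aligned}
$$

The second line shows why the dot product mixes direction and magnitude, while cosine similarity divides the magnitudes out.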

In [1]:
import numpy as np

# Sample vectors (e.g., query and document embeddings)
query = np.array([1.0, 2.0, 3.0])
doc = np.array([2.0, 4.0, 6.0])

# 1. Cosine Similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# 2. Dot Product
def dot_product(a, b):
    return np.dot(a, b)

# 3. Euclidean Distance
def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

# Test
print("Cosine Similarity:", cosine_similarity(query, doc))   # -> 1.0
print("Dot Product:", dot_product(query, doc))               # -> 28.0
print("Euclidean Distance:", euclidean_distance(query, doc)) # -> 3.7416

Cosine Similarity: 1.0
Dot Product: 28.0
Euclidean Distance: 3.7416573867739413
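
The output above has cosine similarity at 1.0 while the Euclidean distance is not 0, because `doc` is exactly `2 * query`: same direction, different length. A minimal sketch of that effect follows, assuming a hypothetical helper `compare_metrics` and illustrative scale factors that are not part of the original notebook.

```python
import numpy as np

def compare_metrics(query, scale):
    """Compare the three metrics between `query` and a rescaled copy of it.

    `compare_metrics` is a hypothetical helper written for this illustration.
    """
    doc = scale * query  # same direction as query, different magnitude
    cosine = np.dot(query, doc) / (np.linalg.norm(query) * np.linalg.norm(doc))
    dot = np.dot(query, doc)
    distance = np.linalg.norm(query - doc)
    return cosine, dot, distance

query = np.array([1.0, 2.0, 3.0])

# Illustrative scale factors: only the length of the document vector changes.
for scale in (1.0, 2.0, 10.0):
    cosine, dot, distance = compare_metrics(query, scale)
    print(f"scale={scale}: cosine={cosine:.4f}, dot={dot:.1f}, euclidean={distance:.4f}")
```

Cosine similarity stays at 1 for every positive scale factor, whereas the dot product and the Euclidean distance both change with the length of the document vector.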


🔄 When to Use What?

| Metric | Use Case | Pros | Cons |
|---|---|---|---|
| Cosine Similarity | Standard in NLP & embedding search | Scale-invariant, interpretable | Slightly more expensive |
| Dot Product | Fast retrieval with normalized vectors | Very efficient, used in FAISS | Needs normalization |
| Euclidean Distance | Clustering, spatial proximity | Intuitive | Sensitive to vector magnitude |
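
The "needs normalization" caveat can be made concrete. Once every vector is scaled to unit length, the dot product equals cosine similarity, and the squared Euclidean distance equals `2 - 2 * cosine`, so all three metrics rank documents in the same order. The sketch below checks this with plain NumPy; the helper names `normalize` and `rank_documents` and the toy embeddings are assumptions made for illustration, not part of the original notebook or of any retrieval library.

```python
import numpy as np

def normalize(vectors):
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / norms

def rank_documents(query, docs):
    """Rank documents against a query using unit-normalized vectors.

    Returns the ranking by dot product, the ranking by Euclidean distance,
    and the raw scores behind each ranking.
    """
    q = normalize(query)
    d = normalize(docs)
    dot_scores = d @ q                          # equals cosine similarity for unit vectors
    distances = np.linalg.norm(d - q, axis=1)   # Euclidean distance to the query
    by_dot = np.argsort(-dot_scores)            # higher score is better
    by_distance = np.argsort(distances)         # lower distance is better
    return by_dot, by_distance, dot_scores, distances

# Hypothetical toy embeddings, chosen only to illustrate the ranking.
query = np.array([1.0, 2.0, 3.0])
docs = np.array([
    [2.0, 4.0, 6.0],     # same direction as the query
    [3.0, 2.0, 1.0],     # different direction
    [-1.0, -2.0, -3.0],  # opposite direction
    [1.0, 0.0, 0.0],     # mostly unrelated direction
])

by_dot, by_distance, dot_scores, distances = rank_documents(query, docs)
print("Ranking by dot product:       ", by_dot)
print("Ranking by Euclidean distance:", by_distance)
```

With these toy vectors both rankings agree, which is why a dot-product index over normalized embeddings can stand in for cosine similarity search.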
