<h2> Best Open-Source Embedding Models (Local Use) </h2>

| **Model**                | **Size**          | **Use Case**                          | **Library**               |
| ------------------------ | ----------------- | ------------------------------------- | ------------------------- |
| `all-MiniLM-L6-v2`       | \~80MB            | General-purpose sentence embeddings   | SentenceTransformers      |
| `E5-base` / `E5-small`   | \~200MB / \~100MB | Search, semantic similarity           | Hugging Face Transformers |
| `Instructor XL`          | \~1.5GB           | Task-specific embeddings with prompts | Hugging Face              |
| `BGE-M3` / `BGE-base-en` | \~400MB           | Versatile, multilingual embeddings    | Hugging Face              |
| `mpnet-base-v2`          | \~420MB           | Semantic search & clustering          | SentenceTransformers      |


In [None]:
pip install sentence-transformers

In [8]:
from sentence_transformers import SentenceTransformer

# Load a model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode sentences
sentences = ["This is a test.", "Embeddings are useful."]
embeddings = model.encode(sentences)

print(embeddings.shape)


(2, 384)


In [11]:
from sentence_transformers import CrossEncoder

model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

query = "How does reinforcement learning work?"
candidates = ["Reinforcement learning is...", "Supervised learning is..."]

scores = model.predict([(query, doc) for doc in candidates])
top_docs = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
top_docs

['Reinforcement learning is...', 'Supervised learning is...']

In [14]:
import faiss #pip install faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Load model and encode documents
model = SentenceTransformer('all-MiniLM-L6-v2')
docs = ["Paris is the capital of France.", "Berlin is in Germany.", "Tokyo is in Japan.", "Tokyo is capital of Japan."]
doc_embeddings = model.encode(docs)

# Create FAISS index
dimension = doc_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)  # L2 = Euclidean distance (can use cosine too)

index.add(doc_embeddings)  # Store embeddings

# Encode query and search
query = "Capital of Japan"
query_embedding = model.encode([query])
top_k = 2
distances, indices = index.search(query_embedding, top_k)

# Show results
for i in indices[0]:
    print(docs[i])


Tokyo is capital of Japan.
Tokyo is in Japan.


In [27]:
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
qdrant = QdrantClient(":memory:")  # In-memory, or use host='localhost'

# Create collection
qdrant.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Add documents
texts = ["Python is great.", "JavaScript runs in the browser.", "C++ is powerful."]
embeds = model.encode(texts)

points = [
    PointStruct(id=i, vector=embeds[i], payload={"text": texts[i]})
    for i in range(len(texts))
]
qdrant.upsert(collection_name="docs", points=points)

# Query
query = "language used in frontend"
query_vec = model.encode(query)
results = qdrant.query_points(collection_name="docs", query=query_vec, limit=2)

for r in results:
    # print(r.payload["text"], "Score:", r.score)
    print(r[1][0].payload['text'],  "Score:", r[1][0].score)

JavaScript runs in the browser. Score: 0.35208036468145


Implementation Steps for Reranking
Using Cross-Encoder for Reranking

In [None]:
from sentence_transformers import SentenceTransformer
from transformers import CrossEncoder
import numpy as np

# Load cross-encoder model (this will perform query-document reranking)
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Example query and retrieved documents
query = "What is the capital of France?"
retrieved_docs = [
    "Paris is the capital of France.",
    "France is a country in Europe.",
    "Paris is known for its culture and landmarks.",
    "The Eiffel Tower is located in Paris."
]

# Create query-document pairs (query + each retrieved document)
pairs = [(query, doc) for doc in retrieved_docs]

# Use the cross-encoder to get relevance scores for each pair
scores = model.predict(pairs)

# Sort documents based on the relevance score in descending order
sorted_indices = np.argsort(scores)[::-1]  # Sort indices based on scores in descending order

# Get the top-N documents based on reranking
top_k_docs = [retrieved_docs[i] for i in sorted_indices[:2]]  # Top 2 most relevant documents

# Display reranked documents
print("Top-K Reranked Documents:")
for doc in top_k_docs:
    print(doc)
