### Reranking

Reranking is a two-stage search process that first retrieves a set of candidate documents and then uses a more sophisticated model to reorder them based on deeper semantic relevance. 

Once the candidate documents are retrieved from the vector database, we rerank them to improve relevance, reduce noise, and improve efficiency.

First-stage retrieval typically uses a bi-encoder, which embeds the query and each document separately. The intuition behind a bi-encoder's lower accuracy is that it must compress all of the possible meanings of a document into a single vector, so information is lost. Additionally, a bi-encoder has no context on the query, because the document embeddings are created before the user query is known.

A reranker, on the other hand, feeds the raw query and document text together through a large transformer, so less information is lost. Because the reranker runs at user query time, it can also analyze the document's meaning specific to that query, rather than trying to produce a generic, averaged meaning.

Rerankers avoid the information loss of bi-encoders, but they come with a different penalty: time.

Ref: https://medium.com/@sujathamudadla1213/bi-encoder-vs-cross-encoder-when-to-use-which-one-4a20edbe6d37

#### Why reranking is important

##### Improves accuracy: 
It goes beyond the limitations of a single embedding by evaluating the relevance of the query and document in tandem, leading to more accurate results. 
##### Handles semantic ambiguity: 
Reranking can better distinguish between different meanings of words or phrases in a query, leading to more precise results than initial retrieval alone might provide. For example, a search for "install Python" can be disambiguated from "Python snake habitats" by a reranker. 
##### Enhances RAG systems: 
By ensuring the best information is passed to a large language model, reranking significantly improves the quality and accuracy of the model's responses. 
##### Increases efficiency: 
It's a cost-effective way to improve results because it applies expensive, sophisticated models to a smaller subset of documents, rather than the entire dataset.
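
The two-stage idea can be sketched without loading any model. In the minimal example below, `cheap_score` and `expensive_score` are hypothetical stand-ins (simple word-overlap heuristics, not real encoders) for the first-stage retriever and the reranker. The only point it illustrates is that the expensive scorer runs on the `top_k` candidates instead of the whole corpus.

```python
from typing import Dict, List, Tuple

# Counts how often the "expensive" scorer is invoked (illustration only).
calls: Dict[str, int] = {"expensive": 0}

def tokens(text: str) -> List[str]:
    """Toy tokenizer: lowercase and strip basic punctuation."""
    return [w.strip(".,!?").lower() for w in text.split()]

def cheap_score(query: str, doc: str) -> float:
    """Stand-in for first-stage retrieval: number of shared words."""
    return float(len(set(tokens(query)) & set(tokens(doc))))

def expensive_score(query: str, doc: str) -> float:
    """Stand-in for a reranker: fraction of query words found in the document."""
    calls["expensive"] += 1
    query_words = tokens(query)
    doc_words = set(tokens(doc))
    return sum(w in doc_words for w in query_words) / len(query_words)

def retrieve_then_rerank(query: str, corpus: List[str],
                         top_k: int, top_n: int) -> List[Tuple[str, float]]:
    # Stage 1: score every document with the cheap scorer and keep top_k.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:top_k]
    # Stage 2: score only the candidates with the expensive scorer.
    rescored = [(doc, expensive_score(query, doc)) for doc in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]

corpus = [
    "How to install Python on Windows",
    "Python snake habitats in Asia",
    "Install Node.js with a package manager",
    "Cooking pasta at home",
]
results = retrieve_then_rerank("install Python", corpus, top_k=3, top_n=2)
for doc, score in results:
    print(f"{score:.2f}  {doc}")
print("expensive scorer calls:", calls["expensive"])
```

With four documents and `top_k=3`, the second-stage scorer is called three times rather than four; on a real corpus the gap between corpus size and `top_k` is what keeps reranking affordable.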

In [2]:
!pip3 install sentence-transformers


In [None]:
import os

HF_TOKEN = "YOUR_HF_TOKEN"  # placeholder: replace with your own Hugging Face token
os.environ["HF_TOKEN"] = HF_TOKEN

In [9]:
from sentence_transformers import SentenceTransformer

model_name = "sentence-transformers/paraphrase-xlm-r-multilingual-v1"
model = SentenceTransformer(model_name)
sentences = ["The weather is lovely today.", " It's so sunny outside!"]
document_embeddings = model.encode(sentences)
len(document_embeddings[0])

768

In [10]:
query = "What is the weather in Tokyo?"
query_embedding = model.encode(query)

# Compute dot product between query and document embeddings
# dot_product = np.dot(query_embedding, document_embeddings)



In [16]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Compute cosine similarity between query and document embeddings
# Reshape query_embedding to 2D array (1 sample, n features)
similarity_scores = cosine_similarity(query_embedding.reshape(1, -1), document_embeddings)
similarity_scores

array([[0.23531786, 0.23735909]], dtype=float32)
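
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below reproduces that formula in plain Python on made-up 3-dimensional vectors (the real embeddings above have 768 dimensions), so the vector values are illustrative only.

```python
import math
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative vectors, not real embeddings.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [
    [2.0, 0.0, 2.0],  # same direction as the query, twice as long
    [0.0, 3.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]
toy_scores = [cosine(query_vec, d) for d in doc_vecs]
print(toy_scores)
```

The first document scores (approximately) 1 even though it is twice as long as the query, because cosine similarity depends only on direction. A raw dot product, by contrast, also grows with vector length.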

In [None]:
most_similar_index = np.argmax(similarity_scores)
most_similar_document = sentences[most_similar_index]
print(most_similar_document)


 It's so sunny outside!


In [22]:
sorted_indices = np.argsort(similarity_scores[0])[::-1]
ranked_documents = [(sentences[i], similarity_scores[0][i]) for i in sorted_indices]
print(ranked_documents)


[(" It's so sunny outside!", 0.23735909), ('The weather is lovely today.', 0.23531786)]


In [23]:
print("Ranked Documents: ")
for rank, (doc, score) in enumerate(ranked_documents, start=1):
    print(f"{rank}. Document: {doc}, Score: {score:.4f}")

Ranked Documents: 
1. Document:  It's so sunny outside!, Score: 0.2374
2. Document: The weather is lovely today., Score: 0.2353


In [25]:
print("Top 2 Documents: ")
for rank, (doc, score) in enumerate(ranked_documents[:2], start=1):
    print(f"{rank}. Document: {doc}, Score: {score:.4f}")

Top 2 Documents: 
1. Document:  It's so sunny outside!, Score: 0.2374
2. Document: The weather is lovely today., Score: 0.2353


In [26]:
!pip3 install rank_bm25


In [27]:
from rank_bm25 import BM25Okapi

top_2_documents = [doc for doc, _ in ranked_documents[:2]]


In [29]:
tokenized_top_2_docs = [doc.split() for doc in top_2_documents]
tokenized_top_2_docs

[["It's", 'so', 'sunny', 'outside!'],
 ['The', 'weather', 'is', 'lovely', 'today.']]

In [32]:
query = "How is the weather today?"
tokenized_query = query.split()
tokenized_query

['How', 'is', 'the', 'weather', 'today?']

In [33]:
bm25 = BM25Okapi(tokenized_top_2_docs)

# Compute BM25 scores for the query
bm25_scores = bm25.get_scores(tokenized_query)

# Display the raw BM25 score for each document
bm25_scores

array([0., 0.])

In [34]:
sorted_indices = np.argsort(bm25_scores)[::-1]
ranked_documents = [(top_2_documents[i], bm25_scores[i]) for i in sorted_indices]
print(ranked_documents)

print("Ranked Documents: ")
for rank, (doc, score) in enumerate(ranked_documents, start=1):
    print(f"{rank}. Document: {doc}, Score: {score:.4f}")

[('The weather is lovely today.', 0.0), (" It's so sunny outside!", 0.0)]
Ranked Documents: 
1. Document: The weather is lovely today., Score: 0.0000
2. Document:  It's so sunny outside!, Score: 0.0000
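
Both BM25 scores are exactly zero, so the ranking above is only a tie broken by sort order. A likely explanation, assuming `BM25Okapi` uses the classic Okapi IDF term log((N - n + 0.5) / (n + 0.5)): with a corpus of N = 2 documents, any term that appears in exactly one document (n = 1) gets an IDF of log(1.5 / 1.5) = 0, so every matching term contributes nothing. The whitespace tokenizer also keeps case and punctuation, so `today?` never matches `today.`. The sketch below only evaluates that IDF formula; it is not the library's code.

```python
import math

def okapi_idf(corpus_size: int, doc_freq: int) -> float:
    """Classic Okapi BM25 IDF: log((N - n + 0.5) / (n + 0.5))."""
    return math.log((corpus_size - doc_freq + 0.5) / (doc_freq + 0.5))

# A term found in 1 of 2 documents carries no weight ...
idf_two_docs = okapi_idf(corpus_size=2, doc_freq=1)
# ... while the same term in 1 of 10 documents is informative.
idf_ten_docs = okapi_idf(corpus_size=10, doc_freq=1)

print(idf_two_docs, idf_ten_docs)
```

Under that assumption, BM25 becomes more informative when it is fitted on a larger corpus and the tokens are lowercased and stripped of punctuation before scoring.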


### Using Cross Encoder

A cross-encoder always takes a pair as input, for example a query and a document, and outputs a single score indicating how relevant the document is to the query. With the model used below, the scores are raw, unbounded values (higher means more relevant) rather than values between 0 and 1.

Ref: https://sbert.net/examples/cross_encoder/applications/README.html

In [36]:
from sentence_transformers import CrossEncoder

model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

query = "What is the weather in Tokyo?"
document = ["The weather is lovely today in tokyo and its sunny too.", "The weather is lovely today in mumbai and its sunny too.", "The weather is lovely today in japan."]
pairs = [[query, doc] for doc in document]
scores = model.predict(pairs)
scores

array([ 7.511463 , -4.9981213,  1.7217119], dtype=float32)

In [38]:
scored_docs = zip(scores, document)
reranked_docs = sorted(scored_docs, key=lambda x: x[0], reverse=True)
reranked_docs


[(7.511463, 'The weather is lovely today in tokyo and its sunny too.'),
 (1.7217119, 'The weather is lovely today in japan.'),
 (-4.9981213, 'The weather is lovely today in mumbai and its sunny too.')]
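
The scores above are unbounded, so only their order matters for reranking. If a value between 0 and 1 is needed, one common option is to pass each score through a sigmoid. This is a minimal sketch using the three scores printed above; the sigmoid step is an assumption about post-processing, not something this notebook applies.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    """Map an unbounded score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Raw cross-encoder scores copied from the output above.
raw_scores: List[float] = [7.511463, -4.9981213, 1.7217119]
probabilities = [sigmoid(s) for s in raw_scores]

for raw, prob in zip(raw_scores, probabilities):
    print(f"raw={raw:8.4f}  sigmoid={prob:.4f}")
```

Because the sigmoid is monotonic, the ranking of the documents is unchanged; only the scale of the scores differs.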

### Cohere API

In [39]:
!pip3 install cohere


In [None]:
import cohere

co = cohere.Client('YOUR_API_KEY')

response = co.rerank(
    model='rerank-english-v3.0',
    query="What is the capital of France?",
    documents=["Paris is the capital of France.", "Paris is the capital of France.", "Paris is the capital of France."],
    top_n=2,
    return_documents=True
)
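
The call above stores the reranked results in `response` but never reads them. The helper below is a hedged sketch of how they might be printed, assuming each entry in `response.results` exposes `index` (the position in the input `documents` list) and `relevance_score`. `format_rerank_results` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins so the snippet runs without an API key; their scores are made up.

```python
from types import SimpleNamespace
from typing import Any, List, Sequence

def format_rerank_results(results: Sequence[Any], documents: List[str]) -> List[str]:
    """Build one display line per result, looking up the text by its input index."""
    lines = []
    for rank, item in enumerate(results, start=1):
        text = documents[item.index]
        lines.append(f"{rank}. Document: {text}, Score: {item.relevance_score:.4f}")
    return lines

# Stand-in data (made-up scores), shaped like the assumed response entries.
documents = ["Paris is the capital of France.", "Berlin is the capital of Germany."]
fake_results = [
    SimpleNamespace(index=0, relevance_score=0.99),
    SimpleNamespace(index=1, relevance_score=0.05),
]

for line in format_rerank_results(fake_results, documents):
    print(line)
```

In the notebook this would be called as `format_rerank_results(response.results, documents)`, with `documents` being the same list passed to `co.rerank`, provided the response has the shape assumed here.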
