# Hybrid Search:

**Hybrid search is a combination of:**


1.   Semantic Search -> Dense Vector Search
2.   Syntactic Search -> Sparse Vector Search -> Exact Search -> Keyword Search

Hybrid search commonly merges the two result lists with Reciprocal Rank Fusion.

## Reciprocal Rank Fusion (RRF):
The RRF score of a document is the sum of the reciprocals of its ranks across the result lists being fused. Because the rank sits in the denominator, a document that appears lower in a list contributes less to its fused score.
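To make the scoring rule concrete, here is a minimal sketch of RRF in plain Python. It assumes the commonly used form `score(d) = sum(1 / (k + rank))` with the smoothing constant `k = 60`; the function name, the toy document ids, and the choice of `k` are illustrative assumptions, not something taken from the notebook below.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    rankings: list of lists, each ordered from best to worst.
    k: smoothing constant (assumed here; 60 is a common default).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from a dense and a sparse retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear near the top of both lists (`doc_a`, `doc_b`) end up ahead of documents that only one retriever returned.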

[HYBRID SEARCH](https://www.youtube.com/watch?v=CK0ExcCWDP4&t=2230s)

[RERANK](https://www.youtube.com/watch?v=Uh9bYiVrW_s&list=PLuqoMuyL16cXOMakHBMk_GMRob85yNOV5&index=119)

[HYBRID+RERANK](https://medium.com/@nadikapoudel16/advanced-rag-implementation-using-hybrid-search-reranking-with-zephyr-alpha-llm-4340b55fef22)

In [11]:
!pip install --upgrade --quiet  pypdf langchain langchain-community langchain-core langchain-experimental langchain-text-splitters langchain-openai pinecone-client pinecone-text pinecone-notebooks

In [23]:
import os
from google.colab import userdata
os.environ['OPENAI_API_KEY']= userdata.get('OPENAI_API_KEY')
pinecone = userdata.get('pinecone')

# Load the Document

In [9]:
from langchain_community.document_loaders import PyPDFLoader

file_path = '/content/yolov10.pdf'
loader = PyPDFLoader(file_path=file_path)
pages = loader.load()

In [10]:
pages[0]

Document(metadata={'source': '/content/yolov10.pdf', 'page': 0}, page_content='YOLOv10: Real-Time End-to-End Object Detection\nAo Wang Hui Chen∗Lihao Liu Kai Chen Zijia Lin\nJungong Han Guiguang Ding∗\nTsinghua University\n/uni00000015/uni00000011/uni00000018 /uni00000018/uni00000011/uni00000013 /uni0000001a/uni00000011/uni00000018 /uni00000014/uni00000013/uni00000011/uni00000013 /uni00000014/uni00000015/uni00000011/uni00000018 /uni00000014/uni00000018/uni00000011/uni00000013 /uni00000014/uni0000001a/uni00000011/uni00000018 /uni00000015/uni00000013/uni00000011/uni00000013\n/uni0000002f/uni00000044/uni00000057/uni00000048/uni00000051/uni00000046/uni0000005c/uni00000003/uni0000000b/uni00000050/uni00000056/uni0000000c/uni00000016/uni0000001a/uni00000011/uni00000018/uni00000017/uni00000013/uni00000011/uni00000013/uni00000017/uni00000015/uni00000011/uni00000018/uni00000017/uni00000018/uni00000011/uni00000013/uni00000017/uni0000001a/uni00000011/uni00000018/uni00000018/uni00000013/uni00000011

In [12]:
len(pages)

18

In [14]:
type(pages)

list

# Split the Document

In [45]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    # Large chunks with generous overlap, so most pages stay in a single chunk.
    chunk_size=8000,
    chunk_overlap=3000,
    length_function=len,
    is_separator_regex=False,
)

In [46]:
docs = text_splitter.split_documents(pages)

In [47]:
docs[0]

Document(metadata={'source': '/content/yolov10.pdf', 'page': 0}, page_content='YOLOv10: Real-Time End-to-End Object Detection\nAo Wang Hui Chen∗Lihao Liu Kai Chen Zijia Lin\nJungong Han Guiguang Ding∗\nTsinghua University\n/uni00000015/uni00000011/uni00000018 /uni00000018/uni00000011/uni00000013 /uni0000001a/uni00000011/uni00000018 /uni00000014/uni00000013/uni00000011/uni00000013 /uni00000014/uni00000015/uni00000011/uni00000018 /uni00000014/uni00000018/uni00000011/uni00000013 /uni00000014/uni0000001a/uni00000011/uni00000018 /uni00000015/uni00000013/uni00000011/uni00000013\n/uni0000002f/uni00000044/uni00000057/uni00000048/uni00000051/uni00000046/uni0000005c/uni00000003/uni0000000b/uni00000050/uni00000056/uni0000000c/uni00000016/uni0000001a/uni00000011/uni00000018/uni00000017/uni00000013/uni00000011/uni00000013/uni00000017/uni00000015/uni00000011/uni00000018/uni00000017/uni00000018/uni00000011/uni00000013/uni00000017/uni0000001a/uni00000011/uni00000018/uni00000018/uni00000013/uni00000011

In [48]:
len(docs)

18

In [49]:
type(docs)

list
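The chunk count equals the page count (18), presumably because each page is shorter than `chunk_size=8000` characters and is therefore left whole. To show what `chunk_size` and `chunk_overlap` mean when a text does need splitting, here is a simplified sliding-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and lines); the function name and sample text are made up for illustration.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk starts with the last four characters of the previous one, which is the role `chunk_overlap` plays: context at a chunk boundary is visible from both sides.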

# Setting Up Pinecone

In [50]:
import os

from pinecone import Pinecone, ServerlessSpec

index_name = "langchain-pinecone-hybrid-search"

# initialize Pinecone client
pc = Pinecone(api_key=pinecone)

# create the index
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # dimensionality of dense model
        metric="dotproduct",  # sparse values supported only for dotproduct
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

In [51]:
index = pc.Index(index_name)

In [52]:
index

<pinecone.data.index.Index at 0x7ce410204d90>

## Creating Embeddings

### Dense Vector

In [53]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

### Sparse Vector

In [54]:
# To encode the text to sparse values you can either choose SPLADE or BM25

from pinecone_text.sparse import BM25Encoder

# or from pinecone_text.sparse import SpladeEncoder if you wish to work with SPLADE

# use default tf-idf values
bm25_encoder = BM25Encoder().default()

## The code above uses the default tf-idf values. Fitting the tf-idf values to your own corpus is highly recommended.

In [55]:
# Collect the raw text of every chunk; this is the corpus BM25 is fitted on.
corpus = [doc.page_content for doc in docs]

print(corpus)

['YOLOv10: Real-Time End-to-End Object Detection\nAo Wang Hui Chen∗Lihao Liu Kai Chen Zijia Lin\nJungong Han Guiguang Ding∗\nTsinghua University\n/uni00000015/uni00000011/uni00000018 /uni00000018/uni00000011/uni00000013 /uni0000001a/uni00000011/uni00000018 /uni00000014/uni00000013/uni00000011/uni00000013 /uni00000014/uni00000015/uni00000011/uni00000018 /uni00000014/uni00000018/uni00000011/uni00000013 /uni00000014/uni0000001a/uni00000011/uni00000018 /uni00000015/uni00000013/uni00000011/uni00000013\n/uni0000002f/uni00000044/uni00000057/uni00000048/uni00000051/uni00000046/uni0000005c/uni00000003/uni0000000b/uni00000050/uni00000056/uni0000000c/uni00000016/uni0000001a/uni00000011/uni00000018/uni00000017/uni00000013/uni00000011/uni00000013/uni00000017/uni00000015/uni00000011/uni00000018/uni00000017/uni00000018/uni00000011/uni00000013/uni00000017/uni0000001a/uni00000011/uni00000018/uni00000018/uni00000013/uni00000011/uni00000013/uni00000018/uni00000015/uni00000011/uni00000018/uni00000018/uni0

In [56]:
corpus[0]

'YOLOv10: Real-Time End-to-End Object Detection\nAo Wang Hui Chen∗Lihao Liu Kai Chen Zijia Lin\nJungong Han Guiguang Ding∗\nTsinghua University\n/uni00000015/uni00000011/uni00000018 /uni00000018/uni00000011/uni00000013 /uni0000001a/uni00000011/uni00000018 /uni00000014/uni00000013/uni00000011/uni00000013 /uni00000014/uni00000015/uni00000011/uni00000018 /uni00000014/uni00000018/uni00000011/uni00000013 /uni00000014/uni0000001a/uni00000011/uni00000018 /uni00000015/uni00000013/uni00000011/uni00000013\n/uni0000002f/uni00000044/uni00000057/uni00000048/uni00000051/uni00000046/uni0000005c/uni00000003/uni0000000b/uni00000050/uni00000056/uni0000000c/uni00000016/uni0000001a/uni00000011/uni00000018/uni00000017/uni00000013/uni00000011/uni00000013/uni00000017/uni00000015/uni00000011/uni00000018/uni00000017/uni00000018/uni00000011/uni00000013/uni00000017/uni0000001a/uni00000011/uni00000018/uni00000018/uni00000013/uni00000011/uni00000013/uni00000018/uni00000015/uni00000011/uni00000018/uni00000018/uni00

In [57]:
len(corpus)

18

In [58]:
# fit tf-idf values on your corpus
bm25_encoder.fit(corpus)

# store the values to a json file
bm25_encoder.dump("bm25_values.json")

# load to your BM25Encoder object
bm25_encoder = BM25Encoder().load("bm25_values.json")

  0%|          | 0/18 [00:00<?, ?it/s]
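For intuition about what fitting on a corpus buys you, here is a small sketch of classic Okapi BM25 scoring in plain Python. It is an illustration only: `BM25Encoder` produces sparse vectors rather than scores and its exact weighting may differ, and the whitespace tokenizer, the `k1` and `b` values, and the toy corpus are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with classic Okapi BM25."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalisation: long documents are penalised via b.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


toy_corpus = [
    "yolov10 removes nms from post processing",
    "the sgd optimizer uses momentum",
    "nms suppresses overlapping boxes",
]
scores = bm25_scores("nms", toy_corpus)
print(scores)
```

The document frequencies and average length come from the corpus itself, which is why statistics fitted on your own documents tend to be more useful than generic defaults.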

# Create Retriever

In [64]:
from langchain_community.retrievers import PineconeHybridSearchRetriever

retriever = PineconeHybridSearchRetriever(
    embeddings=embeddings,
    sparse_encoder=bm25_encoder,
    index=index
)

In [65]:
retriever

PineconeHybridSearchRetriever(embeddings=OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x7ce41001c0d0>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x7ce3557f3a60>, model='text-embedding-ada-002', dimensions=None, deployment='text-embedding-ada-002', openai_api_version='', openai_api_base=None, openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True), sparse_encoder=<pinecone_text.sparse.bm25_encoder.BM25Encoder object at 0x7ce4101ff0a0>, index=<pinecone.data.index.Index object at 0x7ce
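As far as I can tell, this retriever does not fuse ranks. It appears to weight the dense and sparse query vectors with an `alpha` value between 0 and 1 (a convex combination) before sending a single query to the index. The sketch below shows that idea in plain Python; the function name and sample vectors are hypothetical, and the retriever's actual parameters should be checked against the installed `langchain-community` version.

```python
def convex_scale(dense, sparse, alpha):
    """Weight a dense and a sparse query vector by alpha and (1 - alpha).

    alpha = 1.0 means pure dense (semantic) search,
    alpha = 0.0 means pure sparse (keyword) search.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [value * alpha for value in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [value * (1.0 - alpha) for value in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


# Hypothetical query vectors, far smaller than real embeddings.
dense_query = [0.2, 0.4, 0.6]
sparse_query = {"indices": [10, 42], "values": [1.0, 0.5]}

only_dense = convex_scale(dense_query, sparse_query, alpha=1.0)
balanced = convex_scale(dense_query, sparse_query, alpha=0.5)
print(balanced)
```

Because the index was created with the `dotproduct` metric, scaling the query vectors this way scales each side's contribution to the final score.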

### Use the retriever

In [66]:
retriever.add_texts(corpus)

  0%|          | 0/1 [00:00<?, ?it/s]

In [71]:
result = retriever.invoke("Implementation Details of Yolov10")

In [72]:
result

[Document(page_content='A Appendix\nA.1 Implementation Details\nFollowing [ 20,56,59], all YOLOv10 models are trained from scratch using the SGD optimizer for\n500 epochs. The SGD momentum and weight decay are set to 0.937 and 5 ×10−4, respectively. The\ninitial learning rate is 1 ×10−2and it decays linearly to 1 ×10−4. For data augmentation, we adopt the\nMosaic [ 2,19], Mixup [ 68] and copy-paste augmentation [ 17],etc., like [ 20,59]. Tab. 14 presents the\ndetailed hyper-parameters. All models are trained on 8 NVIDIA 3090 GPUs. Besides, we increase\nthe width scale factor of YOLOv10-M to 1.0 to obtain YOLOv10-B. For PSA, we employ it after the\nSPPF module [ 20] and adopt the expansion factor of 2 for FFN. For CIB, we also adopt the expansion\nratio of 2 for the inverted bottleneck block structure. Following [ 59,56], we report the standard mean\naverage precision (AP) across different object scales and IoU thresholds on the COCO dataset [33].\nMoreover, we follow [ 71] to establish

# Chain

Components: memory (chat history), prompt, retriever, and LLM, composed as LangChain runnables.

In [79]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import AIMessage, HumanMessage
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

In [80]:
llm = ChatOpenAI(model="gpt-4o-mini")

### Defining System Prompt

In [82]:
system_prompt = (
    """
      You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer
      the question. If you don't know the answer, say that you don't know. Use three sentences maximum and keep the
      answer concise.
      \n\n
      {context}
    """
)

### Contextualizing the question

In [91]:
contextualize_q_system_prompt = (
    """
    Given a chat history and the latest user question which might reference context in the chat history,
    formulate a standalone question which can be understood without the chat history. Do NOT answer the question,
    just reformulate it if needed and otherwise return it as is.
    """
)

In [92]:
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

### Create Retrieval Chain

In [84]:
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

### Run the Chain

In [85]:
chat_history = []

question = "What is Yolo?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

print(chat_history)
chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

[]


In [87]:
ai_msg_1['answer']

'YOLO, which stands for "You Only Look Once," is a popular real-time object detection algorithm that processes images in a single pass, enabling fast and efficient identification of objects and their locations within an image. It is widely used in various applications, including autonomous driving and video surveillance, due to its balance between speed and accuracy. The architecture involves a single neural network that predicts bounding boxes and class probabilities directly from full images, making it distinct from traditional methods that require separate region proposals.'

In [88]:
print(chat_history)

[HumanMessage(content='What is Yolo?'), AIMessage(content='YOLO, which stands for "You Only Look Once," is a popular real-time object detection algorithm that processes images in a single pass, enabling fast and efficient identification of objects and their locations within an image. It is widely used in various applications, including autonomous driving and video surveillance, due to its balance between speed and accuracy. The architecture involves a single neural network that predicts bounding boxes and class probabilities directly from full images, making it distinct from traditional methods that require separate region proposals.')]
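The invoke-then-extend pattern repeats for every question in this notebook, so it can be wrapped in a small helper. The sketch below is one way to do that; `ask` and `EchoChain` are hypothetical names, and `EchoChain` only stands in for `rag_chain` so the example runs without API keys. It stores history as `(role, content)` tuples on the assumption that LangChain prompts accept them; swap in `HumanMessage` and `AIMessage` to match the cells in this notebook exactly.

```python
def ask(chain, question, chat_history):
    """Run one conversational turn and record it in chat_history."""
    result = chain.invoke({"input": question, "chat_history": chat_history})
    answer = result["answer"]
    # Append both sides of the turn so later questions can refer back to it.
    chat_history.extend([("human", question), ("ai", answer)])
    return answer


class EchoChain:
    """Stand-in for rag_chain: returns a canned answer instead of calling an LLM."""

    def invoke(self, inputs):
        turns = len(inputs["chat_history"]) // 2
        return {"answer": f"answer {turns + 1} to: {inputs['input']}"}


history = []
first = ask(EchoChain(), "What is Yolo?", history)
second = ask(EchoChain(), "What does NMS mean?", history)
print(first)
print(second)
```

With the real chain, the call would be `ask(rag_chain, question, chat_history)`.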


In [89]:
question = "What are the implementation details of Yolov10?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

print(ai_msg_1['answer'])

YOLOv10 is trained from scratch using the SGD optimizer for 500 epochs, with a momentum of 0.937 and a weight decay of 5 × 10^-4. The initial learning rate is set to 0.01 and decays linearly to 0.0001, with various data augmentation techniques such as Mosaic, Mixup, and copy-paste employed. Additionally, the model is trained on 8 NVIDIA 3090 GPUs, and hyper-parameters are meticulously detailed, including box loss gain, class loss gain, and various augmentation parameters.


In [90]:
question = "What is the conclusion of the Yolov10 paper?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

print(ai_msg_1['answer'])

The conclusion of the YOLOv10 paper emphasizes the development of a new generation of real-time end-to-end object detectors that effectively balance performance and efficiency. By addressing post-processing inefficiencies and optimizing model architecture, YOLOv10 achieves significant improvements in both accuracy and latency compared to previous models. Extensive experiments demonstrate its superiority in state-of-the-art performance across various model scales, indicating its potential for practical applications in real-time object detection.


In [93]:
question = "What does NMS mean?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

print(ai_msg_1['answer'])

NMS stands for Non-Maximum Suppression, a technique used in object detection to eliminate redundant bounding boxes around detected objects. It works by selecting the box with the highest confidence score and removing any other boxes that overlap significantly with it based on a defined threshold. This process helps to ensure that only the most accurate and relevant predictions are retained for final output, improving the quality of detection results.


In [94]:
question = "Based on the paper, how many other models does Yolov10 outperform, and which are they?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

print(ai_msg_1['answer'])

Based on the paper, YOLOv10 outperforms several previous models, including YOLOv8, YOLOv6, and RT-DETR. Specifically, it achieves state-of-the-art performance across various model scales, demonstrating significant improvements in accuracy, latency, and parameter efficiency compared to these models. The paper highlights that YOLOv10-S and YOLOv10-X are 1.8× and 1.3× faster than RT-DETR-R18 and R101, respectively, while also outpacing YOLOv8 in terms of mean average precision (AP).
