# **Hybrid search**

In this discussion, we’re diving into hybrid search, a powerful technique used in Retrieval-Augmented Generation (RAG) applications. Up to now, in many RAG implementations, we’ve primarily relied on semantic search. Here's how it typically works:

1. **Document Chunking**: We begin by dividing documents into smaller chunks (e.g., D1, D2, D3,...Dn).
   
2. **Embedding**: These document chunks are converted into embedding vectors.

3. **Vector Storage**: The resulting vectors are stored in a vector database, also known as a vector store DB.

4. **Query Handling**: When a user submits a query, it’s also converted into a vector, and a vector search is performed in the database. 

5. **Similarity Check**: A similarity metric such as cosine similarity measures how close the query vector is to each stored vector, and the most similar chunks are retrieved.

6. **Response Generation**: The retrieved results are combined with a prompt template and an LLM to generate the final output.
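Steps 4 and 5 can be sketched in plain Python. This is a minimal illustration, not a vector database: the three-dimensional vectors are made up and stand in for real embeddings, and `semantic_search` is a hypothetical helper that scores every chunk by brute force instead of using an index.

```python
import math

# Toy "embeddings": in a real pipeline these come from an embedding model.
# The 3-dimensional vectors below are made up purely for illustration.
chunk_vectors = {
    "D1": [0.9, 0.1, 0.0],
    "D2": [0.1, 0.8, 0.2],
    "D3": [0.7, 0.3, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, vectors, top_k=2):
    """Return the top_k (chunk_id, score) pairs ordered by cosine similarity."""
    scored = [(doc_id, cosine_similarity(query_vector, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


query_vector = [0.8, 0.2, 0.0]  # hypothetical embedding of the user query
results = semantic_search(query_vector, chunk_vectors)
print([doc_id for doc_id, _ in results])
```

Real vector stores replace the brute-force loop with an approximate nearest-neighbour index, but the scoring idea is the same.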

This approach, known as semantic search, is widely used because it identifies similar vectors in the database, which represent semantically related content. However, there’s another search method we need to discuss: **hybrid search**.

### Understanding Hybrid Search

Hybrid search combines multiple search techniques, specifically **semantic search** and **syntactic search** (also known as exact or keyword search). Here’s a breakdown:

- **Semantic Search**: This dense vector search identifies similar content based on meaning. All text is converted into dense vectors and stored in the vector store DB.

- **Syntactic Search**: Also referred to as exact search or keyword search, this technique converts text into sparse vectors using methods like one-hot encoding, bag of words, TF-IDF, or BM25. Because each dimension corresponds to a specific term, these sparse vectors support keyword-based search that focuses on the exact words in the text.
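To make sparse vectors concrete, here is a minimal TF-IDF sketch in plain Python. It assumes naive whitespace tokenization and one common smoothed IDF variant (`log(N / df) + 1`); real libraries and BM25 encoders use more careful tokenization and weighting. Each vector is a `{term: weight}` dict, so only the terms that occur are stored, and the corpus and helper names are hypothetical.

```python
import math
from collections import Counter

# A tiny made-up corpus; each document becomes a sparse {term: weight} vector.
corpus = [
    "hybrid search combines keyword search and vector search",
    "vector search uses dense embeddings",
    "keyword search uses sparse vectors",
]


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def build_idf(docs):
    """Inverse document frequency: rarer terms get larger weights."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def tfidf_sparse(text, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def sparse_dot(a, b):
    # Only terms present in both vectors contribute to the score.
    return sum(weight * b[term] for term, weight in a.items() if term in b)


idf = build_idf(corpus)
doc_vectors = [tfidf_sparse(doc, idf) for doc in corpus]
query = tfidf_sparse("sparse keyword", idf)
scores = [sparse_dot(query, vec) for vec in doc_vectors]
print(scores.index(max(scores)))  # index of the best keyword match
```

Note how the second document scores zero: it shares no terms with the query, however close it might be in meaning.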

### Hybrid Search in Action

In hybrid search, both dense vectors (semantic) and sparse vectors (syntactic) are stored in the vector database. When a user submits a query, it’s processed in two ways:

1. **Keyword Search**: The query is converted into a sparse vector, and a keyword search retrieves the top results based on exact term matches.

2. **Vector Search**: The query is also converted into a dense vector, and a vector search retrieves the top results based on semantic similarity.

These two sets of results—one from the keyword search and one from the vector search—are then combined based on a weighted approach. The challenge lies in effectively merging these results, which brings us to a technique called **Reciprocal Rank Fusion**. This method will be explored further, as it plays a key role in balancing the outcomes of both search types to deliver the most relevant results.
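One common weighted approach, and the one LangChain's `PineconeHybridSearchRetriever` applies through its `alpha` parameter, is a convex combination: the dense query is scaled by `alpha` and the sparse query by `1 - alpha`, so a dot-product score becomes `alpha * dense_score + (1 - alpha) * sparse_score`. The sketch below assumes Pinecone's `{"indices": ..., "values": ...}` layout for sparse vectors; `convex_scale` and `hybrid_score` are hypothetical helpers, not library functions, and all vectors are made up.

```python
def convex_scale(dense, sparse, alpha):
    """Weight the dense query by alpha and the sparse query by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse):
    """Dot product over the dense part plus dot product over the sparse part."""
    dense_part = sum(a * b for a, b in zip(q_dense, d_dense))
    d_lookup = dict(zip(d_sparse["indices"], d_sparse["values"]))
    sparse_part = sum(
        v * d_lookup.get(i, 0.0) for i, v in zip(q_sparse["indices"], q_sparse["values"])
    )
    return dense_part + sparse_part


# Made-up query and document vectors (dense dot = 0.7, sparse dot = 1.0).
query_dense = [0.6, 0.8]
query_sparse = {"indices": [10, 42], "values": [1.0, 0.5]}
doc_dense = [0.5, 0.5]
doc_sparse = {"indices": [42, 7], "values": [2.0, 1.0]}

for alpha in (0.0, 0.75, 1.0):
    q_dense, q_sparse = convex_scale(query_dense, query_sparse, alpha)
    print(alpha, round(hybrid_score(q_dense, q_sparse, doc_dense, doc_sparse), 3))
```

`alpha = 1.0` reduces to pure semantic search and `alpha = 0.0` to pure keyword search. Because this mixes raw scores, it is sensitive to how the two score scales compare, which is one motivation for rank-based fusion.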

In essence, hybrid search enhances the search capability by integrating both semantic and syntactic methods, making it highly effective in applications where both context and specific keywords are crucial.

# Hybrid Search with Reciprocal Rank Fusion

## Introduction

In this section, we'll explore the concept of hybrid search, which combines semantic (vector) search and keyword (exact) search to enhance document retrieval from a vector database. This approach leverages reciprocal rank fusion to rank and combine results from both search types, producing a final score for each document. We'll also touch on graph knowledge search and how it integrates with the hybrid search method.

## What is Hybrid Search?

Hybrid search is a combination of two types of searches:
- **Vector Search (Semantic Search)**: Uses dense vectors to find the top-K most relevant results based on cosine similarity.
- **Keyword Search (Exact Search)**: Converts the query into sparse vectors and retrieves top-K results based on exact keyword matches.

By combining these two approaches, we can improve the relevance of the results returned to the user.

## How Does Reciprocal Rank Fusion Work?

Reciprocal rank fusion is a method used to combine the rankings from vector search and keyword search. Here’s how it works:

### Step 1: Retrieve Top-K Results
- **Vector Search**: Retrieve the top-K results based on cosine similarity using dense vectors.
- **Keyword Search**: Retrieve the top-K results based on exact keyword matches using sparse vectors.

### Step 2: Rank the Documents
Each document is assigned a rank in both searches:
- **Vector Search**: Ranked according to similarity.
- **Keyword Search**: Ranked according to exact match relevance.
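Because cosine similarities and keyword scores live on different scales, rank fusion works with positions rather than raw scores. A minimal sketch of this step, assuming each search returns a hypothetical `{doc_id: score}` dict (the scores are made up):

```python
def to_ranks(scores):
    """Turn {doc_id: score} into {doc_id: rank}, where rank 1 is the best score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc_id: position for position, doc_id in enumerate(ordered, start=1)}


vector_scores = {"D1": 0.91, "D2": 0.35, "D3": 0.78}  # e.g. cosine similarities
keyword_scores = {"D2": 6.4, "D3": 2.0}  # e.g. BM25 scores; D1 had no keyword hit

vector_ranks = to_ranks(vector_scores)
keyword_ranks = to_ranks(keyword_scores)
print(vector_ranks)   # {'D1': 1, 'D3': 2, 'D2': 3}
print(keyword_ranks)  # {'D2': 1, 'D3': 2}
```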

### Step 3: Calculate the Final Score
The final score for each document is calculated using the following formula:

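$$
\text{Score}(d) = \sum_{r \in R} \frac{1}{C + \text{Rank}_r(d)}
$$

Where:
- $R$ is the set of rankings being fused (here, the vector-search ranking and the keyword-search ranking).
- $C$ is a smoothing constant that varies across databases; values between 1 and 60 are typical, and 60 is the value proposed in the original RRF paper. Larger values shrink the gap between top and lower ranks.
- $\text{Rank}_r(d)$ is the 1-based rank of document $d$ in ranking $r$; a document missing from a ranking contributes nothing for that ranking.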

For example, with $C = 1$, a document ranked 1st in vector search and 5th in keyword search scores:

$$
\text{Score}(d) = \frac{1}{1+1} + \frac{1}{1+5} = 0.5 + 0.1\overline{6} \approx 0.667
$$
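The formula translates almost directly into code. A minimal sketch, assuming each search returns a list of document ids ordered best-first (a hypothetical input format); a document absent from a list simply gets no contribution from it. With `c=1` it reproduces the worked example: 1/2 + 1/6 ≈ 0.667.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, c=60):
    """Fuse several rankings with RRF.

    rankings: list of ranked lists of doc ids, best first.
    c: the smoothing constant C from the formula.
    Returns (doc_id, score) pairs sorted by fused score, highest first.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (c + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Worked example with C = 1: doc "A" is 1st in vector search and 5th in keyword search.
vector_ranking = ["A", "B", "C", "D", "E"]
keyword_ranking = ["F", "G", "H", "I", "A"]
fused = dict(reciprocal_rank_fusion([vector_ranking, keyword_ranking], c=1))
print(round(fused["A"], 3))  # 0.667
```

Appearing in both lists is what lifts "A" above "F", which is ranked 1st in keyword search but absent from vector search.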

### Step 4: Weighting and Final Ranking
The final ranking can be influenced by weighting the contribution of the vector and keyword searches. For instance, if keyword search is given 70% of the weight (and vector search 30%), documents that rank well in keyword search will tend to rank higher overall.
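One way to apply such weights is sketched below, assuming each ranking's reciprocal-rank term is simply multiplied by its weight. This is a common weighted-RRF variant, not necessarily what any particular database implements, and the rankings are hypothetical.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """RRF where each ranking's 1 / (c + rank) term is multiplied by its weight."""
    if len(rankings) != len(weights):
        raise ValueError("need one weight per ranking")
    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (c + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# The two searches disagree completely about D1 and D3.
vector_ranking = ["D1", "D2", "D3"]
keyword_ranking = ["D3", "D2", "D1"]

# Keyword search gets 70% of the weight, vector search 30%.
top = weighted_rrf([vector_ranking, keyword_ranking], weights=[0.3, 0.7])
print(top[0][0])  # D3
```

Flipping the weights to `[0.7, 0.3]` puts D1 first instead, which is exactly the lever Step 4 describes.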

## Practical Example

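In the next section, we demonstrate a practical hybrid search implementation with Pinecone and LangChain, retrieving top-K documents from a vector database using both semantic (dense) and keyword (sparse) vectors. Note that Pinecone combines the two signals differently from reciprocal rank fusion: a single `dotproduct` index stores both vectors, and the dense and sparse query vectors are weighted against each other (LangChain's `PineconeHybridSearchRetriever` exposes this weight as `alpha`).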

## Graph Knowledge Search

Graph Knowledge Search is another advanced retrieval method that can be integrated with hybrid search. It uses a graph database (e.g., Neo4j) that stores documents or entities as nodes and the relationships between them as edges, so retrieval can follow those connections.

### Types of Queries Supported:
- **Keyword Search**: Exact search within the graph.
- **Semantic Search**: Using vectors to find similar documents.
- **Graph Knowledge Search**: Leveraging the graph structure to enhance retrieval.

By combining these methods, we can build a more effective Retrieval-Augmented Generation (RAG) application.

## Conclusion

Hybrid search, enhanced by reciprocal rank fusion, provides a powerful way to retrieve relevant documents by combining semantic and exact search results. By assigning appropriate weights and leveraging graph databases, we can further refine the retrieval process, making it more robust and effective.

# Hybrid Search RAG With Pinecone DB And Langchain

[Pinecone API Key](https://app.pinecone.io/organizations/-O0Gsbmbisj2R1r7QAjP/projects/7a752ee5-afcc-4f39-8ac4-28fc80737a24/keys)


```python
!pip install pinecone-client pinecone-text pinecone-notebooks
```


```python
!pip install langchain langchain-community
```


```python
# Read API keys from Kaggle's secrets store
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("GROQ_API_KEY")  # not used in this notebook
secret_value_1 = user_secrets.get_secret("PINECONE_API_KEY")
secret_value_2 = user_secrets.get_secret("HF_TOKEN")  # not used in this notebook
```

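## Creating a Pinecone index

Generic template for creating a serverless index (the name, dimension, and metric are placeholders):

```python
import os

from pinecone import Pinecone, ServerlessSpec

# Assumes the API key is available in the PINECONE_API_KEY environment variable
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

pc.create_index(
    name="quickstart",
    dimension=2,  # Replace with your model's embedding dimension
    metric="cosine",  # Replace with your model's metric
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1",
    ),
)
```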

```python
from langchain_community.retrievers import PineconeHybridSearchRetriever
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=secret_value_1)
```

```python
pc.list_indexes().names()
```

```
['hybrid-search-pinecone']
```

```python
# Create the index only if it does not exist yet
index_name = "hybrid-search-pinecone"

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,  # dense-vector size: all-MiniLM-L6-v2 (used below) outputs 384-dim embeddings
        metric="dotproduct",  # Pinecone supports sparse-dense (hybrid) vectors only with dotproduct
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
```

```python
pc.list_indexes().names()
```

```
['hybrid-search-pinecone']
```

```python
index = pc.Index(index_name)
index
```

```
<pinecone.data.index.Index at 0x7a25833ff5e0>
```

```python
!pip install langchain-huggingface
```



```python
# Dense embeddings for semantic search (384-dimensional)
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
embeddings
```


```
HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)
```

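```python
# BM25Encoder produces the sparse (keyword) vectors. BM25 is a TF-IDF-style weighting
# that also saturates term frequency and normalizes for document length.
from pinecone_text.sparse import BM25Encoder

# default() returns an encoder with BM25 parameters pre-fitted on the MS MARCO corpus
bm25_encoder = BM25Encoder.default()
bm25_encoder
```

```
<pinecone_text.sparse.bm25_encoder.BM25Encoder at 0x7a2582a586a0>
```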

```python
# pinecone-text tokenizes with NLTK; recent NLTK releases need the punkt_tab tokenizer data
import nltk

nltk.download("punkt_tab")
```

```
True
```

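```python
# Example corpus
sentences = [
    "Wait a minute (uh), get it how you live it (uh)",
    "Ten toes in when we standin' on business",
    "I'm a big stepper, underground methods",
    "Top notch hoes get the most, not the lesser"
]

# Fit the BM25 statistics (document frequencies, average document length) on this corpus.
# In a real application, fit on the same texts you index; here the fitted sentences
# differ from the texts added to the retriever below.
bm25_encoder.fit(sentences)

# Save the fitted parameters to a JSON file
bm25_encoder.dump("bm25_values.json")

# Load them back into a fresh BM25Encoder object
bm25_encoder = BM25Encoder().load("bm25_values.json")
```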


```python
retriever = PineconeHybridSearchRetriever(
    embeddings=embeddings,
    sparse_encoder=bm25_encoder,
    index=index,
)
```

```python
retriever
```

```
PineconeHybridSearchRetriever(embeddings=HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False), sparse_encoder=<pinecone_text.sparse.bm25_encoder.BM25Encoder object at 0x7a24f25be110>, index=<pinecone.data.index.Index object at 0x7a25833ff5e0>)
```

```python
# Encode each text as a dense and a sparse vector, then upsert both into the index
retriever.add_texts(
    [
        "In 2021, I visited Bhigwan",
        "In 2022, I visited Pune",
        "In 2023, I visited Goa",
    ]
)
```


```python
retriever.invoke("Where I visited in 2022")
```

```
[Document(page_content='In 2022, I visited Pune'),
 Document(page_content='In 2023, I visited Goa'),
 Document(page_content='In 2021, I visited Bhigwan')]
```

```python
retriever.invoke("Where I visited in 2021")
```

```
[Document(page_content='In 2021, I visited Bhigwan'),
 Document(page_content='In 2022, I visited Pune'),
 Document(page_content='In 2023, I visited Goa')]
```