# Hybrid Search

# Conceptual Explanation of Hybrid Search
Hybrid Search combines vector similarity search (semantic search) with traditional search techniques such as BM25 or full-text search. This approach enables both semantic relevance and precise term matching.

# Key Components in Hybrid Search:
1. Vector Similarity Search:

* Finds results based on the closeness of embeddings (numerical representations) in a vector space.
* Used for capturing semantic meaning.

2. Traditional Search (BM25/Full-text):

* Finds results by exact term matching or keyword-based scoring.
* Provides precise control over text matching.

3. Combined Approach:

* Blends the results of both searches, either by weighted scoring or by filtering.
* For example, vector similarity results can be filtered by keyword.
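The weighted-scoring option can be sketched in plain Python. This is a minimal illustration, not a library API: the helper names `min_max_normalize` and `blend_scores`, the `alpha` weight, and the sample scores are all assumptions made up for this example. Both score sets are rescaled to [0, 1] first so that neither side dominates just because of its scale.

```python
def min_max_normalize(scores):
    """Rescale a dict of doc_id -> score into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def blend_scores(vector_scores, keyword_scores, alpha=0.5):
    """Blend two score dicts; alpha weights the vector side, (1 - alpha) the keyword side."""
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(keyword_scores)
    doc_ids = set(v) | set(k)
    # A document missing from one side contributes 0 for that side.
    blended = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in doc_ids}
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three documents (higher is better in both dicts).
vector_scores = {"doc_a": 0.91, "doc_b": 0.85, "doc_c": 0.40}
keyword_scores = {"doc_a": 2.0, "doc_c": 5.0}

ranking = blend_scores(vector_scores, keyword_scores, alpha=0.6)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Raising `alpha` favors semantic matches; lowering it favors keyword matches. Min-max scaling is only one possible normalization, and the right choice depends on how the two scorers behave on real data.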


Step 1: Install Dependencies

In [4]:
pip install faiss-cpu langchain


Note: you may need to restart the kernel to use updated packages.





Step 2: Set Up FAISS VectorStore

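Initialize FAISS through LangChain as follows. `OpenAIEmbeddings` calls the OpenAI API, so this assumes the `openai` package is installed and the `OPENAI_API_KEY` environment variable is set: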



In [5]:
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

# Step 1: Create sample documents
documents = [
    "In 2023, I visited Paris.",
    "In 2022, I visited New York.",
    "In 2021, I visited New Orleans.",
    "The Eiffel Tower is in Paris.",
    "Statue of Liberty is located in New York.",
]

# Step 2: Initialize OpenAI embeddings
embedding_model = OpenAIEmbeddings()

# Step 3: Split documents for indexing
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
docs = text_splitter.create_documents(documents)

# Step 4: Initialize FAISS VectorStore
vectorstore = FAISS.from_documents(docs, embedding_model)




Step 3: Perform Similarity Search

Perform a similarity search over the indexed documents:

In [6]:
# Standard similarity search
query = "Which city has the Eiffel Tower?"
result = vectorstore.similarity_search(query, k=3)  # Retrieve top 3 matches

for doc in result:
    print(doc.page_content)


The Eiffel Tower is in Paris.
In 2023, I visited Paris.
Statue of Liberty is located in New York.


Step 4: Hybrid Search with Filters (Simulating BM25 + Vector Search)

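FAISS does not support hybrid search natively. You can approximate it by combining:

* Vector similarity search for semantic matching.
* A keyword filter on the retrieved text for exact matches.

Note that a substring filter only keeps or drops results. Unlike BM25, it does not score or re-rank them.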

In [7]:
# Define a hybrid search function
def hybrid_search(query, vectorstore, keyword_filter=None, top_k=3):
    # Step 1: Perform vector similarity search
    results = vectorstore.similarity_search(query, k=top_k)
    
    # Step 2: Apply keyword filter (if provided)
    if keyword_filter:
        filtered_results = [
            doc for doc in results if keyword_filter.lower() in doc.page_content.lower()
        ]
        return filtered_results or results  # Return filtered results or fallback to vector results
    return results

# Hybrid search example
query = "Which city has the Eiffel Tower?"
keyword_filter = "Paris"  # Filter by keyword
hybrid_results = hybrid_search(query, vectorstore, keyword_filter=keyword_filter)

for doc in hybrid_results:
    print(doc.page_content)


The Eiffel Tower is in Paris.
In 2023, I visited Paris.
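Because `hybrid_search` filters only the `top_k` documents it retrieved, a relevant keyword match ranked just outside that window is lost. One common workaround is to fetch a larger candidate set first and truncate after filtering. The sketch below illustrates this under stated assumptions: `Doc` and `FakeStore` are hypothetical stand-ins that mimic only the `similarity_search(query, k)` call and the `page_content` attribute used above, and `fetch_k` is a parameter name chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str


class FakeStore:
    """Hypothetical store that returns documents in a fixed, pre-ranked order."""

    def __init__(self, ranked_texts):
        self._docs = [Doc(text) for text in ranked_texts]

    def similarity_search(self, query, k=4):
        # A real vector store would rank by embedding distance; this one ignores the query.
        return self._docs[:k]


def hybrid_search_overfetch(query, store, keyword, top_k=3, fetch_k=20):
    """Retrieve fetch_k candidates, keep keyword matches, then return at most top_k."""
    candidates = store.similarity_search(query, k=max(fetch_k, top_k))
    matches = [d for d in candidates if keyword.lower() in d.page_content.lower()]
    return matches[:top_k]


store = FakeStore([
    "The Eiffel Tower is in Paris.",
    "Statue of Liberty is located in New York.",
    "In 2022, I visited New York.",
    "In 2023, I visited Paris.",
])

results = hybrid_search_overfetch(
    "Which city has the Eiffel Tower?", store, "Paris", top_k=2, fetch_k=4
)
for doc in results:
    print(doc.page_content)
```

With `fetch_k=2` the second Paris sentence would never reach the filter, since it is ranked fourth in this made-up ordering. The same function should work with the FAISS `vectorstore` from the earlier cells, since it relies only on `similarity_search`.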


# Explanation of Hybrid Search in FAISS

# Vector Similarity:

FAISS calculates similarity between the query's embedding and indexed document embeddings.

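With LangChain's default settings, the FAISS wrapper builds a flat L2 index, so results are ranked by Euclidean distance (smaller is closer). For unit-length embeddings this ordering matches cosine similarity. An inner-product index can be selected instead.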
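For unit-length vectors, squared Euclidean distance and cosine similarity give the same ranking, because ||a - b||^2 = 2 - 2·cos(a, b). The sketch below checks this in plain Python. The 3-dimensional vectors are made up for illustration and stand in for real embeddings, which have far more dimensions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    """Squared Euclidean distance between a and b (lower means more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up vectors standing in for a query embedding and three document embeddings.
query = normalize([1.0, 2.0, 0.5])
docs = {
    "d1": normalize([1.0, 1.9, 0.4]),
    "d2": normalize([0.2, 0.1, 3.0]),
    "d3": normalize([2.0, 0.5, 0.5]),
}

by_cosine = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
by_l2 = sorted(docs, key=lambda d: squared_l2(query, docs[d]))

print("cosine order:", by_cosine)
print("L2 order:    ", by_l2)
```

The two orderings agree only because every vector was normalized first. For embeddings that are not unit length, the two metrics can rank documents differently.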

# Keyword Filtering:

After retrieval, keyword-based filters can narrow the results. Because the filter runs only on the top-k hits, it can return fewer than k documents.
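The substring filter is only a stand-in for keyword scoring. For comparison, here is a minimal sketch of Okapi BM25 in plain Python, using the sample sentences from this notebook. It uses a common non-negative IDF variant, `ln((N - n + 0.5) / (n + 0.5) + 1)`. The tokenizer, the function name `bm25_scores`, and the defaults `k1=1.5` and `b=0.75` are assumptions for illustration, and the corpus is assumed to be non-empty.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            n = doc_freq[term]
            idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1)
            tf = term_freq[term]
            # Length normalization: longer-than-average documents are penalized.
            denom = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


documents = [
    "In 2023, I visited Paris.",
    "In 2022, I visited New York.",
    "In 2021, I visited New Orleans.",
    "The Eiffel Tower is in Paris.",
    "Statue of Liberty is located in New York.",
]

scores = bm25_scores("Eiffel Tower Paris", documents)
for doc, score in sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True):
    print(round(score, 3), doc)
```

Unlike the substring filter, these scores rank documents by how many query terms they contain and how rare those terms are, so they can be blended with vector scores instead of being used only to drop results.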

# Integration with LangChain:

LangChain's FAISS integration simplifies document indexing and retrieval.

You can chain FAISS with prompt templates and models for end-to-end question-answering.

Step 5: Integrate with a LangChain QA Chain

Use the retrieved results for question-answering:

In [8]:
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

# Step 1: Initialize the LLM
llm = OpenAI()

# Step 2: Create QA chain
qa_chain = load_qa_chain(llm, chain_type="stuff")

# Step 3: Use the hybrid search results
query = "Where is the Eiffel Tower located?"
answer = qa_chain.run(input_documents=hybrid_results, question=query)

print("Answer:", answer)




Answer:  The Eiffel Tower is located in Paris.
