## Hybrid Retriever: Combining Dense and Sparse Retrievers

In [6]:
from langchain_community.vectorstores import FAISS
from langchain_community.retrievers import BM25Retriever
from langchain_classic.retrievers.ensemble import EnsembleRetriever
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings

from langchain.chat_models import init_chat_model
from langchain_classic.prompts import PromptTemplate
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain


In [7]:
# Step 1: Sample documents
docs = [
    Document(page_content="LangChain helps build LLM applications."),
    Document(page_content="Pinecone is a vector database for semantic search."),
    Document(page_content="The Eiffel Tower is located in Paris."),
    Document(page_content="Langchain can be used to develop agentic ai application."),
    Document(page_content="Langchain has many types of retrievers.")
]

# Step 2: Dense Retriever (FAISS + HuggingFace)
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
dense_vectorstore = FAISS.from_documents(
    docs,
    embedding_model
)

dense_retriever = dense_vectorstore.as_retriever()

In [8]:
# Step 3: Sparse Retriever (BM25)
sparse_retriever = BM25Retriever.from_documents(docs)
sparse_retriever.k = 3  # number of top-k documents to retrieve

# Step 4: Combine both with an Ensemble Retriever
hybrid_retriever = EnsembleRetriever(
    retrievers=[dense_retriever, sparse_retriever],
    weights=[0.7, 0.3]  # the parameter is `weights`; 0.7 for dense, 0.3 for sparse
)
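
The weights above decide how much each retriever's ranking counts when the two result lists are merged. As far as I understand it, `EnsembleRetriever` merges them with weighted Reciprocal Rank Fusion (RRF). The sketch below is a plain-Python illustration of that idea, not the library's code: the function name `weighted_rrf`, the document ids, and the constant `c=60` are assumptions made for this example.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document ids with weighted RRF.

    Each document earns weight / (rank + c) from every list it appears in,
    where rank starts at 1. `c` dampens the gap between neighbouring ranks
    (60 is an assumed, commonly used value).
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("Need exactly one weight per ranked list")

    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)

    # Highest fused score first; ties keep first-seen order (stable sort).
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings returned by a dense and a sparse retriever.
dense_ranking = ["a", "b", "c", "d"]
sparse_ranking = ["b", "a", "e"]

fused = weighted_rrf([dense_ranking, sparse_ranking], weights=[0.7, 0.3])
print(fused)
```

A document that appears in both lists collects a score from each, so it tends to rise above documents found by only one retriever. Swapping the weights to `[0.3, 0.7]` would put `"b"` first, because the sparse ranking would then dominate.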



In [9]:
# Step 5: Query and get results
query = "How can I build application using LLMs?"
results = hybrid_retriever.invoke(query)

# Step 6: Print results
for i, doc in enumerate(results):
    print(f"\n🔹 Document {i+1}:\n{doc.page_content}")


🔹 Document 1:
LangChain helps build LLM applications.

🔹 Document 2:
Langchain can be used to develop agentic ai application.

🔹 Document 3:
Langchain has many types of retrievers.

🔹 Document 4:
Pinecone is a vector database for semantic search.


### RAG Pipeline with Hybrid Retriever

In [16]:
# Step 5: Prompt Template
prompt = PromptTemplate.from_template(
    """
    Answer the question based on the context below.

    Context:
    {context}

    Question: {input}
    """
)

# Step 6: LLM
llm = init_chat_model("openai:gpt-4.1")

In [17]:
# Step 7: Create the stuff documents chain
document_chain = create_stuff_documents_chain(
    llm,
    prompt
)
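
To make the "stuff" strategy concrete: it places all retrieved documents into a single prompt. The sketch below imitates that step with plain string formatting. It is an illustration only, not the library's implementation; the helper name `stuff_documents`, the shortened template, and the blank-line separator are assumptions made for this example.

```python
TEMPLATE = """Answer the question based on the context below.

Context:
{context}

Question: {input}
"""


def stuff_documents(page_contents, question, template=TEMPLATE, separator="\n\n"):
    """Join every document's text into one context string and fill the prompt."""
    context = separator.join(page_contents)
    return template.format(context=context, input=question)


# Hypothetical retrieved texts, taken from the sample documents above.
retrieved_texts = [
    "LangChain helps build LLM applications.",
    "Langchain has many types of retrievers.",
]

filled_prompt = stuff_documents(retrieved_texts, "How can I build an app using LLMs?")
print(filled_prompt)
```

Because every document ends up in one prompt, this approach suits small result sets like the one here. With many or long documents, the combined context could exceed the model's context window.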

In [18]:
# Step 8: Create the full RAG chain
rag_chain = create_retrieval_chain(
    retriever=hybrid_retriever,
    combine_docs_chain=document_chain
)
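
Conceptually, the retrieval chain does two things: it fetches documents for the `input` question, then passes them as `context` to the document chain. The sketch below mimics that flow with stand-in functions, assuming the result is a dictionary with `input`, `context`, and `answer` keys. The names `run_retrieval_chain`, `toy_retrieve`, and `toy_combine` are hypothetical and no real retriever or LLM is called.

```python
def run_retrieval_chain(inputs, retrieve, combine_docs):
    """Retrieve documents for inputs["input"], then generate an answer from them."""
    docs = retrieve(inputs["input"])
    answer = combine_docs({**inputs, "context": docs})
    return {**inputs, "context": docs, "answer": answer}


def toy_retrieve(query):
    """Stand-in retriever: keep texts that share at least one word with the query."""
    corpus = [
        "LangChain helps build LLM applications.",
        "The Eiffel Tower is located in Paris.",
    ]
    terms = set(query.lower().split())
    return [text for text in corpus if terms & set(text.lower().rstrip(".").split())]


def toy_combine(inputs):
    """Stand-in for the document chain: report how much context it received."""
    return f"Answer drawn from {len(inputs['context'])} document(s)."


result = run_retrieval_chain(
    {"input": "How can I build an app using LLMs?"}, toy_retrieve, toy_combine
)
print(result["answer"])
for text in result["context"]:
    print("-", text)
```

Returning the retrieved documents alongside the answer is what makes it possible to print the sources next to the response.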

In [20]:
# Step 9: Ask a question
query = {"input": "How can I build an app using LLMs?"}
response = rag_chain.invoke(query)

# Step 10: Output
print("✅ Answer:\n", response["answer"])

print("\n📄 Source Documents:")
for i, doc in enumerate(response["context"]):
    print(f"\nDoc {i+1}: {doc.page_content}")

✅ Answer:
 To build an app using LLMs (Large Language Models), you can use tools like LangChain. Here’s how you might approach it based on the context:

1. **Define your app’s goal:** Decide what your app should do (e.g., answer questions, summarize documents, automate tasks).

2. **Use LangChain:** LangChain helps you to build applications using LLMs by providing easy ways to:
   - **Create agentic AI applications** (apps that can reason, make decisions, or take actions).
   - **Combine LLMs with different data sources**.
   - **Retrieve information** using various types of retrievers.

3. **Incorporate data retrieval:**  
   - If your app needs to search or understand large sets of data, connect it to a vector database like Pinecone.  
   - Pinecone allows your app to perform semantic search, finding relevant information based on meaning, not just keywords.

4. **Build and connect components:**  
   - Use LangChain to connect the LLM, data retrievers, and vector databases (like