# RAG Application with Live Web Search using DuckDuckGo and LangGraph

This notebook demonstrates a hybrid RAG (Retrieval-Augmented Generation) system that combines:
- **Vector Store RAG**: Retrieve from your private knowledge base
- **Live Web Search**: Get current information from DuckDuckGo
- **LangGraph Orchestration**: Intelligent routing and workflow management

## Architecture Overview

```
User Query → Router Agent → [Vector Store Retrieval] + [Web Search]
                                        ↓
                               Hybrid Context Builder
                                        ↓
                                  LLM Generator
                                        ↓
                                  Final Answer
```

## Features
- 🔍 Intelligent query routing (local docs vs web search)
- 📚 Vector store for private documents
- 🌐 Live web search for current information
- 🔄 Hybrid retrieval combining both sources
- 🎯 Context-aware answer generation

## 1. Install Required Packages

In [None]:
# Install required packages
!pip install -q langchain langchain-openai langgraph langchain-community \
    duckduckgo-search chromadb sentence-transformers pypdf \
    tiktoken faiss-cpu

## 2. Import Dependencies

In [None]:
import os
from typing import TypedDict, Annotated, List, Dict, Any, Literal
import operator

# LangChain imports
from langchain_community.tools import DuckDuckGoSearchResults
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.documents import Document

# Vector store imports
from langchain_community.vectorstores import Chroma, FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader, PyPDFLoader

# LangGraph imports
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

print("✅ All imports successful!")

## 3. Set Up API Keys

In [None]:
# Set your OpenAI API key (a key already present in the environment is left untouched)
os.environ.setdefault("OPENAI_API_KEY", "your-api-key-here")

# Or load from environment
# from dotenv import load_dotenv
# load_dotenv()

## 4. Create Sample Documents for RAG Knowledge Base

In a real application, you'd load your own documents. For this demo, we'll create sample documents.

In [None]:
# Sample documents for the knowledge base
sample_documents = [
    Document(
        page_content="""LangChain is a framework for developing applications powered by language models. 
        It provides tools for prompt management, chains, agents, and memory. LangChain supports 
        multiple LLM providers including OpenAI, Anthropic, and Hugging Face.""",
        metadata={"source": "langchain_docs", "topic": "framework"}
    ),
    Document(
        page_content="""LangGraph is a library for building stateful, multi-actor applications with LLMs. 
        It extends LangChain with graph-based orchestration capabilities, allowing developers to 
        create complex workflows with cycles, conditional branching, and state management.""",
        metadata={"source": "langgraph_docs", "topic": "framework"}
    ),
    Document(
        page_content="""RAG (Retrieval-Augmented Generation) is a technique that combines information 
        retrieval with text generation. It retrieves relevant documents from a knowledge base and 
        uses them as context for generating more accurate and informed responses.""",
        metadata={"source": "rag_guide", "topic": "technique"}
    ),
    Document(
        page_content="""Vector databases store embeddings and enable semantic search. Popular options 
        include Chroma, Pinecone, Weaviate, and FAISS. They use similarity metrics like cosine 
        similarity to find relevant documents based on semantic meaning rather than exact matches.""",
        metadata={"source": "vector_db_guide", "topic": "database"}
    ),
    Document(
        page_content="""DuckDuckGo is a privacy-focused search engine that doesn't track users. 
        It can be integrated into applications for web search capabilities without requiring 
        an API key, making it ideal for prototyping and development.""",
        metadata={"source": "search_engines", "topic": "tools"}
    ),
    Document(
        page_content="""Our company policy on remote work: Employees can work remotely up to 3 days per week. 
        Core hours are 10 AM - 3 PM local time. All team meetings should be scheduled during core hours. 
        Remote work equipment is provided by the company.""",
        metadata={"source": "company_handbook", "topic": "policy", "date": "2024"}
    ),
    Document(
        page_content="""Machine learning embeddings are dense vector representations of text that capture 
        semantic meaning. Popular embedding models include OpenAI's text-embedding-ada-002, 
        sentence-transformers, and Google's Universal Sentence Encoder.""",
        metadata={"source": "ml_guide", "topic": "embeddings"}
    )
]

print(f"✅ Created {len(sample_documents)} sample documents")

## 5. Set Up Vector Store with Embeddings

In [None]:
# Initialize embeddings
# Option 1: OpenAI embeddings (requires API key, higher quality)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Option 2: Free local embeddings (no API key needed)
# embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

print("✅ Embeddings model initialized")

In [None]:
# Create vector store from documents
# Option 1: Chroma (persistent, easy to use)
vectorstore = Chroma.from_documents(
    documents=sample_documents,
    embedding=embeddings,
    collection_name="rag_knowledge_base",
    persist_directory="./chroma_db"
)

# Option 2: FAISS (in-memory, very fast)
# vectorstore = FAISS.from_documents(sample_documents, embeddings)

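# Note: _collection is a private attribute of the Chroma wrapper; skip this line if you switch to FAISS
print(f"✅ Vector store created with {vectorstore._collection.count()} documents")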

# Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return top 3 most relevant documents
)

print("✅ Retriever configured")

## 6. Test Vector Store Retrieval

In [None]:
# Test retrieval
test_query = "What is RAG?"
docs = retriever.invoke(test_query)

print(f"Query: {test_query}\n")
print("Retrieved documents:")
for i, doc in enumerate(docs, 1):
    print(f"\n{i}. {doc.page_content[:150]}...")
    print(f"   Source: {doc.metadata.get('source', 'unknown')}")

## 7. Initialize DuckDuckGo Search Tool

In [None]:
# Initialize DuckDuckGo search
web_search = DuckDuckGoSearchResults(
    num_results=3,
    output_format="list"
)

# Test search
test_results = web_search.run("latest AI news 2024")
print("✅ DuckDuckGo search initialized")
print(f"Test search returned {len(test_results) if isinstance(test_results, list) else 1} results")

## 8. Define RAG State

In [None]:
class RAGState(TypedDict):
    """State for the RAG workflow."""
    # Input
    query: str
    
    # Routing decision
    use_web_search: bool
    use_vector_store: bool
    
    # Retrieved context
    vector_docs: List[Document]
    web_results: List[Dict[str, Any]]
    
    # Combined context
    combined_context: str
    
    # Output
    answer: str
    sources: List[str]

print("✅ RAG state defined")

## 9. Create Router Node (Decides What to Retrieve)

In [None]:
# Initialize LLM
llm = ChatOpenAI(model="gpt-4", temperature=0)

def route_query(state: RAGState) -> RAGState:
    """
    Determine whether to use vector store, web search, or both.
    """
    print("\n" + "="*60)
    print("🔀 ROUTER NODE")
    print("="*60)
    
    query = state["query"]
    
    # Create routing prompt
    router_prompt = f"""Analyze this query and determine the best retrieval strategy:

Query: "{query}"

Decide:
1. use_vector_store: True if the query is about internal knowledge, documentation, or company policies
2. use_web_search: True if the query requires current/recent information, news, or real-time data

Note: Both can be True for queries needing both internal and external information.

Examples:
- "What is our remote work policy?" → vector_store: True, web_search: False
- "Latest AI news" → vector_store: False, web_search: True
- "How does LangChain compare to latest AI frameworks?" → vector_store: True, web_search: True

Respond with ONLY a JSON object: {{"use_vector_store": true/false, "use_web_search": true/false}}"""
    
    response = llm.invoke([HumanMessage(content=router_prompt)])
    
    # Parse response (falls back to vector store only if the reply is not valid JSON)
    import json
    try:
        decision = json.loads(response.content)
        use_vector = bool(decision.get("use_vector_store", True))
        use_web = bool(decision.get("use_web_search", False))
    except (json.JSONDecodeError, AttributeError, TypeError):
        # Default fallback
        use_vector = True
        use_web = False
    
    print("📊 Routing Decision:")
    print(f"   Vector Store: {use_vector}")
    print(f"   Web Search: {use_web}")
    
    return {
        "use_vector_store": use_vector,
        "use_web_search": use_web
    }

print("✅ Router node created")
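
The router above passes the raw model reply straight to `json.loads`, so a reply wrapped in a Markdown code fence or preceded by a sentence of explanation falls through to the default. The helper below is a minimal sketch of a more tolerant parser, assuming the reply contains one flat JSON object; `parse_routing_decision` is a hypothetical name and is not part of the notebook's workflow.

```python
import json
import re
from typing import Dict


def parse_routing_decision(text: str) -> Dict[str, bool]:
    """Extract the routing flags from an LLM reply, tolerating surrounding text."""
    default = {"use_vector_store": True, "use_web_search": False}
    # Take the first {...} span so code fences or chatter around it are ignored.
    # This assumes a flat object with no nested braces.
    match = re.search(r"\{.*?\}", text, flags=re.DOTALL)
    if not match:
        return default
    try:
        decision = json.loads(match.group(0))
    except json.JSONDecodeError:
        return default
    if not isinstance(decision, dict):
        return default
    return {
        "use_vector_store": bool(decision.get("use_vector_store", True)),
        "use_web_search": bool(decision.get("use_web_search", False)),
    }


# Example reply wrapped in a code fence (built with "`" * 3 to keep this block readable)
fence = "`" * 3
reply = f'Sure!\n{fence}json\n{{"use_vector_store": false, "use_web_search": true}}\n{fence}'
decision = parse_routing_decision(reply)
print(decision)
```

If this were adopted, `route_query` could call it on `response.content` in place of the `try`/`except` block.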

## 10. Create Vector Store Retrieval Node

In [None]:
def retrieve_from_vectorstore(state: RAGState) -> RAGState:
    """
    Retrieve relevant documents from vector store.
    """
    print("\n" + "="*60)
    print("📚 VECTOR STORE RETRIEVAL NODE")
    print("="*60)
    
    if not state.get("use_vector_store", False):
        print("⏭️  Skipping vector store retrieval")
        return {"vector_docs": []}
    
    query = state["query"]
    
    try:
        # Retrieve documents
        docs = retriever.invoke(query)
        
        print(f"✅ Retrieved {len(docs)} documents from vector store")
        for i, doc in enumerate(docs, 1):
            print(f"   {i}. {doc.metadata.get('source', 'unknown')} - {doc.page_content[:60]}...")
        
        return {"vector_docs": docs}
        
    except Exception as e:
        print(f"❌ Error retrieving from vector store: {e}")
        return {"vector_docs": []}

print("✅ Vector retrieval node created")

## 11. Create Web Search Node

In [None]:
def search_web(state: RAGState) -> RAGState:
    """
    Search the web using DuckDuckGo.
    """
    print("\n" + "="*60)
    print("🌐 WEB SEARCH NODE")
    print("="*60)
    
    if not state.get("use_web_search", False):
        print("⏭️  Skipping web search")
        return {"web_results": []}
    
    query = state["query"]
    
    try:
        # Perform web search
        raw_results = web_search.run(query)
        
        # Parse results
        results = []
        if isinstance(raw_results, str):
            # Parse string results
            snippets = raw_results.split('snippet: ')
            for snippet in snippets[1:]:
                parts = snippet.split('title: ')
                if len(parts) > 1:
                    title = parts[1].split('link: ')[0].strip()
                    link = parts[1].split('link: ')[1].strip() if 'link: ' in parts[1] else ""
                    results.append({
                        "title": title,
                        "snippet": parts[0].strip(),
                        "url": link
                    })
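        elif isinstance(raw_results, list):
            # List output uses the key "link"; copy it to "url", which the
            # context builder reads when collecting sources
            results = [{**r, "url": r.get("url") or r.get("link", "")} for r in raw_results]
        
        print(f"✅ Found {len(results)} web results")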
        for i, result in enumerate(results[:3], 1):
            print(f"   {i}. {result.get('title', 'N/A')[:60]}...")
        
        return {"web_results": results}
        
    except Exception as e:
        print(f"❌ Error searching web: {e}")
        return {"web_results": []}

print("✅ Web search node created")

## 12. Create Context Builder Node

In [None]:
def build_context(state: RAGState) -> RAGState:
    """
    Combine vector store docs and web results into unified context.
    """
    print("\n" + "="*60)
    print("🔨 CONTEXT BUILDER NODE")
    print("="*60)
    
    context_parts = []
    sources = []
    
    # Add vector store documents
    vector_docs = state.get("vector_docs", [])
    if vector_docs:
        context_parts.append("=== INTERNAL KNOWLEDGE BASE ===")
        for i, doc in enumerate(vector_docs, 1):
            context_parts.append(f"\n[Document {i}]")
            context_parts.append(doc.page_content)
            source = doc.metadata.get('source', 'unknown')
            sources.append(f"Internal: {source}")
        print(f"📚 Added {len(vector_docs)} documents from vector store")
    
    # Add web results
    web_results = state.get("web_results", [])
    if web_results:
        context_parts.append("\n\n=== WEB SEARCH RESULTS ===")
        for i, result in enumerate(web_results, 1):
            context_parts.append(f"\n[Web Result {i}]")
            context_parts.append(f"Title: {result.get('title', 'N/A')}")
            context_parts.append(f"Content: {result.get('snippet', 'N/A')}")
            url = result.get('url', '')
            if url:
                sources.append(f"Web: {url}")
        print(f"🌐 Added {len(web_results)} web search results")
    
    combined_context = "\n".join(context_parts)
    
    if not combined_context.strip():
        combined_context = "No relevant context found."
    
    print(f"\n✅ Built context with {len(combined_context)} characters")
    
    return {
        "combined_context": combined_context,
        "sources": sources
    }

print("✅ Context builder node created")

## 13. Create Generator Node

In [None]:
def generate_answer(state: RAGState) -> RAGState:
    """
    Generate final answer using retrieved context.
    """
    print("\n" + "="*60)
    print("✨ ANSWER GENERATOR NODE")
    print("="*60)
    
    query = state["query"]
    context = state.get("combined_context", "")
    
    # Create generation prompt
    generation_prompt = f"""You are a helpful assistant answering questions based on provided context.

Context:
{context}

Question: {query}

Instructions:
1. Answer the question based on the provided context
2. If the context contains relevant information, use it to provide a detailed answer
3. Distinguish between internal knowledge and web search results if both are present
4. If the context doesn't contain enough information, say so
5. Be concise but comprehensive

Answer:"""
    
    try:
        response = llm.invoke([HumanMessage(content=generation_prompt)])
        answer = response.content
        
        print(f"✅ Generated answer ({len(answer)} characters)")
        
        return {"answer": answer}
        
    except Exception as e:
        print(f"❌ Error generating answer: {e}")
        return {"answer": "I encountered an error generating the answer."}

print("✅ Generator node created")

## 14. Build the RAG LangGraph Workflow

In [None]:
# Create the workflow
workflow = StateGraph(RAGState)

# Add nodes
workflow.add_node("route", route_query)
workflow.add_node("retrieve_vectors", retrieve_from_vectorstore)
workflow.add_node("search_web", search_web)
workflow.add_node("build_context", build_context)
workflow.add_node("generate", generate_answer)

# Define the flow
workflow.set_entry_point("route")

# Both retrieval operations happen in parallel (conceptually)
workflow.add_edge("route", "retrieve_vectors")
workflow.add_edge("route", "search_web")

# Both feed into context builder
workflow.add_edge("retrieve_vectors", "build_context")
workflow.add_edge("search_web", "build_context")

# Context builder feeds into generator
workflow.add_edge("build_context", "generate")

# Generator is the end
workflow.add_edge("generate", END)

# Compile the graph
rag_app = workflow.compile()

print("\n✅ RAG Application Ready!")
print("\n📊 Workflow: Route → [Vector Retrieval + Web Search] → Context Builder → Generator")

## 15. Visualize the RAG Workflow

In [None]:
# Visualize the workflow
try:
    from IPython.display import Image, display
    display(Image(rag_app.get_graph().draw_mermaid_png()))
except Exception as e:
    print(f"Could not visualize graph: {e}")
    print("\nWorkflow Structure:")
    print("Route → Retrieve Vectors + Search Web → Build Context → Generate → End")

## 16. Helper Function to Run RAG Queries

In [None]:
def ask_rag(query: str, verbose: bool = True) -> Dict[str, Any]:
    """
    Run a query through the RAG system.
    
    Args:
        query: The question to ask
        verbose: Whether to print detailed output
        
    Returns:
        Dictionary with answer and sources
    """
    if verbose:
        print("\n" + "="*80)
        print("🤖 RAG SYSTEM - QUERY PROCESSING")
        print("="*80)
        print(f"\n❓ Query: {query}\n")
    
    # Initialize state
    initial_state = {
        "query": query,
        "use_web_search": False,
        "use_vector_store": False,
        "vector_docs": [],
        "web_results": [],
        "combined_context": "",
        "answer": "",
        "sources": []
    }
    
    try:
        # Run the workflow
        final_state = rag_app.invoke(initial_state)
        
        if verbose:
            print("\n" + "="*80)
            print("📝 FINAL ANSWER")
            print("="*80)
            print(final_state["answer"])
            
            if final_state.get("sources"):
                print("\n" + "="*80)
                print("📚 SOURCES")
                print("="*80)
                for i, source in enumerate(final_state["sources"], 1):
                    print(f"{i}. {source}")
        
        return {
            "answer": final_state["answer"],
            "sources": final_state.get("sources", []),
            "used_vector_store": final_state.get("use_vector_store", False),
            "used_web_search": final_state.get("use_web_search", False)
        }
        
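    except Exception as e:
        print(f"\n❌ Error: {e}")
        import traceback
        traceback.print_exc()
        # Return the same keys as the success path so callers can rely on them
        return {
            "answer": f"Error: {e}",
            "sources": [],
            "used_vector_store": False,
            "used_web_search": False
        }

print("✅ Helper function ready!")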

## 17. Example 1: Query Using Only Vector Store

In [None]:
# Example 1: Internal knowledge query
result1 = ask_rag("What is LangGraph and what are its capabilities?")

## 18. Example 2: Query Using Only Web Search

In [None]:
# Example 2: Current events query
result2 = ask_rag("What are the latest developments in AI in 2024?")

## 19. Example 3: Hybrid Query (Both Sources)

In [None]:
# Example 3: Hybrid query needing both sources
result3 = ask_rag("How does RAG compare to the latest retrieval methods in 2024?")

## 20. Example 4: Company Policy Query

In [None]:
# Example 4: Company-specific query
result4 = ask_rag("What is our company's remote work policy?")

## 21. Add Your Own Documents

In [None]:
def add_documents_to_rag(texts: List[str], metadatas: List[Dict] = None):
    """
    Add new documents to the RAG knowledge base.
    
    Args:
        texts: List of document texts
        metadatas: Optional list of metadata dicts for each document
    """
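    if metadatas is None:
        metadatas = [{"source": "user_added"} for _ in texts]
    elif len(metadatas) != len(texts):
        # zip() below would silently drop the unmatched items
        raise ValueError("texts and metadatas must have the same length")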
    
    # Create documents
    docs = [Document(page_content=text, metadata=meta) 
            for text, meta in zip(texts, metadatas)]
    
    # Add to vector store
    vectorstore.add_documents(docs)
    
    print(f"✅ Added {len(docs)} documents to the knowledge base")
    print(f"📊 Total documents: {vectorstore._collection.count()}")

# Example: Add a new document
new_docs = [
    """Python 3.12 introduces new features including improved error messages, 
    a new f-string parser, and performance improvements. The PEP 701 changes make 
    f-strings more flexible and powerful."""
]

new_metadata = [{"source": "python_docs", "version": "3.12", "topic": "programming"}]

add_documents_to_rag(new_docs, new_metadata)

## 22. Load Documents from Files

In [None]:
def load_documents_from_file(file_path: str, chunk_size: int = 1000) -> List[Document]:
    """
    Load and process documents from a file.
    
    Args:
        file_path: Path to the file (.txt or .pdf)
        chunk_size: Maximum chunk size in characters
        
    Returns:
        List of processed documents
    """
    # Choose loader based on file type
    if file_path.endswith('.pdf'):
        loader = PyPDFLoader(file_path)
    elif file_path.endswith('.txt'):
        loader = TextLoader(file_path)
    else:
        raise ValueError(f"Unsupported file type: {file_path}")
    
    # Load documents
    documents = loader.load()
    
    # Split into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=200
    )
    splits = text_splitter.split_documents(documents)
    
    print(f"✅ Loaded {len(documents)} documents")
    print(f"📄 Split into {len(splits)} chunks")
    
    return splits

# Example usage (uncomment to use with your own files):
# docs = load_documents_from_file("your_document.pdf")
# vectorstore.add_documents(docs)
# print(f"Total documents in store: {vectorstore._collection.count()}")

## 23. Advanced: Hybrid Search with Reranking

In [None]:
def rerank_results(query: str, documents: List[Document], web_results: List[Dict]) -> List[Dict]:
    """
    Rerank combined results based on relevance to query.
    Simple implementation that scores each result by keyword overlap with the query.
    """
    all_results = []
    
    # Convert documents to dict format
    for doc in documents:
        all_results.append({
            "content": doc.page_content,
            "source": doc.metadata.get("source", "vector_store"),
            "type": "vector"
        })
    
    # Add web results
    for result in web_results:
        all_results.append({
            "content": result.get("snippet", ""),
            "source": result.get("url", "web"),
            "type": "web"
        })
    
    # Simple reranking based on keyword matching: count how many query terms
    # appear in each result. In production, use a proper reranking model
    query_terms = query.lower().split()
    
    for result in all_results:
        content_lower = result["content"].lower()
        score = sum(1 for term in query_terms if term in content_lower)
        result["relevance_score"] = score
    
    # Sort by relevance
    ranked = sorted(all_results, key=lambda x: x["relevance_score"], reverse=True)
    
    return ranked[:5]  # Return top 5

print("✅ Reranking function created")

## 24. Interactive RAG Chat Interface

In [None]:
def interactive_rag_chat():
    """
    Interactive chat interface for the RAG system.
    """
    print("\n" + "="*80)
    print("💬 INTERACTIVE RAG CHAT")
    print("="*80)
    print("Ask questions and get answers from both the knowledge base and web!")
    print("Commands: 'quit' to exit, 'stats' for system stats\n")
    
    conversation_history = []
    
    while True:
        query = input("\n❓ You: ")
        
        if query.lower() in ['quit', 'exit', 'q']:
            print("\n👋 Thanks for chatting! Goodbye!")
            break
        
        if query.lower() == 'stats':
            print(f"\n📊 System Statistics:")
            print(f"   Knowledge Base Documents: {vectorstore._collection.count()}")
            print(f"   Queries Processed: {len(conversation_history)}")
            continue
        
        if not query.strip():
            continue
        
        # Process query
        result = ask_rag(query, verbose=False)
        
        print(f"\n🤖 Assistant: {result['answer']}")
        
        if result.get('sources'):
            print(f"\n📚 Sources: {len(result['sources'])} total")
        
        conversation_history.append({
            "query": query,
            "answer": result['answer'],
            "sources": result.get('sources', [])
        })

# Uncomment to start interactive mode:
# interactive_rag_chat()

## 25. Batch Processing Multiple Queries

In [None]:
# Batch process multiple queries
queries = [
    "What is RAG?",
    "What are the latest AI developments?",
    "Explain our remote work policy",
    "How does LangChain work?"
]

print("\n" + "="*80)
print("📦 BATCH PROCESSING QUERIES")
print("="*80)

results = []
for i, query in enumerate(queries, 1):
    print(f"\n[{i}/{len(queries)}] Processing: {query}")
    result = ask_rag(query, verbose=False)
    results.append(result)
    print(f"✅ Answer preview: {result['answer'][:100]}...")

print(f"\n✅ Processed {len(queries)} queries successfully!")

## 26. Performance Monitoring and Analytics

In [None]:
import time
from datetime import datetime

def benchmark_rag(query: str, num_runs: int = 3) -> Dict:
    """
    Benchmark RAG system performance.
    """
    print(f"\n🏃 Running benchmark for: '{query}'")
    print(f"Number of runs: {num_runs}\n")
    
    times = []
    
    for i in range(num_runs):
        # perf_counter() is a monotonic clock intended for measuring elapsed time
        start = time.perf_counter()
        result = ask_rag(query, verbose=False)
        end = time.perf_counter()
        
        elapsed = end - start
        times.append(elapsed)
        print(f"Run {i+1}: {elapsed:.2f}s")
    
    avg_time = sum(times) / len(times)
    
    print(f"\n📊 Benchmark Results:")
    print(f"   Average Time: {avg_time:.2f}s")
    print(f"   Min Time: {min(times):.2f}s")
    print(f"   Max Time: {max(times):.2f}s")
    
    return {
        "query": query,
        "avg_time": avg_time,
        "min_time": min(times),
        "max_time": max(times),
        "runs": num_runs
    }

# Run benchmark
benchmark_result = benchmark_rag("What is LangGraph?")

## 27. Best Practices and Tips

### Document Management:
1. **Chunking Strategy**: Use appropriate chunk sizes (500-1500 tokens) with overlap; note that `RecursiveCharacterTextSplitter` counts characters, not tokens, by default
2. **Metadata**: Add rich metadata (source, date, category) for better filtering
3. **Updates**: Regularly update vector store with new information
4. **Cleaning**: Remove duplicates and outdated documents
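
As one way to approach the cleaning step, the sketch below removes exact and near-exact duplicate chunks by hashing normalized text before they are added to the vector store. It assumes each chunk is a plain dict with a `"text"` key; that shape and the name `dedupe_chunks` are illustrative, not something the notebook defines.

```python
import hashlib
from typing import Dict, List


def dedupe_chunks(chunks: List[Dict]) -> List[Dict]:
    """Drop chunks whose normalized text has already been seen.

    Each chunk is assumed to be a dict with a "text" key (hypothetical shape).
    """
    seen = set()
    unique = []
    for chunk in chunks:
        # Normalize case and whitespace so trivially different copies collide
        normalized = " ".join(chunk["text"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


chunks = [
    {"text": "Remote work is allowed 3 days per week.", "source": "handbook"},
    {"text": "remote work is allowed   3 days per week.", "source": "wiki"},
    {"text": "Core hours are 10 AM - 3 PM.", "source": "handbook"},
]
unique_chunks = dedupe_chunks(chunks)
print(len(unique_chunks))
```

Hashing only catches copies that match after normalization; paraphrased duplicates would need an embedding-similarity check instead.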

### Retrieval Optimization:
1. **k Parameter**: Adjust number of retrieved documents (3-5 is usually good)
2. **Similarity Threshold**: Filter out low-relevance results
3. **Hybrid Search**: Combine semantic and keyword search
4. **Reranking**: Use reranking models for better result ordering
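
For the hybrid search point, one common way to merge a semantic ranking with a keyword ranking is Reciprocal Rank Fusion (RRF). The sketch below is a plain-Python illustration of that technique, not something the notebook's workflow uses; the document IDs are hypothetical and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical IDs from a semantic retriever and a keyword retriever
semantic_hits = ["doc_rag", "doc_vectors", "doc_langchain"]
keyword_hits = ["doc_vectors", "doc_policy", "doc_rag"]
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

RRF only needs rank positions, so it avoids having to put cosine similarities and keyword scores on a common scale.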

### Web Search Integration:
1. **Query Refinement**: Optimize search queries before sending to DuckDuckGo
2. **Result Filtering**: Filter and validate web results
3. **Caching**: Cache frequent search results
4. **Rate Limiting**: Respect search API limits
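
The caching and rate-limiting points can be combined in a small wrapper around whatever search callable is in use. This is a sketch under assumptions: `CachedThrottledSearch` is a hypothetical class, the TTL and interval values are placeholders, and `fake_search` stands in for a real call such as `web_search.run`.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachedThrottledSearch:
    """Wrap a search callable with a TTL cache and a minimum delay between calls."""

    def __init__(self, search_fn: Callable[[str], List[dict]],
                 ttl_seconds: float = 300.0, min_interval: float = 1.0):
        self.search_fn = search_fn
        self.ttl_seconds = ttl_seconds
        self.min_interval = min_interval
        self._cache: Dict[str, Tuple[float, List[dict]]] = {}
        self._last_call = float("-inf")

    def search(self, query: str) -> List[dict]:
        # Normalize the query so case and spacing variants share a cache entry
        key = " ".join(query.lower().split())
        now = time.monotonic()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]  # fresh cache hit, no outgoing call
        # Sleep just long enough to keep outgoing calls min_interval apart
        wait = self.min_interval - (now - self._last_call)
        if wait > 0:
            time.sleep(wait)
        results = self.search_fn(query)
        self._last_call = time.monotonic()
        self._cache[key] = (self._last_call, results)
        return results


calls = []

def fake_search(query: str) -> List[dict]:
    """Stand-in for a real search call; records each invocation."""
    calls.append(query)
    return [{"title": f"Result for {query}"}]

searcher = CachedThrottledSearch(fake_search, ttl_seconds=60, min_interval=0.0)
first = searcher.search("latest AI news")
second = searcher.search("Latest  AI news")  # served from the cache
print(len(calls))
```

The cache here lives in memory for a single process; a shared store would be needed if several workers issue searches.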

### LLM Generation:
1. **Context Length**: Don't exceed model's context window
2. **Prompt Engineering**: Craft clear, specific prompts
3. **Citation**: Ask LLM to cite sources in responses
4. **Fact-Checking**: Validate generated content when possible

### Production Considerations:
1. **Error Handling**: Comprehensive error handling and fallbacks
2. **Monitoring**: Track query patterns and system performance
3. **Caching**: Implement response caching for common queries
4. **Security**: Sanitize inputs and validate sources
5. **Scalability**: Use managed vector databases (Pinecone, Weaviate)

## 28. Advanced Features to Implement

### 1. Conversational RAG:
- Add conversation history to context
- Track follow-up questions
- Maintain user session state
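
A first step toward conversational RAG could be to render recent turns as text and prepend them to the generation prompt. The sketch below assumes history entries shaped like the `conversation_history` items in the interactive chat (`"query"` and `"answer"` keys); `format_history` and `max_turns` are hypothetical names.

```python
from typing import Dict, List


def format_history(history: List[Dict[str, str]], max_turns: int = 3) -> str:
    """Render the most recent turns as a text block for the generation prompt."""
    # Guard max_turns <= 0, since history[-0:] would return the whole list
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = []
    for turn in recent:
        lines.append(f"User: {turn['query']}")
        lines.append(f"Assistant: {turn['answer']}")
    return "\n".join(lines)


history = [{"query": f"q{i}", "answer": f"a{i}"} for i in range(1, 5)]
history_text = format_history(history, max_turns=2)
print(history_text)
```

Capping the number of turns keeps the prompt from growing without bound; follow-up questions that depend on older turns would need summarization or query rewriting on top of this.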

### 2. Multi-Modal RAG:
- Process images and PDFs
- Extract text from various formats
- Handle structured data (tables, charts)

### 3. Advanced Retrieval:
- Hypothetical Document Embeddings (HyDE)
- Parent-Child Document Retrieval
- Multi-Vector Retrieval

### 4. Quality Control:
- Confidence scoring
- Answer validation
- Source verification
- Hallucination detection

### 5. User Feedback:
- Collect user ratings
- Learn from feedback
- Improve over time

## 29. Troubleshooting Common Issues

### Issue: Poor Retrieval Quality
**Solutions:**
- Adjust chunk size and overlap
- Try different embedding models
- Increase number of retrieved documents (k)
- Add metadata filtering

### Issue: Slow Performance
**Solutions:**
- Use faster embedding models
- Implement caching
- Use approximate nearest neighbor search (FAISS with IVF)
- Reduce context length

### Issue: Irrelevant Web Results
**Solutions:**
- Refine search queries
- Add result filtering
- Increase relevance threshold
- Use better search tools (Tavily, Brave)

### Issue: Out of Context Window
**Solutions:**
- Reduce number of retrieved documents
- Summarize retrieved content
- Use models with larger context windows
- Implement smart context truncation
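
A simple form of context truncation is to keep passages in ranked order until a size budget is used up. The sketch below budgets by characters as a rough proxy for tokens, which is an assumption; an exact limit would need a tokenizer such as `tiktoken`. `truncate_context` and `max_chars` are hypothetical names.

```python
from typing import List


def truncate_context(passages: List[str], max_chars: int = 4000) -> str:
    """Join passages in order until the character budget is used up."""
    kept = []
    used = 0
    for passage in passages:
        separator = 2 if kept else 0  # account for the "\n\n" joiner
        remaining = max_chars - used - separator
        if remaining <= 0:
            break
        if len(passage) > remaining:
            # Cut the last passage at a word boundary where possible
            cut = passage[:remaining].rsplit(" ", 1)[0] or passage[:remaining]
            kept.append(cut)
            break
        kept.append(passage)
        used += separator + len(passage)
    return "\n\n".join(kept)


passages = ["a" * 10, "b" * 10, "c" * 10]
context = truncate_context(passages, max_chars=25)
print(len(context))
```

Because passages are consumed in order, this works best when the input is already sorted by relevance, for example by the output of a reranker.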

## 30. Resources and Next Steps

### Documentation:
- [LangChain RAG Tutorial](https://python.langchain.com/docs/use_cases/question_answering/)
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [Vector Databases Comparison](https://www.pinecone.io/learn/vector-database/)

### Alternative Tools:
- **Embeddings**: Cohere, Voyage AI, Jina AI
- **Vector DBs**: Pinecone, Weaviate, Qdrant, Milvus
- **Search**: Tavily (LLM-optimized), Brave Search, SerpAPI
- **Reranking**: Cohere Rerank, Jina Reranker

### Next Steps:
1. Deploy with FastAPI backend
2. Build web UI with Streamlit/Gradio
3. Add authentication and rate limiting
4. Implement A/B testing for different strategies
5. Set up monitoring and analytics
6. Create evaluation pipeline
7. Scale with production-grade vector DB

### Evaluation:
- Test retrieval quality (precision, recall)
- Measure generation quality (faithfulness, relevance)
- Track latency and throughput
- Collect user feedback
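
Retrieval precision and recall can be computed from a small hand-labeled set of relevant sources per query. The sketch below is one minimal way to do that; the source IDs reuse the sample documents' `source` metadata, while the relevance labels are made up for illustration.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Compute precision@k and recall@k for one query.

    Precision divides by the number of results actually returned in the top k,
    recall by the number of relevant documents.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical labels for the query "What is RAG?"
retrieved = ["rag_guide", "ml_guide", "vector_db_guide"]
relevant = {"rag_guide", "vector_db_guide", "langchain_docs"}
precision, recall = precision_recall_at_k(retrieved, relevant, k=3)
print(f"precision@3={precision:.2f} recall@3={recall:.2f}")
```

Averaging these values over a set of labeled queries gives a baseline to compare against when changing chunk size, `k`, or the embedding model.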

Happy building! 🚀