# Modern RAG Implementation Guide - Part 1C

This notebook walks you through implementing the modern RAG application step by step. We'll build everything from scratch using 2025 best practices.

## Prerequisites Check

Make sure you've completed the setup from **nbv2-part1a-setup.ipynb** before proceeding.

In [None]:
# Verify we're in the right directory and have the right dependencies
import os
import sys

print(f"Current directory: {os.getcwd()}")
print(f"Python version: {sys.version}")

# Check if we can import the main dependencies
try:
    import fastapi
    import langchain_openai
    import langchain_community
    import pgvector
    print("✅ All dependencies available!")
except ImportError as e:
    print(f"❌ Missing dependency: {e}")
    print("Make sure you've run 'poetry install' in the v2-modern-step1 directory")
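
The check above stops at the first failed import, so it reports only one missing package at a time. As a minimal sketch (the helper name `find_missing` is ours, not part of the project), `importlib.util.find_spec` can look up every package without importing it and report all gaps in one pass:

```python
from importlib.util import find_spec


def find_missing(modules):
    """Return the names in `modules` that cannot be located, without importing them."""
    missing = []
    for name in modules:
        try:
            if find_spec(name) is None:
                missing.append(name)
        except (ImportError, ValueError):
            # find_spec can raise when a parent package is absent or a module is malformed
            missing.append(name)
    return missing


required = ["fastapi", "langchain_openai", "langchain_community", "pgvector"]
missing = find_missing(required)
print("Missing:", ", ".join(missing) if missing else "none")
```

Locating a package does not prove it imports cleanly, so treat this as a quick pre-check rather than a replacement for the `try`/`except` cell.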

## Step 1: Add PDF Documents

First, let's add some PDF documents to work with. Use the same documents as in the original project; the cell below checks that the folder exists and lists what it finds:

In [None]:
import os

# Check if PDF documents directory exists and has files
pdf_dir = "v2-modern-step1/pdf-documents"

if os.path.exists(pdf_dir):
    pdf_files = [f for f in os.listdir(pdf_dir) if f.endswith('.pdf')]
    print(f"PDF documents directory exists with {len(pdf_files)} files:")
    for file in pdf_files:
        print(f"  - {file}")
    
    if len(pdf_files) == 0:
        print("\n📁 Please add your PDF documents to the pdf-documents folder.")
        print("For this demo, you can use:")
        print("  - John F. Kennedy biography (Wikipedia PDF)")
        print("  - Robert F. Kennedy biography (Wikipedia PDF)")
        print("  - Joseph P. Kennedy biography (Wikipedia PDF)")
else:
    print("❌ PDF documents directory not found.")
    print("Make sure you're running this from the project root directory.")
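
The same check can be written with `pathlib`, which also makes it easy to match the extension case-insensitively (so `Report.PDF` is not skipped). This is a sketch under our own naming (`list_pdfs` is hypothetical); like the cell above, it looks only at the top level of the folder:

```python
from pathlib import Path


def list_pdfs(directory):
    """Return sorted PDF file names in `directory`, matching the extension case-insensitively."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


print(list_pdfs("v2-modern-step1/pdf-documents"))
```

An empty list means either the folder is missing or it holds no PDFs, so keep the explicit existence check if you need to tell those two cases apart.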

## Step 2: Environment Configuration

Let's set up our environment variables:

In [None]:
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv('v2-modern-step1/.env')

# Check if API keys are configured
openai_key = os.getenv('OPENAI_API_KEY')
langsmith_key = os.getenv('LANGCHAIN_API_KEY')

if openai_key and not openai_key.startswith('your_'):
    print("✅ OpenAI API key is configured")
else:
    print("⚠️ OpenAI API key needs to be configured in v2-modern-step1/.env")
    print("Copy .env.template to .env and add your API key")

if langsmith_key and not langsmith_key.startswith('your_'):
    print("✅ LangSmith API key is configured")
else:
    print("ℹ️ LangSmith API key not configured (optional for monitoring)")

# Show the project name that will be used in LangSmith
project_name = os.getenv('LANGCHAIN_PROJECT', 'ModernRAGStep1-2025')
print(f"📊 LangSmith project: {project_name}")
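
Both key checks follow the same rule: the value must be set and must not still be the template placeholder. A small helper keeps that rule in one place. This is only a sketch; `is_configured` is a hypothetical name, and the `your_` prefix is taken from the checks above:

```python
import os


def is_configured(value, placeholder_prefix="your_"):
    """True when an environment value is set, non-blank, and not a template placeholder."""
    if value is None:
        return False
    value = value.strip()
    return bool(value) and not value.startswith(placeholder_prefix)


for name in ("OPENAI_API_KEY", "LANGCHAIN_API_KEY"):
    status = "configured" if is_configured(os.getenv(name)) else "not configured"
    print(f"{name}: {status}")
```

The helper reports only whether a key is present; it never prints the key itself.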

## Step 3: Test Database Connection

Let's verify that PostgreSQL with PGVector is working:

In [None]:
import psycopg
from psycopg import sql

# Database connection string (same as in our code)
connection_string = "postgresql+psycopg://postgres@localhost:5432/modern_rag_db"

try:
    # psycopg.connect() takes a libpq-style conninfo string, not the
    # SQLAlchemy-style URL above, so we spell out the same parameters here
    test_conn_str = "host=localhost port=5432 dbname=modern_rag_db user=postgres"
    
    with psycopg.connect(test_conn_str) as conn:
        with conn.cursor() as cur:
            # Check if pgvector extension exists
            cur.execute("SELECT * FROM pg_extension WHERE extname = 'vector';")
            result = cur.fetchone()
            
            if result:
                print("✅ Database connection successful!")
                print("✅ PGVector extension is installed!")
            else:
                print("⚠️ Database connected, but PGVector extension not found.")
                print("Run: psql -d modern_rag_db -c 'CREATE EXTENSION vector;'")
                
except Exception as e:
    print(f"❌ Database connection failed: {e}")
    print("\nTroubleshooting steps:")
    print("1. Make sure PostgreSQL is running: brew services start postgresql")
    print("2. Create the database: psql -U postgres -c 'CREATE DATABASE modern_rag_db;'")
    print("3. Install pgvector: psql -d modern_rag_db -c 'CREATE EXTENSION vector;'")
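
The cell keeps two connection strings in sync by hand: a SQLAlchemy-style URL for LangChain and a keyword/value string for `psycopg`. As a rough sketch, the second can be derived from the first with the standard library. The helper name `to_conninfo` is ours, and it assumes simple values with no spaces or percent-encoded characters:

```python
from urllib.parse import urlsplit


def to_conninfo(sqlalchemy_url):
    """Convert a SQLAlchemy-style PostgreSQL URL into a libpq keyword/value string."""
    parts = urlsplit(sqlalchemy_url)
    fields = {
        "host": parts.hostname,
        "port": parts.port,
        "dbname": parts.path.lstrip("/") or None,
        "user": parts.username,
        "password": parts.password,
    }
    # Skip anything the URL did not specify so libpq falls back to its defaults
    return " ".join(f"{key}={value}" for key, value in fields.items() if value is not None)


print(to_conninfo("postgresql+psycopg://postgres@localhost:5432/modern_rag_db"))
# host=localhost port=5432 dbname=modern_rag_db user=postgres
```

Values that contain spaces, quotes, or URL-encoded characters would need quoting and decoding that this sketch does not attempt.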

## Step 4: Document Loading and Processing

Now let's implement the modern document loading process:

In [None]:
# Document loading with modern approach
import os
from langchain_community.document_loaders import DirectoryLoader, UnstructuredPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_experimental.text_splitter import SemanticChunker

print("📚 Loading documents...")

# Set up the document loader
pdf_directory = "v2-modern-step1/pdf-documents"
loader = DirectoryLoader(
    pdf_directory,
    glob="**/*.pdf",
    use_multithreading=True,
    show_progress=True,
    max_concurrency=50,
    loader_cls=UnstructuredPDFLoader,
)

try:
    # Load documents
    docs = loader.load()
    print(f"✅ Loaded {len(docs)} documents")
    
    # Show a sample of what we loaded
    if docs:
        sample_doc = docs[0]
        print(f"\nSample document content (first 200 chars):")
        print(f"Content: {sample_doc.page_content[:200]}...")
        print(f"Metadata: {sample_doc.metadata}")
    
except Exception as e:
    print(f"❌ Error loading documents: {e}")
    print("Make sure you have PDF files in the pdf-documents directory")
    docs = []

In [None]:
# Modern embedding model setup
print("🧠 Setting up embeddings...")

# Using the modern, cost-effective embedding model
embeddings = OpenAIEmbeddings(model='text-embedding-3-small')

print("✅ Using text-embedding-3-small (5x cheaper than ada-002!)")

# Test the embeddings
try:
    test_embedding = embeddings.embed_query("This is a test sentence.")
    print(f"✅ Embedding test successful! Vector dimension: {len(test_embedding)}")
except Exception as e:
    print(f"❌ Embedding test failed: {e}")
    print("Check your OpenAI API key in the .env file")

In [None]:
# Document chunking with SemanticChunker
if docs:  # Only proceed if we have documents
    print("✂️ Chunking documents...")
    
    # Create the semantic text splitter
    text_splitter = SemanticChunker(embeddings=embeddings)
    
    try:
        # Modern approach: No flattening needed!
        chunks = text_splitter.split_documents(docs)
        
        print(f"✅ Created {len(chunks)} chunks from {len(docs)} documents")
        
        # Show sample chunk
        if chunks:
            sample_chunk = chunks[0]
            print(f"\nSample chunk (first 300 chars):")
            print(f"Content: {sample_chunk.page_content[:300]}...")
            print(f"Metadata: {sample_chunk.metadata}")
            
    except Exception as e:
        print(f"❌ Error chunking documents: {e}")
        chunks = []
else:
    print("⏭️ Skipping chunking - no documents loaded")
    chunks = []
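
To build some intuition for semantic chunking, here is a toy sketch of the underlying idea: walk through consecutive sentences and start a new chunk wherever the embeddings of neighbouring sentences diverge sharply. This is not the library's implementation. The vectors are hand-made stand-ins for real embeddings, the fixed `threshold` is an assumption (as far as we know, `SemanticChunker` derives its breakpoint from the distribution of distances instead), and `split_on_topic_shift` is a hypothetical name:

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def split_on_topic_shift(sentences, vectors, threshold=0.5):
    """Group consecutive sentences, starting a new chunk when adjacent vectors diverge."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_distance(vectors[i - 1], vectors[i]) > threshold:
            groups.append([])  # topic shift: open a new chunk
        groups[-1].append(sentences[i])
    return [" ".join(group) for group in groups]


sentences = [
    "JFK was born in 1917.",
    "He became president in 1961.",
    "PGVector stores embeddings.",
]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]  # hand-made stand-ins for embeddings
print(split_on_topic_shift(sentences, vectors))
```

The first two sentences point in nearly the same direction and stay together; the third points elsewhere and becomes its own chunk.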

## Step 5: Vector Database Setup

Now let's create our vector database:

In [None]:
# Create vector database
if chunks:  # Only proceed if we have chunks
    print("🗄️ Creating vector database...")
    
    from langchain_community.vectorstores.pgvector import PGVector
    
    try:
        # Create the vector store
        vector_store = PGVector.from_documents(
            documents=chunks,
            embedding=embeddings,
            collection_name="modern_rag_collection",
            connection_string="postgresql+psycopg://postgres@localhost:5432/modern_rag_db",
            pre_delete_collection=True,  # Clean start
        )
        
        print(f"✅ Vector database created successfully!")
        print(f"📊 Stored {len(chunks)} document chunks")
        
        # Test similarity search
        test_query = "Who was John F. Kennedy?"
        similar_docs = vector_store.similarity_search(test_query, k=2)
        
        print(f"\n🔍 Test search for '{test_query}':")
        for i, doc in enumerate(similar_docs):
            print(f"Result {i+1}: {doc.page_content[:100]}...")
            
    except Exception as e:
        print(f"❌ Error creating vector database: {e}")
        print("Check your database connection and pgvector installation")
        vector_store = None
else:
    print("⏭️ Skipping vector database creation - no chunks available")
    vector_store = None
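
Conceptually, a similarity search embeds the query, scores every stored chunk against it, and returns the `k` best matches. The real work happens inside PostgreSQL through pgvector; the brute-force sketch below only illustrates the ranking step. It assumes cosine similarity (the metric the store actually uses depends on its configuration), and the vectors and the `top_k` helper are made up for illustration:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, items, k=2):
    """Return the k (text, score) pairs whose vectors are most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


store = [
    ("JFK biography", [0.9, 0.1, 0.0]),
    ("RFK biography", [0.7, 0.3, 0.1]),
    ("Cooking recipes", [0.0, 0.1, 0.9]),
]
for text, score in top_k([1.0, 0.0, 0.0], store, k=2):
    print(f"{score:.3f}  {text}")
```

A database index avoids scoring every row, which is why the same query stays fast as the collection grows.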

## Step 6: RAG Chain Implementation

Let's create our modern RAG chain:

In [None]:
# Modern RAG chain setup
if vector_store:
    print("🔗 Setting up RAG chain...")
    
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough
    
    # Create retriever
    retriever = vector_store.as_retriever(search_kwargs={"k": 4})
    
    # Define prompt template
    template = """Answer the question based on the following context:

Context: {context}

Question: {question}

Answer: """
    
    prompt = ChatPromptTemplate.from_template(template)
    
    # Modern LLM setup
    llm = ChatOpenAI(
        temperature=0, 
        model='gpt-4o-mini',  # Cost-effective and great for RAG
        streaming=True
    )
    
    # Document formatting function
    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)
    
    # Create the RAG chain - Modern, clean approach!
    rag_chain = (
        {
            "context": retriever | format_docs,
            "question": RunnablePassthrough()
        }
        | prompt
        | llm
        | StrOutputParser()
    )
    
    print("✅ RAG chain created successfully!")
    print("🤖 Using gpt-4o-mini for cost-effective responses")
    
else:
    print("⏭️ Skipping RAG chain setup - no vector store available")
    rag_chain = None
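
If the pipe syntax feels opaque, it can help to see the same data flow as ordinary function calls. The sketch below mirrors the four stages of the chain in plain Python. The `retrieve` and `generate` arguments are hypothetical stand-ins for the retriever and the model, so the cell runs without a database or an API key:

```python
def format_docs(docs):
    """Join retrieved passages with blank lines, as the chain's formatter does."""
    return "\n\n".join(docs)


def run_chain(question, retrieve, generate):
    """Mirror the chain's steps: retrieve -> format -> fill prompt -> generate."""
    context = format_docs(retrieve(question))  # "context": retriever | format_docs
    prompt = (  # prompt template with both slots filled
        "Answer the question based on the following context:\n\n"
        f"Context: {context}\n\nQuestion: {question}\n\nAnswer: "
    )
    return generate(prompt)  # llm | StrOutputParser()


def fake_retrieve(question):
    # Stand-in for the vector store retriever
    return ["John F. Kennedy was the 35th U.S. president."]


def fake_generate(prompt):
    # Stand-in for the chat model
    return f"(model reply to a {len(prompt)}-character prompt)"


print(run_chain("Who was John F. Kennedy?", fake_retrieve, fake_generate))
```

The question is used twice, once to fetch context and once in the prompt, which is the role `RunnablePassthrough()` plays in the real chain.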

## Step 7: Test the RAG Chain

Let's test our RAG system:

In [None]:
# Test the RAG chain
if rag_chain:
    print("🧪 Testing RAG chain...")
    
    test_questions = [
        "Who was John F. Kennedy?",
        "What was Robert Kennedy known for?",
        "Tell me about the Kennedy family."
    ]
    
    for i, question in enumerate(test_questions, 1):
        print(f"\n❓ Question {i}: {question}")
        print("💭 Thinking...")
        
        try:
            # Use the chain to get an answer
            answer = rag_chain.invoke(question)
            print(f"🤖 Answer: {answer}\n")
            print("-" * 80)
            
        except Exception as e:
            print(f"❌ Error: {e}")
            break
            
else:
    print("⏭️ Cannot test RAG chain - setup incomplete")
    print("Make sure you have:")
    print("1. PDF documents in the pdf-documents folder")
    print("2. Valid OpenAI API key")
    print("3. PostgreSQL with pgvector running")

## Step 8: FastAPI Server Implementation

Now let's look at how the modern FastAPI server works (this runs separately from the notebook):

In [None]:
# Let's examine our modern FastAPI server code
server_code = """
# v2-modern-step1/app/server.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

class QueryRequest(BaseModel):
    question: str

class QueryResponse(BaseModel):
    answer: str

@app.post("/query", response_model=QueryResponse)
async def query_documents(request: QueryRequest):
    try:
        answer = await rag_chain.ainvoke(request.question)
        return QueryResponse(answer=answer)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error: {str(e)}")

@app.post("/stream")
async def stream_query(request: QueryRequest):
    # Streaming support for real-time responses
    ...
"""

print("🚀 Modern FastAPI Server Features:")
print("✅ Direct endpoint control (no LangServe)")
print("✅ Custom error handling")
print("✅ Proper request/response models")
print("✅ Streaming support")
print("✅ Health check endpoints")
print("✅ Automatic API documentation")

print("\n📝 To run the server:")
print("1. cd v2-modern-step1")
print("2. poetry shell")
print("3. uvicorn app.server:app --reload")
print("4. Visit http://localhost:8000/docs for API documentation")
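
Once the server is running, the `/query` endpoint can be called from any HTTP client. Here is a minimal sketch using only the standard library. It assumes the server listens on `http://localhost:8000` and accepts the `QueryRequest` shape shown in the excerpt; the helper names `build_query_request` and `ask` are ours:

```python
import json
from urllib.request import Request, urlopen


def build_query_request(question, base_url="http://localhost:8000"):
    """Build a POST request for the /query endpoint with a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:8000"):
    """Send the question and return the 'answer' field of the JSON response."""
    with urlopen(build_query_request(question, base_url), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["answer"]


# ask("Who was John F. Kennedy?")  # uncomment once the server is running
```

Separating request construction from sending makes the payload easy to inspect before any network call is made.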

## Step 9: Performance and Cost Analysis

Let's analyze the improvements:

In [None]:
# Cost analysis
print("💰 Cost Analysis (2025 vs 2024):")
print("\n📊 Embedding Costs:")
print("  2024: text-embedding-ada-002 = $0.0001/1k tokens")
print("  2025: text-embedding-3-small = $0.00002/1k tokens")
print("  💡 Savings: 5x cheaper!")

print("\n🤖 LLM Costs:")
print("  2024: gpt-4-1106-preview = $0.01/1k input tokens")
print("  2025: gpt-4o-mini = $0.00015/1k input tokens")
print("  💡 Savings: ~67x cheaper!")

# Estimate costs for a typical query
if chunks:
    # Average word count over a sample of up to 10 chunks; tokens are estimated at ~1.3 per word
    sample = chunks[:10]
    avg_chunk_words = sum(len(chunk.page_content.split()) for chunk in sample) // len(sample)
    context_tokens = avg_chunk_words * 4 * 1.3  # 4 retrieved chunks, ~1.3 tokens per word
    
    old_embedding_cost = len(chunks) * avg_chunk_words * 1.3 * 0.0001 / 1000
    new_embedding_cost = len(chunks) * avg_chunk_words * 1.3 * 0.00002 / 1000
    
    old_llm_cost = context_tokens * 0.01 / 1000
    new_llm_cost = context_tokens * 0.00015 / 1000
    
    print(f"\n📈 Example for {len(chunks)} chunks:")
    print(f"  2024 embedding cost: ${old_embedding_cost:.4f}")
    print(f"  2025 embedding cost: ${new_embedding_cost:.4f}")
    print(f"  2024 LLM cost per query: ${old_llm_cost:.4f}")
    print(f"  2025 LLM cost per query: ${new_llm_cost:.4f}")
    
    total_old = old_embedding_cost + old_llm_cost
    total_new = new_embedding_cost + new_llm_cost
    savings = (total_old - total_new) / total_old * 100
    
    print(f"\n💡 Total cost reduction: {savings:.1f}%")
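
The "5x" and "~67x" figures follow directly from the per-1k-token prices printed above. The sketch below recomputes them with `Decimal` so the ratios are exact. The prices are copied from this notebook and may no longer match current pricing; `savings_factor` is a hypothetical helper name:

```python
from decimal import Decimal

# Prices per 1k tokens, copied from the figures in the cost analysis cell
PRICES = {
    "embedding": {"old": Decimal("0.0001"), "new": Decimal("0.00002")},
    "llm_input": {"old": Decimal("0.01"), "new": Decimal("0.00015")},
}


def savings_factor(old, new):
    """How many times cheaper `new` is than `old`."""
    return old / new


for name, price in PRICES.items():
    print(f"{name}: {savings_factor(price['old'], price['new']):.1f}x cheaper")
```

The LLM ratio works out to about 66.7, which the notebook rounds to 67.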

## Step 10: Monitoring with LangSmith

Check your LangSmith dashboard for monitoring:

In [None]:
# LangSmith monitoring info
import os

langsmith_configured = os.getenv('LANGCHAIN_TRACING_V2') == 'true'

if langsmith_configured:
    project_name = os.getenv('LANGCHAIN_PROJECT', 'ModernRAGStep1-2025')
    print(f"📊 LangSmith Monitoring Active!")
    print(f"Project: {project_name}")
    print(f"Dashboard: https://smith.langchain.com/")
    print("\n🔍 You can monitor:")
    print("  • Query performance")
    print("  • Token usage")
    print("  • Response quality")
    print("  • Error rates")
else:
    print("📊 LangSmith monitoring not configured")
    print("To enable monitoring, set LANGCHAIN_TRACING_V2=true in .env")

## Summary

🎉 **Congratulations!** You've successfully implemented a modern RAG application using 2025 best practices!

### What We Built:
✅ **Modern Setup**: Python 3.13.3 + Poetry 2.1.4  
✅ **Cost-Effective Models**: text-embedding-3-small + gpt-4o-mini  
✅ **Direct FastAPI**: No deprecated LangServe dependency  
✅ **Clean Code**: Simplified, readable implementation  
✅ **Proper Error Handling**: Production-ready code  
✅ **Monitoring**: LangSmith integration  

### Key Improvements Over 2024:
- 💰 **67x cheaper** LLM costs
- 💰 **5x cheaper** embedding costs
- 🚀 **Better performance** with Python 3.13
- 🧹 **Cleaner code** without deprecated dependencies
- 🔧 **More control** over API behavior

### Next Steps:
1. **Run the server**: `cd v2-modern-step1 && poetry shell && uvicorn app.server:app --reload`
2. **Test the API**: Visit `http://localhost:8000/docs`
3. **Compare with old approach**: Check out **nbv2-part1d-comparison.ipynb**

### Production Considerations:
- Add authentication and rate limiting
- Implement proper logging
- Use connection pooling for the database
- Add caching for frequently asked questions
- Monitor costs and performance in production
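
As one illustration of the caching point, repeated questions can be served from an in-memory cache so they never reach the model. This is a sketch, not a production design: `make_cached_answerer` and `slow_answer` are hypothetical names, and the cache lives only in the current process:

```python
from functools import lru_cache


def normalize(question):
    """Collapse case and whitespace so trivially different phrasings share a cache entry."""
    return " ".join(question.lower().split())


def make_cached_answerer(answer_fn, maxsize=256):
    """Wrap answer_fn so repeated (normalized) questions skip the expensive call."""

    @lru_cache(maxsize=maxsize)
    def cached(normalized_question):
        return answer_fn(normalized_question)

    return lambda question: cached(normalize(question))


calls = []


def slow_answer(question):
    # Hypothetical stand-in for rag_chain.invoke
    calls.append(question)
    return f"answer to: {question}"


cached_ask = make_cached_answerer(slow_answer)
cached_ask("Who was JFK?")
cached_ask("  who was  jfk? ")
print(len(calls))  # 1
```

Note that the model receives the normalized (lowercased) question here; if casing matters for your prompts, use the normalized text only as the cache key and pass the original through. Cached answers also go stale when the underlying documents change.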

Happy coding with modern RAG! 🚀