# Notebook 25: Backend RAG Implementation - Building the Document Processing Pipeline

## 🎯 What You'll Learn

Now that you understand the concepts behind RAG, it's time to build the real implementation! In this notebook, you'll transform your existing PDF CRUD backend into an intelligent document assistant by adding a complete RAG pipeline.

You'll implement the `/qa-pdf/{id}` endpoint that can answer any question about any uploaded PDF file using the power of LangChain, OpenAI embeddings, and vector similarity search.

## 🛠️ What We're Building

**The RAG Endpoint Architecture:**
```
POST /pdfs/qa-pdf/{id}
Body: {"question": "What are the main conclusions?"}
Response: {"answer": "Based on the document, the main conclusions are..."}
```

**The Complete Processing Pipeline:**
1. **Load PDF** from database/S3 storage
2. **Extract text** using PyPDFLoader
3. **Split into chunks** with RecursiveCharacterTextSplitter
4. **Generate embeddings** with OpenAI
5. **Create vector store** with FAISS
6. **Set up RAG chain** with RetrievalQA
7. **Process question** and return intelligent answer

---

**💡 Key Insight**: This single endpoint combines document processing, vector search, and AI generation into one seamless operation.

## Part 1: Environment Setup and Dependencies

### Required Packages Installation

**🔧 Core RAG Dependencies:**
```bash
# Navigate to your backend directory
cd 001-langchain-pdf-fastapi-backend

# Activate your virtual environment
pyenv activate your-virtual-environment-name

# Install the exact LangChain version we're using
pip install langchain==0.1.1

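# Install additional required packages
# (openai + tiktoken for embeddings/LLM, faiss-cpu for the vector store, pypdf for PyPDFLoader)
pip install openai tiktoken faiss-cpu pypdf
pip install boto3
pip install python-multipart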
```

**🔑 Environment Configuration:**

Add to your `backend/.env` file:
```env
# OpenAI API Key for embeddings and LLM
OPENAI_API_KEY=sk-your-actual-openai-key-here

# Existing environment variables
DATABASE_URL=your-database-url
AWS_ACCESS_KEY_ID=your-aws-key
AWS_SECRET_ACCESS_KEY=your-aws-secret
S3_BUCKET_NAME=your-bucket-name
```

### Understanding the Package Ecosystem

**🤖 LangChain 0.1.1:**
- **Why this version?** Stable API with all RAG components we need
- **Core modules**: Document loaders, text splitters, embeddings, vector stores, chains
- **Integration**: Works seamlessly with OpenAI and FAISS

**☁️ Boto3:**
- **Purpose**: AWS S3 integration for PDF file access
- **Usage**: Download PDF files from S3 for processing
- **Alternative**: Direct file system access if not using S3

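**📤 Python-multipart:**
- **Purpose**: Parse form data and multipart file uploads
- **Usage**: Backs the existing PDF upload endpoint (`UploadFile`/`File`); the Q&A endpoint itself takes a JSON body, which FastAPI parses without it
- **FastAPI requirement**: Needed whenever an endpoint declares `Form` or `File` parameters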

## Part 2: Enhanced Schema Definition

### Adding Question Request Schema

**📝 Update `backend/schemas.py`:**

```python
from pydantic import BaseModel
from typing import Optional

# Existing PDF schemas
class PDFRequest(BaseModel):
    name: str
    selected: bool
    file: str

class PDFResponse(BaseModel):
    id: int
    name: str
    selected: bool
    file: str

    class Config:
        from_attributes = True

# NEW: Schema for PDF Q&A requests
class QuestionRequest(BaseModel):
    question: str
    
    class Config:
        # Example for API documentation
        json_schema_extra = {
            "example": {
                "question": "What are the main conclusions of this document?"
            }
        }
```

### Why We Need This Schema

**🎯 Type Safety:**
- Ensures the request contains a valid question string
- FastAPI automatically validates the request body
- Provides clear error messages for malformed requests

**📚 API Documentation:**
- Automatically generates OpenAPI/Swagger documentation
- Provides example requests in the FastAPI docs interface
- Makes the API self-documenting for other developers

**🔧 Development Experience:**
- IDE autocomplete and type checking
- Clear contract between frontend and backend
- Easy to extend with additional parameters later

## Part 3: RAG Implementation in the PDF Router

### Complete RAG-Enhanced Router Implementation

**📝 Update `backend/routers/pdfs.py`:**

```python
from typing import List
from sqlalchemy.orm import Session
from fastapi import APIRouter, Depends, HTTPException, status, UploadFile, File
import schemas
import crud
from database import SessionLocal
from uuid import uuid4

# Basic LangChain imports for text summarization
from langchain import OpenAI, PromptTemplate
from langchain.chains import LLMChain

# RAG-specific imports for document Q&A
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from schemas import QuestionRequest

# Initialize LLM instance for RAG
llm = OpenAI(temperature=0)  # Temperature=0 for consistent, factual answers

router = APIRouter(prefix="/pdfs")

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# ... (existing CRUD endpoints remain the same) ...

# Basic LangChain text summarization (from previous tutorial)
langchain_llm = OpenAI(temperature=0)

summarize_template_string = """
        Provide a summary for the following text:
        {text}
"""

summarize_prompt = PromptTemplate(
    template=summarize_template_string,
    input_variables=['text'],
)

summarize_chain = LLMChain(
    llm=langchain_llm,
    prompt=summarize_prompt,
)

@router.post('/summarize-text')
async def summarize_text(text: str):
    summary = summarize_chain.run(text=text)
    return {'summary': summary}


# NEW: Advanced RAG endpoint for PDF question answering
@router.post("/qa-pdf/{id}")
def qa_pdf_by_id(id: int, question_request: QuestionRequest, db: Session = Depends(get_db)):
    """
    Ask a question about a specific PDF document using RAG.
    
    This endpoint:
    1. Retrieves the PDF from the database
    2. Loads and processes the PDF content
    3. Splits text into manageable chunks
    4. Creates embeddings and vector store
    5. Sets up retrieval-augmented generation
    6. Answers the question based on document content
    """
    
    # Step 1: Retrieve PDF from database
    pdf = crud.read_pdf(db, id)
    if pdf is None:
        raise HTTPException(status_code=404, detail="PDF not found")
    
    print(f"Processing PDF: {pdf.file}")  # Debug logging
    
    try:
        # Step 2: Load PDF content using PyPDFLoader
        loader = PyPDFLoader(pdf.file)
        document = loader.load()
        
        # Step 3: Split document into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=3000,      # Size of each text chunk
            chunk_overlap=400     # Overlap to maintain context
        )
        document_chunks = text_splitter.split_documents(document)
        
        print(f"Created {len(document_chunks)} text chunks")  # Debug logging
        
        # Step 4: Generate embeddings and create vector store
        embeddings = OpenAIEmbeddings()
        stored_embeddings = FAISS.from_documents(document_chunks, embeddings)
        
        # Step 5: Create RetrievalQA chain
        QA_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",   # Stuff all retrieved chunks into one prompt
            retriever=stored_embeddings.as_retriever()
        )
        
        # Step 6: Process the question and generate answer
        question = question_request.question
        answer = QA_chain.run(question)
        
        return {"answer": answer}
        
    except Exception as e:
        print(f"Error processing PDF Q&A: {str(e)}")  # Debug logging
        raise HTTPException(
            status_code=500, 
            detail=f"Error processing PDF: {str(e)}"
        )
```

## Part 4: Understanding Each Component in Detail

### Step-by-Step Breakdown of the RAG Pipeline

**🔍 Step 1: Database Retrieval**
```python
pdf = crud.read_pdf(db, id)
if pdf is None:
    raise HTTPException(status_code=404, detail="PDF not found")
```
- **Purpose**: Validate that the PDF exists and get its file path
- **Error handling**: Return 404 if PDF ID doesn't exist
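- **Security**: This lookup only checks that the PDF exists; any caller who knows an ID can query it, so add authentication and ownership checks before exposing the endpoint publicly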

**📄 Step 2: Document Loading**
```python
loader = PyPDFLoader(pdf.file)
document = loader.load()
```
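- **PyPDFLoader**: Extracts text from PDF files, producing one `Document` per page
- **Document format**: Returns a list of Document objects with content (`page_content`) and metadata (source path and page number)
- **File sources**: Works with local file paths and HTTP(S) URLs (for example a public or presigned S3 object URL); `s3://` URIs generally need a separate download step first (for example with boto3)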

**✂️ Step 3: Text Chunking**
```python
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=3000,      # 3000 characters: roughly 750 tokens (~500 words) per chunk
    chunk_overlap=400     # 400 characters (~100 tokens) shared between neighboring chunks
)
document_chunks = text_splitter.split_documents(document)
```

**Why These Parameters?**
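- **chunk_size=3000**: Balances context vs. processing efficiency. Sizes are measured in characters, so the retriever's default of 4 chunks adds roughly 3,000 tokens of context; confirm that fits your model's context window
- **chunk_overlap=400**: Repeats the end of each chunk at the start of the next, so a passage cut at a boundary still appears whole in one of them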
- **RecursiveCharacterTextSplitter**: Tries to split at natural boundaries (paragraphs, sentences)
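
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch using plain fixed-width character windows. It is a simplification: `RecursiveCharacterTextSplitter` additionally prefers to cut at separators such as paragraph breaks, and the helper name `sliding_window_chunks` is hypothetical, not a LangChain API.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-width windows that share chunk_overlap characters.

    Hypothetical helper for illustration only; it ignores natural boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(10_000))  # 10,000 characters
chunks = sliding_window_chunks(sample, chunk_size=3000, chunk_overlap=400)
print(len(chunks), [len(c) for c in chunks])
```

With these settings each window advances 2,600 characters, so a 10,000-character text yields four chunks, and the last 400 characters of one chunk are the first 400 of the next.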

**🧠 Step 4: Embedding Generation**
```python
embeddings = OpenAIEmbeddings()
stored_embeddings = FAISS.from_documents(document_chunks, embeddings)
```
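- **OpenAIEmbeddings**: Converts text to 1536-dimensional vectors (with the default `text-embedding-ada-002` model)
- **FAISS**: Facebook AI Similarity Search, a fast similarity-search library; the index built here lives in memory and is rebuilt on every request
- **Cost consideration**: Each chunk generates one embedding, billed by token count (~$0.0001 per chunk at this chunk size)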
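
Conceptually, the vector store answers one question: which stored vectors are closest to the query vector? The sketch below shows that idea as a brute-force cosine-similarity search in plain Python. The three-dimensional vectors are made up for illustration, and `top_k` is a hypothetical helper, not the FAISS API.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], vectors: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k vectors closest to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real OpenAI vectors have 1536 dimensions
chunk_vectors = [
    [0.9, 0.1, 0.0],   # chunk 0
    [0.0, 1.0, 0.0],   # chunk 1
    [0.7, 0.6, 0.1],   # chunk 2
]
question_vector = [1.0, 0.2, 0.0]

results = top_k(question_vector, chunk_vectors, k=2)
print([index for index, _ in results])
```

Here chunks 0 and 2 point in nearly the same direction as the question vector, so they are the ones a retriever would hand to the LLM.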

**🔗 Step 5: RAG Chain Setup**
```python
QA_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=stored_embeddings.as_retriever()
)
```
- **RetrievalQA**: Orchestrates the entire retrieve-then-generate process
- **"stuff" chain type**: Concatenates all retrieved chunks into one prompt
- **Alternatives**: "map_reduce", "refine" for different use cases
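
What "stuffing" means in practice can be sketched in a few lines: every retrieved chunk is joined into one context block and placed in a single prompt. The template wording and the helper `build_stuff_prompt` below are hypothetical (LangChain ships its own prompt), and the sample chunks are invented strings.

```python
from typing import List

# Hypothetical template; LangChain uses its own prompt wording
STUFF_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(chunks: List[str], question: str) -> str:
    """Join every retrieved chunk into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return STUFF_TEMPLATE.format(context=context, question=question)


retrieved = ["Revenue grew 12% in 2023.", "Costs fell by 3%."]  # made-up chunks
prompt = build_stuff_prompt(retrieved, "How did revenue change?")
print(prompt)
```

Because everything lands in one prompt, the total size of the retrieved chunks plus the question must fit within the model's context window.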

**❓ Step 6: Question Processing**
```python
question = question_request.question
answer = QA_chain.run(question)
```
- **Semantic search**: Finds most relevant chunks based on question
- **Context injection**: Combines question with relevant document excerpts
- **LLM generation**: Produces answer based on provided context

## Part 5: Error Handling and Production Considerations

### Comprehensive Error Handling Strategy

**🚨 Common Error Scenarios:**

**1. PDF Not Found:**
```python
if pdf is None:
    raise HTTPException(status_code=404, detail="PDF not found")
```

**2. File Access Issues:**
```python
try:
    loader = PyPDFLoader(pdf.file)
    document = loader.load()
except FileNotFoundError:
    raise HTTPException(status_code=404, detail="PDF file not accessible")
except Exception as e:
    raise HTTPException(status_code=500, detail=f"Error loading PDF: {str(e)}")
```

**3. OpenAI API Issues:**
```python
try:
    embeddings = OpenAIEmbeddings()
    stored_embeddings = FAISS.from_documents(document_chunks, embeddings)
except Exception as e:
    if "rate_limit" in str(e).lower():
        raise HTTPException(status_code=429, detail="OpenAI rate limit exceeded")
    elif "api_key" in str(e).lower():
        raise HTTPException(status_code=500, detail="OpenAI API key configuration error")
    else:
        raise HTTPException(status_code=500, detail=f"AI processing error: {str(e)}")
```
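
One way to keep this mapping testable is to pull it into a small pure function and call it from the `except` block. The sketch below assumes the same message-based matching as above; `classify_ai_error` is a hypothetical helper name. Matching on exception classes from the installed `openai` package would be more robust, but those class names differ between versions, so they are not used here.

```python
from typing import Tuple


def classify_ai_error(exc: Exception) -> Tuple[int, str]:
    """Map an exception from embedding/LLM calls to (status_code, detail)."""
    message = str(exc).lower()
    if "rate_limit" in message or "rate limit" in message:
        return 429, "OpenAI rate limit exceeded"
    if "api_key" in message or "api key" in message:
        return 500, "OpenAI API key configuration error"
    return 500, f"AI processing error: {exc}"


print(classify_ai_error(RuntimeError("Rate limit reached for requests")))
```

In the endpoint, the result would feed straight into `HTTPException(status_code=..., detail=...)`.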

**4. Empty or Invalid PDFs:**
```python
if not document or len(document) == 0:
    raise HTTPException(status_code=400, detail="PDF appears to be empty or unreadable")
    
if len(document_chunks) == 0:
    raise HTTPException(status_code=400, detail="No text content found in PDF")
```

### Performance and Cost Optimization

**⏱️ Processing Time Considerations:**
- **Small PDFs (1-10 pages)**: ~5-15 seconds
- **Medium PDFs (10-50 pages)**: ~15-60 seconds
- **Large PDFs (50+ pages)**: ~1-3 minutes

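**💰 Cost Breakdown (approximate; check current OpenAI pricing):**
- **Embeddings**: ~$0.0001 per text chunk (about 750 tokens at $0.0001 per 1K tokens, rounded up)
- **10-page PDF**: ~20 chunks = $0.002 (0.2 cents)
- **100-page PDF**: ~200 chunks = $0.02 (2 cents)
- **Question answering**: ~$0.002 per 1K tokens (GPT-3.5-class pricing), so a question with four retrieved chunks (~3,000 tokens of context) costs closer to $0.006
- **Note**: The endpoint as written re-embeds the PDF on every question, so the embedding cost recurs per request unless you cache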
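
The embedding figures can be reproduced with a few lines of arithmetic. This is a rough sketch, assuming about 750 tokens per 3,000-character chunk and a price of $0.0001 per 1K tokens; both defaults are assumptions to adjust against current pricing, and `embedding_cost_usd` is a hypothetical helper.

```python
def embedding_cost_usd(num_chunks: int, tokens_per_chunk: int = 750,
                       price_per_1k_tokens: float = 0.0001) -> float:
    """Estimate embedding cost; the defaults are rough assumptions, not quotes."""
    total_tokens = num_chunks * tokens_per_chunk
    return total_tokens / 1000 * price_per_1k_tokens


for pages, chunk_count in [(10, 20), (100, 200)]:
    print(f"{pages}-page PDF: ~${embedding_cost_usd(chunk_count):.4f}")
```

Under these assumptions the estimates come out slightly below the rounded per-chunk figures in the list, since a 750-token chunk costs a little less than $0.0001.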

**🎯 Optimization Strategies:**

```python
# Cache the FAISS index on disk so each PDF is embedded only once
import os
from typing import Optional

from langchain.vectorstores import FAISS

CACHE_DIR = "cache"

def get_pdf_cache_path(pdf_id: int) -> str:
    return os.path.join(CACHE_DIR, f"pdf_embeddings_{pdf_id}")

def cache_embeddings(pdf_id: int, vector_store: FAISS) -> None:
    # save_local writes the index and docstore to a folder; pickling the whole
    # store is fragile because it holds the FAISS index and the embeddings client
    os.makedirs(CACHE_DIR, exist_ok=True)
    vector_store.save_local(get_pdf_cache_path(pdf_id))

def load_cached_embeddings(pdf_id: int, embeddings) -> Optional[FAISS]:
    cache_path = get_pdf_cache_path(pdf_id)
    if os.path.exists(cache_path):
        # Newer LangChain releases also require allow_dangerous_deserialization=True
        return FAISS.load_local(cache_path, embeddings)
    return None
```
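
Around helpers like these, the endpoint would follow a get-or-build pattern: look the PDF up in the cache, and only run the expensive load, split, and embed steps on a miss. The sketch below shows that pattern with an in-process dictionary. The names `get_or_build` and `expensive_build` are hypothetical, and the string placeholder stands in for a real vector store.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def get_or_build(cache: Dict[int, T], pdf_id: int, build: Callable[[], T]) -> T:
    """Return the cached value for pdf_id, building and storing it on a miss."""
    if pdf_id not in cache:
        cache[pdf_id] = build()
    return cache[pdf_id]


build_log = []  # records every expensive build so we can count them


def expensive_build() -> str:
    # Stand-in for "load PDF, split, embed, build the FAISS index"
    build_log.append("built")
    return "vector-store-placeholder"


vector_store_cache: Dict[int, str] = {}
first = get_or_build(vector_store_cache, 7, expensive_build)
second = get_or_build(vector_store_cache, 7, expensive_build)
print(len(build_log), first is second)
```

An in-process dictionary is lost on restart and not shared between worker processes, which is why an on-disk or external cache is the more durable choice.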

## Part 6: Testing Your RAG Implementation

### Starting the Backend Server

**🚀 Run the Development Server:**
```bash
# Navigate to backend directory
cd 001-langchain-pdf-fastapi-backend

# Activate virtual environment
pyenv activate your-virtual-environment-name

# Start the server
uvicorn main:app --reload
```

**📊 Access the API Documentation:**
Open your browser and go to: `http://127.0.0.1:8000/docs`

### Testing the RAG Endpoint

**📋 Step-by-Step Testing Process:**

**1. Upload a Test PDF:**
- Use the existing `/pdfs/upload` endpoint
- Upload a PDF with readable text content
- Note the returned PDF ID

**2. Test Text Summarization First:**
- Use the `/pdfs/summarize-text` endpoint
- Input some sample text
- Verify that basic LangChain integration works

**3. Test RAG Q&A:**
- Navigate to `/pdfs/qa-pdf/{id}` in the FastAPI docs
- Enter the PDF ID from step 1
- In the request body, enter:
```json
{
  "question": "What is this document about?"
}
```
- Execute the request
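
If you prefer a script over the docs UI, the same request can be built with the standard library. This is a minimal sketch assuming the server runs at `http://127.0.0.1:8000` and that a PDF with ID 1 exists; `build_qa_request` and `ask_pdf` are hypothetical helper names.

```python
import json
import urllib.request


def build_qa_request(pdf_id: int, question: str,
                     base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build (but do not send) the POST request for the Q&A endpoint."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/pdfs/qa-pdf/{pdf_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_pdf(pdf_id: int, question: str) -> str:
    """Send the request to a running server and return the answer text."""
    with urllib.request.urlopen(build_qa_request(pdf_id, question)) as response:
        return json.loads(response.read())["answer"]


request = build_qa_request(1, "What is this document about?")
print(request.get_method(), request.full_url)
```

Calling `ask_pdf(1, "What is this document about?")` sends the request, so it only works while the development server is running.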

### Example Test Scenarios

**📖 Test with Different Question Types:**

**Factual Questions:**
```json
{
  "question": "What are the main topics covered in this document?"
}
```

**Specific Detail Questions:**
```json
{
  "question": "What statistics or numbers are mentioned?"
}
```

**Analytical Questions:**
```json
{
  "question": "What are the key conclusions or recommendations?"
}
```

**Comparative Questions:**
```json
{
  "question": "What are the advantages and disadvantages mentioned?"
}
```

### Interpreting Results

**✅ Good RAG Responses:**
- Directly reference document content
- Include specific details from the PDF
- Stay grounded in the actual text
- Acknowledge when information isn't in the document

**❌ Poor RAG Responses:**
- Generic answers not related to document
- Hallucinated information not in the PDF
- Very short or vague responses
- Error messages or incomplete processing

### Troubleshooting Common Issues

**🔧 Issue: "PDF not found" error**
- Solution: Verify the PDF was uploaded successfully and use the correct ID

**🔧 Issue: Long processing times**
- Solution: Normal for large PDFs; consider implementing progress indicators

**🔧 Issue: OpenAI API key errors**
- Solution: Verify API key in .env file and check OpenAI account balance

**🔧 Issue: Empty or generic responses**
- Solution: Check that the PDF has extractable text; scanned, image-only PDFs need OCR before they can be queried

**🔧 Issue: Server timeout errors**
- Solution: Increase timeout settings for very large documents

## Part 7: Advanced RAG Configurations and Optimizations

### Fine-Tuning RAG Parameters

**🎛️ Text Splitting Configuration:**
```python
# Separators are tried in order, so list the most significant boundary first
# and end with " " and "" so oversized pieces can always be split further.

# For technical documents with code
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,      # Smaller chunks for code
    chunk_overlap=200,    # Less overlap needed
    separators=["\n```", "\n### ", "\n\n", "\n", " ", ""]  # Code-aware separators
)

# For narrative documents (reports, books)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=4000,      # Larger chunks for context
    chunk_overlap=600,    # More overlap for narrative flow
    separators=["\n\n", "\n", ". ", ", ", " ", ""]  # Sentence-aware splitting
)

# For scientific papers
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=3000,      # Balanced approach
    chunk_overlap=500,    # Preserve methodology connections
    separators=["\nAbstract", "\nIntroduction", "\nMethods", "\n\n", "\n", " ", ""]  # Section-aware
)
```

**🔍 Retrieval Configuration:**
```python
# Configure retrieval parameters
retriever = stored_embeddings.as_retriever(
    search_type="similarity_score_threshold",  # Needed for score_threshold to act as a relevance cutoff
    search_kwargs={
        "k": 4,              # Return at most 4 relevant chunks
        "score_threshold": 0.5  # Minimum relevance score (0 to 1)
    }
)

# Alternative: MMR (Maximum Marginal Relevance) for diversity
retriever = stored_embeddings.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 4,
        "fetch_k": 10,       # Fetch 10, then diversify to 4
        "lambda_mult": 0.7   # Balance relevance vs diversity
    }
)
```
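
To see what `lambda_mult` trades off, here is a small pure-Python sketch of greedy MMR selection over toy two-dimensional vectors. It is an illustration under assumptions, not LangChain's implementation: the vectors are invented, `mmr_select` is a hypothetical helper, and the real retriever first narrows the field to `fetch_k` candidates before diversifying.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr_select(query: List[float], docs: List[List[float]], k: int,
               lambda_mult: float) -> List[int]:
    """Greedy MMR: trade relevance to the query against similarity to earlier picks."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [0.98, 0.199],   # doc 0: most relevant
    [0.97, 0.243],   # doc 1: near-duplicate of doc 0
    [0.80, -0.60],   # doc 2: less relevant but different
]
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # pure relevance
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # relevance vs. diversity
```

With `lambda_mult=1.0` the near-duplicate doc 1 is chosen second; at `0.5` its redundancy penalty outweighs its relevance and the more distinct doc 2 is chosen instead.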

**🧠 LLM Configuration:**
```python
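from langchain.chat_models import ChatOpenAI

# Factual, consistent answers
llm = OpenAI(temperature=0, max_tokens=500)

# More creative responses
llm = OpenAI(temperature=0.3, max_tokens=800)

# Use GPT-4 for better reasoning (more expensive). GPT-4 is a chat model,
# so use the ChatOpenAI wrapper rather than the completion-style OpenAI class
llm = ChatOpenAI(model_name="gpt-4", temperature=0, max_tokens=500)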
```

### Alternative Chain Types

**📚 "stuff" Chain (Current Implementation):**
```python
QA_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # Concatenate all chunks
    retriever=retriever
)
```
- **Pros**: Simple, fast for small chunks
- **Cons**: Token limit issues with many/large chunks

**🗺️ "map_reduce" Chain:**
```python
QA_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="map_reduce",  # Process chunks separately, then combine
    retriever=retriever
)
```
- **Pros**: Handles large documents better
- **Cons**: More API calls, potentially less coherent

**🔄 "refine" Chain:**
```python
QA_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="refine",  # Iteratively refine answer with each chunk
    retriever=retriever
)
```
- **Pros**: Progressive answer improvement
- **Cons**: Slower, more expensive
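
The cost difference between the three chain types comes down to how many LLM calls each makes. The sketch below mimics the three strategies with a stub in place of a real model and counts the calls. The prompts are hypothetical simplifications of what LangChain sends, and a real `map_reduce` run may need additional combine steps for very long inputs.

```python
from typing import Callable, List

LLM = Callable[[str], str]


def stuff(llm: LLM, chunks: List[str], question: str) -> str:
    # One call with every chunk in the prompt
    return llm("\n\n".join(chunks) + "\n\nQuestion: " + question)


def map_reduce(llm: LLM, chunks: List[str], question: str) -> str:
    # One call per chunk (map), then one call to combine the partial answers (reduce)
    partial = [llm(chunk + "\n\nQuestion: " + question) for chunk in chunks]
    return llm("\n".join(partial) + "\n\nCombine these into one answer to: " + question)


def refine(llm: LLM, chunks: List[str], question: str) -> str:
    # Answer from the first chunk, then revise once per remaining chunk
    answer = llm(chunks[0] + "\n\nQuestion: " + question)
    for chunk in chunks[1:]:
        answer = llm("Existing answer: " + answer + "\n\nNew context: " + chunk)
    return answer


def count_calls(strategy, chunks: List[str]) -> int:
    calls = []

    def fake_llm(prompt: str) -> str:  # stub standing in for a real model
        calls.append(prompt)
        return "answer"

    strategy(fake_llm, chunks, "What changed?")
    return len(calls)


four_chunks = ["chunk A", "chunk B", "chunk C", "chunk D"]
for strategy in (stuff, map_reduce, refine):
    print(strategy.__name__, count_calls(strategy, four_chunks))
```

For four retrieved chunks this sketch makes one call for `stuff`, five for `map_reduce`, and four sequential calls for `refine`, which is where the extra latency and cost come from.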

## Part 8: Production Deployment Considerations

### Scalability and Performance

**⚡ Async Implementation for Better Performance:**
```python
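from fastapi.concurrency import run_in_threadpool

def process_rag_question(pdf_file: str, question: str) -> str:
    """Blocking RAG pipeline: the same steps as the endpoint in Part 3"""
    document = PyPDFLoader(pdf_file).load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
    document_chunks = text_splitter.split_documents(document)
    stored_embeddings = FAISS.from_documents(document_chunks, OpenAIEmbeddings())
    QA_chain = RetrievalQA.from_chain_type(
        llm=llm, chain_type="stuff", retriever=stored_embeddings.as_retriever()
    )
    return QA_chain.run(question)

# Replaces the synchronous handler from Part 3 (register only one handler per path).
# Note: FastAPI already runs plain `def` endpoints in a thread pool; going async
# pays off once the handler also awaits other async work.
@router.post("/qa-pdf/{id}")
async def qa_pdf_by_id_async(id: int, question_request: QuestionRequest, db: Session = Depends(get_db)):
    """
    Async version of the RAG endpoint: blocking work runs in a worker thread
    """
    pdf = await run_in_threadpool(crud.read_pdf, db, id)
    if pdf is None:
        raise HTTPException(status_code=404, detail="PDF not found")

    # Run RAG processing in the shared thread pool to avoid blocking the event loop
    answer = await run_in_threadpool(process_rag_question, pdf.file, question_request.question)

    return {"answer": answer}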
```

**📦 Caching Strategy:**
```python
import json
from typing import Optional

import redis

# Initialize Redis client
redis_client = redis.Redis(host='localhost', port=6379, db=0)

def cache_pdf_embeddings(pdf_id: int, embeddings_data: dict, ttl: int = 3600):
    """Cache JSON-serializable embedding data (e.g. chunk texts and vectors) for 1 hour by default"""
    key = f"pdf_embeddings:{pdf_id}"
    redis_client.setex(key, ttl, json.dumps(embeddings_data))

def get_cached_embeddings(pdf_id: int) -> Optional[dict]:
    """Retrieve cached embeddings, or None on a cache miss"""
    key = f"pdf_embeddings:{pdf_id}"
    cached = redis_client.get(key)
    return json.loads(cached) if cached else None
```

### Monitoring and Logging

**📊 Comprehensive Logging:**
```python
import logging
import time
from functools import wraps

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def log_performance(func):
    """Decorator to log RAG processing performance"""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        try:
            result = func(*args, **kwargs)
            end_time = time.time()
            logger.info(f"RAG processing completed in {end_time - start_time:.2f} seconds")
            return result
        except Exception as e:
            end_time = time.time()
            logger.error(f"RAG processing failed after {end_time - start_time:.2f} seconds: {str(e)}")
            raise
    return wrapper

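# The route decorator must be outermost so FastAPI registers the wrapped function;
# with the order reversed, the route would point at the unwrapped handler
@router.post("/qa-pdf/{id}")
@log_performance
def qa_pdf_by_id(id: int, question_request: QuestionRequest, db: Session = Depends(get_db)):
    ...  # (implementation)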
```

**💰 Cost Tracking:**
```python
import tiktoken

def estimate_costs(text_chunks: list, question: str) -> dict:
    """Estimate OpenAI API costs for the operation"""
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
    
    # Embedding costs
    total_embedding_tokens = sum(len(encoding.encode(chunk.page_content)) for chunk in text_chunks)
    embedding_cost = (total_embedding_tokens / 1000) * 0.0001  # $0.0001 per 1K tokens
    
    # LLM costs (estimated)
    question_tokens = len(encoding.encode(question))
    context_tokens = sum(len(encoding.encode(chunk.page_content)) for chunk in text_chunks[:4])  # Top 4 chunks
    llm_cost = ((question_tokens + context_tokens) / 1000) * 0.002  # $0.002 per 1K tokens
    
    return {
        "embedding_cost": embedding_cost,
        "llm_cost": llm_cost,
        "total_cost": embedding_cost + llm_cost,
        "total_tokens": total_embedding_tokens + question_tokens + context_tokens
    }
```

## 🎯 Key Takeaways

### What You've Built:

1. **🔧 Complete RAG Pipeline**: From PDF loading to intelligent question answering
2. **🎯 Production-Ready Endpoint**: With proper error handling and validation
3. **⚡ Optimized Processing**: Efficient text chunking and embedding strategies
4. **🔍 Intelligent Retrieval**: Semantic search for finding relevant content
5. **🧠 Contextual Generation**: AI answers grounded in actual document content

### Technical Skills Mastered:

✅ **Document Processing**: PyPDFLoader, text splitting, chunking strategies  
✅ **Vector Operations**: OpenAI embeddings, FAISS vector storage  
✅ **RAG Architecture**: RetrievalQA chains, different chain types  
✅ **API Design**: RESTful endpoints with proper schemas and error handling  
✅ **Performance Optimization**: Caching, async processing, cost management  

### The RAG Implementation You've Created:

```python
# Your complete RAG endpoint
@router.post("/qa-pdf/{id}")
def qa_pdf_by_id(id: int, question_request: QuestionRequest, db: Session = Depends(get_db)):
    # 1. Validate PDF exists
    # 2. Load and process document
    # 3. Create embeddings and vector store
    # 4. Set up RAG chain
    # 5. Generate intelligent answer
    return {"answer": answer}
```

### Next Steps in Your RAG Journey:

**📊 Coming Up:**
- **Notebook 26**: Deep dive into vector databases and embedding concepts
- **Notebook 27**: Complete the frontend for a full RAG user experience

**🚀 Ready for Advanced Topics:**
You now understand the core RAG implementation. In the next notebook, we'll explore the fascinating world of vector databases and embeddings - the mathematical foundation that makes semantic search possible.

---

**🎉 Congratulations!** You've successfully implemented a production-ready RAG system that can intelligently answer questions about any PDF document. This is a significant achievement in modern AI development.

**Your RAG backend is now ready.** Test it thoroughly with different PDFs and questions to see the power of document-grounded AI in action!