# RAG System Setup Guide

This notebook demonstrates how to build a complete **Retrieval-Augmented Generation (RAG)** system using:
- **SentenceTransformers** for document embeddings
- **Pinecone** as the vector database
- **LlamaParse** for PDF processing
- **LangChain** and **Google Gemini** for text splitting and answer generation

![RAG System Architecture](rag.jpg)

## Overview
The RAG system processes insurance policy documents, creates searchable embeddings, and provides intelligent question-answering capabilities.

---

## Step 1: Initial Setup

### Suppress Warnings
First, we suppress deprecation warnings to keep the output clean during development.

In [7]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

## Step 2: Load Environment Variables

### Environment Configuration
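We load environment variables from the `.env` file, which contains:
- `PINECONE_API_KEY`: Your Pinecone API key for vector database access
- `LLAMA_CLOUD_API_KEY`: API key for the LlamaParse document parsing service
- `PINECONE_REGION` (optional): AWS region for the index; Step 4 falls back to `us-west-2` when it is unset
- `HF_HOME` (optional): Cache directory for Hugging Face models; the cell below prints it to confirm the file was loaded

Make sure to create a `.env` file in your project root with these keys.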
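
A missing key usually only surfaces later as an authentication error, so it can help to check the variables right after loading them. The sketch below is one way to do that; `missing_env_vars` and `REQUIRED_KEYS` are hypothetical names, and treating a blank value as missing is an assumption.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Keys this notebook reads from the environment (taken from the list above).
REQUIRED_KEYS = ("PINECONE_API_KEY", "LLAMA_CLOUD_API_KEY")


def missing_env_vars(required: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars(REQUIRED_KEYS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.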

In [2]:
from dotenv import load_dotenv
load_dotenv()

import os
print(os.getenv('HF_HOME'))

C:\Users\shrij\Favorites\HackRx6.0\backend\hf_models_cache


## Step 3: Initialize Embedding Model

### SentenceTransformer Setup
We initialize the SentenceTransformer model for creating embeddings:
- **Model**: `all-MiniLM-L6-v2` - A lightweight, efficient model for semantic similarity
- **Device**: Not set explicitly; SentenceTransformers uses a GPU when one is available and otherwise runs on the CPU
- **Dimension**: 384 (this model produces 384-dimensional vectors)

This model will convert text into numerical vectors that capture semantic meaning.

In [3]:
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# embeddings = model.encode(sentences)
# print(embeddings)
model




SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

## Step 4: Initialize Pinecone Vector Database

### Pinecone Setup
We connect to Pinecone, our vector database service:
- **Index Name**: `llm-rag-langchain-index` (customize as needed)
- **Dimension**: 384 (matches our embedding model)
- **Metric**: `cosine` (for semantic similarity)
- **Spec**: `serverless` (cost-effective, scales automatically)

This database will store our document embeddings for fast similarity search.
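
To make the `cosine` metric concrete, here is a minimal pure-Python sketch of cosine similarity. It uses toy 3-dimensional vectors in place of real 384-dimensional embeddings; the function name and the sample vectors are illustrative only and say nothing about how Pinecone computes scores internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings.
query = [1.0, 0.0, 1.0]
similar = [2.0, 0.0, 2.0]      # same direction as the query, different length
unrelated = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query, similar))    # close to 1: same direction
print(cosine_similarity(query, unrelated))  # 0: no overlap
```

The model summary in Step 3 ends with a `Normalize()` layer, so its embeddings have unit length; for unit vectors, cosine similarity reduces to the dot product.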

In [4]:
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))

In [5]:
# Pinecone index configuration
INDEX_NAME = "llm-rag-langchain-index"
DIMENSION = 384  # all-MiniLM-L6-v2 produces 384-dimensional embeddings
METRIC = "cosine"

if not pc.has_index(INDEX_NAME):
    print(f"Creating index: {INDEX_NAME}")
    pc.create_index(
        name=INDEX_NAME,
        dimension=DIMENSION,
        metric=METRIC,
        spec=ServerlessSpec(
            cloud='aws',
            region=os.getenv('PINECONE_REGION', 'us-west-2')
        )
    )
    print(f"Index '{INDEX_NAME}' created successfully!")
else:
    print(f"Index '{INDEX_NAME}' already exists")

# Connect to the index
index = pc.Index(INDEX_NAME)
print(f"Connected to index: {INDEX_NAME}")
print(f"Index stats: {index.describe_index_stats()}")

Index 'llm-rag-langchain-index' already exists
Connected to index: llm-rag-langchain-index
Index stats: {'dimension': 384,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {'default': {'vector_count': 28}},
 'total_vector_count': 28,
 'vector_type': 'dense'}


## Step 5: Setup Document Parser

### LlamaParse Configuration
We initialize LlamaParse for document parsing:
- **API Key**: Read from the `LLAMA_CLOUD_API_KEY` environment variable
- **Result Type**: `text` (extracts plain text from documents)
- **Other Options**: Parsing mode and timeouts are left at the library defaults

LlamaParse can handle complex document formats (PDFs, Word docs, etc.) and extract structured text.

In [None]:
from llama_parse import LlamaParse
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Initialize LlamaParse (you'll need your API key in environment)
parser = LlamaParse(
    api_key=os.getenv('LLAMA_CLOUD_API_KEY'),  # Set this in your .env file
    result_type="text"
)

2025-07-31 15:29:58,726 - httpx - INFO - HTTP Request: POST https://api.cloud.llamaindex.ai/api/parsing/upload "HTTP/1.1 200 OK"


Started parsing the file under job_id 48f6ce80-1108-4424-b762-ffe8398ada2e


2025-07-31 15:30:01,120 - httpx - INFO - HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/48f6ce80-1108-4424-b762-ffe8398ada2e "HTTP/1.1 200 OK"
2025-07-31 15:30:03,517 - httpx - INFO - HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/48f6ce80-1108-4424-b762-ffe8398ada2e "HTTP/1.1 200 OK"
2025-07-31 15:30:04,673 - httpx - INFO - HTTP Request: GET https://api.cloud.llamaindex.ai/api/parsing/job/48f6ce80-1108-4424-b762-ffe8398ada2e/result/text "HTTP/1.1 200 OK"


## Step 6: Parse the PDF

### Document Loading
This cell sends the PDF to LlamaParse and collects the parsed result:
- **Input**: Path to the policy PDF (update `pdf_path` for your own file)
- **Output**: `documents`, a list of parsed document objects whose `.text` attribute holds the extracted text
- **Logging**: The HTTP log lines shown above appear to come from this call (upload, job polling, result download)

Chunking and metadata are handled in the following steps.

In [None]:
# Parse the PDF (replace with your PDF path)
pdf_path = r"data\BAJHLIP23020V012223.pdf"  # Raw string keeps the backslash literal; update this path
documents = parser.load_data(pdf_path)

## Step 7: Define a Text-Cleaning Helper

### Whitespace Normalization
`clean_chunk` tidies each chunk before it is embedded:
- **Newlines**: Replaces runs of newlines with a single space
- **Trimming**: Strips leading and trailing whitespace
- **Spaces**: Collapses repeated spaces into one

In [67]:
import re

def clean_chunk(chunk: str) -> str:
    # Replace multiple newlines with a single space
    cleaned_text = re.sub(r'\n+', ' ', chunk)
    cleaned_text = cleaned_text.strip()
    # Replace multiple spaces with a single space (optional, but good for normalization)
    cleaned_text = re.sub(r' +', ' ', cleaned_text)
    return cleaned_text
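
A quick check of `clean_chunk` on a sample string shows the effect. The function is repeated here so the snippet runs on its own; the sample text is invented for illustration.

```python
import re


def clean_chunk(chunk: str) -> str:
    """Collapse newlines and repeated spaces into single spaces."""
    cleaned_text = re.sub(r'\n+', ' ', chunk)
    cleaned_text = cleaned_text.strip()
    cleaned_text = re.sub(r' +', ' ', cleaned_text)
    return cleaned_text


sample = "  SECTION A)\n\nPREAMBLE   \n The Policy covers...  "
print(repr(clean_chunk(sample)))
# prints 'SECTION A) PREAMBLE The Policy covers...'
```

Tabs and other whitespace characters are left untouched; replacing the two substitutions with `re.sub(r'\s+', ' ', chunk)` would normalize those as well.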

## Step 8: Configure the Text Splitter

### Chunking Parameters
We set up LangChain's `RecursiveCharacterTextSplitter`, which controls how the parsed text is cut into chunks:
- **Chunk Size**: 500 characters (adjust to your documents)
- **Overlap**: 50 characters shared between neighboring chunks to preserve context
- **Separators**: Tries paragraph breaks first, then line breaks, then spaces, and finally individual characters

Smaller chunks give more precise matches; larger chunks carry more context per match.

In [68]:
# Initialize text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Adjust based on your needs
    chunk_overlap=50,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)
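
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window chunker. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break at the listed separators); the function name and the tiny sizes are chosen only to make the overlap visible.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split `text` into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_window_chunks(text, chunk_size=10, chunk_overlap=3):
    print(chunk)
# prints:
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk starts with the last three characters of the previous one, which is the role the 50-character overlap plays in the splitter above.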

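## Step 9: Chunk, Embed, and Prepare Vectors for Upload

### Vector Formatting
The next three cells turn the parsed documents into Pinecone-ready records:
- **Chunking and Labeling**: Splits each document, cleans each chunk, and assigns `coverage_type` and `document_section` using keyword rules
- **ID Generation**: Builds IDs of the form `doc_<document index>_chunk_<chunk index>`
- **Embedding**: Encodes all chunk texts into 384-dimensional vectors with the SentenceTransformer model
- **Record Assembly**: Combines ID, vector values, and metadata (including the chunk text) into one record per chunk

This structure allows for fast retrieval and maintains traceability to original text.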

In [69]:
# Process documents and create chunks
all_chunks = []
for doc_index, doc in enumerate(documents):
    chunks = text_splitter.split_text(doc.text)
    cleaned_chunks = [clean_chunk(chunk) for chunk in chunks]

    for i, cleaned_chunk in enumerate(cleaned_chunks):
        # Extract coverage type and document section from chunk content
        coverage_type = None
        document_section = None

        # Simple keyword-based classification (substring match, first hit wins)
        chunk_lower = cleaned_chunk.lower()
        if any(word in chunk_lower for word in ['dental', 'tooth', 'teeth']):
            coverage_type = 'dental'
        elif any(word in chunk_lower for word in ['medical', 'surgery', 'treatment']):
            coverage_type = 'medical'
        elif any(word in chunk_lower for word in ['optical', 'eye', 'vision']):
            coverage_type = 'optical'

        if 'exclusion' in chunk_lower:
            document_section = 'exclusions'
        elif any(word in chunk_lower for word in ['benefit', 'coverage', 'cover']):
            document_section = 'benefits'
        elif 'definition' in chunk_lower:
            document_section = 'definitions'
        elif 'claim' in chunk_lower:
            document_section = 'claims'

        all_chunks.append({
            # Use the enumerate index instead of documents.index(doc),
            # which rescans the list on every chunk
            'id': f"doc_{doc_index}_chunk_{i}",
            'text': cleaned_chunk,
            'metadata': {
                'source': pdf_path,
                'document_index': doc_index,
                'chunk_index': i,
                'chunk_size': len(cleaned_chunk),
                'coverage_type': coverage_type,
                'document_section': document_section
            }
        })

print(f"Total chunks created: {len(all_chunks)}")
print(f"Sample chunk structure:")
print(f"ID: {all_chunks[0]['id']}")
print(f"Text preview: {all_chunks[0]['text'][:200]}...")
print(f"Metadata: {all_chunks[0]['metadata']}")

Total chunks created: 658
Sample chunk structure:
ID: doc_0_chunk_0
Text preview: Bajaj Allianz General Insurance Co. Ltd. Caringld yours Bajaj Allianz House, Airport Road, Yerawada, Pune - 411 006. Reg. No.: 113 For more details, log on to: www.bajajallianz.com | E-mail: bagichelp...
Metadata: {'source': 'data\\BAJHLIP23020V012223.pdf', 'document_index': 0, 'chunk_index': 0, 'chunk_size': 360, 'coverage_type': None, 'document_section': None}
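
The sample chunk above has `None` for both labels because none of the keywords occur in it. The same rule (substring match, first matching group wins) can also be written as a table-driven helper, which makes the order and the keywords easier to adjust. This is a sketch, not the notebook's own code: `first_matching_label` and the two rule tables are hypothetical names, the keyword lists are copied from the cell above, and the example sentence is invented.

```python
from typing import Optional, Sequence, Tuple

Rule = Tuple[str, Sequence[str]]

COVERAGE_RULES: Sequence[Rule] = (
    ("dental", ("dental", "tooth", "teeth")),
    ("medical", ("medical", "surgery", "treatment")),
    ("optical", ("optical", "eye", "vision")),
)
SECTION_RULES: Sequence[Rule] = (
    ("exclusions", ("exclusion",)),
    ("benefits", ("benefit", "coverage", "cover")),
    ("definitions", ("definition",)),
    ("claims", ("claim",)),
)


def first_matching_label(text: str, rules: Sequence[Rule]) -> Optional[str]:
    """Return the label of the first rule with a keyword contained in `text`."""
    lowered = text.lower()
    for label, keywords in rules:
        if any(keyword in lowered for keyword in keywords):
            return label
    return None


chunk = "Dental Treatment is excluded unless it requires Hospitalization."
print(first_matching_label(chunk, COVERAGE_RULES))  # prints dental
print(first_matching_label(chunk, SECTION_RULES))   # prints None
```

The second call returns `None` because "excluded" does not contain the substring "exclusion", which shows one limitation of plain keyword matching.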


In [1]:
# Generate embeddings for all chunks
print("\nGenerating embeddings...")
chunk_texts = [chunk['text'] for chunk in all_chunks]
chunk_embeddings = model.encode(chunk_texts)

print(f"Generated {len(chunk_embeddings)} embeddings")
print(f"Embedding dimension: {len(chunk_embeddings[0])}")



In [76]:
# Prepare data for Pinecone (but don't upload yet)
pinecone_data = []
for i, chunk in enumerate(all_chunks):
    pinecone_record = {
        'id': chunk['id'],
        'values': chunk_embeddings[i].tolist(),
        'metadata': {
            'text': chunk['text'],
            'source': chunk['metadata']['source'],
            'document_index': chunk['metadata']['document_index'],
            'chunk_index': chunk['metadata']['chunk_index'],
            'chunk_size': chunk['metadata']['chunk_size'],
            # Pinecone metadata cannot hold null values, so fall back to an empty string
            'coverage_type': chunk['metadata']['coverage_type'] or "",
            'document_section': chunk['metadata']['document_section'] or ""
        }
    }
    pinecone_data.append(pinecone_record)

# Display what would be added to Pinecone
print(f"\n=== PINECONE UPLOAD PREVIEW ===")
print(f"Total records to upload: {len(pinecone_data)}")
print(f"\nFirst 3 records preview:")

for i in range(min(3, len(pinecone_data))):
    record = pinecone_data[i]
    print(f"\n--- Record {i+1} ---")
    print(f"ID: {record['id']}")
    print(f"Vector dimension: {len(record['values'])}")
    print(f"Text preview: {record['metadata']['text'][:150]}...")
    print(f"Metadata: {record['metadata']}")

print(f"\n=== STATISTICS ===")
print(f"Average chunk size: {sum(len(chunk['text']) for chunk in all_chunks) / len(all_chunks):.1f} characters")
print(f"Total text length: {sum(len(chunk['text']) for chunk in all_chunks)} characters")
print(f"Embedding dimension: {DIMENSION}")


=== PINECONE UPLOAD PREVIEW ===
Total records to upload: 658

First 3 records preview:

--- Record 1 ---
ID: doc_0_chunk_0
Vector dimension: 384
Text preview: Bajaj Allianz General Insurance Co. Ltd. Caringld yours Bajaj Allianz House, Airport Road, Yerawada, Pune - 411 006. Reg. No.: 113 For more details, l...
Metadata: {'text': 'Bajaj Allianz General Insurance Co. Ltd. Caringld yours Bajaj Allianz House, Airport Road, Yerawada, Pune - 411 006. Reg. No.: 113 For more details, log on to: www.bajajallianz.com | E-mail: bagichelp@bajajallianz.co.in or BAJAJ|Allianz Call at: Sales - 1800 209 0144 / Service - 1800 209 5858 (Toll Free No.) Issuing Office: GLOBAL HEALTH CARE Policy Wordings', 'source': 'data\\BAJHLIP23020V012223.pdf', 'document_index': 0, 'chunk_index': 0, 'chunk_size': 360, 'coverage_type': '', 'document_section': ''}

--- Record 2 ---
ID: doc_0_chunk_1
Vector dimension: 384
Text preview: Policy Wordings UIN- BAJHLIP23020V012223 SECTION A) PREAMBLE...
Metadata: {'text': 'P
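
The records above store an empty string where a label is missing, because Pinecone metadata values are limited to strings, numbers, booleans, and lists of strings, and null is not accepted. An alternative is to omit the key entirely. The sketch below shows that variant; `sanitize_metadata` is a hypothetical helper, not part of the Pinecone client.

```python
from typing import Any, Dict


def sanitize_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Drop keys whose value is None and reject unsupported value types."""
    clean: Dict[str, Any] = {}
    for key, value in metadata.items():
        if value is None:
            continue  # omit the key instead of storing a placeholder
        if isinstance(value, (str, int, float, bool)):
            clean[key] = value
        elif isinstance(value, list) and all(isinstance(item, str) for item in value):
            clean[key] = value
        else:
            raise TypeError(
                f"unsupported metadata value for {key!r}: {type(value).__name__}"
            )
    return clean


record_metadata = {
    "source": "data\\BAJHLIP23020V012223.pdf",
    "chunk_index": 0,
    "coverage_type": None,
    "document_section": None,
}
print(sanitize_metadata(record_metadata))
# prints {'source': 'data\\BAJHLIP23020V012223.pdf', 'chunk_index': 0}
```

With omitted keys, an equality filter such as `{'coverage_type': 'dental'}` still only matches chunks that carry that label.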

## Step 10: Upload Vectors to Pinecone

### Vector Database Storage
This step uploads our embeddings to Pinecone:
- **Batch Upload**: Sends vectors in efficient batches
- **Index Storage**: Stores vectors in the specified index
- **Metadata Inclusion**: Preserves text and source information
- **Success Verification**: Confirms upload completion

Once uploaded, vectors are ready for similarity search queries.
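
The upload cell for this step sends every record in a single `upsert` call. Pinecone caps the size of a single request, so larger uploads are usually split into batches. The sketch below shows one way to do that; `batched`, `upsert_in_batches`, and the batch size of 100 are assumptions, and `FakeIndex` is a stand-in so the example runs without a Pinecone connection.

```python
from typing import Any, Dict, Iterator, List, Sequence


def batched(records: Sequence[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `records` with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


def upsert_in_batches(index: Any, records: Sequence[Dict[str, Any]], batch_size: int = 100) -> int:
    """Call `index.upsert(vectors=...)` once per batch; return the number of records sent."""
    sent = 0
    for batch in batched(records, batch_size):
        index.upsert(vectors=batch)
        sent += len(batch)
    return sent


class FakeIndex:
    """Stand-in for a Pinecone index that only records batch sizes."""

    def __init__(self) -> None:
        self.calls: List[int] = []

    def upsert(self, vectors: List[Dict[str, Any]]) -> None:
        self.calls.append(len(vectors))


fake = FakeIndex()
records = [{"id": f"doc_0_chunk_{i}", "values": [0.0], "metadata": {}} for i in range(658)]
print(upsert_in_batches(fake, records, batch_size=100), fake.calls)
# prints 658 [100, 100, 100, 100, 100, 100, 58]
```

With the real index, the call would be `upsert_in_batches(index, pinecone_data)`.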

In [77]:
# Upload the prepared records to the Pinecone index
index.upsert(vectors=pinecone_data)
print(f"\nUploaded {len(pinecone_data)} records to Pinecone index: {INDEX_NAME}")


## Step 11: Initialize the LLM

### Gemini Setup
We create the chat model that will generate answers from retrieved context:
- **Client**: `ChatGoogleGenerativeAI` from `langchain_google_genai`
- **Configuration**: Model name and Google API key come from a local `config` module (not shown in this notebook)
- **Temperature**: 0.1 for more consistent responses
- **Output Limit**: 1024 tokens per response

The model is used by the query functions defined in the following steps.

In [78]:
from langchain_google_genai import ChatGoogleGenerativeAI
from config import config


llm = ChatGoogleGenerativeAI(
    model=config.LLM_MODEL_NAME,
    google_api_key=config.GOOGLE_API_KEY,
    temperature=0.1,  # Low temperature for more consistent responses
    max_tokens=1024,  # Reasonable response length
    top_p=0.9,  # Nucleus sampling for better quality
)

## Step 12: Verify Index Statistics

### Database Verification
Check that vectors were successfully stored:
- **Vector Count**: Total number of vectors in the index
- **Index Dimensions**: Confirms embedding dimension (384)
- **Storage Status**: Verifies successful upload
- **Index Health**: Ensures database is ready for queries

This validation step confirms your RAG system is properly set up.

In [None]:
index.describe_index_stats()
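
The check can also be automated. The sketch below assumes the stats can be read like a dictionary with the keys shown in the Step 4 output; `check_index_stats` is a hypothetical helper. The sample values are copied from that earlier output, and 658 is the number of records prepared for upload in Step 9.

```python
from typing import Any, List, Mapping


def check_index_stats(stats: Mapping[str, Any], expected_count: int,
                      expected_dimension: int) -> List[str]:
    """Return human-readable problems found in `stats`; an empty list means all checks passed."""
    problems = []
    if stats.get("dimension") != expected_dimension:
        problems.append(
            f"dimension is {stats.get('dimension')}, expected {expected_dimension}"
        )
    if stats.get("total_vector_count") != expected_count:
        problems.append(
            f"total_vector_count is {stats.get('total_vector_count')}, expected {expected_count}"
        )
    return problems


stats = {"dimension": 384, "metric": "cosine", "total_vector_count": 28}
print(check_index_stats(stats, expected_count=658, expected_dimension=384))
# prints ['total_vector_count is 28, expected 658']
```

Vector counts may lag for a short time after an upsert, so a mismatch right after uploading is worth rechecking before treating it as a failure.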

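## Step 13: Advanced Search Function

### Enhanced Query Processing
This cell defines two functions that add query understanding on top of plain vector search:
- **Metadata Extraction**: `extract_query_metadata` asks the LLM for the query's intent, entities, coverage type, and likely document section, and falls back to generic values if the reply is not valid JSON
- **Metadata Filtering**: `enhanced_rag_query` turns the extracted coverage type and section into a Pinecone metadata filter
- **Top-K Search**: Retrieves the 5 most similar chunks
- **Answer Generation**: Passes the retrieved context and the query analysis to Gemini and returns the answer with sources and scores

The filter only matches chunks whose stored `coverage_type` and `document_section` were set by the keyword rules in Step 9.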

In [79]:
import json

def extract_query_metadata(query: str) -> dict:
    """
    Extract structured metadata from a natural language query using LLM
    """
    extraction_prompt = f"""Analyze the following insurance policy query and extract structured metadata.
    Return a JSON object with the following fields:
    - intent: The main purpose (e.g., "query_coverage", "query_deductible", "query_exclusion", "query_claim_process")
    - entities: List of relevant entities mentioned (e.g., ["dental", "surgery", "accident"])
    - coverage_type: If mentioned (e.g., "dental", "medical", "optical")
    - document_section: Likely section (e.g., "benefits", "exclusions", "definitions", "claims")
    
    Query: "{query}"
    
    Return only valid JSON:"""
    
    try:
        response = llm.invoke(extraction_prompt)
        # Parse the JSON response
        metadata = json.loads(response.content.strip())
        return metadata
    except Exception:
        # Fall back to basic metadata if the LLM call fails or returns invalid JSON
        return {
            "intent": "general_query",
            "entities": [],
            "coverage_type": None,
            "document_section": None
        }

def enhanced_rag_query(query: str):
    """
    Enhanced RAG query with metadata extraction and filtering
    """
    # Step 1: Extract query metadata
    query_metadata = extract_query_metadata(query)
    print(f"Extracted metadata: {query_metadata}")
    
    # Step 2: Create Pinecone filter based on extracted metadata
    pinecone_filter = {}
    if query_metadata.get('coverage_type'):
        pinecone_filter['coverage_type'] = query_metadata['coverage_type']
    
    # You might want to map some 'document_section' values for filtering
    # For example, if your metadata stores "benefits" and the LLM extracts "benefits"
    if query_metadata.get('document_section'):
        pinecone_filter['document_section'] = query_metadata['document_section']
    
    # If you had "entities" in your chunk metadata, you could filter on those too
    # Example: if query_metadata.get('entities'):
    #              pinecone_filter['entities'] = {"$in": query_metadata['entities']}
    # This assumes your 'entities' in chunk metadata is a list.

    print(f"Pinecone filter applied: {pinecone_filter}") # Added for debugging
    
    # Step 3: Retrieve relevant documents from Pinecone
    query_embedding = model.encode(query).tolist()
    results = index.query(
        vector=query_embedding,
        top_k=5,
        include_values=False,
        include_metadata=True,
        filter=pinecone_filter if pinecone_filter else None
    )
    
    # Step 4: Extract relevant context from search results
    context_chunks = []
    for match in results['matches']:
        context_chunks.append(match['metadata']['text'])
    
    context = "\n\n".join(context_chunks)
    
    # Step 5: Create enhanced prompt with context and extracted metadata
    prompt = f"""Based on the following context from the insurance policy document, please answer the user's question.
    If the answer cannot be found in the context, please say so.
    
    Query Analysis:
    - Intent: {query_metadata.get('intent') or 'general'}
    - Key Entities: {', '.join(query_metadata.get('entities') or [])}
    - Coverage Type: {query_metadata.get('coverage_type') or 'general'}
    
    Context:
    {context}
    
    Question: {query}
    
    Answer:"""
    
    # Step 6: Generate response using Gemini
    response = llm.invoke(prompt)
    
    return {
        'answer': response.content,
        'query_metadata': query_metadata,
        'context': context,
        'sources': [match['id'] for match in results['matches']],
        'scores': [match['score'] for match in results['matches']]
    }
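
In the test run in Step 16, the extracted metadata is the fallback dictionary, which suggests the model's reply did not parse as JSON. One common cause (an assumption here, not confirmed by the output) is that chat models wrap JSON in a Markdown code fence. The sketch below shows a more tolerant parser; `parse_llm_json` and `default_metadata` are hypothetical names, and the sample reply is invented.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # the three-backtick Markdown fence marker

# Matches an optional fenced wrapper (with or without a "json" tag) around the payload.
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL | re.IGNORECASE)


def default_metadata() -> Dict[str, Any]:
    """Fresh copy of the fallback metadata used by extract_query_metadata."""
    return {
        "intent": "general_query",
        "entities": [],
        "coverage_type": None,
        "document_section": None,
    }


def parse_llm_json(raw: str) -> Dict[str, Any]:
    """Parse a JSON object from model output, tolerating a Markdown code fence."""
    text = raw.strip()
    fenced = _FENCE_RE.match(text)
    if fenced:
        text = fenced.group(1)
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return default_metadata()
    if not isinstance(parsed, dict):
        return default_metadata()
    # Fill in any field the model left out.
    return {**default_metadata(), **parsed}


reply = FENCE + 'json\n{"intent": "query_coverage", "coverage_type": "dental"}\n' + FENCE
print(parse_llm_json(reply))
# prints {'intent': 'query_coverage', 'entities': [], 'coverage_type': 'dental', 'document_section': None}
```

Inside `extract_query_metadata`, this would replace the direct `json.loads(response.content.strip())` call.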

## Step 14: Interactive Testing

### Try Different Queries
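The cell below defines two helpers, `run_query` (retrieval only) and `rag_query` (retrieval plus generation, without metadata filtering). Use them to test your RAG system with various queries: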
- **Specific Questions**: Ask about content in your document
- **Keyword Searches**: Search for specific terms or concepts
- **Semantic Queries**: Use natural language questions
- **Relevance Testing**: Verify that results match your expectations

Experiment with different query types to understand your system's capabilities.

In [None]:
def run_query(query):
    """
    Run a query against the Pinecone index, print each match, and return the results.
    """
    query_embedding = model.encode(query).tolist()
    results = index.query(
        vector=query_embedding,
        top_k=5,
        include_values=False,  # the raw vectors are not needed for inspection
        include_metadata=True
    )
    for match in results['matches']:
        print(f"ID: {match['id']}, Score: {match['score']}, Metadata: {match['metadata']}")
    return results


def rag_query(query):
    """
    Process a query using RAG (Retrieval Augmented Generation) with Pinecone and Gemini
    """
    # Step 1: Retrieve relevant documents from Pinecone
    query_embedding = model.encode(query).tolist()
    results = index.query(
        vector=query_embedding,
        top_k=5,
        include_values=False,
        include_metadata=True
    )
    
    # Step 2: Extract relevant context from search results
    context_chunks = []
    for match in results['matches']:
        context_chunks.append(match['metadata']['text'])
    
    context = "\n\n".join(context_chunks)
    
    # Step 3: Create prompt with context and query
    prompt = f"""Based on the following context from the insurance policy document, please answer the user's question. 
                If the answer cannot be found in the context, please say so.

                Context:
                {context}

                Question: {query}

                Answer:"""
    
    # Step 4: Generate response using Gemini
    response = llm.invoke(prompt)
    
    return {
        'answer': response.content,
        'context': context,
        'sources': [match['id'] for match in results['matches']],
        'scores': [match['score'] for match in results['matches']]
    }
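
Neither query function drops weak matches, so low-scoring chunks still reach the prompt. A similarity threshold is one way to handle that. The sketch below assumes matches can be read like dictionaries with a `score` key, as in the functions above; `filter_by_score`, the threshold of 0.4, and the scores in the sample data are invented for illustration.

```python
from typing import Any, Dict, List, Sequence


def filter_by_score(matches: Sequence[Dict[str, Any]], min_score: float) -> List[Dict[str, Any]]:
    """Keep matches whose similarity score is at least `min_score`, preserving order."""
    return [match for match in matches if match["score"] >= min_score]


# Hypothetical matches shaped like the entries of results['matches'].
matches = [
    {"id": "doc_17_chunk_2", "score": 0.71, "metadata": {"text": "..."}},
    {"id": "doc_25_chunk_10", "score": 0.48, "metadata": {"text": "..."}},
    {"id": "doc_17_chunk_9", "score": 0.22, "metadata": {"text": "..."}},
]
kept = filter_by_score(matches, min_score=0.4)
print([match["id"] for match in kept])
# prints ['doc_17_chunk_2', 'doc_25_chunk_10']
```

A suitable threshold depends on the embedding model and the documents, so it is worth choosing by inspecting the scores of known good and bad matches.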

## Step 15: Complete RAG Implementation

### Full Pipeline Integration
The `rag_query` and `enhanced_rag_query` functions defined above combine all components:
- **Document Retrieval**: Searches for relevant content
- **Context Assembly**: Combines multiple relevant chunks
- **LLM Integration**: Uses retrieved context to generate answers
- **Source Attribution**: Provides references to original content

This represents a complete Retrieval-Augmented Generation system.
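
Source attribution can be made more explicit by labelling each context block with its chunk ID, so the model can cite the chunks it relied on. The sketch below is a variation on the prompt used in `rag_query`, not the notebook's own implementation; `build_prompt` is a hypothetical helper and the chunk text in the example is a placeholder.

```python
from typing import Any, Dict, Sequence


def build_prompt(query: str, matches: Sequence[Dict[str, Any]]) -> str:
    """Assemble a prompt whose context blocks are labelled with their chunk IDs."""
    blocks = [f"[{match['id']}]\n{match['metadata']['text']}" for match in matches]
    context = "\n\n".join(blocks) if blocks else "(no context retrieved)"
    return (
        "Based on the following context from the insurance policy document, "
        "please answer the user's question. Cite the bracketed chunk IDs you used.\n"
        "If the answer cannot be found in the context, please say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )


# Placeholder match shaped like an entry of results['matches'].
matches = [
    {"id": "doc_17_chunk_2", "metadata": {"text": "Placeholder text of the retrieved chunk."}},
]
print(build_prompt("Is out-patient dental treatment covered?", matches))
```

Keeping prompt assembly in its own function also makes it possible to test the prompt without calling Pinecone or the LLM.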

## Step 16: Test RAG Query

### Question-Answering Example
Test the complete RAG system with a sample question:
- **Input**: Natural language question about your document
- **Processing**: System retrieves relevant context and generates answer
- **Output**: AI-generated response based on your document content
- **Sources**: References to original text chunks

This demonstrates how RAG enhances LLM responses with your specific document content.

In [80]:
# Test the RAG system
result = enhanced_rag_query("Does this policy cover dental surgery, and what are the conditions?")
print("Answer:", result['answer'])
print("\nSources:", result['sources'])

Extracted metadata: {'intent': 'general_query', 'entities': [], 'coverage_type': None, 'document_section': None}
Pinecone filter applied: {}




Answer: Yes, dental surgery can be covered under specific conditions, but there are also general exclusions.

Here are the conditions:

1.  **General Exclusion:**
    *   Any Dental Treatment that comprises of cosmetic surgery, dentures, dental prosthesis, dental implants, orthodontics, or surgery of any kind is generally **not covered** unless it is a result of Accidental Bodily Injury to natural teeth and also requires Hospitalization (unless specified otherwise in the Table of Benefits or a Policy endorsement).
    *   Out-patient Dental Treatment expenses are generally **not covered**.

2.  **Optional Dental Plan Benefits (Part B-III):**
    *   The policy can be extended to cover certain dental-related expenses, including some

Sources: ['doc_17_chunk_2', 'doc_25_chunk_10', 'doc_17_chunk_9', 'doc_27_chunk_8', 'doc_28_chunk_1']


## Step 17: Performance and System Verification

### System Health Check
Before relying on the system, review it against this checklist:
- **Response Quality**: Evaluate answer relevance and accuracy
- **Performance Metrics**: Check response times and efficiency
- **Error Handling**: Verify graceful handling of edge cases
- **Scalability**: Test with different document sizes and query types

The cell below is a cleanup step rather than a check: `index.delete(delete_all=True)` removes every vector from the default namespace. The `404 Namespace not found` response indicates that the namespace did not exist when the cell ran.

In [6]:
index.delete(delete_all=True)

NotFoundException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Date': 'Sat, 02 Aug 2025 03:55:34 GMT', 'Content-Type': 'application/json', 'Content-Length': '55', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '56', 'x-pinecone-request-id': '6424586466494543201', 'x-envoy-upstream-service-time': '56', 'server': 'envoy'})
HTTP response body: {"code":5,"message":"Namespace not found","details":[]}


## Conclusion and Next Steps

### 🎉 Congratulations!
You've successfully built a complete RAG (Retrieval-Augmented Generation) system with:

✅ **Document Processing**: Parse and chunk documents with LlamaParse  
✅ **Semantic Embeddings**: Convert text to vectors with SentenceTransformer  
✅ **Vector Storage**: Store embeddings in Pinecone for fast search  
✅ **Intelligent Retrieval**: Find relevant content based on semantic similarity  
✅ **AI-Powered Answers**: Generate responses using retrieved context  

### 🚀 Next Steps
- **Scale Up**: Add more documents to your knowledge base
- **Customize**: Tune chunk sizes and similarity thresholds
- **Enhance**: Integrate with web interfaces or chat applications
- **Monitor**: Track performance and improve based on user feedback

### 📚 Additional Resources
- Experiment with different embedding models
- Try different chunking strategies for your document types
- Explore advanced Pinecone features like metadata filtering
- Consider integrating with other LLM providers

Your RAG system is now ready for further testing and tuning before production use.