# Retrieval-Augmented Generation

## Summary: RAG Document Q&A System

This notebook implements a Retrieval-Augmented Generation (RAG) system for querying ACME Medical Devices corporate documents.

### Test Data

**Location:** `RAGSystemInputDocs/`

**Purpose:** Synthetic corporate documents created for verification and demonstration. This test data was used to run the notebook and confirm that the RAG pipeline works end to end.

**Documents:** 8 corporate procedure PDFs covering:
- Sales Team (pricing authority, anti-kickback rules)
- Manufacturing Team (DHR requirements, non-conformance handling, cleanroom protocols)
- Product Team (launch gate reviews, design controls)
- Customer Support (complaint classification tiers, escalation procedures)
- Medical Partnerships (KOL contracts, Sunshine Act compliance)
- Accounting (expense thresholds, SOX controls)
- Legal (contract authority, litigation holds)
- Regulatory Affairs (510(k) requirements, MDR deadlines)

Each document contains team leadership contacts, mandatory rules, required approvals, and escalation procedures for legal gray areas.

### Architecture & Flow

| Step | Component | Purpose |
|------|-----------|---------|
| 1 | **Tokenizer** (AutoTokenizer) | Converts text into token IDs for model processing |
| 2 | **Context Encoder** (DPRContextEncoder) | Converts documents into vector embeddings |
| 3 | **Document Loader** (PyPDFLoader + TextSplitter) | Loads and chunks PDFs for processing |
| 4 | **Embedding Generation** | Encodes all document chunks into vectors |
| 5 | **Vector Index** (FAISS) | Stores embeddings for fast similarity search |
| 6 | **Question Encoder** (DPRQuestionEncoder) | Encodes user questions for comparison |
| 7 | **Search Function** | Finds most relevant document chunks for a query |
| 8 | **LLM Integration** (Claude API) | Generates conversational answers from retrieved context |

### Query Flow
```
User Question → Question Encoder → FAISS Search → Top-k Chunks → Claude Prompt → Answer
```

### Limitations & Areas for Future Improvement

| Limitation | Improvement |
|------------|-------------|
| Fixed chunk size/overlap may not suit varied document types | Use LangChain's SemanticChunker (from `langchain_experimental`) or implement parent-child retrieval |
| DPR model not trained on domain terminology | Fine-tune embeddings or use domain-specific models |
| High k value needed for reliable retrieval | Better embedding models (e.g., OpenAI ada-002, Cohere) |
| No persistence — index rebuilds each run | Save FAISS index to disk, add vector database (Pinecone, Chroma) |
| Small dataset | Scale testing with larger document sets |
| No reranking | Add cross-encoder reranker for improved precision |

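One way to address the persistence limitation without re-running the encoders on every run is to cache the chunk texts and the embedding matrix on disk. The sketch below is a minimal, hypothetical example: `save_corpus`, `load_corpus`, and the `embeddings.npy`/`chunks.json` file names are illustrative and not part of this notebook. It assumes the chunks are a list of strings and the embeddings a NumPy array with one row per chunk.

```python
import json
from pathlib import Path

import numpy as np


def save_corpus(chunks, embeddings, directory):
    """Persist chunk texts and their embedding matrix (hypothetical helper)."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    # FAISS works with float32, so store the matrix in that dtype
    np.save(path / "embeddings.npy", np.asarray(embeddings, dtype="float32"))
    (path / "chunks.json").write_text(json.dumps(chunks), encoding="utf-8")


def load_corpus(directory):
    """Reload what save_corpus wrote and check that texts and vectors still line up."""
    path = Path(directory)
    embeddings = np.load(path / "embeddings.npy")
    chunks = json.loads((path / "chunks.json").read_text(encoding="utf-8"))
    if len(chunks) != embeddings.shape[0]:
        raise ValueError("chunk count does not match the number of embedding rows")
    return chunks, embeddings
```

After reloading, a flat index can be rebuilt with `index.add(embeddings)` without touching the DPR encoder. FAISS can also serialize the index object itself with `faiss.write_index` and `faiss.read_index`; the chunk texts still need to be stored separately.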
### Usage

**For developers adapting this code:**

1. Replace `RAGSystemInputDocs/` with your own PDF documents
2. Set the `ANTHROPIC_API_KEY` environment variable (for example, in a `.env` file) to your own API key
3. Adjust `chunk_size` and `chunk_overlap` based on your document structure
4. Tune `k` based on your total chunk count
5. Modify the prompt template in `ask_acme()` to fit your domain

## Import necessary libraries

In [26]:
# Suppress noisy warnings while the models load
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="urllib3")
from transformers import logging
logging.set_verbosity_error()

from transformers import AutoTokenizer, DPRContextEncoder

# Pre-trained DPR context-encoder checkpoint, used for both the tokenizer and the model
model_name = 'facebook/dpr-ctx_encoder-single-nq-base'

# Load the tokenizer used to tokenize the context documents.
# For this checkpoint AutoTokenizer resolves to DPRQuestionEncoderTokenizerFast (see the output below);
# DPR context and question tokenizers are both BERT-uncased tokenizers with the same vocabulary,
# so the resulting token IDs are the same.
context_tokenizer = AutoTokenizer.from_pretrained(model_name)

# Verify it loaded
print("Tokenizer load details (printed for verification):")
print(f"Tokenizer type: {type(context_tokenizer)}")
print(f"Vocab size: {context_tokenizer.vocab_size}")

# Initialize the context encoder from the same checkpoint
context_encoder = DPRContextEncoder.from_pretrained(model_name)

Tokenizer load details (printed for verification):
Tokenizer type: <class 'transformers.models.dpr.tokenization_dpr_fast.DPRQuestionEncoderTokenizerFast'>
Vocab size: 30522


In [36]:
# Restore the default transformers verbosity (warnings); this setting is library-wide
from transformers import logging
logging.set_verbosity_warning()

## Install LangChain and pypdf for PDF processing

In [83]:
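import os
# Avoid Hugging Face tokenizers fork warnings when shell commands (e.g. !pip) fork the process
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Skip these installs if the packages are already in your environment,
# or replace them with the install command appropriate for your setup
!pip3 install langchain langchain-community langchain-text-splitters pypdf

# Current LangChain releases expose these classes from the split-out packages
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter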



## Load the ACME Medical Devices test dataset

In [111]:
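doc_folder = "RAGSystemInputDocs"
all_paragraphs = []  # every text chunk from every PDF, in load order

# Split on paragraph, line, sentence, then word boundaries, targeting chunks of up to
# 1500 characters with 300 characters of overlap between consecutive chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=300,
    separators=["\n\n", "\n", ". ", " "]
)

# sorted() makes the chunk order (and therefore the FAISS row ids) reproducible across runs
for filename in sorted(os.listdir(doc_folder)):
    if filename.endswith(".pdf"):
        loader = PyPDFLoader(os.path.join(doc_folder, filename))
        pages = loader.load()  # one Document per PDF page
        for page in pages:
            # Replace double spaces left by PDF text extraction (single pass)
            text = page.page_content.replace("  ", " ")
            # Chunking is done per page, so overlap never crosses a page boundary
            chunks = text_splitter.split_text(text)
            all_paragraphs.extend(chunks)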

print(f"Loaded {len(all_paragraphs)} text chunks from PDFs")

Loaded 24 text chunks from PDFs

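To make the `chunk_size`/`chunk_overlap` arithmetic concrete, here is a simplified fixed-window splitter. It is an illustrative assumption, not LangChain's algorithm: `RecursiveCharacterTextSplitter` additionally tries to place chunk boundaries on the configured separators rather than at exact character offsets. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text, chunk_size=1500, chunk_overlap=300):
    """Split text into fixed-size character windows that overlap by chunk_overlap characters.

    Simplified illustration only: it cuts at exact offsets and ignores word,
    sentence, and paragraph boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between the starts of consecutive windows
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks
```

With the notebook's settings (1500/300), each window starts 1200 characters after the previous one, so consecutive chunks share 300 characters of text. For example, `sliding_window_chunks("abcdefghij", 4, 2)` returns `["abcd", "cdef", "efgh", "ghij"]`.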

## Convert text chunks from documents into searchable vectors

In [112]:
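import torch

# Encode each document chunk into a DPR embedding
embeddings = []
for chunk in all_paragraphs:
    # Note: max_length=256 truncates longer inputs. A 1500-character chunk can exceed
    # 256 tokens, in which case its tail is not represented in the embedding
    # (DPR's BERT backbone accepts up to 512 tokens).
    tokens = context_tokenizer(chunk, return_tensors='pt', truncation=True, max_length=256)
    with torch.no_grad():  # inference only, no gradients needed
        embedding = context_encoder(**tokens).pooler_output  # shape: (1, 768)
    embeddings.append(embedding)

# Stack into a single (num_chunks, 768) tensor
embeddings_tensor = torch.cat(embeddings, dim=0)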

# Verify embeddings shape
print(f"Context embeddings shape: {embeddings_tensor.shape}")
print(f"  → {embeddings_tensor.shape[0]} document chunks")
print(f"  → {embeddings_tensor.shape[1]} dimensions per embedding")

Context embeddings shape: torch.Size([24, 768])
  → 24 document chunks
  → 768 dimensions per embedding


## Implement Facebook AI Similarity Search (FAISS)
* Selected FAISS (developed by FAIR) for efficient similarity search over large collections of high-dimensional vectors.
* Given a query vector, FAISS returns the k nearest stored vectors under a chosen distance metric. When a user asks a question, the question embedding is searched against the document-chunk embeddings, and the closest chunks are returned as context for the answer.

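With a flat index, FAISS performs exact, brute-force nearest-neighbour search. The NumPy sketch below (the `exact_search` helper is hypothetical) shows what the two flat-index metrics compute: squared Euclidean distance, as reported by `faiss.IndexFlatL2`, and inner product, as reported by `faiss.IndexFlatIP`. DPR's encoders were trained to score question-passage pairs by inner product, so the two metrics can rank the same vectors differently.

```python
import numpy as np


def exact_search(query, corpus, k, metric="l2"):
    """Brute-force k-nearest-neighbour search over the rows of corpus.

    metric="l2": squared Euclidean distance, smallest first (like faiss.IndexFlatL2).
    metric="ip": inner product, largest first (like faiss.IndexFlatIP).
    Returns (scores, row_indices) for the top k rows.
    """
    query = np.asarray(query, dtype="float32")
    corpus = np.asarray(corpus, dtype="float32")
    if metric == "l2":
        scores = ((corpus - query) ** 2).sum(axis=1)
        order = np.argsort(scores, kind="stable")
    elif metric == "ip":
        scores = corpus @ query
        order = np.argsort(-scores, kind="stable")
    else:
        raise ValueError(f"unknown metric: {metric}")
    top = order[:k]
    return scores[top], top
```

For example, with corpus rows `[1, 0]`, `[0, 1]`, `[3, 0]`, query `[1, 0]`, and `k=2`, `metric="l2"` returns rows `[0, 1]`, while `metric="ip"` returns rows `[2, 0]`: inner product favours the longer vector `[3, 0]`.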
In [113]:
!pip3 install faiss-cpu
import faiss
import numpy as np



In [114]:
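# FAISS expects a float32 NumPy array with one row per vector
context_embeddings_np = embeddings_tensor.detach().numpy().astype('float32')

# IndexFlatL2 performs exact (brute-force) search using squared Euclidean distance
index = faiss.IndexFlatL2(context_embeddings_np.shape[1])
index.add(context_embeddings_np)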

print(f"FAISS index created with {index.ntotal} vectors of {embeddings_tensor.shape[1]} dimensions")

FAISS index created with 24 vectors of 768 dimensions

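The index above ranks chunks by L2 distance, which relates to DPR's dot-product score through the following identity, where $q$ is the question embedding and $d$ a chunk embedding:

$$
\lVert q - d \rVert_2^2 = \lVert q \rVert_2^2 + \lVert d \rVert_2^2 - 2\, q^\top d
$$

For a fixed question, $\lVert q \rVert_2^2$ is the same for every chunk, so ranking by L2 distance is equivalent to ranking by $q^\top d - \tfrac{1}{2}\lVert d \rVert_2^2$: chunks whose embeddings have a large norm are penalized relative to a pure dot-product ranking. The two rankings coincide only when all chunk embeddings have the same norm (for example, after L2-normalizing them, when both give the cosine-similarity ranking). This also means the distances printed later by `ask_acme` are squared distances, where smaller is better.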

In [115]:
# Silence transformers warnings (this sets the library-wide verbosity, not just this cell)
from transformers import logging
logging.set_verbosity_error()

# Implement the question encoder
from transformers import DPRQuestionEncoder

question_encoder = DPRQuestionEncoder.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
question_tokenizer = AutoTokenizer.from_pretrained('facebook/dpr-question_encoder-single-nq-base')

print(f"Question encoder loaded: {type(question_encoder)}")

Question encoder loaded: <class 'transformers.models.dpr.modeling_dpr.DPRQuestionEncoder'>


In [116]:
# Encode a user question and return the k most relevant document chunks from the FAISS index
def search(question, k=3):
    """Return a list of (chunk_text, squared_L2_distance) pairs, closest first."""
    # Never request more neighbours than the index holds
    k = min(k, index.ntotal)

    # Encode the question with the DPR question encoder
    tokens = question_tokenizer(question, return_tensors='pt', truncation=True, max_length=256)
    with torch.no_grad():
        query_embedding = question_encoder(**tokens).pooler_output.numpy().astype('float32')

    # Search the FAISS index; both returned arrays have shape (1, k)
    distances, indices = index.search(query_embedding, k)

    # Pair each returned row index with its chunk text and distance
    return [(all_paragraphs[i], distances[0][j]) for j, i in enumerate(indices[0])]

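One subtle failure mode in the lookup `all_paragraphs[i]`: when fewer than k vectors are available, FAISS fills the missing result slots with index `-1`, and Python's negative indexing would then silently return the last chunk. A small hypothetical helper (`pair_results`) can drop those placeholders instead:

```python
def pair_results(indices, distances, chunks):
    """Pair FAISS result indices with their chunk texts.

    Skips the -1 placeholders FAISS returns for empty result slots, which
    chunks[-1] would otherwise silently map to the last chunk.
    """
    return [(chunks[i], float(d)) for i, d in zip(indices, distances) if i != -1]
```

Inside `search`, the return line could then become `return pair_results(indices[0], distances[0], all_paragraphs)`.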
In [57]:
!pip3 install anthropic python-dotenv  # python-dotenv provides load_dotenv()



In [136]:
import os
from anthropic import Anthropic
from dotenv import load_dotenv  # provided by the python-dotenv package

# Supply your own API key. Locally, the key is loaded from a .env file into the
# ANTHROPIC_API_KEY environment variable; substitute your own mechanism if preferred.
load_dotenv()
CLAUDE_API_KEY = os.getenv("ANTHROPIC_API_KEY")
client = Anthropic(api_key=CLAUDE_API_KEY)

# Test the API connection with a minimal request
try:
    test_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=10,
        messages=[{"role": "user", "content": "Hi"}]
    )
    print("API connection success!")
except Exception as e:
    print("API connection failed")
    print(f"Error: {e}")

API connection success!

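`os.getenv` returns `None` when the variable is unset, so it can help to fail fast with an explicit message before creating the client. `require_env` below is a hypothetical helper, not part of `python-dotenv` or the Anthropic SDK:

```python
import os


def require_env(name="ANTHROPIC_API_KEY"):
    """Return an environment variable's value, raising a clear error if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your environment or .env file")
    return value
```

Usage would be `load_dotenv()` followed by `client = Anthropic(api_key=require_env())`.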

In [133]:
# FUNCTION DEFINITION FOR INTERACTIVE Q&A DEMO

def ask_acme(question, k=15):
    """Answer a question about ACME Medical Devices from the top-k retrieved chunks.

    Increasing k consumes more tokens and slows responses, but it also retrieves relevant
    chunks that DPR ranks poorly because of terminology mismatch.
    """
    print("=" * 60)
    print("STEP 1: User Question")
    print(f"  → {question}")

    print("\n" + "=" * 60)
    print(f"STEP 2: Searching FAISS index for top {k} relevant chunks...")
    chunks = search(question, k)  # list of (chunk_text, distance) pairs
    print(f"  → Retrieved {len(chunks)} chunks (distances: {[f'{d:.1f}' for _, d in chunks]})")

    print("\n" + "=" * 60)
    print("STEP 3: Building prompt with retrieved context...")
    context = "\n\n".join(chunk_text for chunk_text, _ in chunks)
    prompt = f"""Based on the following ACME Medical Devices company documents, answer the question.
Be concise and specific. If the answer isn't in the context, say so.
Context:
{context}
Question: {question}"""
    print(f"  → Prompt built with {len(context)} characters of context")

    print("\n" + "=" * 60)
    print("STEP 4: Sending to Claude for conversational response...")
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    answer = response.content[0].text
    # Flag answers that were cut off by the max_tokens limit
    if response.stop_reason == "max_tokens":
        answer += "\n[Answer truncated: max_tokens limit reached]"

    print("\n" + "=" * 60)
    print("STEP 5: Final Answer")
    print(f"\n{answer}")
    print("\n" + "=" * 60)

    return answer

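The prompt in `ask_acme` joins the chunks without marking where one ends and the next begins. A possible variation, sketched below, separates prompt construction from the API call so the prompt can be inspected or unit-tested on its own, and wraps each chunk in a numbered tag so the answer can cite its sources. The `build_prompt` name, the `<source>` tag format, and the added citation instruction are illustrative assumptions, not the notebook's current prompt.

```python
def build_prompt(question, chunks):
    """Build a grounded prompt from retrieved (chunk_text, distance) pairs.

    Each chunk is wrapped in a numbered <source> tag so the model can refer
    back to specific passages.
    """
    sources = "\n\n".join(
        f'<source id="{n}">\n{text}\n</source>'
        for n, (text, _distance) in enumerate(chunks, start=1)
    )
    return (
        "Based on the following ACME Medical Devices company documents, answer the question.\n"
        "Be concise and specific. If the answer isn't in the context, say so.\n"
        "Cite the source ids you relied on.\n\n"
        f"Context:\n{sources}\n\n"
        f"Question: {question}"
    )
```

Inside `ask_acme`, `prompt = build_prompt(question, chunks)` would then replace the inline f-string.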
## Putting It All Together: Interactive RAG Demo (Document Q&A with FAISS Retrieval)

In [134]:
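# Simple REPL: keep answering questions until the user types quit, exit, or q
while True:
    user_input = input("\nAsk a question about ACME Medical Devices (or 'quit' to exit): ").strip()
    if user_input.lower() in ['quit', 'exit', 'q']:
        print("Goodbye!")
        break
    if not user_input:
        continue  # ignore empty input instead of sending it to Claude
    ask_acme(user_input, k=15)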


Ask a question about ACME Medical Devices (or 'quit' to exit):  What are the Device History Record requirements?


STEP 1: User Question
  → What are the Device History Record requirements?

STEP 2: Searching FAISS index for top 15 relevant chunks...
  → Retrieved 15 chunks (distances: ['100.8', '101.4', '101.8', '103.0', '103.6', '104.2', '104.6', '104.7', '104.8', '105.6', '105.9', '106.0', '106.3', '107.4', '107.8'])

STEP 3: Building prompt with retrieved context...
  → Prompt built with 15324 characters of context

STEP 4: Sending to Claude for conversational response...

STEP 5: Final Answer

Based on the ACME Medical Devices Manufacturing Team operating procedures, the Device History Record (DHR) requirements are:

**Mandatory Requirements:**
- Every device MUST have a complete DHR before release
- DHR must include: lot numbers, component traceability, test results, and operator IDs
- Missing DHR data = STOP SHIPMENT until resolved
- DHR falsification = immediate termination and FDA notification

These requirements fall under the Manufacturing Team's mandatory rules and are part of complianc


Ask a question about ACME Medical Devices (or 'quit' to exit):  Who is the VP of Regulatory Affairs?


STEP 1: User Question
  → Who is the VP of Regulatory Affairs?

STEP 2: Searching FAISS index for top 15 relevant chunks...
  → Retrieved 15 chunks (distances: ['102.9', '103.7', '104.5', '105.1', '106.0', '106.3', '107.6', '108.6', '109.8', '114.6', '114.8', '114.8', '115.0', '115.9', '116.3'])

STEP 3: Building prompt with retrieved context...
  → Prompt built with 16597 characters of context

STEP 4: Sending to Claude for conversational response...

STEP 5: Final Answer

Based on the documents provided, the VP of Regulatory Affairs is **Dr. Robert Kim** (contact: rkim@acmemeddevices.com).




Ask a question about ACME Medical Devices (or 'quit' to exit):  What are the labeling change requirements?


STEP 1: User Question
  → What are the labeling change requirements?

STEP 2: Searching FAISS index for top 15 relevant chunks...
  → Retrieved 15 chunks (distances: ['98.2', '101.7', '101.9', '109.7', '109.8', '109.9', '110.6', '111.6', '113.1', '114.3', '115.5', '117.1', '118.4', '118.9', '118.9'])

STEP 3: Building prompt with retrieved context...
  → Prompt built with 13527 characters of context

STEP 4: Sending to Claude for conversational response...

STEP 5: Final Answer

Based on the ACME Medical Devices documents, the labeling change requirements are:

**Mandatory Requirements:**
- **ANY labeling change requires RA (Regulatory Affairs) review before implementation**
- **Approver:** VP RA must approve all label changes
- **Form:** ACME-LBL-001 form must be used

**Key Considerations:**
- **IFU (Instructions for Use) changes may trigger new 510(k)** - always consult RA first
- **UDI compliance required for all Class II devices**

The documents emphasize that no labeling changes 


Ask a question about ACME Medical Devices (or 'quit' to exit):  What should I do if a customer threatens a lawsuit?


STEP 1: User Question
  → What should I do if a customer threatens a lawsuit?

STEP 2: Searching FAISS index for top 15 relevant chunks...
  → Retrieved 15 chunks (distances: ['109.7', '111.0', '111.5', '111.7', '112.8', '113.1', '114.2', '116.0', '118.2', '119.2', '120.0', '121.2', '122.0', '122.5', '122.5'])

STEP 3: Building prompt with retrieved context...
  → Prompt built with 15185 characters of context

STEP 4: Sending to Claude for conversational response...

STEP 5: Final Answer

Based on the ACME Medical Devices documents, if a customer threatens a lawsuit, you should:

1. **Remain calm and professional**
2. **Do NOT apologize or admit fault**
3. **Document verbatim** what the customer said
4. **Escalate immediately to Legal (Angela Martinez)**
5. **Do not contact the customer again without Legal guidance**

This procedure applies specifically when a customer mentions lawsuit, attorney, or legal action during any interaction.




Ask a question about ACME Medical Devices (or 'quit' to exit):  quit


Goodbye!

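The loop above relies on `input()`, which may not work when the notebook is executed non-interactively. A hypothetical `run_session` helper, sketched below under that assumption, applies the same quit-word and blank-line handling to any iterable of questions and any answering function:

```python
QUIT_WORDS = {"quit", "exit", "q"}


def run_session(questions, answer_fn):
    """Answer questions in order until a quit word appears; blank entries are skipped."""
    answers = []
    for raw in questions:
        question = raw.strip()
        if not question:
            continue
        if question.lower() in QUIT_WORDS:
            break
        answers.append(answer_fn(question))
    return answers
```

For a scripted run, `run_session(["What are the Device History Record requirements?", "quit"], ask_acme)` would answer the first question with the default `k=15` and then stop.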