# 🚀 Comprehensive RAG Demo - All Methods

## Overview

This notebook demonstrates **three different RAG approaches** in a progressive learning format:

1. **📚 Basic Text RAG**: Simple embedding with sample company data
2. **📄 Document RAG**: Processing real PDF and Word documents
3. **🖼️ Multi-Modal RAG**: Text + Images using GPT-4o Vision

Perfect for intern training and understanding RAG evolution!

### What You'll Learn:
- ✅ Core RAG concepts and implementation
- ✅ Document processing techniques
- ✅ Multi-modal AI capabilities
- ✅ Real-world applications
- ✅ OpenAI vision API usage with GPT-4o

## 📦 Setup and Installation

In [None]:
# Install required packages
!pip install langchain langchain-openai langchain-chroma langchain-community
!pip install tiktoken chromadb pypdf docx2txt PyMuPDF
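# numpy and scikit-learn are needed by the embedding similarity cell in Section 1
!pip install python-dotenv numpy scikit-learn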

In [None]:
# Import required libraries
import os
import sys
from pathlib import Path
import base64
from io import BytesIO

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.messages import HumanMessage

print("✅ All libraries imported successfully!")

In [None]:
# Load environment variables from .env file
def load_env_file():
    """Load environment variables from .env file"""
    code_snippets_path = Path("Code Snippets")
    env_file = code_snippets_path / ".env"
    
    if env_file.exists():
        with open(env_file, 'r') as f:
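            for line in f:
                line = line.strip()
                # Skip blank lines, comments, and lines without a key=value pair
                if not line or line.startswith('#') or '=' not in line:
                    continue
                key, value = line.split('=', 1)
                # Trim whitespace around the key and strip optional quotes from the value
                os.environ[key.strip()] = value.strip().strip("'\"")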
        print("✅ Environment variables loaded from .env file")
        return True
    else:
        print("❌ .env file not found")
        return False

# Load the environment variables
load_env_file()

# Check if API key is available
if os.getenv("OPENAI_API_KEY"):
    print("✅ OpenAI API key loaded successfully!")
else:
    print("❌ OpenAI API key not found. Please check your .env file.")

## 🏗️ ComprehensiveRAGDemo Class Definition

In [None]:
class ComprehensiveRAGDemo:
    """
    Comprehensive RAG implementation with multiple approaches:
    1. Basic text RAG with sample data
    2. Document RAG with PDF/Word files
    3. Multi-modal RAG with text + images
    """
    
    def __init__(self):
        """Initialize comprehensive RAG system."""
        print("🚀 Initializing Comprehensive RAG Demo...")
        
        if not os.getenv("OPENAI_API_KEY"):
            raise ValueError("OPENAI_API_KEY not found")
        
        # Initialize models
        self.embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
        self.llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
        # Vision-capable model used to describe images extracted from PDFs
        self.vision_llm = ChatOpenAI(model="gpt-4o", max_tokens=1024)
        
        # Text splitter
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len,
            separators=["\n\n", "\n", " ", ""]
        )
        
        self.vectorstore = None
        self.qa_chain = None
        self.processed_content = []
        
        print("✅ Comprehensive RAG Demo initialized!")
        print("🔧 Supports: Basic Text, Document Processing, Multi-Modal")
    
    def query(self, question):
        """Ask a question using the current RAG setup."""
        if self.qa_chain is None:
            raise ValueError("No knowledge base created!")
        
        result = self.qa_chain.invoke({"query": question})
        
        return {
            "answer": result["result"],
            "source_documents": result["source_documents"]
        }

print("✅ ComprehensiveRAGDemo class defined!")

## 🔑 Initialize RAG System

In [None]:
# Initialize the comprehensive RAG demo
rag_demo = ComprehensiveRAGDemo()

print("🎯 RAG system ready with:")
print("   📊 OpenAI Embeddings (ada-002)")
print("   🤖 GPT-3.5-turbo for text generation")
print("   👁️ GPT-4o for vision analysis")

---

# 📚 SECTION 1: Basic Text RAG

Let's start with the fundamentals - basic text embedding and retrieval using sample company data.

### Learning Objectives:
- Understand document chunking
- See embedding creation
- Experience vector similarity search
- Learn answer generation with sources
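
The chunking step can be pictured as a sliding window over the text. The sketch below is a simplified character-window splitter, not the `RecursiveCharacterTextSplitter` used in this notebook (which also tries to break on separators such as paragraph and line boundaries). The function name `sliding_window_chunks` and the small sizes are illustrative assumptions.

```python
def sliding_window_chunks(text, chunk_size=40, chunk_overlap=10):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: consecutive chunks share
    `chunk_overlap` characters, so text near a boundary appears
    in both neighbouring chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "Employees can work from home up to 3 days per week with manager approval."
for i, chunk in enumerate(sliding_window_chunks(sample), 1):
    print(f"{i}: {chunk!r}")
```

Larger overlaps repeat more text across chunks, which costs extra embeddings but lowers the chance that a relevant sentence is cut in half.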

In [None]:
# Add Basic Text RAG methods to our class
def demo_basic_text_rag(self):
    """Demonstrate basic RAG with sample text data."""
    print("\n" + "="*60)
    print("📚 SECTION 1: BASIC TEXT RAG DEMO")
    print("="*60)
    
    # Sample company documents
    sample_docs = [
        """
        Company Policy: Remote Work Guidelines
        
        Our company supports flexible remote work arrangements. Employees can work from home 
        up to 3 days per week with manager approval. Remote work days must be scheduled in advance.
        
        Equipment: The company provides laptops and necessary software for remote work.
        Communication: Daily check-ins via Slack are required for remote workers.
        Productivity: Remote workers must maintain the same productivity standards as office workers.
        """,
        
        """
        Employee Benefits Overview
        
        Health Insurance: Full medical, dental, and vision coverage provided.
        Vacation Policy: 20 days of paid vacation per year, plus 10 sick days.
        Professional Development: $2000 annual budget for training and conferences.
        Retirement: 401k with 4% company matching.
        Wellness: Free gym membership and mental health support.
        """,
        
        """
        IT Security Guidelines
        
        Password Requirements: Minimum 12 characters with special characters.
        VPN: Required for all remote connections to company systems.
        Software Updates: Automatic updates must be enabled on all devices.
        Data Protection: No company data on personal devices without encryption.
        Incident Reporting: Security incidents must be reported within 1 hour.
        """,
        
        """
        Meeting Room Booking System
        
        Conference rooms can be booked through the company portal.
        Maximum booking duration: 4 hours per session.
        Cancellation: Must cancel at least 2 hours in advance.
        Equipment: All rooms have projectors, whiteboards, and video conferencing.
        Catering: Can be arranged through HR for meetings over 2 hours.
        """
    ]
    
    print(f"📄 Processing {len(sample_docs)} sample documents...")
    
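    # Convert to Document objects, tagging each with a source so retrieval output can cite it
    documents = [
        Document(page_content=doc, metadata={"source": f"sample_doc_{i}", "type": "Sample_Text"})
        for i, doc in enumerate(sample_docs, 1)
    ]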
    
    # Create knowledge base
    self._create_basic_knowledge_base(documents)
    
    # Demo questions
    demo_questions = [
        "How many days can I work from home?",
        "What's our vacation policy?",
        "What are the password requirements?",
        "How do I book a meeting room?"
    ]
    
    print("\n🎪 Basic RAG Demo Questions:")
    for i, question in enumerate(demo_questions, 1):
        print(f"\n{i}. {question}")
        result = self.query(question)
        print(f"💡 Answer: {result['answer']}")
        print(f"📖 Sources: {len(result['source_documents'])} chunks")
    
    print("\n✅ Basic Text RAG demonstration complete!")

def _create_basic_knowledge_base(self, documents):
    """Create knowledge base from basic documents."""
    # Split documents
    chunks = self.text_splitter.split_documents(documents)
    print(f"🔪 Created {len(chunks)} chunks")
    
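    # Create an in-memory vector store (persist_directory=None).
    # A distinct collection name keeps these chunks apart from later sections:
    # in-memory Chroma stores created in one process can share the default collection.
    self.vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=self.embeddings,
        collection_name="basic_text_rag",
        persist_directory=None
    )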
    
    # Create QA chain
    self.qa_chain = RetrievalQA.from_chain_type(
        llm=self.llm,
        chain_type="stuff",
        retriever=self.vectorstore.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True
    )
    
    print("✅ Basic knowledge base created!")
    
    # Show embedding details for educational purposes
    self._show_embedding_details(chunks[:2])  # Show first 2 chunks

def _show_embedding_details(self, sample_chunks):
    """Show embedding details for educational purposes."""
    print("\n🔍 EMBEDDING DETAILS (Educational):")
    print("-" * 50)
    
    for i, chunk in enumerate(sample_chunks, 1):
        print(f"\n📄 Chunk {i}:")
        print(f"Text: {chunk.page_content[:100]}...")
        print(f"Length: {len(chunk.page_content)} characters")
        
        # Generate embedding for this chunk
        try:
            embedding = self.embeddings.embed_query(chunk.page_content)
            print(f"Embedding dimensions: {len(embedding)}")
            print(f"Embedding type: {type(embedding)}")
            print(f"First 10 values: {embedding[:10]}")
            print(f"Embedding range: [{min(embedding):.4f}, {max(embedding):.4f}]")
        except Exception as e:
            print(f"Error generating embedding: {e}")
    
    print("\n💡 Key Points:")
    print("   • Each text chunk becomes a 1536-dimensional vector")
    print("   • Similar texts have similar embeddings (cosine similarity)")
    print("   • Vector database enables fast similarity search")

def query_database_directly(self, query_text, k=3):
    """Query the vector database directly and show similarity scores."""
    if self.vectorstore is None:
        print("❌ No vector database available. Create knowledge base first.")
        return
    
    print(f"\n🔍 DIRECT DATABASE QUERY: '{query_text}'")
    print("-" * 60)
    
    # Get query embedding
    query_embedding = self.embeddings.embed_query(query_text)
    print(f"Query embedding dimensions: {len(query_embedding)}")
    print(f"Query embedding preview: {query_embedding[:5]}...")
    
    # Search with similarity scores
    try:
        # Use similarity_search_with_score for detailed results
        results = self.vectorstore.similarity_search_with_score(query_text, k=k)
        
        print(f"\n📊 Top {k} Similar Chunks:")
        for i, (doc, score) in enumerate(results, 1):
            print(f"\n{i}. Distance Score: {score:.4f} (lower = more similar)")
            print(f"   Source: {doc.metadata.get('source', 'Unknown')}")
            print(f"   Type: {doc.metadata.get('type', 'Unknown')}")
            print(f"   Content: {doc.page_content[:150]}...")
        
        print("\n💡 Understanding Similarity Scores:")
        print("   • Scores are distances, so lower = more similar")
        print("   • Chroma's default metric is squared L2 distance")
        print("   • For unit-length embeddings this equals 2 - 2 * cosine similarity")
        print("   • Rough rule of thumb: < 0.5 close match, 0.5-1.0 moderate, > 1.0 weak")
        print("   • Exact thresholds depend on the embedding model, so calibrate on your data")
        
        return results
        
    except Exception as e:
        print(f"❌ Error querying database: {e}")
        return None

def show_vector_database_stats(self):
    """Show statistics about the vector database."""
    if self.vectorstore is None:
        print("❌ No vector database available.")
        return
    
    print("\n📊 VECTOR DATABASE STATISTICS:")
    print("-" * 40)
    
    try:
        # Get collection info
        collection = self.vectorstore._collection
        count = collection.count()
        
        print(f"Total vectors stored: {count}")
        print(f"Embedding model: {self.embeddings.model}")
        print(f"Vector dimensions: 1536 (OpenAI ada-002)")
        print(f"Database type: ChromaDB (in-memory)")
        
        # Sample a few vectors to show distribution
        if count > 0:
            sample_docs = self.vectorstore.similarity_search("sample", k=min(3, count))
            print(f"\n📄 Sample stored chunks:")
            for i, doc in enumerate(sample_docs, 1):
                print(f"   {i}. {doc.page_content[:80]}...")
        
    except Exception as e:
        print(f"Error getting database stats: {e}")

# Add methods to our class
ComprehensiveRAGDemo.demo_basic_text_rag = demo_basic_text_rag
ComprehensiveRAGDemo._create_basic_knowledge_base = _create_basic_knowledge_base
ComprehensiveRAGDemo._show_embedding_details = _show_embedding_details
ComprehensiveRAGDemo.query_database_directly = query_database_directly
ComprehensiveRAGDemo.show_vector_database_stats = show_vector_database_stats

print("✅ Basic Text RAG methods added with embedding visualization!")

In [None]:
# Run Basic Text RAG Demo
rag_demo.demo_basic_text_rag()

### 🎓 Key Takeaways from Basic RAG:

1. **Document Chunking**: Text is split into manageable pieces
2. **Embeddings**: Each chunk becomes a vector representation
3. **Vector Search**: Find most similar chunks to user question
4. **Context Injection**: Relevant chunks are added to LLM prompt
5. **Answer Generation**: LLM creates response based on context
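
The five steps above can be sketched end to end without any API calls. This is a toy illustration under loud assumptions: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, and `build_prompt` only mimics the idea behind the "stuff" chain type (paste the retrieved chunks into one prompt). Neither is LangChain's actual implementation.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Vector search: rank chunks by similarity to the question, keep the top k."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Context injection: place the retrieved chunks ahead of the question."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


chunks = [
    "Vacation Policy: 20 days of paid vacation per year, plus 10 sick days.",
    "VPN: Required for all remote connections to company systems.",
    "Conference rooms can be booked through the company portal.",
]
question = "How many vacation days do I get per year?"
top_chunks = retrieve(question, chunks, k=1)
print(build_prompt(question, top_chunks))
```

The printed prompt is what an LLM would receive in the final step. A real system swaps `toy_embed` for a learned embedding model so that paraphrases match even when they share no words.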

**Try your own questions about the company policies!**

In [None]:
# Try your own question with basic RAG
question = "What equipment does the company provide for remote work?"

result = rag_demo.query(question)
print(f"❓ Question: {question}")
print(f"💡 Answer: {result['answer']}")
print(f"📖 Sources: {len(result['source_documents'])} chunks used")

### 🔍 Deep Dive: Understanding Embeddings and Vector Search

Let's explore what's happening under the hood!
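
One relationship helps when reading the scores in the next cells. For unit-length vectors, squared Euclidean distance and cosine similarity carry the same information: `squared distance = 2 - 2 * cosine similarity`. The sketch below checks this numerically with small hand-made vectors. It assumes the vector store reports squared L2 distance and that the embedding vectors are normalized to length 1; both assumptions are worth verifying for your installed versions.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity_unit(a, b):
    """Cosine similarity of two unit vectors is just their dot product."""
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalize([1.0, 2.0, 3.0])
b = normalize([1.0, 2.0, 2.5])   # points in nearly the same direction as a
c = normalize([-3.0, 0.5, 0.0])  # points in a very different direction

for name, other in [("b", b), ("c", c)]:
    cos = cosine_similarity_unit(a, other)
    dist = squared_l2(a, other)
    print(f"a vs {name}: cosine={cos:.4f}  squared_l2={dist:.4f}  2-2cos={2 - 2 * cos:.4f}")
```

The last two columns agree for every pair, which is why a small distance score and a high cosine similarity describe the same match.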

In [None]:
# Show vector database statistics
rag_demo.show_vector_database_stats()

In [None]:
# Query the database directly to see similarity scores
query_text = "remote work policy"
results = rag_demo.query_database_directly(query_text, k=3)

print("\n🎯 This shows you exactly how vector similarity search works!")
print("The RAG system uses these top chunks to generate the final answer.")

In [None]:
# Compare different queries and their similarity scores
test_queries = [
    "vacation days",
    "password security",
    "meeting room booking",
    "completely unrelated topic"
]

print("🧪 SIMILARITY COMPARISON TEST:")
print("=" * 50)

for query in test_queries:
    print(f"\n🔍 Query: '{query}'")
    results = rag_demo.query_database_directly(query, k=1)
    if results:
        best_score = results[0][1]
        print(f"   Best similarity score: {best_score:.4f}")
        if best_score < 0.5:
            print("   ✅ Excellent match found!")
        elif best_score < 1.0:
            print("   ⚠️ Moderate match found")
        else:
            print("   ❌ Poor match - might hallucinate")

### 🎓 Embedding Visualization Exercise

Let's see how different texts create different embeddings:

In [None]:
# Compare embeddings of similar vs different texts
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Test texts
texts = [
    "Employees can work from home",
    "Remote work is allowed for staff",  # Similar meaning
    "The sky is blue today",  # Completely different
    "Password must be 12 characters"  # Different topic
]

print("🧮 EMBEDDING SIMILARITY ANALYSIS:")
print("=" * 50)

# Generate embeddings
embeddings = []
for text in texts:
    embedding = rag_demo.embeddings.embed_query(text)
    embeddings.append(embedding)
    print(f"\n📝 Text: '{text}'")
    print(f"   Embedding dimensions: {len(embedding)}")
    print(f"   First 5 values: {embedding[:5]}")

# Calculate cosine similarities
print("\n🔗 COSINE SIMILARITY MATRIX:")
print("(1.0 = same direction, values near 0.0 = unrelated; range is -1.0 to 1.0)")
print("-" * 50)

similarity_matrix = cosine_similarity(embeddings)

for i, text1 in enumerate(texts):
    for j, text2 in enumerate(texts):
        if i <= j:  # Only show upper triangle
            similarity = similarity_matrix[i][j]
            print(f"'{text1[:20]}...' vs '{text2[:20]}...': {similarity:.3f}")

print("\n💡 Notice how similar meanings have higher cosine similarity!")
print("This is the foundation of semantic search in RAG systems.")

---

# 📄 SECTION 2: Document RAG

Now let's process real documents from your EmbeddingDocs folder!

### What's Different:
- Real PDF and Word document processing
- Automatic text extraction
- Metadata preservation
- Larger, more complex knowledge base
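
Metadata preservation matters because each chunk has to remember which file and page it came from. The sketch below shows the idea with a hypothetical `Chunk` dataclass and `split_with_metadata` helper. LangChain's `split_documents` performs an equivalent copy of `Document.metadata` onto each chunk, but the code here is an illustration, not its implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical minimal record: chunk text plus the metadata of its parent page."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_with_metadata(pages, chunk_size=50):
    """Split each page's text and copy the page metadata onto every chunk."""
    chunks = []
    for page in pages:
        text = page["text"]
        for start in range(0, len(text), chunk_size):
            chunks.append(
                Chunk(
                    text=text[start:start + chunk_size],
                    metadata=dict(page["metadata"]),  # copy, so chunks do not share one dict
                )
            )
    return chunks


# Made-up page records standing in for what a PDF loader returns
pages = [
    {"text": "A" * 120, "metadata": {"source": "report.pdf", "page": 1, "type": "PDF_Text"}},
    {"text": "B" * 30, "metadata": {"source": "report.pdf", "page": 2, "type": "PDF_Text"}},
]

for chunk in split_with_metadata(pages):
    print(chunk.metadata["source"], chunk.metadata["page"], len(chunk.text))
```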

In [None]:
# Add Document RAG methods to our class
def demo_document_rag(self):
    """Demonstrate document RAG with PDF and Word files."""
    print("\n" + "="*60)
    print("📄 SECTION 2: DOCUMENT RAG DEMO")
    print("="*60)
    
    # Load documents from folder
    documents = self.load_documents_from_folder()
    
    if not documents:
        print("❌ No documents found in EmbeddingDocs folder")
        return
    
    # Create knowledge base
    self.create_document_knowledge_base(documents)
    
    # Demo questions based on loaded documents
    print("\n🎪 Document RAG Demo Questions:")
    
    # Check what documents we have and ask relevant questions
    doc_names = [doc.metadata.get('source', '') for doc in documents]
    
    if any('attention' in name.lower() for name in doc_names):
        print("\n📄 Questions about 'Attention is All You Need' paper:")
        questions = [
            "What is the Transformer architecture?",
            "How does self-attention work?",
            "What are the advantages of Transformers over RNNs?"
        ]
        for q in questions:
            result = self.query(q)
            print(f"❓ {q}")
            print(f"💡 {result['answer'][:200]}...")
            print()
    
    if any('boomi' in name.lower() for name in doc_names):
        print("\n📄 Questions about Boomi document:")
        questions = [
            "What is Boomi used for?",
            "How does Boomi integration work?"
        ]
        for q in questions:
            result = self.query(q)
            print(f"❓ {q}")
            print(f"💡 {result['answer'][:200]}...")
            print()
    
    print("✅ Document RAG demonstration complete!")

def load_pdf_document(self, file_path):
    """Load a PDF document (text only)."""
    try:
        from langchain_community.document_loaders import PyPDFLoader
        
        print(f"📄 Loading PDF: {file_path.name}")
        loader = PyPDFLoader(str(file_path))
        documents = loader.load()
        
        for doc in documents:
            doc.metadata.update({
                "source": file_path.name,
                "type": "PDF_Text",
                "content_type": "text"
            })
        
        print(f"   ✅ Loaded {len(documents)} pages")
        return documents
        
    except Exception as e:
        print(f"❌ Error loading PDF: {e}")
        return []

def load_word_document(self, file_path):
    """Load a Word document."""
    try:
        from langchain_community.document_loaders import Docx2txtLoader
        
        print(f"📄 Loading Word document: {file_path.name}")
        loader = Docx2txtLoader(str(file_path))
        documents = loader.load()
        
        for doc in documents:
            doc.metadata.update({
                "source": file_path.name,
                "type": "Word_Text",
                "content_type": "text"
            })
        
        print(f"   ✅ Loaded Word document")
        return documents
        
    except Exception as e:
        print(f"❌ Error loading Word document: {e}")
        return []

def load_documents_from_folder(self, folder_path="EmbeddingDocs"):
    """Load all documents from the specified folder."""
    folder = Path(folder_path)
    
    if not folder.exists():
        print(f"❌ Folder {folder_path} not found")
        return []
    
    print(f"📁 Loading documents from: {folder}")
    all_documents = []
    
    # Find files
    pdf_files = list(folder.glob("*.pdf"))
    # Docx2txtLoader reads .docx only; legacy .doc files must be converted first
    word_files = list(folder.glob("*.docx"))
    
    print(f"   Found {len(pdf_files)} PDF files and {len(word_files)} Word files")
    
    # Process PDF files
    for pdf_file in pdf_files:
        docs = self.load_pdf_document(pdf_file)
        all_documents.extend(docs)
    
    # Process Word files
    for word_file in word_files:
        docs = self.load_word_document(word_file)
        all_documents.extend(docs)
    
    self.processed_content = all_documents
    print(f"📚 Total documents loaded: {len(all_documents)}")
    
    return all_documents

def create_document_knowledge_base(self, documents):
    """Create knowledge base from documents."""
    print(f"🔪 Processing {len(documents)} documents...")
    
    # Split documents
    chunks = self.text_splitter.split_documents(documents)
    print(f"📄 Created {len(chunks)} chunks")
    
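    # Create an in-memory vector store (persist_directory=None).
    # A distinct collection name keeps these chunks apart from other sections:
    # in-memory Chroma stores created in one process can share the default collection.
    self.vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=self.embeddings,
        collection_name="document_rag",
        persist_directory=None
    )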
    
    # Create QA chain
    self.qa_chain = RetrievalQA.from_chain_type(
        llm=self.llm,
        chain_type="stuff",
        retriever=self.vectorstore.as_retriever(search_kwargs={"k": 5}),
        return_source_documents=True
    )
    
    print("✅ Document knowledge base created!")

# Add methods to our class
ComprehensiveRAGDemo.demo_document_rag = demo_document_rag
ComprehensiveRAGDemo.load_pdf_document = load_pdf_document
ComprehensiveRAGDemo.load_word_document = load_word_document
ComprehensiveRAGDemo.load_documents_from_folder = load_documents_from_folder
ComprehensiveRAGDemo.create_document_knowledge_base = create_document_knowledge_base

print("✅ Document RAG methods added!")

In [None]:
# Run Document RAG Demo
rag_demo.demo_document_rag()

### 🎓 Document RAG Insights:

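1. **File Format Support**: PDFs and .docx files are loaded through LangChain document loaders
2. **Text Extraction**: Pulls the text layer from each file; scanned pages and complex layouts may extract poorly
3. **Metadata Tracking**: Source attribution for answers
4. **Scalability**: The same pipeline extends to larger document sets, though an in-memory store is rebuilt on every run
5. **Real-World Relevance**: The load, chunk, embed, retrieve pattern carries over to production pipelines, which add persistence, error handling, and evaluation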
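
Source attribution can be made visible by grouping the returned chunks by file. The helper below is a sketch: `summarize_sources` is a hypothetical name, and it only assumes each returned item has a `metadata` dict with a `source` key and an optional `page` key (the loaders in this notebook set `source`; `page` is assumed to come from the PDF loader). A `SimpleNamespace` stands in for real documents so the example runs offline.

```python
from collections import defaultdict
from types import SimpleNamespace


def summarize_sources(source_documents):
    """Return {source_name: sorted page numbers} for the retrieved chunks."""
    pages_by_source = defaultdict(set)
    for doc in source_documents:
        source = doc.metadata.get("source", "Unknown")
        pages = pages_by_source[source]  # registers the source even without a page
        page = doc.metadata.get("page")
        if page is not None:
            pages.add(page)
    return {src: sorted(pages) for src, pages in pages_by_source.items()}


# Stand-ins for result["source_documents"]; the file names are made up
retrieved = [
    SimpleNamespace(metadata={"source": "paper.pdf", "page": 3}),
    SimpleNamespace(metadata={"source": "paper.pdf", "page": 1}),
    SimpleNamespace(metadata={"source": "notes.docx"}),
]

for source, pages in summarize_sources(retrieved).items():
    page_info = f"pages {pages}" if pages else "no page info"
    print(f"{source}: {page_info}")
```

With the notebook's objects this would be called as `summarize_sources(result["source_documents"])` after any `rag_demo.query(...)` call.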

**Ask questions about your actual documents!**

In [None]:
# Interactive questions about your documents
questions = [
    "What is the main contribution of the Transformer architecture?",
    "How does attention mechanism work?",
    "What are the key components of Boomi integration?"
]

for q in questions:
    try:
        result = rag_demo.query(q)
        print(f"\n❓ {q}")
        print(f"💡 {result['answer'][:300]}...")
        print(f"📖 Sources: {len(result['source_documents'])} document chunks")
    except Exception as e:
        print(f"❌ Error: {e}")

---

# 🖼️ SECTION 3: Multi-Modal RAG (Text + Images)

The cutting-edge approach - understanding both text AND images!

### Revolutionary Capabilities:
- 🖼️ Extract images from PDFs
- 👁️ AI-powered image description using GPT-4o
- 🧠 Combined text + visual understanding
- 📊 Analyze charts, diagrams, and figures
- 🔍 Search across both content types
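
Before a vision model can describe an image, the raw bytes have to be packaged into the request. The sketch below shows only that packaging step: base64-encode the bytes and wrap them in a `data:` URL inside an `image_url` content part, the same shape used by `describe_image` later in this section. The helper name `build_vision_content` and the placeholder bytes are assumptions, and no API call is made.

```python
import base64


def build_vision_content(image_bytes, prompt, mime_type="image/png"):
    """Build a two-part message body: a text prompt plus the image as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        },
    ]


# Placeholder bytes: the 8-byte PNG signature, not a complete image
fake_png = b"\x89PNG\r\n\x1a\n"
content = build_vision_content(fake_png, "Describe this image.")
print(content[1]["image_url"]["url"])
```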

In [None]:
# Add Multi-Modal RAG methods to our class
def demo_multimodal_rag(self):
    """Demonstrate multi-modal RAG with images."""
    print("\n" + "="*60)
    print("🖼️ SECTION 3: MULTI-MODAL RAG DEMO")
    print("="*60)
    
    # Load documents with images
    documents = self.load_documents_with_images()
    
    if not documents:
        print("❌ No documents found in EmbeddingDocs folder")
        return
    
    # Create multi-modal knowledge base
    self.create_multimodal_knowledge_base(documents)
    
    # Demo questions focusing on visual content
    print("\n🎪 Multi-Modal RAG Demo Questions:")
    
    visual_questions = [
        "What diagrams or figures are shown in the documents?",
        "Describe any architectural illustrations or charts",
        "What visual elements help explain the concepts?",
        "Are there any mathematical formulas or equations shown?"
    ]
    
    for question in visual_questions:
        try:
            result = self.query(question)
            print(f"\n❓ {question}")
            print(f"💡 Answer: {result['answer']}")
            
            # Show if any image sources were used
            sources = result['source_documents']
            image_sources = [s for s in sources if s.metadata.get('type', '').endswith('_Image')]
            if image_sources:
                print(f"🖼️ Used {len(image_sources)} image descriptions in answer")
            
        except Exception as e:
            print(f"❌ Error: {e}")
    
    print("\n✅ Multi-Modal RAG demonstration complete!")

def extract_images_from_pdf(self, pdf_path):
    """Extract images from PDF and convert to base64."""
    try:
        import fitz  # PyMuPDF
        
        print(f"🖼️  Extracting images from: {pdf_path.name}")
        doc = fitz.open(str(pdf_path))  # pass a plain string path
        images = []
        
        for page_num in range(len(doc)):
            page = doc.load_page(page_num)
            image_list = page.get_images()
            
            for img_index, img in enumerate(image_list):
                try:
                    # Get image data
                    xref = img[0]
                    pix = fitz.Pixmap(doc, xref)
                    
                    # PNG supports only gray/RGB, so convert CMYK and similar images first
                    if pix.n - pix.alpha >= 4:
                        pix = fitz.Pixmap(fitz.csRGB, pix)
                    
                    # Convert to PNG bytes, then to base64
                    img_data = pix.tobytes("png")
                    img_base64 = base64.b64encode(img_data).decode()
                    
                    images.append({
                        "base64": img_base64,
                        "page": page_num + 1,
                        "index": img_index,
                        "source": pdf_path.name
                    })
                    
                    pix = None  # Free memory
                    
                except Exception as e:
                    print(f"   ⚠️  Error extracting image {img_index} from page {page_num + 1}: {e}")
                    continue
        
        doc.close()
        print(f"   ✅ Extracted {len(images)} images")
        return images
        
    except ImportError:
        print("❌ PyMuPDF not installed. Install with: pip install PyMuPDF")
        return []
    except Exception as e:
        print(f"❌ Error extracting images from {pdf_path.name}: {e}")
        return []

def describe_image(self, image_base64, source_info):
    """Generate description of image using GPT-4o Vision."""
    try:
        print(f"   🔍 Analyzing image from {source_info}...")
        
        # Multi-part message: a text prompt plus the image as a base64 data URL
        message = HumanMessage(
            content=[
                {
                    "type": "text",
                    "text": "Describe this image in detail. Focus on any text, diagrams, charts, figures, tables, or important visual elements that might be relevant for answering questions about the document. Include any mathematical formulas, architectural diagrams, or technical illustrations you see."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_base64}",
                        "detail": "high"  # Use high detail for better analysis
                    }
                }
            ]
        )
        
        response = self.vision_llm.invoke([message])
        description = response.content
        
        print(f"   ✅ Generated description ({len(description)} chars)")
        return description
        
    except Exception as e:
        print(f"   ❌ Error describing image: {e}")
        return f"Image from {source_info} (description failed)"

# Add methods to our class
ComprehensiveRAGDemo.demo_multimodal_rag = demo_multimodal_rag
ComprehensiveRAGDemo.extract_images_from_pdf = extract_images_from_pdf
ComprehensiveRAGDemo.describe_image = describe_image

print("✅ Multi-Modal RAG methods added!")

In [None]:
# Add the remaining multi-modal methods
def load_pdf_with_images(self, file_path):
    """Load PDF with both text and images."""
    documents = []
    
    # Load text content
    try:
        from langchain_community.document_loaders import PyPDFLoader
        
        print(f"📄 Loading PDF text: {file_path.name}")
        loader = PyPDFLoader(str(file_path))
        text_docs = loader.load()
        
        for doc in text_docs:
            doc.metadata.update({
                "source": file_path.name,
                "type": "PDF_Text",
                "content_type": "text"
            })
        
        documents.extend(text_docs)
        print(f"   ✅ Loaded {len(text_docs)} text pages")
        
    except Exception as e:
        print(f"❌ Error loading PDF text: {e}")
    
    # Extract and process images
    images = self.extract_images_from_pdf(file_path)
    
    for img in images:
        # Generate description
        description = self.describe_image(
            img["base64"], 
            f"{img['source']} (page {img['page']})"
        )
        
        # Create document for image description
        img_doc = Document(
            page_content=f"Image from page {img['page']}: {description}",
            metadata={
                "source": file_path.name,
                "type": "PDF_Image",
                "content_type": "image",
                "page": img["page"],
                "image_base64": img["base64"]  # Store for potential display
            }
        )
        
        documents.append(img_doc)
    
    return documents

def load_documents_with_images(self, folder_path="EmbeddingDocs"):
    """Load all documents with multi-modal processing."""
    folder = Path(folder_path)
    
    if not folder.exists():
        print(f"❌ Folder {folder_path} not found")
        return []
    
    print(f"📁 Loading documents with multi-modal processing from: {folder}")
    all_documents = []
    
    # Find files
    pdf_files = list(folder.glob("*.pdf"))
    # Docx2txtLoader reads .docx only; legacy .doc files must be converted first
    word_files = list(folder.glob("*.docx"))
    
    print(f"   Found {len(pdf_files)} PDF files and {len(word_files)} Word files")
    
    # Process PDF files with images
    for pdf_file in pdf_files:
        docs = self.load_pdf_with_images(pdf_file)
        all_documents.extend(docs)
    
    # Process Word files
    for word_file in word_files:
        docs = self.load_word_document(word_file)
        all_documents.extend(docs)
    
    self.processed_content = all_documents
    print(f"📚 Total content pieces loaded: {len(all_documents)}")
    
    # Show content breakdown
    content_types = {}
    for doc in all_documents:
        content_type = doc.metadata.get('type', 'Unknown')
        content_types[content_type] = content_types.get(content_type, 0) + 1
    
    print("📊 Content breakdown:")
    for content_type, count in content_types.items():
        print(f"   {content_type}: {count}")
    
    return all_documents

def create_multimodal_knowledge_base(self, documents=None):
    """Create vector store from multi-modal documents."""
    if documents is None:
        documents = self.processed_content
    
    if not documents:
        print("❌ No documents to process")
        return
    
    print(f"🔪 Processing {len(documents)} content pieces...")
    
    # Split documents into chunks
    chunks = self.text_splitter.split_documents(documents)
    print(f"📄 Created {len(chunks)} chunks")
    
    # Create vector store
    print("🧮 Creating embeddings for multi-modal content...")
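    # A distinct collection name keeps these chunks apart from other sections:
    # in-memory Chroma stores created in one process can share the default collection.
    self.vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=self.embeddings,
        collection_name="multimodal_rag",
        persist_directory=None
    )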
    
    # Create QA chain
    self.qa_chain = RetrievalQA.from_chain_type(
        llm=self.llm,
        chain_type="stuff",
        retriever=self.vectorstore.as_retriever(search_kwargs={"k": 5}),
        return_source_documents=True
    )
    
    print("✅ Multi-modal knowledge base created!")

# Add remaining methods to our class
ComprehensiveRAGDemo.load_pdf_with_images = load_pdf_with_images
ComprehensiveRAGDemo.load_documents_with_images = load_documents_with_images
ComprehensiveRAGDemo.create_multimodal_knowledge_base = create_multimodal_knowledge_base

print("✅ All Multi-Modal RAG methods added!")

In [None]:
# Run Multi-Modal RAG Demo
rag_demo.demo_multimodal_rag()

### 🎓 Multi-Modal RAG Breakthroughs:

1. **Image Extraction**: Automatically finds images in PDFs
2. **Vision AI**: GPT-4o describes visual content in detail
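3. **Unified Search**: Image descriptions are embedded with the same text model as page text, so one query searches both
4. **Visual Understanding**: Can answer questions about diagrams and charts, limited by the quality of the generated descriptions
5. **Extensible**: The same describe-then-embed pattern can be applied to images from other sources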
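
The unified-search point is easiest to see with a toy example: once an image has a text description, it is retrieved like any other text record. The sketch below uses a hypothetical `keyword_score` function in place of embedding similarity, and the three records are made up for illustration.

```python
def keyword_score(query, text):
    """Toy relevance score: number of distinct query words present in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words)


# Made-up records: page text and a model-written image description share one format
records = [
    {"content": "The encoder maps an input sequence to a sequence of representations.",
     "type": "PDF_Text", "page": 2},
    {"content": "Image from page 3: a block diagram with an encoder stack on the left "
                "and a decoder stack on the right.",
     "type": "PDF_Image", "page": 3},
    {"content": "Training used the Adam optimizer with a warmup schedule.",
     "type": "PDF_Text", "page": 7},
]

query = "block diagram of the decoder"
best = max(records, key=lambda r: keyword_score(query, r["content"]))
print(best["type"], "page", best["page"])
```

The `type` field is what lets the demo cells report how many image descriptions contributed to an answer.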

**Ask about visual elements in your documents!**

In [None]:
# Test multi-modal capabilities
visual_questions = [
    "What diagrams are shown in the Transformer paper?",
    "Describe the architecture illustrations",
    "What visual elements help explain the concepts?",
    "Are there any mathematical formulas shown in images?"
]

for q in visual_questions:
    try:
        result = rag_demo.query(q)
        print(f"\n❓ {q}")
        print(f"💡 {result['answer'][:250]}...")
        
        # Check if image sources were used
        sources = result['source_documents']
        image_sources = [s for s in sources if 'Image' in s.metadata.get('type', '')]
        if image_sources:
            print(f"🖼️ Used {len(image_sources)} image descriptions!")
        
    except Exception as e:
        print(f"❌ Error: {e}")

---

# 🎯 Comparison and Summary

## RAG Evolution Comparison:

| Feature | Basic Text RAG | Document RAG | Multi-Modal RAG |
|---------|---------------|--------------|----------------|
| **Data Source** | Sample text | PDF/Word files | Text + Images |
| **Processing** | Simple chunking | Document parsing | Vision AI analysis |
| **Understanding** | Text only | Text only | Text + Visual |
| **Use Cases** | Simple Q&A | Document search | Complex analysis |
| **Answer Coverage** | Sample text only | Text extracted from files | Text plus described figures |
| **Complexity** | Low | Medium | High |

## 🎓 Learning Outcomes:

✅ **Understood RAG fundamentals** - retrieval + generation  
✅ **Experienced document processing** - real-world file handling  
✅ **Explored cutting-edge AI** - multi-modal capabilities  
✅ **Saw practical applications** - company knowledge bases  
✅ **Learned vision techniques** - GPT-4o vision integration  

## 🚀 Next Steps for Interns:

1. **Experiment** with different document types
2. **Try different chunk sizes** and see the impact
3. **Add metadata filtering** for more precise search
4. **Build a web interface** using Streamlit or Flask
5. **Implement evaluation metrics** to measure quality
6. **Explore other embedding models** and compare results
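
For step 5, one simple starting point is retrieval hit rate: the fraction of test questions for which at least one of the top-k retrieved chunks comes from the expected source. The sketch below is one possible metric, not an established benchmark for this notebook. `hit_rate_at_k`, the stand-in retriever, and the test cases are all hypothetical.

```python
def hit_rate_at_k(test_cases, retrieve_fn, k=3):
    """Fraction of questions whose top-k retrieved sources include the expected one."""
    if not test_cases:
        return 0.0
    hits = 0
    for question, expected_source in test_cases:
        retrieved_sources = retrieve_fn(question)[:k]
        if expected_source in retrieved_sources:
            hits += 1
    return hits / len(test_cases)


# Hypothetical stand-in retriever: returns ranked source names for each question
canned_results = {
    "How many vacation days?": ["benefits", "remote_work", "it_security"],
    "What is the password length?": ["remote_work", "it_security", "benefits"],
    "How do I book a room?": ["benefits", "remote_work", "it_security"],
}


def fake_retriever(question):
    return canned_results[question]


test_cases = [
    ("How many vacation days?", "benefits"),
    ("What is the password length?", "it_security"),
    ("How do I book a room?", "meeting_rooms"),
]

print(f"hit rate@3: {hit_rate_at_k(test_cases, fake_retriever, k=3):.2f}")
print(f"hit rate@1: {hit_rate_at_k(test_cases, fake_retriever, k=1):.2f}")
```

Running the same test cases after changing the chunk size or `k` gives a quick, repeatable way to compare configurations before judging answer quality by hand.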

## 💡 Real-World Applications:

- 🏢 **Internal Knowledge Bases**: Company policies, procedures
- 📞 **Customer Support**: Product documentation, troubleshooting
- 📚 **Research Assistance**: Academic papers, technical reports
- 🔧 **Developer Tools**: Code documentation, API references
- 🎓 **Educational Platforms**: Course materials, study guides
- 🏥 **Healthcare**: Medical literature, diagnostic aids
- ⚖️ **Legal**: Case law, contract analysis

**Congratulations! You've mastered the complete RAG spectrum!** 🎉