HANDS-ON RAG (Retrieval-Augmented Generation) WORKSHOP

13 Oct 2025
Ramaih University of Applied Sciences
Instructor: Naganathan Muthuramalingam., PhD Scholar - School of Social Sciences

This script demonstrates a complete end-to-end RAG system implementation.

WHAT YOU'LL LEARN:
1. Document Loading and Processing
2. Text Chunking Strategies
3. Vector Embeddings and Storage
4. Retrieval Mechanisms
5. LLM Integration
6. Answer Validation and Grounding

WORKSHOP STRUCTURE:
- Part 0: Environment Setup and Library Version Check
- Part 1: Imports and Document Discovery
- Part 2: Document Loading and Text Chunking
- Part 3: Vector Embeddings & Knowledge Base Creation
- Part 4: Retrieval Configuration
- Part 5: Language Model Setup
- Part 6: Prompt Engineering for Grounding
- Part 7: RAG Chain Assembly
- Part 8: Answer Validation System
- Part 9: Hands-on Testing

SYSTEM REQUIREMENTS:
- Minimum 8GB RAM (16GB recommended for better performance)
- At least 20GB free disk space for models and vector databases
- Python 3.8+ installed
- Stable internet connection for initial model downloads
- Ollama installed (https://ollama.ai/)
- phi3:mini model downloaded via: ollama pull phi3:mini

INSTALLATION STEPS:
1. Install Python 3.8+
2. Install Ollama from https://ollama.ai/
3. Run: ollama pull phi3:mini
4. Install required Python packages (see Part 0 below)
5. Create 'data' folder and add PDF documents

PREREQUISITES:
- Basic Python knowledge
- Understanding of machine learning concepts
- Familiarity with NLP basics

In [1]:
# ========================================================================
# PART 0: ENVIRONMENT SETUP AND LIBRARY VERSION CHECK
# ========================================================================
# LEARNING OBJECTIVE: Verify environment setup and library compatibility

def check_library_versions():
    """
    WORKSHOP FUNCTION: Environment Verification
    
    PURPOSE: Check installed library versions for compatibility
    This helps ensure all students have the same environment setup
    """
    print("="*60)
    print("üîß WORKSHOP ENVIRONMENT CHECK")
    print("="*60)
    
    required_libraries = {
        'langchain': '0.3.27',
        'langchain_community': '0.3.29',
        'chromadb': '1.0.20',
        'pypdf': '6.0.0',
        'numpy': '6.0.0',
        'pathlib': 'built-in',
        'os': 'built-in',
        'sys': 'built-in'
    }
    
    print("üìã Checking required libraries and versions:")
    print("-" * 50)
    
    missing_libraries = []
    version_mismatches = []
    
    for library, min_version in required_libraries.items():
        try:
            if library in ['pathlib', 'os', 'sys']:
                print(f"‚úÖ {library}: {min_version}")
                continue
                
            if library == 'langchain':
                import langchain
                version = langchain.__version__
            elif library == 'langchain_community':
                import langchain_community
                version = getattr(langchain_community, '__version__', 'unknown')
            elif library == 'chromadb':
                import chromadb
                version = chromadb.__version__
            elif library == 'pypdf':
                import pypdf
                version = pypdf._version.__version__
            elif library == 'numpy':
                import numpy
                version = numpy.__version__
            
            print(f"‚úÖ {library}: {version}")
            
        except ImportError:
            print(f"‚ùå {library}: NOT INSTALLED")
            missing_libraries.append(library)
        except Exception as e:
            print(f"‚ö†Ô∏è  {library}: Error checking version - {e}")
    
    # Check Ollama availability (external dependency)
    print("\nü§ñ Checking Ollama setup:")
    print("-" * 30)
    try:
        import subprocess
        result = subprocess.run(['ollama', 'list'], capture_output=True, text=True, timeout=10)
        if result.returncode == 0:
            if 'phi3:mini' in result.stdout:
                print("‚úÖ Ollama: Installed and phi3:mini model available")
            else:
                print("‚ö†Ô∏è  Ollama: Installed but phi3:mini model missing")
                print("   Run: ollama pull phi3:mini")
        else:
            print("‚ùå Ollama: Not properly configured")
    except FileNotFoundError:
        print("‚ùå Ollama: Not installed")
        print("   Install from: https://ollama.ai/")
    except subprocess.TimeoutExpired:
        print("‚ö†Ô∏è  Ollama: Connection timeout - check if service is running")
    except Exception as e:
        print(f"‚ö†Ô∏è  Ollama: Error checking - {e}")
    
    # Summary and installation commands
    if missing_libraries:
        print(f"\n‚ùå MISSING LIBRARIES: {', '.join(missing_libraries)}")
        print("\nüì¶ EXACT INSTALLATION COMMANDS (Workshop Tested Versions):")
        print("pip install langchain==0.3.27")
        print("pip install langchain-community==0.3.29")
        print("pip install chromadb==1.0.20")
        print("pip install pypdf==6.0.0")
        print("pip install numpy==6.0.0")
        print("\nRun these commands and restart the workshop.")
        return False
    else:
        print("\n‚úÖ ALL LIBRARIES INSTALLED!")
        print("üöÄ Ready to proceed with the workshop!")
        return True

# Run environment check
environment_ready = check_library_versions()

if not environment_ready:
    print("\n‚ö†Ô∏è  PLEASE INSTALL MISSING LIBRARIES BEFORE CONTINUING")
    print("Uncomment the sys.exit() line below if you want to stop here")
    # sys.exit(1)  # Students can uncomment this to stop execution

üîß WORKSHOP ENVIRONMENT CHECK
üìã Checking required libraries and versions:
--------------------------------------------------
‚úÖ langchain: 0.3.27
‚úÖ langchain_community: 0.3.29
‚úÖ langchain: 0.3.27
‚úÖ langchain_community: 0.3.29
‚úÖ chromadb: 1.0.20
‚úÖ pypdf: 6.0.0
‚úÖ numpy: 1.26.4
‚úÖ pathlib: built-in
‚úÖ os: built-in
‚úÖ sys: built-in

ü§ñ Checking Ollama setup:
------------------------------
‚úÖ chromadb: 1.0.20
‚úÖ pypdf: 6.0.0
‚úÖ numpy: 1.26.4
‚úÖ pathlib: built-in
‚úÖ os: built-in
‚úÖ sys: built-in

ü§ñ Checking Ollama setup:
------------------------------
‚úÖ Ollama: Installed and phi3:mini model available

‚úÖ ALL LIBRARIES INSTALLED!
üöÄ Ready to proceed with the workshop!
‚úÖ Ollama: Installed and phi3:mini model available

‚úÖ ALL LIBRARIES INSTALLED!
üöÄ Ready to proceed with the workshop!


In [2]:
# ========================================================================
# PART 1: IMPORTS AND SETUP
# ========================================================================
# Standard library imports - Python's built-in modules
import os
import sys
from pathlib import Path

# LangChain Document Loaders & Processing - For handling different document types
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Vector Store and Embeddings - For semantic search capabilities
from langchain_community.vectorstores import Chroma

# Local LLM via Ollama - For running language models locally
from langchain_community.llms import Ollama

# RAG Chain - For combining retrieval and generation
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

In [3]:
# ========================================================================
# WORKSHOP ACTIVITY 1: DOCUMENT DISCOVERY
# ========================================================================
# LEARNING OBJECTIVE: Understand how to locate and validate data sources

# Define the path to your PDF directory
# TODO for students: Create a 'data' folder and add your PDF documents

data_dir = "./data"

# Find all PDF files in the directory recursively
# This uses Path.rglob() to search through all subdirectories

pdf_files = [str(p) for p in Path(data_dir).rglob("*.pdf") if p.is_file()]

# Validation: Always check if your data exists before processing
if not pdf_files:
    print(f"No PDFs found in {data_dir}. Please add your PDFs and update the `data_dir` variable.")
    print("WORKSHOP TIP: Create the './data' folder and add at least one PDF document")
else:
    print(f"‚úÖ Found {len(pdf_files)} PDF(s):")
    for f in pdf_files:
        print(f" - {f}")


‚úÖ Found 5 PDF(s):
 - data/The Collapsing Universe.pdf
 - data/A brief history of time.pdf
 - data/Introduction to Black Hole Astrophysics 2014.pdf
 - data/The Universe in a Nutshell - Stephen Hawking (2001).pdf
 - data/Greene The Elegant Universe.pdf


In [4]:
# ========================================================================
# WORKSHOP ACTIVITY 2: DOCUMENT LOADING AND PREPROCESSING
# ========================================================================
# LEARNING OBJECTIVE: Transform unstructured documents into structured data


print("\n" + "="*50)
print("PART 2: DOCUMENT LOADING & TEXT CHUNKING")
print("="*50)

# Initialize document storage
documents = []

# Process each PDF file
for file_path in pdf_files:
    try:
        print(f"\nüìÑ Processing: {os.path.basename(file_path)}")
        
        # PyPDFLoader: Specialized for PDF documents
        # WORKSHOP NOTE: Different loaders exist for different file types
        # (TextLoader, CSVLoader, JSONLoader, etc.)
        loader = PyPDFLoader(file_path)
        
        # Load documents - each page becomes a separate document
        docs = loader.load()
        
        # Add source metadata for traceability
        # WORKSHOP TIP: Metadata is crucial for citation and verification
        for doc in docs:
            doc.metadata["source"] = os.path.basename(file_path)
            
        documents.extend(docs)
        print(f"‚úÖ Loaded {len(docs)} pages from {os.path.basename(file_path)}")
        
    except Exception as e:
        print(f"‚ùå Error loading {file_path}: {e}")
        print("WORKSHOP TIP: Check file permissions and format compatibility")

print(f"\nüìä SUMMARY: Total pages loaded: {len(documents)}")



PART 2: DOCUMENT LOADING & TEXT CHUNKING

üìÑ Processing: The Collapsing Universe.pdf
‚úÖ Loaded 255 pages from The Collapsing Universe.pdf

üìÑ Processing: A brief history of time.pdf
‚úÖ Loaded 255 pages from The Collapsing Universe.pdf

üìÑ Processing: A brief history of time.pdf
‚úÖ Loaded 101 pages from A brief history of time.pdf

üìÑ Processing: Introduction to Black Hole Astrophysics 2014.pdf
‚úÖ Loaded 101 pages from A brief history of time.pdf

üìÑ Processing: Introduction to Black Hole Astrophysics 2014.pdf
‚úÖ Loaded 326 pages from Introduction to Black Hole Astrophysics 2014.pdf

üìÑ Processing: The Universe in a Nutshell - Stephen Hawking (2001).pdf
‚úÖ Loaded 326 pages from Introduction to Black Hole Astrophysics 2014.pdf

üìÑ Processing: The Universe in a Nutshell - Stephen Hawking (2001).pdf
‚úÖ Loaded 219 pages from The Universe in a Nutshell - Stephen Hawking (2001).pdf

üìÑ Processing: Greene The Elegant Universe.pdf
‚úÖ Loaded 219 pages from The Universe i

In [5]:
# ========================================================================
# WORKSHOP ACTIVITY 3: TEXT CHUNKING STRATEGY
# ========================================================================
# LEARNING OBJECTIVE: Understand why and how to split text optimally

print("\n" + "="*50)
print("PART 3: TEXT CHUNKING")
print("="*50)

# CONCEPT: Why do we chunk text?
# 1. LLMs have context length limitations
# 2. Smaller chunks = more precise retrieval
# 3. Better semantic matching
# 4. Improved processing speed

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=600,      # WORKSHOP EXPERIMENT: Try different sizes (400, 800, 1200)
    chunk_overlap=130,   # WORKSHOP EXPERIMENT: Try different overlaps (0, 100, 200)
    separators=["\n\n", "\n", ". ", "! ", "? ", " ", ""]  # Hierarchical splitting
)

print("üîß Chunking Configuration:")
print(f"   - Chunk size: {text_splitter._chunk_size} characters")
print(f"   - Overlap: {text_splitter._chunk_overlap} characters")
print(f"   - Separators: {text_splitter._separators}")

# Split documents into chunks
texts = text_splitter.split_documents(documents)


# Add better metadata to each chunk
for i, text in enumerate(texts):
    text.metadata["chunk_id"] = i
    text.metadata["chunk_length"] = len(text.page_content)
    # Add first few words as preview
    text.metadata["preview"] = text.page_content[:50].replace("\n", " ")

# Validation
if not texts:
    print("‚ùå No text chunks created. Check your documents.")
    sys.exit(0)

print(f"‚úÖ Successfully split into {len(texts)} text chunks")

# WORKSHOP ACTIVITY: Examine chunk examples
print(f"\nüìù SAMPLE CHUNK (ID: 0):")
if texts:
    sample_chunk = texts[0]
    print(f"   Source: {sample_chunk.metadata.get('source', 'Unknown')}")
    print(f"   Length: {sample_chunk.metadata.get('chunk_length', 0)} characters")
    print(f"   Preview: {sample_chunk.metadata.get('preview', 'N/A')}...")


PART 3: TEXT CHUNKING
üîß Chunking Configuration:
   - Chunk size: 600 characters
   - Overlap: 130 characters
   - Separators: ['\n\n', '\n', '. ', '! ', '? ', ' ', '']
‚úÖ Successfully split into 5905 text chunks

üìù SAMPLE CHUNK (ID: 0):
   Source: The Collapsing Universe.pdf
   Length: 598 characters
   Preview: "PENETRATING" -SCIENCE  DIGEST "The  future  of  t...
‚úÖ Successfully split into 5905 text chunks

üìù SAMPLE CHUNK (ID: 0):
   Source: The Collapsing Universe.pdf
   Length: 598 characters
   Preview: "PENETRATING" -SCIENCE  DIGEST "The  future  of  t...


In [6]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.schema import Document  # Import Document class

# Define the texts to embed
texts = [
    "Clinical trials are essential for medical advancements.",
    "Semantic search improves information retrieval.",
    "Embeddings represent text in a mathematical form."
]

# Convert texts to Document objects
documents = [Document(page_content=text) for text in texts]

# Initialize the embedding model
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Create vector database
vectorstore = Chroma.from_documents(
    documents=documents,  # Pass the list of Document objects
    embedding=embeddings,
    persist_directory="./chroma_clinicaltrial_db"
)

# Configure the retriever
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 3,
        "fetch_k": 6,
        "lambda_mult": 0.8
    }
)

print("üîç Retrieval Configuration:")
print(f"   - Strategy: MMR (Maximum Marginal Relevance)")
print(f"   - Documents returned: 3")
print(f"   - Initial candidates: 8")
print(f"   - Relevance vs Diversity balance: 0.8")

  embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")


üîç Retrieval Configuration:
   - Strategy: MMR (Maximum Marginal Relevance)
   - Documents returned: 3
   - Initial candidates: 8
   - Relevance vs Diversity balance: 0.8


In [7]:
# ========================================================================
# WORKSHOP ACTIVITY 6: LLM INTEGRATION
# ========================================================================
# LEARNING OBJECTIVE: Connect local language model for generation

print("\n" + "="*50)
print("PART 6: LANGUAGE MODEL SETUP")
print("="*50)

# PREREQUISITE: Install Ollama and pull a model
print("üìã PREREQUISITE CHECK:")
print("   1. Install Ollama: https://ollama.ai/")
print("   2. Run: ollama pull phi3:mini")
print("   3. Verify: ollama list")


try:
    llm = Ollama(
        model="phi3:mini",    # WORKSHOP NOTE: Lightweight model for laptops
        temperature=0.2,      # Low temperature = more deterministic responses
        num_thread=2,         # Adjust based on your CPU cores
    )
    
    # Test LLM connection
    print("\nüß™ Testing LLM connection...")
    test_response = llm.invoke("What is 2+2?")
    print(f"‚úÖ LLM Response: {test_response}")
    print("‚úÖ Language model initialized successfully!")
    
except Exception as e:
    print(f"‚ùå LLM Connection Failed: {e}")
    print("WORKSHOP TIP: Ensure Ollama is running and phi3:mini is installed")
    # TODO: Add fallback or alternative model suggestion



PART 6: LANGUAGE MODEL SETUP
üìã PREREQUISITE CHECK:
   1. Install Ollama: https://ollama.ai/
   2. Run: ollama pull phi3:mini
   3. Verify: ollama list

üß™ Testing LLM connection...


  llm = Ollama(


‚úÖ LLM Response: The sum of 2 and 2 is 4.
‚úÖ Language model initialized successfully!


In [8]:
# ========================================================================
# WORKSHOP ACTIVITY 7: PROMPT ENGINEERING
# ========================================================================
# LEARNING OBJECTIVE: Design prompts that enforce grounding

print("\n" + "="*50)
print("PART 7: PROMPT ENGINEERING FOR GROUNDING")
print("="*50)

# CONCEPT: Prompt engineering for RAG
# - Explicit instructions prevent hallucination
# - Structure ensures consistent output format
# - Citations enable verification

# Enhanced prompt template for better factual retrieval

prompt_template = """
You are a precise document analyst. Your task is to answer questions STRICTLY based on the provided context.

CRITICAL INSTRUCTIONS:
1. ONLY use information explicitly stated in the context below
2. If the context doesn't contain the answer, respond: "The provided documents do not contain information to answer this question."
3. Always cite which document/source your answer comes from
4. Do not make inferences beyond what is directly stated
5. If multiple sources contradict each other, mention the contradiction
6. Use exact quotes when possible, enclosed in quotation marks
7. For factual questions (like currency, population, etc.), scan ALL context carefully


Context Documents:
{context}

Question: {question}
Requirements for your answer:
- Start with the most relevant source
- Use direct quotes where applicable
- Clearly separate facts from different sources
- Look for keywords related to the question (currency, money, dollar, etc.)
- End with source citations

Answer:
"""


PROMPT = PromptTemplate(
    template=prompt_template, 
    input_variables=["context", "question"]
)
print("‚úÖ Prompt template created with grounding instructions")


PART 7: PROMPT ENGINEERING FOR GROUNDING
‚úÖ Prompt template created with grounding instructions


In [9]:
# ========================================================================
# WORKSHOP ACTIVITY 8: RAG CHAIN ASSEMBLY
# ========================================================================
# LEARNING OBJECTIVE: Combine all components into a working system

print("\n" + "="*50)
print("PART 8: RAG CHAIN ASSEMBLY")
print("="*50)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",    # WORKSHOP NOTE: "stuff" = include all context in prompt
    retriever=retriever,
    chain_type_kwargs={
        "prompt": PROMPT,
        "document_separator": "\n\n--- SOURCE DOCUMENT ---\n\n"
    },
    return_source_documents=True,  # Essential for verification
    verbose=False  # WORKSHOP TIP: Set to True for debugging
)

print("‚úÖ RAG chain assembled successfully!")
print("   Components connected: Retriever ‚Üí LLM ‚Üí Response")


PART 8: RAG CHAIN ASSEMBLY
‚úÖ RAG chain assembled successfully!
   Components connected: Retriever ‚Üí LLM ‚Üí Response


In [10]:
# ========================================================================
# WORKSHOP ACTIVITY 9: ANSWER VALIDATION SYSTEM
# ========================================================================
# LEARNING OBJECTIVE: Implement quality control for RAG responses

def validate_answer(answer, source_docs):
    """
    WORKSHOP FUNCTION: Answer Quality Assessment
    
    PURPOSE: Detect potential hallucinations and assess grounding quality
    
    PARAMETERS:
    - answer: Generated response from RAG system
    - source_docs: Retrieved documents used for context
    
    RETURNS:
    - confidence_score: Float between 0.0 and 1.0
    - warnings: List of quality issues detected
    """
    answer_lower = answer.lower()
    
    # Define hallucination indicators
    # WORKSHOP EXERCISE: Add more phrases students might identify
    hallucination_phrases = [
        "i think", "probably", "likely", "it seems", "perhaps", 
        "generally speaking", "typically", "usually", "in most cases"
    ]
    
    confidence_score = 1.0
    warnings = []
    
    # Check for uncertain language
    for phrase in hallucination_phrases:
        if phrase in answer_lower:
            confidence_score -= 0.2
            warnings.append(f"Uncertain language detected: '{phrase}'")
    
    # Verify source citation
    has_citations = any(doc.metadata['source'].lower() in answer_lower for doc in source_docs)
    if not has_citations:
        confidence_score -= 0.3
        warnings.append("Answer does not reference source documents")
    
    return max(0.0, confidence_score), warnings

def ask_question_with_validation(question):
    """
    WORKSHOP FUNCTION: Complete RAG Query with Validation
    
    This function demonstrates the full RAG pipeline:
    1. Question input
    2. Document retrieval
    3. Answer generation
    4. Quality validation
    5. Source verification
    """
    print(f"ü§î Question: {question}")
    print("\nüîç Retrieving relevant information...")
    
    # Execute RAG pipeline
    result = qa_chain.invoke({"query": question})
    answer = result["result"]
    source_docs = result["source_documents"]
    
    # Validate response quality
    confidence, warnings = validate_answer(answer, source_docs)
    
    # Display results with educational annotations
    print("\nüìù Answer:")
    print("="*50)
    print(answer)
    
    # Quality assessment
    print(f"\nüìä Quality Assessment:")
    print(f"   Confidence Score: {confidence:.2f}/1.0")
    
    if confidence >= 0.8:
        print("   ‚úÖ HIGH QUALITY: Well-grounded response")
    elif confidence >= 0.6:
        print("   ‚ö†Ô∏è  MEDIUM QUALITY: Review recommended")
    else:
        print("   ‚ùå LOW QUALITY: Potential hallucination detected")
    
    if warnings:
        print("\n‚ö†Ô∏è  Quality Warnings:")
        for warning in warnings:
            print(f"   ‚Ä¢ {warning}")
    
    # Enhanced source verification with keyword analysis
    print(f"\nüìö Retrieved Sources ({len(source_docs)} documents):")
    print("-" * 60)
    
    question_keywords = set(question.lower().split())
    
    for i, doc in enumerate(source_docs):
        content_keywords = set(doc.page_content.lower().split())
        keyword_overlap = question_keywords.intersection(content_keywords)
        
        print(f"{i+1}. Source: {doc.metadata['source']}")
        print(f"   Page: {doc.metadata.get('page', 'Unknown')}")
        print(f"   Keyword overlap: {list(keyword_overlap)}")
        print(f"   Content: {doc.page_content[:200]}...")
        print()
    
    # Suggest improvements if answer is not found
    if "do not contain information" in answer.lower():
        print("\nüí° TROUBLESHOOTING SUGGESTIONS:")
        print("1. Check if your question keywords appear in the documents")
        print("2. Try rephrasing the question with different terms")
        print("3. Verify the PDF content was properly extracted")
        print("4. Consider if the information spans multiple chunks")
        
        # Try alternative search terms
        if "currency" in question.lower():
            alt_terms = ["money", "dollar", "economic", "financial", "payment"]
            print(f"\nüîÑ Trying alternative search terms: {alt_terms}")
            for term in alt_terms:
                alt_docs = vectorstore.similarity_search(term, k=3)
                if alt_docs:
                    print(f"\n   Found content for '{term}':")
                    for doc in alt_docs[:1]:  # Show first match
                        print(f"   {doc.page_content[:100]}...")
    
    return result, confidence, warnings

In [11]:
# ========================================================================
# WORKSHOP ACTIVITY 10: HANDS-ON TESTING
# ========================================================================
# LEARNING OBJECTIVE: Test the complete RAG system

print("\n" + "="*80)
print("WORKSHOP DEMONSTRATION: TESTING THE RAG SYSTEM")
print("="*80)

# Sample question for demonstration
# WORKSHOP INSTRUCTION: Students should modify this question
question = "Explain the concept of the event horizon and the singularity within a black hole."

print("üß™ RUNNING SAMPLE QUERY...")
result, confidence, warnings = ask_question_with_validation(question)



WORKSHOP DEMONSTRATION: TESTING THE RAG SYSTEM
üß™ RUNNING SAMPLE QUERY...
ü§î Question: Explain the concept of the event horizon and the singularity within a black hole.

üîç Retrieving relevant information...

üìù Answer:
The event horizon of a black hole is described as "the boundary beyond which nothing can escape," indicating that it acts like an invisible barrier from SOURCE DOCUMENT. The singularity within the context refers to where gravity becomes infinitely strong, known as 'the central point' according to SOURCE DOCUMENT.


Both concepts are directly related but distinct aspects of a black hole: 
- "The event horizon is...beyond which nothing can escape," highlighting its role in trapping matter and radiation (SOURCE DOCUMENT).

- The singularity, on the other hand, represents an area where gravitational forces compress to their maximum extent within SOURCE DOCUMENT. 

Both statements are sourced from respective documents provided above without any contradictions betwee

In [12]:
# ========================================================================
# WORKSHOP ACTIVITY 10B: BATCH QUESTION TESTING
# ========================================================================
# LEARNING OBJECTIVE: Reuse the validated RAG pipeline for multiple queries

print("\n" + "="*80)
print("WORKSHOP DEMONSTRATION: BATCH QUESTIONS")
print("="*80)

# Predefined advanced questions (students can update this list)
batch_questions = [
    "Discuss the different 'arrows of time,' including the thermodynamic, psychological, and cosmological arrows. How are these arrows interconnected, and what do they imply about the ultimate fate of the universe?",
    "In the 'Many-Worlds' interpretation of quantum mechanics, describe the possibility of parallel universes. What is the evidence for this theory, and what makes it difficult to test or prove?",
    "Explain the concept of 'brane cosmology' as a proposed solution to the mysteries of dark matter and dark energy. How does the idea of our universe as a membrane in a higher-dimensional space address these issues?",
    "How does the discovery of dark energy, which suggests an accelerating expansion, challenge the core premise of a 'collapsing universe'?",
    "What is the significance of 'frame-dragging' in a rotating black hole, and how might it be observed or measured by an advanced civilization?"
]

# Store results for recap after the detailed outputs
batch_results = []

for idx, question in enumerate(batch_questions, start=1):
    print("\n" + "-"*80)
    print(f"Question {idx}:")
    print(question)
    print("-"*80)

    try:
        result_dict, confidence, warnings = ask_question_with_validation(question)
        batch_results.append({
            "question": question,
            "answer": result_dict.get("result", ""),
            "confidence": confidence,
            "warnings": warnings,
            "sources": [doc.metadata.get("source", "unknown") for doc in result_dict.get("source_documents", [])]
        })
    except Exception as error:
        print(f"‚ùå Error processing question {idx}: {error}")
        batch_results.append({"question": question, "error": str(error)})

# High-level recap (useful if students want a quick reference)
print("\n" + "="*80)
print("BATCH SUMMARY")
print("="*80)
for entry in batch_results:
    print(f"\n‚Ä¢ Question: {entry['question'][:80]}{'...' if len(entry['question']) > 80 else ''}")
    if "error" in entry:
        print(f"  Result: ‚ö†Ô∏è Error - {entry['error']}")
        continue

    print(f"  Confidence: {entry['confidence']:.2f}")
    if entry["warnings"]:
        print(f"  Warnings: {', '.join(entry['warnings'])}")
    else:
        print("  Warnings: None")

    if entry["sources"]:
        print(f"  Sources: {', '.join(entry['sources'])}")
    else:
        print("  Sources: None")


WORKSHOP DEMONSTRATION: BATCH QUESTIONS

--------------------------------------------------------------------------------
Question 1:
Discuss the different 'arrows of time,' including the thermodynamic, psychological, and cosmological arrows. How are these arrows interconnected, and what do they imply about the ultimate fate of the universe?
--------------------------------------------------------------------------------
ü§î Question: Discuss the different 'arrows of time,' including the thermodynamic, psychological, and cosmological arrows. How are these arrows interconnected, and what do they imply about the ultimate fate of the universe?

üîç Retrieving relevant information...
‚ùå Error processing question 1: 'source'

--------------------------------------------------------------------------------
Question 2:
In the 'Many-Worlds' interpretation of quantum mechanics, describe the possibility of parallel universes. What is the evidence for this theory, and what makes it difficult 

In [13]:
# ========================================================================
# WORKSHOP ACTIVITY 10: HANDS-ON TESTING
# ========================================================================
# LEARNING OBJECTIVE: Test the complete RAG system

print("\n" + "="*80)
print("WORKSHOP DEMONSTRATION: TESTING THE RAG SYSTEM")
print("="*80)

# Sample question for demonstration
# WORKSHOP INSTRUCTION: Students should modify this question
question = "What is the significance of 'frame-dragging' in a rotating black hole, and how might it be observed or measured by an advanced civilization?"

print("üß™ RUNNING SAMPLE QUERY...")
result, confidence, warnings = ask_question_with_validation(question)


WORKSHOP DEMONSTRATION: TESTING THE RAG SYSTEM
üß™ RUNNING SAMPLE QUERY...
ü§î Question: What is the significance of 'frame-dragging' in a rotating black hole, and how might it be observed or measured by an advanced civilization?

üîç Retrieving relevant information...

üìù Answer:
The concept of 'frame-dragging' is not discussed in any provided documents. Therefore, I must respond as follows based on my instructions and available information from SOURCE DOCUMENT ONE: "The event horizon of a black hole is the boundary beyond which nothing can escape, not even light." (SOURCE DOCUMENT ONE)

Since 'frame-dragging' was never mentioned in any document provided to me for analysis and my instructions dictate that I should only use information explicitly stated within these documents, it would be incorrect to speculate or infer about the significance of frame-dragging. Additionally, as there is no mention of how an advanced civilization might observe this phenomenon without direct eviden