# Building a Basic RAG Agent with GoodMem

## Overview

This tutorial will guide you through building a complete **Retrieval-Augmented Generation (RAG)** system using GoodMem's vector memory capabilities. By the end of this guide, you'll have a functional Q&A system that can:

- 🔍 **Semantically search** through your documents
- 📝 **Generate contextual answers** using retrieved information 
- 🏗️ **Scale to handle** large document collections

### What is RAG?

RAG combines the power of **retrieval** (finding relevant information) with **generation** (creating natural language responses). This approach allows AI systems to provide accurate, context-aware answers by:

1. **Retrieving** relevant documents from a knowledge base
2. **Augmenting** the query with this context
3. **Generating** a comprehensive answer using both the query and retrieved information
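
The three steps above can be sketched without any external services. This is a toy illustration only: the knowledge base is invented, word overlap stands in for vector search, and `generate` is a hypothetical placeholder where a real LLM call would go.

```python
from collections import Counter
from typing import List

# Invented sample passages, used only for this illustration.
KNOWLEDGE_BASE = [
    "Employees accrue 15 vacation days per year.",
    "The VPN must be used when working remotely.",
    "Support is available by email on weekdays.",
]

def tokenize(text: str) -> List[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return [word.strip(".,?!").lower() for word in text.split()]

def retrieve(query: str, docs: List[str], top_k: int = 1) -> List[str]:
    """Step 1: rank passages by word overlap (a stand-in for semantic search)."""
    query_words = Counter(tokenize(query))
    scored = []
    for doc in docs:
        overlap = sum((query_words & Counter(tokenize(doc))).values())
        scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for overlap, doc in scored[:top_k] if overlap > 0]

def augment(query: str, context: List[str]) -> str:
    """Step 2: combine the retrieved passages and the question into one prompt."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Step 3: placeholder for an LLM call (hypothetical, returns a stub)."""
    return f"[placeholder answer from a {len(prompt)}-character prompt]"

question = "How many vacation days do employees get?"
context = retrieve(question, KNOWLEDGE_BASE)
prompt = augment(question, context)
print(prompt)
print(generate(prompt))
```

In the rest of this tutorial, GoodMem takes over the retrieval step and an LLM takes over generation.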

### Why GoodMem for RAG?

GoodMem provides enterprise-grade vector storage with:
- **Multiple embedder support** for optimal retrieval accuracy
- **Streaming APIs** for real-time responses
- **Advanced post-processing** with reranking and summarization
- **Scalable architecture** for production workloads


## Prerequisites

Before starting, ensure you have:

- ✅ **GoodMem server running** locally or access to a remote instance
- ✅ **Python 3.9+** installed on your system
- ✅ **API key** for your GoodMem instance
- ✅ **OpenAI API key** (for embeddings and LLM)
- ✅ **Voyage AI API key** (optional, for reranking)

### Installing GoodMem

If you don't have GoodMem installed yet, you can install it with:

```bash
curl -s https://get.goodmem.ai | bash
```

**Environment setup:**
```bash
export GOODMEM_API_KEY="your-key-here"
export OPENAI_API_KEY="your-openai-key"
export VOYAGE_API_KEY="your-voyage-key"  # Optional
```

## Installation & Setup

First, let's install the required packages:

In [1]:
# Install required packages
!pip install goodmem-client openai python-dotenv



## Authentication & Configuration

### Why This Matters

GoodMem uses API key authentication to secure your vector memory data. Proper configuration ensures:
- **Secure access** to your GoodMem instance
- **Isolated environments** (development, staging, production)
- **Usage tracking** and access control per API key

### What We'll Do

1. Configure the GoodMem host URL (where your server is running)
2. Set up API key authentication
3. Verify the configuration is correct

### Configuration Options

- **Local development**: `http://localhost:8080` (default)
- **Remote/production**: Your deployed GoodMem URL
- **Environment variables**: Best practice for managing credentials
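
A small validation helper can catch misconfiguration before any request is sent. The sketch below is one possible approach, not part of the GoodMem client: `load_goodmem_settings` is a hypothetical name, and it assumes the `GOODMEM_HOST` and `GOODMEM_API_KEY` variables used in this tutorial.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

def load_goodmem_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read and sanity-check GoodMem settings from environment variables."""
    env = os.environ if env is None else env
    host = env.get("GOODMEM_HOST", "http://localhost:8080")
    api_key = env.get("GOODMEM_API_KEY", "")

    # The host must be an absolute http(s) URL.
    parsed = urlparse(host)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"GOODMEM_HOST is not a valid http(s) URL: {host!r}")

    # Fail early instead of sending requests without credentials.
    if not api_key:
        raise ValueError("GOODMEM_API_KEY is not set")

    return {"host": host.rstrip("/"), "api_key": api_key}

# Example with an explicit mapping instead of the real environment.
settings = load_goodmem_settings({"GOODMEM_API_KEY": "example-key"})
print(settings["host"])
```

Passing a mapping explicitly, as in the example, keeps the helper easy to test without touching real credentials.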

Let's configure our GoodMem client and test the connection:

In [2]:
import os
import json
import time
from typing import List, Dict, Optional
from dotenv import load_dotenv

# Load environment variables (optional)
load_dotenv()

# Configuration - Update these values for your setup
GOODMEM_HOST = os.getenv('GOODMEM_HOST', 'http://localhost:8080')
GOODMEM_API_KEY = os.getenv('GOODMEM_API_KEY', 'your-api-key-here')

print(f"GoodMem Host: {GOODMEM_HOST}")
print(f"API Key configured: {'Yes' if GOODMEM_API_KEY != 'your-api-key-here' else 'No - Please update'}")

GoodMem Host: http://localhost:8080
API Key configured: Yes


In [3]:
# Import GoodMem client libraries
from goodmem_client.api import SpacesApi, MemoriesApi, EmbeddersApi
from goodmem_client.configuration import Configuration
from goodmem_client.api_client import ApiClient
from goodmem_client.streaming import MemoryStreamClient
from goodmem_client.exceptions import ApiException

# Configure the API client
def create_goodmem_clients():
    """Create and configure GoodMem API clients."""
    configuration = Configuration()
    configuration.host = GOODMEM_HOST
    
    # Create API client instance
    api_client = ApiClient(configuration=configuration)
    
    # Add authentication header
    api_client.default_headers["x-api-key"] = GOODMEM_API_KEY
    
    # Create API instances
    spaces_api = SpacesApi(api_client=api_client)
    memories_api = MemoriesApi(api_client=api_client)
    embedders_api = EmbeddersApi(api_client=api_client)
    stream_client = MemoryStreamClient(api_client)
    
    return spaces_api, memories_api, embedders_api, stream_client, api_client

# Test connection
try:
    spaces_api, memories_api, embedders_api, stream_client, api_client = create_goodmem_clients()
    
    # Test the connection by listing spaces
    response = spaces_api.list_spaces()
    print(f"✅ Successfully connected to GoodMem!")
    print(f"   Found {len(getattr(response, 'spaces', []))} existing spaces")
    
except ApiException as e:
    print(f"❌ Error connecting to GoodMem: {e}")
    print("   Please check your API key and host configuration")
except Exception as e:
    print(f"❌ Unexpected error: {e}")

✅ Successfully connected to GoodMem!
   Found 0 existing spaces


## Creating an Embedder

### Why Embedders Matter

An **embedder** is the foundation of semantic search. It converts text into high-dimensional vectors (embeddings) that capture meaning:

```
Text: "vacation policy" → Vector: [0.23, -0.45, 0.67, ...]  (1536 dimensions)
```

These vectors enable:
- **Semantic similarity**: Find conceptually similar content, not just keyword matches
- **Context understanding**: Capture meaning beyond exact word matches
- **Efficient retrieval**: Fast vector comparisons using specialized indexes
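
To make "semantic similarity" concrete, here is a minimal sketch of cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors are invented for illustration; real embeddings from `text-embedding-3-small` have 1536 dimensions, and this tutorial does not specify which distance metric GoodMem uses internally.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings": the first two point in a similar direction.
vacation_policy = [0.9, 0.1, 0.0]
time_off_rules = [0.8, 0.2, 0.1]
vpn_setup = [0.0, 0.2, 0.9]

related = cosine_similarity(vacation_policy, time_off_rules)
unrelated = cosine_similarity(vacation_policy, vpn_setup)
print(f"vacation policy vs. time-off rules: {related:.3f}")
print(f"vacation policy vs. VPN setup:      {unrelated:.3f}")
```

Texts with related meaning end up with a score near 1, while unrelated texts score near 0, even when they share no keywords.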

### The RAG Pipeline Flow

```
Documents → Embedder → Vector Storage → Semantic Search → Retrieved Context
```

### Choosing an Embedder

**OpenAI `text-embedding-3-small`** (what we'll use):
- ✅ **High quality**: Excellent for most use cases
- ✅ **Fast**: Low latency for real-time applications  
- ✅ **1536 dimensions**: Good balance of quality and storage
- ✅ **Cost-effective**: $0.02 per 1M tokens

**Other options**:
- **text-embedding-3-large**: Higher quality, 3072 dimensions, more expensive
- **Voyage AI**: Specialized for search, excellent retrieval performance
- **Cohere**: Good multilingual support
- **Local models**: HuggingFace sentence transformers for privacy/offline
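
The trade-off between dimensionality, storage, and cost can be estimated with simple arithmetic. The sketch below assumes vectors are stored as 4-byte floats (float32) and ignores index overhead; both are assumptions, since actual storage depends on the backend. The default price is the $0.02 per 1M tokens figure quoted above.

```python
def embedding_storage_bytes(num_chunks: int, dimensions: int,
                            bytes_per_value: int = 4) -> int:
    """Raw vector storage, assuming float32 values and no index overhead."""
    return num_chunks * dimensions * bytes_per_value

def embedding_cost_usd(total_tokens: int,
                       usd_per_million_tokens: float = 0.02) -> float:
    """Embedding cost at a flat per-token price."""
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Compare 100,000 chunks at 1536 vs. 3072 dimensions.
small = embedding_storage_bytes(100_000, 1536)
large = embedding_storage_bytes(100_000, 3072)
print(f"1536 dims: {small / 1e6:.1f} MB, 3072 dims: {large / 1e6:.1f} MB")
print(f"Cost to embed 5M tokens: ${embedding_cost_usd(5_000_000):.2f}")
```

Doubling the dimensionality doubles the raw vector storage, which is one reason the smaller model is a reasonable default.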

### What We'll Do

1. Check if an embedder already exists
2. If not, create an OpenAI embedder with proper authentication
3. Verify the embedder is ready for use

**Note**: You'll need an OpenAI API key set in the `OPENAI_API_KEY` environment variable.

In [4]:
from goodmem_client.models import EmbedderCreationRequest, ApiKeyAuth, EndpointAuthentication

def create_openai_embedder():
    """Create an OpenAI embedder for text embedding."""
    
    # Check if OPENAI_API_KEY is set
    openai_api_key = os.getenv('OPENAI_API_KEY')
    if not openai_api_key:
        print("❌ OPENAI_API_KEY environment variable not set!")
        print("   Please set your OpenAI API key:")
        print("   export OPENAI_API_KEY='your-api-key-here'")
        return None
    
    try:
        # Check if embedder already exists
        embedders_response = embedders_api.list_embedders()
        existing_embedders = getattr(embedders_response, 'embedders', [])
        
        # Look for existing OpenAI text-embedding-3-small embedder
        for embedder in existing_embedders:
            if (embedder.provider_type == "OPENAI" and 
                getattr(embedder, 'model_identifier', '') == "text-embedding-3-small"):
                print(f"✅ OpenAI embedder already exists!")
                print(f"   Display Name: {embedder.display_name}")
                print(f"   Embedder ID: {embedder.embedder_id}")
                print(f"   Model: {getattr(embedder, 'model_identifier', 'N/A')}")
                print(f"   Dimensionality: {getattr(embedder, 'dimensionality', 'N/A')}")
                return embedder
        
        # Create new embedder
        print("🔧 Creating new OpenAI embedder...")
        
        # Create ApiKeyAuth for OpenAI
        api_key_auth = ApiKeyAuth(
            inline_secret=openai_api_key,
            header_name="Authorization",
            prefix="Bearer "
        )
        
        # Wrap in EndpointAuthentication
        credentials = EndpointAuthentication(
            kind="CREDENTIAL_KIND_API_KEY",
            api_key=api_key_auth
        )
        
        # Build the embedder creation request
        embedder_request = EmbedderCreationRequest(
            display_name="OpenAI Text Embedding 3 Small",
            provider_type="OPENAI",
            endpoint_url="https://api.openai.com/v1",
            model_identifier="text-embedding-3-small",
            dimensionality=1536,  # INTEGER, not string
            api_path="/embeddings",
            distribution_type="DENSE",
            supported_modalities=["TEXT"],
            credentials=credentials  # EndpointAuthentication object
        )
        
        # Create the embedder
        new_embedder = embedders_api.create_embedder(embedder_request)
        
        print(f"✅ Successfully created OpenAI embedder!")
        print(f"   Display Name: {new_embedder.display_name}")
        print(f"   Embedder ID: {new_embedder.embedder_id}")
        print(f"   Provider: {new_embedder.provider_type}")
        print(f"   Model: {getattr(new_embedder, 'model_identifier', 'N/A')}")
        print(f"   Dimensionality: {getattr(new_embedder, 'dimensionality', 'N/A')}")
        
        return new_embedder
        
    except ApiException as e:
        print(f"❌ Error creating embedder: {e}")
        return None
    except Exception as e:
        print(f"❌ Unexpected error: {e}")
        return None

# Create or retrieve the OpenAI embedder
openai_embedder = create_openai_embedder()

🔧 Creating new OpenAI embedder...
✅ Successfully created OpenAI embedder!
   Display Name: OpenAI Text Embedding 3 Small
   Embedder ID: be251293-d618-4715-baf4-67003ff3025d
   Provider: ProviderType.OPENAI
   Model: text-embedding-3-small
   Dimensionality: 1536


## Creating Your First Space

### What is a Space?

A **Space** in GoodMem is a logical container for organizing related memories (documents). Think of it as a database or collection where you store and retrieve semantically similar content.

Each space has:
- **Associated embedders**: Which models convert text to vectors
- **Chunking configuration**: How documents are split into searchable pieces
- **Access controls**: Public or private, with permission management
- **Metadata labels**: For organization and filtering

### Use Cases for Multiple Spaces

You might create different spaces for:
- **By domain**: Technical docs, HR policies, product specs
- **By environment**: Development, staging, production
- **By customer**: Tenant-specific data in multi-tenant apps
- **By privacy level**: Public FAQ vs. internal knowledge base

### Why Chunking Matters

Documents are too large to search efficiently as whole units. Chunking:
- **Improves relevance**: Match specific sections, not entire documents
- **Enables context**: Return focused chunks that answer specific questions  
- **Optimizes retrieval**: Process and compare smaller text segments

**Our chunking strategy**:
- **256 characters**: Short enough for focused context, long enough for meaning
- **25 character overlap**: Ensures concepts spanning chunk boundaries aren't lost
- **Hierarchical separators**: Split on paragraphs first, then sentences, then words
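
To show how chunk size and overlap interact, here is a deliberately simplified fixed-window chunker. It is not GoodMem's recursive splitter: it ignores separators entirely and only illustrates the sliding-window arithmetic. `chunk_text` is a hypothetical helper name.

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 256, chunk_overlap: int = 25) -> List[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

sample = "".join(chr(97 + i % 26) for i in range(600))  # 600 characters of filler
chunks = chunk_text(sample)
print([len(chunk) for chunk in chunks])
print(chunks[0][-25:] == chunks[1][:25])  # the overlap is shared by both chunks
```

With these settings each new chunk starts 231 characters after the previous one, so this naive window would split a 2,342-character document into 11 chunks. A separator-aware splitter will generally produce a different count because it prefers to break at paragraph and sentence boundaries.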

### What We'll Do

1. List available embedders
2. Create a space with our embedder and chunking configuration
3. Add metadata labels for organization
4. Verify the space is ready

Let's create a space for our RAG demo:

In [5]:
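# First, let's see what embedders are available
# Define both names up front so later cells still work if listing fails or returns nothing
available_embedders = []
default_embedder = None
try:
    embedders_response = embedders_api.list_embedders()
    available_embedders = getattr(embedders_response, 'embedders', []) or []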
    
    print(f"📋 Available Embedders ({len(available_embedders)}):")
    for i, embedder in enumerate(available_embedders):
        print(f"   {i+1}. {embedder.display_name} - {embedder.provider_type}")
        print(f"      Model: {getattr(embedder, 'model_identifier', 'N/A')}")
        print(f"      ID: {embedder.embedder_id}")
        print()
        
    if available_embedders:
        default_embedder = available_embedders[0]
        print(f"🎯 Using embedder: {default_embedder.display_name}")
    else:
        print("⚠️  No embedders found. You may need to configure an embedder first.")
        print("   See the documentation: https://docs.goodmem.ai/docs/reference/cli/goodmem_embedder_create/")
        
except ApiException as e:
    print(f"❌ Error listing embedders: {e}")
    default_embedder = None

📋 Available Embedders (1):
   1. OpenAI Text Embedding 3 Small - ProviderType.OPENAI
      Model: text-embedding-3-small
      ID: be251293-d618-4715-baf4-67003ff3025d

🎯 Using embedder: OpenAI Text Embedding 3 Small


In [6]:
from goodmem_client.models import SpaceCreationRequest, SpaceEmbedderConfig

# Create a space for our RAG demo
SPACE_NAME = "RAG Demo Knowledge Base"

# Define chunking configuration that we'll reuse throughout the tutorial
# Save this configuration to ensure consistency across all memory creation requests
DEMO_CHUNKING_CONFIG = {
    "recursive": {
        "chunk_size": 256,                     # 256 character chunks for optimal RAG performance
        "chunk_overlap": 25,                   # 25 character overlap between chunks
        "separators": ["\n\n", "\n", ". ", " ", ""],  # Hierarchical splitting
        "keep_strategy": "KEEP_END",           # Append separator to preceding chunk
        "separator_is_regex": False,           # Plain text separators
        "length_measurement": "CHARACTER_COUNT" # Measure by characters
    }
}

def create_demo_space():
    """Create a space for our RAG demonstration."""
    try:
        # Check if space already exists
        existing_spaces = spaces_api.list_spaces()
        for space in getattr(existing_spaces, 'spaces', []):
            if space.name == SPACE_NAME:
                print(f"📁 Space '{SPACE_NAME}' already exists")
                print(f"   Space ID: {space.space_id}")
                print("   To remove existing space, see https://docs.goodmem.ai/docs/reference/cli/goodmem_space_delete/")
                return space
        
        # Configure space embedders if we have available embedders
        space_embedders = []
        if available_embedders:
            space_embedders = [
                SpaceEmbedderConfig(
                    embedder_id=default_embedder.embedder_id,
                    default_retrieval_weight=1.0
                )
            ]
        
        # Create space request with our saved chunking configuration
        create_request = SpaceCreationRequest(
            name=SPACE_NAME,
            labels={
                "purpose": "rag-demo",
                "environment": "tutorial", 
                "content-type": "documentation"
            },
            space_embedders=space_embedders,
            public_read=False,  # Private space
            default_chunking_config=DEMO_CHUNKING_CONFIG  # Use our saved config
        )
        
        # Create the space
        new_space = spaces_api.create_space(create_request)
        
        print(f"✅ Created space: {new_space.name}")
        print(f"   Space ID: {new_space.space_id}")
        print(f"   Embedders: {len(new_space.space_embedders)}")
        print(f"   Labels: {dict(new_space.labels)}")
        print(f"   Chunking Config Saved: {DEMO_CHUNKING_CONFIG['recursive']['chunk_size']} chars with {DEMO_CHUNKING_CONFIG['recursive']['chunk_overlap']} overlap")
        print(f"   💡 This chunking config will be reused for all memory creation!")
        
        return new_space
        
    except ApiException as e:
        print(f"❌ Error creating space: {e}")
        return None

# Create our demo space
demo_space = create_demo_space()

✅ Created space: RAG Demo Knowledge Base
   Space ID: 4b58a640-865a-414e-99ca-d96691071111
   Embedders: 1
   Labels: {'purpose': 'rag-demo', 'environment': 'tutorial', 'content-type': 'documentation'}
   Chunking Config Saved: 256 chars with 25 overlap
   💡 This chunking config will be reused for all memory creation!


In [7]:
# Verify our space configuration
if demo_space:
    try:
        # Get detailed space information
        space_details = spaces_api.get_space(demo_space.space_id)
        
        print(f"🔍 Space Configuration:")
        print(f"   Name: {space_details.name}")
        print(f"   Owner ID: {space_details.owner_id}")
        print(f"   Public Read: {space_details.public_read}")
        print(f"   Created: {space_details.created_at}")
        print(f"   Labels: {dict(space_details.labels)}")
        
        print(f"\n🤖 Associated Embedders:")
        for embedder_assoc in space_details.space_embedders:
            print(f"   Embedder ID: {embedder_assoc.embedder_id}")
            print(f"   Retrieval Weight: {embedder_assoc.default_retrieval_weight}")
            
    except ApiException as e:
        print(f"❌ Error getting space details: {e}")
else:
    print("⚠️  No space available for the demo")

🔍 Space Configuration:
   Name: RAG Demo Knowledge Base
   Owner ID: cf5df949-31c6-4c54-af50-f8002107164e
   Public Read: False
   Created: 1764795760308
   Labels: {'purpose': 'rag-demo', 'environment': 'tutorial', 'content-type': 'documentation'}

🤖 Associated Embedders:
   Embedder ID: be251293-d618-4715-baf4-67003ff3025d
   Retrieval Weight: 1.0


## Adding Documents to Memory

### The Document Processing Pipeline

When you add a document to GoodMem, it goes through several automated steps:

```
1. Ingestion → 2. Chunking → 3. Embedding → 4. Indexing → 5. Ready for Search
```

**What happens**:
1. **Ingestion**: Document content and metadata are stored
2. **Chunking**: Text is split according to your configuration (256 chars, 25 overlap)
3. **Embedding**: Each chunk is converted to a vector by your embedder
4. **Indexing**: Vectors are indexed for fast similarity search
5. **Ready for search**: The document is marked `COMPLETED` and can be retrieved
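
Because these steps run in the background, a client usually polls until every document reaches a final state. The sketch below shows that polling logic in isolation, with the status lookup injected as a function so it can run without a server. The helper name and its parameters are hypothetical; the status strings `PENDING`, `COMPLETED`, and `FAILED` are the ones that appear in this tutorial.

```python
import time
from typing import Callable, List

def wait_until_all_completed(
    fetch_statuses: Callable[[], List[str]],
    timeout_seconds: float = 120,
    poll_interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until all statuses are COMPLETED; stop early on FAILED or timeout."""
    deadline = clock() + timeout_seconds
    while True:
        statuses = fetch_statuses()
        if "FAILED" in statuses:
            return False
        # An empty list means nothing has been ingested yet, so keep waiting.
        if statuses and all(status == "COMPLETED" for status in statuses):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_seconds)

# Simulated server responses: two polls still pending, then everything done.
responses = iter([
    ["PENDING", "PENDING"],
    ["COMPLETED", "PENDING"],
    ["COMPLETED", "COMPLETED"],
])
finished = wait_until_all_completed(lambda: next(responses), sleep=lambda seconds: None)
print(finished)
```

In real use, `fetch_statuses` would wrap a call that lists the memories in a space and returns each memory's processing status.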

### Single vs. Batch Operations

**Single memory creation** (`CreateMemory`):
- ✅ Good for: Real-time ingestion, single documents
- ✅ Returns the created memory (ID and initial status) immediately; chunking and embedding continue in the background
- ⚠️ Higher overhead for bulk operations

**Batch memory creation** (`BatchCreateMemory`):
- ✅ Good for: Bulk imports, initial setup, periodic updates
- ✅ Lower overhead, efficient for multiple documents
- ✅ Async processing - check status via `ListMemories`
- ⚠️ Takes longer to get individual status feedback
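
For larger imports it can be useful to split the document list into several batch requests rather than one very large one. The helper below is a generic sketch: `batched` is a hypothetical name, and the batch size of 2 is an arbitrary value for the example, not a documented GoodMem limit.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

filenames = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
for number, batch in enumerate(batched(filenames, batch_size=2), start=1):
    # In a real import, each batch would become one batch creation request.
    print(f"Batch {number}: {batch}")
```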

### Metadata Best Practices

Rich metadata helps with:
- **Filtering**: Retrieve specific document types
- **Source attribution**: Show users where information came from
- **Organization**: Group and manage related documents
- **Debugging**: Track ingestion methods and dates
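
One way to keep metadata consistent is to build it in a single helper rather than repeating dictionaries in each request. In the sketch below, `filename`, `source`, and `ingestion_method` are the keys used later in this tutorial, while `file_type` and `ingested_at` are hypothetical additions shown only as examples of richer metadata.

```python
import os
from datetime import datetime, timezone

def build_metadata(filepath: str, ingestion_method: str,
                   source: str = "sample_documents") -> dict:
    """Build a metadata dictionary with string values for one document."""
    filename = os.path.basename(filepath)
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return {
        "filename": filename,                  # source attribution
        "source": source,                      # where the file came from
        "ingestion_method": ingestion_method,  # "single" or "batch", for debugging
        "file_type": extension,                # hypothetical key, useful for filtering
        "ingested_at": datetime.now(timezone.utc).isoformat(),  # hypothetical key
    }

metadata = build_metadata("sample_documents/product_faq.txt", "batch")
print(metadata)
```

Keeping all values as strings mirrors the metadata used in the cells below.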

### What We'll Do

1. Load sample documents from local files
2. Create one document using single memory creation (to demo the API)
3. Create remaining documents using batch operation (more efficient)
4. Monitor processing status until all documents are ready

We'll use sample company documents that represent common business use cases:

In [8]:
import os
import base64

# Load our sample documents
def load_sample_documents():
    """Load sample documents from the sample_documents directory.
    
    Automatically discovers all files in the directory and handles:
    - .txt files: Read as plain text
    - .pdf files: Read as binary and base64 encode
    """
    documents = []
    sample_dir = "sample_documents"
    
    # Auto-discover all files in the directory
    if not os.path.exists(sample_dir):
        print(f"⚠️  Directory not found: {sample_dir}")
        return documents
    
    files = sorted(os.listdir(sample_dir))  # Sort for consistent ordering
    
    for filename in files:
        filepath = os.path.join(sample_dir, filename)
        
        # Skip directories
        if not os.path.isfile(filepath):
            continue
        
        # Determine file type by extension
        file_ext = os.path.splitext(filename)[1].lower()
        
        if file_ext == '.txt':
            # Handle text files
            try:
                with open(filepath, 'r', encoding='utf-8') as f:
                    content = f.read()
                
                documents.append({
                    'filename': filename,
                    'content': content,
                    'content_type': 'text/plain',
                    'is_binary': False
                })
                print(f"📄 Loaded: {filename} ({len(content):,} characters)")
                
            except Exception as e:
                print(f"⚠️  Error reading {filename}: {e}")
        
        elif file_ext == '.pdf':
            # Handle PDF files
            try:
                with open(filepath, 'rb') as f:
                    binary_content = f.read()
                
                # Base64 encode the binary content
                content_b64 = base64.b64encode(binary_content).decode('utf-8')
                
                documents.append({
                    'filename': filename,
                    'content_b64': content_b64,
                    'content_type': 'application/pdf',
                    'is_binary': True
                })
                print(f"📄 Loaded: {filename} ({len(binary_content):,} bytes, base64: {len(content_b64):,} chars)")
                
            except Exception as e:
                print(f"⚠️  Error reading {filename}: {e}")
        else:
            print(f"⚠️  Skipping unsupported file type: {filename}")
    
    return documents

# Load the documents
sample_docs = load_sample_documents()
print(f"\n📚 Total documents loaded: {len(sample_docs)}")

📄 Loaded: company_handbook.txt (2,342 characters)
📄 Loaded: employee_handbook.pdf (399,615 bytes, base64: 532,820 chars)
📄 Loaded: product_faq.txt (4,043 characters)
📄 Loaded: security_policy.txt (4,211 characters)
📄 Loaded: technical_documentation.txt (2,384 characters)

📚 Total documents loaded: 5
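
The base64 figure in the output above follows from how the encoding works: every 3 input bytes become 4 output characters, with the last group padded. A short check of that arithmetic, using a dummy payload of the same size as the PDF:

```python
import base64

def base64_encoded_length(num_bytes: int) -> int:
    """Length of standard padded base64 output for num_bytes of input."""
    return 4 * ((num_bytes + 2) // 3)

pdf_size = 399_615  # size of employee_handbook.pdf reported above
print(base64_encoded_length(pdf_size))

# Cross-check against the real encoder with a dummy payload of the same size.
dummy_payload = b"\x00" * pdf_size
print(len(base64.b64encode(dummy_payload)))
```

Base64 therefore adds roughly 33% to the payload size, which is worth keeping in mind when uploading large binary files.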


In [9]:
# Create the first memory individually to demonstrate single memory creation
from goodmem_client.models import MemoryCreationRequest

def create_single_memory(space_id: str, document: dict) -> dict:
    """Create a single memory in GoodMem to demonstrate individual memory creation."""
    try:
        # Prepare memory request based on content type
        if document.get('is_binary', False):
            # For binary files (PDF), use original_content_b64
            memory_request = MemoryCreationRequest(
                space_id=space_id,
                original_content_b64=document['content_b64'],  # Base64 encoded
                content_type=document['content_type'],          # application/pdf
                metadata={
                    "filename": document['filename'],
                    "source": "sample_documents",
                    "ingestion_method": "single"
                },
                chunking_config=DEMO_CHUNKING_CONFIG
            )
        else:
            # For text files, use original_content
            memory_request = MemoryCreationRequest(
                space_id=space_id,
                original_content=document['content'],          # Plain text
                content_type=document['content_type'],         # text/plain
                metadata={
                    "filename": document['filename'],
                    "source": "sample_documents",
                    "ingestion_method": "single"
                },
                chunking_config=DEMO_CHUNKING_CONFIG
            )

        # Create the memory
        memory = memories_api.create_memory(memory_request)
        
        return memory
        
    except ApiException as e:
        print(f"❌ Error creating memory for {document['filename']}: {e}")
        return None
    except Exception as e:
        print(f"❌ Unexpected error with {document['filename']}: {e}")
        return None

if demo_space and sample_docs:
    # Create the first document using single memory creation
    first_doc = sample_docs[0]
    print(f"📝 Creating first document using CreateMemory API:")
    print(f"   Document: {first_doc['filename']}")
    print(f"   Content Type: {first_doc['content_type']}")
    print(f"   Method: Individual memory creation")
    print()

    single_memory = create_single_memory(demo_space.space_id, first_doc)
        
    if single_memory:
        print(f"🎯 Single memory creation completed successfully!")
    else:
        print(f"⚠️  Single memory creation failed")
else:
    print("⚠️  Cannot create memory: missing space or documents")
    single_memory = None

📝 Creating first document using CreateMemory API:
   Document: company_handbook.txt
   Content Type: text/plain
   Method: Individual memory creation

🎯 Single memory creation completed successfully!


In [10]:
# Demonstrate retrieving a memory by ID using get_memory
import base64

if single_memory:
    try:
        print(f"📖 Retrieving memory details using get_memory API:")
        print(f"   Memory ID: {single_memory.memory_id}")
        print()
        
        # Retrieve the memory without content
        retrieved_memory = memories_api.get_memory(
            id=single_memory.memory_id,
            include_content=False
        )
        
        print(f"✅ Successfully retrieved memory:")
        print(f"   Memory ID: {retrieved_memory.memory_id}")
        print(f"   Space ID: {retrieved_memory.space_id}")
        print(f"   Status: {retrieved_memory.processing_status}")
        print(f"   Content Type: {retrieved_memory.content_type}")
        print(f"   Created At: {retrieved_memory.created_at}")
        print(f"   Updated At: {retrieved_memory.updated_at}")
        
        if retrieved_memory.metadata:
            print(f"\n   📋 Metadata:")
            for key, value in retrieved_memory.metadata.items():
                print(f"      {key}: {value}")
        
        # Now retrieve with content included
        print(f"\n📖 Retrieving memory with content:")
        retrieved_with_content = memories_api.get_memory(
            id=single_memory.memory_id,
            include_content=True
        )
        
        if retrieved_with_content.original_content:
            # Decode the base64 encoded content
            decoded_content = base64.b64decode(retrieved_with_content.original_content).decode('utf-8')
            
            print(f"✅ Content retrieved and decoded:")
            print(f"   Content length: {len(decoded_content)} characters")
            print(f"   First 200 chars: {decoded_content[:200]}...")
        else:
            print(f"⚠️  No content available")
            
    except ApiException as e:
        print(f"❌ Error retrieving memory: {e}")
    except Exception as e:
        print(f"❌ Unexpected error: {e}")
else:
    print("⚠️  No memory available to retrieve")

📖 Retrieving memory details using get_memory API:
   Memory ID: fdee35c7-1ad2-483c-8a7c-679b406ea99d

✅ Successfully retrieved memory:
   Memory ID: fdee35c7-1ad2-483c-8a7c-679b406ea99d
   Space ID: 4b58a640-865a-414e-99ca-d96691071111
   Status: PENDING
   Content Type: text/plain
   Created At: 1764795760359
   Updated At: 1764795760359

   📋 Metadata:
      source: sample_documents
      filename: company_handbook.txt
      ingestion_method: single

📖 Retrieving memory with content:
✅ Content retrieved and decoded:
   Content length: 2342 characters
   First 200 chars: ACME Corporation Employee Handbook

Welcome to ACME Corporation! This handbook provides essential information about our company policies, procedures, and culture.

COMPANY OVERVIEW
ACME Corporation is...


In [11]:
# Create the remaining documents using batch memory creation
from goodmem_client.models import BatchMemoryCreationRequest

def create_batch_memories(space_id: str, documents: List[dict]) -> None:
    """Create multiple memories in GoodMem using batch creation for efficiency."""
    
    # Prepare batch memory requests using our saved chunking configuration
    memory_requests = []
    for doc in documents:
        
        # Create memory request based on content type
        if doc.get('is_binary', False):
            # For binary files (PDF), use original_content_b64
            memory_request = MemoryCreationRequest(
                space_id=space_id,
                original_content_b64=doc['content_b64'],
                content_type=doc['content_type'],
                chunking_config=DEMO_CHUNKING_CONFIG,
                metadata={
                    "filename": doc['filename'],
                    "source": "sample_documents",
                    "ingestion_method": "batch"
                }
            )
        else:
            # For text files, use original_content
            memory_request = MemoryCreationRequest(
                space_id=space_id,
                original_content=doc['content'],
                content_type=doc['content_type'],
                chunking_config=DEMO_CHUNKING_CONFIG,
                metadata={
                    "filename": doc['filename'],
                    "source": "sample_documents",
                    "ingestion_method": "batch"
                }
            )
        
        memory_requests.append(memory_request)
    
    try:
        # Create batch request
        batch_request = BatchMemoryCreationRequest(
            requests=memory_requests
        )
        
        print(f"📦 Creating {len(memory_requests)} memories using BatchCreateMemory API:")
        # Execute batch creation - this returns None on success
        memories_api.batch_create_memory(batch_request)
        
    except ApiException as e:
        print(f"❌ Error during batch creation: {e}")
    except Exception as e:
        print(f"❌ Unexpected error during batch creation: {e}")

if demo_space and sample_docs and len(sample_docs) > 1:
    # Create the remaining documents (skip the first one we already created)
    remaining_docs = sample_docs[1:]  # All documents except the first
    create_batch_memories(demo_space.space_id, remaining_docs)
    
    print(f"\n📋 Total Memory Creation Summary:")
    print(f"   📄 Single CreateMemory: 1 document")
    print(f"   📦 Batch CreateMemory: {len(remaining_docs)} documents submitted")
    print(f"   ⏳ Check processing status in the next cell")
    
else:
    print("⚠️  Cannot create batch memories: insufficient documents or missing space")

📦 Creating 4 memories using BatchCreateMemory API:

📋 Total Memory Creation Summary:
   📄 Single CreateMemory: 1 document
   📦 Batch CreateMemory: 4 documents submitted
   ⏳ Check processing status in the next cell


In [12]:
# List all memories in our space to check their processing status
if demo_space:
    try:
        memories_response = memories_api.list_memories(space_id=demo_space.space_id)
        memories = getattr(memories_response, 'memories', [])
        
        print(f"📚 Memories in space '{demo_space.name}':")
        print(f"   Total memories: {len(memories)}")
        print()
        
        for i, memory in enumerate(memories, 1):
            metadata = memory.metadata or {}
            filename = metadata.get('filename', 'Unknown')
            
            print(f"   {i}. {filename}")
            print(f"      Status: {memory.processing_status}")
            print(f"      Created: {memory.created_at}")
            print()
            
    except ApiException as e:
        print(f"❌ Error listing memories: {e}")

📚 Memories in space 'RAG Demo Knowledge Base':
   Total memories: 5

   1. employee_handbook.pdf
      Status: PENDING
      Created: 1764795760405

   2. product_faq.txt
      Status: PENDING
      Created: 1764795760405

   3. technical_documentation.txt
      Status: PENDING
      Created: 1764795760405

   4. security_policy.txt
      Status: PENDING
      Created: 1764795760405

   5. company_handbook.txt
      Status: PENDING
      Created: 1764795760359



In [13]:
# Monitor processing status for all created memories
def wait_for_processing_completion(space_id: str, max_wait_seconds: int = 120):
    """Wait for memories to finish processing."""
    print("⏳ Waiting for document processing to complete...")
    print("   💡 Note: Batch memories are processed asynchronously, so we check by listing all memories in the space")
    print()
    
    start_time = time.time()
    while time.time() - start_time < max_wait_seconds:
        try:
            # List memories in our space
            memories_response = memories_api.list_memories(space_id=space_id)
            memories = getattr(memories_response, 'memories', [])
            
            # Check processing status
            status_counts = {}
            for memory in memories:
                status = memory.processing_status
                status_counts[status] = status_counts.get(status, 0) + 1
            
            print(f"📊 Processing status: {dict(status_counts)} (Total: {len(memories)} memories)")
            
            # Check if all are completed (guard against an empty list, for which all() is trivially True)
            if memories and all(memory.processing_status == 'COMPLETED' for memory in memories):
                print("✅ All documents processed successfully!")
                return True
                
            # Check for any failures
            failed_count = status_counts.get('FAILED', 0)
            if failed_count > 0:
                print(f"❌ {failed_count} memories failed processing")
                return False
            
            time.sleep(5)  # Wait 5 seconds before checking again
            
        except ApiException as e:
            print(f"❌ Error checking processing status: {e}")
            return False
    
    print(f"⏰ Timeout waiting for processing (waited {max_wait_seconds}s)")
    return False

if demo_space:
    # Wait for processing to complete for all memories (single + batch)
    # Since batch_create_memory returns None, we monitor by listing all memories
    processing_complete = wait_for_processing_completion(demo_space.space_id)
    
    if processing_complete:
        print("🎉 Ready for semantic search and retrieval!")
        print(f"📈 Batch API benefit: Multiple documents submitted in a single API call")
        print(f"🔧 Consistent chunking: All memories use DEMO_CHUNKING_CONFIG")
    else:
        print("⚠️  Some documents may still be processing. You can continue with the tutorial.")
else:
    print("⚠️  Skipping processing check - no space available")
    processing_complete = False

⏳ Waiting for document processing to complete...
   💡 Note: Batch memories are processed asynchronously, so we check by listing all memories in the space

📊 Processing status: {'PENDING': 5} (Total: 5 memories)


📊 Processing status: {'COMPLETED': 5} (Total: 5 memories)
✅ All documents processed successfully!
🎉 Ready for semantic search and retrieval!
📈 Batch API benefit: Multiple documents submitted in a single API call
🔧 Consistent chunking: All memories use DEMO_CHUNKING_CONFIG
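
The loop above sleeps a fixed 5 seconds between checks. As a minimal sketch of an alternative, the helper below polls with exponential backoff so that fast jobs are detected sooner and slow jobs generate fewer requests. `fetch_statuses` is a hypothetical callable standing in for the `list_memories` call, and the `sleep` parameter is injectable only so the logic can be exercised without real waiting. Elapsed time is approximated by summing the sleep intervals rather than reading the wall clock.

```python
import time
from collections import Counter
from typing import Callable, Iterable


def poll_until_done(
    fetch_statuses: Callable[[], Iterable[str]],
    max_wait_seconds: float = 120.0,
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once every status is COMPLETED, False on failure or timeout."""
    delay = initial_delay
    waited = 0.0
    while True:
        counts = Counter(fetch_statuses())
        if counts.get("FAILED", 0) > 0:
            return False  # stop early on any failure
        if counts and set(counts) == {"COMPLETED"}:
            return True  # non-empty and nothing left pending
        if waited >= max_wait_seconds:
            return False  # timed out
        step = min(delay, max_wait_seconds - waited)
        sleep(step)
        waited += step
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
```

In the notebook this could be called with something like `lambda: [m.processing_status for m in memories]` after each listing, but that wiring is left as an assumption here.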


## Semantic Search & Retrieval

### Why Semantic Search?

**Traditional keyword search**:
- Matches exact words or simple variations
- Misses conceptually similar content with different wording
- Example: "vacation days" won't match "time off policy"

**Semantic search**:
- Understands meaning and context
- Finds conceptually similar content regardless of exact wording
- Example: "vacation days" successfully matches "time off policy"

### How It Works

```
Query: "vacation policy" 
   ↓ (embed with same embedder)
Query Vector: [0.23, -0.45, ...]
   ↓ (compare to all chunk vectors)
Most Similar Chunks: (by negated cosine similarity; lower = more similar)
   1. "TIME OFF POLICY..." (score: -0.604)
   2. "Vacation requests..." (score: -0.544)
   3. "WORK HOURS..." (score: -0.458)
```

### Understanding Relevance Scores

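The relevance scores shown in this tutorial are **negated cosine similarities** (cosine similarity multiplied by -1). This is not the same as cosine distance, which is defined as 1 - similarity and is never negative:
- **Lower values = more relevant** (e.g., -0.6 is better than -0.4)
- **Range**: Typically -1.0 (most similar) to 0.0 (unrelated); a positive score would mean the vectors point in opposing directions
- **Rough threshold**: Results under -0.3 are usually relevant
- **Context matters**: Exact scores vary by embedder and content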
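
To make the score convention concrete, here is a small pure-Python sketch that computes negated cosine similarity for toy 2-dimensional vectors, ranks them in ascending order, and applies the -0.3 threshold mentioned above. The vectors and labels are invented for illustration; real embeddings have hundreds or thousands of dimensions and are produced by the embedder, not by hand.

```python
import math
from typing import Sequence


def negated_cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return -cos(a, b); lower values mean the vectors are more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return -dot / (norm_a * norm_b)


query = [1.0, 0.0]
chunks = {
    "same direction": [2.0, 0.0],   # score -1.0
    "partly related": [1.0, 1.0],   # score about -0.707
    "unrelated": [0.0, 3.0],        # score 0.0 (orthogonal)
}

scores = {name: negated_cosine_similarity(query, vec) for name, vec in chunks.items()}
ranked = sorted(scores, key=scores.get)  # ascending: most relevant first
relevant = [name for name in ranked if scores[name] < -0.3]
```

Note that vector length does not matter: `[2.0, 0.0]` scores exactly like `[1.0, 0.0]` would, because only direction is compared.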

### Streaming API Benefits

GoodMem's streaming API:
- **Real-time results**: Process chunks as they arrive
- **Low latency**: Start showing results immediately
- **Memory efficient**: No need to buffer entire result set
- **Progressive UI**: Update interface as more results come in
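
The benefits above come from consuming results one event at a time. The sketch below shows the general idea with newline-delimited JSON (NDJSON) and a Python generator; it is not the GoodMem client's implementation. The stream is simulated with `io.StringIO`, and the `relevanceScore` field name is an assumption made for this example.

```python
import io
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line, as each line arrives."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank keep-alive lines
            yield json.loads(line)


# Simulated response body; a real client would iterate over the HTTP response.
fake_stream = io.StringIO(
    '{"chunkText": "TIME OFF POLICY", "relevanceScore": -0.68}\n'
    "\n"
    '{"chunkText": "WORK HOURS", "relevanceScore": -0.46}\n'
)

# The first event is usable before the rest of the stream has been read.
first_event = next(iter_ndjson(fake_stream))
remaining = list(iter_ndjson(fake_stream))
```

Because the generator never holds more than one line at a time, memory use stays flat regardless of how many results the server sends.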

### What We'll Do

1. Implement a semantic search function using GoodMem's streaming API
2. Process different event types (chunks, memories, metadata)
3. Display results with relevance scores
4. Test with various queries to see semantic matching in action

Let's perform semantic search using GoodMem's streaming API. The function below will:

- **Find relevant chunks** based on semantic similarity
- **Stream results** in real-time
- **Include relevance scores** for ranking
- **Return structured data** for easy processing

In [14]:
def semantic_search(query: str, space_id: str, max_results: int = 5) -> List[dict]:
    """
    Perform semantic search using GoodMem's streaming API.
    
    Args:
        query: The search query
        space_id: ID of the space to search
        max_results: Maximum number of results to return
    
    Returns:
        List of search results with chunks and metadata
    """
    
    try:
        print(f"🔍 Searching for: '{query}'")
        print(f"📁 Space ID: {space_id}")
        print(f"📊 Max results: {max_results}")
        print("-" * 50)
        
        # Perform streaming search
        event_count = 0
        retrieved_chunks = []
        
        for event in stream_client.retrieve_memory_stream(
            message=query,
            space_ids=[space_id],
            requested_size=max_results,
            fetch_memory=True,
            fetch_memory_content=False,  # We don't need full content for this demo
            format="ndjson"
        ):
            event_count += 1
            
            if event.retrieved_item and event.retrieved_item.chunk:
                chunk_info = event.retrieved_item.chunk
                chunk_data = chunk_info.chunk
                
                retrieved_chunks.append({
                    'chunk_text': chunk_data.get('chunkText', ''),
                    'relevance_score': chunk_info.relevance_score,
                    'memory_index': chunk_info.memory_index,
                    'result_set_id': chunk_info.result_set_id,
                    'chunk_sequence': chunk_data.get('chunkSequenceNumber', 0)
                })
                
                print(f"📄 Chunk {len(retrieved_chunks)}:")
                print(f"   Relevance: {chunk_info.relevance_score:.3f}")
                print(f"   Text: {chunk_data.get('chunkText', '')}...")
                print()
        
        print(f"✅ Search completed: {len(retrieved_chunks)} chunks found, {event_count} events processed")
        return retrieved_chunks
        
    except Exception as e:
        print(f"❌ Error during search: {e}")
        return []

# Test semantic search with a sample query
if demo_space:
    sample_query = "What is the vacation policy for employees?"
    search_results = semantic_search(sample_query, demo_space.space_id)
else:
    print("⚠️  No space available for search")
    search_results = []

🔍 Searching for: 'What is the vacation policy for employees?'
📁 Space ID: 4b58a640-865a-414e-99ca-d96691071111
📊 Max results: 5
--------------------------------------------------
📄 Chunk 1:
   Relevance: -0.679
   Text: TIME OFF POLICY
All full-time employees receive:
- 15 days of paid vacation annually (increases to 20 days after 3 years)
- 10 sick days per year
- 8 company holidays
- Personal days as needed with manager approval...

📄 Chunk 2:
   Relevance: -0.675
   Text: 1.  Eligibility 

 
All regular full-time employees are eligible for vacation benefits. 

 
2.  Accrual 

 
Eligible employees accrue vacation in accordance with the following scheduleix: 

 
Years of Continuous Service Rate of Accrual 

Date of hire through end of year 
5...

📄 Chunk 3:
   Relevance: -0.662
   Text: [ORGANIZATION] has established the following vacation plan to provide eligible employees 
time off with pay so that they may be free from their regular duties for a period of rest and 
relaxation witho

In [15]:
# Let's try a few different queries to see how semantic search works
def test_multiple_queries(space_id: str):
    """Test semantic search with different types of queries."""
    
    test_queries = [
        "How do I reset my password?",
        "What are the security requirements for remote work?", 
        "API authentication and rate limits",
        "Employee benefits and health insurance",
        "How much does the software cost?"
    ]
    
    for i, query in enumerate(test_queries, 1):
        print(f"\n🔍 Test Query {i}: {query}")
        print("=" * 60)
        
        semantic_search(query, space_id, max_results=3)
        
        print("\n" + "-" * 60)

if demo_space:
    test_multiple_queries(demo_space.space_id)
else:
    print("⚠️  No space available for testing multiple queries")


🔍 Test Query 1: How do I reset my password?
🔍 Searching for: 'How do I reset my password?'
📁 Space ID: 4b58a640-865a-414e-99ca-d96691071111
📊 Max results: 3
--------------------------------------------------
📄 Chunk 1:
   Relevance: -0.370
   Text: password they use to gain access to computers or the Internet, as well as any change to 
such password.  Such notice must be made immediately. 

 
4. Compliance...

📄 Chunk 2:
   Relevance: -0.363
   Text: - No reuse of last 12 passwords
- Must be changed every 90 days for privileged accounts
- Multi-factor authentication required for all business systems
- Password managers recommended for personal password storage

ACCEPTABLE USE POLICY...

📄 Chunk 3:
   Relevance: -0.305
   Text: Each classification level has specific handling, storage, and transmission requirements outlined in our data handling procedures.

PASSWORD POLICY
Strong passwords are essential for system security:
- Minimum 12 characters with mix of letters, numbers, and symbo

## Advanced Features

Congratulations! 🎉 You've successfully built a semantic search system using GoodMem. Here's what you've accomplished:

### ✅ What You Built
- **Document ingestion pipeline** with automatic chunking and embedding
- **Semantic search system** with relevance scoring
- **Retrieval foundation for a Q&A system** using GoodMem's vector capabilities

### 🚀 Next Steps for Advanced Implementation

#### Reranking
Improve search quality by adding a reranking stage. **Rerankers** are specialized models that re-score search results to improve relevance:

- **Two-stage retrieval**: Fast initial retrieval with embeddings, then precise reranking
- **Better relevance**: Rerankers use cross-attention to understand query-document relationships
- **Reduced costs**: Rerank only top-K results instead of entire corpus
- **Voyage AI reranker**: Industry-leading reranking model with state-of-the-art performance

The combination of fast embedding-based retrieval followed by accurate reranking provides the best balance of speed and quality for production RAG systems.

## Configuring a Reranker

To further improve search quality, we can add a **reranker** to our RAG pipeline. While embedders provide fast semantic search, rerankers use more sophisticated models to re-score the top results for better accuracy.

### Why Use Reranking?

1. **Higher Accuracy**: Rerankers use cross-encoder architectures that directly compare queries and documents
2. **Two-Stage Pipeline**: Fast retrieval with embeddings + precise reranking = optimal performance
3. **Cost Effective**: Only rerank top-K results (e.g., top 20) rather than entire corpus
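
The two-stage idea can be sketched independently of any particular model. In the example below, both scorers are deliberately simple stand-ins: `shared_words` plays the role of the fast first-stage retriever and `jaccard` plays the role of the more precise reranker. Neither is an embedding model or a cross-encoder; the point is only that the expensive scorer runs on the shortlist, not on the whole corpus.

```python
from typing import Callable, List, Tuple


def two_stage_search(
    query: str,
    corpus: List[str],
    cheap_score: Callable[[str, str], float],
    precise_score: Callable[[str, str], float],
    first_stage_k: int = 20,
    final_k: int = 3,
) -> List[Tuple[str, float]]:
    """Shortlist with a cheap scorer, then re-score only the shortlist."""
    shortlist = sorted(
        corpus, key=lambda doc: cheap_score(query, doc), reverse=True
    )[:first_stage_k]
    rescored = [(doc, precise_score(query, doc)) for doc in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:final_k]


def shared_words(query: str, doc: str) -> float:
    """Stand-in first stage: number of words the query and document share."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def jaccard(query: str, doc: str) -> float:
    """Stand-in reranker: shared words divided by all distinct words."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


corpus = [
    "vacation policy and many other unrelated topics in one long section",
    "password reset steps",
    "vacation policy",
    "office hours",
]
results = two_stage_search(
    "vacation policy", corpus, shared_words, jaccard, first_stage_k=2, final_k=2
)
```

Both vacation documents tie in the first stage, but the second stage promotes the focused one to the top, which is the kind of reordering a reranker is meant to provide.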

### Voyage AI Reranker

We'll use Voyage AI's `rerank-2.5` model, which provides:
- **State-of-the-art performance** on reranking benchmarks
- **Fast inference** optimized for production use
- **Simple API** that integrates seamlessly with GoodMem

**Note**: You'll need a Voyage AI API key set in your environment variable `VOYAGE_API_KEY`.

In [16]:
from goodmem_client.api import RerankersApi
from goodmem_client.models import RerankerCreationRequest, ApiKeyAuth, EndpointAuthentication

def create_voyage_reranker():
    """Create a Voyage AI reranker for improving search results."""
    
    # Check if VOYAGE_API_KEY is set
    voyage_api_key = os.getenv('VOYAGE_API_KEY')
    if not voyage_api_key:
        print("❌ VOYAGE_API_KEY environment variable not set!")
        print("   Please set your Voyage AI API key:")
        print("   export VOYAGE_API_KEY='your-api-key-here'")
        return None
    
    try:
        # Create RerankersApi instance
        rerankers_api = RerankersApi(api_client=api_client)
        
        # Check if reranker already exists
        rerankers_response = rerankers_api.list_rerankers()
        existing_rerankers = getattr(rerankers_response, 'rerankers', [])
        
        # Look for existing Voyage rerank-2.5 reranker
        for reranker in existing_rerankers:
            if (reranker.provider_type == "VOYAGE" and 
                getattr(reranker, 'model_identifier', '') == "rerank-2.5"):
                print(f"✅ Voyage reranker already exists!")
                print(f"   Display Name: {reranker.display_name}")
                print(f"   Reranker ID: {reranker.reranker_id}")
                print(f"   Model: {getattr(reranker, 'model_identifier', 'N/A')}")
                return reranker
        
        # Create new reranker
        print("🔧 Creating new Voyage reranker...")
        
        # Create ApiKeyAuth for Voyage
        api_key_auth = ApiKeyAuth(
            inline_secret=voyage_api_key,
            header_name="Authorization",
            prefix="Bearer "
        )
        
        # Wrap in EndpointAuthentication
        credentials = EndpointAuthentication(
            kind="CREDENTIAL_KIND_API_KEY",
            api_key=api_key_auth
        )
        
        # Create reranker request
        reranker_request = RerankerCreationRequest(
            display_name="Voyage Rerank 2.5",
            provider_type="VOYAGE",
            endpoint_url="https://api.voyageai.com",
            model_identifier="rerank-2.5",
            api_path="/v1/rerank",
            supported_modalities=["TEXT"],
            credentials=credentials,
            description="Voyage AI reranker for improving search result relevance"
        )
        
        # Create the reranker
        new_reranker = rerankers_api.create_reranker(reranker_request)
        
        print(f"✅ Successfully created Voyage reranker!")
        print(f"   Display Name: {new_reranker.display_name}")
        print(f"   Reranker ID: {new_reranker.reranker_id}")
        print(f"   Provider: {new_reranker.provider_type}")
        print(f"   Model: {getattr(new_reranker, 'model_identifier', 'N/A')}")
        
        return new_reranker
        
    except ApiException as e:
        print(f"❌ Error creating reranker: {e}")
        return None
    except Exception as e:
        print(f"❌ Unexpected error: {e}")
        return None

# Create or retrieve the Voyage reranker
voyage_reranker = create_voyage_reranker()

🔧 Creating new Voyage reranker...
✅ Successfully created Voyage reranker!
   Display Name: Voyage Rerank 2.5
   Reranker ID: 472e9efc-e542-441b-86a5-34538c936f2b
   Provider: ProviderType.VOYAGE
   Model: rerank-2.5


## Registering an LLM

The final component in our RAG pipeline is the **LLM (Large Language Model)** - the generation component that creates natural language responses using the retrieved and reranked context.

### Role of LLMs in RAG

After retrieving and reranking relevant chunks, the LLM:
1. **Receives the query** and retrieved context
2. **Generates a response** that synthesizes information from multiple sources
3. **Maintains coherence** while staying grounded in the retrieved facts
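
In this tutorial the generation step runs inside GoodMem, so you do not assemble the prompt yourself. To illustrate what "receives the query and retrieved context" means in practice, here is a hedged sketch of a prompt builder. The template wording and the `build_rag_prompt` name are assumptions for illustration, not GoodMem's actual prompt; the `chunk_text` key mirrors the dictionaries returned by `semantic_search` earlier.

```python
from typing import List


def build_rag_prompt(query: str, chunks: List[dict], max_chunks: int = 3) -> str:
    """Assemble a grounded prompt from already-ranked chunks."""
    context_lines = [
        f"[{i}] {chunk['chunk_text'].strip()}"
        for i, chunk in enumerate(chunks[:max_chunks], 1)
    ]
    context = "\n".join(context_lines) if context_lines else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


prompt = build_rag_prompt(
    "How many vacation days do employees get?",
    [
        {"chunk_text": "15 days of paid vacation annually"},
        {"chunk_text": "10 sick days per year"},
    ],
)
```

Numbering the chunks gives the model something to cite, and the explicit "say so" instruction is one common way to discourage answers that go beyond the retrieved text.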

### OpenAI GPT-4o-mini

We'll use OpenAI's `gpt-4o-mini` model, which provides:
- **Fast inference** with low latency for real-time applications
- **Cost-effective** pricing compared to larger models
- **High quality** responses suitable for most RAG use cases
- **Function calling** support for advanced workflows

**Note**: This uses the same `OPENAI_API_KEY` environment variable as the embedder.

In [17]:
from goodmem_client.api import LLMsApi
from goodmem_client.models import LLMCreationRequest, LLMCapabilities, ApiKeyAuth, EndpointAuthentication

def create_openai_llm():
    """Register OpenAI GPT-4o-mini LLM with GoodMem."""
    
    # Check if OPENAI_API_KEY is set
    openai_api_key = os.getenv('OPENAI_API_KEY')
    if not openai_api_key:
        print("❌ OPENAI_API_KEY environment variable not set!")
        print("   Please set your OpenAI API key:")
        print("   export OPENAI_API_KEY='your-api-key-here'")
        return None
    
    try:
        # Create LLMsApi instance
        llms_api = LLMsApi(api_client=api_client)
        
        # Check if LLM already exists
        llms_response = llms_api.list_llms()
        existing_llms = getattr(llms_response, 'llms', [])
        
        # Look for existing OpenAI gpt-4o-mini LLM
        for llm in existing_llms:
            if (llm.provider_type == "OPENAI" and 
                getattr(llm, 'model_identifier', '') == "gpt-4o-mini"):
                print(f"✅ OpenAI GPT-4o-mini LLM already exists!")
                print(f"   Display Name: {llm.display_name}")
                print(f"   LLM ID: {llm.llm_id}")
                print(f"   Model: {getattr(llm, 'model_identifier', 'N/A')}")
                return llm
        
        # Create new LLM
        print("🔧 Registering new OpenAI GPT-4o-mini LLM...")
        
        # Create ApiKeyAuth for OpenAI
        api_key_auth = ApiKeyAuth(
            inline_secret=openai_api_key,
            header_name="Authorization",
            prefix="Bearer "
        )
        
        # Wrap in EndpointAuthentication
        credentials = EndpointAuthentication(
            kind="CREDENTIAL_KIND_API_KEY",
            api_key=api_key_auth
        )
        
        # Define LLM capabilities
        capabilities = LLMCapabilities(
            supports_chat=True,
            supports_completion=False,
            supports_function_calling=True,
            supports_system_messages=True,
            supports_streaming=True,
            supports_sampling_parameters=True
        )
        
        # Create LLM request
        llm_request = LLMCreationRequest(
            display_name="OpenAI GPT-4o Mini",
            provider_type="OPENAI",
            endpoint_url="https://api.openai.com/v1",
            model_identifier="gpt-4o-mini",
            api_path="/chat/completions",
            supported_modalities=["TEXT"],
            credentials=credentials,
            capabilities=capabilities,
            description="OpenAI's GPT-4o Mini model for fast and efficient text generation"
        )
        
        # Register the LLM
        response = llms_api.create_llm(llm_request)
        
        # The response has an 'llm' attribute which contains the LLMResponse
        new_llm = response.llm
        
        print(f"✅ Successfully registered OpenAI GPT-4o-mini LLM!")
        print(f"   Display Name: {new_llm.display_name}")
        print(f"   LLM ID: {new_llm.llm_id}")
        print(f"   Provider: {new_llm.provider_type}")
        print(f"   Model: {new_llm.model_identifier}")
        
        return new_llm
        
    except ApiException as e:
        print(f"❌ Error registering LLM: {e}")
        return None
    except Exception as e:
        print(f"❌ Unexpected error: {e}")
        import traceback
        traceback.print_exc()
        return None

# Register or retrieve the OpenAI LLM
openai_llm = create_openai_llm()

🔧 Registering new OpenAI GPT-4o-mini LLM...
✅ Successfully registered OpenAI GPT-4o-mini LLM!
   Display Name: OpenAI GPT-4o Mini
   LLM ID: a1436a82-2b97-48a2-95a6-0e4625eaaf07
   Provider: LLMProviderType.OPENAI
   Model: gpt-4o-mini


## Enhanced RAG with Reranking and LLM Generation

Now that we have all the components configured (embedder, reranker, and LLM), let's use the complete RAG pipeline! This demonstrates the full power of GoodMem:

1. **Retrieval**: Fast semantic search finds relevant chunks
2. **Reranking**: Voyage AI reranker re-scores results for better relevance  
3. **Generation**: OpenAI GPT-4o-mini generates a coherent response using the reranked context

This typically yields better answers than retrieval alone, because the reranker reorders the candidate chunks and the LLM synthesizes them into a single response.

In [18]:
def semantic_search_with_rag(query: str, space_id: str, max_results: int = 5):
    """
    Perform semantic search with reranking and LLM generation.
    
    This demonstrates the complete RAG pipeline:
    1. Retrieval - Find relevant chunks using semantic search
    2. Reranking - Re-score results with Voyage AI reranker
    3. Generation - Generate answer with OpenAI GPT-4o-mini
    
    Args:
        query: The search query
        space_id: ID of the space to search
        max_results: Maximum number of results to return
    
    Returns:
        Dict containing the LLM response and reranked chunks
    """
    
    try:
        print(f"🔍 RAG Query: '{query}'")
        print(f"📁 Space ID: {space_id}")
        print(f"📊 Max results: {max_results}")
        print("=" * 70)
        
        # Check if we have reranker and LLM
        if not voyage_reranker or not openai_llm:
            print("❌ Reranker or LLM not configured!")
            print("   Please run the reranker and LLM configuration cells first.")
            return None
        
        event_count = 0
        llm_response = None
        reranked_chunks = []
        
        # Use retrieve_memory_stream with post-processor for RAG
        for event in stream_client.retrieve_memory_stream(
            message=query,
            space_ids=[space_id],
            requested_size=max_results,
            fetch_memory=True,
            fetch_memory_content=False,
            post_processor_name="com.goodmem.retrieval.postprocess.ChatPostProcessorFactory",
            post_processor_config={
                "llm_id": openai_llm.llm_id,
                "reranker_id": voyage_reranker.reranker_id,
                "relevance_threshold": 0.3,
                "max_results": max_results
            },
            format="ndjson"
        ):
            event_count += 1
            
            # Handle LLM-generated response
            if event.abstract_reply and not llm_response:
                llm_response = event.abstract_reply.text
                print(f"\n🤖 LLM Generated Response:")
                print(f"   {llm_response}")
                print()
                print("-" * 70)
                print(f"\n📚 Source Chunks (Reranked):")
                print()
            
            # Handle reranked chunks
            if event.retrieved_item and event.retrieved_item.chunk:
                chunk_info = event.retrieved_item.chunk
                chunk_data = chunk_info.chunk
                
                reranked_chunks.append({
                    'chunk_text': chunk_data.get('chunkText', ''),
                    'relevance_score': chunk_info.relevance_score
                })
                
                print(f"   📄 Chunk {len(reranked_chunks)}:")
                print(f"      Relevance: {chunk_info.relevance_score:.3f}")
                print(f"      Text: {chunk_data.get('chunkText', '')[:150]}...")
                print()
        
        print(f"✅ RAG completed: {event_count} events processed")
        print(f"   LLM response: {'✓' if llm_response else '✗'}")
        print(f"   Reranked chunks: {len(reranked_chunks)}")
        
        return {
            'llm_response': llm_response,
            'chunks': reranked_chunks
        }
        
    except Exception as e:
        print(f"❌ Error during RAG: {e}")
        import traceback
        traceback.print_exc()
        return None

# Test the complete RAG pipeline
if demo_space and voyage_reranker and openai_llm:
    print("Testing Complete RAG Pipeline with Reranker + LLM\n")
    
    test_query = "What is the vacation policy for employees?"
    rag_result = semantic_search_with_rag(test_query, demo_space.space_id, max_results=3)
else:
    print("⚠️  Cannot test RAG: missing space, reranker, or LLM")
    rag_result = None

Testing Complete RAG Pipeline with Reranker + LLM

🔍 RAG Query: 'What is the vacation policy for employees?'
📁 Space ID: 4b58a640-865a-414e-99ca-d96691071111
📊 Max results: 3
   📄 Chunk 1:
      Relevance: 0.863
      Text: TIME OFF POLICY
All full-time employees receive:
- 15 days of paid vacation annually (increases to 20 days after 3 years)
- 10 sick days per year
- 8 ...

   📄 Chunk 2:
      Relevance: 0.824
      Text: [ORGANIZATION] has established the following vacation plan to provide eligible employees 
time off with pay so that they may be free from their regula...

   📄 Chunk 3:
      Relevance: 0.770
      Text: 1.  Eligibility 

 
All regular full-time employees are eligible for vacation benefits. 

 
2.  Accrual 

 
Eligible employees accrue vacation in acco...


🤖 LLM Generated Response:
   The vacation policy for employees states that all full-time employees receive 15 days of paid vacation annually, which increases to 20 days after three years of service. Additionally,

## 🎉 Congratulations! What You Built

You've successfully built a complete **Retrieval-Augmented Generation (RAG) system** using GoodMem! Let's recap what you accomplished.

### Components You Configured

| Component | Purpose | Provider | Model |
|-----------|---------|----------|-------|
| **Embedder** | Convert text to vectors | OpenAI | text-embedding-3-small (1536d) |
| **Reranker** | Re-score search results | Voyage AI | rerank-2.5 |
| **LLM** | Generate natural language responses | OpenAI | gpt-4o-mini |

### The Complete RAG Pipeline

```
📄 Documents
   ↓ Chunking (256 chars, 25 overlap)
   ↓ Embedding (OpenAI)
🗄️  Vector Storage (GoodMem Space)
   ↓ 
🔍 User Query
   ↓ Semantic Search (retrieve top-K)
   ↓ Reranking (Voyage AI re-scores)
   ↓ Context Selection (most relevant chunks)
🤖 LLM Generation (GPT-4o-mini)
   ↓
✨ Natural Language Answer
```
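
The chunking step in the diagram can be pictured as a sliding window over the text. The sketch below is a simplified character-based version using the 256/25 figures from the diagram. It is an assumption about the general technique, not a description of how GoodMem's chunker is implemented; production chunkers often also split on separators such as paragraphs or sentences.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 256, overlap: int = 25) -> List[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# A 600-character document with a repeating alphabet, so overlaps are visible.
document = "".join(chr(ord("a") + i % 26) for i in range(600))
pieces = chunk_text(document)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing a little duplicated text.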

### Key Concepts You Learned

1. **Embedders**: Transform text into semantic vectors for similarity search
2. **Spaces**: Logical containers for organizing and searching documents
3. **Chunking**: Breaking documents into optimal sizes for retrieval
4. **Semantic Search**: Finding conceptually similar content, not just keyword matches
5. **Reranking**: Two-stage retrieval for better precision
6. **Streaming API**: Real-time, memory-efficient result processing
7. **RAG Architecture**: Combining retrieval and generation for accurate, grounded responses

### Performance Improvements

**Basic search** (earlier in notebook):
- Fast retrieval using vector similarity
- Good recall, but may include less relevant results

**Enhanced RAG** (with reranker + LLM):
- Reranker re-scores the retrieved chunks, which typically improves precision
- LLM synthesizes information from multiple chunks
- Better user experience with natural language answers
- Answers are grounded in retrieved document content, which reduces (but does not eliminate) hallucinations

### Next Steps & Advanced Topics

**Enhance Your RAG System**:
- **Multiple embedders**: Combine different embedders for better coverage
- **Custom chunking**: Tune chunk size/overlap for your content type
- **Metadata filtering**: Add filters to narrow search by document type, date, etc.
- **Hybrid search**: Combine semantic and keyword search
- **Context augmentation**: Include surrounding chunks for better LLM context
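
As one possible starting point for hybrid search, the sketch below merges a semantic ranking and a keyword ranking with reciprocal rank fusion (RRF), a common rank-combination technique. This is a generic illustration rather than a GoodMem feature; the document IDs are made up, and `k = 60` is a conventional default for the smoothing constant.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic = ["time_off", "vacation_requests", "work_hours"]
keyword = ["vacation_requests", "holiday_calendar", "time_off"]
fused = reciprocal_rank_fusion([semantic, keyword])
```

RRF only needs ranks, not raw scores, so it sidesteps the problem that semantic and keyword scores live on different scales.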

**Production Deployment**:
- **Monitoring**: Track query latency, relevance scores, user feedback
- **Scaling**: Horizontal scaling for high-traffic applications
- **Cost optimization**: Balance quality vs. API costs
- **Caching**: Cache frequent queries for faster responses

### Resources

- **Documentation**: [https://docs.goodmem.ai](https://docs.goodmem.ai)
- **API Reference**: [https://docs.goodmem.ai/docs/reference/sdk/python/](https://docs.goodmem.ai/docs/reference/sdk/python/)

---

**Great job!** You now have a solid foundation for building production RAG systems with GoodMem. 🚀