# Building a Basic RAG Agent with GoodMem

## Overview

This tutorial will guide you through building a complete **Retrieval-Augmented Generation (RAG)** system using GoodMem's vector memory capabilities. By the end of this guide, you'll have a functional Q&A system that can:

- 🔍 **Semantically search** through your documents
- 📝 **Generate contextual answers** using retrieved information
- 🏗️ **Scale to handle** large document collections

### What is RAG?

RAG combines the power of **retrieval** (finding relevant information) with **generation** (creating natural language responses). This approach allows AI systems to provide accurate, context-aware answers by:

1. **Retrieving** relevant documents from a knowledge base
2. **Augmenting** the query with this context
3. **Generating** a comprehensive answer using both the query and retrieved information
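The augmentation step is essentially prompt construction. As a minimal sketch, assuming plain-text chunks and a generic instruction-following LLM, the retrieved chunks can be numbered and placed ahead of the question. `build_rag_prompt` and its template wording are hypothetical and not part of GoodMem's API.

```python
from typing import List


def build_rag_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt.

    Hypothetical helper: the template wording is illustrative only.
    """
    # Number each chunk so the model can refer to the chunk it used
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, 1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "How many vacation days do employees get?",
    ["15 days of paid vacation annually", "10 sick days per year"],
)
print(prompt)
```

Numbering the chunks makes it easy to ask the model to cite which chunk supported its answer.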

### Why GoodMem for RAG?

GoodMem provides enterprise-grade vector storage with:
- **Multiple embedder support** for optimal retrieval accuracy
- **Streaming APIs** for real-time responses
- **Advanced post-processing** with reranking and summarization
- **Scalable architecture** for production workloads


## Prerequisites

Before starting, ensure you have:

- ✅ **GoodMem server running** locally or access to a remote instance
- ✅ **Python 3.9+** installed on your system
- ✅ **API key** for your GoodMem instance
- ✅ **OpenAI API key** (for embeddings and LLM)
- ✅ **Voyage AI API key** (for reranking)

## Installation & Setup

First, let's install the required packages:

In [None]:
# Install required packages
!pip install goodmem-client openai python-dotenv

## Authentication & Configuration

### Why This Matters

GoodMem uses API key authentication to secure your vector memory data. Proper configuration ensures:
- **Secure access** to your GoodMem instance
- **Isolated environments** (development, staging, production)
- **Usage tracking** and access control per API key

### What We'll Do

1. Configure the GoodMem host URL (where your server is running)
2. Set up API key authentication
3. Verify the configuration is correct

### Configuration Options

- **Local development**: `http://localhost:8080` (default)
- **Remote/production**: Your deployed GoodMem URL
- **Environment variables**: Best practice for managing credentials

Let's configure our GoodMem client and test the connection:

In [1]:
import os
import json
import time
from typing import List, Dict, Optional
from dotenv import load_dotenv

# Load environment variables (optional)
load_dotenv()

# Configuration - Update these values for your setup
GOODMEM_HOST = os.getenv('GOODMEM_HOST', 'http://localhost:8080')
GOODMEM_API_KEY = os.getenv('GOODMEM_API_KEY', 'your-api-key-here')

print(f"GoodMem Host: {GOODMEM_HOST}")
print(f"API Key configured: {'Yes' if GOODMEM_API_KEY != 'your-api-key-here' else 'No - Please update'}")

GoodMem Host: http://localhost:8080
API Key configured: Yes


Now let's test the connection to the GoodMem server.

In [2]:
# Import GoodMem client libraries
from goodmem_client.api import SpacesApi, MemoriesApi, EmbeddersApi
from goodmem_client.configuration import Configuration
from goodmem_client.api_client import ApiClient
from goodmem_client.streaming import MemoryStreamClient
from goodmem_client.exceptions import ApiException

# Configure the API client
def create_goodmem_clients():
    """Create and configure GoodMem API clients."""
    configuration = Configuration(host=GOODMEM_HOST, 
                                  api_key={"ApiKeyAuth": GOODMEM_API_KEY})
    
    # Create API client instance
    api_client = ApiClient(configuration=configuration)
    
    # Create API instances
    spaces_api = SpacesApi(api_client=api_client)
    memories_api = MemoriesApi(api_client=api_client)
    embedders_api = EmbeddersApi(api_client=api_client)
    stream_client = MemoryStreamClient(api_client)
    
    return spaces_api, memories_api, embedders_api, stream_client, api_client

# Test connection
spaces_api, memories_api, embedders_api, stream_client, api_client = create_goodmem_clients()

# Test the connection by listing spaces
response = spaces_api.list_spaces()
print("✅ Successfully connected to GoodMem!")
print(f"   Found {len(getattr(response, 'spaces', []))} existing spaces")

✅ Successfully connected to GoodMem!
   Found 0 existing spaces


## Creating an Embedder

### Why Embedders Matter

An **embedder** is the foundation of semantic search. It converts text into high-dimensional vectors (embeddings) that capture meaning:

```
Text: "vacation policy" → Vector: [0.23, -0.45, 0.67, ...]  (1536 dimensions)
```

These vectors enable:
- **Semantic similarity**: Find conceptually similar content, not just keyword matches
- **Context understanding**: Capture meaning beyond exact word matches
- **Efficient retrieval**: Fast vector comparisons using specialized indexes
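Similarity between two embeddings is commonly measured with cosine similarity, the cosine of the angle between the vectors. The sketch below computes it in pure Python on made-up 3-dimensional vectors; real `text-embedding-3-small` vectors have 1536 dimensions, and the values here are illustrative assumptions rather than real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" with invented values: related phrases point in similar directions
vacation_policy = [0.9, 0.1, 0.0]
time_off_policy = [0.8, 0.2, 0.1]
api_rate_limits = [0.0, 0.1, 0.9]

print(f"{cosine_similarity(vacation_policy, time_off_policy):.3f}")  # high: related concepts
print(f"{cosine_similarity(vacation_policy, api_rate_limits):.3f}")  # low: unrelated concepts
```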

### The RAG Pipeline Flow

```
Documents → Embedder → Vector Storage → Semantic Search → Retrieved Context
```

### Choosing an Embedder

**OpenAI `text-embedding-3-small`** (what we'll use):
- ✅ **High quality**: Excellent for most use cases
- ✅ **Fast**: Low latency for real-time applications
- ✅ **1536 dimensions**: Good balance of quality and storage
- ✅ **Cost-effective**: $0.02 per 1M tokens

**Other options**:
- **text-embedding-3-large**: Higher quality, 3072 dimensions, more expensive
- **Voyage AI**: Specialized for search, excellent retrieval performance
- **Cohere**: Good multilingual support
- **Local models**: HuggingFace sentence transformers for privacy/offline
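To put the storage trade-off in numbers, here is a back-of-the-envelope estimate. It assumes each dimension is stored as an uncompressed 32-bit float and ignores index overhead; GoodMem's actual storage format is not covered in this tutorial.

```python
BYTES_PER_FLOAT32 = 4  # assumption: raw float32 vectors, no compression or index overhead


def vector_storage_bytes(dimensions: int, num_vectors: int) -> int:
    """Raw bytes needed to store num_vectors embeddings of the given dimensionality."""
    return dimensions * BYTES_PER_FLOAT32 * num_vectors


for model, dims in [("text-embedding-3-small", 1536), ("text-embedding-3-large", 3072)]:
    gb = vector_storage_bytes(dims, 1_000_000) / 1e9
    print(f"{model}: {dims * BYTES_PER_FLOAT32:,} bytes/vector, ~{gb:.1f} GB per 1M chunks")
```

Under these assumptions, doubling the dimensionality doubles the raw vector storage.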

### What We'll Do

1. Create an OpenAI embedder with proper authentication
2. Verify the embedder is ready for use

**Note**: You'll need an OpenAI API key set in your environment variable `OPENAI_API_KEY`.

In [None]:
from goodmem_client.models import EmbedderCreationRequest, ApiKeyAuth, EndpointAuthentication

def create_openai_embedder():
    """Create an OpenAI embedder for text embedding."""
    
    # Check if OPENAI_API_KEY is set
    openai_api_key = os.getenv('OPENAI_API_KEY')
    if not openai_api_key:
        print("❌ OPENAI_API_KEY environment variable not set!")
        return None
    
    # Create ApiKeyAuth for OpenAI
    api_key_auth = ApiKeyAuth(
        inline_secret=openai_api_key,
        header_name="Authorization",
        prefix="Bearer "
    )
    
    # Wrap in EndpointAuthentication
    credentials = EndpointAuthentication(
        kind="CREDENTIAL_KIND_API_KEY",
        api_key=api_key_auth
    )
    
    # Create the embedder request
    embedder_request = EmbedderCreationRequest(
        display_name="OpenAI Text Embedding 3 Small",
        provider_type="OPENAI",
        endpoint_url="https://api.openai.com/v1",
        model_identifier="text-embedding-3-small",
        dimensionality=1536,  # Integer (not a string); must match the model's output size
        api_path="/embeddings",
        distribution_type="DENSE",
        supported_modalities=["TEXT"],
        credentials=credentials  # EndpointAuthentication object
    )
    
    # Create the embedder
    new_embedder = embedders_api.create_embedder(embedder_request)
    return new_embedder

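# Create the OpenAI embedder
openai_embedder = create_openai_embedder()
if openai_embedder is None:
    raise RuntimeError("No embedder was created; set OPENAI_API_KEY and re-run this cell.")
print("✅ Successfully created OpenAI embedder!")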
print(f"   Display Name: {openai_embedder.display_name}")
print(f"   Embedder ID: {openai_embedder.embedder_id}")
print(f"   Provider: {openai_embedder.provider_type}")
print(f"   Model: {getattr(openai_embedder, 'model_identifier', 'N/A')}")
print(f"   Dimensionality: {getattr(openai_embedder, 'dimensionality', 'N/A')}")

✅ Successfully created OpenAI embedder!
   Display Name: OpenAI Text Embedding 3 Small
   Embedder ID: 019b3292-6b7e-737c-bb4e-df055b642ea9
   Provider: ProviderType.OPENAI
   Model: text-embedding-3-small
   Dimensionality: 1536


## Creating Your First Space

### What is a Space?

A **Space** in GoodMem is a logical container for organizing related memories (documents). Think of it as a database or collection where you store and retrieve semantically similar content.

Each space has:
- **Associated embedders**: Which models convert text to vectors
- **Chunking configuration**: How documents are split into searchable pieces
- **Access controls**: Public or private, with permission management
- **Metadata labels**: For organization and filtering

### Use Cases for Multiple Spaces

You might organize content into separate spaces:
- **By domain**: Technical docs, HR policies, product specs
- **By environment**: Development, staging, production
- **By customer**: Tenant-specific data in multi-tenant apps
- **By privacy level**: Public FAQ vs. internal knowledge base

### Chunking

Whole documents are usually too large to embed and match as single units. Chunking:
- **Improves relevance**: Match specific sections, not entire documents
- **Enables context**: Return focused chunks that answer specific questions  
- **Optimizes retrieval**: Process and compare smaller text segments

**Our chunking strategy**:
- **256 characters**: Short enough for focused context, long enough for meaning
- **25 character overlap**: Ensures concepts spanning chunk boundaries aren't lost
- **Hierarchical separators**: Split on paragraphs first, then sentences, then words
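The effect of chunk size and overlap can be seen with a deliberately simplified splitter. This sketch uses a fixed-size sliding window and ignores the hierarchical separators, so it is not how GoodMem's recursive chunker decides where to cut; it only shows how the overlap repeats the tail of one chunk at the head of the next. `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 256, chunk_overlap: int = 25) -> List[str]:
    """Split text into fixed-size character windows overlapping by chunk_overlap characters.

    Simplified illustration: paragraph, sentence, and word separators are ignored.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this many characters after the last
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 600 characters of digits so the overlapping regions are easy to compare
sample_text = "".join(str(i % 10) for i in range(600))
demo_chunks = sliding_window_chunks(sample_text)
print([len(c) for c in demo_chunks])
print(demo_chunks[0][-25:] == demo_chunks[1][:25])
```

With 256-character windows and a 25-character overlap, each new chunk starts 231 characters after the previous one, so a 600-character text yields three chunks of 256, 256, and 138 characters.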

### What We'll Do

1. List available embedders
2. Create a space with our embedder and chunking configuration
3. Add metadata labels for organization
4. Verify the space is ready

Let's create a space for our RAG demo:

In [4]:
# First, let's see what embedders are available
embedders_response = embedders_api.list_embedders()
available_embedders = getattr(embedders_response, 'embedders', [])

print(f"📋 Available Embedders ({len(available_embedders)}):")
for i, embedder in enumerate(available_embedders):
    print(f"   {i+1}. {embedder.display_name} - {embedder.provider_type}")
    print(f"      Model: {getattr(embedder, 'model_identifier', 'N/A')}")
    print(f"      ID: {embedder.embedder_id}")
    print()
    
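if not available_embedders:
    raise RuntimeError("No embedders found. Create one in the previous step first.")
default_embedder = available_embedders[0]
print(f"🎯 Using embedder: {default_embedder.display_name}")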

📋 Available Embedders (1):
   1. OpenAI Text Embedding 3 Small - ProviderType.OPENAI
      Model: text-embedding-3-small
      ID: 019b3292-6b7e-737c-bb4e-df055b642ea9

🎯 Using embedder: OpenAI Text Embedding 3 Small


Now that we have an embedder configured, let's create a space to store our documents.

In [5]:
from goodmem_client.models import SpaceCreationRequest, SpaceEmbedderConfig

# Create a space for our RAG demo
SPACE_NAME = "RAG Demo Knowledge Base"

# Define chunking configuration that we'll reuse throughout the tutorial
# Save this configuration to ensure consistency across all memory creation requests
DEMO_CHUNKING_CONFIG = {
    "recursive": {
        "chunk_size": 256,                     # 256 character chunks for optimal RAG performance
        "chunk_overlap": 25,                   # 25 character overlap between chunks
        "separators": ["\n\n", "\n", ". ", " ", ""],  # Hierarchical splitting
        "keep_strategy": "KEEP_END",           # Append separator to preceding chunk
        "separator_is_regex": False,           # Plain text separators
        "length_measurement": "CHARACTER_COUNT" # Measure by characters
    }
}

def create_demo_space():
    """Create a space for our RAG demonstration."""
    space_embedders = [
        SpaceEmbedderConfig(
            embedder_id=default_embedder.embedder_id,
            default_retrieval_weight=1.0
        )
    ]
    
    # Create space request with our saved chunking configuration
    create_request = SpaceCreationRequest(
        name=SPACE_NAME,
        labels={
            "purpose": "rag-demo",
            "environment": "tutorial", 
            "content-type": "documentation"
        },
        space_embedders=space_embedders,
        public_read=False,  # Private space
        default_chunking_config=DEMO_CHUNKING_CONFIG  # Use our saved config
    )
    
    # Create the space
    new_space = spaces_api.create_space(create_request)    
    return new_space

# Create our demo space
demo_space = create_demo_space()
print(f"✅ Created space: {demo_space.name}")
print(f"   Space ID: {demo_space.space_id}")
print(f"   Embedders: {len(demo_space.space_embedders)}")
print(f"   Labels: {dict(demo_space.labels)}")

✅ Created space: RAG Demo Knowledge Base
   Space ID: 019b3293-49c8-762d-a4df-799c77c0a5d4
   Embedders: 1
   Labels: {'purpose': 'rag-demo', 'environment': 'tutorial', 'content-type': 'documentation'}


Let's verify that our space was created successfully by retrieving its detailed configuration.

In [6]:
# Get detailed space information
space_details = spaces_api.get_space(demo_space.space_id)

print("🔍 Space Configuration:")
print(f"   Name: {space_details.name}")
print(f"   Owner ID: {space_details.owner_id}")
print(f"   Public Read: {space_details.public_read}")
print(f"   Created: {space_details.created_at}")
print(f"   Labels: {dict(space_details.labels)}")

print("\n🤖 Associated Embedders:")
for embedder_assoc in space_details.space_embedders:
    print(f"   Embedder ID: {embedder_assoc.embedder_id}")
    print(f"   Retrieval Weight: {embedder_assoc.default_retrieval_weight}")

🔍 Space Configuration:
   Name: RAG Demo Knowledge Base
   Owner ID: cf5df949-31c6-4c54-af50-f8002107164e
   Public Read: False
   Created: 1766080072137
   Labels: {'purpose': 'rag-demo', 'environment': 'tutorial', 'content-type': 'documentation'}

🤖 Associated Embedders:
   Embedder ID: 019b3292-6b7e-737c-bb4e-df055b642ea9
   Retrieval Weight: 1.0


## Adding Documents to Memory

### The Document Processing Pipeline

When you add a document to GoodMem, it goes through several automated steps:

```
1. Ingestion → 2. Chunking → 3. Embedding → 4. Indexing → 5. Ready for Search
```

**What happens**:
1. **Ingestion**: Document content and metadata are stored
2. **Chunking**: Text is split according to your configuration (256 chars, 25 overlap)
3. **Embedding**: Each chunk is converted to a vector by your embedder
4. **Indexing**: Vectors are indexed for fast similarity search
5. **Status**: Document marked as `COMPLETED` and ready for retrieval

### Single vs. Batch Operations

**Single memory creation** (`CreateMemory`):
- ✅ Good for: Real-time ingestion, single documents
- ✅ Synchronous processing with immediate status
- ⚠️ Higher overhead for bulk operations

**Batch memory creation** (`BatchCreateMemory`):
- ✅ Good for: Bulk imports, initial setup, periodic updates
- ✅ Lower overhead, efficient for multiple documents
- ✅ Asynchronous processing: check status afterwards via `ListMemories`, or by fetching the created memories by ID as this tutorial does
- ⚠️ Takes longer to get individual status feedback

### Metadata Best Practices

Rich metadata helps with:
- **Filtering**: Retrieve specific document types
- **Source attribution**: Show users where information came from
- **Organization**: Group and manage related documents
- **Debugging**: Track ingestion methods and dates
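As a sketch of these practices, a small helper can build the same metadata shape for every document, covering source attribution, grouping, ingestion method, and ingestion date. `make_document_metadata` and the `ingested_at` key are hypothetical; the tutorial code that follows sets only `filename` and `source`.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def make_document_metadata(filename: str, source: str, ingestion_method: str,
                           ingested_at: Optional[datetime] = None) -> Dict[str, str]:
    """Build a flat string-to-string metadata dict for one document (hypothetical helper)."""
    ingested_at = ingested_at or datetime.now(timezone.utc)
    return {
        "filename": filename,                           # source attribution
        "source": source,                               # grouping and filtering by origin
        "ingestion_method": ingestion_method,           # debugging: "single" or "batch"
        "ingested_at": ingested_at.date().isoformat(),  # debugging: ingestion date (UTC)
    }


meta = make_document_metadata(
    "product_faq.txt", "sample_documents", "batch",
    ingested_at=datetime(2025, 1, 15, tzinfo=timezone.utc),
)
print(meta)
```

Keeping every value a string keeps the metadata simple to compare and filter on.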

### What We'll Do

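The [sample documents](https://github.com/PAIR-Systems-Inc/goodmem-samples/tree/main/cookbook/1_Building_a_basic_RAG_Agent_with_GoodMem/sample_documents) are in the goodmem-samples repository; place them in a `sample_documents` directory in your working directory. Then we will:
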
1. Load sample documents from local files
2. Create one document using single memory creation (to demo the API)
3. Create remaining documents using batch operation (more efficient)
4. Monitor processing status until all documents are ready

We'll use sample company documents that represent common business use cases:

In [7]:
import os
import base64

# Load our sample documents
def load_sample_documents(sample_dir: str) -> List[Dict]:
    """Load sample documents from the sample_documents directory.
    
    Automatically discovers all files in the directory and handles:
    - .txt files: Read as plain text
    - .pdf files: Read as binary and base64 encode
    """
    documents = []
    
    # Auto-discover all files in the directory
    if not os.path.exists(sample_dir):
        print(f"⚠️  Directory not found: {sample_dir}")
        return documents
    
    files = sorted(os.listdir(sample_dir))  # Sort for consistent ordering
    
    for filename in files:
        filepath = os.path.join(sample_dir, filename)
        
        # Skip directories
        if not os.path.isfile(filepath):
            continue
        
        # Determine file type by extension
        file_ext = os.path.splitext(filename)[1].lower()
        
        if file_ext == '.txt':
            # Handle text files
            with open(filepath, 'r', encoding='utf-8') as f:
                content = f.read()
            
            documents.append({
                'filename': filename,
                'content': content,
                'content_type': 'text/plain',
                'is_binary': False
            })
            print(f"📄 Loaded: {filename} ({len(content):,} characters)")
        
        elif file_ext == '.pdf':
            # Handle PDF files
            with open(filepath, 'rb') as f:
                binary_content = f.read()
            
            # Base64 encode the binary content
            content_b64 = base64.b64encode(binary_content).decode('utf-8')
            
            documents.append({
                'filename': filename,
                'content_b64': content_b64,
                'content_type': 'application/pdf',
                'is_binary': True
            })
            print(f"📄 Loaded: {filename} ({len(binary_content):,} bytes, base64: {len(content_b64):,} chars)")
        else:
            print(f"⚠️  Skipping unsupported file type: {filename}")
    
    return documents

# Load the documents
sample_docs = load_sample_documents("sample_documents")
print(f"\n📚 Total documents loaded: {len(sample_docs)}")

📄 Loaded: company_handbook.txt (2,342 characters)
📄 Loaded: employee_handbook.pdf (399,615 bytes, base64: 532,820 chars)
📄 Loaded: product_faq.txt (4,043 characters)
📄 Loaded: security_policy.txt (4,211 characters)
📄 Loaded: technical_documentation.txt (2,384 characters)

📚 Total documents loaded: 5


Now that we have documents loaded, let's create memories from them. We'll start by creating one memory individually to demonstrate the API.

In [None]:
# Create the first memory individually to demonstrate single memory creation
from goodmem_client.models import MemoryCreationRequest

# Function to create a memory request
def build_memory_request(space_id: str, document: dict) -> MemoryCreationRequest:
    """Create a MemoryCreationRequest based on the document type."""
    if document.get('is_binary', False):
        # For binary files (PDF), use original_content_b64
        memory_request = MemoryCreationRequest(
            space_id=space_id,
            original_content_b64=document['content_b64'],  # Base64 encoded
            content_type=document['content_type'],          # application/pdf
            metadata={
                "filename": document['filename'],
                "source": "sample_documents",
            },
            chunkingConfig=DEMO_CHUNKING_CONFIG
        )
    else:
        # For text files, use original_content
        memory_request = MemoryCreationRequest(
            space_id=space_id,
            original_content=document['content'],          # Plain text
            content_type=document['content_type'],         # text/plain
            metadata={
                "filename": document['filename'],
                "source": "sample_documents",
            },
            chunkingConfig=DEMO_CHUNKING_CONFIG
        )
    return memory_request

first_doc = sample_docs[0]
print("📝 Creating first document using CreateMemory API:")
print(f"   Document: {first_doc['filename']}")
print(f"   Content Type: {first_doc['content_type']}")
print("   Method: Individual memory creation")
single_memory = memories_api.create_memory(
    build_memory_request(demo_space.space_id, first_doc)
)

📝 Creating first document using CreateMemory API:
   Document: company_handbook.txt
   Content Type: text/plain
   Method: Individual memory creation


Let's verify the memory was created successfully by retrieving it.

In [9]:
# Demonstrate retrieving a memory by ID using get_memory
retrieved_memory = memories_api.get_memory(
    id=single_memory.memory_id,
    include_content=True
)

print("\n✅ Successfully retrieved memory:")
print(f"   Memory ID: {retrieved_memory.memory_id}")
print(f"   Space ID: {retrieved_memory.space_id}")
print(f"   Status: {retrieved_memory.processing_status}")
print(f"   Content Type: {retrieved_memory.content_type}")
print(f"   Created At: {retrieved_memory.created_at}")
print(f"   Updated At: {retrieved_memory.updated_at}")

if retrieved_memory.metadata:
    print("\n   📋 Metadata:")
    for key, value in retrieved_memory.metadata.items():
        print(f"      {key}: {value}")

if retrieved_memory.original_content:
    # Decode the base64 encoded content
    decoded_content = base64.b64decode(retrieved_memory.original_content).decode('utf-8')
    print("\n✅ Content retrieved and decoded:")
    print(f"   Content length: {len(decoded_content)} characters")
    print(f"   First 200 chars: {decoded_content[:200]}...")


✅ Successfully retrieved memory:
   Memory ID: 019b3295-5a9d-703d-9954-43e6153ff9c9
   Space ID: 019b3293-49c8-762d-a4df-799c77c0a5d4
   Status: COMPLETED
   Content Type: text/plain
   Created At: 1766080207518
   Updated At: 1766080211751

   📋 Metadata:
      source: sample_documents
      filename: company_handbook.txt
      ingestion_method: single

✅ Content retrieved and decoded:
   Content length: 2342 characters
   First 200 chars: ACME Corporation Employee Handbook

Welcome to ACME Corporation! This handbook provides essential information about our company policies, procedures, and culture.

COMPANY OVERVIEW
ACME Corporation is...


For the remaining documents, we'll use batch creation, which is more efficient for multiple documents.

In [10]:
# Create the remaining documents using batch memory creation
from goodmem_client.models import BatchMemoryCreationRequest

def create_batch_memories(space_id: str, documents: List[dict]) -> list:
    """Create multiple memories in GoodMem with a single BatchCreateMemory call."""
    
    # Build one request per document, reusing our saved chunking configuration
    memory_requests = [build_memory_request(space_id, doc) for doc in documents]
    
    # Create batch request
    batch_request = BatchMemoryCreationRequest(
        requests=memory_requests
    )
    
    print(f"📦 Creating {len(memory_requests)} memories using BatchCreateMemory API:")
    # Execute batch creation; the response holds one result per submitted request
    results = memories_api.batch_create_memory(batch_request).results
    return [item.memory for item in results]

# Create the remaining documents (skip the first one we already created)
remaining_docs = sample_docs[1:]  # All documents except the first
created_memories = create_batch_memories(demo_space.space_id, remaining_docs)

print("\n📋 Total Memory Creation Summary:")
print("   📄 Single CreateMemory: 1 document")
print(f"   📦 Batch CreateMemory: {len(remaining_docs)} documents submitted")
print("   ⏳ Check processing status in the next cell")

📦 Creating 4 memories using BatchCreateMemory API:

📋 Total Memory Creation Summary:
   📄 Single CreateMemory: 1 document
   📦 Batch CreateMemory: 4 documents submitted
   ⏳ Check processing status in the next cell


Let's check the status of all the memories we created to see if they're ready for searching.

In [11]:
# Get all memories we just ingested to verify they're ready
from goodmem_client.models import BatchMemoryRetrievalRequest

def batch_retrieve_memories(memory_ids: List[str]) -> List[dict]:
    """Retrieve multiple memories by their IDs using batch retrieval."""
    batch_request = BatchMemoryRetrievalRequest(
        memory_ids=memory_ids,
        include_content=False  # We don't need content for status check
    )
    results = memories_api.batch_get_memory(batch_request).results
    return [item.memory for item in results]

memory_ids = [single_memory.memory_id] + [mem.memory_id for mem in created_memories]
memories = batch_retrieve_memories(memory_ids)
for i, memory in enumerate(memories, 1):
    metadata = memory.metadata or {}
    filename = metadata.get('filename', 'Unknown')
    description = metadata.get('description', 'No description')
    
    print(f"   {i}. {filename}")
    print(f"      Status: {memory.processing_status}")
    print(f"      Description: {description}")
    print(f"      Created: {memory.created_at}\n")

   1. company_handbook.txt
      Status: COMPLETED
      Description: No description
      Created: 1766080207518

   2. employee_handbook.pdf
      Status: COMPLETED
      Description: No description
      Created: 1766080248638

   3. product_faq.txt
      Status: COMPLETED
      Description: No description
      Created: 1766080248638

   4. security_policy.txt
      Status: COMPLETED
      Description: No description
      Created: 1766080248638

   5. technical_documentation.txt
      Status: COMPLETED
      Description: No description
      Created: 1766080248638



Since batch memories are processed asynchronously, let's monitor their processing status until they're all ready.

In [12]:
# Monitor processing status for all created memories
def wait_for_processing_completion(memory_ids: List[str], max_wait_seconds: int = 120):
    """Wait for memories to finish processing."""
    start_time = time.time()
    while time.time() - start_time < max_wait_seconds:
        # Fetch the latest state of each memory by ID
        memories = batch_retrieve_memories(memory_ids)
        
        # Check processing status
        status_counts = {}
        for memory in memories:
            status = memory.processing_status
            status_counts[status] = status_counts.get(status, 0) + 1
        
        print(f"📊 Processing status: {dict(status_counts)} (Total: {len(memories)} memories)")
        
        # Check if all are completed
        if all(memory.processing_status == 'COMPLETED' for memory in memories):
            print("✅ All documents processed successfully!")
            return True
            
        # Check for any failures
        failed_count = status_counts.get('FAILED', 0)
        if failed_count > 0:
            print(f"❌ {failed_count} memories failed processing")
            return False
        
        time.sleep(5)  # Wait 5 seconds before checking again
    
    print(f"⏰ Timeout waiting for processing (waited {max_wait_seconds}s)")
    return False

# Wait for processing to complete for all memories (single + batch)
# We already have every memory ID, so we poll them directly with batch retrieval
print("⏳ Waiting for document processing to complete...")
print("💡 Note: Batch memories are processed asynchronously\n")
processing_complete = wait_for_processing_completion(memory_ids)

if processing_complete:
    print("🎉 Ready for semantic search and retrieval!")
else:
    print("⚠️  Some documents may still be processing. You can continue with the tutorial.")

⏳ Waiting for document processing to complete...
💡 Note: Batch memories are processed asynchronously

📊 Processing status: {'COMPLETED': 5} (Total: 5 memories)
✅ All documents processed successfully!
🎉 Ready for semantic search and retrieval!


## Semantic Search & Retrieval

### Why Semantic Search?

**Traditional keyword search**:
- Matches exact words or simple variations
- Misses conceptually similar content with different wording
- Example: "vacation days" won't match "time off policy"

**Semantic search**:
- Understands meaning and context
- Finds conceptually similar content regardless of exact wording
- Example: "vacation days" successfully matches "time off policy"
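The vacation example can be checked with a naive keyword matcher. This sketch scores two phrases by the share of lowercase words they have in common (Jaccard overlap); it stands in for keyword matching in general, not for any particular search engine.

```python
def keyword_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the sets of lowercase words in two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


print(keyword_overlap("vacation days", "time off policy"))    # no shared words
print(keyword_overlap("vacation policy", "time off policy"))  # shares only "policy"
```

A keyword matcher rates "vacation days" and "time off policy" as completely unrelated, which is the gap embedding-based search is meant to close.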

### How It Works

```
Query: "vacation policy" 
   ↓ (embed with same embedder)
Query Vector: [0.23, -0.45, ...]
   ↓ (compare to all chunk vectors)
Most Similar Chunks: (by cosine similarity)
   1. "TIME OFF POLICY..." (score: -0.604)
   2. "Vacation requests..." (score: -0.544)
   3. "WORK HOURS..." (score: -0.458)
```
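The comparison step in this diagram amounts to scoring chunk vectors against the query vector and keeping the best `k`. Below is a brute-force sketch with made-up 3-dimensional vectors that reports scores as negated cosine similarity, matching the sign convention shown above. GoodMem relies on specialized indexes rather than a linear scan, so treat this only as an illustration of the ranking logic; `negated_cosine` and `top_k` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple


def negated_cosine(a: List[float], b: List[float]) -> float:
    """Negative cosine similarity: lower (more negative) means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return -dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vec: List[float], chunk_vecs: Dict[str, List[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Score every chunk against the query and return the k best (lowest scores first)."""
    scored = [(text, negated_cosine(query_vec, vec)) for text, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda item: item[1])[:k]


# Invented vectors for illustration only
chunk_vectors = {
    "TIME OFF POLICY...": [0.9, 0.2, 0.1],
    "Vacation requests...": [0.7, 0.4, 0.1],
    "API rate limits...": [0.1, 0.1, 0.9],
}
ranked = top_k([1.0, 0.3, 0.0], chunk_vectors)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```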

### Understanding Relevance Scores

GoodMem reports relevance as **negative cosine similarity**, so scores behave like a distance:
- **Lower values = more relevant** (e.g., -0.6 is better than -0.4)
- **Range**: Typically -1.0 (most similar) to 0.0 (unrelated)
- **Rough threshold**: Results below -0.3 are usually relevant
- **Context matters**: Exact scores vary by embedder and content
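To apply the -0.3 rule of thumb to the results returned by the `semantic_search` function defined below, filter on the `relevance_score` key of each result dictionary. The cutoff is the heuristic from the list above, not a GoodMem default, and the sample inputs are illustrative (two scores are taken from the diagram; the third entry is invented).

```python
from typing import Dict, List


def filter_relevant(results: List[Dict], threshold: float = -0.3) -> List[Dict]:
    """Keep results scoring below threshold (lower = more relevant), best first."""
    relevant = [r for r in results if r["relevance_score"] < threshold]
    return sorted(relevant, key=lambda r: r["relevance_score"])


# Illustrative input shaped like semantic_search() output
sample_results = [
    {"chunk_text": "WORK HOURS...", "relevance_score": -0.458},
    {"chunk_text": "TIME OFF POLICY...", "relevance_score": -0.604},
    {"chunk_text": "Office snacks...", "relevance_score": -0.120},
]
kept = filter_relevant(sample_results)
for r in kept:
    print(f"{r['relevance_score']:.3f}  {r['chunk_text']}")
```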

### Streaming API Benefits

GoodMem's streaming API:
- **Real-time results**: Process chunks as they arrive
- **Low latency**: Start showing results immediately
- **Memory efficient**: No need to buffer entire result set
- **Progressive UI**: Update interface as more results come in
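Progressive processing means a consumer can act on each result as it arrives and stop early. The sketch below simulates a result stream with a plain Python generator; `fake_result_stream` is hypothetical and only stands in for a real streaming call. `itertools.islice` pulls just as many items as requested, so the generator never produces the rest.

```python
import itertools
from typing import Dict, Iterator, List

produced: List[int] = []  # records which items the generator actually yielded


def fake_result_stream(n: int) -> Iterator[Dict]:
    """Hypothetical stand-in for a streaming retrieval call: yields one result at a time."""
    for i in range(n):
        produced.append(i)
        yield {"rank": i + 1, "chunk_text": f"chunk {i + 1}"}


# Handle each result as soon as it arrives, and stop after the first two
for result in itertools.islice(fake_result_stream(10), 2):
    print(f"Got result {result['rank']}: {result['chunk_text']}")

print("Items produced:", produced)
```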

### What We'll Do

1. Implement a semantic search function using GoodMem's streaming API
2. Process different event types (chunks, memories, metadata)
3. Display results with relevance scores
4. Test with various queries to see semantic matching in action

Now comes the exciting part! Let's perform semantic search using GoodMem's streaming API. This will:

- **Find relevant chunks** based on semantic similarity
- **Stream results** in real-time
- **Include relevance scores** for ranking
- **Return structured data** for easy processing

In [13]:
def semantic_search(query: str, space_id: str, max_results: int = 5) -> List[dict]:
    """
    Perform semantic search using GoodMem's streaming API.
    
    Args:
        query: The search query
        space_id: ID of the space to search
        max_results: Maximum number of results to return
    
    Returns:
        List of search results with chunks and metadata
    """
    
    print(f"🔍 Searching for: '{query}'")
    print(f"📁 Space ID: {space_id}")
    print(f"📊 Max results: {max_results}")
    print("-" * 50)
    
    # Perform streaming search
    event_count = 0
    retrieved_chunks = []
    
    for event in stream_client.retrieve_memory_stream(
        message=query,
        space_ids=[space_id],
        requested_size=max_results,
        fetch_memory=True,
        fetch_memory_content=False,  # We don't need full content for this demo
        format="ndjson"
    ):
        event_count += 1
        
        if event.retrieved_item and event.retrieved_item.chunk:
            chunk_info = event.retrieved_item.chunk
            chunk_data = chunk_info.chunk
            
            retrieved_chunks.append({
                'chunk_text': chunk_data.get('chunkText', ''),
                'relevance_score': chunk_info.relevance_score,
                'memory_index': chunk_info.memory_index,
                'result_set_id': chunk_info.result_set_id,
                'chunk_sequence': chunk_data.get('chunkSequenceNumber', 0)
            })
            
            print(f"📄 Chunk {len(retrieved_chunks)}:")
            print(f"   Relevance: {chunk_info.relevance_score:.3f}")
            print(f"   Text: {chunk_data.get('chunkText', '')}...")
            print()
    
    print(f"✅ Search completed: {len(retrieved_chunks)} chunks found, {event_count} events processed")
    return retrieved_chunks

# Test semantic search with a sample query
sample_query = "What is the vacation policy for employees?"
search_results = semantic_search(sample_query, demo_space.space_id)

🔍 Searching for: 'What is the vacation policy for employees?'
📁 Space ID: 019b3293-49c8-762d-a4df-799c77c0a5d4
📊 Max results: 5
--------------------------------------------------
📄 Chunk 1:
   Relevance: -0.680
   Text: TIME OFF POLICY
All full-time employees receive:
- 15 days of paid vacation annually (increases to 20 days after 3 years)
- 10 sick days per year
- 8 company holidays
- Personal days as needed with manager approval...

📄 Chunk 2:
   Relevance: -0.675
   Text: 1.  Eligibility 

 
All regular full-time employees are eligible for vacation benefits. 

 
2.  Accrual 

 
Eligible employees accrue vacation in accordance with the following scheduleix: 

 
Years of Continuous Service Rate of Accrual 

Date of hire through end of year 
5...

📄 Chunk 3:
   Relevance: -0.662
   Text: [ORGANIZATION] has established the following vacation plan to provide eligible employees 
time off with pay so that they may be free from their regular duties for a period of rest and

Let's test our semantic search function with various queries to see how it finds relevant information.

In [14]:
# Let's try a few different queries to see how semantic search works
def test_multiple_queries(space_id: str):
    """Test semantic search with different types of queries."""
    
    test_queries = [
        "How do I reset my password?",
        "What are the security requirements for remote work?", 
        "API authentication and rate limits",
        "Employee benefits and health insurance",
        "How much does the software cost?"
    ]
    
    for i, query in enumerate(test_queries, 1):
        print(f"\nüîç Test Query {i}: {query}")
        print("=" * 60)
        
        semantic_search(query, space_id, max_results=3)
        
        print("\n" + "-" * 60)

test_multiple_queries(demo_space.space_id)


🔍 Test Query 1: How do I reset my password?
🔍 Searching for: 'How do I reset my password?'
📁 Space ID: 019b3293-49c8-762d-a4df-799c77c0a5d4
📊 Max results: 3
--------------------------------------------------
📄 Chunk 1:
   Relevance: -0.370
   Text: password they use to gain access to computers or the Internet, as well as any change to 
such password.  Such notice must be made immediately. 

 
4. Compliance...

📄 Chunk 2:
   Relevance: -0.363
   Text: - No reuse of last 12 passwords
- Must be changed every 90 days for privileged accounts
- Multi-factor authentication required for all business systems
- Password managers recommended for personal password storage

ACCEPTABLE USE POLICY...

📄 Chunk 3:
   Relevance: -0.306
   Text: Each classification level has specific handling, storage, and transmission requirements outlined in our data handling procedures.

PASSWORD POLICY
Strong passwords are essential for system security:
- Minimum 12 characters with mix of letter

## Troubleshooting Search Results

**Empty or weak results?** Try these:

- **Increase `max_results` or `maxResults`** → More candidates to find relevant matches
- **Adjust chunking** → Larger chunks (512) for context, smaller (128) for precision
- **Check embedder** → Verify API credentials and model configuration
- **Verify processing** → Ensure all memories show `COMPLETED` status
- **Refine query** → Be more specific with natural language
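
For the **Verify processing** check, a small polling loop avoids searching before ingestion has finished. This is a minimal sketch under stated assumptions: `get_status` is a hypothetical callable you would implement with your GoodMem client (it should return a memory's processing status as a string), `COMPLETED` is the status named above, and `FAILED` is an assumed name for a terminal error state; check the status values your server actually reports.

```python
import time
from typing import Callable, Iterable


def wait_for_processing(
    memory_ids: Iterable[str],
    get_status: Callable[[str], str],  # hypothetical: wraps your GoodMem memory lookup
    timeout: float = 120.0,
    interval: float = 2.0,
    done_status: str = "COMPLETED",
    failed_statuses: tuple = ("FAILED",),  # assumed status name, verify against your server
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until every memory reports done_status; raise on failure or timeout."""
    pending = set(memory_ids)
    deadline = clock() + timeout
    while pending:
        # sorted() returns a copy, so discarding from `pending` inside the loop is safe
        for memory_id in sorted(pending):
            status = get_status(memory_id)
            if status == done_status:
                pending.discard(memory_id)
            elif status in failed_statuses:
                raise RuntimeError(f"Memory {memory_id} failed processing: {status}")
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"Still processing after {timeout}s: {sorted(pending)}")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; in normal use, leave them at their defaults.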

## Advanced Features

Congratulations! üéâ You've successfully built a semantic search system using GoodMem. Here's what you've accomplished:

### ✅ What You Built
- **Document ingestion pipeline** with automatic chunking and embedding
- **Semantic search system** with relevance scoring
- **Simple Q&A system** using GoodMem's vector capabilities

### 🚀 Next Steps for Advanced Implementation

#### Reranking
Improve search quality by adding a reranking stage. **Rerankers** are specialized models that re-score search results to improve relevance:

- **Two-stage retrieval**: Fast initial retrieval with embeddings, then precise reranking
- **Better relevance**: Rerankers use cross-attention to understand query-document relationships
- **Reduced costs**: Rerank only top-K results instead of entire corpus
- **Voyage AI reranker**: A hosted reranking model (`rerank-2.5`), configured in the next section

The combination of fast embedding-based retrieval followed by accurate reranking provides the best balance of speed and quality for production RAG systems.

## Configuring a Reranker

To further improve search quality, we can add a **reranker** to our RAG pipeline. While embedders provide fast semantic search, rerankers use more sophisticated models to re-score the top results for better accuracy.

### Why Use Reranking?

1. **Higher Accuracy**: Rerankers use cross-encoder architectures that directly compare queries and documents
2. **Two-Stage Pipeline**: Fast retrieval with embeddings + precise reranking = optimal performance
3. **Cost Effective**: Only rerank top-K results (e.g., top 20) rather than entire corpus
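
To make the two-stage idea concrete, here is a self-contained toy sketch that calls neither GoodMem nor Voyage AI. `embed` is a hypothetical bag-of-words stand-in for an embedder, and `rerank_score` is a hypothetical stand-in for a cross-encoder that scores each (query, document) pair; only the structure (cheap scoring over the whole corpus, then expensive scoring over the top candidates only) reflects the pipeline described above.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedder: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query terms present in doc."""
    q_terms = set(query.lower().split())
    return len(q_terms & set(doc.lower().split())) / len(q_terms) if q_terms else 0.0


def two_stage_search(query: str, corpus: list, first_stage_size: int = 20, max_results: int = 3):
    """Return (top results, number of documents the reranker had to score)."""
    q_vec = embed(query)
    # Stage 1: cheap similarity over the entire corpus, keep the top first_stage_size
    candidates = sorted(corpus, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:first_stage_size]
    # Stage 2: expensive pairwise scoring, applied to the candidates only
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:max_results], len(candidates)


docs = [
    "employees receive 15 days of paid vacation",
    "vacation requests need manager approval",
    "use the company vpn for remote work",
    "passwords must have 12 characters",
    "sick days do not roll over",
]
top, reranked_count = two_stage_search("paid vacation days", docs, first_stage_size=3, max_results=2)
```

With five documents and `first_stage_size=3`, the reranker scores only three pairs; in a real corpus of thousands of chunks, that gap is where the cost savings come from.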

### Voyage AI Reranker

We'll use Voyage AI's `rerank-2.5` model, which provides:
- **Strong reported accuracy** on reranking benchmarks (per Voyage AI's published results)
- **Low-latency inference** suitable for production use
- **A simple HTTP API** that GoodMem calls on your behalf once the reranker is registered

**Note**: You'll need a Voyage AI API key stored in the `VOYAGE_API_KEY` environment variable.

In [None]:
import os

from goodmem_client.api import RerankersApi
from goodmem_client.models import RerankerCreationRequest, ApiKeyAuth, EndpointAuthentication

def create_voyage_reranker():
    """Create a Voyage AI reranker for improving search results."""
    
    # Check if VOYAGE_API_KEY is set
    voyage_api_key = os.getenv('VOYAGE_API_KEY')
    if not voyage_api_key:
        print("❌ VOYAGE_API_KEY environment variable not set!")
        return None
    
    # Create RerankersApi instance
    rerankers_api = RerankersApi(api_client=api_client)
    
    # Create ApiKeyAuth for Voyage
    api_key_auth = ApiKeyAuth(
        inline_secret=voyage_api_key,
        header_name="Authorization",
        prefix="Bearer "
    )
    
    # Wrap in EndpointAuthentication
    credentials = EndpointAuthentication(
        kind="CREDENTIAL_KIND_API_KEY",
        api_key=api_key_auth
    )
    
    # Create reranker request
    reranker_request = RerankerCreationRequest(
        display_name="Voyage Rerank 2.5",
        provider_type="VOYAGE",
        endpoint_url="https://api.voyageai.com",
        model_identifier="rerank-2.5",
        api_path="/v1/rerank",
        supported_modalities=["TEXT"],
        credentials=credentials,
        description="Voyage AI reranker for improving search result relevance"
    )
    
    # Create the reranker
    new_reranker = rerankers_api.create_reranker(reranker_request)    
    return new_reranker

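# Create the Voyage reranker (stop early if the API key is missing)
voyage_reranker = create_voyage_reranker()
if voyage_reranker is None:
    raise RuntimeError("Set VOYAGE_API_KEY and re-run this cell.")
print("✅ Successfully created Voyage reranker!")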
print(f"   Display Name: {voyage_reranker.display_name}")
print(f"   Reranker ID: {voyage_reranker.reranker_id}")
print(f"   Provider: {voyage_reranker.provider_type}")
print(f"   Model: {getattr(voyage_reranker, 'model_identifier', 'N/A')}")

✅ Successfully created Voyage reranker!
   Display Name: Voyage Rerank 2.5
   Reranker ID: 019b3297-8264-7418-aafc-3fb7ce17a64e
   Provider: ProviderType.VOYAGE
   Model: rerank-2.5
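
The `ApiKeyAuth` fields in the cell above suggest how GoodMem authenticates to the provider: a header named `header_name` whose value is `prefix` followed by the secret. The sketch below shows that assumed composition in plain Python (it is not part of `goodmem_client`), plus a hypothetical `redact` helper for printing keys safely while debugging credentials.

```python
def build_auth_header(secret: str, header_name: str = "Authorization", prefix: str = "Bearer ") -> dict:
    """Assumed composition of an API-key header: {header_name: prefix + secret}."""
    if not secret:
        raise ValueError("API key is empty; set the environment variable first")
    return {header_name: f"{prefix}{secret}"}


def redact(secret: str, visible: int = 4) -> str:
    """Mask a secret for logs, revealing only its last `visible` characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

This also shows why `prefix` ends with a space: without it, the header value would read `Bearer<key>`, which bearer-token endpoints generally reject.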


## Registering an LLM

The final component in our RAG pipeline is the **LLM (Large Language Model)**: the generation component that creates natural language responses from the retrieved and reranked context.

### Role of LLMs in RAG

After retrieving and reranking relevant chunks, the LLM:
1. **Receives the query** and retrieved context
2. **Generates a response** that synthesizes information from multiple sources
3. **Maintains coherence** while staying grounded in the retrieved facts

### OpenAI GPT-4o-mini

We'll use OpenAI's `gpt-4o-mini` model, which provides:
- **Fast inference** with low latency for real-time applications
- **Cost-effective** pricing compared to larger models
- **High quality** responses suitable for most RAG use cases
- **Function calling** support for advanced workflows

**Note**: This uses the same `OPENAI_API_KEY` environment variable as the embedder.

In [None]:
from goodmem_client.api import LLMsApi
from goodmem_client.models import LLMCreationRequest, LLMCapabilities, ApiKeyAuth, EndpointAuthentication

def create_openai_llm():
    """Register OpenAI GPT-4o-mini LLM with GoodMem."""
    
    # Check if OPENAI_API_KEY is set
    openai_api_key = os.getenv('OPENAI_API_KEY')
    if not openai_api_key:
        print("❌ OPENAI_API_KEY environment variable not set!")
        return None
    
    # Create LLMsApi instance
    llms_api = LLMsApi(api_client=api_client)
    
    # Create ApiKeyAuth for OpenAI
    api_key_auth = ApiKeyAuth(
        inline_secret=openai_api_key,
        header_name="Authorization",
        prefix="Bearer "
    )
    
    # Wrap in EndpointAuthentication
    credentials = EndpointAuthentication(
        kind="CREDENTIAL_KIND_API_KEY",
        api_key=api_key_auth
    )
    
    # Define LLM capabilities
    capabilities = LLMCapabilities(
        supports_chat=True,
        supports_completion=False,
        supports_function_calling=True,
        supports_system_messages=True,
        supports_streaming=True,
        supports_sampling_parameters=True
    )
    
    # Create LLM request
    llm_request = LLMCreationRequest(
        display_name="OpenAI GPT-4o Mini",
        provider_type="OPENAI",
        endpoint_url="https://api.openai.com/v1",
        model_identifier="gpt-4o-mini",
        api_path="/chat/completions",
        supported_modalities=["TEXT"],
        credentials=credentials,
        capabilities=capabilities,
        description="OpenAI's GPT-4o Mini model for fast and efficient text generation"
    )
    
    # Register the LLM
    response = llms_api.create_llm(llm_request)
    
    # The response has an 'llm' attribute which contains the LLMResponse
    new_llm = response.llm    
    return new_llm

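# Register the OpenAI LLM (stop early if the API key is missing)
openai_llm = create_openai_llm()
if openai_llm is None:
    raise RuntimeError("Set OPENAI_API_KEY and re-run this cell.")
print("✅ Successfully registered OpenAI GPT-4o-mini LLM!")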
print(f"   Display Name: {openai_llm.display_name}")
print(f"   LLM ID: {openai_llm.llm_id}")
print(f"   Provider: {openai_llm.provider_type}")
print(f"   Model: {openai_llm.model_identifier}")

✅ Successfully registered OpenAI GPT-4o-mini LLM!
   Display Name: OpenAI GPT-4o Mini
   LLM ID: 019b3297-d2c8-772e-9c37-f3815d3c0097
   Provider: LLMProviderType.OPENAI
   Model: gpt-4o-mini


## Enhanced RAG with Reranking and LLM Generation

Now that all the components are configured (embedder, reranker, and LLM), let's run the complete RAG pipeline. All three stages run inside a single GoodMem streaming retrieval request:

1. **Retrieval**: Fast semantic search finds relevant chunks
2. **Reranking**: The Voyage AI reranker re-scores the results for better relevance
3. **Generation**: OpenAI GPT-4o-mini generates a coherent response using the reranked context

Compared with retrieval alone, the reranker moves the most relevant chunks to the top, and the LLM turns them into a direct answer instead of a list of raw fragments.

In [17]:
from typing import Dict, Optional

def semantic_search_with_rag(
        query: str, 
        space_id: str, 
        first_stage_size: int = 5,
        max_results: int = 3, 
        reranker_id: Optional[str] = None, 
        llm_id: Optional[str] = None, 
        verbose: bool = False) -> Dict:
    """
    Perform semantic search with reranking and LLM generation.
    
    This demonstrates the complete RAG pipeline:
    1. Retrieval - Find relevant chunks using semantic search
    2. Reranking - Re-score results with Voyage AI reranker
    3. Generation - Generate answer with OpenAI GPT-4o-mini
    
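    Args:
        query: The search query
        space_id: ID of the space to search
        first_stage_size: Number of chunks to retrieve before post-processing (reranking)
        max_results: Maximum number of chunks to keep after reranking
        reranker_id: ID of the registered reranker
        llm_id: ID of the registered LLM, or None to return only the reranked chunks
        verbose: If True, print the generated answer and each chunk as they stream in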
    Returns:
        Dict containing the LLM response and reranked chunks
    """
    
    if verbose:
        print(f"üîç RAG Query: '{query}'")
        print(f"üìÅ Space ID: {space_id}")
        print(f"üìä Max results: {max_results}")
        print("=" * 70)
    
    event_count = 0
    llm_response = None
    reranked_chunks = []
    
    # Use retrieve_memory_stream with post-processor for RAG
    for event in stream_client.retrieve_memory_stream(
        message=query,
        space_ids=[space_id],
        requested_size=first_stage_size,
        fetch_memory=True,
        fetch_memory_content=False,
        post_processor_name="com.goodmem.retrieval.postprocess.ChatPostProcessorFactory",
        post_processor_config={
            "llm_id": llm_id,
            "reranker_id": reranker_id,
            "relevance_threshold": 0.3,
            "max_results": max_results
        },
        format="ndjson"
    ):
        event_count += 1
        
        # Handle LLM-generated response
        if event.abstract_reply and not llm_response:
            llm_response = event.abstract_reply.text
            if verbose:
                print(f"\nü§ñ LLM Generated Response:")
                print(f"   {llm_response}")
                print()
                print("-" * 70)
        
        # Handle reranked chunks
        if event.retrieved_item and event.retrieved_item.chunk:
            chunk_info = event.retrieved_item.chunk
            chunk_data = chunk_info.chunk
            
            reranked_chunks.append({
                'chunk_text': chunk_data.get('chunkText', ''),
                'relevance_score': chunk_info.relevance_score
            })
            
            if verbose:
                print(f"   üìÑ Chunk {len(reranked_chunks)}:")
                print(f"      Relevance: {chunk_info.relevance_score:.3f}")
                print(f"      Text: {chunk_data.get('chunkText', '')[:150]}...")
                print()
    
    if verbose:
        print(f"✅ RAG completed: {event_count} events processed")
        print(f"   LLM response: {'✓' if llm_response else '✗'}")
        print(f"   Reranked chunks: {len(reranked_chunks)}")
    
    return {
        'llm_response': llm_response,
        'chunks': reranked_chunks
    }

# Test the complete RAG pipeline
test_query = "What is the vacation policy for employees?"
rag_result = semantic_search_with_rag(
    test_query, 
    demo_space.space_id, 
    first_stage_size=10,
    max_results=3,
    reranker_id=voyage_reranker.reranker_id, 
    llm_id=openai_llm.llm_id,
    verbose=True
)

🔍 RAG Query: 'What is the vacation policy for employees?'
📁 Space ID: 019b3293-49c8-762d-a4df-799c77c0a5d4
📊 Max results: 3
   📄 Chunk 1:
      Relevance: 0.863
      Text: TIME OFF POLICY
All full-time employees receive:
- 15 days of paid vacation annually (increases to 20 days after 3 years)
- 10 sick days per year
- 8 ...

   📄 Chunk 2:
      Relevance: 0.824
      Text: [ORGANIZATION] has established the following vacation plan to provide eligible employees 
time off with pay so that they may be free from their regula...

   📄 Chunk 3:
      Relevance: 0.777
      Text: employees can use paid vacation time in minimum increments of one day.xii 

 
Accumulating Vacation: Employees are encouraged to use available paid va...


🤖 LLM Generated Response:
   The vacation policy for employees includes 15 days of paid vacation annually, which increases to 20 days after three years of employment. Additionally, employees receive 10 sick days per year, 8 company holidays, 

### Understanding Two-Stage Retrieval
When using the complete RAG pipeline with reranking, two parameters control the retrieval process:

**`first_stage_size` or `firstStageSize` or `requestedSize`** (Initial Retrieval)
- Number of chunks retrieved from semantic search (before reranking)
- Typical values: 20-50 chunks (the examples in this notebook use 10)
- Higher values → Better recall, but slower and more expensive reranking
- Think of it as casting a wide net

**`max_results` or `maxResults`** (Final Results)
- Number of top chunks to return after reranking
- Typical values: 3-5 chunks (the examples in this notebook use 3)
- These chunks are sent to the LLM as context
- Think of it as keeping only the best catches

**Two-Stage Pipeline**:
```
Query → Semantic Search → Reranker → Top Results → LLM
```

**Best Practices**:
- Increase first stage size if missing relevant content
- Adjust max results based on LLM context window and cost
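
The sketch below shows one plausible reading of how these parameters combine after reranking, together with the `relevance_threshold` of 0.3 passed in `post_processor_config`. It assumes the post-processor drops chunks scoring below `relevance_threshold` and then keeps the highest-scoring `max_results`; GoodMem's actual order of these steps is not documented in this tutorial. `select_context` and `validate_sizes` are hypothetical helpers.

```python
def validate_sizes(first_stage_size: int, max_results: int) -> None:
    """Sanity check: the final stage can never return more than stage one retrieved."""
    if first_stage_size < 1 or max_results < 1:
        raise ValueError("sizes must be positive")
    if max_results > first_stage_size:
        raise ValueError("max_results cannot exceed first_stage_size")


def select_context(scored_chunks: list, max_results: int = 3, relevance_threshold: float = 0.3) -> list:
    """Assumed semantics: threshold filter, sort by reranker score, then cap."""
    kept = [c for c in scored_chunks if c["relevance_score"] >= relevance_threshold]
    kept.sort(key=lambda c: c["relevance_score"], reverse=True)
    return kept[:max_results]
```

The test scores reuse the reranked values from the vacation query above (0.863, 0.824, 0.777) mixed with two invented low-scoring candidates.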

## LangChain / LlamaIndex Integration (Python)

### Building Agentic RAG

Now that we have a complete RAG pipeline, let's integrate it with **LangChain / LlamaIndex** to build an **agentic RAG system**. This allows an LLM agent to:

- **Decide when to search**: The agent determines if retrieval is needed
- **Use tools autonomously**: RAG becomes a tool the agent can call
- **Handle complex queries**: Multi-step reasoning with retrieval
- **Chain operations**: Combine retrieval with other capabilities
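
Under the hood, an agent is a loop: the model either requests a tool call or returns a final answer, and tool results are appended to the conversation until it finishes. The sketch below shows that loop with a scripted fake model so it runs without any API key. `run_agent`, `fake_model`, and the decision-dictionary format are hypothetical illustrations, not LangChain's or LlamaIndex's internals; in the real agents, the LLM produces the tool-call decisions.

```python
from typing import Callable, Dict, List


def run_agent(
    model: Callable[[List[dict]], dict],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Minimal tool-calling loop: ask the model, run requested tools, repeat."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = model(messages)
        if "final" in decision:
            return decision["final"]
        # The model asked for a tool: run it and feed the result back
        result = tools[decision["tool"]](decision["input"])
        messages.append({"role": "tool", "name": decision["tool"], "content": result})
    raise RuntimeError("Agent did not finish within max_steps")


def fake_model(messages: List[dict]) -> dict:
    """Scripted stand-in for an LLM: search once for policy questions, then answer."""
    question = messages[0]["content"]
    tool_results = [m["content"] for m in messages if m["role"] == "tool"]
    if tool_results:
        return {"final": "According to the knowledge base: " + tool_results[-1]}
    if "policy" in question.lower():
        return {"tool": "retrieve_context", "input": question}
    return {"final": "No lookup needed for this question."}


# Hypothetical tool standing in for the GoodMem-backed retrieval function
tools = {"retrieve_context": lambda q: "Use the company-approved VPN."}
```

Replacing `fake_model` with a real chat model and the lambda with a GoodMem-backed retrieval function is roughly what the agent frameworks set up for you.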

### What We'll Build

1. Wrap our `semantic_search_with_rag` function as a tool
2. Create an agent with access to the retrieval tool
3. Demonstrate the agent using retrieval to answer queries
4. Show multi-step reasoning with streaming responses

**LangChain**

In [None]:
# Install LangChain and required packages
!pip install langchain langchain-openai

Now let's wrap our RAG functionality as a tool so an agent can use it.

In [None]:
from langchain_core.tools import tool

@tool(response_format="content_and_artifact")
def retrieve_context(query: str):
    """Retrieve information from the company knowledge base to help answer a query.
    
    This tool searches through company documents including:
    - Employee handbooks and policies
    - Product documentation and FAQs
    - Security policies and procedures
    - Technical documentation
    
    Use this tool when you need specific information about company policies,
    products, or procedures to answer user questions accurately.
    
    Args:
        query: The search query or question to find relevant information for
    
    Returns:
        A formatted string containing relevant context from the knowledge base,
        along with the raw retrieved chunks as artifact data.
    """
    # Retrieve and rerank only (llm_id=None); the agent's own model writes the answer
    result = semantic_search_with_rag(
        query=query,
        space_id=demo_space.space_id,
        first_stage_size=10,
        max_results=3,
        reranker_id=voyage_reranker.reranker_id,
        llm_id=None,
        verbose=False
    )
    
    serialized = "\n\n".join(
        f"Source Chunk {i+1} (Relevance: {chunk['relevance_score']:.3f}):\n{chunk['chunk_text']}"
        for i, chunk in enumerate(result['chunks'])
    )
    return serialized, result['chunks']

Now let's create an agent that can use this retrieval tool.

In [20]:
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent

# Create the chat model that drives the agent (gpt-4.1-mini here, not the gpt-4o-mini registered with GoodMem)
model = ChatOpenAI(model="gpt-4.1-mini", temperature=0)

# Define the tools available to the agent
tools = [retrieve_context]

# Create custom instructions for the agent
system_prompt = (
    "You are a helpful assistant with access to a company knowledge base. "
    "Use the retrieve_context tool to search for information when answering "
    "questions about company policies, products, procedures, or technical details. "
    "Always cite the source information when using retrieved context."
)

# Create the agent using LangChain's create_agent
agent = create_agent(model, tools, system_prompt=system_prompt)

Let's test the agent with a query to see it use the retrieval tool.

In [22]:
# Test the agent with a simple query
query = "What are the security requirements for remote work?"
# Stream the agent's response
for event in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()

print("\n" + "=" * 70)
print("✅ Agent completed")


What are the security requirements for remote work?
Tool Calls:
  retrieve_context (call_FIElmZbLtnUsMKdGSFfKineN)
 Call ID: call_FIElmZbLtnUsMKdGSFfKineN
  Args:
    query: security requirements for remote work
Name: retrieve_context

Source Chunk 1 (Relevance: 0.867):
- Report suspicious emails or security incidents immediately

REMOTE WORK SECURITY
Remote employees must follow additional security measures:
- Use company-approved VPN for all work connections
- Ensure home WiFi networks use WPA3 encryption

Source Chunk 2 (Relevance: 0.672):
- Keep work devices physically secure and locked when unattended
- Use only approved cloud storage services for company data
- Install automatic security updates on all devices

INCIDENT RESPONSE
Security incidents must be reported immediately:

Source Chunk 3 (Relevance: 0.586):
- No reuse of last 12 passwords
- Must be changed every 90 days for privileged accounts
- Multi-factor authentication required for all business systems
- Password manage

**LlamaIndex**

In [None]:
# Install LlamaIndex and required packages
!pip install llama-index llama-index-llms-openai

Now let's wrap our RAG functionality as a tool so an agent can use it.

In [None]:
def retrieve_company_knowledge(query: str) -> str:
    """Retrieve information from the company knowledge base.
    
    This tool searches through company documents including employee handbooks,
    product documentation, security policies, and technical documentation.
    Use this when you need specific information about company policies,
    products, procedures, or technical details.
    
    Args:
        query: The search query or question to find relevant information for
    
    Returns:
        Formatted string containing the top reranked source chunks (first 200 characters each)
    """
    # Retrieve and rerank only (llm_id=None); the agent's own model writes the answer
    result = semantic_search_with_rag(
        query=query,
        space_id=demo_space.space_id,
        first_stage_size=10,
        max_results=3,
        reranker_id=voyage_reranker.reranker_id,
        llm_id=None,
        verbose=False
    )
    
    response_parts = []       
    # Add source chunks
    response_parts.append("\nSource context:")
    for i, chunk in enumerate(result['chunks']):
        response_parts.append(
            f"\nChunk {i+1} (Relevance: {chunk['relevance_score']:.3f}):\n{chunk['chunk_text'][:200]}..."
        )
    return "\n".join(response_parts)
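
`FunctionAgent` can accept a plain Python function because the function's name, signature, and docstring are enough to describe it to the LLM. The sketch below shows a simplified version of that idea using the standard `inspect` module. `describe_tool` is hypothetical; LlamaIndex performs a more thorough conversion internally (including a JSON schema for the arguments). In this sketch, the docstring's first paragraph becomes the description the model reads when deciding whether to call the tool.

```python
import inspect


def describe_tool(fn) -> dict:
    """Build a simple tool description from a function's name, signature, and docstring."""
    params = {}
    for name, param in inspect.signature(fn).parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            params[name] = "any"
        else:
            params[name] = getattr(annotation, "__name__", str(annotation))
    docstring = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        "description": docstring.split("\n\n")[0],  # first paragraph only
        "parameters": params,
    }


def lookup_policy(query: str) -> str:
    """Look up a company policy.

    Longer details that the model does not need in this sketch.
    """
    return query
```

Applied to `retrieve_company_knowledge`, this sketch would yield the description "Retrieve information from the company knowledge base." and the parameters `{"query": "str"}`.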


Now let's create an agent that can use this retrieval tool.

In [25]:
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

# Create the LLM for the agent
llm = OpenAI(model="gpt-4.1-mini", temperature=0)

# Create the agent with our retrieval tool
llamaindex_agent = FunctionAgent(
    tools=[retrieve_company_knowledge],
    llm=llm,
    system_prompt=(
        "You are a helpful assistant with access to a company knowledge base. "
        "Use the retrieve_company_knowledge tool to search for information when answering "
        "questions about company policies, products, procedures, or technical details. "
        "Always provide clear, accurate answers based on the retrieved information."
    )
)

Let's test the agent with a query to see it use the retrieval tool.

In [27]:
# Test the agent with a simple query
query = "What are the password requirements according to our security policy?"

print(f"Query: {query}")
print("=" * 70)
print()

# Run the agent and await its final response (top-level await works in Jupyter)
response = await llamaindex_agent.run(query)

# Print the response
print("🤖 Agent Response:")
print(response)

print("\n" + "=" * 70)
print("✅ Agent completed")

Query: What are the password requirements according to our security policy?

🤖 Agent Response:
According to our security policy, the password requirements are as follows:
- No reuse of the last 12 passwords.
- Passwords must be changed every 90 days for privileged accounts.
- Multi-factor authentication is required for all business systems.
- Use of password managers is recommended for personal passwords.

These measures are in place to ensure strong password security for our systems.

✅ Agent completed


## 🎉 Congratulations! What You Built

You've successfully built a complete **Retrieval-Augmented Generation (RAG) system** using GoodMem! Let's recap what you accomplished.

### Components You Configured

| Component | Purpose | Function |
|-----------|---------|----------|
| **Embedder** | Convert text to vectors | Transform documents into semantic embeddings |
| **Space** | Organize and store documents | Logical container with chunking configuration |
| **Memories** | Store searchable content | Documents chunked and indexed for retrieval |
| **Reranker** | Improve search precision | Re-score results for better relevance |
| **LLM** | Generate natural language | Create coherent answers from retrieved context |

### The Complete RAG Pipeline

```
📄 Documents
   ↓ Chunking (256 chars, 25 overlap)
   ↓ Embedding (convert to vectors)
🗄️  Vector Storage (GoodMem Space)
   ↓
🔍 User Query
   ↓ Semantic Search (retrieve top-K)
   ↓ Reranking (re-score for precision)
   ↓ Context Selection (most relevant chunks)
🤖 LLM Generation (synthesize answer)
   ↓
✨ Natural Language Answer
```
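
The chunking step in the diagram (256 characters with 25 characters of overlap) can be approximated with a fixed sliding window. This is a minimal sketch that assumes plain character offsets; GoodMem's chunker is configured on the space and may also respect separators such as paragraph breaks, so its boundaries will not necessarily match.

```python
def chunk_text(text: str, chunk_size: int = 256, overlap: int = 25) -> list:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(600))
pieces = chunk_text(sample)
```

A 600-character input yields windows starting at offsets 0, 231, and 462. Because of the overlap, any span of up to 25 characters that straddles a boundary still appears intact in the following chunk.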

### Key Concepts You Learned

1. **Embedders**: Transform text into semantic vectors for similarity search
2. **Spaces**: Logical containers for organizing and searching documents
3. **Chunking**: Breaking documents into optimal sizes for retrieval
4. **Semantic Search**: Finding conceptually similar content, not just keyword matches
5. **Reranking**: Two-stage retrieval for better precision
6. **Streaming API**: Real-time, memory-efficient result processing
7. **RAG Architecture**: Combining retrieval and generation for accurate, grounded responses

### Performance Improvements

**Basic search** (retrieval only):
- Fast retrieval using vector similarity
- Good recall, but may include less relevant results

**Enhanced RAG** (with reranker + LLM):
- Reranker re-scores the candidates so the most relevant chunks reach the LLM
- LLM synthesizes information from multiple chunks
- Better user experience with natural language answers
- Answers are grounded in retrieved document content, which reduces (but does not eliminate) hallucinations

### Next Steps & Advanced Topics

**Enhance Your RAG System**:
- **Multiple embedders**: Combine different embedders for better coverage
- **Custom chunking**: Tune chunk size/overlap for your content type
- **Metadata filtering**: Add filters to narrow search by document type, date, etc.
- **Hybrid search**: Combine semantic and keyword search
- **Context augmentation**: Include surrounding chunks for better LLM context
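
For **context augmentation**, the `chunk_sequence` value collected by `semantic_search` can be used to pull in the chunks on either side of each hit. This sketch assumes you already have the chunks of a single memory keyed by sequence number; `chunks_by_seq` and `expand_with_neighbors` are hypothetical, and fetching those chunks depends on your GoodMem client calls.

```python
def expand_with_neighbors(hits: list, chunks_by_seq: dict, window: int = 1) -> list:
    """Return hit chunks plus up to `window` neighbors on each side, in document order."""
    wanted = set()
    for hit in hits:
        seq = hit["chunk_sequence"]
        wanted.update(range(seq - window, seq + window + 1))
    # Skip sequence numbers that fall outside the memory (e.g. before the first chunk)
    return [chunks_by_seq[s] for s in sorted(wanted) if s in chunks_by_seq]
```

Because adjacent chunks overlap, the expanded text repeats the overlapping characters; trim them if the LLM's context budget is tight.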

**Production Deployment**:
- **Monitoring**: Track query latency, relevance scores, user feedback
- **Scaling**: Horizontal scaling for high-traffic applications
- **Cost optimization**: Balance quality vs. API costs
- **Caching**: Cache frequent queries for faster responses
- **Error handling**: Robust exception handling and retry logic
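
For the **Caching** and **Error handling** items, here is a minimal standard-library sketch: `with_retries` retries a call with exponential backoff, and `functools.lru_cache` memoizes answers to repeated queries. Both are generic helpers, not part of `goodmem_client`. Wrapping `semantic_search_with_rag` this way is an assumption about how you might use them, and you should narrow `retry_on` to the transient errors your client actually raises.

```python
import functools
import time
from typing import Callable, Tuple, Type


def with_retries(
    fn: Callable,
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable:
    """Call fn, retrying on the listed exceptions with delays base_delay * 2**attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except retry_on:
                if attempt == attempts - 1:
                    raise  # out of attempts: surface the last error
                sleep(base_delay * 2 ** attempt)

    return wrapper


# Hypothetical usage in the notebook (results are cached per query string):
# cached_rag = functools.lru_cache(maxsize=128)(
#     with_retries(lambda q: semantic_search_with_rag(q, demo_space.space_id)))
```

`lru_cache` returns the same result object for repeated queries, so callers should treat cached result dictionaries as read-only.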

**Advanced Features**:
- **Multi-space search**: Query across multiple knowledge bases
- **Query expansion**: Transform queries for better retrieval
- **Result aggregation**: Combine and deduplicate results
- **Streaming generation**: Progressive LLM responses for real-time UX
- **Fine-tuning**: Customize models for your specific domain
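
For **Result aggregation**, the sketch below merges chunk lists from several searches, removes duplicates by normalized chunk text, and keeps the highest score for each. `merge_results` is a hypothetical helper that works on the dictionaries returned by `semantic_search_with_rag`. Note that `retrieve_memory_stream` already takes a list in `space_ids`, so a single multi-space query may make client-side merging unnecessary. When you do merge separate calls, compare only scores produced by the same scorer (for example, reranker scores), because the first-stage scores printed earlier are on a different scale.

```python
def merge_results(*result_lists: list, max_results: int = 5) -> list:
    """Merge chunk lists, de-duplicate by normalized text, keep the best score per chunk."""
    best = {}
    for results in result_lists:
        for chunk in results:
            # Normalize whitespace and case so trivially different copies collapse together
            key = " ".join(chunk["chunk_text"].split()).lower()
            if key not in best or chunk["relevance_score"] > best[key]["relevance_score"]:
                best[key] = chunk
    return sorted(best.values(), key=lambda c: c["relevance_score"], reverse=True)[:max_results]
```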

### Resources

- **Documentation**: [https://docs.goodmem.ai](https://docs.goodmem.ai)
- **Community**: Join discussions and share your implementations
- **Examples**: Explore more advanced use cases and patterns

---

**Great job!** You now have a solid foundation for building production RAG systems with GoodMem. 🚀
