# Building a Basic RAG Agent with GoodMem

## Overview

This tutorial walks you through building a complete **Retrieval-Augmented Generation (RAG)** system on top of GoodMem's vector memory. By the end, you'll have a working Q&A system that can:

- 🔍 **Semantically search** through your documents
- 📝 **Generate contextual answers** using retrieved information
- 🏗️ **Scale to handle** large document collections

### What is RAG?

RAG combines the power of **retrieval** (finding relevant information) with **generation** (creating natural language responses). This approach allows AI systems to provide accurate, context-aware answers by:

1. **Retrieving** relevant documents from a knowledge base
2. **Augmenting** the query with this context
3. **Generating** a comprehensive answer using both the query and retrieved information
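
The three steps above can be sketched in a few lines of plain Python. This is a toy illustration only: word overlap stands in for embedding similarity, and `generate` is a hypothetical stub where a real LLM call would go. None of these function names come from GoodMem.

```python
import re
from collections import Counter

# Toy knowledge base; a real system would store embedded chunks in GoodMem.
KNOWLEDGE_BASE = [
    "Employees receive 15 days of paid vacation annually.",
    "Standard work hours are 9:00 AM to 5:30 PM, Monday through Friday.",
    "Multi-factor authentication is required for all business systems.",
]


def tokenize(text: str) -> Counter:
    """Lowercase bag of words; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Step 1: rank documents by word overlap with the query."""
    q = tokenize(query)
    scored = [(sum((q & tokenize(d)).values()), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def augment(query: str, context: list[str]) -> str:
    """Step 2: place the retrieved context in front of the question."""
    sources = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{sources}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Step 3: hypothetical stub; a real pipeline would call an LLM here."""
    return f"[an LLM would answer here, given a {len(prompt)}-character prompt]"


query = "How many vacation days do employees get?"
context = retrieve(query, KNOWLEDGE_BASE)
prompt = augment(query, context)
print(prompt)
print(generate(prompt))
```

The rest of this tutorial replaces each toy piece with a real component: an embedder for retrieval, a reranker for ordering, and an LLM for generation.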

### Why GoodMem for RAG?

GoodMem provides enterprise-grade vector storage with:
- **Multiple embedder support** for optimal retrieval accuracy
- **Streaming APIs** for real-time responses
- **Advanced post-processing** with reranking and summarization
- **Scalable architecture** for production workloads


## Prerequisites

Before starting, ensure you have:

- ✅ **GoodMem server running** locally or access to a remote instance
- ✅ **Python 3.9+** installed on your system
- ✅ **API key** for your GoodMem instance
- ✅ **OpenAI API key** (for embeddings and LLM)
- ✅ **Voyage AI API key** (optional, for reranking)

### Installing GoodMem

If you don't have GoodMem installed yet, you can install it with:

```bash
curl -s https://get.goodmem.ai | bash
```

**Environment setup:**
```bash
export GOODMEM_API_KEY="your-key-here"
export OPENAI_API_KEY="your-openai-key"
export VOYAGE_API_KEY="your-voyage-key"  # Optional
```

## Installation & Setup

First, verify that the GoodMem CLI is installed:

In [1]:
%%bash
# The GoodMem CLI can be installed via:
# curl -s https://get.goodmem.ai | bash

# Verify installation
goodmem version


GoodMem CLI v1.0.183 (commit: 85978e9)


## Authentication & Configuration

### Why This Matters

GoodMem uses API key authentication to secure your vector memory data. Proper configuration ensures:
- **Secure access** to your GoodMem instance
- **Isolated environments** (development, staging, production)
- **Usage tracking** and access control per API key

### What We'll Do

1. Configure the GoodMem host URL (where your server is running)
2. Set up API key authentication
3. Verify the configuration is correct

### Configuration Options

- **Local development**: `http://localhost:8080` (default)
- **Remote/production**: Your deployed GoodMem URL
- **Environment variables**: Best practice for managing credentials
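
As a minimal sketch of the environment-variable approach, the helper below reads the variable names used in this tutorial (`GOODMEM_GRPC`, `GOODMEM_API_KEY`, `OPENAI_API_KEY`, `VOYAGE_API_KEY`) and fails fast when a required one is missing. The `GoodMemConfig` class and `load_config` function are illustrative names, not part of any GoodMem SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GoodMemConfig:
    server: str
    api_key: str
    openai_api_key: str
    voyage_api_key: Optional[str] = None  # optional, only needed for reranking


def load_config(env: Mapping[str, str] = os.environ) -> GoodMemConfig:
    """Read settings from environment variables and fail fast if any are missing."""
    required = ("GOODMEM_GRPC", "GOODMEM_API_KEY", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return GoodMemConfig(
        server=env["GOODMEM_GRPC"],
        api_key=env["GOODMEM_API_KEY"],
        openai_api_key=env["OPENAI_API_KEY"],
        voyage_api_key=env.get("VOYAGE_API_KEY") or None,
    )


# An explicit mapping lets the sketch run without real credentials.
config = load_config({
    "GOODMEM_GRPC": "https://localhost:9090",
    "GOODMEM_API_KEY": "your-api-key-here",
    "OPENAI_API_KEY": "your-openai-key",
})
print(config.server)
```

Calling `load_config()` with no argument reads from the real process environment instead.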

Let's configure our GoodMem client and test the connection:

In [13]:
import dotenv
dotenv.load_dotenv()

True

In [None]:
# Set environment variables for GoodMem CLI
%env GOODMEM_API_KEY=your-api-key-here
%env GOODMEM_GRPC=https://localhost:9090

# Set API keys for embedders/LLMs
%env OPENAI_API_KEY=your-openai-key
%env VOYAGE_API_KEY=your-voyage-key


In [95]:
%%bash
# Test connection to GoodMem by listing spaces
goodmem space list --server $GOODMEM_GRPC --api-key $GOODMEM_API_KEY

SPACE ID                             NAME                           CREATED              PUBLIC 
-----------------------------------------------------------------------------------------------
0fa8870f-427c-4234-bf30-e93504c41320 RAG Demo Knowledge Base CLI    2025-12-04 21:57:24  false  
4b58a640-865a-414e-99ca-d96691071111 RAG Demo Knowledge Base        2025-12-03 21:02:40  false  


## Creating an Embedder

### Why Embedders Matter

An **embedder** is the foundation of semantic search. It converts text into high-dimensional vectors (embeddings) that capture meaning:

```
Text: "vacation policy" → Vector: [0.23, -0.45, 0.67, ...]  (1536 dimensions)
```

These vectors enable:
- **Semantic similarity**: Find conceptually similar content, not just keyword matches
- **Context understanding**: Capture meaning beyond exact word matches
- **Efficient retrieval**: Fast vector comparisons using specialized indexes
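
To make "semantic similarity" concrete, here is a small sketch of cosine similarity between vectors. The three-dimensional vectors are hand-made for illustration; real embeddings from `text-embedding-3-small` have 1536 dimensions and come from the model, not from a lookup table.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors; related phrases point in similar directions.
vectors = {
    "vacation policy": [0.9, 0.1, 0.0],
    "time off rules": [0.8, 0.3, 0.1],
    "API rate limits": [0.0, 0.2, 0.9],
}

query = vectors["vacation policy"]
for text, vec in vectors.items():
    print(f"{text!r}: {cosine_similarity(query, vec):.3f}")
```

The two phrases about leave score close to each other even though they share no words, which is the property keyword search lacks.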

### The RAG Pipeline Flow

```
Documents → Embedder → Vector Storage → Semantic Search → Retrieved Context
```

### Choosing an Embedder

**OpenAI `text-embedding-3-small`** (what we'll use):
- ✅ **High quality**: Excellent for most use cases
- ✅ **Fast**: Low latency for real-time applications
- ✅ **1536 dimensions**: Good balance of quality and storage
- ✅ **Cost-effective**: $0.02 per 1M tokens

**Other options**:
- **text-embedding-3-large**: Higher quality, 3072 dimensions, more expensive
- **Voyage AI**: Specialized for search, excellent retrieval performance
- **Cohere**: Good multilingual support
- **Local models**: HuggingFace sentence transformers for privacy/offline

### What We'll Do

1. Check if an embedder already exists
2. If not, create an OpenAI embedder with proper authentication
3. Verify the embedder is ready for use

**Note**: You'll need an OpenAI API key set in your environment variable `OPENAI_API_KEY`.

In [26]:
# Create OpenAI text-embedding-3-small embedder and save output
!goodmem embedder create \
  --display-name "OpenAI Text Embedding 3 Small" \
  --provider-type OPENAI \
  --endpoint-url "https://api.openai.com/v1" \
  --model-identifier "text-embedding-3-small" \
  --dimensionality 1536 \
  --api-path "/embeddings" \
  --distribution-type DENSE \
  --modality TEXT \
  --cred-api-key "$OPENAI_API_KEY" \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY" \
  > /tmp/embedder_output.txt

# Parse the embedder ID using awk
embedder_id_list = !awk '/^ID:/ {{print $$2}}' /tmp/embedder_output.txt
embedder_id = embedder_id_list[0] if embedder_id_list else ""

# Set as environment variable for bash cells
%env EMBEDDER_ID={embedder_id}

# Display the full output
!cat /tmp/embedder_output.txt


env: EMBEDDER_ID=95cfbbfe-94b2-4296-be6c-8b5f16051295
Embedder created successfully!

ID:               95cfbbfe-94b2-4296-be6c-8b5f16051295
Display Name:     OpenAI Text Embedding 3 Small
Owner:            cf5df949-31c6-4c54-af50-f8002107164e
Provider Type:    OPENAI
Distribution:     DENSE
Endpoint URL:     https://api.openai.com/v1
API Path:         /embeddings
Model:            text-embedding-3-small
Dimensionality:   1536
Modalities:       TEXT
Created by:       cf5df949-31c6-4c54-af50-f8002107164e
Created at:       2025-12-04T21:54:59Z
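
The cell above extracts the ID with `awk`. If you prefer to stay in Python, the same parsing can be sketched with a regular expression. `parse_id` is an illustrative helper, and it assumes the CLI output keeps the `ID:` line format shown above.

```python
import re
from typing import Optional


def parse_id(cli_output: str) -> Optional[str]:
    """Return the value of the first line that starts with 'ID:', or None."""
    match = re.search(r"^ID:[ \t]*(\S+)", cli_output, flags=re.MULTILINE)
    return match.group(1) if match else None


sample_output = """Embedder created successfully!

ID:               95cfbbfe-94b2-4296-be6c-8b5f16051295
Display Name:     OpenAI Text Embedding 3 Small
"""

embedder_id = parse_id(sample_output)
print(embedder_id)
```

Returning `None` instead of an empty string makes a failed `create` call easier to detect before the ID is exported to later cells.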


## Creating Your First Space

### What is a Space?

A **Space** in GoodMem is a logical container for organizing related memories (documents). Think of it as a database or collection where you store and retrieve semantically similar content.

Each space has:
- **Associated embedders**: Which models convert text to vectors
- **Chunking configuration**: How documents are split into searchable pieces
- **Access controls**: Public or private, with permission management
- **Metadata labels**: For organization and filtering

### Use Cases for Multiple Spaces

You might create different spaces for:
- **By domain**: Technical docs, HR policies, product specs
- **By environment**: Development, staging, production
- **By customer**: Tenant-specific data in multi-tenant apps
- **By privacy level**: Public FAQ vs. internal knowledge base

### Why Chunking Matters

Documents are too large to search efficiently as whole units. Chunking:
- **Improves relevance**: Match specific sections, not entire documents
- **Enables context**: Return focused chunks that answer specific questions  
- **Optimizes retrieval**: Process and compare smaller text segments

**Our chunking strategy**:
- **256 characters**: Short enough for focused context, long enough for meaning
- **25 character overlap**: Ensures concepts spanning chunk boundaries aren't lost
- **Hierarchical separators**: Split on paragraphs first, then sentences, then words
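
The strategy above can be approximated in Python as follows. This is a simplified sketch of recursive chunking with character overlap, not GoodMem's actual implementation; `split_recursive` and `chunk_text` are illustrative names, and the overlap here is taken by raw character count, so it can start mid-word.

```python
def split_recursive(text: str, separators: list[str], max_len: int) -> list[str]:
    """Split text into pieces of at most max_len characters, coarse separators first."""
    if len(text) <= max_len:
        return [text] if text else []
    if not separators:
        # Nothing left to split on: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, finer = separators[0], separators[1:]
    if sep == "" or sep not in text:
        return split_recursive(text, finer, max_len)
    parts = text.split(sep)
    pieces: list[str] = []
    for i, part in enumerate(parts):
        # Keep the separator at the end of the piece it terminates.
        piece = part + sep if i < len(parts) - 1 else part
        pieces.extend(split_recursive(piece, finer, max_len))
    return pieces


def chunk_text(text: str, chunk_size: int = 256, overlap: int = 25) -> list[str]:
    """Greedily merge pieces into chunks, repeating the last `overlap` characters."""
    separators = ["\n\n", "\n", ". ", " ", ""]
    # Pieces are capped at chunk_size - overlap so that tail + piece always fits.
    pieces = split_recursive(text, separators, chunk_size - overlap)
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            current = current[-overlap:] if overlap > 0 else ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


sample = " ".join(f"Sentence number {i} talks about policy." for i in range(20))
chunks = chunk_text(sample, chunk_size=80, overlap=10)
for c in chunks[:3]:
    print(len(c), repr(c))
```

Each chunk after the first begins with the last 10 characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk.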

### What We'll Do

1. List available embedders
2. Create a space with our embedder and chunking configuration
3. Add metadata labels for organization
4. Verify the space is ready

Let's create a space for our RAG demo:

In [27]:
%%bash
# List all available embedders
goodmem embedder list \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY"


EMBEDDER ID                          DISPLAY NAME                   PROVIDER   DIMENSIONS   CREATED             
---------------------------------------------------------------------------------------------------------------
be251293-d618-4715-baf4-67003ff3025d OpenAI Text Embedding 3 Small  OPENAI     1536         2025-12-03 21:02:40 
95cfbbfe-94b2-4296-be6c-8b5f16051295 OpenAI Text Embedding 3 Small  OPENAI     1536         2025-12-04 21:54:59 


In [34]:
# Create a space for our RAG demo with chunking configuration
!goodmem space create \
  --name "RAG Demo Knowledge Base CLI" \
  --embedder-id "$EMBEDDER_ID" \
  --embedder-weight 1.0 \
  --chunking recursive \
  --chunk-size 256 \
  --chunk-overlap 25 \
  --length-unit chars \
  --separator $'\\n\\n' \
  --separator $'\\n' \
  --separator ". " \
  --separator " " \
  --separator "" \
  --keep-separator end \
  --label purpose=rag-demo \
  --label environment=tutorial \
  --label content-type=documentation \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY" \
  > /tmp/space_output.txt

# Parse the space ID using awk
space_id_list = !awk '/^ID:/ {{print $$2}}' /tmp/space_output.txt
space_id = space_id_list[0] if space_id_list else ""

# Set as environment variable for bash cells
%env SPACE_ID={space_id}

# Display the full output
!cat /tmp/space_output.txt


env: SPACE_ID=0fa8870f-427c-4234-bf30-e93504c41320
Space created successfully!

ID:         0fa8870f-427c-4234-bf30-e93504c41320
Name:       RAG Demo Knowledge Base CLI
Owner:      cf5df949-31c6-4c54-af50-f8002107164e
Created by: cf5df949-31c6-4c54-af50-f8002107164e
Created at: 2025-12-04T21:57:24Z
Public:     false
Embedder:   95cfbbfe-94b2-4296-be6c-8b5f16051295 (weight: 1)
Labels:
  purpose: rag-demo
  environment: tutorial
  content-type: documentation


In [35]:
%%bash
# Get detailed space information
goodmem space get "$SPACE_ID" \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY"


{
  "createdAt": "2025-12-04T21:57:24.453Z",
  "createdByID": "cf5df949-31c6-4c54-af50-f8002107164e",
  "defaultChunkingConfig": {
    "recursive": {
      "chunkOverlap": 25,
      "chunkSize": 256,
      "keepStrategy": "KEEP_END",
      "lengthMeasurement": "CHARACTER_COUNT",
      "separators": [
        "\\n\\n",
        "\\n",
        ". ",
        " "
      ]
    }
  },
  "labels": {
    "content-type": "documentation",
    "environment": "tutorial",
    "purpose": "rag-demo"
  },
  "name": "RAG Demo Knowledge Base CLI",
  "ownerID": "cf5df949-31c6-4c54-af50-f8002107164e",
  "spaceEmbedders": [
    {
      "createdAt": "2025-12-04T21:57:24.453Z",
      "createdByID": "cf5df949-31c6-4c54-af50-f8002107164e",
      "defaultRetrievalWeight": 1,
      "embedderID": "95cfbbfe-94b2-4296-be6c-8b5f16051295",
      "spaceID": "0fa8870f-427c-4234-bf30-e93504c41320",
      "updatedAt": "2025-12-04T21:57:24.453Z",
      "updatedByID": "cf5df949-31c6-4c54-af50-f8002107164e"
    }
  ],
  "spac

## Adding Documents to Memory

### The Document Processing Pipeline

When you add a document to GoodMem, it goes through several automated steps:

```
1. Ingestion → 2. Chunking → 3. Embedding → 4. Indexing → 5. Ready for Search
```

**What happens**:
1. **Ingestion**: Document content and metadata are stored
2. **Chunking**: Text is split according to your configuration (256 chars, 25 overlap)
3. **Embedding**: Each chunk is converted to a vector by your embedder
4. **Indexing**: Vectors are indexed for fast similarity search
5. **Status**: Document marked as `COMPLETED` and ready for retrieval
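
Because a newly created memory starts out unprocessed, a client usually polls until every document reports `COMPLETED`. The sketch below shows that loop in Python with the status source injected as a function, so it runs without a server. `wait_for_processing` and `fetch_statuses` are hypothetical names; in practice `fetch_statuses` would wrap a call that lists the memories in your space.

```python
import time
from typing import Callable, Iterable


def wait_for_processing(
    fetch_statuses: Callable[[], Iterable[str]],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every status is COMPLETED. Returns False on failure or timeout."""
    waited = 0.0
    while waited < timeout_s:
        statuses = list(fetch_statuses())
        if any(s == "FAILED" for s in statuses):
            return False
        # An empty list means nothing has been ingested yet, so keep waiting.
        if statuses and all(s == "COMPLETED" for s in statuses):
            return True
        sleep(interval_s)
        waited += interval_s
    return False


# Simulated responses from three successive polls.
responses = iter([
    ["PENDING", "PENDING"],
    ["COMPLETED", "PENDING"],
    ["COMPLETED", "COMPLETED"],
])
ok = wait_for_processing(lambda: next(responses), sleep=lambda s: None)
print(ok)
```

Checking for `FAILED` before checking for completion means a single failed document stops the wait early instead of running to the timeout.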

### Single vs. Batch Operations

**Single memory creation** (`CreateMemory`):
- ✅ Good for: Real-time ingestion, single documents
- ✅ Synchronous processing with immediate status
- ⚠️ Higher overhead for bulk operations

**Batch memory creation** (`BatchCreateMemory`):
- ✅ Good for: Bulk imports, initial setup, periodic updates
- ✅ Lower overhead, efficient for multiple documents
- ✅ Async processing - check status via `ListMemories`
- ⚠️ Takes longer to get individual status feedback

### Metadata Best Practices

Rich metadata helps with:
- **Filtering**: Retrieve specific document types
- **Source attribution**: Show users where information came from
- **Organization**: Group and manage related documents
- **Debugging**: Track ingestion methods and dates

### What We'll Do

1. Load sample documents from local files
2. Create one document using single memory creation (to demo the API)
3. Create remaining documents using batch operation (more efficient)
4. Monitor processing status until all documents are ready

We'll use sample company documents that represent common business use cases:

In [44]:
%%bash
# Load and display sample documents from the directory
# Note: The CLI auto-detects file types and handles encoding automatically
SAMPLE_DIR="sample_documents"

echo "Loading documents from $SAMPLE_DIR:"
echo ""

# Count total files
TOTAL_FILES=$(find "$SAMPLE_DIR" -maxdepth 1 -type f \( -name "*.txt" -o -name "*.pdf" \) | wc -l)

# List all files with their sizes
for file in "$SAMPLE_DIR"/*.{txt,pdf}; do
  if [ -f "$file" ]; then
    filename=$(basename "$file")
    byte_count=$(wc -c < "$file")
    ext="${filename##*.}"

    if [ "$ext" = "txt" ]; then
      echo "📄 Loaded: $filename ($byte_count bytes, text/plain)"
    elif [ "$ext" = "pdf" ]; then
      echo "📄 Loaded: $filename ($byte_count bytes, application/pdf)"
    fi
  fi
done

echo ""
echo "📚 Total documents loaded: $TOTAL_FILES"
echo "💡 CLI will auto-detect file types and handle encoding during ingestion"


Loading documents from sample_documents:

📄 Loaded: company_handbook.txt (2342 bytes, text/plain)
📄 Loaded: product_faq.txt (4043 bytes, text/plain)
📄 Loaded: security_policy.txt (4211 bytes, text/plain)
📄 Loaded: technical_documentation.txt (2384 bytes, text/plain)
📄 Loaded: employee_handbook.pdf (399615 bytes, application/pdf)

📚 Total documents loaded: 5
💡 CLI will auto-detect file types and handle encoding during ingestion


In [74]:
%%bash
# Create the first memory individually (using first file alphabetically)
SAMPLE_DIR="sample_documents"
FIRST_FILE=$(find "$SAMPLE_DIR" -maxdepth 1 -type f \( -name "*.txt" -o -name "*.pdf" \) | sort | head -1)
FILENAME=$(basename "$FIRST_FILE")

echo "📝 Creating first document using memory create:"
echo "   Document: $FILENAME"
echo "   Method: Individual memory creation"
echo "   💡 CLI auto-detects content type from file extension"
echo ""

# Create memory with metadata and save output
goodmem memory create \
  --space-id "$SPACE_ID" \
  --file "$FIRST_FILE" \
  --metadata "filename=$FILENAME" \
  --metadata "source=sample_documents" \
  --metadata "ingestion_method=single" \
  --chunking recursive \
  --chunk-size 256 \
  --chunk-overlap 25 \
  --length-unit chars \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY" \
  > /tmp/memory_output.txt

# Display the full output
cat /tmp/memory_output.txt

echo ""
echo "🎯 Single memory creation completed successfully!"


📝 Creating first document using memory create:
   Document: company_handbook.txt
   Method: Individual memory creation
   💡 CLI auto-detects content type from file extension

Memory created successfully!

ID:            59ba6aee-eaf8-4231-b82f-f5960dde35b6
Space ID:      0fa8870f-427c-4234-bf30-e93504c41320
Content Type:  text/plain
Status:        PENDING
Created by:    cf5df949-31c6-4c54-af50-f8002107164e
Created at:    2025-12-04T22:36:01Z
Metadata:
  filename: company_handbook.txt
  ingestion_method: single
  source: sample_documents

🎯 Single memory creation completed successfully!


In [76]:
# Capture the memory ID from the create output
first_memory_id_list = !awk '/^ID:/ {{print $$2}}' /tmp/memory_output.txt
first_memory_id = first_memory_id_list[0] if first_memory_id_list else ""

# Set as environment variable for bash cells
%env FIRST_MEMORY_ID={first_memory_id}

# Get the created memory with content included
!goodmem memory get "$FIRST_MEMORY_ID"


env: FIRST_MEMORY_ID=59ba6aee-eaf8-4231-b82f-f5960dde35b6
ACME Corporation Employee Handbook

Welcome to ACME Corporation! This handbook provides essential information about our company policies, procedures, and culture.

COMPANY OVERVIEW
ACME Corporation is a leading technology company founded in 2010, specializing in innovative software solutions for businesses worldwide. Our mission is to empower organizations through cutting-edge technology and exceptional service.

WORK HOURS AND POLICIES
Standard work hours are 9:00 AM to 5:30 PM, Monday through Friday. We offer flexible working arrangements including remote work options. Employees are expected to maintain professional standards and communicate effectively with their teams.

TIME OFF POLICY
All full-time employees receive:
- 15 days of paid vacation annually (increases to 20 days after 3 years)
- 10 sick days per year
- 8 company holidays
- Personal days as needed with manager approval

Vacation requests should be sub

In [None]:
%%bash
# Delete the single memory created above so we can batch ingest all files together
SAMPLE_DIR="sample_documents"

if [ -n "$FIRST_MEMORY_ID" ]; then
  echo "🗑️  Deleting single memory to demonstrate batch create-batch..."
  goodmem memory delete --force "$FIRST_MEMORY_ID" \
    --server "$GOODMEM_GRPC" \
    --api-key "$GOODMEM_API_KEY"
fi

echo ""
echo "📦 Creating all documents using batch create-batch:"
echo "   💡 This ingests all files from the directory at once"
echo ""

# Use create-batch to ingest all files from the directory.
# NOTE: create-batch needs a TTY, so it fails inside Jupyter cells (see the error below).
# Run it in an actual terminal instead.
goodmem memory create-batch \
  --space-id "$SPACE_ID" \
  --dir "$SAMPLE_DIR" \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY"

echo ""
echo "📋 Batch Memory Creation Summary:"
echo "   📦 Batch create-batch: All files ingested from $SAMPLE_DIR"
echo "   ⏳ Check processing status in the next cell"


🗑️  Deleting single memory to demonstrate batch create-batch...


Failed to delete memory 59ba6aee-eaf8-4231-b82f-f5960dde35b6: Memory not found
Error: failed to delete memory



📦 Creating all documents using batch create-batch:
   💡 This ingests all files from the directory at once



Error: could not open a new TTY: open /dev/tty: no such device or address



📋 Batch Memory Creation Summary:
   📦 Batch create-batch: All files ingested from sample_documents
   ⏳ Check processing status in the next cell


In [87]:
%%bash
# List all memories in our space
echo "📚 Memories in space '$SPACE_ID':"
echo ""

MEMORIES=$(goodmem memory list \
  --space-id "$SPACE_ID" \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY" --format json)

TOTAL=$(echo "$MEMORIES" | jq '.memories | length')
echo "   Total memories: $TOTAL"
echo ""

echo "$MEMORIES" | jq -r '.memories[] |
  "   \(.metadata.filename // "Unknown")
      Status: \(.processingStatus)
      Created: \(.createdAt)"'


📚 Memories in space '0fa8870f-427c-4234-bf30-e93504c41320':

   Total memories: 5

   technical_documentation.txt
      Status: COMPLETED
      Created: 2025-12-04T22:50:25.195Z
   security_policy.txt
      Status: COMPLETED
      Created: 2025-12-04T22:50:25.195Z
   employee_handbook.pdf
      Status: COMPLETED
      Created: 2025-12-04T22:50:25.195Z
   company_handbook.txt
      Status: COMPLETED
      Created: 2025-12-04T22:50:25.195Z
   product_faq.txt
      Status: COMPLETED
      Created: 2025-12-04T22:50:25.195Z
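
The same summary can be produced in Python once the JSON is captured. This sketch assumes the shape implied by the `jq` filter above (a top-level `memories` array whose items carry `metadata.filename` and `processingStatus`); the sample payload is hand-written and includes only those fields.

```python
import json
from collections import Counter

# Hand-written sample with only the fields the jq filter above reads.
sample_json = """
{"memories": [
  {"metadata": {"filename": "company_handbook.txt"}, "processingStatus": "COMPLETED"},
  {"metadata": {"filename": "product_faq.txt"}, "processingStatus": "COMPLETED"},
  {"metadata": {}, "processingStatus": "PENDING"}
]}
"""


def summarize(memories_json: str) -> Counter:
    """Count memories per processing status."""
    memories = json.loads(memories_json).get("memories", [])
    return Counter(m.get("processingStatus", "UNKNOWN") for m in memories)


def filenames(memories_json: str) -> list[str]:
    """List filenames, falling back to 'Unknown' like the jq filter does."""
    memories = json.loads(memories_json).get("memories", [])
    return [m.get("metadata", {}).get("filename", "Unknown") for m in memories]


print(summarize(sample_json))
print(filenames(sample_json))
```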


In [88]:
%%bash
# Wait for all memories to finish processing
echo "⏳ Waiting for document processing to complete..."
echo "   💡 Note: Polling memory list until all documents are COMPLETED"
echo ""

MAX_WAIT=120
ELAPSED=0

while [ $ELAPSED -lt $MAX_WAIT ]; do
  MEMORIES=$(goodmem memory list \
    --space-id "$SPACE_ID" \
    --server "$GOODMEM_GRPC" \
    --api-key "$GOODMEM_API_KEY" --format json)

  TOTAL=$(echo "$MEMORIES" | jq '.memories | length')
  COMPLETED=$(echo "$MEMORIES" | jq '[.memories[] | select(.processingStatus == "COMPLETED")] | length')
  FAILED=$(echo "$MEMORIES" | jq '[.memories[] | select(.processingStatus == "FAILED")] | length')

  echo "📊 Processing status: COMPLETED: $COMPLETED, TOTAL: $TOTAL"

  # Require at least one memory so an empty space is not reported as success
  if [ "$TOTAL" -gt 0 ] && [ "$COMPLETED" -eq "$TOTAL" ]; then
    echo "✅ All documents processed successfully!"
    echo "🎉 Ready for semantic search and retrieval!"
    break
  fi

  if [ "$FAILED" -gt 0 ]; then
    echo "❌ $FAILED memories failed processing"
    break
  fi

  sleep 5
  ELAPSED=$((ELAPSED + 5))
done

if [ $ELAPSED -ge $MAX_WAIT ]; then
  echo "⏰ Timeout waiting for processing (waited ${MAX_WAIT}s)"
fi


⏳ Waiting for document processing to complete...
   💡 Note: Polling memory list until all documents are COMPLETED

📊 Processing status: COMPLETED: 5, TOTAL: 5
✅ All documents processed successfully!
🎉 Ready for semantic search and retrieval!


## Semantic Search & Retrieval

### Why Semantic Search?

**Traditional keyword search**:
- Matches exact words or simple variations
- Misses conceptually similar content with different wording
- Example: "vacation days" won't match "time off policy"

**Semantic search**:
- Understands meaning and context
- Finds conceptually similar content regardless of exact wording
- Example: "vacation days" successfully matches "time off policy"

### How It Works

```
Query: "vacation policy"
   ↓ (embed with same embedder)
Query Vector: [0.23, -0.45, ...]
   ↓ (compare to all chunk vectors)
Most Similar Chunks: (by negative cosine similarity, lowest score first)
   1. "TIME OFF POLICY..." (score: -0.604)
   2. "Vacation requests..." (score: -0.544)
   3. "WORK HOURS..." (score: -0.458)
```

### Understanding Relevance Scores

GoodMem reports relevance as **negative cosine similarity**, so scores behave like a distance:
- **Lower values = more relevant** (e.g., -0.6 is better than -0.4)
- **Range**: Typically -1.0 (most similar) to 0.0 (unrelated)
- **Rule-of-thumb threshold**: Results under -0.3 are usually relevant
- **Context matters**: Exact scores vary by embedder and content
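
A short sketch of how a client might apply these rules: keep results below the threshold and sort so the lowest (most relevant) score comes first. The first three scores are copied from the diagram above; the fourth entry and the `rank_by_relevance` helper are hypothetical.

```python
def rank_by_relevance(results: list[dict], threshold: float = -0.3) -> list[dict]:
    """Keep results scoring below the threshold, most relevant (lowest) first."""
    kept = [r for r in results if r["score"] < threshold]
    return sorted(kept, key=lambda r: r["score"])


results = [
    {"text": "WORK HOURS...", "score": -0.458},
    {"text": "TIME OFF POLICY...", "score": -0.604},
    {"text": "Vacation requests...", "score": -0.544},
    {"text": "API rate limits...", "score": -0.12},  # hypothetical weak match
]

ranked = rank_by_relevance(results)
for r in ranked:
    print(f"{r['score']:+.3f}  {r['text']}")
```

Treat the threshold as a starting point to tune per embedder and corpus, not as a fixed constant.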

### Streaming API Benefits

GoodMem's streaming API:
- **Real-time results**: Process chunks as they arrive
- **Low latency**: Start showing results immediately
- **Memory efficient**: No need to buffer entire result set
- **Progressive UI**: Update interface as more results come in

### What We'll Do

1. Implement a semantic search function using GoodMem's streaming API
2. Process different event types (chunks, memories, metadata)
3. Display results with relevance scores
4. Test with various queries to see semantic matching in action

Now comes the exciting part! Let's perform semantic search using GoodMem's streaming API. This will:

- **Find relevant chunks** based on semantic similarity
- **Stream results** in real-time
- **Include relevance scores** for ranking
- **Return structured data** for easy processing

In [89]:
%%bash
# Perform semantic search using GoodMem's streaming memory retrieve
QUERY="What is the vacation policy for employees?"

echo "🔍 Searching for: '$QUERY'"
echo "📁 Space ID: $SPACE_ID"
echo "📊 Max results: 5"
echo "--------------------------------------------------"

goodmem memory retrieve "$QUERY" \
  --space-id "$SPACE_ID" \
  --max-results 5 \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY"

echo ""
echo "✅ Search completed"


🔍 Searching for: 'What is the vacation policy for employees?'
📁 Space ID: 0fa8870f-427c-4234-bf30-e93504c41320
📊 Max results: 5
--------------------------------------------------
Searching memories...

Memory 91f9918c-d66a-4a51-a258-9f6f7f1e7b6c loaded
┌─ [src: memory[0] | stage: retrieve | relevance: -0.71]
│
│  has established the following vacation plan to provide eligible employees
│  time off with pay so that they may be free from their regular duties for
│  a period of rest and relaxation... [+391 chars]
│
└─ [Chunk ID: b3c1dd16-93aa-4d0c-a8e5-2dade64daf05]
Memory 8b67ce84-ff23-4379-b928-c4557dd4a741 loaded

┌─ [src: memory[1] | stage: retrieve | relevance: -0.68]
│
│  We offer flexible working arrangements including remote work options.
│  Employees are expected to maintain professional standards and communicate
│  effectively with their teams. TIME... [+395 chars]
│
└─ [Chunk ID: 3b878de8-a4ad-48bb-993c-8def7b77ed52]

┌─ [src: me

In [90]:
%%bash
# Test semantic search with different types of queries
QUERIES=(
  "How do I reset my password?"
  "What are the security requirements for remote work?"
  "API authentication and rate limits"
  "Employee benefits and health insurance"
  "How much does the software cost?"
)

for i in "${!QUERIES[@]}"; do
  query="${QUERIES[$i]}"
  echo ""
  echo "🔍 Test Query $((i+1)): $query"
  echo "============================================================"

  goodmem memory retrieve "$query" \
    --space-id "$SPACE_ID" \
    --max-results 3 \
    --format stream \
    --server "$GOODMEM_GRPC" \
    --api-key "$GOODMEM_API_KEY"

  echo ""
  echo "------------------------------------------------------------"
done



🔍 Test Query 1: How do I reset my password?
Searching memories...

Memory c6e5d3e9-a6e3-4c01-8720-653917cd64ea loaded
┌─ [src: memory[0] | stage: retrieve | relevance: -0.30]
│
│  for internal use only 3. CONFIDENTIAL: Sensitive information requiring
│  special handling 4. RESTRICTED: Highly sensitive information with limited
│  access Each classification level... [+393 chars]
│
└─ [Chunk ID: cd59e3e1-c263-4e90-a31e-8b6990710a84]
Memory 91f9918c-d66a-4a51-a258-9f6f7f1e7b6c loaded

┌─ [src: memory[1] | stage: retrieve | relevance: -0.30]
│
│  well as any change to such password. Such notice must be made
│  immediately. 4. Compliance Employees who violate any aspect of this
│  policy or who demonstrate poor judgment... [+388 chars]
│
└─ [Chunk ID: 47760b08-7228-482b-9a12-3204709939a8]

┌─ [src: memory[0] | stage: retrieve | relevance: -0.29]
│
│  Multi-factor authentication required for all business systems - Password
│  managers recommen

## Advanced Features

Congratulations! 🎉 You've successfully built a semantic search system using GoodMem. Here's what you've accomplished:

### ✅ What You Built
- **Document ingestion pipeline** with automatic chunking and embedding
- **Semantic search system** with relevance scoring
- **Simple Q&A system** using GoodMem's vector capabilities

### 🚀 Next Steps for Advanced Implementation

#### Reranking
Improve search quality by adding a reranking stage. **Rerankers** are specialized models that re-score search results to improve relevance:

- **Two-stage retrieval**: Fast initial retrieval with embeddings, then precise reranking
- **Better relevance**: Rerankers use cross-attention to understand query-document relationships
- **Reduced costs**: Rerank only top-K results instead of entire corpus
- **Voyage AI reranker**: Industry-leading reranking model with state-of-the-art performance

The combination of fast embedding-based retrieval followed by accurate reranking provides the best balance of speed and quality for production RAG systems.

## Configuring a Reranker

To further improve search quality, we can add a **reranker** to our RAG pipeline. While embedders provide fast semantic search, rerankers use more sophisticated models to re-score the top results for better accuracy.

### Why Use Reranking?

1. **Higher Accuracy**: Rerankers use cross-encoder architectures that directly compare queries and documents
2. **Two-Stage Pipeline**: Fast retrieval with embeddings + precise reranking = optimal performance
3. **Cost Effective**: Only rerank top-K results (e.g., top 20) rather than entire corpus
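
The two-stage idea can be illustrated with toy scorers. In this sketch, shared-word count stands in for fast embedding retrieval and Jaccard overlap stands in for a cross-encoder reranker; neither is what Voyage AI or GoodMem actually computes, and all function names are illustrative.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def first_stage_score(query: str, doc: str) -> int:
    """Cheap score: number of shared words (stand-in for embedding similarity)."""
    return len(tokens(query) & tokens(doc))


def rerank_score(query: str, doc: str) -> float:
    """Costlier score: Jaccard overlap (stand-in for a cross-encoder)."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d) if q | d else 0.0


def two_stage_search(query: str, corpus: list[str], top_k: int = 3, final_n: int = 2) -> list[str]:
    # Stage 1: score the whole corpus cheaply and keep only the top_k candidates.
    candidates = sorted(corpus, key=lambda d: first_stage_score(query, d), reverse=True)[:top_k]
    # Stage 2: apply the expensive scorer to those few candidates only.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:final_n]


corpus = [
    "Vacation policy: employees receive 15 days of paid vacation.",
    "The security policy for employees covers remote work, passwords, devices, "
    "vacation travel, and data handling.",
    "API rate limits apply to all endpoints.",
    "Office hours are nine to five.",
]
query = "vacation policy for employees"

print(two_stage_search(query, corpus))
```

In this example the long security document wins the first stage on raw word matches, and the second stage promotes the focused vacation document above it, which is the kind of correction a reranker is meant to provide.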

### Voyage AI Reranker

We'll use Voyage AI's `rerank-2.5` model, which provides:
- **State-of-the-art performance** on reranking benchmarks
- **Fast inference** optimized for production use
- **Simple API** that integrates seamlessly with GoodMem

**Note**: You'll need a Voyage AI API key set in your environment variable `VOYAGE_API_KEY`.

In [92]:
# Create Voyage AI reranker
!goodmem reranker create \
  --display-name "Voyage Rerank 2.5" \
  --provider-type VOYAGE \
  --endpoint-url "https://api.voyageai.com" \
  --model-identifier "rerank-2.5" \
  --api-path "/v1/rerank" \
  --cred-api-key "$VOYAGE_API_KEY" \
  --description "Voyage AI reranker for improving search result relevance" \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY" \
  > /tmp/reranker_output.txt

# Parse the reranker ID using awk
reranker_id_list = !awk '/^ID:/ {{print $$2}}' /tmp/reranker_output.txt
reranker_id = reranker_id_list[0] if reranker_id_list else ""

# Set as environment variable for bash cells
%env RERANKER_ID={reranker_id}

# Display the full output
!cat /tmp/reranker_output.txt


env: RERANKER_ID=d7ebc6cc-b5a4-47d4-b02d-c8973a73c212
Reranker created successfully!

ID:               d7ebc6cc-b5a4-47d4-b02d-c8973a73c212
Display Name:     Voyage Rerank 2.5
Description:      Voyage AI reranker for improving search result relevance
Owner:            cf5df949-31c6-4c54-af50-f8002107164e
Provider Type:    VOYAGE
Endpoint URL:     https://api.voyageai.com
API Path:         /v1/rerank
Model:            rerank-2.5
Created:          2025-12-04T15:00:05-08:00
Updated:          2025-12-04T15:00:05-08:00


## Registering an LLM

The final component in our RAG pipeline is the **LLM (Large Language Model)** - the generation component that creates natural language responses using the retrieved and reranked context.

### Role of LLMs in RAG

After retrieving and reranking relevant chunks, the LLM:
1. **Receives the query** and retrieved context
2. **Generates a response** that synthesizes information from multiple sources
3. **Maintains coherence** while staying grounded in the retrieved facts

### OpenAI GPT-4o-mini

We'll use OpenAI's `gpt-4o-mini` model, which provides:
- **Fast inference** with low latency for real-time applications
- **Cost-effective** pricing compared to larger models
- **High quality** responses suitable for most RAG use cases
- **Function calling** support for advanced workflows

**Note**: This uses the same `OPENAI_API_KEY` environment variable as the embedder.

In [93]:
# Register OpenAI GPT-4o-mini LLM
!goodmem llm create \
  --display-name "OpenAI GPT-4o Mini" \
  --provider-type OPENAI \
  --endpoint-url "https://api.openai.com/v1" \
  --model-identifier "gpt-4o-mini" \
  --api-path "/chat/completions" \
  --cred-api-key "$OPENAI_API_KEY" \
  --description "OpenAI's GPT-4o Mini model for fast and efficient text generation" \
  --supports-chat \
  --supports-function-calling \
  --supports-system-messages \
  --supports-streaming \
  --supports-sampling-parameters \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY" \
  > /tmp/llm_output.txt

# Parse the LLM ID using awk
llm_id_list = !awk '/^ID:/ {{print $$2}}' /tmp/llm_output.txt
llm_id = llm_id_list[0] if llm_id_list else ""

# Set as environment variable for bash cells
%env LLM_ID={llm_id}

# Display the full output
!cat /tmp/llm_output.txt


env: LLM_ID=6f8859af-489e-492a-a9ad-5e6fe1313dd1
LLM created successfully!

ID:               6f8859af-489e-492a-a9ad-5e6fe1313dd1
Display Name:     OpenAI GPT-4o Mini
Description:      OpenAI's GPT-4o Mini model for fast and efficient text generation
Owner:            cf5df949-31c6-4c54-af50-f8002107164e
Provider Type:    OPENAI
Endpoint URL:     https://api.openai.com/v1
API Path:         /chat/completions
Model:            gpt-4o-mini
Modalities:       TEXT
Capabilities:     Chat, Completion, Functions, System Messages, Streaming, Sampling Parameters
Created by:       cf5df949-31c6-4c54-af50-f8002107164e
Created at:       2025-12-04T23:01:06Z

Capability Inference:
  ✓ Completion Support: true (detected from model family 'gpt-4o-mini')
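
If you prefer to extract the ID in Python rather than with `awk`, the same parsing can be sketched as below. This is a minimal example that assumes the CLI output keeps the `ID:` line format shown above; `parse_resource_id` is a hypothetical helper name, not part of GoodMem.

```python
import re
from typing import Optional


def parse_resource_id(cli_output: str) -> Optional[str]:
    """Return the value on the first line starting with 'ID:', or None."""
    match = re.search(r"^ID:\s*(\S+)", cli_output, flags=re.MULTILINE)
    return match.group(1) if match else None


# Sample text copied from the output above (shortened).
sample_output = """LLM created successfully!

ID:               6f8859af-489e-492a-a9ad-5e6fe1313dd1
Display Name:     OpenAI GPT-4o Mini
"""

llm_id = parse_resource_id(sample_output)
print(llm_id)
```

Returning `None` when no `ID:` line is found makes a failed registration easy to detect before the ID is exported to later cells.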



## Enhanced RAG with Reranking and LLM Generation

Now that all the components are configured (embedder, reranker, and LLM), let's run the complete RAG pipeline. It works in three stages:

1. **Retrieval**: Fast semantic search finds relevant chunks
2. **Reranking**: Voyage AI reranker re-scores results for better relevance
3. **Generation**: OpenAI GPT-4o-mini generates a coherent response using the reranked context

Compared with retrieval alone, reranking puts the most relevant chunks first, and the LLM turns them into a direct answer instead of a list of passages.
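
The post-processor arguments used in the next cell can also be assembled from Python. The sketch below writes the same four keys (`llm_id`, `reranker_id`, `relevance_threshold`, `max_results`) to a temporary JSON file. `write_rag_config` is a hypothetical helper, and the fallback IDs are placeholders for illustration only.

```python
import json
import os
import tempfile


def write_rag_config(llm_id, reranker_id, relevance_threshold=0.3, max_results=3):
    """Write post-processor arguments to a temp JSON file and return its path."""
    if not llm_id or not reranker_id:
        raise ValueError("llm_id and reranker_id must both be set")
    config = {
        "llm_id": llm_id,
        "reranker_id": reranker_id,
        "relevance_threshold": relevance_threshold,
        "max_results": max_results,
    }
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
        json.dump(config, fh, indent=2)
        return fh.name


# Placeholder IDs are used when the environment variables are not set.
config_path = write_rag_config(
    os.environ.get("LLM_ID", "example-llm-id"),
    os.environ.get("RERANKER_ID", "example-reranker-id"),
)
print(config_path)
```

Failing early on an empty ID avoids sending a half-filled configuration to the server, which can happen if an earlier registration cell did not run.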

In [94]:
%%bash
# Perform semantic search with reranking and LLM generation using post-processor
TEST_QUERY="What is the vacation policy for employees?"

echo "Testing Complete RAG Pipeline with Reranker + LLM"
echo ""
echo "🔍 RAG Query: '$TEST_QUERY'"
echo "📁 Space ID: $SPACE_ID"
echo "📊 Max results: 3"
echo "======================================================================"

# Create post-processor args JSON
cat > /tmp/rag_config.json <<EOF
{
  "llm_id": "$LLM_ID",
  "reranker_id": "$RERANKER_ID",
  "relevance_threshold": 0.3,
  "max_results": 3
}
EOF

# Run retrieval with ChatPostProcessorFactory for RAG
goodmem memory retrieve "$TEST_QUERY" \
  --space-id "$SPACE_ID" \
  --max-results 3 \
  --post-processor "com.goodmem.retrieval.postprocess.ChatPostProcessorFactory" \
  --post-processor-args "@/tmp/rag_config.json" \
  --format stream \
  --server "$GOODMEM_GRPC" \
  --api-key "$GOODMEM_API_KEY"

echo ""
echo "✅ RAG completed"

# Clean up temp file
rm -f /tmp/rag_config.json


Testing Complete RAG Pipeline with Reranker + LLM

🔍 RAG Query: 'What is the vacation policy for employees?'
📁 Space ID: 0fa8870f-427c-4234-bf30-e93504c41320
📊 Max results: 3
Searching memories...

Memory 8b67ce84-ff23-4379-b928-c4557dd4a741 loaded
┌─ [src: memory[0] | stage: rerank | relevance: 0.89]
│
│  We offer flexible working arrangements including remote work options.
│  Employees are expected to maintain professional standards and communicate
│  effectively with their teams. TIME... [+395 chars]
│
└─ [Chunk ID: 3b878de8-a4ad-48bb-993c-8def7b77ed52]
Memory 91f9918c-d66a-4a51-a258-9f6f7f1e7b6c loaded

┌─ [src: memory[1] | stage: rerank | relevance: 0.85]
│
│  has established the following vacation plan to provide eligible employees
│  time off with pay so that they may be free from their regular duties for
│  a period of rest and relaxation... [+391 chars]
│
└─ [Chunk ID: b3c1dd16-93aa-4d0c-a8e5-2dade64daf05]

┌─ [src: memory[1] | 

## 🎉 Congratulations! What You Built

You've successfully built a complete **Retrieval-Augmented Generation (RAG) system** using GoodMem! Let's recap what you accomplished.

### Components You Configured

| Component | Purpose | Provider | Model |
|-----------|---------|----------|-------|
| **Embedder** | Convert text to vectors | OpenAI | text-embedding-3-small (1536d) |
| **Reranker** | Re-score search results | Voyage AI | rerank-2.5 |
| **LLM** | Generate natural language responses | OpenAI | gpt-4o-mini |

### The Complete RAG Pipeline

```
📄 Documents
   ↓ Chunking (256 chars, 25 overlap)
   ↓ Embedding (OpenAI)
🗄️  Vector Storage (GoodMem Space)
   ↓ 
🔍 User Query
   ↓ Semantic Search (retrieve top-K)
   ↓ Reranking (Voyage AI re-scores)
   ↓ Context Selection (most relevant chunks)
🤖 LLM Generation (GPT-4o-mini)
   ↓
✨ Natural Language Answer
```
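
To make the chunking step in the diagram concrete, here is a minimal sketch of a fixed-size character window with overlap, using the 256/25 values shown above. It is an illustration only: GoodMem's own chunker may split on separators or behave differently, and `chunk_text` is a hypothetical function.

```python
def chunk_text(text, chunk_size=256, overlap=25):
    """Split text into character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic 600-character document for demonstration.
document = "".join(chr(97 + i % 26) for i in range(600))
chunks = chunk_text(document)
print([len(c) for c in chunks])
```

The overlap means the tail of one chunk repeats at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.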

### Key Concepts You Learned

1. **Embedders**: Transform text into semantic vectors for similarity search
2. **Spaces**: Logical containers for organizing and searching documents
3. **Chunking**: Breaking documents into optimal sizes for retrieval
4. **Semantic Search**: Finding conceptually similar content, not just keyword matches
5. **Reranking**: Two-stage retrieval for better precision
6. **Streaming API**: Real-time, memory-efficient result processing
7. **RAG Architecture**: Combining retrieval and generation for accurate, grounded responses
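
The semantic search and reranking concepts above can be sketched in a few lines of plain Python. The example assumes that `relevance_threshold` drops chunks scoring below it and that `max_results` caps the final count. These semantics, the toy vectors, and `toy_reranker` are assumptions for illustration, not GoodMem's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(query_vec, chunks, rerank_fn, top_k=10,
                         relevance_threshold=0.3, max_results=3):
    # Stage 1: cheap vector similarity selects top_k candidates.
    candidates = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["vector"]),
        reverse=True,
    )[:top_k]
    # Stage 2: a more expensive scorer re-scores only those candidates.
    scored = [(rerank_fn(c["text"]), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= relevance_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:max_results]


def toy_reranker(text):
    """Stand-in for a real reranker: favors chunks that mention vacation."""
    return 0.9 if "vacation" in text.lower() else 0.1


chunks = [
    {"text": "Employees accrue vacation days each month.", "vector": [0.9, 0.1]},
    {"text": "Remote work options are available.", "vector": [0.7, 0.3]},
    {"text": "The cafeteria opens at 8am.", "vector": [0.1, 0.9]},
]
results = retrieve_then_rerank([1.0, 0.0], chunks, toy_reranker, top_k=2)
for score, chunk in results:
    print(score, chunk["text"])
```

Running the expensive scorer on only `top_k` candidates is what keeps two-stage retrieval affordable: the first stage trades precision for speed, and the second recovers precision on a small set.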

### Performance Improvements

**Basic search** (earlier in notebook):
- Fast retrieval using vector similarity
- Good recall, but may include less relevant results

**Enhanced RAG** (with reranker + LLM):
- Reranker re-scores candidates so the top results are more relevant (higher precision)
- LLM synthesizes information from multiple chunks
- Better user experience with natural language answers
- Answers are grounded in actual document content, which reduces (but does not eliminate) hallucinations

### Next Steps & Advanced Topics

**Enhance Your RAG System**:
- **Multiple embedders**: Combine different embedders for better coverage
- **Custom chunking**: Tune chunk size/overlap for your content type
- **Metadata filtering**: Add filters to narrow search by document type, date, etc.
- **Hybrid search**: Combine semantic and keyword search
- **Context augmentation**: Include surrounding chunks for better LLM context
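
One common way to combine semantic and keyword results for hybrid search is reciprocal rank fusion (RRF). The sketch below is a generic illustration rather than a GoodMem feature. The document IDs are made up, and `k=60` is a conventional default for the smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists; each hit adds 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


semantic_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical vector-search order
keyword_hits = ["doc_b", "doc_d", "doc_a"]   # hypothetical keyword-search order
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

RRF uses only rank positions, so it needs no calibration between similarity scores and keyword scores that live on different scales.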

**Production Deployment**:
- **Monitoring**: Track query latency, relevance scores, user feedback
- **Scaling**: Horizontal scaling for high-traffic applications
- **Cost optimization**: Balance quality vs. API costs
- **Caching**: Cache frequent queries for faster responses
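
As a starting point for caching, the sketch below memoizes answers by normalized query text with `functools.lru_cache`. `run_rag_pipeline` is a hypothetical stand-in for whatever function calls your retrieval and generation stack; here it only counts invocations.

```python
from functools import lru_cache

calls = {"count": 0}


def run_rag_pipeline(query):
    """Hypothetical stand-in for the real retrieval + generation call."""
    calls["count"] += 1
    return f"answer for: {query}"


def normalize_query(query):
    """Lowercase and collapse whitespace so trivial variants share a key."""
    return " ".join(query.lower().split())


@lru_cache(maxsize=256)
def _cached_answer(normalized_query):
    return run_rag_pipeline(normalized_query)


def answer(query):
    return _cached_answer(normalize_query(query))


first = answer("What is the vacation policy?")
second = answer("  what is THE vacation   policy? ")
print(calls["count"])
```

Cached answers go stale when the underlying documents change, so call `_cached_answer.cache_clear()` after re-indexing, or use a cache with expiry if content updates frequently.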

### Resources

- **Documentation**: [https://docs.goodmem.ai](https://docs.goodmem.ai)
- **API Reference**: [https://docs.goodmem.ai/docs/reference/sdk/python/](https://docs.goodmem.ai/docs/reference/sdk/python/)

---

**Great job!** You now have a solid foundation for building production RAG systems with GoodMem. 🚀