# Building a Basic RAG Agent with GoodMem

## Overview

This tutorial will guide you through building a complete **Retrieval-Augmented Generation (RAG)** system using GoodMem's vector memory capabilities. By the end of this guide, you'll have a functional Q&A system that can:

- 🔍 **Semantically search** through your documents
- 📝 **Generate contextual answers** using retrieved information
- 🏗️ **Scale to handle** large document collections

### What is RAG?

RAG combines the power of **retrieval** (finding relevant information) with **generation** (creating natural language responses). This approach allows AI systems to provide accurate, context-aware answers by:

1. **Retrieving** relevant documents from a knowledge base
2. **Augmenting** the query with this context
3. **Generating** a comprehensive answer using both the query and retrieved information
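
The three steps can be sketched in a few lines of Python. This is a toy illustration rather than GoodMem's implementation: retrieval here uses naive word overlap instead of embeddings, the knowledge base is made up, and `generate` is a hypothetical stand-in for an LLM call.

```python
import re

# Toy knowledge base (hypothetical content, for illustration only).
KNOWLEDGE_BASE = [
    "Employees accrue 15 vacation days per year.",
    "Passwords must be rotated every 90 days.",
    "The API rate limit is 100 requests per minute.",
]

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=1):
    """Step 1: rank documents by word overlap with the query (stand-in for vector search)."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:top_k]

def augment(query, context):
    """Step 2: combine the retrieved context and the question into one prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"

def generate(prompt):
    """Step 3: placeholder for an LLM call; it only reports the prompt size."""
    return f"[LLM answer based on a {len(prompt)}-character prompt]"

question = "How many vacation days do employees get?"
context = retrieve(question, KNOWLEDGE_BASE)
prompt = augment(question, context)
print(prompt)
print(generate(prompt))
```

The rest of this tutorial replaces the toy `retrieve` step with GoodMem's embedding-based search.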

### Why GoodMem for RAG?

GoodMem provides enterprise-grade vector storage with:
- **Multiple embedder support** for optimal retrieval accuracy
- **Streaming APIs** for real-time responses
- **Advanced post-processing** with reranking and summarization
- **Scalable architecture** for production workloads


## Prerequisites

Before starting, ensure you have:

- ✅ **GoodMem server running** locally or access to a remote instance
- ✅ **curl installed** on your system (`curl --version` to verify)
- ✅ **jq installed** for JSON parsing (`jq --version` to verify)
- ✅ **Python 3.9+** for running the notebook
- ✅ **API key** for your GoodMem instance
- ✅ **OpenAI API key** (for embeddings and LLM)
- ✅ **Voyage AI API key** (for reranking)

### Installing GoodMem

If you don't have GoodMem installed yet, you can install it with:

```bash
curl -s https://get.goodmem.ai | bash
```

**Environment setup:**
```bash
export GOODMEM_HOST="http://localhost:8080"
export GOODMEM_API_KEY="your-key-here"
export OPENAI_API_KEY="your-openai-key"
export VOYAGE_API_KEY="your-voyage-key"
```

## Installation & Setup

First, let's verify that curl and jq are installed:

In [1]:
%%bash
# Verify curl is installed
echo "Checking curl installation:"
curl --version | head -1

echo ""
echo "Checking jq installation:"
jq --version

echo ""
echo "✅ All required tools are installed!"

Checking curl installation:
curl 8.5.0 (x86_64-pc-linux-gnu) libcurl/8.5.0 OpenSSL/3.0.13 zlib/1.3 brotli/1.1.0 zstd/1.5.5 libidn2/2.3.7 libpsl/0.21.2 (+libidn2/2.3.7) libssh/0.10.6/openssl/zlib nghttp2/1.59.0 librtmp/2.3 OpenLDAP/2.6.7

Checking jq installation:
jq-1.7

✅ All required tools are installed!


## Authentication & Configuration

### Why This Matters

GoodMem uses API key authentication to secure your vector memory data. Proper configuration ensures:
- **Secure access** to your GoodMem instance
- **Isolated environments** (development, staging, production)
- **Usage tracking** and access control per API key

### What We'll Do

1. Configure the GoodMem host URL (where your server is running)
2. Set up API key authentication
3. Verify the configuration is correct

### Configuration Options

- **Local development**: `http://localhost:8080` (default)
- **Remote/production**: Your deployed GoodMem URL
- **Environment variables**: Best practice for managing credentials
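
As a minimal sketch of the environment-variable approach, the helper below reads the two GoodMem settings from a mapping and builds the request headers. The function name `load_goodmem_config` is hypothetical; the `x-api-key` header and the default host mirror the curl calls used in this tutorial.

```python
import os

def load_goodmem_config(env=None):
    """Read GoodMem connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Fall back to the local development URL and drop any trailing slash.
    host = env.get("GOODMEM_HOST", "http://localhost:8080").rstrip("/")
    api_key = env.get("GOODMEM_API_KEY", "")
    if not api_key:
        raise ValueError("GOODMEM_API_KEY is not set")
    return {
        "host": host,
        # Header names match the curl requests in this tutorial.
        "headers": {"x-api-key": api_key, "Content-Type": "application/json"},
    }

config = load_goodmem_config(
    {"GOODMEM_HOST": "http://localhost:8080/", "GOODMEM_API_KEY": "demo-key"}
)
print(config["host"])
```

Failing fast on a missing key tends to be easier to debug than an authentication error from the first API call.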

Let's configure our GoodMem client and test the connection:

In [2]:
import dotenv
dotenv.load_dotenv()

True

In [1]:
# Set environment variables for GoodMem CLI
%env GOODMEM_API_KEY=your-api-key-here
%env GOODMEM_HOST=https://localhost:8080

# Set API keys for embedders/LLMs
%env OPENAI_API_KEY=your-openai-key
%env VOYAGE_API_KEY=your-voyage-key

env: GOODMEM_API_KEY=your-api-key-here
env: GOODMEM_HOST=https://localhost:8080
env: OPENAI_API_KEY=your-openai-key
env: VOYAGE_API_KEY=your-voyage-key


In [6]:
%%bash

# Test connection by listing spaces
echo "Testing connection to GoodMem API..."
echo ""

curl -s \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -H "Content-Type: application/json" \
  $GOODMEM_HOST/v1/spaces | jq .

Testing connection to GoodMem API...

{
  "spaces": [],
  "nextToken": null
}


## Creating an Embedder

### Why Embedders Matter

An **embedder** is the foundation of semantic search. It converts text into high-dimensional vectors (embeddings) that capture meaning:

```
Text: "vacation policy" → Vector: [0.23, -0.45, 0.67, ...]  (1536 dimensions)
```

These vectors enable:
- **Semantic similarity**: Find conceptually similar content, not just keyword matches
- **Context understanding**: Capture meaning beyond exact word matches
- **Efficient retrieval**: Fast vector comparisons using specialized indexes
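
To make "semantic similarity" concrete, here is a small sketch of cosine similarity between vectors. The 3-dimensional vectors are made up for illustration; real embeddings from `text-embedding-3-small` have 1536 dimensions and come from the model, not from hand-picked numbers.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings: the first two point in a similar direction.
vacation_policy = [0.9, 0.1, 0.0]
time_off_rules = [0.8, 0.2, 0.1]
api_rate_limits = [0.0, 0.1, 0.9]

related = cosine_similarity(vacation_policy, time_off_rules)
unrelated = cosine_similarity(vacation_policy, api_rate_limits)
print(f"related: {related:.3f}, unrelated: {unrelated:.3f}")
```

Texts with similar meaning should end up with vectors pointing in similar directions, which is what the higher first score reflects.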

### The RAG Pipeline Flow

```
Documents → Embedder → Vector Storage → Semantic Search → Retrieved Context
```

### Choosing an Embedder

**OpenAI `text-embedding-3-small`** (what we'll use):
- ✅ **High quality**: Excellent for most use cases
- ✅ **Fast**: Low latency for real-time applications
- ✅ **1536 dimensions**: Good balance of quality and storage
- ✅ **Cost-effective**: $0.02 per 1M tokens

**Other options**:
- **text-embedding-3-large**: Higher quality, 3072 dimensions, more expensive
- **Voyage AI**: Specialized for search, excellent retrieval performance
- **Cohere**: Good multilingual support
- **Local models**: HuggingFace sentence transformers for privacy/offline

### What We'll Do

1. Check if an embedder already exists
2. If not, create an OpenAI embedder with proper authentication
3. Verify the embedder is ready for use

**Note**: You'll need an OpenAI API key set in your environment variable `OPENAI_API_KEY`.
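
Step 1 (checking for an existing embedder) could look like the sketch below. It assumes the `GET /v1/embedders` response contains an `embedders` array whose items carry a `modelIdentifier`, as the listing later in this tutorial shows; the helper name `find_embedder` and the sample ID are hypothetical.

```python
def find_embedder(list_response, model_identifier):
    """Return the first embedder that uses model_identifier, or None if absent."""
    for embedder in list_response.get("embedders", []):
        if embedder.get("modelIdentifier") == model_identifier:
            return embedder
    return None

# Sample shaped like the embedder listing used in this tutorial (ID is made up).
sample_response = {
    "embedders": [
        {
            "embedderId": "example-embedder-id",
            "displayName": "OpenAI Text Embedding 3 Small",
            "providerType": "OPENAI",
            "modelIdentifier": "text-embedding-3-small",
        }
    ]
}

existing = find_embedder(sample_response, "text-embedding-3-small")
if existing:
    print(f"Reusing embedder {existing['embedderId']}")
else:
    print("No matching embedder found; create one")
```

Reusing an existing embedder avoids accumulating duplicates when the notebook is run more than once.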

In [7]:
%%bash
# Create OpenAI embedder using REST API
echo "Creating OpenAI text-embedding-3-small embedder..."
echo ""

curl -s -X POST \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "OpenAI Text Embedding 3 Small",
    "providerType": "OPENAI",
    "endpointUrl": "https://api.openai.com/v1",
    "modelIdentifier": "text-embedding-3-small",
    "dimensionality": 1536,
    "apiPath": "/embeddings",
    "distributionType": "DENSE",
    "supportedModalities": ["TEXT"],
    "credentials": {
      "kind": "CREDENTIAL_KIND_API_KEY",
      "apiKey": {
        "inlineSecret": "'"$OPENAI_API_KEY"'",
        "headerName": "Authorization",
        "prefix": "Bearer "
      }
    }
  }' \
  $GOODMEM_HOST/v1/embedders > /tmp/embedder_output.txt

cat /tmp/embedder_output.txt | jq .

Creating OpenAI text-embedding-3-small embedder...

{
  "embedderId": "019b2d88-7694-747a-9ed6-61fc36ef7a62",
  "displayName": "OpenAI Text Embedding 3 Small",
  "description": "",
  "providerType": "OPENAI",
  "endpointUrl": "https://api.openai.com/v1",
  "apiPath": "/embeddings",
  "modelIdentifier": "text-embedding-3-small",
  "dimensionality": 1536,
  "distributionType": "DENSE",
  "maxSequenceLength": null,
  "supportedModalities": [
    "TEXT"
  ],
  "labels": {},
  "version": "",
  "monitoringEndpoint": "",
  "ownerId": "cf5df949-31c6-4c54-af50-f8002107164e",
  "createdAt": 1765995476628,
  "updatedAt": 1765995476628,
  "createdById": "cf5df949-31c6-4c54-af50-f8002107164e",
  "updatedById": "cf5df949-31c6-4c54-af50-f8002107164e"
}


In [9]:
# Extract and store embedder ID
embedder_id_list = !jq -r '.embedderId' /tmp/embedder_output.txt
embedder_id = embedder_id_list[0] if embedder_id_list else ""

%env EMBEDDER_ID={embedder_id}

env: EMBEDDER_ID=019b2d88-7694-747a-9ed6-61fc36ef7a62


## Creating Your First Space

### What is a Space?

A **Space** in GoodMem is a logical container for organizing related memories (documents). Think of it as a database or collection where you store and retrieve semantically similar content.

Each space has:
- **Associated embedders**: Which models convert text to vectors
- **Chunking configuration**: How documents are split into searchable pieces
- **Access controls**: Public or private, with permission management
- **Metadata labels**: For organization and filtering

### Use Cases for Multiple Spaces

You might create different spaces for:
- **By domain**: Technical docs, HR policies, product specs
- **By environment**: Development, staging, production
- **By customer**: Tenant-specific data in multi-tenant apps
- **By privacy level**: Public FAQ vs. internal knowledge base

### Why Chunking Matters

Documents are too large to search efficiently as whole units. Chunking:
- **Improves relevance**: Match specific sections, not entire documents
- **Enables context**: Return focused chunks that answer specific questions  
- **Optimizes retrieval**: Process and compare smaller text segments

**Our chunking strategy**:
- **256 characters**: Short enough for focused context, long enough for meaning
- **25 character overlap**: Ensures concepts spanning chunk boundaries aren't lost
- **Hierarchical separators**: Split on paragraphs first, then sentences, then words
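
The effect of chunk size and overlap can be seen with a simplified sketch. Note the assumption: this is a plain fixed-size sliding window, not GoodMem's recursive splitter, so it ignores the hierarchical separators and only illustrates how a 25-character overlap repeats text across chunk boundaries.

```python
def chunk_text(text, chunk_size=256, chunk_overlap=25):
    """Split text into fixed-size chunks that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# A synthetic 600-character document.
document = "".join(chr(ord("a") + i % 26) for i in range(600))
chunks = chunk_text(document)
print([len(c) for c in chunks])
print(chunks[0][-25:] == chunks[1][:25])
```

The last 25 characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still fully contained in at least one chunk when it is short enough.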

### What We'll Do

1. List available embedders
2. Create a space with our embedder and chunking configuration
3. Add metadata labels for organization
4. Verify the space is ready

Let's create a space for our RAG demo:

In [10]:
%%bash
# List available embedders
echo "📋 Available Embedders:"
echo ""

curl -s \
  -H "x-api-key: $GOODMEM_API_KEY" \
  $GOODMEM_HOST/v1/embedders | jq '.embedders[] | {embedderId, displayName, providerType, modelIdentifier}'

📋 Available Embedders:

{
  "embedderId": "019b2d88-7694-747a-9ed6-61fc36ef7a62",
  "displayName": "OpenAI Text Embedding 3 Small",
  "providerType": "OPENAI",
  "modelIdentifier": "text-embedding-3-small"
}


In [11]:
%%bash
# Create space with embedder and chunking configuration
echo "Creating RAG Demo Knowledge Base (CURL)..."
echo ""

curl -s -X POST \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "RAG Demo Knowledge Base CURL",
    "labels": {
      "purpose": "rag-demo",
      "environment": "tutorial",
      "content-type": "documentation"
    },
    "spaceEmbedders": [
      {
        "embedderId": "'"$EMBEDDER_ID"'",
        "defaultRetrievalWeight": 1.0
      }
    ],
    "defaultChunkingConfig": {
      "recursive": {
        "chunkSize": 256,
        "chunkOverlap": 25,
        "separators": ["\n\n", "\n", ". ", " ", ""],
        "keepStrategy": "KEEP_END",
        "lengthMeasurement": "CHARACTER_COUNT"
      }
    },
    "publicRead": false
  }' \
  $GOODMEM_HOST/v1/spaces > /tmp/space_output.txt

cat /tmp/space_output.txt | jq .

Creating RAG Demo Knowledge Base (CURL)...

{
  "spaceId": "019b2d89-d0f6-75ec-bbfc-00a432208952",
  "name": "RAG Demo Knowledge Base CURL",
  "labels": {
    "purpose": "rag-demo",
    "environment": "tutorial",
    "content-type": "documentation"
  },
  "spaceEmbedders": [
    {
      "spaceId": "019b2d89-d0f6-75ec-bbfc-00a432208952",
      "embedderId": "019b2d88-7694-747a-9ed6-61fc36ef7a62",
      "defaultRetrievalWeight": 1.0,
      "createdAt": 1765995565302,
      "updatedAt": 1765995565302,
      "createdById": "cf5df949-31c6-4c54-af50-f8002107164e",
      "updatedById": "cf5df949-31c6-4c54-af50-f8002107164e"
    }
  ],
  "createdAt": 1765995565302,
  "updatedAt": 1765995565302,
  "ownerId": "cf5df949-31c6-4c54-af50-f8002107164e",
  "createdById": "cf5df949-31c6-4c54-af50-f8002107164e",
  "updatedById": "cf5df949-31c6-4c54-af50-f8002107164e",
  "publicRead": false,
  "defaultChunkingConfig": {
    "none": null,
    "recursive": {
      "chunkSize": 256,
      "chunkOverlap": 25

In [12]:
# Extract and store space ID
space_id_list = !jq -r '.spaceId' /tmp/space_output.txt
space_id = space_id_list[0] if space_id_list else ""

%env SPACE_ID={space_id}

env: SPACE_ID=019b2d89-d0f6-75ec-bbfc-00a432208952


In [13]:
%%bash
# Get space details to verify configuration
echo "🔍 Space Configuration:"
echo ""

curl -s \
  -H "x-api-key: $GOODMEM_API_KEY" \
  $GOODMEM_HOST/v1/spaces/$SPACE_ID | jq .

🔍 Space Configuration:

{
  "spaceId": "019b2d89-d0f6-75ec-bbfc-00a432208952",
  "name": "RAG Demo Knowledge Base CURL",
  "labels": {
    "purpose": "rag-demo",
    "environment": "tutorial",
    "content-type": "documentation"
  },
  "spaceEmbedders": [
    {
      "spaceId": "019b2d89-d0f6-75ec-bbfc-00a432208952",
      "embedderId": "019b2d88-7694-747a-9ed6-61fc36ef7a62",
      "defaultRetrievalWeight": 1.0,
      "createdAt": 1765995565302,
      "updatedAt": 1765995565302,
      "createdById": "cf5df949-31c6-4c54-af50-f8002107164e",
      "updatedById": "cf5df949-31c6-4c54-af50-f8002107164e"
    }
  ],
  "createdAt": 1765995565302,
  "updatedAt": 1765995565302,
  "ownerId": "cf5df949-31c6-4c54-af50-f8002107164e",
  "createdById": "cf5df949-31c6-4c54-af50-f8002107164e",
  "updatedById": "cf5df949-31c6-4c54-af50-f8002107164e",
  "publicRead": false,
  "defaultChunkingConfig": {
    "none": null,
    "recursive": {
      "chunkSize": 256,
      "chunkOverlap": 25,
      "separato

## Adding Documents to Memory

### The Document Processing Pipeline

When you add a document to GoodMem, it goes through several automated steps:

```
1. Ingestion → 2. Chunking → 3. Embedding → 4. Indexing → 5. Ready for Search
```

**What happens**:
1. **Ingestion**: Document content and metadata are stored
2. **Chunking**: Text is split according to your configuration (256 chars, 25 overlap)
3. **Embedding**: Each chunk is converted to a vector by your embedder
4. **Indexing**: Vectors are indexed for fast similarity search
5. **Status**: Document marked as `COMPLETED` and ready for retrieval
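
Because processing is asynchronous, a client typically tallies the `processingStatus` of each memory until everything is `COMPLETED`. The sketch below assumes the response shape used by the `batchGet` calls in this tutorial (`results[].memory.processingStatus`); the helper name and the sample data are hypothetical.

```python
from collections import Counter

def summarize_processing(batch_get_response):
    """Count memories per processing status and report whether all are done."""
    statuses = [
        (result.get("memory") or {}).get("processingStatus", "UNKNOWN")
        for result in batch_get_response.get("results", [])
    ]
    counts = Counter(statuses)
    total = len(statuses)
    return {
        "total": total,
        "completed": counts["COMPLETED"],
        "failed": counts["FAILED"],
        "done": total > 0 and counts["COMPLETED"] == total,
    }

# Hypothetical response: one memory finished, one still being processed.
sample_response = {
    "results": [
        {"memory": {"memoryId": "mem-1", "processingStatus": "COMPLETED"}},
        {"memory": {"memoryId": "mem-2", "processingStatus": "PENDING"}},
    ]
}
summary = summarize_processing(sample_response)
print(summary)
```

A polling loop would call this after each status request and stop when `done` is true or `failed` is greater than zero.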

### Single vs. Batch Operations

**Single memory creation** (`CreateMemory`):
- ✅ Good for: Real-time ingestion, single documents
- ✅ Immediate response with the memory's initial processing status (processing itself continues in the background)
- ⚠️ Higher overhead for bulk operations

**Batch memory creation** (`BatchCreateMemory`):
- ✅ Good for: Bulk imports, initial setup, periodic updates
- ✅ Lower overhead, efficient for multiple documents
- ✅ Async processing - check status via `ListMemories` or `batchGet`
- ⚠️ Takes longer to get individual status feedback

### Metadata Best Practices

Rich metadata helps with:
- **Filtering**: Retrieve specific document types
- **Source attribution**: Show users where information came from
- **Organization**: Group and manage related documents
- **Debugging**: Track ingestion methods and dates

### What We'll Do

1. Load sample documents from local files
2. Create one document using single memory creation (to demo the API)
3. Create remaining documents using batch operation (more efficient)
4. Monitor processing status until all documents are ready

We'll use sample company documents that represent common business use cases:

In [14]:
%%bash
# List available sample documents
echo "📚 Sample Documents:"
echo ""
ls -lh sample_documents/

📚 Sample Documents:

total 412K
-rw-rw-r-- 1 pair-system-owner pair-system-owner 2.3K Oct  3 10:16 company_handbook.txt
-rw-rw-r-- 1 pair-system-owner pair-system-owner 391K Dec  3 10:48 employee_handbook.pdf
-rw-rw-r-- 1 pair-system-owner pair-system-owner 4.0K Oct  3 10:16 product_faq.txt
-rw-rw-r-- 1 pair-system-owner pair-system-owner 4.2K Oct  3 10:16 security_policy.txt
-rw-rw-r-- 1 pair-system-owner pair-system-owner 2.4K Oct  3 10:16 technical_documentation.txt


In [15]:
%%bash
# Create first memory using a text document
echo "📝 Creating memory from company_handbook.txt..."
echo ""

# Read file content and escape for JSON
CONTENT=$(cat sample_documents/company_handbook.txt | jq -Rs .)

curl -s -X POST \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "spaceId": "'"$SPACE_ID"'",
    "originalContent": '"$CONTENT"',
    "contentType": "text/plain",
    "metadata": {
      "filename": "company_handbook.txt",
      "source": "sample_documents",
      "ingestion_method": "single"
    },
    "chunkingConfig": {
      "recursive": {
        "chunkSize": 256,
        "chunkOverlap": 25,
        "separators": ["\n\n", "\n", ". ", " ", ""],
        "keepStrategy": "KEEP_END",
        "lengthMeasurement": "CHARACTER_COUNT"
      }
    }
  }' \
  $GOODMEM_HOST/v1/memories > /tmp/single_memory.json

cat /tmp/single_memory.json | jq .

📝 Creating memory from company_handbook.txt...

{
  "memoryId": "019b2d8a-26c5-70cf-84ff-51c39833bf09",
  "spaceId": "019b2d89-d0f6-75ec-bbfc-00a432208952",
  "originalContentRef": "",
  "contentType": "text/plain",
  "processingStatus": "PENDING",
  "metadata": {
    "filename": "company_handbook.txt",
    "source": "sample_documents",
    "ingestion_method": "single"
  },
  "createdAt": 1765995587271,
  "updatedAt": 1765995587271,
  "createdById": "cf5df949-31c6-4c54-af50-f8002107164e",
  "updatedById": "cf5df949-31c6-4c54-af50-f8002107164e",
  "chunkingConfig": {
    "none": null,
    "recursive": {
      "chunkSize": 256,
      "chunkOverlap": 25,
      "separators": [
        "\n\n",
        "\n",
        ". ",
        " ",
        ""
      ],
      "keepStrategy": "KEEP_END",
      "separatorIsRegex": false,
      "lengthMeasurement": "CHARACTER_COUNT"
    },
    "sentence": null
  }
}


In [None]:
# Extract and store memory ID from single creation
single_memory_id_list = !jq -r '.memoryId' /tmp/single_memory.json
single_memory_id = single_memory_id_list[0] if single_memory_id_list else ""

# Initialize MEMORY_IDS list
MEMORY_IDS = [single_memory_id]

print(f"✅ Stored memory ID: {single_memory_id}")
print(f"   Total IDs tracked: {len(MEMORY_IDS)}")

✅ Stored memory ID: 019b2d8a-26c5-70cf-84ff-51c39833bf09
   Total IDs tracked: 1


In [17]:
# Prepare batch memory creation for remaining documents
import base64
import json
import os

sample_dir = "sample_documents"
files_to_process = [
    "employee_handbook.pdf",
    "product_faq.txt",
    "security_policy.txt",
    "technical_documentation.txt"
]

# Chunking config
chunking_config = {
    "recursive": {
        "chunkSize": 256,
        "chunkOverlap": 25,
        "separators": ["\n\n", "\n", ". ", " ", ""],
        "keepStrategy": "KEEP_END",
        "lengthMeasurement": "CHARACTER_COUNT"
    }
}

memory_requests = []

for filename in files_to_process:
    filepath = os.path.join(sample_dir, filename)
    
    if filename.endswith('.pdf'):
        # Handle PDF with base64 encoding
        with open(filepath, 'rb') as f:
            content_b64 = base64.b64encode(f.read()).decode()
        
        memory_request = {
            "spaceId": os.environ['SPACE_ID'],
            "originalContentB64": content_b64,
            "contentType": "application/pdf",
            "metadata": {
                "filename": filename,
                "source": "sample_documents",
                "ingestion_method": "batch"
            },
            "chunkingConfig": chunking_config
        }
    else:
        # Handle text file
        with open(filepath, 'r') as f:
            content = f.read()
        
        memory_request = {
            "spaceId": os.environ['SPACE_ID'],
            "originalContent": content,
            "contentType": "text/plain",
            "metadata": {
                "filename": filename,
                "source": "sample_documents",
                "ingestion_method": "batch"
            },
            "chunkingConfig": chunking_config
        }
    
    memory_requests.append(memory_request)

# Save batch payload
batch_payload = {"requests": memory_requests}
with open('/tmp/batch_memories.json', 'w') as f:
    json.dump(batch_payload, f, indent=2)

print(f"✅ Prepared {len(memory_requests)} memory requests for batch creation")

✅ Prepared 4 memory requests for batch creation


In [24]:
%%bash
# Execute batch memory creation
echo "📦 Creating remaining memories in batch..."
echo ""

curl -s -X POST \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/batch_memories.json \
  $GOODMEM_HOST/v1/memories:batchCreate > /tmp/batch_response.json

# Display the response
cat /tmp/batch_response.json | jq .

echo ""
echo "✅ Batch memory creation completed"

📦 Creating remaining memories in batch...

{
  "results": [
    {
      "success": true,
      "memory": {
        "memoryId": "019b2d8d-6766-7126-89e5-57640b2741da",
        "spaceId": "019b2d89-d0f6-75ec-bbfc-00a432208952",
        "originalContentRef": "",
        "contentType": "application/pdf",
        "processingStatus": "PENDING",
        "metadata": {
          "filename": "employee_handbook.pdf",
          "source": "sample_documents",
          "ingestion_method": "batch"
        },
        "createdAt": 1765995800421,
        "updatedAt": 1765995800421,
        "createdById": "cf5df949-31c6-4c54-af50-f8002107164e",
        "updatedById": "cf5df949-31c6-4c54-af50-f8002107164e",
        "chunkingConfig": {
          "none": null,
          "recursive": {
            "chunkSize": 256,
            "chunkOverlap": 25,
            "separators": [
              "\n\n",
              "\n",
              ". ",
              " ",
              ""
            ],
            "keepStr

In [26]:
# Extract batch memory IDs and combine with single ID
import json

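# Read batch response: one result per request, each with a success flag and the created memory
with open('/tmp/batch_response.json', 'r') as f:
    batch_result = json.load(f)
# Keep only the IDs of memories that were created successfully
batch_memory_ids = [m['memory']['memoryId'] for m in batch_result['results'] if m.get('success')]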

# Combine with single memory ID
MEMORY_IDS.extend(batch_memory_ids)

# Save to file for bash access
with open('/tmp/memory_ids.json', 'w') as f:
    json.dump({"memoryIds": MEMORY_IDS}, f)

print(f"✅ Extracted {len(batch_memory_ids)} batch memory IDs")
print(f"   Total memory IDs tracked: {len(MEMORY_IDS)}")
print(f"   Saved to /tmp/memory_ids.json for batch status checking")

✅ Extracted 4 batch memory IDs
   Total memory IDs tracked: 5
   Saved to /tmp/memory_ids.json for batch status checking


In [31]:
%%bash
# Check status of our tracked memories using batchGet
echo "📚 Checking status of tracked memories:"
echo ""

curl -s -X POST \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/memory_ids.json \
  $GOODMEM_HOST/v1/memories:batchGet \
  | jq '.results[] | {memoryId: .memory.memoryId, filename: .memory.metadata.filename, status: .memory.processingStatus}'

📚 Checking status of tracked memories:

{
  "memoryId": "019b2d8a-26c5-70cf-84ff-51c39833bf09",
  "filename": "company_handbook.txt",
  "status": "COMPLETED"
}
{
  "memoryId": "019b2d8d-6766-7126-89e5-57640b2741da",
  "filename": "employee_handbook.pdf",
  "status": "COMPLETED"
}
{
  "memoryId": "019b2d8d-6766-7126-89e5-57640b2741db",
  "filename": "product_faq.txt",
  "status": "COMPLETED"
}
{
  "memoryId": "019b2d8d-6766-7126-89e5-57640b2741dc",
  "filename": "security_policy.txt",
  "status": "COMPLETED"
}
{
  "memoryId": "019b2d8d-6766-7126-89e5-57640b2741dd",
  "filename": "technical_documentation.txt",
  "status": "COMPLETED"
}


In [32]:
%%bash
# Wait for all memories to complete processing
echo "⏳ Waiting for document processing to complete..."
echo "   💡 Using batchGet to check status of tracked memory IDs"
echo ""

MAX_WAIT=120
ELAPSED=0

while [ $ELAPSED -lt $MAX_WAIT ]; do
  # Call batchGet to check status
  RESPONSE=$(curl -s -X POST \
    -H "x-api-key: $GOODMEM_API_KEY" \
    -H "Content-Type: application/json" \
    -d @/tmp/memory_ids.json \
    $GOODMEM_HOST/v1/memories:batchGet)
  
  # Count total, completed, and failed memories using jq
  TOTAL=$(echo "$RESPONSE" | jq '.results | length')
  COMPLETED=$(echo "$RESPONSE" | jq '[.results[].memory.processingStatus] | map(select(. == "COMPLETED")) | length')
  FAILED=$(echo "$RESPONSE" | jq '[.results[].memory.processingStatus] | map(select(. == "FAILED")) | length')
  
  echo "📊 Processing status: COMPLETED: $COMPLETED/$TOTAL"
  
  # Check if all completed
  if [ "$COMPLETED" -eq "$TOTAL" ]; then
    echo "✅ All documents processed successfully!"
    echo "🎉 Ready for semantic search and retrieval!"
    break
  fi
  
  # Check if any failed
  if [ "$FAILED" -gt 0 ]; then
    echo "❌ $FAILED memories failed processing"
    # Show which ones failed
    echo "$RESPONSE" | jq -r '.results[] | select(.memory.processingStatus == "FAILED") | "   Failed: \(.memory.metadata.filename // .memory.memoryId)"'
    break
  fi
  
  # Wait 5 seconds before next check
  sleep 5
  ELAPSED=$((ELAPSED + 5))
done

# Check for timeout
if [ $ELAPSED -ge $MAX_WAIT ]; then
  echo "⏰ Timeout waiting for processing (waited ${MAX_WAIT}s)"
fi


⏳ Waiting for document processing to complete...
   💡 Using batchGet to check status of tracked memory IDs

📊 Processing status: COMPLETED: 5/5
✅ All documents processed successfully!
🎉 Ready for semantic search and retrieval!


## Semantic Search & Retrieval

### Why Semantic Search?

**Traditional keyword search**:
- Matches exact words or simple variations
- Misses conceptually similar content with different wording
- Example: "vacation days" won't match "time off policy"

**Semantic search**:
- Understands meaning and context
- Finds conceptually similar content regardless of exact wording
- Example: "vacation days" successfully matches "time off policy"

### How It Works

```
Query: "vacation policy"
   ↓ (embed with same embedder)
Query Vector: [0.23, -0.45, ...]
   ↓ (compare to all chunk vectors)
Most Similar Chunks: (by negative cosine similarity; lower is better)
   1. "TIME OFF POLICY..." (score: -0.604)
   2. "Vacation requests..." (score: -0.544)
   3. "WORK HOURS..." (score: -0.458)
```

### Understanding Relevance Scores

GoodMem reports relevance as **negative cosine similarity**, a distance-like score:
- **Lower values = more relevant** (e.g., -0.6 is better than -0.4)
- **Range**: Typically -1.0 (most similar) to 0.0 (unrelated)
- **Good threshold**: Results below -0.3 are usually relevant
- **Context matters**: Exact scores vary by embedder and content
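
Because lower scores mean higher relevance, filtering and ranking run in the opposite direction from a typical similarity score. The sketch below illustrates this with the example scores from this section plus one made-up unrelated chunk; the `-0.3` threshold is the rule of thumb above, not a fixed GoodMem setting, and `rank_by_relevance` is a hypothetical helper.

```python
def rank_by_relevance(results, threshold=-0.3):
    """Keep results scoring below threshold, most relevant (lowest score) first."""
    relevant = [r for r in results if r["score"] < threshold]
    return sorted(relevant, key=lambda r: r["score"])

results = [
    {"text": "WORK HOURS...", "score": -0.458},
    {"text": "TIME OFF POLICY...", "score": -0.604},
    {"text": "Unrelated chunk", "score": -0.120},  # hypothetical low-relevance hit
    {"text": "Vacation requests...", "score": -0.544},
]

ranked = rank_by_relevance(results)
for item in ranked:
    print(f"{item['score']:.3f}  {item['text']}")
```

The unrelated chunk is dropped by the threshold, and the remaining three come back in ascending score order.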

### Streaming API Benefits

GoodMem's streaming API:
- **Real-time results**: Process chunks as they arrive
- **Low latency**: Start showing results immediately
- **Memory efficient**: No need to buffer entire result set
- **Progressive UI**: Update interface as more results come in

### What We'll Do

1. Implement a semantic search function using GoodMem's streaming API
2. Process different event types (chunks, memories, metadata)
3. Display results with relevance scores
4. Test with various queries to see semantic matching in action

Now comes the exciting part! Let's perform semantic search using GoodMem's streaming API. This will:

- **Find relevant chunks** based on semantic similarity
- **Stream results** in real-time
- **Include relevance scores** for ranking
- **Return structured data** for easy processing
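
The stream is newline-delimited JSON, so each line can be parsed on its own. The sketch below groups events by type; the event names (`resultSetBoundary`, `memoryDefinition`, `retrievedItem`) follow the output shown in this tutorial, while the `relevanceScore` field name and the sample values are assumptions for illustration and should be checked against a real response.

```python
import json

def parse_retrieve_stream(lines):
    """Split NDJSON retrieve events into memory definitions and retrieved chunks."""
    memories = {}
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if "memoryDefinition" in event:
            memory = event["memoryDefinition"]
            memories[memory["memoryId"]] = memory
        elif "retrievedItem" in event:
            item = event["retrievedItem"].get("chunk", {})
            chunk = item.get("chunk", {})
            chunks.append({
                "memoryId": chunk.get("memoryId"),
                "text": chunk.get("chunkText"),
                # Assumption: the score sits next to the chunk under this name.
                "score": item.get("relevanceScore"),
            })
        # resultSetBoundary events only mark the start or end of a stage.
    return memories, chunks

# Hypothetical, abbreviated events built to resemble the stream format.
sample_stream = [
    json.dumps({"resultSetBoundary": {"kind": "BEGIN", "stageName": "retrieve"}}),
    json.dumps({"memoryDefinition": {"memoryId": "mem-1",
                                     "metadata": {"filename": "employee_handbook.pdf"}}}),
    json.dumps({"retrievedItem": {"chunk": {
        "chunk": {"memoryId": "mem-1", "chunkText": "TIME OFF POLICY..."},
        "relevanceScore": -0.604}}}),
]

memories, chunks = parse_retrieve_stream(sample_stream)
for c in chunks:
    filename = memories[c["memoryId"]]["metadata"]["filename"]
    print(f"{c['score']}: {c['text']} ({filename})")
```

Keeping memory definitions in a dictionary keyed by `memoryId` makes it easy to attribute each chunk to its source document.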

In [37]:
%%bash
# Perform semantic search using streaming retrieve API
echo "🔍 Searching for: 'What is the vacation policy for employees?'"
echo "📁 Space ID: $SPACE_ID"
echo "📊 Max results: 5"
echo "--------------------------------------------------"

curl -s -X POST --no-buffer \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -H "Accept: application/x-ndjson" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the vacation policy for employees?",
    "spaceKeys": [{"spaceId": "'"$SPACE_ID"'"}],
    "requestedSize": 5,
    "fetchMemory": true,
    "fetchMemoryContent": false
  }' \
  $GOODMEM_HOST/v1/memories:retrieve

echo ""
echo "✅ Search completed"

🔍 Searching for: 'What is the vacation policy for employees?'
📁 Space ID: 019b2d89-d0f6-75ec-bbfc-00a432208952
📊 Max results: 5
--------------------------------------------------
{"resultSetBoundary":{"resultSetId":"019b2d9a-9bfd-7336-9235-4cefa8d63e5f","kind":"BEGIN","stageName":"retrieve","expectedItems":5}}
{"memoryDefinition":{"memoryId":"019b2d8d-6766-7126-89e5-57640b2741da","spaceId":"019b2d89-d0f6-75ec-bbfc-00a432208952","originalContentRef":"","contentType":"application/pdf","processingStatus":"COMPLETED","metadata":{"source":"sample_documents","filename":"employee_handbook.pdf","ingestion_method":"batch"},"createdAt":1765995800421,"updatedAt":1765995804037,"createdById":"cf5df949-31c6-4c54-af50-f8002107164e","updatedById":"cf5df949-31c6-4c54-af50-f8002107164e"}}
{"retrievedItem":{"chunk":{"resultSetId":"019b2d9a-9bfd-7336-9235-4cefa8d63e5f","chunk":{"chunkId":"019b2d8d-6bd9-716b-b51d-d40dd3b89b9e","memoryId":"019b2d8d-6766-7126-89e5-57640b2741da","chunkSequenceNumber"

In [38]:
%%bash
# Test semantic search with multiple queries
QUERIES=(
  "How do I reset my password?"
  "What are the security requirements for remote work?"
  "API authentication and rate limits"
  "Employee benefits and health insurance"
  "How much does the software cost?"
)

for i in "${!QUERIES[@]}"; do
  query="${QUERIES[$i]}"
  echo ""
  echo "🔍 Test Query $((i+1)): $query"
  echo "============================================================"
  
  curl -s -X POST --no-buffer \
    -H "x-api-key: $GOODMEM_API_KEY" \
    -H "Accept: application/x-ndjson" \
    -H "Content-Type: application/json" \
    -d '{
      "message": "'"$query"'",
      "spaceKeys": [{"spaceId": "'"$SPACE_ID"'"}],
      "requestedSize": 3,
      "fetchMemory": true
    }' \
    $GOODMEM_HOST/v1/memories:retrieve | head -20
  
  echo ""
  echo "------------------------------------------------------------"
done


🔍 Test Query 1: How do I reset my password?
{"resultSetBoundary":{"resultSetId":"019b2d9b-30ac-7379-a2e8-37fb9d35dd6f","kind":"BEGIN","stageName":"retrieve","expectedItems":3}}
{"memoryDefinition":{"memoryId":"019b2d8d-6766-7126-89e5-57640b2741da","spaceId":"019b2d89-d0f6-75ec-bbfc-00a432208952","originalContentRef":"","contentType":"application/pdf","processingStatus":"COMPLETED","metadata":{"source":"sample_documents","filename":"employee_handbook.pdf","ingestion_method":"batch"},"createdAt":1765995800421,"updatedAt":1765995804037,"createdById":"cf5df949-31c6-4c54-af50-f8002107164e","updatedById":"cf5df949-31c6-4c54-af50-f8002107164e"}}
{"retrievedItem":{"chunk":{"resultSetId":"019b2d9b-30ac-7379-a2e8-37fb9d35dd6f","chunk":{"chunkId":"019b2d8d-6bd9-716b-b51d-d40dd3b89c9f","memoryId":"019b2d8d-6766-7126-89e5-57640b2741da","chunkSequenceNumber":2327,"chunkText":"password they use to gain access to computers or the Internet, as well as any change to \nsuch password.  Such notice mus

## Advanced Features

Congratulations! 🎉 You've successfully built a semantic search system using GoodMem. Here's what you've accomplished:

### ✅ What You Built
- **Document ingestion pipeline** with automatic chunking and embedding
- **Semantic search system** with relevance scoring
- **Simple Q&A system** using GoodMem's vector capabilities

### 🚀 Next Steps for Advanced Implementation

#### Reranking
Improve search quality by adding a reranking stage. **Rerankers** are specialized models that re-score search results to improve relevance:

- **Two-stage retrieval**: Fast initial retrieval with embeddings, then precise reranking
- **Better relevance**: Rerankers use cross-attention to understand query-document relationships
- **Reduced costs**: Rerank only top-K results instead of entire corpus
- **Voyage AI reranker**: Industry-leading reranking model with state-of-the-art performance

The combination of fast embedding-based retrieval followed by accurate reranking provides the best balance of speed and quality for production RAG systems.

## Configuring a Reranker

To further improve search quality, we can add a **reranker** to our RAG pipeline. While embedders provide fast semantic search, rerankers use more sophisticated models to re-score the top results for better accuracy.

### Why Use Reranking?

1. **Higher Accuracy**: Rerankers use cross-encoder architectures that directly compare queries and documents
2. **Two-Stage Pipeline**: Fast retrieval with embeddings + precise reranking = optimal performance
3. **Cost Effective**: Only rerank top-K results (e.g., top 20) rather than entire corpus
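
The two-stage idea can be sketched without any external service. Both scoring functions below are toy stand-ins chosen for illustration: shared-word count plays the role of the fast embedding search, and Jaccard similarity plays the role of the more precise reranker. Neither reflects how GoodMem or Voyage AI actually score results.

```python
def words(text):
    """Return the set of lowercase whitespace-separated words in text."""
    return set(text.lower().split())

def first_stage_score(query, doc):
    """Stage 1 stand-in: cheap score, the number of shared words."""
    return len(words(query) & words(doc))

def rerank_score(query, doc):
    """Stage 2 stand-in: Jaccard similarity, which penalizes off-topic padding."""
    q, d = words(query), words(doc)
    return len(q & d) / len(q | d)

def two_stage_search(query, docs, top_k=20, final_k=3):
    """Shortlist top_k docs with the cheap score, then rerank only that shortlist."""
    shortlist = sorted(docs, key=lambda d: first_stage_score(query, d), reverse=True)[:top_k]
    return sorted(shortlist, key=lambda d: rerank_score(query, d), reverse=True)[:final_k]

docs = [
    "vacation policy and many other unrelated company topics",
    "vacation policy",
    "remote work policy",
    "office parking rules",
]
results = two_stage_search("vacation policy", docs, top_k=3, final_k=2)
print(results)
```

The first stage cannot tell the two vacation documents apart, while the second stage promotes the focused one. The expensive scorer only ever sees the `top_k` shortlist, which is where the cost saving comes from.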

### Voyage AI Reranker

We'll use Voyage AI's `rerank-2.5` model, which provides:
- **State-of-the-art performance** on reranking benchmarks
- **Fast inference** optimized for production use
- **Simple API** that integrates seamlessly with GoodMem

**Note**: You'll need a Voyage AI API key set in the `VOYAGE_API_KEY` environment variable.

In [39]:
%%bash
# Create Voyage AI reranker
echo "🔧 Creating Voyage AI rerank-2.5 reranker..."
echo ""

curl -s -X POST \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "Voyage Rerank 2.5",
    "providerType": "VOYAGE",
    "endpointUrl": "https://api.voyageai.com",
    "modelIdentifier": "rerank-2.5",
    "apiPath": "/v1/rerank",
    "supportedModalities": ["TEXT"],
    "credentials": {
      "kind": "CREDENTIAL_KIND_API_KEY",
      "apiKey": {
        "inlineSecret": "'"$VOYAGE_API_KEY"'",
        "headerName": "Authorization",
        "prefix": "Bearer "
      }
    },
    "description": "Voyage AI reranker for improving search result relevance"
  }' \
  "$GOODMEM_HOST/v1/rerankers" > /tmp/reranker_output.txt

jq . /tmp/reranker_output.txt

🔧 Creating Voyage AI rerank-2.5 reranker...

{
  "rerankerId": "019b2d9b-9e6d-70a1-bced-49509193f4c3",
  "displayName": "Voyage Rerank 2.5",
  "description": "Voyage AI reranker for improving search result relevance",
  "providerType": "VOYAGE",
  "endpointUrl": "https://api.voyageai.com",
  "apiPath": "/v1/rerank",
  "modelIdentifier": "rerank-2.5",
  "supportedModalities": [
    "TEXT"
  ],
  "labels": {},
  "version": null,
  "monitoringEndpoint": null,
  "ownerId": "cf5df949-31c6-4c54-af50-f8002107164e",
  "createdAt": 1765996732013,
  "updatedAt": 1765996732013,
  "createdById": "cf5df949-31c6-4c54-af50-f8002107164e",
  "updatedById": "cf5df949-31c6-4c54-af50-f8002107164e"
}


In [40]:
# Extract and store reranker ID
reranker_id_list = !jq -r '.rerankerId' /tmp/reranker_output.txt
reranker_id = reranker_id_list[0] if reranker_id_list else ""

%env RERANKER_ID={reranker_id}

env: RERANKER_ID=019b2d9b-9e6d-70a1-bced-49509193f4c3
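
The curl call above splices `$VOYAGE_API_KEY` into the JSON body with shell quoting, which produces invalid JSON if the key ever contains a double quote or backslash. As a hedged alternative, the same body can be built in Python and serialized with `json.dumps`, which handles the escaping. The field names below are copied from the request above; the helper name `build_reranker_payload` and the `"example-key"` fallback are assumptions made for illustration.

```python
import json
import os

def build_reranker_payload(voyage_api_key: str) -> str:
    """Serialize the reranker registration body shown in the curl call."""
    payload = {
        "displayName": "Voyage Rerank 2.5",
        "providerType": "VOYAGE",
        "endpointUrl": "https://api.voyageai.com",
        "modelIdentifier": "rerank-2.5",
        "apiPath": "/v1/rerank",
        "supportedModalities": ["TEXT"],
        "credentials": {
            "kind": "CREDENTIAL_KIND_API_KEY",
            "apiKey": {
                "inlineSecret": voyage_api_key,  # escaped by json.dumps
                "headerName": "Authorization",
                "prefix": "Bearer ",
            },
        },
        "description": "Voyage AI reranker for improving search result relevance",
    }
    return json.dumps(payload)

# "example-key" is a placeholder used only when the variable is unset.
body = build_reranker_payload(os.environ.get("VOYAGE_API_KEY", "example-key"))
```

One way to use the result is to write `body` to a temporary file and pass it to curl with `-d @file`, keeping the rest of the command unchanged.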


## Registering an LLM

The final component in our RAG pipeline is the **LLM (Large Language Model)**, which generates natural language responses from the retrieved and reranked context.

### Role of LLMs in RAG

After retrieving and reranking relevant chunks, the LLM:
1. **Receives the query** and retrieved context
2. **Generates a response** that synthesizes information from multiple sources
3. **Maintains coherence** while staying grounded in the retrieved facts
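
To make the "query plus retrieved context" step concrete, here is a minimal sketch of how a grounded prompt could be assembled. It is illustrative only: GoodMem's post-processor builds its own prompt internally, and the `build_prompt` helper, the instruction wording, and the `max_chars` budget are assumptions rather than anything taken from GoodMem.

```python
def build_prompt(question, chunks, max_chars=2000):
    """Assemble a prompt: numbered context chunks followed by the question."""
    context_lines = []
    used = 0
    for i, text in enumerate(chunks, start=1):
        entry = f"[{i}] {text.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the context budget
        context_lines.append(entry)
        used += len(entry)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical chunk texts, standing in for reranked retrieval results.
prompt = build_prompt(
    "What is the vacation policy for employees?",
    ["Employees accrue vacation monthly.", "Requests need manager approval."],
)
print(prompt)
```

Numbering the chunks lets the model cite its sources, and the explicit "say so" instruction nudges it to admit gaps instead of guessing.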

### OpenAI GPT-4o-mini

We'll use OpenAI's `gpt-4o-mini` model, which provides:
- **Fast inference** with low latency for real-time applications
- **Cost-effective** pricing compared to larger models
- **High quality** responses suitable for most RAG use cases
- **Function calling** support for advanced workflows

**Note**: This uses the same `OPENAI_API_KEY` environment variable as the embedder.

In [41]:
%%bash
# Register OpenAI GPT-4o-mini LLM
echo "🔧 Registering OpenAI GPT-4o-mini LLM..."
echo ""

curl -s -X POST \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "OpenAI GPT-4o Mini",
    "providerType": "OPENAI",
    "endpointUrl": "https://api.openai.com/v1",
    "modelIdentifier": "gpt-4o-mini",
    "apiPath": "/chat/completions",
    "supportedModalities": ["TEXT"],
    "credentials": {
      "kind": "CREDENTIAL_KIND_API_KEY",
      "apiKey": {
        "inlineSecret": "'"$OPENAI_API_KEY"'",
        "headerName": "Authorization",
        "prefix": "Bearer "
      }
    },
    "capabilities": {
      "supportsChat": true,
      "supportsCompletion": false,
      "supportsFunctionCalling": true,
      "supportsSystemMessages": true,
      "supportsStreaming": true,
      "supportsSamplingParameters": true
    },
    "description": "OpenAI GPT-4o Mini model for fast and efficient text generation"
  }' \
  "$GOODMEM_HOST/v1/llms" > /tmp/llm_output.txt

jq . /tmp/llm_output.txt

🔧 Registering OpenAI GPT-4o-mini LLM...

{
  "llm": {
    "llmId": "019b2d9b-dedb-744c-b0e4-194b33b0266a",
    "displayName": "OpenAI GPT-4o Mini",
    "description": "OpenAI GPT-4o Mini model for fast and efficient text generation",
    "providerType": "OPENAI",
    "endpointUrl": "https://api.openai.com/v1",
    "apiPath": "/chat/completions",
    "modelIdentifier": "gpt-4o-mini",
    "supportedModalities": [
      "TEXT"
    ],
    "labels": {},
    "version": null,
    "monitoringEndpoint": null,
    "capabilities": {
      "supportsChat": true,
      "supportsCompletion": false,
      "supportsFunctionCalling": true,
      "supportsSystemMessages": true,
      "supportsStreaming": true,
      "supportsSamplingParameters": true
    },
    "defaultSamplingParams": null,
    "maxContextLength": null,
    "clientConfig": null,
    "ownerId": "cf5df949-31c6-4c54-af50-f8002107164e",
    "createdAt": 1765996748507,
    "updatedAt": 1765996748507,
    "createdById": "cf5df949-31c6-4c54

In [42]:
# Extract and store LLM ID
llm_id_list = !jq -r '.llm.llmId' /tmp/llm_output.txt
llm_id = llm_id_list[0] if llm_id_list else ""

%env LLM_ID={llm_id}

env: LLM_ID=019b2d9b-dedb-744c-b0e4-194b33b0266a
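
Note that the two responses have different shapes: the reranker ID sits at the top level (`.rerankerId`), while the LLM ID is nested (`.llm.llmId`). Also, `jq -r` prints the literal string `null` when a key is missing, so a failed request would silently set the environment variable to `null`. The sketch below is one possible stricter alternative in plain Python; `extract_id` is a hypothetical helper, not part of GoodMem.

```python
import json

def extract_id(response_text, *path):
    """Follow `path` through the parsed JSON and return a non-empty string ID."""
    try:
        node = json.loads(response_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"missing key {key!r} in response (path={path})")
        node = node[key]
    if not isinstance(node, str) or not node:
        raise ValueError(f"expected a non-empty string ID at path={path}")
    return node

# Abbreviated copies of the two responses shown above.
reranker_response = '{"rerankerId": "019b2d9b-9e6d-70a1-bced-49509193f4c3"}'
llm_response = '{"llm": {"llmId": "019b2d9b-dedb-744c-b0e4-194b33b0266a"}}'

print(extract_id(reranker_response, "rerankerId"))
print(extract_id(llm_response, "llm", "llmId"))
```

In the notebook, this could replace the `jq` line with something like `llm_id = extract_id(open("/tmp/llm_output.txt").read(), "llm", "llmId")`, so that an error response stops the cell instead of propagating a bad ID.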


## Enhanced RAG with Reranking and LLM Generation

Now that we have all the components configured (embedder, reranker, and LLM), let's use the complete RAG pipeline! This demonstrates the full power of GoodMem:

1. **Retrieval**: Fast semantic search finds relevant chunks
2. **Reranking**: Voyage AI reranker re-scores results for better relevance  
3. **Generation**: OpenAI GPT-4o-mini generates a coherent response using the reranked context

Compared with retrieval alone, this typically yields more relevant context and a direct natural language answer instead of a list of raw chunks.

In [46]:
%%bash
# Execute complete RAG pipeline with reranker and LLM
echo "Testing Complete RAG Pipeline with Reranker + LLM"
echo ""
echo "🔍 RAG Query: 'What is the vacation policy for employees?'"
echo "📁 Space ID: $SPACE_ID"
echo "📊 Max results: 3"
echo "======================================================================"

curl -s -X POST --no-buffer \
  -H "x-api-key: $GOODMEM_API_KEY" \
  -H "Accept: application/x-ndjson" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the vacation policy for employees?",
    "spaceKeys": [{"spaceId": "'"$SPACE_ID"'"}],
    "requestedSize": 3,
    "fetchMemory": true,
    "fetchMemoryContent": false,
    "postProcessor": {
      "name": "com.goodmem.retrieval.postprocess.ChatPostProcessorFactory",
      "config": {
        "llm_id": "'"$LLM_ID"'",
        "reranker_id": "'"$RERANKER_ID"'",
        "relevance_threshold": 0.3,
        "max_results": 3
      }
    }
  }' \
  $GOODMEM_HOST/v1/memories:retrieve

echo ""
echo "✅ RAG pipeline completed"

Testing Complete RAG Pipeline with Reranker + LLM

🔍 RAG Query: 'What is the vacation policy for employees?'
📁 Space ID: 019b2d89-d0f6-75ec-bbfc-00a432208952
📊 Max results: 3
{"resultSetBoundary":{"resultSetId":"019b2d9d-5136-7627-b2bb-e4009e3d856f","kind":"BEGIN","stageName":"rerank","expectedItems":3}}
{"memoryDefinition":{"memoryId":"019b2d8a-964a-7214-8865-58c92cc822b3","spaceId":"019b2d89-d0f6-75ec-bbfc-00a432208952","originalContentRef":"","contentType":"application/pdf","processingStatus":"COMPLETED","metadata":{"source":"sample_documents","filename":"employee_handbook.pdf","ingestion_method":"batch"},"createdAt":1765995615817,"updatedAt":1765995621094,"createdById":"cf5df949-31c6-4c54-af50-f8002107164e","updatedById":"cf5df949-31c6-4c54-af50-f8002107164e"}}
{"retrievedItem":{"chunk":{"resultSetId":"019b2d9d-5136-7627-b2bb-e4009e3d856f","chunk":{"chunkId":"019b2d8a-a031-73f1-84ec-558bd64c3c3a","memoryId":"019b2d8a-964a-7214-8865-58c92cc822b3","chunkSequenceNumber":1435,
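
The response is a stream of newline-delimited JSON (NDJSON) events, each with a single top-level key. The sketch below groups the three event types visible in the output above (`resultSetBoundary`, `memoryDefinition`, `retrievedItem`). It is a minimal example, not a GoodMem client: `summarize_stream` is a hypothetical helper, the sample lines are abbreviated copies of the output, and any event type not shown above (such as the generated answer) is simply collected under `"other"` because its shape is not visible here.

```python
import json

def summarize_stream(lines):
    """Group NDJSON events by their top-level key."""
    summary = {"boundaries": [], "memories": {}, "chunks": [], "other": []}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as the echo banners
        if not isinstance(event, dict):
            continue
        if "resultSetBoundary" in event:
            b = event["resultSetBoundary"]
            summary["boundaries"].append((b.get("kind"), b.get("stageName")))
        elif "memoryDefinition" in event:
            m = event["memoryDefinition"]
            filename = m.get("metadata", {}).get("filename")
            summary["memories"][m.get("memoryId")] = filename
        elif "retrievedItem" in event:
            chunk = event["retrievedItem"].get("chunk", {}).get("chunk", {})
            summary["chunks"].append(
                (chunk.get("memoryId"), chunk.get("chunkSequenceNumber"))
            )
        else:
            summary["other"].append(event)  # event types not shown above
    return summary

# Abbreviated sample lines based on the output above.
sample = [
    "Testing Complete RAG Pipeline with Reranker + LLM",
    '{"resultSetBoundary":{"resultSetId":"019b2d9d-5136-7627-b2bb-e4009e3d856f","kind":"BEGIN","stageName":"rerank","expectedItems":3}}',
    '{"memoryDefinition":{"memoryId":"019b2d8a-964a-7214-8865-58c92cc822b3","metadata":{"filename":"employee_handbook.pdf"}}}',
    '{"retrievedItem":{"chunk":{"resultSetId":"019b2d9d-5136-7627-b2bb-e4009e3d856f","chunk":{"chunkId":"019b2d8a-a031-73f1-84ec-558bd64c3c3a","memoryId":"019b2d8a-964a-7214-8865-58c92cc822b3","chunkSequenceNumber":1435}}}}',
]

summary = summarize_stream(sample)
print(summary["boundaries"], summary["memories"], summary["chunks"])
```

Because each line is parsed independently, the same function works on a live stream: pass it any iterable that yields lines as they arrive, and join chunk records to their source documents through `memoryId`.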

## 🎉 Congratulations! What You Built

You've successfully built a complete **Retrieval-Augmented Generation (RAG) system** using GoodMem! Let's recap what you accomplished.

### Components You Configured

| Component | Purpose | Function |
|-----------|---------|----------|
| **Embedder** | Convert text to vectors | Transform documents into semantic embeddings |
| **Space** | Organize and store documents | Logical container with chunking configuration |
| **Memories** | Store searchable content | Documents chunked and indexed for retrieval |
| **Reranker** | Improve search precision | Re-score results for better relevance |
| **LLM** | Generate natural language | Create coherent answers from retrieved context |

### The Complete RAG Pipeline

```
📄 Documents
   ↓ Chunking (256 chars, 25 overlap)
   ↓ Embedding (convert to vectors)
🗄️  Vector Storage (GoodMem Space)
   ↓ 
🔍 User Query
   ↓ Semantic Search (retrieve top-K)
   ↓ Reranking (re-score for precision)
   ↓ Context Selection (most relevant chunks)
🤖 LLM Generation (synthesize answer)
   ↓
✨ Natural Language Answer
```
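
The chunking step in the diagram (256 characters with a 25-character overlap) can be illustrated with a simple sliding window. This is a simplified sketch under stated assumptions: GoodMem's chunker is configured on the space and may split on separators rather than at fixed offsets, and `chunk_text` is a hypothetical helper written only to show how size and overlap interact.

```python
def chunk_text(text, chunk_size=256, overlap=25):
    """Split text into fixed-size character windows sharing `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

# Synthetic 600-character text so the window boundaries are easy to check.
text = "".join(chr(97 + i % 26) for i in range(600))
chunks = chunk_text(text)
print([len(c) for c in chunks])
```

Each window starts 231 characters after the previous one, so the last 25 characters of one chunk repeat at the start of the next. That overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.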

### Key Concepts You Learned

1. **Embedders**: Transform text into semantic vectors for similarity search
2. **Spaces**: Logical containers for organizing and searching documents
3. **Chunking**: Breaking documents into optimal sizes for retrieval
4. **Semantic Search**: Finding conceptually similar content, not just keyword matches
5. **Reranking**: Two-stage retrieval for better precision
6. **Streaming API**: Real-time, memory-efficient result processing
7. **RAG Architecture**: Combining retrieval and generation for accurate, grounded responses

### Performance Improvements

**Basic search** (retrieval only):
- Fast retrieval using vector similarity
- Good recall, but may include less relevant results

**Enhanced RAG** (with reranker + LLM):
- Reranker re-scores the top-K candidates, which typically improves precision
- LLM synthesizes information from multiple chunks
- Better user experience with natural language answers
- Answers are grounded in retrieved document content, which reduces (but does not eliminate) hallucinations

### Next Steps & Advanced Topics

**Enhance Your RAG System**:
- **Multiple embedders**: Combine different embedders for better coverage
- **Custom chunking**: Tune chunk size/overlap for your content type
- **Metadata filtering**: Add filters to narrow search by document type, date, etc.
- **Hybrid search**: Combine semantic and keyword search
- **Context augmentation**: Include surrounding chunks for better LLM context

**Production Deployment**:
- **Monitoring**: Track query latency, relevance scores, user feedback
- **Scaling**: Horizontal scaling for high-traffic applications
- **Cost optimization**: Balance quality vs. API costs
- **Caching**: Cache frequent queries for faster responses
- **Error handling**: Robust exception handling and retry logic

**Advanced Features**:
- **Multi-space search**: Query across multiple knowledge bases
- **Query expansion**: Transform queries for better retrieval
- **Result aggregation**: Combine and deduplicate results
- **Streaming generation**: Progressive LLM responses for real-time UX
- **Fine-tuning**: Customize models for your specific domain

### Resources

- **Documentation**: [https://docs.goodmem.ai](https://docs.goodmem.ai)
- **Community**: Join discussions and share your implementations
- **Examples**: Explore more advanced use cases and patterns

---

**Great job!** You now have a solid foundation for building production RAG systems with GoodMem. 🚀
