# Building a Basic RAG Agent with GoodMem

## Overview

This tutorial will guide you through building a complete **Retrieval-Augmented Generation (RAG)** system using GoodMem's vector memory capabilities. By the end of this guide, you'll have a functional Q&A system that can:

- 🔍 **Semantically search** through your documents
- 📝 **Generate contextual answers** using retrieved information
- 🏗️ **Scale to handle** large document collections

### What is RAG?

RAG combines the power of **retrieval** (finding relevant information) with **generation** (creating natural language responses). This approach allows AI systems to provide accurate, context-aware answers by:

1. **Retrieving** relevant documents from a knowledge base
2. **Augmenting** the query with this context
3. **Generating** a comprehensive answer using both the query and retrieved information
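
The three steps can be sketched in plain Java without any external service. This is only a toy illustration: the class `ToyRag`, its word-overlap scoring, and the placeholder `generate` method are assumptions made for this example and are not part of the GoodMem client. A real system would replace the word overlap with embedding search and the placeholder with an LLM call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration of retrieve -> augment -> generate. Nothing here is GoodMem API.
class ToyRag {

    // Lowercased word set of a text; a crude stand-in for a real embedding.
    static Set<String> words(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
    }

    // Step 1: rank documents by how many query words they share and keep the top k.
    static List<String> retrieve(String query, List<String> docs, int k) {
        Set<String> queryWords = words(query);
        List<String> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingInt((String doc) -> {
            Set<String> overlap = words(doc);
            overlap.retainAll(queryWords);
            return overlap.size();
        }).reversed());
        return ranked.subList(0, Math.min(k, ranked.size()));
    }

    // Step 2: place the retrieved passages in front of the question.
    static String augment(String query, List<String> context) {
        return "Context:\n" + String.join("\n", context) + "\n\nQuestion: " + query;
    }

    // Step 3: placeholder for the LLM call, which would receive the augmented prompt.
    static String generate(String prompt) {
        return "[LLM answer based on a prompt of " + prompt.length() + " characters]";
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Employees receive 20 vacation days per year.",
            "The office is closed on public holidays.",
            "Passwords must be rotated every 90 days.");
        String query = "How many vacation days do employees get?";
        String prompt = augment(query, retrieve(query, docs, 1));
        System.out.println(prompt);
        System.out.println(generate(prompt));
    }
}
```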

### Why GoodMem for RAG?

GoodMem provides enterprise-grade vector storage with:
- **Multiple embedder support** for optimal retrieval accuracy
- **Streaming APIs** for real-time responses
- **Advanced post-processing** with reranking and summarization
- **Scalable architecture** for production workloads


## Prerequisites

Before starting, ensure you have:

- ✅ **GoodMem server running** (install with: `curl -s https://get.goodmem.ai | bash`)
- ✅ **Java 1.8+** installed (the notebook cells below use text blocks, which require Java 15 or later)
- ✅ **Maven 3.8.3+ or Gradle 7.2+** for dependency management
- ✅ **API key** for your GoodMem instance
- ✅ **OpenAI API key** (for the embedder and LLM; set the `OPENAI_API_KEY` environment variable)
- ✅ **Voyage AI API key** (for the reranker; set the `VOYAGE_API_KEY` environment variable)

**Note**: The OpenAI and Voyage AI API keys are only required if you want to use the advanced RAG features (embedder creation, reranking, and LLM generation) demonstrated in the later sections of this tutorial.

## Installation & Setup

First, let's install the required packages:

In [1]:
%%loadFromPOM
<dependency>
  <groupId>ai.pairsys.goodmem</groupId>
  <artifactId>goodmem-client-java</artifactId>
  <version>1.0.7</version>
</dependency>

<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.10.1</version>
</dependency>

## Authentication & Configuration

### Why This Matters

GoodMem uses API key authentication to secure your vector memory data. Proper configuration ensures:
- **Secure access** to your GoodMem instance
- **Isolated environments** (development, staging, production)
- **Usage tracking** and access control per API key

### What We'll Do

1. Configure the GoodMem host URL (where your server is running)
2. Set up API key authentication
3. Verify the configuration is correct

### Configuration Options

- **Local development**: `http://localhost:8080` (default)
- **Remote/production**: Your deployed GoodMem URL
- **Environment variables**: Best practice for managing credentials
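
One way to handle these options is to resolve each setting from the environment, fall back to a default where one makes sense, and fail early when a required credential is missing. The sketch below is a hedged illustration: `SettingsDemo`, `resolve`, and `require` are hypothetical helpers written for this example, not GoodMem functions. Only the variable names `GOODMEM_HOST` and `GOODMEM_API_KEY` and the default URL come from this tutorial.

```java
import java.util.Map;

// Hypothetical helpers for reading settings; not part of the GoodMem client.
class SettingsDemo {

    // Return the trimmed value for name, or fallback when it is unset or blank.
    static String resolve(Map<String, String> env, String name, String fallback) {
        String value = env.get(name);
        return (value == null || value.trim().isEmpty()) ? fallback : value.trim();
    }

    // Like resolve, but throw when the setting is missing so the problem surfaces early.
    static String require(Map<String, String> env, String name) {
        String value = resolve(env, name, null);
        if (value == null) {
            throw new IllegalStateException("Missing required setting: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println("Host: " + resolve(env, "GOODMEM_HOST", "http://localhost:8080"));
        // Report presence only; never print the key itself.
        boolean hasKey = resolve(env, "GOODMEM_API_KEY", null) != null;
        System.out.println("API key present: " + hasKey);
    }
}
```

Passing the environment in as a `Map` keeps the helpers easy to test with made-up values.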

Let's configure our GoodMem client and test the connection:

In [None]:
import ai.pairsys.goodmem.client.ApiClient;
import ai.pairsys.goodmem.client.Configuration;
import ai.pairsys.goodmem.client.auth.ApiKeyAuth;
import ai.pairsys.goodmem.client.ApiException;

// Import the API classes we'll use
import ai.pairsys.goodmem.client.api.SpacesApi;
import ai.pairsys.goodmem.client.api.MemoriesApi;
import ai.pairsys.goodmem.client.api.EmbeddersApi;
import ai.pairsys.goodmem.client.api.RerankersApi;
import ai.pairsys.goodmem.client.api.LlmsApi;

import ai.pairsys.goodmem.client.StreamingClient;
import ai.pairsys.goodmem.client.StreamingClient.*;

// Import model classes
import ai.pairsys.goodmem.client.model.*;

import java.util.*;
import java.util.stream.Stream;
import java.util.stream.Collectors;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.io.*;
import java.nio.file.*;

// Configuration - Update these values for your setup
String GOODMEM_HOST = System.getenv().getOrDefault("GOODMEM_HOST", "http://localhost:8080");
String GOODMEM_API_KEY = System.getenv().getOrDefault("GOODMEM_API_KEY", "");

System.out.println("GoodMem Host: " + GOODMEM_HOST);
System.out.println("API Key configured: " + (!GOODMEM_API_KEY.isEmpty() ? "Yes" : "No - Please set GOODMEM_API_KEY"));

// Create and configure API client
ApiClient defaultClient = Configuration.getDefaultApiClient();
defaultClient.setBasePath(GOODMEM_HOST);
defaultClient.addDefaultHeader("X-API-Key", GOODMEM_API_KEY);

// Set up authentication
ApiKeyAuth apiKeyAuth = (ApiKeyAuth) defaultClient.getAuthentication("ApiKeyAuth");
apiKeyAuth.setApiKey(GOODMEM_API_KEY);

// Create API instances
SpacesApi spacesApi = new SpacesApi(defaultClient);
MemoriesApi memoriesApi = new MemoriesApi(defaultClient);
EmbeddersApi embeddersApi = new EmbeddersApi(defaultClient);

System.out.println("✅ GoodMem client configured successfully!");

GoodMem Host: http://localhost:8080
API Key configured: Yes
✅ GoodMem client configured successfully!


In [3]:
// Test connection by listing existing spaces
try {
    ListSpacesResponse response = spacesApi.listSpaces(null, null, null, null, null, null, null);
    
    System.out.println("✅ Successfully connected to GoodMem!");
    List<Space> spaces = response.getSpaces();
    if (spaces != null) {
        System.out.println("   Found " + spaces.size() + " existing spaces");
    } else {
        System.out.println("   Found 0 existing spaces");
    }
    
} catch (ApiException e) {
    System.out.println("❌ Error connecting to GoodMem: " + e.getMessage());
    System.out.println("   Please check your API key and host configuration");
    System.out.println("   Response code: " + e.getCode());
} catch (Exception e) {
    System.out.println("❌ Unexpected error: " + e.getMessage());
    e.printStackTrace();
}

✅ Successfully connected to GoodMem!
   Found 0 existing spaces


## Creating an Embedder

### Why Embedders Matter

An **embedder** is the foundation of semantic search. It converts text into high-dimensional vectors (embeddings) that capture meaning:

```
Text: "vacation policy" → Vector: [0.23, -0.45, 0.67, ...]  (1536 dimensions)
```

These vectors enable:
- **Semantic similarity**: Find conceptually similar content, not just keyword matches
- **Context understanding**: Capture meaning beyond exact word matches
- **Efficient retrieval**: Fast vector comparisons using specialized indexes
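
To make "vector comparison" concrete, the sketch below computes cosine similarity, one common way to compare embeddings. It is an illustration under assumptions: the three-dimensional vectors are made up for readability (real embeddings here have 1536 dimensions), and this tutorial does not state which similarity metric GoodMem uses internally.

```java
// Cosine similarity between two vectors; values near 1.0 mean "pointing the same way".
class CosineDemo {

    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;  // a zero vector has no direction, so treat it as unrelated
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for real embeddings.
        double[] vacationPolicy = {0.23, -0.45, 0.67};
        double[] timeOffRules = {0.20, -0.40, 0.70};
        double[] serverConfig = {-0.60, 0.50, 0.10};
        System.out.println("vacation vs time off: " + cosine(vacationPolicy, timeOffRules));
        System.out.println("vacation vs server:   " + cosine(vacationPolicy, serverConfig));
    }
}
```

The first pair points in nearly the same direction and scores close to 1, while the second pair scores much lower, which is the signal a semantic search ranks on.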

### The RAG Pipeline Flow

```
Documents → Embedder → Vector Storage → Semantic Search → Retrieved Context
```

### Choosing an Embedder

**OpenAI `text-embedding-3-small`** (what we'll use):
- ✅ **High quality**: Excellent for most use cases
- ✅ **Fast**: Low latency for real-time applications
- ✅ **1536 dimensions**: Good balance of quality and storage
- ✅ **Cost-effective**: $0.02 per 1M tokens

**Other options**:
- **text-embedding-3-large**: Higher quality, 3072 dimensions, more expensive
- **Voyage AI**: Specialized for search, excellent retrieval performance
- **Cohere**: Good multilingual support
- **Local models**: HuggingFace sentence transformers for privacy/offline

### What We'll Do

1. Check if an embedder already exists
2. If not, create an OpenAI embedder with proper authentication
3. Verify the embedder is ready for use

**Note**: You'll need an OpenAI API key set in your environment variable `OPENAI_API_KEY`.

In [None]:
String openaiApiKey = System.getenv().getOrDefault("OPENAI_API_KEY", "");

if (openaiApiKey == null || openaiApiKey.isEmpty()) {
    System.out.println("❌ OPENAI_API_KEY environment variable not set!");
    System.out.println("   Please set your OpenAI API key:");
    System.out.println("   export OPENAI_API_KEY='your-api-key-here'");
} else {
    try {
        // Check if embedder already exists
        ListEmbeddersResponse embeddersResponse = embeddersApi.listEmbedders(null, null, null);
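        List<EmbedderResponse> existingEmbedders = embeddersResponse.getEmbedders();
        // Guard against a missing list so the loop below cannot throw a NullPointerException
        if (existingEmbedders == null) existingEmbedders = Collections.emptyList();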

        EmbedderResponse existingEmbedder = null;
        for (EmbedderResponse embedder : existingEmbedders) {
            if ("OPENAI".equals(embedder.getProviderType().toString()) &&
                "text-embedding-3-small".equals(embedder.getModelIdentifier())) {
                existingEmbedder = embedder;
                break;
            }
        }

        if (existingEmbedder != null) {
            System.out.println("✅ OpenAI embedder already exists!");
            System.out.println("   Display Name: " + existingEmbedder.getDisplayName());
            System.out.println("   Embedder ID: " + existingEmbedder.getEmbedderId());
            System.out.println("   Model: " + existingEmbedder.getModelIdentifier());
            System.out.println("   Dimensionality: " + existingEmbedder.getDimensionality());
        } else {
            System.out.println("🔧 Creating new OpenAI embedder...");

            // Create API key authentication
            ai.pairsys.goodmem.client.model.ApiKeyAuth apiKeyAuth = new ai.pairsys.goodmem.client.model.ApiKeyAuth()
                .inlineSecret(openaiApiKey)
                .headerName("Authorization")
                .prefix("Bearer ");

            EndpointAuthentication credentials = new EndpointAuthentication()
                .kind(CredentialKind.CREDENTIAL_KIND_API_KEY)
                .apiKey(apiKeyAuth);

            // Create embedder request
            EmbedderCreationRequest embedderRequest = new EmbedderCreationRequest()
                .displayName("OpenAI Text Embedding 3 Small")
                .providerType(ProviderType.OPENAI)
                .endpointUrl("https://api.openai.com/v1")
                .modelIdentifier("text-embedding-3-small")
                .dimensionality(1536)
                .apiPath("/embeddings")
                .distributionType(DistributionType.DENSE)
                .credentials(credentials);

            EmbedderResponse newEmbedder = embeddersApi.createEmbedder(embedderRequest);

            System.out.println("✅ Successfully created OpenAI embedder!");
            System.out.println("   Display Name: " + newEmbedder.getDisplayName());
            System.out.println("   Embedder ID: " + newEmbedder.getEmbedderId());
            System.out.println("   Provider: " + newEmbedder.getProviderType());
            System.out.println("   Model: " + newEmbedder.getModelIdentifier());
            System.out.println("   Dimensionality: " + newEmbedder.getDimensionality());
        }
    } catch (ApiException e) {
        System.out.println("❌ Error listing or creating embedder: " + e.getMessage());
    }
}

🔧 Creating new OpenAI embedder...
✅ Successfully created OpenAI embedder!
   Display Name: OpenAI Text Embedding 3 Small
   Embedder ID: d313d1ff-79aa-4014-b99b-358f2ef99972
   Provider: OPENAI
   Model: text-embedding-3-small
   Dimensionality: 1536


## Creating Your First Space

### What is a Space?

A **Space** in GoodMem is a logical container for organizing related memories (documents). Think of it as a database or collection where you store and retrieve semantically similar content.

Each space has:
- **Associated embedders**: Which models convert text to vectors
- **Chunking configuration**: How documents are split into searchable pieces
- **Access controls**: Public or private, with permission management
- **Metadata labels**: For organization and filtering

### Use Cases for Multiple Spaces

You might create different spaces for:
- **By domain**: Technical docs, HR policies, product specs
- **By environment**: Development, staging, production
- **By customer**: Tenant-specific data in multi-tenant apps
- **By privacy level**: Public FAQ vs. internal knowledge base

### Why Chunking Matters

Documents are too large to search efficiently as whole units. Chunking:
- **Improves relevance**: Match specific sections, not entire documents
- **Enables context**: Return focused chunks that answer specific questions  
- **Optimizes retrieval**: Process and compare smaller text segments

**Our chunking strategy**:
- **256 characters**: Short enough for focused context, long enough for meaning
- **25 character overlap**: Ensures concepts spanning chunk boundaries aren't lost
- **Hierarchical separators**: Split on paragraphs first, then sentences, then words
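
The effect of chunk size and overlap can be seen with a simplified sliding-window chunker. This sketch is an assumption-laden illustration, not GoodMem's recursive chunker: it cuts purely by character count and ignores the separator hierarchy, and the class name `OverlapChunker` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified fixed-size chunking with overlap; it ignores separators entirely.
class OverlapChunker {

    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;  // how far each new chunk advances
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break;  // the final chunk reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 600; i++) {
            sample.append((char) ('a' + i % 26));
        }
        List<String> chunks = chunk(sample.toString(), 256, 25);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("Chunk " + i + ": " + chunks.get(i).length() + " characters");
        }
    }
}
```

With a chunk size of 256 and an overlap of 25, each chunk starts 231 characters after the previous one, so the last 25 characters of one chunk reappear at the start of the next.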

### What We'll Do

1. List available embedders
2. Create a space with our embedder and chunking configuration
3. Add metadata labels for organization
4. Verify the space is ready

Let's create a space for our RAG demo:

In [5]:
// First, let's see what embedders are available
List<EmbedderResponse> availableEmbedders = new ArrayList<>();
EmbedderResponse defaultEmbedder = null;

try {
    ListEmbeddersResponse embeddersResponse = embeddersApi.listEmbedders(null, null, null);
    availableEmbedders = embeddersResponse.getEmbedders();
    

    System.out.println("📋 Available Embedders (" + availableEmbedders.size() + "):");
    for (int i = 0; i < availableEmbedders.size(); i++) {
        EmbedderResponse embedder = availableEmbedders.get(i);
        System.out.println("   " + (i + 1) + ". " + embedder.getDisplayName() + " - " + embedder.getProviderType());
        System.out.println("      Model: " + (embedder.getModelIdentifier() != null ? embedder.getModelIdentifier() : "N/A"));
        System.out.println("      ID: " + embedder.getEmbedderId());
        System.out.println();
    }
    
    if (!availableEmbedders.isEmpty()) {
        defaultEmbedder = availableEmbedders.get(0);
        System.out.println("🎯 Using embedder: " + defaultEmbedder.getDisplayName());
    } else {
        System.out.println("⚠️  No embedders found. You may need to configure an embedder first.");
        System.out.println("   See the documentation: https://docs.goodmem.ai/docs/reference/cli/goodmem_embedder_create/");
    }
    
} catch (ApiException e) {
    System.out.println("❌ Error listing embedders: " + e.getMessage());
    defaultEmbedder = null;
}

📋 Available Embedders (1):
   1. OpenAI Text Embedding 3 Small - OPENAI
      Model: text-embedding-3-small
      ID: d313d1ff-79aa-4014-b99b-358f2ef99972

🎯 Using embedder: OpenAI Text Embedding 3 Small


In [6]:
// Create a space for our RAG demo
String SPACE_NAME = "RAG Demo Knowledge Base (Java)";
Space demoSpace = null;

// Define chunking configuration that we'll reuse throughout the tutorial
// Using fromJson for easier construction
String chunkingConfigJson = """
{
  "recursive": {
    "chunkSize": 256,
    "chunkOverlap": 25,
    "separators": ["\\n\\n", "\\n", ". ", " ", ""],
    "keepStrategy": "KEEP_END",
    "separatorIsRegex": false,
    "lengthMeasurement": "CHARACTER_COUNT"
  }
}
""";

ChunkingConfiguration demoChunkingConfig = ChunkingConfiguration.fromJson(chunkingConfigJson);

System.out.println("📋 Demo Chunking Configuration:");
System.out.println("   Chunk Size: " + demoChunkingConfig.getRecursive().getChunkSize() + " characters");
System.out.println("   Overlap: " + demoChunkingConfig.getRecursive().getChunkOverlap() + " characters");
System.out.println("   Strategy: " + demoChunkingConfig.getRecursive().getKeepStrategy());
System.out.println("   💡 This chunking config will be reused for all memory creation!");
System.out.println();

try {
    // Check if space already exists
    ListSpacesResponse existingSpaces = spacesApi.listSpaces(null, null, null, null, null, null, null);
    
    if (existingSpaces.getSpaces() != null) {
        for (Space space : existingSpaces.getSpaces()) {
            if (SPACE_NAME.equals(space.getName())) {
                System.out.println("📁 Space '" + SPACE_NAME + "' already exists");
                System.out.println("   Space ID: " + space.getSpaceId());
                System.out.println("   To remove existing space, see https://docs.goodmem.ai/docs/reference/cli/goodmem_space_delete/");
                demoSpace = space;
                break;
            }
        }
    }
    
    // Create space if it doesn't exist
    if (demoSpace == null) {
        // Configure space embedders if we have available embedders
        List<SpaceEmbedderConfig> spaceEmbedders = new ArrayList<>();
        if (defaultEmbedder != null) {
            SpaceEmbedderConfig embedderConfig = new SpaceEmbedderConfig();
            embedderConfig.setEmbedderId(defaultEmbedder.getEmbedderId());
            embedderConfig.setDefaultRetrievalWeight(1.0);
            spaceEmbedders.add(embedderConfig);
        }
        
        // Create space request with our saved chunking configuration
        SpaceCreationRequest createRequest = new SpaceCreationRequest();
        createRequest.setName(SPACE_NAME);
        
        Map<String, String> labels = new HashMap<>();
        labels.put("purpose", "rag-demo");
        labels.put("environment", "tutorial");
        labels.put("content-type", "documentation");
        labels.put("language", "java");
        createRequest.setLabels(labels);
        
        createRequest.setSpaceEmbedders(spaceEmbedders);
        createRequest.setPublicRead(false);  // Private space
        createRequest.setDefaultChunkingConfig(demoChunkingConfig);  // Use our saved config
        
        // Create the space
        demoSpace = spacesApi.createSpace(createRequest);
        
        System.out.println("✅ Created space: " + demoSpace.getName());
        System.out.println("   Space ID: " + demoSpace.getSpaceId());
        System.out.println("   Embedders: " + (demoSpace.getSpaceEmbedders() != null ? demoSpace.getSpaceEmbedders().size() : 0));
        System.out.println("   Labels: " + demoSpace.getLabels());
        System.out.println("   Chunking Config Saved: " + demoChunkingConfig.getRecursive().getChunkSize() + " chars with " + demoChunkingConfig.getRecursive().getChunkOverlap() + " overlap");
    }
    
} catch (ApiException e) {
    System.out.println("❌ Error creating space: " + e.getMessage());
    System.out.println("   Response code: " + e.getCode());
    demoSpace = null;
} catch (Exception e) {
    System.out.println("❌ Unexpected error while creating space: " + e.getMessage());
    demoSpace = null;
}

📋 Demo Chunking Configuration:
   Chunk Size: 256 characters
   Overlap: 25 characters
   Strategy: KEEP_END
   💡 This chunking config will be reused for all memory creation!

✅ Created space: RAG Demo Knowledge Base (Java)
   Space ID: 845d400b-3604-4efc-9b4e-c190d0241561
   Embedders: 1
   Labels: {environment=tutorial, purpose=rag-demo, content-type=documentation, language=java}
   Chunking Config Saved: 256 chars with 25 overlap


In [7]:
// Verify our space configuration
if (demoSpace != null) {
    try {
        // Get detailed space information
        Space spaceDetails = spacesApi.getSpace(demoSpace.getSpaceId());
        
        System.out.println("🔍 Space Configuration:");
        System.out.println("   Name: " + spaceDetails.getName());
        System.out.println("   Owner ID: " + spaceDetails.getOwnerId());
        System.out.println("   Public Read: " + spaceDetails.getPublicRead());
        System.out.println("   Created: " + spaceDetails.getCreatedAt());
        System.out.println("   Labels: " + spaceDetails.getLabels());
        
        System.out.println("\n🤖 Associated Embedders:");
        if (spaceDetails.getSpaceEmbedders() != null) {
            for (SpaceEmbedder embedderAssoc : spaceDetails.getSpaceEmbedders()) {
                System.out.println("   Embedder ID: " + embedderAssoc.getEmbedderId());
                System.out.println("   Retrieval Weight: " + embedderAssoc.getDefaultRetrievalWeight());
            }
        } else {
            System.out.println("   No embedders configured");
        }
        
    } catch (ApiException e) {
        System.out.println("❌ Error getting space details: " + e.getMessage());
    }
} else {
    System.out.println("⚠️  No space available for the demo");
}

🔍 Space Configuration:
   Name: RAG Demo Knowledge Base (Java)
   Owner ID: cf5df949-31c6-4c54-af50-f8002107164e
   Public Read: false
   Created: 1764981059556
   Labels: {purpose=rag-demo, language=java, environment=tutorial, content-type=documentation}

🤖 Associated Embedders:
   Embedder ID: d313d1ff-79aa-4014-b99b-358f2ef99972
   Retrieval Weight: 1.0


## Adding Documents to Memory

### The Document Processing Pipeline

When you add a document to GoodMem, it goes through several automated steps:

```
1. Ingestion → 2. Chunking → 3. Embedding → 4. Indexing → 5. Ready for Search
```

**What happens**:
1. **Ingestion**: Document content and metadata are stored
2. **Chunking**: Text is split according to your configuration (256 chars, 25 overlap)
3. **Embedding**: Each chunk is converted to a vector by your embedder
4. **Indexing**: Vectors are indexed for fast similarity search
5. **Status**: Document marked as `COMPLETED` and ready for retrieval

### Single vs. Batch Operations

**Single memory creation** (`CreateMemory`):
- ✅ Good for: Real-time ingestion, single documents
- ✅ Synchronous call that returns the memory and its current status immediately (the status can still be `PENDING` while chunking and embedding finish)
- ⚠️ Higher overhead for bulk operations

**Batch memory creation** (`BatchCreateMemory`):
- ✅ Good for: Bulk imports, initial setup, periodic updates
- ✅ Lower overhead, efficient for multiple documents
- ✅ Async processing - check status via `ListMemories`
- ⚠️ Takes longer to get individual status feedback

### Metadata Best Practices

Rich metadata helps with:
- **Filtering**: Retrieve specific document types
- **Source attribution**: Show users where information came from
- **Organization**: Group and manage related documents
- **Debugging**: Track ingestion methods and dates
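
As a rough illustration of the filtering use case, the sketch below selects records by a metadata key on the client side. It is a hypothetical in-memory example: `MetadataFilterDemo` is not a GoodMem class, the `"batch"` value is made up for the demo, and any server-side filtering GoodMem offers is not shown here. The keys `filename`, `source`, and `ingestion_method` match the ones used later in this tutorial.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Client-side filtering over metadata maps; purely illustrative.
class MetadataFilterDemo {

    // Keep only the records whose metadata has the given value under the given key.
    static List<Map<String, String>> filter(List<Map<String, String>> records, String key, String value) {
        return records.stream()
            .filter(metadata -> value.equals(metadata.get(key)))
            .collect(Collectors.toList());
    }

    // Build one metadata map with the keys used in this tutorial.
    static Map<String, String> metadata(String filename, String ingestionMethod) {
        Map<String, String> m = new HashMap<>();
        m.put("filename", filename);
        m.put("source", "sample_documents");
        m.put("ingestion_method", ingestionMethod);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = new ArrayList<>();
        records.add(metadata("company_handbook.txt", "single"));
        records.add(metadata("product_faq.txt", "batch"));      // "batch" is a made-up value
        records.add(metadata("security_policy.txt", "batch"));
        for (Map<String, String> match : filter(records, "ingestion_method", "batch")) {
            System.out.println(match.get("filename"));
        }
    }
}
```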

### What We'll Do

1. Load sample documents from local files
2. Create one document using single memory creation (to demo the API)
3. Create remaining documents using batch operation (more efficient)
4. Monitor processing status until all documents are ready
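
Monitoring usually comes down to a polling loop with a bounded number of attempts. The sketch below shows one possible shape for step 4, under stated assumptions: `StatusPoller` and `waitForTerminalStatus` are hypothetical names, the status source is simulated, and `FAILED` is an assumed failure status (only `PENDING` and `COMPLETED` appear in this tutorial). In the notebook, the supplier would wrap a status lookup such as reading `getProcessingStatus()` from a fetched memory.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Generic polling helper; the status source is supplied by the caller.
class StatusPoller {

    // Poll until the status is terminal or maxAttempts is reached; return the last status seen.
    static String waitForTerminalStatus(Supplier<String> statusSource, int maxAttempts, long delayMillis) {
        String status = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = statusSource.get();
            // "FAILED" is an assumed failure status for this sketch.
            if ("COMPLETED".equals(status) || "FAILED".equals(status)) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // preserve the interrupt for the caller
                    return status;
                }
            }
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated status sequence standing in for repeated server lookups.
        Iterator<String> simulated = Arrays.asList("PENDING", "PENDING", "COMPLETED").iterator();
        String result = waitForTerminalStatus(simulated::next, 5, 10);
        System.out.println("Final status: " + result);
    }
}
```

Capping the attempts keeps a stuck document from blocking the notebook indefinitely; the caller can inspect the returned status and decide whether to retry.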

We'll use sample company documents that represent common business use cases:

In [8]:
// Helper class to hold document information
class DocumentInfo {
    String filename;
    String content;           // For text files
    String contentB64;        // For binary files (base64 encoded)
    String contentType;       // "text/plain" or "application/pdf"
    boolean isBinary;         // true for PDFs, false for text
    
    // Constructor for text files
    DocumentInfo(String filename, String content, String contentType) {
        this.filename = filename;
        this.content = content;
        this.contentType = contentType;
        this.isBinary = false;
    }
    
    // Constructor for binary files
    DocumentInfo(String filename, String contentB64, String contentType, boolean isBinary) {
        this.filename = filename;
        this.contentB64 = contentB64;
        this.contentType = contentType;
        this.isBinary = isBinary;
    }
}

/**
 * Load sample documents from the sample_documents directory.
 *
 * Automatically discovers all files in the directory and handles:
 * - .txt files: Read as plain text
 * - .pdf files: Read as binary and base64 encode
 */
List<DocumentInfo> loadSampleDocuments() {
    List<DocumentInfo> documents = new ArrayList<>();
    String sampleDir = "sample_documents";
    
    // Check if directory exists
    Path dirPath = Paths.get(sampleDir);
    if (!Files.exists(dirPath)) {
        System.out.println("⚠️  Directory not found: " + sampleDir);
        return documents;
    }
    
    try {
        // Auto-discover and sort files; close the directory stream when done
        List<Path> files = new ArrayList<>();
        try (Stream<Path> listing = Files.list(dirPath)) {
            listing.filter(Files::isRegularFile)
                .sorted()
                .forEach(files::add);
        }
        
        for (Path filePath : files) {
            String filename = filePath.getFileName().toString();
            
            // Determine file extension
            String fileExt = "";
            int lastDot = filename.lastIndexOf('.');
            if (lastDot > 0) {
                fileExt = filename.substring(lastDot).toLowerCase();
            }
            
            if (".txt".equals(fileExt)) {
                // Handle text files
                try {
                    byte[] bytes = Files.readAllBytes(filePath);
                    String content = new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
                    documents.add(new DocumentInfo(filename, content, "text/plain"));
                    System.out.println("📄 Loaded: " + filename + " (" + String.format("%,d", content.length()) + " characters)");
                } catch (IOException e) {
                    System.out.println("⚠️  Error reading " + filename + ": " + e.getMessage());
                }
                
            } else if (".pdf".equals(fileExt)) {
                // Handle PDF files with base64 encoding
                try {
                    byte[] binaryContent = Files.readAllBytes(filePath);
                    String contentB64 = Base64.getEncoder().encodeToString(binaryContent);
                    documents.add(new DocumentInfo(filename, contentB64, "application/pdf", true));
                    System.out.println("📄 Loaded: " + filename + " (" + String.format("%,d", binaryContent.length) + " bytes, base64: " + String.format("%,d", contentB64.length()) + " chars)");
                } catch (IOException e) {
                    System.out.println("⚠️  Error reading " + filename + ": " + e.getMessage());
                }
                
            } else {
                System.out.println("⚠️  Skipping unsupported file type: " + filename);
            }
        }
        
    } catch (IOException e) {
        System.out.println("❌ Error listing files in directory: " + e.getMessage());
    }
    
    return documents;
}

// Load the documents
List<DocumentInfo> sampleDocs = loadSampleDocuments();
System.out.println("\n📚 Total documents loaded: " + sampleDocs.size());

📄 Loaded: company_handbook.txt (2,342 characters)
📄 Loaded: employee_handbook.pdf (399,615 bytes, base64: 532,820 chars)
📄 Loaded: product_faq.txt (4,043 characters)
📄 Loaded: security_policy.txt (4,211 characters)
📄 Loaded: technical_documentation.txt (2,384 characters)

📚 Total documents loaded: 5
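
The PDF line in the output above shows the cost of base64 encoding: 399,615 bytes become 532,820 characters, roughly a third larger. Standard padded base64 emits 4 characters for every 3 input bytes, rounding up. The sketch below checks that arithmetic; `Base64SizeDemo` and `encodedLength` are hypothetical names used only for this example.

```java
import java.util.Base64;

// Predicts the length of padded base64 output and checks it against the JDK encoder.
class Base64SizeDemo {

    // 4 output characters per group of 3 input bytes, with the last group rounded up.
    static long encodedLength(long byteCount) {
        return 4 * ((byteCount + 2) / 3);
    }

    public static void main(String[] args) {
        // Size of employee_handbook.pdf as reported in the output above.
        System.out.println("Predicted length: " + encodedLength(399_615));

        // Cross-check the formula on a small buffer using the real encoder.
        byte[] sample = new byte[10];
        int actual = Base64.getEncoder().encodeToString(sample).length();
        System.out.println("10 bytes -> " + actual + " chars (formula: " + encodedLength(10) + ")");
    }
}
```

This overhead is worth keeping in mind when sending large binary documents in request bodies.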


In [9]:
// Create the first memory individually to demonstrate single memory creation
Memory createSingleMemory(String spaceId, DocumentInfo document) {
    try {
        // Create memory request
        MemoryCreationRequest memoryRequest = new MemoryCreationRequest();
        memoryRequest.setSpaceId(spaceId);
        memoryRequest.setContentType(document.contentType);
        memoryRequest.setChunkingConfig(demoChunkingConfig);
        
        // Use appropriate content field based on binary flag
        if (document.isBinary) {
            memoryRequest.setOriginalContentB64(document.contentB64);  // Base64 for PDFs
        } else {
            memoryRequest.setOriginalContent(document.content);         // Plain text
        }
        
        Map<String, String> metadata = new HashMap<>();
        metadata.put("filename", document.filename);
        metadata.put("source", "sample_documents");
        metadata.put("ingestion_method", "single");  // Track how this was ingested
        memoryRequest.setMetadata(metadata);
        
        // Create the memory
        Memory memory = memoriesApi.createMemory(memoryRequest);
        
        System.out.println("✅ Created single memory: " + document.filename);
        System.out.println("   Memory ID: " + memory.getMemoryId());
        System.out.println("   Content Type: " + document.contentType);
        System.out.println("   Status: " + memory.getProcessingStatus());
        System.out.println();
        
        return memory;
        
    } catch (ApiException e) {
        System.out.println("❌ Error creating memory for " + document.filename + ": " + e.getMessage());
        return null;
    } catch (Exception e) {
        System.out.println("❌ Unexpected error with " + document.filename + ": " + e.getMessage());
        return null;
    }
}

Memory singleMemory = null;
if (demoSpace != null && !sampleDocs.isEmpty()) {
    // Create the first document using single memory creation
    DocumentInfo firstDoc = sampleDocs.get(0);
    System.out.println("📝 Creating first document using CreateMemory API:");
    System.out.println("   Document: " + firstDoc.filename);
    System.out.println("   Content Type: " + firstDoc.contentType);
    System.out.println("   Method: Individual memory creation");
    System.out.println();
    
    singleMemory = createSingleMemory(demoSpace.getSpaceId(), firstDoc);
    
    if (singleMemory != null) {
        System.out.println("🎯 Single memory creation completed successfully!");
    } else {
        System.out.println("⚠️  Single memory creation failed");
    }
} else {
    System.out.println("⚠️  Cannot create memory: missing space or documents");
}

📝 Creating first document using CreateMemory API:
   Document: company_handbook.txt
   Content Type: text/plain
   Method: Individual memory creation

✅ Created single memory: company_handbook.txt
   Memory ID: a1adb4d9-5183-48c9-b377-242f865c7823
   Content Type: text/plain
   Status: PENDING

🎯 Single memory creation completed successfully!


In [10]:
// Demonstrate retrieving a memory by ID using getMemory
if (singleMemory != null) {
    try {
        System.out.println("📖 Retrieving memory details using getMemory API:");
        System.out.println("   Memory ID: " + singleMemory.getMemoryId());
        System.out.println();
        
        // Retrieve the memory without content
        Memory retrievedMemory = memoriesApi.getMemory(singleMemory.getMemoryId(), false, false);
        
        System.out.println("✅ Successfully retrieved memory:");
        System.out.println("   Memory ID: " + retrievedMemory.getMemoryId());
        System.out.println("   Space ID: " + retrievedMemory.getSpaceId());
        System.out.println("   Status: " + retrievedMemory.getProcessingStatus());
        System.out.println("   Content Type: " + retrievedMemory.getContentType());
        System.out.println("   Created At: " + retrievedMemory.getCreatedAt());
        System.out.println("   Updated At: " + retrievedMemory.getUpdatedAt());
        
        if (retrievedMemory.getMetadata() != null) {
            System.out.println("\n   📋 Metadata:");
            Map<String, String> metadata = (Map<String, String>) retrievedMemory.getMetadata();
            for (Map.Entry<String, String> entry : metadata.entrySet()) {
                System.out.println("      " + entry.getKey() + ": " + entry.getValue());
            }
        }
        
        // Now retrieve with content included
        System.out.println("\n📖 Retrieving memory with content:");
        Memory retrievedWithContent = memoriesApi.getMemory(singleMemory.getMemoryId(), true, false);
        
        if (retrievedWithContent.getOriginalContent() != null) {
            // Decode the base64 encoded content
            byte[] decodedBytes = Base64.getDecoder().decode(retrievedWithContent.getOriginalContent());
            String decodedContent = new String(decodedBytes, StandardCharsets.UTF_8);
            
            System.out.println("✅ Content retrieved and decoded:");
            System.out.println("   Content length: " + decodedContent.length() + " characters");
            String preview = decodedContent.length() > 200 ? 
                decodedContent.substring(0, 200) + "..." : decodedContent;
            System.out.println("   First 200 chars: " + preview);
        } else {
            System.out.println("⚠️  No content available");
        }
            
    } catch (ApiException e) {
        System.out.println("❌ Error retrieving memory: " + e.getMessage());
        System.out.println("   Status code: " + e.getCode());
    } catch (Exception e) {
        System.out.println("❌ Unexpected error: " + e.getMessage());
        e.printStackTrace();
    }
} else {
    System.out.println("‚ö†Ô∏è  No memory available to retrieve");
}

📖 Retrieving memory details using getMemory API:
   Memory ID: a1adb4d9-5183-48c9-b377-242f865c7823

✅ Successfully retrieved memory:
   Memory ID: a1adb4d9-5183-48c9-b377-242f865c7823
   Space ID: 845d400b-3604-4efc-9b4e-c190d0241561
   Status: PENDING
   Content Type: text/plain
   Created At: 1764981059895
   Updated At: 1764981059895

   📋 Metadata:
      source: sample_documents
      filename: company_handbook.txt
      ingestion_method: single

📖 Retrieving memory with content:
✅ Content retrieved and decoded:
   Content length: 2342 characters
   First 200 chars: ACME Corporation Employee Handbook

Welcome to ACME Corporation! This handbook provides essential information about our company policies, procedures, and culture.

COMPANY OVERVIEW
ACME Corporation is...
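
The Base64 decoding step used when reading content back can be tried in isolation. The following is a minimal sketch that relies only on the JDK; the class name `Base64RoundTripSketch` and the sample text are illustrative and are not part of the GoodMem client.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64RoundTripSketch {

    // Encode text as Base64, the usual transport form for binary-safe payloads.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decode Base64 back to text. Passing a Charset constant avoids the checked
    // UnsupportedEncodingException declared by new String(bytes, "UTF-8").
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("ACME Corporation Employee Handbook");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

Using `StandardCharsets.UTF_8` instead of the string `"UTF-8"` is a small design choice: the behavior is the same, but there is no checked exception to handle.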


In [11]:
// Create the remaining documents using batch memory creation
void createBatchMemories(String spaceId, List<DocumentInfo> documents) {
    
    // Prepare batch memory requests using our saved chunking configuration
    List<MemoryCreationRequest> memoryRequests = new ArrayList<>();
    for (DocumentInfo doc : documents) {
        
        // Create memory request with our saved chunking configuration
        MemoryCreationRequest memoryRequest = new MemoryCreationRequest();
        memoryRequest.setSpaceId(spaceId);
        memoryRequest.setContentType(doc.contentType);
        memoryRequest.setChunkingConfig(demoChunkingConfig);   // Reuse saved chunking configuration
        
        // Use appropriate content field based on binary flag
        if (doc.isBinary) {
            memoryRequest.setOriginalContentB64(doc.contentB64);  // Base64 for PDFs
        } else {
            memoryRequest.setOriginalContent(doc.content);         // Plain text
        }
        
        Map<String, String> metadata = new HashMap<>();
        metadata.put("filename", doc.filename);
        metadata.put("source", "sample_documents");
        metadata.put("ingestion_method", "batch");
        memoryRequest.setMetadata(metadata);
        
        memoryRequests.add(memoryRequest);
    }
    
    try {
        // Create batch request
        BatchMemoryCreationRequest batchRequest = new BatchMemoryCreationRequest();
        batchRequest.setRequests(memoryRequests);
        
        System.out.println("📦 Creating " + memoryRequests.size() + " memories using BatchCreateMemory API:");

        // Execute batch creation; the call returns void, so there is no response body to inspect
        memoriesApi.batchCreateMemory(batchRequest);

        System.out.println("✅ Batch creation request submitted successfully");

    } catch (ApiException e) {
        System.out.println("❌ Error during batch creation: " + e.getMessage());
        System.out.println("   Response code: " + e.getCode());
    } catch (Exception e) {
        System.out.println("❌ Unexpected error during batch creation: " + e.getMessage());
        e.printStackTrace();
    }
}

if (demoSpace != null && sampleDocs.size() > 1) {
    // Create the remaining documents (skip the first one we already created)
    List<DocumentInfo> remainingDocs = sampleDocs.subList(1, sampleDocs.size());
    createBatchMemories(demoSpace.getSpaceId(), remainingDocs);
    
    System.out.println("\n📋 Total Memory Creation Summary:");
    System.out.println("   📄 Single CreateMemory: 1 document");
    System.out.println("   📦 Batch CreateMemory: " + remainingDocs.size() + " documents submitted");
    System.out.println("   ⏳ Check processing status in the next cell");

} else {
    System.out.println("⚠️  Cannot create batch memories: insufficient documents or missing space");
}

📦 Creating 4 memories using BatchCreateMemory API:
✅ Batch creation request submitted successfully

📋 Total Memory Creation Summary:
   📄 Single CreateMemory: 1 document
   📦 Batch CreateMemory: 4 documents submitted
   ⏳ Check processing status in the next cell


In [12]:
// List all memories in our space to verify they're ready
if (demoSpace != null) {
    try {
        MemoryListResponse memoriesResponse = memoriesApi.listMemories(demoSpace.getSpaceId(), null, null, null, null, null, null);
        List<Memory> memories = memoriesResponse.getMemories();
        
        System.out.println("📚 Memories in space '" + demoSpace.getName() + "':");
        System.out.println("   Total memories: " + (memories != null ? memories.size() : 0));
        System.out.println();
        
        if (memories != null) {
            for (int i = 0; i < memories.size(); i++) {
                Memory memory = memories.get(i);
                Map<String, String> metadata = (Map<String, String>) memory.getMetadata();
                String filename = metadata != null ? metadata.getOrDefault("filename", "Unknown") : "Unknown";
                String description = metadata != null ? metadata.getOrDefault("description", "No description") : "No description";

                System.out.println("   " + (i + 1) + ". " + filename);
                System.out.println("      Status: " + memory.getProcessingStatus());
                System.out.println("      Description: " + description);
                System.out.println("      Created: " + memory.getCreatedAt());
                System.out.println();
            }
        }
        
    } catch (ApiException e) {
        System.out.println("❌ Error listing memories: " + e.getMessage());
    }
}

📚 Memories in space 'RAG Demo Knowledge Base (Java)':
   Total memories: 5

   1. technical_documentation.txt
      Status: PENDING
      Description: No description
      Created: 1764981060133

   2. employee_handbook.pdf
      Status: PENDING
      Description: No description
      Created: 1764981060133

   3. security_policy.txt
      Status: PENDING
      Description: No description
      Created: 1764981060133

   4. product_faq.txt
      Status: PENDING
      Description: No description
      Created: 1764981060133

   5. company_handbook.txt
      Status: PENDING
      Description: No description
      Created: 1764981059895



In [13]:
// Monitor processing status for all created memories
boolean waitForProcessingCompletion(String spaceId, int maxWaitSeconds) {
    System.out.println("⏳ Waiting for document processing to complete...");
    System.out.println("   💡 Note: Batch memories are processed asynchronously, so we check by listing all memories in the space");
    System.out.println();
    
    long startTime = System.currentTimeMillis();
    long maxWaitMs = maxWaitSeconds * 1000L;
    
    while (System.currentTimeMillis() - startTime < maxWaitMs) {
        try {
            // List memories in our space
            MemoryListResponse memoriesResponse = memoriesApi.listMemories(spaceId, null, null, null, null, null, null);
            List<Memory> memories = memoriesResponse.getMemories();
            
            if (memories == null) {
                System.out.println("📊 No memories found in space");
                return false;
            }
            
            // Check processing status
            Map<String, Integer> statusCounts = new HashMap<>();
            for (Memory memory : memories) {
                String status = memory.getProcessingStatus();
                statusCounts.put(status, statusCounts.getOrDefault(status, 0) + 1);
            }
            
            System.out.println("📊 Processing status: " + statusCounts + " (Total: " + memories.size() + " memories)");
            
            // Check if all are completed
            boolean allCompleted = true;
            for (Memory memory : memories) {
                if (!"COMPLETED".equals(memory.getProcessingStatus())) {
                    allCompleted = false;
                    break;
                }
            }
            
            if (allCompleted) {
                System.out.println("✅ All documents processed successfully!");
                return true;
            }
                
            // Check for any failures
            int failedCount = statusCounts.getOrDefault("FAILED", 0);
            if (failedCount > 0) {
                System.out.println("❌ " + failedCount + " memories failed processing");
                return false;
            }
            
            Thread.sleep(5000);  // Wait 5 seconds before checking again
            
        } catch (ApiException e) {
            System.out.println("❌ Error checking processing status: " + e.getMessage());
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // Restore the interrupt flag for callers
            System.out.println("⏹️ Interrupted while waiting for processing");
            return false;
        }
    }

    System.out.println("⏰ Timeout waiting for processing (waited " + maxWaitSeconds + "s)");
    return false;
}

boolean processingComplete = false;
if (demoSpace != null) {
    // Wait for processing to complete for all memories (single + batch)
    // Since batchCreateMemory returns void, we monitor by listing all memories
    processingComplete = waitForProcessingCompletion(demoSpace.getSpaceId(), 120);
    
    if (processingComplete) {
        System.out.println("🎉 Ready for semantic search and retrieval!");
        System.out.println("📈 Batch API benefit: Multiple documents submitted in a single API call");
        System.out.println("🔧 Consistent chunking: All memories use demoChunkingConfig");
    } else {
        System.out.println("⚠️  Some documents may still be processing. You can continue with the tutorial.");
    }
} else {
    System.out.println("⚠️  Skipping processing check - no space available");
}

⏳ Waiting for document processing to complete...
   💡 Note: Batch memories are processed asynchronously, so we check by listing all memories in the space

📊 Processing status: {PENDING=5} (Total: 5 memories)


📊 Processing status: {COMPLETED=5} (Total: 5 memories)
✅ All documents processed successfully!
🎉 Ready for semantic search and retrieval!
📈 Batch API benefit: Multiple documents submitted in a single API call
🔧 Consistent chunking: All memories use demoChunkingConfig
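
The wait loop above follows a common poll-until-done pattern: check a condition, sleep, and give up after a deadline. As a rough sketch of that pattern on its own, with no GoodMem calls involved, the helper below takes any boolean check. The names `PollingSketch` and `pollUntil` are hypothetical and chosen only for illustration.

```java
import java.util.function.BooleanSupplier;

class PollingSketch {

    // Re-evaluates the condition every intervalMs until it returns true,
    // the timeout elapses, or the thread is interrupted.
    static boolean pollUntil(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // keep the interrupt visible to callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated job that reports completion on the third status check.
        boolean done = pollUntil(() -> ++calls[0] >= 3, 1_000, 10);
        System.out.println("done=" + done + " after " + calls[0] + " checks");
    }
}
```

In the tutorial's setting, the condition would be "every memory in the space reports `COMPLETED`", and a separate check for `FAILED` would end the wait early.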


## Semantic Search & Retrieval

### Why Semantic Search?

**Traditional keyword search**:
- Matches exact words or simple variations
- Misses conceptually similar content with different wording
- Example: "vacation days" won't match "time off policy"

**Semantic search**:
- Understands meaning and context
- Finds conceptually similar content regardless of exact wording
- Example: "vacation days" successfully matches "time off policy"

### How It Works

```
Query: "vacation policy" 
   ↓ (embed with same embedder)
Query Vector: [0.23, -0.45, ...]
   ↓ (compare to all chunk vectors)
Most Similar Chunks: (by cosine similarity)
   1. "TIME OFF POLICY..." (score: -0.604)
   2. "Vacation requests..." (score: -0.544)
   3. "WORK HOURS..." (score: -0.458)
```
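
To make the comparison step concrete, here is a minimal sketch of ranking chunk vectors by cosine similarity to a query vector. It is plain Java with toy two-dimensional vectors; the class and method names are illustrative, and this is not how GoodMem computes or stores its scores internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CosineRankingSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns chunk indices ordered from most to least similar to the query.
    static List<Integer> rankBySimilarity(double[] query, List<double[]> chunkVectors) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < chunkVectors.size(); i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble(
            (Integer i) -> -cosineSimilarity(query, chunkVectors.get(i))));
        return order;
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0};
        List<double[]> chunks = List.of(
            new double[] {0.0, 1.0},   // orthogonal to the query
            new double[] {2.0, 0.0},   // same direction as the query
            new double[] {1.0, 1.0});  // 45 degrees from the query
        System.out.println(rankBySimilarity(query, chunks));
    }
}
```

Note that vector length does not matter here: `{2.0, 0.0}` is as similar to the query as `{1.0, 0.0}` would be, because cosine similarity only measures direction.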

### Understanding Relevance Scores

GoodMem reports each score as the **negative cosine similarity** between the query and the chunk, so it behaves like a distance:
- **Lower values = more relevant** (e.g., -0.6 is better than -0.4)
- **Range**: Typically -1.0 (most similar) to 0.0 (unrelated)
- **Good threshold**: Results under -0.3 are usually relevant
- **Context matters**: Exact scores vary by embedder and content
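
As a small illustration of applying such a threshold, the sketch below keeps only chunks scoring at or below a cutoff and orders them with the most relevant (lowest score) first. `ScoredChunk` is a hypothetical stand-in for the tutorial's result type, and the `-0.3` cutoff is just the rule of thumb from the list above, not a fixed constant of the API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RelevanceFilterSketch {

    // Hypothetical minimal result type: chunk text plus its relevance score.
    static final class ScoredChunk {
        final String text;
        final double score;

        ScoredChunk(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    // Keep chunks with score <= threshold, sorted ascending (lower = more relevant).
    static List<ScoredChunk> filterRelevant(List<ScoredChunk> chunks, double threshold) {
        List<ScoredChunk> kept = new ArrayList<>();
        for (ScoredChunk chunk : chunks) {
            if (chunk.score <= threshold) {
                kept.add(chunk);
            }
        }
        kept.sort(Comparator.comparingDouble((ScoredChunk c) -> c.score));
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredChunk> chunks = List.of(
            new ScoredChunk("WORK HOURS...", -0.20),
            new ScoredChunk("TIME OFF POLICY...", -0.60),
            new ScoredChunk("Vacation requests...", -0.40));
        for (ScoredChunk chunk : filterRelevant(chunks, -0.3)) {
            System.out.println(chunk.score + "  " + chunk.text);
        }
    }
}
```

Because good thresholds vary by embedder and content, it is usually worth tuning the cutoff on a handful of known queries rather than hard-coding it.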

### Streaming API Benefits

GoodMem's streaming API:
- **Real-time results**: Process chunks as they arrive
- **Low latency**: Start showing results immediately
- **Memory efficient**: No need to buffer entire result set
- **Progressive UI**: Update interface as more results come in
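
The memory-efficiency point comes from handling one record at a time. The sketch below shows the general idea for newline-delimited JSON (NDJSON), assuming each non-blank line is one complete event. It uses only the JDK and treats each line as an opaque string; the class name and sample payload are made up, and this is not the implementation of GoodMem's `StreamingClient`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

class NdjsonStreamSketch {

    // Hands each non-blank line to the handler as soon as it is read,
    // so the full response never has to be held in memory. Returns the record count.
    static int forEachRecord(Reader source, Consumer<String> handler) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue;  // tolerate blank separator lines
                }
                handler.accept(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up payload for illustration only.
        String body = "{\"event\":\"begin\"}\n{\"event\":\"chunk\"}\n\n{\"event\":\"end\"}\n";
        int total = forEachRecord(new StringReader(body),
            line -> System.out.println("event: " + line));
        System.out.println("records: " + total);
    }
}
```

In a real client the `Reader` would wrap the HTTP response stream, and the handler would parse each line with a JSON library before updating the UI.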

### What We'll Do

1. Implement a semantic search function using GoodMem's streaming API
2. Process different event types (chunks, memories, metadata)
3. Display results with relevance scores
4. Test with various queries to see semantic matching in action

Now comes the exciting part! Let's perform semantic search using GoodMem's streaming API. This will:

- **Find relevant chunks** based on semantic similarity
- **Stream results** in real-time
- **Include relevance scores** for ranking
- **Return structured data** for easy processing

In [14]:
// Helper class to hold search results
class SearchResult {
    String chunkText;
    Double relevanceScore;
    Integer memoryIndex;
    String resultSetId;
    Integer chunkSequence;
    Map<String, Object> memory;
    
    SearchResult(String chunkText, Double relevanceScore, Integer memoryIndex, String resultSetId, Integer chunkSequence, Map<String, Object> memory) {
        this.chunkText = chunkText;
        this.relevanceScore = relevanceScore;
        this.memoryIndex = memoryIndex;
        this.resultSetId = resultSetId;
        this.chunkSequence = chunkSequence;
        this.memory = memory;
    }
}

List<SearchResult> semanticSearchStreaming(String query, String spaceId, int maxResults) {
    /**
     * Perform semantic search using GoodMem's streaming retrieval API.
     *
     * @param query The search query
     * @param spaceId ID of the space to search
     * @param maxResults Maximum number of results to return
     * @return List of search results with chunks and metadata
     */
    
    try {
        System.out.println("🔍 Streaming search for: '" + query + "'");
        System.out.println("📁 Space ID: " + spaceId);
        System.out.println("📊 Max results: " + maxResults);
        System.out.println("-".repeat(50));

        // Create streaming client
        StreamingClient streamingClient = new StreamingClient(defaultClient);

        // Create memory stream request
        MemoryStreamRequest streamRequest = new MemoryStreamRequest(query);
        streamRequest.setSpaceIds(List.of(spaceId));
        streamRequest.setRequestedSize(maxResults);
        streamRequest.setFetchMemory(true);
        streamRequest.setFetchMemoryContent(false);  // We don't need full content for this demo
        streamRequest.setFormat(StreamingClient.StreamingFormat.NDJSON);

        // Get streaming results
        Stream<MemoryStreamResponse> stream = streamingClient.retrieveMemoryStream(streamRequest);
    
        List<SearchResult> retrievedChunks = new ArrayList<>();
        final int[] eventCount = {0};
        
        // Process streaming events
        stream.forEach(streamingEvent -> {
            eventCount[0]++;
            
            if (streamingEvent.getRetrievedItem() != null) {
                if (streamingEvent.getRetrievedItem().getChunk() != null) {
                    StreamChunkReference chunkRef = streamingEvent.getRetrievedItem().getChunk();
                    Map<String, Object> chunkData = chunkRef.getChunk();
                    
                    String chunkText = "";
                    Integer chunkSequence = 0;
                    
                    if (chunkData != null) {
                        chunkText = (String) chunkData.getOrDefault("chunkText", "");
                        Object seqObj = chunkData.get("chunkSequenceNumber");
                        if (seqObj instanceof Number) {
                            // JSON numbers may arrive as Integer, Long, or Double depending on the parser
                            chunkSequence = ((Number) seqObj).intValue();
                        }
                    }
                    
                    SearchResult result = new SearchResult(
                        chunkText,
                        chunkRef.getRelevanceScore(),
                        chunkRef.getMemoryIndex(),
                        chunkRef.getResultSetId(),
                        chunkSequence,
                        null
                    );
                    
                    retrievedChunks.add(result);
                    
                    System.out.println("📄 Chunk " + retrievedChunks.size() + ":");
                    System.out.println("   Relevance: " + String.format("%.3f", chunkRef.getRelevanceScore()));
                    System.out.println("   Text: " + chunkText.substring(0, Math.min(chunkText.length(), 100)) + "...");
                    System.out.println();
                }
                else if (streamingEvent.getRetrievedItem().getMemory() != null) {
                    // Handle memory events if needed
                    Map<String, Object> memory = streamingEvent.getRetrievedItem().getMemory();
                    String memoryId = memory.containsKey("memoryId") ? memory.get("memoryId").toString() : "unknown";
                    System.out.println("💾 Memory: " + memoryId);
                }
            }
            else if (streamingEvent.getResultSetBoundary() != null) {
                System.out.println("🔄 " + streamingEvent.getResultSetBoundary().getKind() +
                                 ": " + streamingEvent.getResultSetBoundary().getStageName());
            }
        });
        
        System.out.println("✅ Streaming search completed: " + retrievedChunks.size() + " chunks found");
        System.out.println("   Total streaming events: " + eventCount[0]);
        return retrievedChunks;
    } catch (StreamingClient.StreamError e) {
        System.out.println("❌ Streaming error during search: " + e.getMessage());
        System.out.println("   Status code: " + e.getStatusCode());
        return new ArrayList<>();
    } catch (Exception e) {
        System.out.println("❌ Unexpected error during streaming search: " + e.getMessage());
        e.printStackTrace();
        return new ArrayList<>();
    }
}

// Test semantic streaming search with a sample query
List<SearchResult> searchResults = new ArrayList<>();
if (demoSpace != null) {
    String sampleQuery = "What is the vacation policy for employees?";
    searchResults = semanticSearchStreaming(sampleQuery, demoSpace.getSpaceId(), 5);
} else {
    System.out.println("⚠️  No space available for search");
}

🔍 Streaming search for: 'What is the vacation policy for employees?'
📁 Space ID: 845d400b-3604-4efc-9b4e-c190d0241561
📊 Max results: 5
--------------------------------------------------
🔄 BEGIN: retrieve
📄 Chunk 1:
   Relevance: -0.680
   Text: TIME OFF POLICY
All full-time employees receive:
- 15 days of paid vacation annually (increases to 2...

📄 Chunk 2:
   Relevance: -0.675
   Text: 1.  Eligibility 

 
All regular full-time employees are eligible for vacation benefits. 

 
2.  Accr...

📄 Chunk 3:
   Relevance: -0.662
   Text: [ORGANIZATION] has established the following vacation plan to provide eligible employees 
time off w...

📄 Chunk 4:
   Relevance: -0.646
   Text: Vacation Pay: Vacation pay shall be based on the employee’s regular base rate and 
working schedule,...

📄 Chunk 5:
   Relevance: -0.643
   Text: employees can use paid vacation time in minimum increments of one day.xii 

 
Accumulating Vacation:...

🔄 END: 
✅ Streaming search compl

In [15]:
// Let's try a few different queries to see how streaming semantic search works
void testMultipleStreamingQueries(String spaceId) {
    /**
     * Test streaming semantic search with different types of queries.
     */
    
    String[] testQueries = {
        "How do I reset my password?",
        "What are the security requirements for remote work?",
        "API authentication and rate limits",
        "Employee benefits and health insurance",
        "How much does the software cost?"
    };
    
    for (int i = 0; i < testQueries.length; i++) {
        String query = testQueries[i];
        System.out.println("\n🔍 Test Query " + (i + 1) + ": " + query);
        System.out.println("=".repeat(60));
        
        semanticSearchStreaming(query, spaceId, 3);
        
        System.out.println("\n" + "-".repeat(60));
    }
}

if (demoSpace != null) {
    testMultipleStreamingQueries(demoSpace.getSpaceId());
} else {
    System.out.println("⚠️  No space available for testing multiple streaming queries");
}


🔍 Test Query 1: How do I reset my password?
🔍 Streaming search for: 'How do I reset my password?'
📁 Space ID: 845d400b-3604-4efc-9b4e-c190d0241561
📊 Max results: 3
--------------------------------------------------
🔄 BEGIN: retrieve
📄 Chunk 1:
   Relevance: -0.370
   Text: password they use to gain access to computers or the Internet, as well as any change to 
such passwo...

📄 Chunk 2:
   Relevance: -0.363
   Text: - No reuse of last 12 passwords
- Must be changed every 90 days for privileged accounts
- Multi-fact...

📄 Chunk 3:
   Relevance: -0.305
   Text: Each classification level has specific handling, storage, and transmission requirements outlined in ...

🔄 END: 
✅ Streaming search completed: 3 chunks found
   Total streaming events: 8

------------------------------------------------------------

🔍 Test Query 2: What are the security requirements for remote work?
🔍 Streaming search for: 'What are the security requirements for remote work?'
ü

## Advanced Features

Congratulations! 🎉 You've successfully built a semantic search system using GoodMem. Here's what you've accomplished:

### ✅ What You Built
- **Document ingestion pipeline** with automatic chunking and embedding
- **Semantic search system** with relevance scoring
- **Simple Q&A system** using GoodMem's vector capabilities

### 🚀 Next Steps for Advanced Implementation

#### Reranking
Improve search quality by adding a reranking stage. **Rerankers** are specialized models that re-score search results to improve relevance:

- **Two-stage retrieval**: Fast initial retrieval with embeddings, then precise reranking
- **Better relevance**: Rerankers use cross-attention to understand query-document relationships
- **Reduced costs**: Rerank only top-K results instead of entire corpus
- **Voyage AI reranker**: Industry-leading reranking model with state-of-the-art performance

The combination of fast embedding-based retrieval followed by accurate reranking provides the best balance of speed and quality for production RAG systems.

## Configuring a Reranker

To further improve search quality, we can add a **reranker** to our RAG pipeline. While embedders provide fast semantic search, rerankers use more sophisticated models to re-score the top results for better accuracy.

### Why Use Reranking?

1. **Higher Accuracy**: Rerankers use cross-encoder architectures that directly compare queries and documents
2. **Two-Stage Pipeline**: Fast retrieval with embeddings + precise reranking = optimal performance
3. **Cost Effective**: Only rerank top-K results (e.g., top 20) rather than entire corpus
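
The two-stage idea can be sketched independently of any particular model: score the whole corpus with a cheap function, keep a shortlist, then apply the expensive scorer to the shortlist only. The example below is a toy under stated assumptions: both scorers are stand-in lambdas, higher scores mean better here (unlike GoodMem's negative-similarity scores), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

class TwoStageRetrievalSketch {

    // Return the k highest-scoring items (higher score = better in this sketch).
    static <T> List<T> topK(List<T> items, ToDoubleFunction<T> score, int k) {
        List<T> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingDouble(score).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    // Stage 1: cheap score over the whole corpus.
    // Stage 2: precise score over the shortlist only.
    static <T> List<T> retrieveThenRerank(List<T> corpus,
                                          ToDoubleFunction<T> cheapScore,
                                          ToDoubleFunction<T> preciseScore,
                                          int shortlistSize,
                                          int finalSize) {
        List<T> shortlist = topK(corpus, cheapScore, shortlistSize);
        return topK(shortlist, preciseScore, finalSize);
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
            "vacation policy and accrual rules",
            "password reset steps",
            "vacation",
            "remote work security policy");
        // Toy scorers: the cheap one checks for a keyword,
        // the "precise" one simply prefers longer matches.
        ToDoubleFunction<String> cheap = doc -> doc.contains("vacation") ? 1.0 : 0.0;
        ToDoubleFunction<String> precise = doc -> doc.length();
        System.out.println(retrieveThenRerank(corpus, cheap, precise, 2, 1));
    }
}
```

The cost argument shows up in the call counts: the precise scorer runs `shortlistSize` times instead of once per document in the corpus.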

### Voyage AI Reranker

We'll use Voyage AI's `rerank-2.5` model, which provides:
- **State-of-the-art performance** on reranking benchmarks
- **Fast inference** optimized for production use
- **Simple API** that integrates seamlessly with GoodMem

**Note**: You'll need a Voyage AI API key set in your environment variable `VOYAGE_API_KEY`.

In [None]:
// Create or retrieve Voyage AI reranker and store for reuse
RerankerResponse voyageReranker = null;

String voyageApiKey = System.getenv().getOrDefault("VOYAGE_API_KEY", "");

if (voyageApiKey == null || voyageApiKey.isEmpty()) {
    System.out.println("❌ VOYAGE_API_KEY environment variable not set!");
    System.out.println("   Please set your Voyage AI API key:");
    System.out.println("   export VOYAGE_API_KEY='your-api-key-here'");
} else {
    try {
        // Initialize RerankersApi
        RerankersApi rerankersApi = new RerankersApi(defaultClient);

        // Check if reranker already exists
        ListRerankersResponse rerankersResponse = rerankersApi.listRerankers(null, null, null);
        List<RerankerResponse> existingRerankers = rerankersResponse.getRerankers();

        RerankerResponse existingReranker = null;
        for (RerankerResponse reranker : existingRerankers) {
            if ("VOYAGE".equals(reranker.getProviderType().toString()) &&
                "rerank-2.5".equals(reranker.getModelIdentifier())) {
                existingReranker = reranker;
                break;
            }
        }

        if (existingReranker != null) {
            voyageReranker = existingReranker;  // Store existing reranker
            System.out.println("✅ Voyage reranker already exists!");
            System.out.println("   Display Name: " + voyageReranker.getDisplayName());
            System.out.println("   Reranker ID: " + voyageReranker.getRerankerId());
            System.out.println("   Model: " + voyageReranker.getModelIdentifier());
        } else {
            System.out.println("🔧 Creating new Voyage reranker...");

            // Create API key authentication
            ai.pairsys.goodmem.client.model.ApiKeyAuth apiKeyAuth = new ai.pairsys.goodmem.client.model.ApiKeyAuth()
                .inlineSecret(voyageApiKey)
                .headerName("Authorization")
                .prefix("Bearer ");

            EndpointAuthentication credentials = new EndpointAuthentication()
                .kind(CredentialKind.CREDENTIAL_KIND_API_KEY)
                .apiKey(apiKeyAuth);

            // Create reranker request
            RerankerCreationRequest rerankerRequest = new RerankerCreationRequest()
                .displayName("Voyage Rerank 2.5")
                .providerType(ProviderType.VOYAGE)
                .endpointUrl("https://api.voyageai.com")
                .modelIdentifier("rerank-2.5")
                .apiPath("/v1/rerank")
                .credentials(credentials)
                .description("Voyage AI reranker for improving search result relevance");

            RerankerResponse newReranker = rerankersApi.createReranker(rerankerRequest);
            voyageReranker = newReranker;  // Store new reranker

            System.out.println("✅ Successfully created Voyage reranker!");
            System.out.println("   Display Name: " + voyageReranker.getDisplayName());
            System.out.println("   Reranker ID: " + voyageReranker.getRerankerId());
            System.out.println("   Provider: " + voyageReranker.getProviderType());
            System.out.println("   Model: " + voyageReranker.getModelIdentifier());
        }
        
        // Print stored variable info
        if (voyageReranker != null) {
            System.out.println("\n💾 Stored for reuse:");
            System.out.println("   Variable: voyageReranker");
            System.out.println("   Reranker ID: " + voyageReranker.getRerankerId());
        }
    } catch (ApiException e) {
        System.out.println("❌ Error creating reranker: " + e.getMessage());
    }
}

🔧 Creating new Voyage reranker...
✅ Successfully created Voyage reranker!
   Display Name: Voyage Rerank 2.5
   Reranker ID: bf3bbf6b-48d8-4536-ac7c-05b5e9d4ab11
   Provider: VOYAGE
   Model: rerank-2.5

💾 Stored for reuse:
   Variable: voyageReranker
   Reranker ID: bf3bbf6b-48d8-4536-ac7c-05b5e9d4ab11


## Registering an LLM

The final component in our RAG pipeline is the **LLM (Large Language Model)** - the generation component that creates natural language responses using the retrieved and reranked context.

### Role of LLMs in RAG

After retrieving and reranking relevant chunks, the LLM:
1. **Receives the query** and retrieved context
2. **Generates a response** that synthesizes information from multiple sources
3. **Maintains coherence** while staying grounded in the retrieved facts
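
To show what "receives the query and retrieved context" can look like in practice, here is a minimal sketch of assembling a grounded prompt from a question and a list of chunk texts. The prompt wording and the class name `RagPromptSketch` are assumptions made for illustration; this is not the template GoodMem uses internally.

```java
import java.util.List;

class RagPromptSketch {

    // Builds a single prompt string: instruction, numbered context chunks, then the question.
    static String buildPrompt(String question, List<String> chunks) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Answer the question using only the context below. ")
              .append("If the context is insufficient, say so.\n\n");
        prompt.append("Context:\n");
        for (int i = 0; i < chunks.size(); i++) {
            // Numbering the chunks lets the model (and the reader) cite sources.
            prompt.append("[").append(i + 1).append("] ")
                  .append(chunks.get(i).strip()).append("\n");
        }
        prompt.append("\nQuestion: ").append(question.strip()).append("\nAnswer:");
        return prompt.toString();
    }

    public static void main(String[] args) {
        String prompt = buildPrompt(
            "What is the vacation policy for employees?",
            List.of("TIME OFF POLICY\nAll full-time employees receive paid vacation.",
                    "Vacation requests must be submitted in advance."));
        System.out.println(prompt);
    }
}
```

Telling the model to rely only on the supplied context is what keeps the answer grounded in the retrieved facts rather than in the model's general knowledge.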

### OpenAI GPT-4o-mini

We'll use OpenAI's `gpt-4o-mini` model, which provides:
- **Fast inference** with low latency for real-time applications
- **Cost-effective** pricing compared to larger models
- **High quality** responses suitable for most RAG use cases
- **Function calling** support for advanced workflows

**Note**: This uses the same `OPENAI_API_KEY` environment variable as the embedder.

In [None]:
// Register OpenAI GPT-4o-mini LLM and store for reuse
LLMResponse openaiLlm = null;

String openaiApiKey = System.getenv().getOrDefault("OPENAI_API_KEY", "");

if (openaiApiKey == null || openaiApiKey.isEmpty()) {
    System.out.println("❌ OPENAI_API_KEY environment variable not set!");
    System.out.println("   Please set your OpenAI API key:");
    System.out.println("   export OPENAI_API_KEY='your-api-key-here'");
} else {
    try {
        // Initialize LlmsApi
        LlmsApi llmsApi = new LlmsApi(defaultClient);

        // Check if LLM already exists
        ListLLMsResponse llmsResponse = llmsApi.listLLMs(null, null, null);
        List<LLMResponse> existingLLMs = llmsResponse.getLlms();

        LLMResponse existingLLM = null;
        for (LLMResponse llm : existingLLMs) {
            if ("OPENAI".equals(llm.getProviderType().toString()) &&
                "gpt-4o-mini".equals(llm.getModelIdentifier())) {
                existingLLM = llm;
                break;
            }
        }

        if (existingLLM != null) {
            openaiLlm = existingLLM;  // Store existing LLM
            System.out.println("✅ OpenAI GPT-4o-mini LLM already exists!");
            System.out.println("   Display Name: " + openaiLlm.getDisplayName());
            System.out.println("   LLM ID: " + openaiLlm.getLlmId());
            System.out.println("   Model: " + openaiLlm.getModelIdentifier());
        } else {
            System.out.println("🔧 Registering new OpenAI GPT-4o-mini LLM...");

            // Create API key authentication
            ai.pairsys.goodmem.client.model.ApiKeyAuth apiKeyAuth = new ai.pairsys.goodmem.client.model.ApiKeyAuth()
                .inlineSecret(openaiApiKey)
                .headerName("Authorization")
                .prefix("Bearer ");

            EndpointAuthentication credentials = new EndpointAuthentication()
                .kind(CredentialKind.CREDENTIAL_KIND_API_KEY)
                .apiKey(apiKeyAuth);

            // Define LLM capabilities
            LLMCapabilities capabilities = new LLMCapabilities()
                .supportsChat(true)
                .supportsCompletion(false)
                .supportsFunctionCalling(true)
                .supportsSystemMessages(true)
                .supportsStreaming(true)
                .supportsSamplingParameters(true);

            // Create LLM request
            LLMCreationRequest llmRequest = new LLMCreationRequest()
                .displayName("OpenAI GPT-4o Mini")
                .providerType(LLMProviderType.OPENAI)
                .endpointUrl("https://api.openai.com/v1")
                .modelIdentifier("gpt-4o-mini")
                .apiPath("/chat/completions")
                .credentials(credentials)
                .capabilities(capabilities)
                .description("OpenAI's GPT-4o Mini model for fast and efficient text generation");

            CreateLLMResponse response = llmsApi.createLLM(llmRequest);
            LLMResponse newLLM = response.getLlm();
            openaiLlm = newLLM;  // Store new LLM

            System.out.println("✅ Successfully registered OpenAI GPT-4o-mini LLM!");
            System.out.println("   Display Name: " + openaiLlm.getDisplayName());
            System.out.println("   LLM ID: " + openaiLlm.getLlmId());
            System.out.println("   Provider: " + openaiLlm.getProviderType());
            System.out.println("   Model: " + openaiLlm.getModelIdentifier());
        }
        
        // Print stored variable info
        if (openaiLlm != null) {
            System.out.println("\n💾 Stored for reuse:");
            System.out.println("   Variable: openaiLlm");
            System.out.println("   LLM ID: " + openaiLlm.getLlmId());
        }
    } catch (ApiException e) {
        System.out.println("❌ Error registering LLM: " + e.getMessage());
    }
}

🔧 Registering new OpenAI GPT-4o-mini LLM...
✅ Successfully registered OpenAI GPT-4o-mini LLM!
   Display Name: OpenAI GPT-4o Mini
   LLM ID: fc1b0f5b-ccef-4eda-9d41-c028f6143ae4
   Provider: OPENAI
   Model: gpt-4o-mini

💾 Stored for reuse:
   Variable: openaiLlm
   LLM ID: fc1b0f5b-ccef-4eda-9d41-c028f6143ae4


## Enhanced RAG with Reranking and LLM Generation

With the embedder, reranker, and LLM all configured, we can now run the complete RAG pipeline. It has three stages:

1. **Retrieval**: Fast semantic search finds relevant chunks
2. **Reranking**: Voyage AI reranker re-scores results for better relevance  
3. **Generation**: OpenAI GPT-4o-mini generates a coherent response using the reranked context

Compared with retrieval alone, reranking tends to put the most relevant chunks first, and the LLM turns those chunks into a direct answer instead of a list of passages.
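To make the reranking stage concrete, here is a minimal, self-contained sketch in plain Java. It does not call GoodMem: `ScoredChunk` and `selectContext` are hypothetical names, the sample texts and scores are made up, and the idea that a relevance threshold drops low-scoring chunks before the top results are kept is an assumption about how such a stage could behave, not a description of GoodMem's post-processor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: not part of the GoodMem client.
class RerankSelectionSketch {

    // A chunk of text paired with the score a reranker assigned to it.
    static class ScoredChunk {
        final String text;
        final double score;

        ScoredChunk(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    // Keep chunks scoring at least `threshold`, best first, capped at `maxResults`.
    static List<ScoredChunk> selectContext(List<ScoredChunk> reranked, double threshold, int maxResults) {
        List<ScoredChunk> kept = new ArrayList<>();
        for (ScoredChunk c : reranked) {
            if (c.score >= threshold) {
                kept.add(c);
            }
        }
        kept.sort(Comparator.comparingDouble((ScoredChunk c) -> c.score).reversed());
        return new ArrayList<>(kept.subList(0, Math.min(maxResults, kept.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration.
        List<ScoredChunk> reranked = Arrays.asList(
            new ScoredChunk("Office parking rules", 0.12),
            new ScoredChunk("Vacation accrual schedule", 0.77),
            new ScoredChunk("Time off policy", 0.86));
        for (ScoredChunk c : selectContext(reranked, 0.3, 3)) {
            System.out.println(c.score + "  " + c.text);
        }
    }
}
```

The selected chunks would then be passed to the LLM as context; in the tutorial's pipeline that hand-off happens server-side inside the post-processor.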

In [None]:
// Helper classes for RAG results
class RagChunk {
    String chunkText;
    double relevanceScore;

    RagChunk(String chunkText, double relevanceScore) {
        this.chunkText = chunkText;
        this.relevanceScore = relevanceScore;
    }
}

class RagResult {
    String llmResponse;
    List<RagChunk> chunks;

    RagResult(String llmResponse, List<RagChunk> chunks) {
        this.llmResponse = llmResponse;
        this.chunks = chunks;
    }
}

// RAG pipeline function - wraps streaming with reranking and LLM generation
/**
 * Perform semantic search with reranking and LLM generation.
 *
 * This demonstrates the complete RAG pipeline:
 * 1. Retrieval - Find relevant chunks using semantic search
 * 2. Reranking - Re-score results with reranker
 * 3. Generation - Generate answer with LLM
 *
 * @param query The search query
 * @param spaceId ID of the space to search
 * @param rerankerId ID of the reranker to use
 * @param llmId ID of the LLM for generation
 * @param maxResults Maximum number of results
 * @return RagResult containing LLM response and reranked chunks, or null if an error occurs
 */
RagResult ragPipelineStreaming(String query, String spaceId, String rerankerId, String llmId, int maxResults) {
    System.out.println("🔍 RAG Query: '" + query + "'");
    System.out.println("📁 Space ID: " + spaceId);
    System.out.println("📊 Max results: " + maxResults);
    System.out.println("======================================================================");

    try {
        // Create streaming client
        StreamingClient streamingClient = new StreamingClient(defaultClient);

        // Build post-processor configuration
        Map<String, Object> postProcessorConfig = new HashMap<>();
        postProcessorConfig.put("llm_id", llmId);           // Use passed ID
        postProcessorConfig.put("reranker_id", rerankerId);  // Use passed ID
        postProcessorConfig.put("relevance_threshold", 0.3);
        postProcessorConfig.put("max_results", maxResults);

        // Create memory stream request with post-processor
        AdvancedMemoryStreamRequest request = new AdvancedMemoryStreamRequest(query);
        request.setSpaceIds(Collections.singletonList(spaceId));
        request.setRequestedSize(maxResults);
        request.setFetchMemory(true);
        request.setFetchMemoryContent(false);
        request.setPostProcessorName("com.goodmem.retrieval.postprocess.ChatPostProcessorFactory");
        request.setPostProcessorConfig(postProcessorConfig);
        request.setFormat(StreamingClient.StreamingFormat.NDJSON);

        final String[] llmResponse = {null};
        List<RagChunk> rerankedChunks = new ArrayList<>();

        // Process streaming events
        Stream<MemoryStreamResponse> responseStream = streamingClient.retrieveMemoryStreamAdvanced(request);
        responseStream.forEach(event -> {
            // Handle LLM-generated response
            if (event.getAbstractReply() != null && llmResponse[0] == null) {
                llmResponse[0] = event.getAbstractReply().getText();
                System.out.println("\n🤖 LLM Generated Response:");
                System.out.println("   " + llmResponse[0]);
                System.out.println();
                System.out.println("----------------------------------------------------------------------");
            }

            // Handle reranked chunks
            if (event.getRetrievedItem() != null && event.getRetrievedItem().getChunk() != null) {
                StreamChunkReference chunkRef = event.getRetrievedItem().getChunk();
                Map<String, Object> chunkData = chunkRef.getChunk();

                String chunkText = (String) chunkData.get("chunkText");
                double relevanceScore = chunkRef.getRelevanceScore();
                rerankedChunks.add(new RagChunk(chunkText, relevanceScore));

                System.out.println("   📄 Chunk " + rerankedChunks.size() + ":");
                System.out.println("      Relevance: " + String.format("%.3f", relevanceScore));
                String preview = chunkText.length() > 150 ? chunkText.substring(0, 150) + "..." : chunkText;
                System.out.println("      Text: " + preview);
                System.out.println();
            }
        });

        System.out.println("======================================================================");
        System.out.println("✅ RAG completed successfully");
        System.out.println("   LLM response: " + (llmResponse[0] != null ? "✓" : "✗"));
        System.out.println("   Reranked chunks: " + rerankedChunks.size());

        return new RagResult(llmResponse[0], rerankedChunks);

    } catch (StreamingClient.StreamError e) {
        System.out.println("❌ Streaming error during RAG: " + e.getMessage());
        System.out.println("   Status code: " + e.getStatusCode());
        return null;
    } catch (Exception e) {
        System.out.println("❌ Unexpected error during RAG: " + e.getMessage());
        e.printStackTrace();
        return null;
    }
}

// Usage - uses stored variables from cells 25 and 27
if (demoSpace != null && voyageReranker != null && openaiLlm != null) {
    System.out.println("Testing Complete RAG Pipeline with Reranker + LLM\n");
    
    String testQuery = "What is the vacation policy for employees?";

    RagResult ragResult = ragPipelineStreaming(
        testQuery,
        demoSpace.getSpaceId(),
        voyageReranker.getRerankerId(),  // From cell 25 stored variable
        openaiLlm.getLlmId(),             // From cell 27 stored variable
        3
    );

    if (ragResult != null) {
        System.out.println("\n🎉 RAG pipeline completed!");
        System.out.println("   LLM Response: " + (ragResult.llmResponse != null ? "Available" : "None"));
        System.out.println("   Chunks Retrieved: " + ragResult.chunks.size());
    }
} else {
    System.out.println("⚠️  Cannot run RAG pipeline: missing space, reranker, or LLM");
    if (demoSpace == null) System.out.println("   - Missing space: Please run cell 11 first");
    if (voyageReranker == null) System.out.println("   - Missing reranker: Please run cell 25 first");
    if (openaiLlm == null) System.out.println("   - Missing LLM: Please run cell 27 first");
}

Testing Complete RAG Pipeline with Reranker + LLM

🔍 RAG Query: 'What is the vacation policy for employees?'
📁 Space ID: 845d400b-3604-4efc-9b4e-c190d0241561
📊 Max results: 3
   📄 Chunk 1:
      Relevance: 0.863
      Text: TIME OFF POLICY
All full-time employees receive:
- 15 days of paid vacation annually (increases to 20 days after 3 years)
- 10 sick days per year
- 8 ...

   📄 Chunk 2:
      Relevance: 0.824
      Text: [ORGANIZATION] has established the following vacation plan to provide eligible employees 
time off with pay so that they may be free from their regula...

   📄 Chunk 3:
      Relevance: 0.770
      Text: 1.  Eligibility 

 
All regular full-time employees are eligible for vacation benefits. 

 
2.  Accrual 

 
Eligible employees accrue vacation in acco...


🤖 LLM Generated Response:
   The vacation policy for employees states that all full-time employees receive 15 days of paid vacation annually, which increases to 20 days after three years of s
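
A note on the `final String[] llmResponse = {null}` holder used in `ragPipelineStreaming`: a Java lambda can only capture effectively final local variables, so the one-element array acts as a mutable box. One possible alternative, sketched here, is `AtomicReference`, whose `compareAndSet(null, value)` keeps only the first reply. The `Event` class is a hypothetical stand-in for a streamed event and is not a GoodMem type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only: not part of the GoodMem client.
class FirstReplySketch {

    // Stand-in for a streamed event: either field may be null.
    static class Event {
        final String reply;
        final String chunk;

        Event(String reply, String chunk) {
            this.reply = reply;
            this.chunk = chunk;
        }
    }

    // Returns the first non-null reply and appends every non-null chunk to `chunksOut`.
    static String collect(List<Event> events, List<String> chunksOut) {
        AtomicReference<String> firstReply = new AtomicReference<>();
        events.forEach(e -> {
            if (e.reply != null) {
                // Succeeds only while the reference is still null, so later replies are ignored.
                firstReply.compareAndSet(null, e.reply);
            }
            if (e.chunk != null) {
                chunksOut.add(e.chunk);
            }
        });
        return firstReply.get();
    }

    public static void main(String[] args) {
        List<String> chunks = new ArrayList<>();
        String reply = collect(Arrays.asList(
            new Event(null, "chunk one"),
            new Event("first answer", null),
            new Event("second answer", "chunk two")), chunks);
        System.out.println(reply + " / " + chunks);
    }
}
```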

## 🎉 Congratulations! What You Built

You've successfully built a complete **Retrieval-Augmented Generation (RAG) system** using GoodMem! Let's recap what you accomplished.

### Components You Configured

| Component | Purpose | What it does |
|-----------|---------|----------|
| **Embedder** | Convert text to vectors | Transform documents into semantic embeddings |
| **Space** | Organize and store documents | Logical container with chunking configuration |
| **Memories** | Store searchable content | Documents chunked and indexed for retrieval |
| **Reranker** | Improve search precision | Re-score results for better relevance |
| **LLM** | Generate natural language | Create coherent answers from retrieved context |

### The Complete RAG Pipeline

```
📄 Documents
   ↓ Chunking (256 chars, 25 overlap)
   ↓ Embedding (convert to vectors)
🗄️  Vector Storage (GoodMem Space)
   ↓
🔍 User Query
   ↓ Semantic Search (retrieve top-K)
   ↓ Reranking (re-score for precision)
   ↓ Context Selection (most relevant chunks)
🤖 LLM Generation (synthesize answer)
   ↓
✨ Natural Language Answer
```
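
To illustrate the chunking step in the diagram, here is a minimal fixed-width sliding-window chunker using the same figures (256 characters, 25 overlap). It is a simplified sketch: `chunk` is a hypothetical helper, and GoodMem's own chunker may split differently (for example on separators), so treat the output as an approximation of the idea only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not GoodMem's chunking implementation.
class ChunkingSketch {

    // Split `text` into windows of at most `size` characters, where each window
    // repeats the last `overlap` characters of the previous one.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the final window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 600; i++) {
            sb.append((char) ('a' + i % 26));
        }
        List<String> chunks = chunk(sb.toString(), 256, 25);
        System.out.println("chunks: " + chunks.size());
    }
}
```

The overlap means a sentence that straddles a window boundary still appears intact in at least one chunk, at the cost of storing some text twice.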

### Key Concepts You Learned

1. **Embedders**: Transform text into semantic vectors for similarity search
2. **Spaces**: Logical containers for organizing and searching documents
3. **Chunking**: Breaking documents into optimal sizes for retrieval
4. **Semantic Search**: Finding conceptually similar content, not just keyword matches
5. **Reranking**: Two-stage retrieval for better precision
6. **Streaming API**: Real-time, memory-efficient result processing
7. **RAG Architecture**: Combining retrieval and generation for accurate, grounded responses
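
As a small illustration of what "similarity search" over embedding vectors means, the sketch below computes cosine similarity between two vectors. This is a generic technique shown under assumptions: the tutorial does not state which distance metric GoodMem uses, and `cosine` is a hypothetical helper operating on made-up three-dimensional vectors instead of real embeddings.

```java
// Hypothetical illustration only: the metric GoodMem uses is not specified in this tutorial.
class CosineSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 when either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {0.9, 0.1, 0.0};
        double[] vacationDoc = {0.8, 0.2, 0.1};
        double[] parkingDoc = {0.0, 0.1, 0.9};
        System.out.println("query vs vacation doc: " + cosine(query, vacationDoc));
        System.out.println("query vs parking doc:  " + cosine(query, parkingDoc));
    }
}
```

Vectors pointing in similar directions score close to 1, which is how conceptually related text can match even when it shares no keywords with the query.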

### Performance Improvements

**Basic search** (retrieval only):
- Fast retrieval using vector similarity
- Good recall, but may include less relevant results

**Enhanced RAG** (with reranker + LLM):
- Reranker re-scores the retrieved chunks, which typically improves precision
- LLM synthesizes information from multiple chunks
- Better user experience with natural language answers
- Answers are grounded in actual document content, which reduces (but does not eliminate) hallucinations

### Next Steps & Advanced Topics

**Enhance Your RAG System**:
- **Multiple embedders**: Combine different embedders for better coverage
- **Custom chunking**: Tune chunk size/overlap for your content type
- **Metadata filtering**: Add filters to narrow search by document type, date, etc.
- **Hybrid search**: Combine semantic and keyword search
- **Context augmentation**: Include surrounding chunks for better LLM context
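
As one possible starting point for hybrid search, the sketch below merges a semantic ranking and a keyword ranking with reciprocal rank fusion (RRF), where each document scores the sum of `1 / (k + rank)` across the lists it appears in. This is a generic technique and not a GoodMem feature shown in this tutorial: `fuse` is a hypothetical helper, the document IDs are made up, and `k = 60` is just a commonly used constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of the GoodMem client.
class RankFusionSketch {

    // Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per document,
    // with rank starting at 1. Returns document IDs ordered by fused score, best first.
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ids;
    }

    public static void main(String[] args) {
        List<String> semantic = Arrays.asList("doc-A", "doc-B", "doc-C");
        List<String> keyword = Arrays.asList("doc-B", "doc-D", "doc-A");
        System.out.println(fuse(Arrays.asList(semantic, keyword), 60));
    }
}
```

RRF only needs ranks, not comparable scores, which makes it convenient when the two search methods produce scores on different scales.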

**Production Deployment**:
- **Monitoring**: Track query latency, relevance scores, user feedback
- **Scaling**: Horizontal scaling for high-traffic applications
- **Cost optimization**: Balance quality vs. API costs
- **Caching**: Cache frequent queries for faster responses
- **Error handling**: Robust exception handling and retry logic
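
For the retry logic mentioned above, a minimal sketch of retrying with exponential backoff might look like the following. `withRetry` is a hypothetical helper, not part of the GoodMem client; in practice you would likely retry only on transient failures (for example timeouts) instead of on every exception as this simplified version does.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only: not part of the GoodMem client.
class RetrySketch {

    // Calls `call` up to `maxAttempts` times. After a failed attempt it sleeps
    // baseDelayMillis * 2^(attempt - 1) before trying again, and rethrows the
    // last exception once all attempts are used up.
    static <T> T withRetry(Callable<T> call, int maxAttempts, long baseDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(baseDelayMillis << (attempt - 1));
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("simulated transient failure");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

A call such as the `ragPipelineStreaming` request could be wrapped in a helper like this, provided the wrapped code throws on failure instead of returning `null`.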

**Advanced Features**:
- **Multi-space search**: Query across multiple knowledge bases
- **Query expansion**: Transform queries for better retrieval
- **Result aggregation**: Combine and deduplicate results
- **Streaming generation**: Progressive LLM responses for real-time UX
- **Fine-tuning**: Customize models for your specific domain
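
As a sketch of result aggregation for multi-space search, the helper below merges per-space results, keeps the highest score for any chunk text that appears more than once, and orders the merged set by score. It assumes results are available as simple text-to-score maps, which is a simplification for illustration; `mergeBest` and the sample data are hypothetical and not part of the GoodMem client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of the GoodMem client.
class AggregationSketch {

    // Merge per-space results (chunk text -> relevance score), keeping the best
    // score for duplicated text. The returned map iterates from highest score to lowest.
    static Map<String, Double> mergeBest(List<Map<String, Double>> perSpaceResults) {
        Map<String, Double> best = new HashMap<>();
        for (Map<String, Double> results : perSpaceResults) {
            for (Map.Entry<String, Double> e : results.entrySet()) {
                best.merge(e.getKey(), e.getValue(), (a, b) -> Math.max(a, b));
            }
        }
        List<Map.Entry<String, Double>> entries = new ArrayList<>(best.entrySet());
        entries.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        Map<String, Double> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries) {
            ordered.put(e.getKey(), e.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, Double> hrSpace = new HashMap<>();
        hrSpace.put("Time off policy", 0.86);
        hrSpace.put("Vacation accrual schedule", 0.70);
        Map<String, Double> handbookSpace = new HashMap<>();
        handbookSpace.put("Time off policy", 0.79);
        handbookSpace.put("Holiday calendar", 0.55);
        System.out.println(mergeBest(Arrays.asList(hrSpace, handbookSpace)));
    }
}
```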

### Resources

- **Documentation**: [https://docs.goodmem.ai](https://docs.goodmem.ai)
- **Community**: Join discussions and share your implementations
- **Examples**: Explore more advanced use cases and patterns

---

**Great job!** You now have a solid foundation for building production RAG systems with GoodMem. 🚀
