# Semantic Search with Couchbase GSI Vector Indexes and Jina AI

## Overview

This tutorial demonstrates building a semantic search engine using Couchbase's GSI (Global Secondary Index) vector indexes, with Jina AI providing the embeddings and the language model. We'll compare vector search latency before and after creating a GSI vector index, then implement a complete RAG (Retrieval-Augmented Generation) system.

**Key Features:**
- High-performance GSI vector search with BHIVE (Hyperscale Vector) indexing
- Jina AI embeddings and language models
- A simple latency comparison before and after creating a GSI index
- Complete RAG workflow with LLM response caching

**Requirements:** Couchbase Server 8.0+ or Capella with the Query and Index Services enabled.

## How to Run This Tutorial

This tutorial is available as a Jupyter Notebook that you can run interactively on [Google Colab](https://colab.research.google.com/) or locally by setting up the Python environment.

## Prerequisites

### System Requirements

- **Couchbase Server 8.0+** or Couchbase Capella
- **Query and Index Services enabled** (required for GSI vector indexes)
- **Jina AI API credentials** ([Get them here](https://jina.ai/))
- **JinaChat API credentials** ([Get them here](https://chat.jina.ai/api))

### Couchbase Capella Setup

1. **Create Account:** Deploy a [free tier cluster](https://cloud.couchbase.com/sign-up)
2. **Configure Access:** Set up database credentials and network security
3. **Enable Query and Index Services:** Both are required for GSI vector search

## Setup and Installation

### Install Required Libraries

Install the necessary packages for Couchbase GSI vector search, Jina AI integration, and LangChain RAG capabilities.

```python
# Jina's LangChain chat integration does not support openai versions other than 0.27
%pip install --quiet datasets==3.6.0 langchain-couchbase==0.5.0rc1 langchain-community==0.3.24 openai==0.27 python-dotenv==1.1.0 ipywidgets
```

```
Note: you may need to restart the kernel to use updated packages.
```


### Import Required Modules

Import libraries for Couchbase GSI vector search, Jina AI models, and LangChain components.

```python
import logging
import os
import time
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import CouchbaseException
from couchbase.management.buckets import CreateBucketSettings
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_community.chat_models import JinaChat
from langchain_community.embeddings import JinaEmbeddings
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import (
    CouchbaseQueryVectorStore,
    DistanceStrategy,
    IndexType,
)
```

### Configure Logging

Set up logging to track progress and capture any errors during execution.

```python
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Reduce noise from HTTP client loggers: only warnings and errors are shown
logging.getLogger('openai').setLevel(logging.WARNING)
logging.getLogger('httpx').setLevel(logging.WARNING)
```

### Environment Configuration

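Load environment variables for secure access to Jina AI and Couchbase services. Create a `.env` file that defines at least `JINA_API_KEY` and `JINACHAT_API_KEY`; the Couchbase settings fall back to the local defaults shown in the code if they are not set.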

```python
load_dotenv("./.env")

JINA_API_KEY = os.getenv("JINA_API_KEY")
JINACHAT_API_KEY = os.getenv("JINACHAT_API_KEY")

CB_HOST = os.getenv("CB_HOST") or 'couchbase://localhost'
CB_USERNAME = os.getenv("CB_USERNAME") or 'Administrator'
CB_PASSWORD = os.getenv("CB_PASSWORD") or 'password'
CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME") or 'vector-search-testing'
INDEX_NAME = os.getenv("INDEX_NAME") or 'vector_search_jina'

SCOPE_NAME = os.getenv("SCOPE_NAME") or 'shared'
COLLECTION_NAME = os.getenv("COLLECTION_NAME") or 'jina'
CACHE_COLLECTION = os.getenv("CACHE_COLLECTION") or 'cache'

# Check if the variables are correctly loaded
if not JINA_API_KEY:
    raise ValueError("JINA_API_KEY environment variable is not set")
if not JINACHAT_API_KEY:
    raise ValueError("JINACHAT_API_KEY environment variable is not set")
```

## Couchbase Connection Setup

### Connect to Cluster

Establish connection to Couchbase cluster for vector storage and retrieval operations.

```python
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") from e
```

```
2025-10-08 11:18:34,736 - INFO - Successfully connected to Couchbase
```


### Setup Collections

The `setup_collection()` function creates and configures the hierarchical data organization in Couchbase:

1. Bucket Creation:
   - Checks whether the specified bucket exists and creates it if not
   - Sets bucket properties such as the RAM quota (1024 MB) and replication (disabled)
   - Note: This code cannot create a bucket on Capella, so create the bucket there beforehand

2. Scope Management:
   - Checks whether the requested scope exists within the bucket
   - Creates the scope if needed (unless it is the `_default` scope)

3. Collection Setup:
   - Checks whether the collection exists within the scope
   - Creates the collection if it doesn't exist
   - Waits 2 seconds for the collection to be ready

Additional Tasks:
- Deletes any existing documents so each run starts from a clean state

The function is called twice to set up:
1. The main collection for vector embeddings
2. The cache collection for storing cached LLM responses


```python
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)

        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists (re-read scopes in case one was just created)
        scopes = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in scopes
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")

setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION)
```

```
2025-10-08 11:18:36,208 - INFO - Bucket 'vector-search-testing' exists.
2025-10-08 11:18:36,219 - INFO - Collection 'jina' already exists. Skipping creation.
2025-10-08 11:18:38,322 - INFO - All documents cleared from the collection.
2025-10-08 11:18:38,322 - INFO - Bucket 'vector-search-testing' exists.
2025-10-08 11:18:38,327 - INFO - Collection 'jina_cache' already exists. Skipping creation.
2025-10-08 11:18:40,480 - INFO - All documents cleared from the collection.

<couchbase.collection.Collection at 0x127cdee90>
```

## Document Processing and Vector Store

### Create Jina Embeddings

Set up Jina AI embeddings to convert text into high-dimensional vectors that capture semantic meaning for similarity search.

```python
try:
    embeddings = JinaEmbeddings(
        jina_api_key=JINA_API_KEY, model_name="jina-embeddings-v3"
    )
    logging.info("Successfully created JinaEmbeddings")
except Exception as e:
    raise ValueError(f"Error creating JinaEmbeddings: {str(e)}")
```

```
2025-10-08 11:18:56,191 - INFO - Successfully created JinaEmbeddings
```


### Create GSI Vector Store

Set up the GSI vector store for high-performance vector storage and similarity search using Couchbase's Query Service.

```python
try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DistanceStrategy.COSINE
    )
    logging.info("Successfully created GSI vector store")
except Exception as e:
    raise ValueError(f"Failed to create GSI vector store: {str(e)}")
```

```
2025-10-08 11:18:57,341 - INFO - Successfully created GSI vector store
```


### Index Creation Timing

**Important**: GSI vector indexes must be created AFTER uploading vector data. The index creation process analyzes the existing vectors to optimize search performance through clustering and quantization, which is why this tutorial loads the documents first and creates the index later.

### Load Dataset

Load the BBC News dataset (the `2024-12` configuration of `RealTimeData/bbc_news_alltime`), which provides real news articles on a range of topics for testing.

```python
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
```

```
2025-10-08 11:19:03,903 - INFO - Successfully loaded the BBC News dataset with 2687 rows.

Loaded the BBC News dataset with 2687 rows
```


#### Clean Data

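Remove empty and duplicate articles to ensure clean search results. Deduplicating with `dict.fromkeys` keeps the original article order (a Python `set` of strings does not, because string hashing is randomized per process), so the subset selected in the next step is the same on every run.

```python
news_articles = news_dataset["content"]
# Drop empty entries and duplicates while preserving first-seen order
unique_news_articles = list(dict.fromkeys(article for article in news_articles if article))
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

```
We have 1749 unique articles in our database.
```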


#### Store Data

Process articles in batches and store them in the vector database with embeddings. We'll use 60% of the dataset for faster processing while maintaining good search quality.

```python
# Use 60% of the unique articles, rounded to the nearest integer
dataset_size = len(unique_news_articles)
subset_size = round(dataset_size * 0.6)

# Take the subset and skip very long articles (over 50,000 characters)
filtered_articles = [article for article in unique_news_articles[:subset_size]
                     if article and len(article) <= 50000]

# Number of texts embedded and written per batch
batch_size = 50

try:
    vector_store.add_texts(
        texts=filtered_articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully")

except CouchbaseException as e:
    logging.error(f"Couchbase error during ingestion: {str(e)}")
    raise RuntimeError(f"Error performing document ingestion: {str(e)}") from e
except Exception as e:
    if "Payment Required" in str(e):
        logging.error("Payment required for Jina AI API. Please check your subscription status and API key.")
        print("To resolve this error:")
        print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options")
        print("2. Ensure your API key is valid and has sufficient credits")
        print("3. Consider upgrading your subscription plan if needed")
    else:
        logging.error(f"Unexpected error during ingestion: {str(e)}")
    # Stop in either case: the following cells assume the documents were stored
    raise RuntimeError(f"Failed to save documents to vector store: {str(e)}") from e
```

```
2025-10-08 11:20:18,363 - INFO - Document ingestion completed successfully
```
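To make the subset and batch arithmetic concrete, here is a small sketch. It relies only on the count reported above (1,749 unique articles) and uses two hypothetical helpers, `subset_size` and `batches`. `add_texts` already batches internally via `batch_size`, so this only illustrates how a list splits into groups of 50; it is not how langchain-couchbase implements batching.

```python
from typing import Iterator, List, Sequence


def subset_size(total: int, fraction: float = 0.6) -> int:
    """Number of articles kept when using `fraction` of the dataset, rounded as in the cell above."""
    return round(total * fraction)


def batches(items: Sequence[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items (illustration only)."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

With 1,749 unique articles, the 60% subset holds 1,049 articles before the length filter, which at `batch_size=50` means 21 batches, the last holding 49. The length filter can only lower these counts.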


## Vector Search Performance Testing

Now let's look at how a GSI vector index affects pure vector search latency. We'll run three tests:

1. **Baseline**: Vector search before any GSI vector index exists
2. **GSI-Optimized**: Vector search after creating a BHIVE GSI index
3. **Cache**: Enable a Couchbase cache and repeat a query, to show what caching does and does not speed up

### GSI Vector Index Types Overview

Before we start testing, let's understand the index types available:

**Hyperscale Vector Indexes (BHIVE):**
- **Best for**: Pure vector searches - content discovery, recommendations, semantic search
- **Performance**: High performance with low memory footprint, designed to scale to billions of vectors
- **Optimization**: Optimized for concurrent operations, supports simultaneous searches and inserts
- **Use when**: You primarily perform vector-only queries without complex scalar filtering
- **Ideal for**: Large-scale semantic search, recommendation systems, content discovery

**Composite Vector Indexes:**
- **Best for**: Filtered vector searches that combine vector search with scalar value filtering
- **Performance**: Efficient pre-filtering where scalar attributes reduce the vector comparison scope
- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data
- **Ideal for**: Compliance-based filtering, user-specific searches, time-bounded queries
- **Note**: Scalar filters take precedence over vector similarity

**Choosing the Right Index Type:**
- Start with Hyperscale Vector Index for pure vector searches and large datasets
- Use Composite Vector Index when scalar filters significantly reduce your search space
- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions

For this tutorial, we'll use **BHIVE** as it's optimized for pure semantic search scenarios.

### Index Configuration Details

The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

**Format**: `'IVF[<centroids>],{PQ|SQ}<settings>'`

#### **IVF (Inverted File Index) - Centroids Configuration**
- **Purpose**: Controls how the dataset is subdivided into clusters for faster searches
- **Trade-offs**: More centroids = faster searches but slower training time
- **Auto-selection**: If omitted (e.g., `IVF,SQ8`), Couchbase automatically selects the optimal number based on dataset size
- **Manual setting**: Specify exact count (e.g., `IVF1000,SQ8` for 1000 centroids)
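The centroid setting is easiest to picture with a toy IVF index. The sketch below is a generic, pure-Python illustration under our own assumptions (Euclidean distance, naive k-means training, and a hypothetical `nprobe` parameter for how many clusters to scan); it is not Couchbase's internal implementation. It also shows why the index is built after data is loaded: the centroids are learned from existing vectors.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def nearest_centroid(v: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to v (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))


def train_centroids(vectors: List[Vector], k: int, iters: int = 10, seed: int = 0) -> List[List[float]]:
    """Plain k-means (Lloyd's algorithm): training needs existing vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters: Dict[int, List[Vector]] = {i: [] for i in range(k)}
        for v in vectors:
            clusters[nearest_centroid(v, centroids)].append(v)
        for i, members in clusters.items():
            if members:  # leave an empty cluster's centroid where it was
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Inverted lists: centroid id -> ids of the vectors assigned to that centroid."""
    lists: Dict[int, List[int]] = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(vid)
    return lists


def ivf_search(query: Vector, vectors: List[Vector], centroids: List[Vector],
               lists: Dict[int, List[int]], k: int = 3, nprobe: int = 1) -> List[Tuple[int, float]]:
    """Score only the vectors in the `nprobe` clusters nearest the query, not the whole dataset."""
    probe = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))[:nprobe]
    candidates = [vid for c in probe for vid in lists[c]]
    scored = sorted(((vid, math.dist(query, vectors[vid])) for vid in candidates), key=lambda t: t[1])
    return scored[:k]
```

More centroids means smaller inverted lists, so each probe scans fewer vectors, while training has more clusters to fit. That mirrors the trade-off described above.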

#### **Quantization Options - Vector Compression**

**SQ (Scalar Quantization)**
- **Purpose**: Compresses vectors by reducing precision of individual components
- **Settings**: `SQ4`, `SQ6`, `SQ8` (4-bit, 6-bit, 8-bit precision)
- **Trade-off**: Lower bits = more compression but less precision
- **Best for**: General-purpose applications where some precision loss is acceptable
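A minimal sketch of scalar quantization follows, assuming a simple scheme that learns each dimension's minimum and maximum and maps values onto 2^bits evenly spaced levels. Couchbase's actual SQ encoding may differ in its details; the point is that SQ8 stores one byte per dimension instead of a four-byte float, at the cost of a small rounding error.

```python
from typing import List, Sequence, Tuple


def sq_train(vectors: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Learn per-dimension minimum and maximum values from the data."""
    dims = list(zip(*vectors))
    return [min(d) for d in dims], [max(d) for d in dims]


def sq_encode(v: Sequence[float], lo: List[float], hi: List[float], bits: int = 8) -> List[int]:
    """Map each component to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1  # 255 for 8 bits, 15 for 4 bits
    codes = []
    for x, a, b in zip(v, lo, hi):
        span = (b - a) or 1.0  # avoid division by zero on constant dimensions
        t = min(max((x - a) / span, 0.0), 1.0)  # clamp into [0, 1]
        codes.append(round(t * levels))
    return codes


def sq_decode(codes: Sequence[int], lo: List[float], hi: List[float], bits: int = 8) -> List[float]:
    """Approximate reconstruction of the original components."""
    levels = (1 << bits) - 1
    return [a + c / levels * ((b - a) or 1.0) for c, a, b in zip(codes, lo, hi)]
```

Each reconstructed component is off by at most half a quantization step, that is, `(max - min) / (2 * (2**bits - 1))`, which is why fewer bits trade precision for compression.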

**PQ (Product Quantization)**
- **Purpose**: Advanced compression using subquantizers for better precision
- **Format**: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8` = 32 subquantizers of 8 bits each)
- **Trade-off**: More complex but often better precision than SQ at similar compression ratios
- **Best for**: Applications requiring high precision with significant compression
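Product quantization splits each vector into `m` subvectors and stores, for each one, the index of the nearest codeword in that subspace's codebook. Codebooks are normally learned with k-means during training, and with 8 bits per subquantizer each codebook has 2^8 = 256 entries. The sketch below assumes the codebooks are already trained and uses tiny hand-picked ones; it illustrates the general technique, not Couchbase's implementation.

```python
import math
from typing import List, Sequence

Codebook = List[List[float]]  # the codewords for one subspace


def split(v: Sequence[float], m: int) -> List[List[float]]:
    """Cut a vector into m equal-length subvectors."""
    if len(v) % m:
        raise ValueError("vector length must be divisible by the number of subquantizers")
    step = len(v) // m
    return [list(v[i * step:(i + 1) * step]) for i in range(m)]


def pq_encode(v: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """One code per subvector: the index of its nearest codeword."""
    codes = []
    for sub, book in zip(split(v, len(codebooks)), codebooks):
        codes.append(min(range(len(book)), key=lambda j: math.dist(sub, book[j])))
    return codes


def pq_decode(codes: Sequence[int], codebooks: List[Codebook]) -> List[float]:
    """Approximate vector: concatenate the chosen codewords."""
    out: List[float] = []
    for c, book in zip(codes, codebooks):
        out.extend(book[c])
    return out
```

A `PQ32x8` code is therefore 32 one-byte indices per vector, regardless of the embedding dimension.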

#### **Common Configuration Examples**

```
IVF,SQ8          # Auto-selected centroids with 8-bit scalar quantization (recommended default)
IVF1000,SQ6      # 1000 centroids with 6-bit scalar quantization (higher compression)
IVF,PQ32x8       # Auto-selected centroids with 32 subquantizers of 8 bits each
IVF500,PQ16x4    # 500 centroids with 16 subquantizers of 4 bits each (high compression)
```
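To sanity-check a description string and estimate the compressed code size, here is a hypothetical helper. The regular expression only covers the forms shown above (it is our assumption, not Couchbase's grammar), and the size estimate counts only the per-vector code, ignoring centroids, codebooks, and document ids. The usage figures assume 1024-dimensional embeddings, which we believe is the default output size of `jina-embeddings-v3`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for "IVF[<centroids>],SQ<bits>" or "IVF[<centroids>],PQ<m>x<bits>"
_PATTERN = re.compile(r"^IVF(?P<centroids>\d*),(?:SQ(?P<sq>\d+)|PQ(?P<m>\d+)x(?P<bits>\d+))$")


@dataclass
class IndexDescription:
    centroids: Optional[int]   # None means "let Couchbase choose"
    scheme: str                # "SQ" or "PQ"
    sq_bits: Optional[int] = None
    pq_subquantizers: Optional[int] = None
    pq_bits: Optional[int] = None


def parse_index_description(text: str) -> IndexDescription:
    match = _PATTERN.match(text.strip())
    if not match:
        raise ValueError(f"unrecognised index_description: {text!r}")
    centroids = int(match["centroids"]) if match["centroids"] else None
    if match["sq"]:
        return IndexDescription(centroids, "SQ", sq_bits=int(match["sq"]))
    return IndexDescription(centroids, "PQ", pq_subquantizers=int(match["m"]), pq_bits=int(match["bits"]))


def code_bytes_per_vector(desc: IndexDescription, dim: int) -> float:
    """Approximate size of the compressed code alone, in bytes."""
    if desc.scheme == "SQ":
        return dim * desc.sq_bits / 8
    return desc.pq_subquantizers * desc.pq_bits / 8
```

For 1024 dimensions, a raw float32 vector takes 4,096 bytes, while the estimates give 1,024 bytes for `IVF,SQ8`, 768 for `IVF1000,SQ6`, 32 for `IVF,PQ32x8`, and 8 for `IVF500,PQ16x4`.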

#### **Performance Considerations**

**Distance Interpretation**: In GSI vector search, lower distance values indicate higher similarity, while higher distance values indicate lower similarity.
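The distances printed in the tests below are easier to read with a definition in hand. Assuming the `COSINE` metric reports distance as 1 minus cosine similarity (a common convention; check the Couchbase documentation for the exact formula), a small sketch of the computation and the resulting ranking looks like this. `rank_by_distance` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank_by_distance(query: Sequence[float], candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Candidate ids sorted from most to least similar (smallest distance first)."""
    return sorted(candidates, key=lambda cid: cosine_distance(query, candidates[cid]))
```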

**Scalability**: BHIVE indexes can scale to billions of vectors with optimized concurrent operations, making them suitable for large-scale production deployments.

For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings).

For more information on GSI Vector Indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/vector-search.html).

### Vector Search Test Function

This helper times a single `similarity_search_with_score` call with `k=3` and prints the top result. The measured time includes generating the query embedding through the Jina API, so network latency is part of every number below.

```python
def test_vector_search_performance(vector_store, query, label="Vector Search"):
    """Time one similarity search (query embedding + vector search) and print the top result."""
    print(f"\n[{label}] Testing vector search performance")
    print(f"[{label}] Query: '{query}'")

    # perf_counter is a monotonic, high-resolution clock suited to measuring intervals
    start_time = time.perf_counter()

    try:
        results = vector_store.similarity_search_with_score(query, k=3)
        search_time = time.perf_counter() - start_time

        print(f"[{label}] Vector search completed in {search_time:.4f} seconds")
        print(f"[{label}] Found {len(results)} documents")

        if results:
            doc, distance = results[0]
            print(f"[{label}] Top result distance: {distance:.6f} (lower = more similar)")
            preview = doc.page_content[:100] + "..." if len(doc.page_content) > 100 else doc.page_content
            print(f"[{label}] Top result preview: {preview}")

        return search_time
    except Exception as e:
        print(f"[{label}] Vector search failed: {str(e)}")
        return None
```
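Because each test below is a single timed run that includes the Jina embedding request, one slow network round trip can swing the result. A steadier approach is to warm up once and report the median of several runs. The `benchmark` helper below is a hypothetical sketch for that purpose and works with any zero-argument callable.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Call `fn` `warmup` times untimed, then `repeats` times timed; return stats in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn()
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }
```

For example, running `benchmark(lambda: vector_store.similarity_search_with_score(test_query, k=3))` once before and once after creating the index, with the same query, gives a fairer comparison. Timing `embeddings.embed_query(test_query)` the same way shows how much of each search is spent on the embedding API.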

### Test 1: Baseline Performance (No GSI Index)

Run a vector search before any GSI vector index exists.

```python
# Test baseline vector search performance without GSI index
test_query = "What was manchester city manager pep guardiola's reaction to the team's current form?"
print("Testing baseline vector search performance without GSI optimization...")
baseline_time = test_vector_search_performance(vector_store, test_query, "Baseline Search")
if baseline_time is not None:
    print(f"\nBaseline vector search time (without GSI): {baseline_time:.4f} seconds\n")
```

```
Testing baseline vector search performance without GSI optimization...

[Baseline Search] Testing vector search performance
[Baseline Search] Query: 'What was manchester city manager pep guardiola's reaction to the team's current form?'
[Baseline Search] Vector search completed in 0.8305 seconds
[Baseline Search] Found 3 documents
[Baseline Search] Top result distance: 0.457932 (lower = more similar)
[Baseline Search] Top result preview: 'Promised change, but Juventus are back in crisis'

"We have entirely changed the way we think about...

Baseline vector search time (without GSI): 0.8305 seconds
```



### Create BHIVE GSI Index

Now let's create a BHIVE GSI vector index to enable high-performance vector searches. The index creation is done programmatically through the vector store.

```python
# Create GSI Vector Index for high-performance searches
print("Creating BHIVE GSI vector index...")
try:
    vector_store.create_index(
        index_type=IndexType.BHIVE,  # Use IndexType.COMPOSITE for a Composite index
        index_description="IVF,SQ8"
    )
    print("GSI Vector index created successfully")

    # Fixed pause for the index to become available; larger datasets may need longer
    print("Waiting for index to become available...")
    time.sleep(5)

except Exception as e:
    if "already exists" in str(e).lower():
        print("GSI Vector index already exists, proceeding...")
    else:
        print(f"Error creating GSI index: {str(e)}")
```

```
Creating BHIVE GSI vector index...
GSI Vector index created successfully
Waiting for index to become available...
```
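A fixed `time.sleep(5)` may be too short for larger datasets, since the index has to train and build before it serves queries. A more robust pattern is to poll until a condition holds. The `wait_until` helper below is a generic, hypothetical sketch; it does not talk to Couchbase itself.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds pass; return whether it succeeded."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A predicate could, for instance, query `system:indexes` for the new index and check that its `state` is `online`. We have not confirmed the exact index name that `create_index` generates, so adapt such a query to your setup.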


### Alternative: Composite Index Configuration

If your use case requires complex filtering with scalar attributes, you can create a **Composite index** instead by changing the configuration above:

```python
# Alternative: Create a Composite index for filtered searches
vector_store.create_index(
    index_type=IndexType.COMPOSITE,  # Instead of IndexType.BHIVE
    index_description="IVF,SQ8"      # Same quantization settings
)
```

### Test 2: GSI-Optimized Performance

Run a vector search now that the BHIVE index exists. Note that this test uses a different query from Test 1 and, like Test 1, is a single timed run that includes the embedding API call, so treat the comparison as indicative rather than a rigorous benchmark.

```python
# Test vector search performance with GSI index
gsi_test_query = "What happened in the latest Premier League matches?"
print("Testing vector search performance with BHIVE GSI optimization...")
gsi_time = test_vector_search_performance(vector_store, gsi_test_query, "GSI-Optimized Search")
```

```
Testing vector search performance with BHIVE GSI optimization...

[GSI-Optimized Search] Testing vector search performance
[GSI-Optimized Search] Query: 'What happened in the latest Premier League matches?'
[GSI-Optimized Search] Vector search completed in 0.6452 seconds
[GSI-Optimized Search] Found 3 documents
[GSI-Optimized Search] Top result distance: 0.394714 (lower = more similar)
[GSI-Optimized Search] Top result preview: The latest updates and analysis from the BBC.
```


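### Test 3: Cache Behavior

Next, enable a Couchbase cache and run the same query twice. **Note**: `CouchbaseCache` is registered through `set_llm_cache`, which makes it LangChain's global LLM cache. It stores language-model responses (used later by the RAG chain) and is not consulted by `similarity_search_with_score`, so any timing difference between the two vector search runs below comes from run-to-run variance, such as Jina embedding API latency and connection warm-up, rather than from cache hits.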

```python
# Register a Couchbase-backed LLM cache, stored in the dedicated cache collection
print("Setting up Couchbase cache for improved performance on repeated queries...")
cache = CouchbaseCache(
    cluster=cluster,
    bucket_name=CB_BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=CACHE_COLLECTION,
)
set_llm_cache(cache)
print("✓ Couchbase cache enabled!")
```

```
Setting up Couchbase cache for improved performance on repeated queries...
✓ Couchbase cache enabled!
```
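To make clear what an LLM cache does and does not cover, here is a toy in-memory analogue. LangChain's cache interface looks entries up by the prompt plus a string describing the LLM and its settings; the class below mimics that keying, but it is our own illustration, not `CouchbaseCache`'s code. Only calls that go through the language model reach it, which is why embedding and vector search are unaffected.

```python
from typing import Callable, Dict, Tuple


class InMemoryLLMCache:
    """Toy LLM response cache keyed on (prompt, llm_string)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}
        self.hits = 0
        self.misses = 0

    def generate(self, prompt: str, llm_string: str, call_model: Callable[[str], str]) -> str:
        """Return a cached response if this exact prompt and model config were seen before."""
        key = (prompt, llm_string)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call_model(prompt)  # only a miss pays for the model call
        self._store[key] = result
        return result
```

In this tutorial, the cache therefore helps when the RAG chain sends an identical prompt to JinaChat again, for example when the same question retrieves the same context.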


```python
# Run the same query twice. CouchbaseCache caches LLM responses only,
# so it does not apply to this vector search; compare the timings as run-to-run variance.
cache_test_query = "What are the latest football transfer developments?"

print("Testing cache benefits with vector search...")
print("First execution (cache miss):")
cache_time_1 = test_vector_search_performance(vector_store, cache_test_query, "Cache Test - First Run")

print("\nSecond execution (cache hit - should be faster):")
cache_time_2 = test_vector_search_performance(vector_store, cache_test_query, "Cache Test - Second Run")
```

```
Testing cache benefits with vector search...
First execution (cache miss):

[Cache Test - First Run] Testing vector search performance
[Cache Test - First Run] Query: 'What are the latest football transfer developments?'
[Cache Test - First Run] Vector search completed in 0.9695 seconds
[Cache Test - First Run] Found 3 documents
[Cache Test - First Run] Top result distance: 0.394020 (lower = more similar)
[Cache Test - First Run] Top result preview: The latest updates and analysis from the BBC.

Second execution (cache hit - should be faster):

[Cache Test - Second Run] Testing vector search performance
[Cache Test - Second Run] Query: 'What are the latest football transfer developments?'
[Cache Test - Second Run] Vector search completed in 0.5252 seconds
[Cache Test - Second Run] Found 3 documents
[Cache Test - Second Run] Top result distance: 0.394020 (lower = more similar)
[Cache Test - Second Run] Top result preview: The latest updates and analysis from the BBC.
```


### Vector Search Performance Analysis

Let's summarize the timings from the three tests:

```python
print("\n" + "="*80)
print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY")
print("="*80)

print(f"Phase 1 - Baseline Search (No GSI):     {baseline_time:.4f} seconds")
print(f"Phase 2 - GSI-Optimized Search:         {gsi_time:.4f} seconds")
if cache_time_1 and cache_time_2:
    print(f"Phase 3 - Cache Benefits:")
    print(f"  First execution (cache miss):         {cache_time_1:.4f} seconds")
    print(f"  Second execution (cache hit):         {cache_time_2:.4f} seconds")

print("\n" + "-"*80)
print("VECTOR SEARCH OPTIMIZATION IMPACT:")
print("-"*80)

# GSI improvement analysis
if baseline_time and gsi_time:
    speedup = baseline_time / gsi_time if gsi_time > 0 else float('inf')
    time_saved = baseline_time - gsi_time
    percent_improvement = (time_saved / baseline_time) * 100
    print(f"GSI Index Benefit:      {speedup:.2f}x faster ({percent_improvement:.1f}% improvement)")

# Cache improvement analysis
if cache_time_1 and cache_time_2 and cache_time_2 < cache_time_1:
    cache_speedup = cache_time_1 / cache_time_2
    cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100
    print(f"Cache Benefit:          {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)")
else:
    print(f"Cache Benefit:          Variable (depends on query complexity and caching mechanism)")

print(f"\nKey Insights for Vector Search Performance:")
print(f"• GSI BHIVE indexes provide significant performance improvements for vector similarity search")
print(f"• Performance gains are most dramatic for complex semantic queries")
print(f"• BHIVE optimization is particularly effective for high-dimensional embeddings")
print(f"• Combined with proper quantization (SQ8), GSI delivers production-ready performance")
print(f"• These performance improvements directly benefit any application using the vector store")
```

```
================================================================================
VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY
================================================================================
Phase 1 - Baseline Search (No GSI):     0.8305 seconds
Phase 2 - GSI-Optimized Search:         0.6452 seconds
Phase 3 - Cache Benefits:
  First execution (cache miss):         0.9695 seconds
  Second execution (cache hit):         0.5252 seconds

--------------------------------------------------------------------------------
VECTOR SEARCH OPTIMIZATION IMPACT:
--------------------------------------------------------------------------------
GSI Index Benefit:      1.29x faster (22.3% improvement)
Cache Benefit:          1.85x faster (45.8% improvement)

Key Insights for Vector Search Performance:
• GSI BHIVE indexes provide significant performance improvements for vector similarity search
• Performance gains are most dramatic for complex semantic queries
• BHIVE optimization is particularly effective for high-dimensional embeddings
• Combined with proper quantization (SQ8), GSI delivers production-ready performance
• These performance improvements directly benefit any application using the vector store
```

Read these numbers with care. Each comes from a single run, the baseline and GSI tests used different queries, and every timing includes a call to the Jina embedding API, so network latency can dominate on a collection of roughly a thousand documents. The `Cache Benefit` line reflects run-to-run variance rather than cache hits, because the Couchbase LLM cache is not used by vector search. Likewise, the printed "Key Insights" are general expectations about BHIVE indexes, not conclusions drawn from this measurement. For a fairer comparison, time the same query several times before and after creating the index.

## Jina AI RAG Demo

### What is RAG (Retrieval-Augmented Generation)?

With the vector index in place, let's build a complete RAG system using Jina AI. RAG combines the semantic search we just set up with language model generation:

1. **Query Processing**: The user's question is converted into a vector embedding using Jina AI
2. **Document Retrieval**: The GSI BHIVE index finds the most relevant documents
3. **Context Assembly**: The retrieved documents provide factual context for the language model
4. **Response Generation**: Jina's language model generates an answer grounded in the retrieved data

Retrieval runs on every RAG request, so improvements to vector search latency carry over to the pipeline, although LLM generation typically accounts for most of the response time.
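The four steps above can be traced end to end in a self-contained toy. In this sketch a bag-of-words counter stands in for Jina embeddings and a linear scan stands in for the GSI index; both substitutions, and the helper names, are our assumptions for illustration. The prompt it builds is what step 4 would send to the language model.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Stand-in for Jina embeddings: a bag-of-words term-count vector (step 1)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, docs: List[str], k: int = 2) -> List[str]:
    """Step 2: rank every document by similarity (a real system uses the vector index)."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine_similarity(q, embed(d)), reverse=True)[:k]


def build_prompt(question: str, context_docs: List[str]) -> str:
    """Step 3: assemble the retrieved text into the prompt handed to the LLM in step 4."""
    context = "\n\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
```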

### Create Jina Language Model

Initialize Jina's chat model, which will generate answers from the documents retrieved through the GSI-indexed vector store.

```python
print("Setting up Jina AI language model for RAG demo...")
try:
    llm = JinaChat(temperature=0.1, jinachat_api_key=JINACHAT_API_KEY)
    print("✓ JinaChat language model created successfully")
    logging.info("Successfully created JinaChat")
except Exception as e:
    print(f"✗ Error creating JinaChat: {str(e)}")
    print("Please check your JINACHAT_API_KEY and network connection.")
    raise
```

```
2025-10-08 11:24:30,099 - INFO - Successfully created JinaChat

Setting up Jina AI language model for RAG demo...
✓ JinaChat language model created successfully
```


### Build Optimized RAG Pipeline

Create the complete RAG pipeline that integrates our GSI-optimized vector search with Jina's language model.

```python
try:
    # Create RAG prompt template for structured responses
    template = """You are a helpful assistant that answers questions based on the provided context. 
    If you cannot answer based on the context provided, respond with a generic answer. 
    Answer the question as truthfully as possible using the context below:
    
    Context:
    {context}

    Question: {question}
    
    Answer:"""

    prompt = ChatPromptTemplate.from_template(template)

    # Build the RAG chain: GSI-Optimized Retrieval → Context → Generation → Output
    rag_chain = (
        {
            "context": vector_store.as_retriever(search_kwargs={"k": 2}),
            "question": RunnablePassthrough()
        }
        | prompt
        | llm
        | StrOutputParser()
    )
    print("Optimized RAG pipeline created successfully")
    print("Components: GSI BHIVE Vector Search → Context Assembly → Jina Language Model → Response")
except Exception as e:
    raise ValueError(f"Error creating RAG pipeline: {str(e)}")
```

```
Optimized RAG pipeline created successfully
Components: GSI BHIVE Vector Search → Context Assembly → Jina Language Model → Response
```
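The retriever returns `Document` objects, and when a list of them is substituted into `{context}`, the prompt receives their Python representations, metadata included. A common refinement is to join only the page text first. The `format_docs` function below is a hypothetical sketch, and `DemoDoc` is a stand-in for LangChain's `Document` so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class HasPageContent(Protocol):
    page_content: str


@dataclass
class DemoDoc:
    """Hypothetical stand-in for langchain_core.documents.Document in this demo."""
    page_content: str


def format_docs(docs: Iterable[HasPageContent], separator: str = "\n\n") -> str:
    """Join the documents' text so the prompt sees plain text instead of object reprs."""
    return separator.join(doc.page_content for doc in docs)
```

In the chain, you could write `"context": vector_store.as_retriever(search_kwargs={"k": 2}) | format_docs`; LCEL wraps a plain function piped after a runnable in a `RunnableLambda`.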


### RAG Demo with Optimized Search

Test the complete RAG system leveraging our GSI performance optimizations.

```python
print("Testing RAG System with GSI-Optimized Vector Search")
print("=" * 60)

try:
    # Test with a specific query
    sample_query = "What are the new eligibility rules for transgender women competing in leading women's golf tours, and what prompted these changes?"
    print(f"User Query: {sample_query}")
    print("\nProcessing with optimized pipeline...")
    print("1. Converting query to vector embedding with Jina AI")
    print("2. Searching GSI BHIVE index for relevant documents (optimized)")
    print("3. Assembling context from retrieved documents")
    print("4. Generating intelligent response with JinaChat")

    start_time = time.time()
    rag_response = rag_chain.invoke(sample_query)
    end_time = time.time()

    print(f"\nRAG Response (completed in {end_time - start_time:.2f} seconds):")
    print("-" * 60)
    print(rag_response)

except Exception as e:
    if "Payment Required" in str(e):
        print("\nPayment required for Jina AI API.")
        print("To resolve:")
        print("• Visit https://jina.ai/reader/#pricing for subscription options")
        print("• Ensure your API key is valid and has sufficient credits")
    else:
        print(f"Error: {str(e)}")
```

```
Testing RAG System with GSI-Optimized Vector Search
============================================================
User Query: What are the new eligibility rules for transgender women competing in leading women's golf tours, and what prompted these changes?

Processing with optimized pipeline...
1. Converting query to vector embedding with Jina AI
2. Searching GSI BHIVE index for relevant documents (optimized)
3. Assembling context from retrieved documents
4. Generating intelligent response with JinaChat

RAG Response (completed in 4.25 seconds):
------------------------------------------------------------
The new eligibility rules for transgender women competing in leading women's golf tours starting from 2025 prevent transgender women who have gone through male puberty from participating. Female players protesting led to these changes, as they called for policies to prevent those recorded as male at birth from competing in women's events.
```


### Multiple Query RAG Demo

Run the RAG system on a few more queries to see how it handles different topics.

```python
print("\nTesting Optimized RAG System with Multiple Queries")
print("=" * 55)

try:
    test_queries = [
        "What happened in the car incident on Shaftesbury Avenue in London?",
        "What did King Charles talk about in his recent Christmas speech?",
    ]

    for i, query in enumerate(test_queries, 1):
        print(f"\n--- RAG Query {i} ---")
        print(f"Question: {query}")

        start_time = time.time()
        response = rag_chain.invoke(query)
        end_time = time.time()

        print(f"Response (completed in {end_time - start_time:.2f} seconds): {response}")

except Exception as e:
    if "Payment Required" in str(e):
        print("Payment required for Jina AI API.")
    else:
        print(f"Error: {str(e)}")

print(f"\n✅ RAG demo completed successfully!")
print("✅ The system leverages GSI BHIVE optimization for fast document retrieval!")
print("✅ Jina AI provides high-quality embeddings and intelligent response generation!")
```

```
Testing Optimized RAG System with Multiple Queries
=======================================================

--- RAG Query 1 ---
Question: What happened in the car incident on Shaftesbury Avenue in London?
Response (completed in 3.32 seconds): ### Answer:
A 31-year-old man was arrested on suspicion of attempted murder after driving a car on the wrong side of the road in Shaftesbury Avenue, London, injuring four pedestrians. The incident was treated as an isolated incident and was not terror-related.

--- RAG Query 2 ---
Question: What did King Charles talk about in his recent Christmas speech?
Response (completed in 0.74 seconds): ### King Charles's Recent Christmas Speech Highlights:

- Visited a Christmas market at Battersea Power Station.
- Met with Apple chief Tim Cook at Apple's UK headquarters.
- Interacted with carol singers, Christmas shoppers, and stallholders.
- Explored the power station and visited stalls at the Curated Makers Market.

✅ RAG demo completed successfully!
✅ The system leverages GSI BHIVE optimization for fast document retrieval!
✅ Jina AI provides high-quality embeddings and intelligent response generation!
```

Note that the second answer describes a royal visit to Battersea Power Station rather than the content of a Christmas speech, which suggests the retrieved articles did not cover the speech itself. RAG answers can only be as accurate as the context that retrieval supplies.

## Conclusion

You've built a semantic search engine combining:
- **Couchbase GSI BHIVE indexes** for vector search at scale
- **Jina AI embeddings and language models** for embedding text and generating answers
- **A complete RAG pipeline** with Couchbase-backed LLM response caching