# Semantic Search with Couchbase GSI Vector Indexes and Jina AI

## Overview

This tutorial demonstrates building a high-performance semantic search engine using Couchbase's GSI (Global Secondary Index) vector search and Jina AI for embeddings and language models. We'll show measurable performance improvements with GSI optimization and implement a complete RAG (Retrieval-Augmented Generation) system.

**Key Features:**
- High-performance GSI vector search with BHIVE indexing
- Jina AI embeddings and language models
- Performance benchmarks showing GSI benefits
- Complete RAG workflow with caching optimization

**Requirements:** Couchbase Server 8.0+ or Capella with Query Service enabled.

## How to Run This Tutorial

This tutorial is available as a Jupyter Notebook that you can run interactively on [Google Colab](https://colab.research.google.com/) or locally by setting up the Python environment.

## Prerequisites

### System Requirements

- **Couchbase Server 8.0+** or Couchbase Capella
- **Query Service enabled** (required for GSI Vector Indexes)
- **Jina AI API credentials** ([Get them here](https://jina.ai/))
- **JinaChat API credentials** ([Get them here](https://chat.jina.ai/api))

### Couchbase Capella Setup

1. **Create Account:** Deploy a [free tier cluster](https://cloud.couchbase.com/sign-up)
2. **Configure Access:** Set up database credentials and network security  
3. **Enable Query Service:** Required for GSI vector search functionality

## Setup and Installation

### Install Required Libraries

Install the necessary packages for Couchbase GSI vector search, Jina AI integration, and LangChain RAG capabilities.

In [1]:
# JinaChat relies on the legacy (pre-1.0) openai SDK, so openai is pinned to 0.27
%pip install --quiet datasets==3.6.0 langchain-couchbase==0.5.0rc1 langchain-community==0.3.24 openai==0.27 python-dotenv==1.1.0 ipywidgets

Note: you may need to restart the kernel to use updated packages.


### Import Required Modules

Import libraries for Couchbase GSI vector search, Jina AI models, and LangChain components.

In [2]:
import logging
import os
import time
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (CouchbaseException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_community.chat_models import JinaChat
from langchain_community.embeddings import JinaEmbeddings
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy
from langchain_couchbase.vectorstores import IndexType

### Configure Logging

Set up logging to track progress and capture any errors during execution.

In [3]:
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s',force=True)

# Only show warnings and errors from the OpenAI and HTTP client libraries
logging.getLogger('openai').setLevel(logging.WARNING)
logging.getLogger('httpx').setLevel(logging.WARNING)

### Environment Configuration

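Load environment variables for secure access to Jina AI and Couchbase services. Create a `.env` file that defines at least `JINA_API_KEY` and `JINACHAT_API_KEY`; the Couchbase connection, bucket, scope, and collection settings fall back to the defaults shown below if they are not set.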

In [4]:
load_dotenv("./.env") 

JINA_API_KEY = os.getenv("JINA_API_KEY")
JINACHAT_API_KEY = os.getenv("JINACHAT_API_KEY")

CB_HOST = os.getenv("CB_HOST") or 'couchbase://localhost'
CB_USERNAME = os.getenv("CB_USERNAME") or 'Administrator'
CB_PASSWORD = os.getenv("CB_PASSWORD") or 'password'
CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME") or 'vector-search-testing'
INDEX_NAME = os.getenv("INDEX_NAME") or 'vector_search_jina'

SCOPE_NAME = os.getenv("SCOPE_NAME") or 'shared'
COLLECTION_NAME = os.getenv("COLLECTION_NAME") or 'jina'
CACHE_COLLECTION = os.getenv("CACHE_COLLECTION") or 'cache'

# Check if the variables are correctly loaded
if not JINA_API_KEY:
    raise ValueError("JINA_API_KEY environment variable is not set")
if not JINACHAT_API_KEY:
    raise ValueError("JINACHAT_API_KEY environment variable is not set")

## Couchbase Connection Setup

### Connect to Cluster

Establish connection to Couchbase cluster for vector storage and retrieval operations.

In [5]:
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") from e

2025-09-24 14:48:22,593 - INFO - Successfully connected to Couchbase


### Setup Collections

The `setup_collection()` function creates and configures the bucket, scope, and collection hierarchy in Couchbase:

1. Bucket Creation:
   - Checks whether the specified bucket exists and creates it if not
   - Sets bucket properties such as the RAM quota (1024 MB), flush (enabled), and replicas (0)
   - Note: Buckets cannot be created this way on Capella; create the bucket in the Capella UI first

2. Scope Management:
   - Checks whether the requested scope exists within the bucket
   - Creates the scope if needed (the built-in `_default` scope always exists)

3. Collection Setup:
   - Checks whether the collection exists within the scope
   - Creates the collection if it doesn't exist
   - Waits 2 seconds for the collection to become ready

Additional Tasks:
- Deletes any existing documents in the collection with a SQL++ `DELETE` so each run starts from a clean state
- Logs each step and wraps failures in a `RuntimeError`

The function is called twice to set up:
1. The main collection, which stores the articles and their embeddings
2. The cache collection, which `CouchbaseCache` uses to store LLM responses


In [6]:
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)
        
        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}") from e
    
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION)

2025-09-24 14:48:26,845 - INFO - Bucket 'vector-search-testing' does not exist. Creating it...
2025-09-24 14:48:28,901 - INFO - Bucket 'vector-search-testing' created successfully.
2025-09-24 14:48:28,906 - INFO - Scope 'shared' does not exist. Creating it...
2025-09-24 14:48:28,954 - INFO - Scope 'shared' created successfully.
2025-09-24 14:48:28,977 - INFO - Collection 'jina' does not exist. Creating it...
2025-09-24 14:48:29,021 - INFO - Collection 'jina' created successfully.
2025-09-24 14:48:31,135 - INFO - All documents cleared from the collection.
2025-09-24 14:48:31,135 - INFO - Bucket 'vector-search-testing' exists.
2025-09-24 14:48:31,141 - INFO - Collection 'jina_cache' does not exist. Creating it...
2025-09-24 14:48:31,181 - INFO - Collection 'jina_cache' created successfully.
2025-09-24 14:48:33,294 - INFO - All documents cleared from the collection.


<couchbase.collection.Collection at 0x13a84ae90>

## Document Processing and Vector Store

### Create Jina Embeddings

Set up Jina AI embeddings to convert text into vectors that capture semantic meaning for similarity search. The `jina-embeddings-v3` model returns 1024-dimensional vectors by default.

In [7]:
try:
    embeddings = JinaEmbeddings(
        jina_api_key=JINA_API_KEY, model_name="jina-embeddings-v3"
    )
    logging.info("Successfully created JinaEmbeddings")
except Exception as e:
    raise ValueError(f"Error creating JinaEmbeddings: {str(e)}")

2025-09-24 14:48:36,737 - INFO - Successfully created JinaEmbeddings


### Create GSI Vector Store

Set up the GSI vector store for high-performance vector storage and similarity search using Couchbase's Query Service.

In [8]:
try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DistanceStrategy.COSINE
    )
    logging.info("Successfully created GSI vector store")
except Exception as e:
    raise ValueError(f"Failed to create GSI vector store: {str(e)}")

2025-09-24 14:48:38,611 - INFO - Successfully created GSI vector store


### Index Creation Timing

**Important**: GSI Vector Indexes must be created AFTER uploading vector data. The index creation process analyzes existing vectors to optimize search performance through clustering and quantization.

### Load Dataset

Load the BBC News dataset for real-world testing data with authentic news articles covering various topics.

In [9]:
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")

2025-09-24 14:48:45,318 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


Loaded the BBC News dataset with 2687 rows


#### Clean Data

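Remove empty and duplicate articles so the same text is not stored and retrieved more than once.

In [10]:
news_articles = news_dataset["content"]
# dict.fromkeys drops duplicates but keeps the original order, so the 60% subset taken
# in the next cell is the same on every run (set iteration order for strings can change
# between Python sessions)
unique_news_articles = list(dict.fromkeys(article for article in news_articles if article))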
print(f"We have {len(unique_news_articles)} unique articles in our database.")

We have 1749 unique articles in our database.


#### Store Data

Process articles in batches and store them in the vector database with embeddings. To keep ingestion fast, we use the first 60% of the unique articles and skip any article longer than 50,000 characters.

In [11]:
# Calculate 60% of the dataset size and round to nearest integer
dataset_size = len(unique_news_articles)
subset_size = round(dataset_size * 0.6)

# Filter articles by length and create subset
filtered_articles = [article for article in unique_news_articles[:subset_size] 
                    if article and len(article) <= 50000]

# Process in batches
batch_size = 50

try:
    vector_store.add_texts(
        texts=filtered_articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully")
    
except CouchbaseException as e:
    logging.error(f"Couchbase error during ingestion: {str(e)}")
    raise RuntimeError(f"Error performing document ingestion: {str(e)}")
except Exception as e:
    if "Payment Required" in str(e):
        logging.error("Payment required for Jina AI API. Please check your subscription status and API key.")
        print("To resolve this error:")
        print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options")
        print("2. Ensure your API key is valid and has sufficient credits")
        print("3. Consider upgrading your subscription plan if needed")
        # Stop here: the following cells need the ingested documents
        raise RuntimeError("Jina AI API returned 'Payment Required'") from e
    else:
        logging.error(f"Unexpected error during ingestion: {str(e)}")
        raise RuntimeError(f"Failed to save documents to vector store: {str(e)}") from e

2025-09-24 14:50:04,178 - INFO - Document ingestion completed successfully
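With `batch_size=50`, the documents are grouped into batches of at most 50 before being written. The sketch below illustrates that split and its arithmetic; `batched` is a hypothetical helper written for illustration, not the internal implementation of `add_texts`, and the placeholder list assumes no article was dropped by the 50,000-character filter.

```python
from math import ceil
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 60% of the 1749 unique articles, as in the cell above: round(1049.4) == 1049
subset_size = round(1749 * 0.6)
texts = [f"article {i}" for i in range(subset_size)]  # placeholder texts

batches = list(batched(texts, batch_size=50))
print(subset_size, len(batches), len(batches[-1]))  # 1049 21 49
assert len(batches) == ceil(subset_size / 50)
```

Twenty full batches of 50 plus a final batch of 49 cover all 1049 texts; any articles removed by the length filter would shrink the last batches accordingly.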


## Performance Testing: Raw Search vs GSI vs Cache Optimization

### Understanding Vector Search Optimization Levels

We'll compare three configurations to show how each optimization affects latency with Couchbase:

1. **Raw Search**: Vector similarity search with no vector index in place, so every stored embedding has to be compared with the query
2. **GSI BHIVE Index**: Vector search served by Couchbase's Hyperscale (BHIVE) vector index
3. **Cache Optimization**: Caching LLM responses in Couchbase with `CouchbaseCache`, so repeated questions in the RAG pipeline can skip the language-model call

Each timing is a single run that includes a round trip to the Jina API to embed the query, so treat the numbers as indicative rather than as a rigorous benchmark.

### GSI Vector Index Types Overview

Before we start testing, let's understand the index types available:

**Hyperscale Vector Indexes (BHIVE):**
- **Best for**: Pure vector searches - content discovery, recommendations, semantic search
- **Performance**: High performance with low memory footprint, designed to scale to billions of vectors
- **Optimization**: Optimized for concurrent operations, supports simultaneous searches and inserts
- **Use when**: You primarily perform vector-only queries without complex scalar filtering
- **Ideal for**: Large-scale semantic search, recommendation systems, content discovery

**Composite Vector Indexes:**
- **Best for**: Filtered vector searches that combine vector search with scalar value filtering
- **Performance**: Efficient pre-filtering where scalar attributes reduce the vector comparison scope
- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data
- **Ideal for**: Compliance-based filtering, user-specific searches, time-bounded queries
- **Note**: Scalar filters take precedence over vector similarity

**Choosing the Right Index Type:**
- Start with Hyperscale Vector Index for pure vector searches and large datasets
- Use Composite Vector Index when scalar filters significantly reduce your search space
- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions

For this tutorial, we'll use **BHIVE** as it's optimized for pure semantic search scenarios.

### Index Configuration Details

The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

**Format**: `'IVF[<centroids>],{PQ|SQ}<settings>'`

#### **IVF (Inverted File Index) - Centroids Configuration**
- **Purpose**: Controls how the dataset is subdivided into clusters for faster searches
- **Trade-offs**: More centroids = faster searches but slower training time
- **Auto-selection**: If omitted (e.g., `IVF,SQ8`), Couchbase automatically selects the optimal number based on dataset size
- **Manual setting**: Specify exact count (e.g., `IVF1000,SQ8` for 1000 centroids)
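The IVF idea can be illustrated with a toy sketch: each vector is filed under its nearest centroid, and a query scans only the `nprobe` closest clusters instead of the whole dataset. This assumes hand-picked 2-D centroids (a real index trains them on the stored vectors) and Euclidean distance; it is not Couchbase's implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.dist(a, b)


def build_ivf(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists: Dict[int, List[int]] = defaultdict(list)
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(vid)
    return lists


def ivf_search(query: Vector, vectors: List[Vector], centroids: List[Vector],
               lists: Dict[int, List[int]], k: int = 2, nprobe: int = 1) -> Tuple[List[int], int]:
    """Probe the nprobe closest clusters and rank only their members."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probe for vid in lists.get(c, [])]
    ranked = sorted(candidates, key=lambda vid: l2(query, vectors[vid]))
    return ranked[:k], len(candidates)


vectors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = [(0.5, 0.5), (10.5, 10.5)]
lists = build_ivf(vectors, centroids)
hits, scanned = ivf_search((0.2, 0.1), vectors, centroids, lists, k=2, nprobe=1)
print(hits, scanned)  # [0, 2] 3  (3 distance computations instead of 6)
```

The saving comes at a cost: if a true nearest neighbour sits in a cluster that is not probed, it is missed, which is why centroid count and probe settings trade speed against recall.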

#### **Quantization Options - Vector Compression**

**SQ (Scalar Quantization)**
- **Purpose**: Compresses vectors by reducing precision of individual components
- **Settings**: `SQ4`, `SQ6`, `SQ8` (4-bit, 6-bit, 8-bit precision)
- **Trade-off**: Lower bits = more compression but less precision
- **Best for**: General-purpose applications where some precision loss is acceptable
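A minimal sketch of scalar quantization, assuming a simple per-dimension min/max scheme (Couchbase's exact encoding may differ): each component is mapped to an integer code with `bits` bits, and decoding recovers it to within half a quantization step. With 8 bits, each dimension takes 1 byte instead of a 4-byte float32.

```python
from typing import List, Sequence, Tuple


def sq_train(vectors: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Learn the per-dimension minimum and maximum from the data."""
    dims = range(len(vectors[0]))
    lo = [min(v[d] for v in vectors) for d in dims]
    hi = [max(v[d] for v in vectors) for d in dims]
    return lo, hi


def sq_encode(v: Sequence[float], lo: List[float], hi: List[float], bits: int = 8) -> List[int]:
    """Map each component to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    codes = []
    for x, a, b in zip(v, lo, hi):
        span = (b - a) or 1.0  # constant dimension: avoid dividing by zero
        q = round((x - a) / span * levels)
        codes.append(min(max(q, 0), levels))  # clamp values outside the trained range
    return codes


def sq_decode(codes: Sequence[int], lo: List[float], hi: List[float], bits: int = 8) -> List[float]:
    """Approximately reconstruct the original components from their codes."""
    levels = (1 << bits) - 1
    return [a + c / levels * ((b - a) or 1.0) for c, a, b in zip(codes, lo, hi)]


data = [[0.1, -0.5, 0.9], [0.4, 0.2, -0.3], [-0.2, 0.7, 0.0]]
lo, hi = sq_train(data)
for bits in (8, 4):
    restored = sq_decode(sq_encode(data[1], lo, hi, bits), lo, hi, bits)
    max_err = max(abs(x - y) for x, y in zip(data[1], restored))
    print(f"SQ{bits}: max reconstruction error {max_err:.4f}")
```

Dropping from 8 to 4 bits halves the storage again but makes each step 17 times coarser (15 levels instead of 255), which is the precision trade-off described above.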

**PQ (Product Quantization)**
- **Purpose**: Advanced compression using subquantizers for better precision
- **Format**: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8` = 32 subquantizers of 8 bits each)
- **Trade-off**: More complex but often better precision than SQ at similar compression ratios
- **Best for**: Applications requiring high precision with significant compression

#### **Common Configuration Examples**

```
IVF,SQ8          # Auto-selected centroids with 8-bit scalar quantization (recommended default)
IVF1000,SQ6      # 1000 centroids with 6-bit scalar quantization (higher compression)
IVF,PQ32x8       # Auto-selected centroids with 32 subquantizers of 8 bits each
IVF500,PQ16x4    # 500 centroids with 16 subquantizers of 4 bits each (high compression)
```
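To make the format concrete, the sketch below parses the four example strings above and estimates the size of each quantized vector. `parse_description` is a hypothetical helper, its regular expression only covers the forms listed in this section, and the 1024-dimension figure is an assumption based on the default output size of `jina-embeddings-v3`. The estimate ignores centroid, codebook, and index overhead.

```python
import re
from dataclasses import dataclass
from typing import Optional

# IVF[<centroids>],{SQ<bits> | PQ<subquantizers>x<bits>}
_DESC = re.compile(r"^IVF(?P<centroids>\d+)?,(?:SQ(?P<sq_bits>\d+)|PQ(?P<pq_m>\d+)x(?P<pq_bits>\d+))$")


@dataclass
class IndexDescription:
    centroids: Optional[int]  # None means Couchbase chooses the count
    quantizer: str            # "SQ" or "PQ"
    code_bits: int            # bits stored per vector after quantization


def parse_description(desc: str, dimension: int) -> IndexDescription:
    m = _DESC.match(desc)
    if not m:
        raise ValueError(f"unrecognised index description: {desc!r}")
    centroids = int(m["centroids"]) if m["centroids"] else None
    if m["sq_bits"]:
        # SQ keeps every dimension, each with the given number of bits
        return IndexDescription(centroids, "SQ", dimension * int(m["sq_bits"]))
    # PQ keeps one code per subquantizer, independent of the dimension
    return IndexDescription(centroids, "PQ", int(m["pq_m"]) * int(m["pq_bits"]))


DIM = 1024  # assumed embedding size
for desc in ["IVF,SQ8", "IVF1000,SQ6", "IVF,PQ32x8", "IVF500,PQ16x4"]:
    d = parse_description(desc, DIM)
    print(f"{desc:<14} centroids={d.centroids} bytes/vector={d.code_bits // 8} (float32: {DIM * 4})")
```

Under these assumptions, `IVF,SQ8` stores 1024 bytes per vector (4x smaller than float32), while `IVF,PQ32x8` stores only 32 bytes, which is why PQ can reach much higher compression.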

#### **Performance Considerations**

**Distance Interpretation**: In GSI vector search, lower distance values indicate higher similarity, while higher distance values indicate lower similarity.
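Assuming cosine distance is defined as 1 - cosine similarity (the usual convention; the exact values Couchbase reports may differ), the ordering works as follows: 0 for vectors pointing the same way, 1 for orthogonal vectors, and 2 for opposite ones. Sorting ascending by distance therefore puts the most similar documents first.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


query = [1.0, 0.0]
docs = {"same direction": [2.0, 0.0], "orthogonal": [0.0, 3.0], "opposite": [-1.0, 0.0]}
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
print(ranked)  # ['same direction', 'orthogonal', 'opposite']
```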

**Scalability**: BHIVE indexes can scale to billions of vectors with optimized concurrent operations, making them suitable for large-scale production deployments.

For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings).

For more information on GSI Vector Indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/vector-search.html).

### Performance Test Helper Function

In [12]:
def test_search_performance(vector_store, query, label="Search", show_results=True):
    """Test search performance and return timing metrics"""
    print(f"\n[{label}] Testing query: '{query}'")
    start_time = time.perf_counter()  # monotonic, high-resolution clock for durations
    
    try:
        results = vector_store.similarity_search_with_score(query, k=3)
        end_time = time.perf_counter()
        search_time = end_time - start_time
        
        print(f"[{label}] Search completed in {search_time:.4f} seconds")
        print(f"[{label}] Found {len(results)} documents")
        
        if results and show_results:
            doc, distance = results[0]
            print(f"[{label}] Top result distance: {distance:.6f} (lower = more similar)")
            print(f"[{label}] Top result preview: {doc.page_content[:100]}...")
        
        return search_time, results
    except Exception as e:
        print(f"[{label}] Search failed: {str(e)}")
        return None, None
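
Because every call to `test_search_performance` includes a network round trip to the Jina API, a single measurement can vary noticeably from run to run. One way to get steadier numbers is to discard a warm-up call and report the median of several runs. `time_repeated` below is a hypothetical helper, not part of the tutorial code.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_repeated(fn: Callable[[], object], runs: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Call fn several times and summarise its wall-clock latency in seconds."""
    for _ in range(warmup):
        fn()  # discard warm-up calls (connection setup, lazy initialisation)
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
        "runs": runs,
    }


# Standalone check with a dummy workload
print(time_repeated(lambda: time.sleep(0.01), runs=3))

# In the notebook (assumes vector_store and query_phase1 from the cells in this tutorial):
# time_repeated(lambda: vector_store.similarity_search_with_score(query_phase1, k=3))
```

Comparing medians of the same query before and after index creation gives a fairer picture than single runs of different queries.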

### Phase 1: Raw Search Performance (Baseline)

Test performance with basic vector search before any GSI optimization. This establishes our baseline performance.

In [13]:
print("\n" + "="*80)
print("PHASE 1: RAW SEARCH PERFORMANCE (BASELINE)")
print("="*80)
print("Testing basic vector similarity search without GSI optimization...")

# Test queries for each phase
query_phase1 = "What was manchester city manager pep guardiola's reaction to the team's current form?"
query_phase2 = "What happened in the latest Premier League matches?"
query_phase3 = query_phase2  # Phase 3 repeats the Phase 2 query

# Phase 1: Raw search performance
baseline_time, baseline_results = test_search_performance(
    vector_store, query_phase1, "Raw Search (No GSI)"
)


PHASE 1: RAW SEARCH PERFORMANCE (BASELINE)
Testing basic vector similarity search without GSI optimization...

[Raw Search (No GSI)] Testing query: 'What was manchester city manager pep guardiola's reaction to the team's current form?'
[Raw Search (No GSI)] Search completed in 1.0584 seconds
[Raw Search (No GSI)] Found 3 documents
[Raw Search (No GSI)] Top result distance: 0.320541 (lower = more similar)
[Raw Search (No GSI)] Top result preview: 'Self-doubt, errors & big changes' - inside the crisis at Man City

Pep Guardiola has not been throu...


### Phase 2: Create GSI BHIVE Index and Test Performance

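Create the BHIVE GSI index and measure search latency again. Note that this phase uses a different query from Phase 1, so the comparison reflects both the index and query-to-query variation.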

In [14]:
print("\n" + "="*80)
print("PHASE 2: GSI BHIVE INDEX OPTIMIZATION")  
print("="*80)

# Create GSI Vector Index for high-performance searches
print("Creating BHIVE GSI vector index...")
try:
    vector_store.create_index(
        index_type=IndexType.BHIVE, # Use IndexType.COMPOSITE for Composite index
        index_description="IVF,SQ8"
    )
    print("✓ GSI Vector index created successfully")
    
    # Wait for index to become available
    print("Waiting for index to become available...")
    time.sleep(5)
    
except Exception as e:
    if "already exists" in str(e).lower():
        print("✓ GSI Vector index already exists, proceeding...")
    else:
        print(f"Error creating GSI index: {str(e)}")

# Test performance with GSI index using a different query
print("\nTesting GSI-optimized search performance...")
gsi_time, gsi_results = test_search_performance(
    vector_store, query_phase2, "GSI BHIVE Optimized"
)


PHASE 2: GSI BHIVE INDEX OPTIMIZATION
Creating BHIVE GSI vector index...
✓ GSI Vector index created successfully
Waiting for index to become available...

Testing GSI-optimized search performance...

[GSI BHIVE Optimized] Testing query: 'What happened in the latest Premier League matches?'
[GSI BHIVE Optimized] Search completed in 0.4883 seconds
[GSI BHIVE Optimized] Found 3 documents
[GSI BHIVE Optimized] Top result distance: 0.460026 (lower = more similar)
[GSI BHIVE Optimized] Top result preview: Garry Richardson chooses the audio highlights across the BBC over the last twelve months....


### Phase 3: Cache Optimization Testing

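Configure `CouchbaseCache` as LangChain's global LLM cache with `set_llm_cache`, then run the Phase 2 query twice more. This cache stores language-model responses: it is consulted when the RAG chain calls JinaChat later in this tutorial, but `similarity_search_with_score` never calls the LLM, so the two vector searches below are not served from it. Any timing difference between them reflects warm connections and network variance rather than cache hits.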
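LangChain's LLM caches follow a lookup/update pattern keyed on the prompt text and a string describing the model configuration. The toy in-memory class below mimics that shape to show when a cache hit happens; `ExactMatchLLMCache`, `cached_generate`, and `fake_llm` are hypothetical stand-ins, not the `CouchbaseCache` implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ExactMatchLLMCache:
    """Toy in-memory stand-in for an LLM cache; keys are (prompt, model settings)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        key = (prompt, llm_string)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        self._store[(prompt, llm_string)] = response


def cached_generate(cache: ExactMatchLLMCache, llm: Callable[[str], str],
                    prompt: str, llm_string: str) -> str:
    """Return the cached answer for an exact prompt match; otherwise call the model."""
    cached = cache.lookup(prompt, llm_string)
    if cached is not None:
        return cached  # hit: the model is never called
    response = llm(prompt)
    cache.update(prompt, llm_string, response)
    return response


model_calls: List[str] = []


def fake_llm(prompt: str) -> str:
    model_calls.append(prompt)
    return f"answer to: {prompt}"


toy_cache = ExactMatchLLMCache()
for question in ["Fulham vs Liverpool?", "Latest transfers?", "Fulham vs Liverpool?"]:
    cached_generate(toy_cache, fake_llm, question, "jinachat|temperature=0.1")
print(len(model_calls), toy_cache.hits, toy_cache.misses)  # 2 1 2
```

Only a repeated, identical prompt avoids the model call, which is why the benefit appears in the RAG pipeline rather than in raw vector search.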

In [16]:
print("\n" + "="*80)
print("PHASE 3: CACHE OPTIMIZATION TESTING")
print("="*80)

# Set up Couchbase cache for query result caching
print("Setting up Couchbase cache for query optimization...")
try:
    cache = CouchbaseCache(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=CACHE_COLLECTION,
    )
    set_llm_cache(cache)
    print("✓ Couchbase cache configured successfully")
except Exception as e:
    print(f"Warning: Failed to create cache: {str(e)}")
    cache = None

# Repeat the same vector search. The LLM cache set above is not consulted by
# similarity_search_with_score, so this compares two runs of an uncached search.
print("\nTesting cache optimization with repeated query...")

print("First execution:")
cache_time_1, cache_results_1 = test_search_performance(
    vector_store, query_phase3, "GSI + Cache (1st run)", show_results=False
)

print("\nSecond execution (same query repeated):")
cache_time_2, cache_results_2 = test_search_performance(
    vector_store, query_phase3, "GSI + Cache (2nd run)", show_results=False
)


PHASE 3: CACHE OPTIMIZATION TESTING
Setting up Couchbase cache for query optimization...
✓ Couchbase cache configured successfully

Testing cache optimization with repeated query...
First execution:

[GSI + Cache (1st run)] Testing query: 'What happened in the latest Premier League matches?'
[GSI + Cache (1st run)] Search completed in 0.8192 seconds
[GSI + Cache (1st run)] Found 3 documents

Second execution (same query repeated):

[GSI + Cache (2nd run)] Testing query: 'What happened in the latest Premier League matches?'
[GSI + Cache (2nd run)] Search completed in 0.5779 seconds
[GSI + Cache (2nd run)] Found 3 documents


### Complete Performance Summary

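Compare the timings from the three phases. Keep in mind that Phases 1 and 2 use different queries, that every measurement is a single run including the Jina embedding call, and that the "Cache vs Non-cached" line compares two runs of the same vector search rather than a cached and an uncached LLM call. In the recorded run below, the repeated search (0.5779 s) was actually slower than the Phase 2 search (0.4883 s), which shows how much single measurements vary.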
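The summary reports two related figures for each comparison. With a baseline time $t_b$ and an optimized time $t_o$, the speedup $s$ and the percentage improvement $p$ are:

$$
s = \frac{t_b}{t_o}, \qquad
p = \frac{t_b - t_o}{t_b} \times 100 = \left(1 - \frac{1}{s}\right) \times 100
$$

For the GSI comparison in the recorded run, $s = 1.0584 / 0.4883 \approx 2.17$ and $p = (1 - 1/2.17) \times 100 \approx 53.9\%$, matching the printed output.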

In [17]:
print("\n" + "="*90)
print("COMPLETE PERFORMANCE OPTIMIZATION SUMMARY")
print("="*90)

# Display results from all phases
times = []
labels = []

if baseline_time:
    times.append(baseline_time)
    labels.append("Phase 1: Raw Search")
    
if gsi_time:
    times.append(gsi_time) 
    labels.append("Phase 2: GSI BHIVE")
    
if cache_time_2:
    times.append(cache_time_2)
    labels.append("Phase 3: GSI + Cache")

# Print timing summary
for time_val, label in zip(times, labels):
    print(f"{label:<25}: {time_val:.4f} seconds")

# Calculate improvements
if len(times) >= 2:
    print(f"\n{'Performance Improvements:':<25}")
    print("-" * 50)
    
    if baseline_time and gsi_time:
        gsi_improvement = ((baseline_time - gsi_time) / baseline_time) * 100
        gsi_speedup = baseline_time / gsi_time if gsi_time > 0 else float('inf')
        print(f"GSI vs Raw Search:        {gsi_speedup:.2f}x faster ({gsi_improvement:.1f}% improvement)")
    
    if cache_time_1 and cache_time_2 and cache_time_1 > cache_time_2:
        cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100
        cache_speedup = cache_time_1 / cache_time_2
        print(f"Cache vs Non-cached:      {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)")
    
    if baseline_time and cache_time_2:
        total_improvement = ((baseline_time - cache_time_2) / baseline_time) * 100
        total_speedup = baseline_time / cache_time_2 if cache_time_2 > 0 else float('inf')
        print(f"Total Optimization:       {total_speedup:.2f}x faster ({total_improvement:.1f}% improvement)")

print(f"\n{'Key Optimization Benefits:':<25}")
print("-" * 50)
print("• GSI BHIVE Index: Optimized vector clustering and quantization")
print("• Cache System: Eliminates redundant computations for repeated queries") 
print("• Scalability: Both optimizations scale well with dataset size")
print("• Production Ready: Suitable for high-volume, low-latency applications")


COMPLETE PERFORMANCE OPTIMIZATION SUMMARY
Phase 1: Raw Search      : 1.0584 seconds
Phase 2: GSI BHIVE       : 0.4883 seconds
Phase 3: GSI + Cache     : 0.5779 seconds

Performance Improvements:
--------------------------------------------------
GSI vs Raw Search:        2.17x faster (53.9% improvement)
Cache vs Non-cached:      1.42x faster (29.5% improvement)
Total Optimization:       1.83x faster (45.4% improvement)

Key Optimization Benefits:
--------------------------------------------------
• GSI BHIVE Index: Optimized vector clustering and quantization
• Cache System: Eliminates redundant computations for repeated queries
• Scalability: Both optimizations scale well with dataset size
• Production Ready: Suitable for high-volume, low-latency applications


## Jina AI Language Model and RAG Integration

### Understanding RAG (Retrieval-Augmented Generation)

RAG combines semantic search with language model generation for intelligent, context-aware responses:

1. **Query Processing**: User question is converted to vector embedding using Jina AI
2. **Document Retrieval**: GSI vector search finds most relevant documents from our knowledge base
3. **Context Assembly**: Retrieved documents provide factual context for the language model
4. **Response Generation**: Jina's language model generates intelligent answers grounded in the retrieved data

This approach ensures responses are factually grounded rather than relying solely on the model's training knowledge.
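The four steps can be traced in a toy, dependency-free sketch. Word counts stand in for Jina embeddings and the assembled prompt stands in for the LLM call; the function names and documents are hypothetical, and this is not how LangChain or Couchbase implement retrieval.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: List[str], k: int = 2) -> List[str]:
    q = embed(question)                                                       # 1. query processing
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]  # 2. retrieval


def build_prompt(question: str, docs: List[str]) -> str:
    context = "\n\n".join(docs)                                       # 3. context assembly
    return f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"  # 4. sent to the LLM


docs = [
    "Guardiola admitted self-doubt as Manchester City struggled for form.",
    "Fulham and Liverpool drew 2-2.",
    "Everton's new owners are reviewing the club's contracts.",
]
question = "How did Guardiola react to Manchester City's form?"
top = retrieve(question, docs, k=1)
prompt_text = build_prompt(question, top)
print(prompt_text)
```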

### Create Jina Language Model

Initialize Jina's chat model for generating intelligent responses based on retrieved context.

In [18]:
print("\n" + "="*60)
print("SETTING UP JINA AI LANGUAGE MODEL")
print("="*60)

try:
    llm = JinaChat(temperature=0.1, jinachat_api_key=JINACHAT_API_KEY)
    print("✓ JinaChat language model created successfully")
    logging.info("Successfully created JinaChat")
except Exception as e:
    print(f"✗ Error creating JinaChat: {str(e)}")
    print("Please check your JINACHAT_API_KEY and network connection.")
    raise

2025-09-24 14:51:57,158 - INFO - Successfully created JinaChat



SETTING UP JINA AI LANGUAGE MODEL
✓ JinaChat language model created successfully


## Intelligent Response Generation with Jina AI

### Build RAG Pipeline

Create the complete RAG pipeline that integrates GSI vector search with Jina's language model.

In [19]:
try:
    # Create RAG prompt template for structured responses
    template = """You are a helpful assistant that answers questions based on the provided context. 
    If you cannot answer based on the context provided, respond with a generic answer. 
    Answer the question as truthfully as possible using the context below:
    
    Context:
    {context}

    Question: {question}
    
    Answer:"""
    
    prompt = ChatPromptTemplate.from_template(template)

    # Build the RAG chain: Retrieval → Context → Generation → Output
    rag_chain = (
        {
            "context": vector_store.as_retriever(search_kwargs={"k": 2}), 
            "question": RunnablePassthrough()
        }
        | prompt
        | llm
        | StrOutputParser()
    )
    print("RAG pipeline created successfully")
    print("Components: GSI Vector Search → Context Assembly → Jina Language Model → Response")
except Exception as e:
    raise ValueError(f"Error creating RAG pipeline: {str(e)}")

RAG pipeline created successfully
Components: GSI Vector Search → Context Assembly → Jina Language Model → Response
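In the chain above, the retriever's output (a list of `Document` objects) is placed into `{context}` as-is, so the prompt receives each document's Python representation, including its metadata. A common refinement is to join only the page text. The sketch uses a minimal stand-in dataclass so it runs without LangChain; in the notebook you could pipe the retriever into the function, for example `vector_store.as_retriever(search_kwargs={"k": 2}) | format_docs`, since LangChain wraps a plain function in a `|` chain as a `RunnableLambda`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_docs(docs: List[Doc]) -> str:
    """Join retrieved documents into a plain-text context block, dropping metadata."""
    return "\n\n".join(d.page_content.strip() for d in docs)


retrieved = [Doc("First article text.", {"source": "bbc"}), Doc("Second article text.")]
print(format_docs(retrieved))
```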


### Test Intelligent Response Generation

Test the complete RAG system to see how it generates intelligent responses based on retrieved context.

In [20]:
print("Testing Intelligent Response Generation")
print("=" * 50)

try:
    # Test with a specific query
    sample_query = "What was manchester city manager pep guardiola's reaction to the team's current form?"
    print(f"User Query: {sample_query}")
    print("\nProcessing...")
    print("1. Converting query to vector embedding")
    print("2. Searching GSI vector index for relevant documents")
    print("3. Assembling context from retrieved documents")
    print("4. Generating intelligent response with Jina AI")
    
    rag_response = rag_chain.invoke(sample_query)
    print(f"\nIntelligent Response:")
    print("-" * 30)
    print(rag_response)
    
except Exception as e:
    if "Payment Required" in str(e):
        print("\nPayment required for Jina AI API.")
        print("To resolve:")
        print("• Visit https://jina.ai/reader/#pricing for subscription options")
        print("• Ensure your API key is valid and has sufficient credits")
    else:
        print(f"Error: {str(e)}")

Testing Intelligent Response Generation
User Query: What was manchester city manager pep guardiola's reaction to the team's current form?

Processing...
1. Converting query to vector embedding
2. Searching GSI vector index for relevant documents
3. Assembling context from retrieved documents
4. Generating intelligent response with Jina AI

Intelligent Response:
------------------------------
Pep Guardiola has shown self-doubt and concern regarding Manchester City's recent performance.


### Demonstrate RAG with Multiple Queries

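Test the RAG system with several queries. The third query repeats the first, so its JinaChat call should be answered from the Couchbase LLM cache, provided retrieval returns the same context; the query embedding and vector search still run.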

In [21]:
print("\nTesting RAG System with Multiple Queries")
print("=" * 45)

try:
    test_queries = [
        "What happened in the match between Fulham and Liverpool?",
        "What are the latest developments in football transfers?",
        "What happened in the match between Fulham and Liverpool?",  # Repeated to show caching
    ]

    for i, query in enumerate(test_queries, 1):
        print(f"\n--- Query {i} ---")
        print(f"Question: {query}")
        
        if i == 3:
            print("(This is a repeated query - should benefit from caching)")
        
        response = rag_chain.invoke(query)
        print(f"Response: {response}")
        
except Exception as e:
    if "Payment Required" in str(e):
        print("Payment required for Jina AI API.")
    else:
        print(f"Error: {str(e)}")


Testing RAG System with Multiple Queries

--- Query 1 ---
Question: What happened in the match between Fulham and Liverpool?
Response: ### Match Summary: Fulham vs. Liverpool

In the Premier League match between Fulham and Liverpool, the game ended in a 2-2 draw. Liverpool played the majority of the game with ten men. Arne Slot, the Liverpool head coach, described his team's performance as "impressive."

--- Query 2 ---
Question: What are the latest developments in football transfers?
Response: ### Latest Developments in Football Transfers

The latest developments in football transfers involve Everton Football Club undergoing new ownership with the Friedkin Group, potential managerial changes, and ongoing contract negotiations with key players like Dominic Calvert-Lewin.

--- Query 3 ---
Question: What happened in the match between Fulham and Liverpool?
(This is a repeated query - should benefit from caching)
Response: ### Match Summary: Fulham vs. Liverpool

In the Premier League mat

## Conclusion

You've successfully built a high-performance semantic search engine combining:
- **Couchbase GSI BHIVE indexes** for optimized vector search
- **Jina AI embeddings and language models** for intelligent processing
- **Complete RAG pipeline** with caching optimization