# Introduction
In this guide, we walk through building a semantic search engine that uses Couchbase as the backend database, [Jina](https://jina.ai/) as the provider of the embedding and language models, and Global Secondary Index (GSI) Vector Indexes for retrieval. Semantic search goes beyond keyword matching by taking the context and meaning of a query into account, which makes it well suited to applications that need intelligent information retrieval.

GSI Vector Indexes provide several advantages over traditional Full-Text Search approaches:
- **Higher Performance**: Optimized for vector-first workloads with support for billions of vectors
- **Scalability**: Better performance for high QPS (queries per second) scenarios

This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI Vector Indexes from scratch. If you prefer to use the Full-Text Search (FTS) approach instead, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-jina-couchbase-rag-with-fts).

# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/jinaai/RAG_with_Couchbase_and_Jina_AI.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

# Before you start

## System Requirements

**Important**: GSI Vector Indexes require **Couchbase Server 8.0.0+** or Couchbase Capella. Make sure your cluster meets this requirement before proceeding.

## Get Credentials for Jina AI

* Please follow the [instructions](https://jina.ai/) to generate the Jina AI credentials.
* Please follow the [instructions](https://chat.jina.ai/api) to generate the JinaChat credentials.

## Create and Deploy Your Cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a cluster. Capella automatically supports GSI Vector Indexes and provides the latest features.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met:

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
* Ensure the Query Service is enabled (required for GSI Vector Indexes).

# Setting the Stage: Installing Necessary Libraries
To build our semantic search engine using GSI Vector Indexes, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations and GSI Vector Index interactions, LangChain handles AI model integrations, and Jina provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the vector-intensive and computationally complex tasks required for semantic search with GSI Vector Indexes.

In [1]:
# Jina doesn't support openai versions other than 0.27
%pip install --quiet datasets==3.6.0 langchain-couchbase==0.5.0rc1 langchain-community==0.3.24 openai==0.27 python-dotenv==1.1.0 ipywidgets

Note: you may need to restart the kernel to use updated packages.


# Importing Necessary Libraries
The script starts by importing the libraries it needs for logging, time tracking, Couchbase connections, embedding generation, the LangChain RAG components, and dataset loading. These libraries provide the functions used throughout the notebook for working with data, managing database connections, and calling the Jina models.

In [2]:
import logging
import os
import time
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import CouchbaseException
from couchbase.management.buckets import CreateBucketSettings
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_community.chat_models import JinaChat
from langchain_community.embeddings import JinaEmbeddings
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy
from langchain_couchbase.vectorstores import IndexType

# Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.


In [3]:
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Suppress all logs from specific loggers
logging.getLogger('openai').setLevel(logging.WARNING)
logging.getLogger('httpx').setLevel(logging.WARNING)

# Loading Sensitive Information
In this section, we load the configuration settings needed to integrate Couchbase with Jina's API. These settings include sensitive information such as API keys and database credentials, along with the bucket, scope, collection, and index names. Instead of hardcoding these details into the script, we read them from a `.env` file or from environment variables at runtime, which keeps the setup flexible and keeps secrets out of the code.

The script also checks that both Jina API keys are set and raises an error if either is missing. The Couchbase settings fall back to local defaults (for example `couchbase://localhost` and `Administrator`), so override them when connecting to a remote cluster or to Capella.

In [4]:
load_dotenv("./.env") 

JINA_API_KEY = os.getenv("JINA_API_KEY")
JINACHAT_API_KEY = os.getenv("JINACHAT_API_KEY")

CB_HOST = os.getenv("CB_HOST") or 'couchbase://localhost'
CB_USERNAME = os.getenv("CB_USERNAME") or 'Administrator'
CB_PASSWORD = os.getenv("CB_PASSWORD") or 'password'
CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME") or 'vector-search-testing'
INDEX_NAME = os.getenv("INDEX_NAME") or 'vector_search_jina'

SCOPE_NAME = os.getenv("SCOPE_NAME") or 'shared'
COLLECTION_NAME = os.getenv("COLLECTION_NAME") or 'jina'
CACHE_COLLECTION = os.getenv("CACHE_COLLECTION") or 'cache'

# Check if the variables are correctly loaded
if not JINA_API_KEY:
    raise ValueError("JINA_API_KEY environment variable is not set")
if not JINACHAT_API_KEY:
    raise ValueError("JINACHAT_API_KEY environment variable is not set")
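
The two `if not` checks above can be generalized into one helper that reports every missing variable at once. The following is a minimal sketch; `require_env` is a hypothetical helper name introduced for illustration, and the demo uses an explicit mapping so it runs without a real `.env` file.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for every name; raise if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise ValueError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {name: environ[name] for name in names}


# Demo mapping (made-up values): the second key is present but empty
demo_env = {"JINA_API_KEY": "demo-key", "JINACHAT_API_KEY": ""}
try:
    require_env(["JINA_API_KEY", "JINACHAT_API_KEY"], environ=demo_env)
except ValueError as exc:
    print(exc)
```

Collecting all missing names before raising saves a round trip when several variables are unset.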

# Connecting to the Couchbase Cluster
Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount.



In [5]:
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")

2025-09-23 12:12:41,385 - INFO - Successfully connected to Couchbase


## Setting Up Collections in Couchbase

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

1. Bucket Creation:
   - Checks if specified bucket exists, creates it if not
   - Sets bucket properties like RAM quota (1024MB) and replication (disabled)
   - Note: You will not be able to create a bucket on Capella

2. Scope Management:  
   - Verifies if requested scope exists within bucket
   - Creates new scope if needed (unless it's the default "_default" scope)

3. Collection Setup:
   - Checks for collection existence within scope
   - Creates collection if it doesn't exist
   - Waits 2 seconds for collection to be ready

Additional Tasks:
- Clears any existing documents for clean state
- Wraps each step in error handling and logging

The function is called twice to set up:
1. Main collection for vector embeddings
2. Cache collection for storing results
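
The function relies on fixed `time.sleep(2)` pauses after creating a bucket or collection. One alternative is to poll a readiness check until it succeeds or a timeout expires. The sketch below assumes nothing about the Couchbase SDK: `wait_until` is a hypothetical helper, and `fake_collection_ready` is a stand-in for whatever readiness check fits your cluster.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in readiness check (hypothetical): becomes True on the third call
calls = {"count": 0}


def fake_collection_ready():
    calls["count"] += 1
    return calls["count"] >= 3


ready = wait_until(fake_collection_ready, timeout=2.0, interval=0.01)
print(ready, calls["count"])
```

Polling returns as soon as the resource is usable and fails explicitly when it is not, whereas a fixed sleep can be both too long and too short.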


In [23]:
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)
        
        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")
    
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION)


2025-09-23 12:21:06,924 - INFO - Bucket 'vector-search-testing' exists.
2025-09-23 12:21:06,936 - INFO - Collection 'jina' already exists. Skipping creation.
2025-09-23 12:21:09,022 - INFO - All documents cleared from the collection.
2025-09-23 12:21:09,023 - INFO - Bucket 'vector-search-testing' exists.
2025-09-23 12:21:09,029 - INFO - Collection 'jina_cache' already exists. Skipping creation.
2025-09-23 12:21:11,037 - INFO - All documents cleared from the collection.


<couchbase.collection.Collection at 0x1251a9010>

# Creating Jina Embeddings
Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using Jina, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents.
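
To make the idea concrete, here is a small sketch that ranks texts by cosine similarity to a query. The three-dimensional vectors are made up for illustration; real Jina embeddings have far more dimensions and come from the API, not from hand-written lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings": the two football texts point in a similar direction
toy_embeddings = {
    "football manager reacts to defeat": [0.9, 0.1, 0.0],
    "coach comments on losing streak": [0.8, 0.2, 0.1],
    "central bank raises interest rates": [0.0, 0.1, 0.9],
}
query_vector = [0.85, 0.15, 0.05]  # pretend embedding of a football question

ranked = sorted(
    toy_embeddings,
    key=lambda text: cosine_similarity(query_vector, toy_embeddings[text]),
    reverse=True,
)
for text in ranked:
    print(f"{cosine_similarity(query_vector, toy_embeddings[text]):.3f}  {text}")
```

The second text shares no keywords with the first, yet both rank far above the finance headline because their vectors point in nearly the same direction.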



In [8]:
try:
    embeddings = JinaEmbeddings(
        jina_api_key=JINA_API_KEY, model_name="jina-embeddings-v3"
    )
    logging.info("Successfully created JinaEmbeddings")
except Exception as e:
    raise ValueError(f"Error creating JinaEmbeddings: {str(e)}")

2025-09-23 12:12:46,949 - INFO - Successfully created JinaEmbeddings


# Setting Up the GSI Vector Store
A vector store is where we'll keep our embeddings and perform vector similarity searches using GSI Vector Indexes. With GSI Vector Indexes, we use the Query Service instead of the Search Service, which provides better performance for vector-first workloads and can scale to billions of vectors.

LangChain provides support for GSI Vector Indexes through the `CouchbaseQueryVectorStore` class. This class leverages the Query Service directly and provides access to the advanced capabilities of GSI Vector Indexes while maintaining full compatibility with the LangChain ecosystem.

In [9]:
try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DistanceStrategy.COSINE
    )
    logging.info("Successfully created GSI vector store")
except Exception as e:
    raise ValueError(f"Failed to create GSI vector store: {str(e)}")


2025-09-23 12:12:48,281 - INFO - Successfully created GSI vector store


## Index Creation Timing

**GSI Vector Indexes must be created AFTER upserting your vectors into the database.** The index creation process involves sophisticated training and optimization steps that rely on the existing data to:

- **Analyze vector distribution**: The indexing algorithm examines the actual vector data to understand clustering patterns and density
- **Configure centroids optimally**: Based on your dataset's characteristics, the system determines the best centroid placement for efficient partitioning
- **Set up quantization parameters**: The compression settings are tuned according to your specific vector dimensions and value ranges
- **Train clustering algorithms**: Machine learning algorithms analyze your data to create the most effective search structures

Creating an index on an empty collection results in an error, and creating it on only a small subset of the vectors leads to suboptimal performance because the training algorithms have too little data to learn from.

# Load the BBC News Dataset
To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.

In [10]:
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")

2025-09-23 12:12:54,295 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


Loaded the BBC News dataset with 2687 rows


## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.

In [11]:
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")

We have 1749 unique articles in our database.
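
The cell above deduplicates with a `set`, which does not preserve the original order of the articles. Because a later cell takes a slice of the deduplicated list, the selected articles can differ between runs. A minimal order-preserving alternative, assuming the articles are plain strings, is sketched below; `dedupe_preserving_order` is a name introduced for illustration.

```python
def dedupe_preserving_order(items):
    """Drop empty values and duplicates, keeping the first occurrence of each."""
    # dict keys keep insertion order, so the first occurrence wins
    return list(dict.fromkeys(item for item in items if item))


sample_articles = ["story A", "story B", "", "story A", None, "story C", "story B"]
print(dedupe_preserving_order(sample_articles))
# ['story A', 'story B', 'story C']
```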


## Saving Data to the GSI Vector Store
To ingest the articles efficiently, we process them in batches. Batching helps manage memory usage and gives better control over the ingestion process.

We take the first 60% of the deduplicated articles and filter out any that exceed 50,000 characters to avoid potential issues with token limits. Then we add the remaining articles to the vector store with the `add_texts` method of `CouchbaseQueryVectorStore`.

We use a conservative batch size of 50 to ensure reliable operation.
The optimal batch size depends on many factors including:
- Document sizes being inserted
- Available system resources
- Network conditions
- Concurrent workload
- GSI Vector Index build performance

Consider measuring performance with your specific workload before adjusting.
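
In this notebook, batching is delegated to `add_texts` through its `batch_size` argument. If you want per-batch logging, timing, or retries, you could split the list yourself. The sketch below shows only the splitting step; `iter_batches` is a hypothetical helper, not part of LangChain or the Couchbase SDK.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


documents = [f"article {i}" for i in range(120)]  # made-up stand-in data
batch_lengths = [len(batch) for batch in iter_batches(documents, 50)]
print(batch_lengths)  # [50, 50, 20]
```

Timing each batch this way is one simple means of measuring how batch size affects throughput on your own workload.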

In [15]:
# Calculate 60% of the dataset size and round to nearest integer
dataset_size = len(unique_news_articles)
subset_size = round(dataset_size * 0.6)

# Filter articles by length and create subset
filtered_articles = [article for article in unique_news_articles[:subset_size] 
                    if article and len(article) <= 50000]

# Process in batches
batch_size = 50

try:
    vector_store.add_texts(
        texts=filtered_articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully")
    
except CouchbaseException as e:
    logging.error(f"Couchbase error during ingestion: {str(e)}")
    raise RuntimeError(f"Error performing document ingestion: {str(e)}")
except Exception as e:
    if "Payment Required" in str(e):
        logging.error("Payment required for Jina AI API. Please check your subscription status and API key.")
        print("To resolve this error:")
        print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options")
        print("2. Ensure your API key is valid and has sufficient credits") 
        print("3. Consider upgrading your subscription plan if needed")
    else:
        logging.error(f"Unexpected error during ingestion: {str(e)}")
        raise RuntimeError(f"Failed to save documents to vector store: {str(e)}")

2025-09-23 12:15:24,527 - INFO - Document ingestion completed successfully


# Creating GSI Vector Indexes

Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where **GSI Vector Indexes** come into play. Unlike traditional FTS vector search that requires JSON index definitions, GSI Vector Indexes are created programmatically using SQL++ DDL statements through the Query Service.

## Index Types

### BHIVE (Hyperscale Vector Indexes)
- **Best for**: Pure vector searches like content discovery, recommendations, and semantic search
- **Use when**: You primarily perform vector-only queries without complex scalar filtering
- **Features**: High performance with low memory footprint, optimized for concurrent operations, designed to scale to billions of vectors

### Composite Vector Indexes
- **Best for**: Filtered vector searches that combine vector similarity with scalar value filtering
- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of the dataset  
- **Features**: Efficient pre-filtering where scalar attributes reduce the vector comparison scope using SQL++ WHERE clauses
- **Important**: Scalar filters take precedence over vector similarity, which improves performance for filtered searches but may miss semantically relevant results that don't match the scalar criteria

## Index Configuration Details

The `index_description` parameter of `create_index` controls how Couchbase optimizes vector storage and search performance. Its value is a string in a specific format that defines the indexing strategy:

### Format: `'IVF[<centroids>],{PQ|SQ}<settings>'`

#### **IVF (Inverted File Index) - Centroids Configuration**
- **Purpose**: Controls how the dataset is subdivided into clusters for faster searches
- **Trade-offs**: More centroids = faster searches but slower training time
- **Auto-selection**: If omitted (e.g., `IVF,SQ8`), Couchbase automatically selects the optimal number based on dataset size
- **Manual setting**: Specify exact count (e.g., `IVF1000,SQ8` for 1000 centroids)
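
The effect of centroids can be illustrated with a toy inverted-file lookup: each vector is assigned to its nearest centroid, and a query is compared only against the vectors in the list it probes. This is a conceptual sketch in two dimensions with hand-picked centroids. It is not how Couchbase implements the index internally, where centroids are learned from the data during training.

```python
import math


def nearest_centroid(vector, centroids):
    """Return the index of the centroid closest to `vector` (Euclidean)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))


def build_inverted_lists(vectors, centroids):
    """Group vector ids by their nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        lists[nearest_centroid(vec, centroids)].append(vec_id)
    return lists


# Toy data: two obvious clusters and two hand-picked centroids
vectors = [(0.1, 0.2), (0.0, 0.1), (0.9, 1.0), (1.0, 0.8), (0.2, 0.0)]
centroids = [(0.1, 0.1), (0.95, 0.9)]

inverted_lists = build_inverted_lists(vectors, centroids)
query = (0.85, 0.95)
probe = nearest_centroid(query, centroids)
candidates = inverted_lists[probe]  # only these vectors are compared to the query
print(inverted_lists, candidates)
```

With more centroids each list is shorter, so fewer comparisons are needed per query, at the cost of more work when the centroids are trained.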

#### **Quantization Options - Vector Compression**

**SQ (Scalar Quantization)**
- **Purpose**: Compresses vectors by reducing precision of individual components
- **Settings**: `SQ4`, `SQ6`, `SQ8` (4-bit, 6-bit, 8-bit precision)
- **Trade-off**: Lower bits = more compression but less precision
- **Best for**: General-purpose applications where some precision loss is acceptable
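
The precision trade-off can be seen in a generic min-max scalar quantizer. The sketch below is an illustration of the general technique under simplifying assumptions (one value range per vector, uniform levels). Couchbase's actual SQ implementation may differ in how ranges are chosen and trained.

```python
def sq_encode(vector, bits=8):
    """Map each component to an integer code in [0, 2**bits - 1]."""
    lo, hi = min(vector), max(vector)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def sq_decode(codes, lo, scale):
    """Reconstruct approximate component values from integer codes."""
    return [lo + code * scale for code in codes]


vector = [-0.50, -0.12, 0.00, 0.33, 0.75]  # made-up example values
for bits in (8, 4):
    codes, lo, scale = sq_encode(vector, bits)
    restored = sq_decode(codes, lo, scale)
    max_error = max(abs(a - b) for a, b in zip(vector, restored))
    print(f"SQ{bits}: codes={codes} max_error={max_error:.4f}")
```

Each component's reconstruction error is bounded by half a quantization step, and the step grows as the number of bits shrinks.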

**PQ (Product Quantization)**
- **Purpose**: Advanced compression using subquantizers for better precision
- **Format**: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8` = 32 subquantizers of 8 bits each)
- **Trade-off**: More complex but often better precision than SQ at similar compression ratios
- **Best for**: Applications requiring high precision with significant compression

### **Common Configuration Examples**

```
IVF,SQ8          # Auto-selected centroids with 8-bit scalar quantization (recommended default)
IVF1000,SQ6      # 1000 centroids with 6-bit scalar quantization (higher compression)
IVF,PQ32x8       # Auto-selected centroids with 32 subquantizers of 8 bits each
IVF500,PQ16x4    # 500 centroids with 16 subquantizers of 4 bits each (high compression)
```
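
To check a description string before passing it to index creation, a small parser can help. The sketch below covers only the forms listed in this section (`IVF` with optional centroid count, then `SQ4`/`SQ6`/`SQ8` or `PQ<subquantizers>x<bits>`); `parse_index_description` is a hypothetical helper, and Couchbase may accept variants that this pattern rejects.

```python
import re

# Pattern for the forms shown above only (an assumption, not the official grammar)
_DESCRIPTION_RE = re.compile(
    r"^IVF(?P<centroids>\d+)?,"
    r"(?:SQ(?P<sq_bits>4|6|8)|PQ(?P<pq_sub>\d+)x(?P<pq_bits>\d+))$"
)


def parse_index_description(description):
    """Split a description such as 'IVF1000,SQ8' into its components."""
    match = _DESCRIPTION_RE.match(description)
    if match is None:
        raise ValueError(f"Unrecognized index description: {description!r}")
    # None means the centroid count is left to automatic selection
    centroids = int(match["centroids"]) if match["centroids"] else None
    if match["sq_bits"]:
        return {"centroids": centroids, "quantizer": "SQ", "bits": int(match["sq_bits"])}
    return {
        "centroids": centroids,
        "quantizer": "PQ",
        "subquantizers": int(match["pq_sub"]),
        "bits": int(match["pq_bits"]),
    }


for text in ("IVF,SQ8", "IVF1000,SQ6", "IVF,PQ32x8", "IVF500,PQ16x4"):
    print(text, "->", parse_index_description(text))
```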

## Tutorial Configuration

This tutorial is configured for:
- **Similarity Metric**: Cosine similarity (`DistanceStrategy.COSINE`, as set when creating the vector store)
- **Database**: Bucket `vector-search-testing`, scope `shared`, collection `jina`
- **Index Type**: BHIVE (optimal for pure semantic search without complex filtering)

## Performance Considerations

**Distance Interpretation**: In GSI vector search, the score represents the vector distance between query and document embeddings. **Lower distances indicate higher similarity**, while higher distances indicate lower similarity.
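
Because lower scores mean closer matches, results should be sorted in ascending order, and any cutoff is a maximum rather than a minimum. The sketch below works on made-up `(text, distance)` pairs; `rank_by_distance` and the threshold value are illustrative assumptions, not part of the vector store API.

```python
def rank_by_distance(results, max_distance=None):
    """Sort (text, distance) pairs ascending; optionally drop distant matches."""
    kept = [
        pair for pair in results
        if max_distance is None or pair[1] <= max_distance
    ]
    return sorted(kept, key=lambda pair: pair[1])


# Made-up pairs standing in for search results
results = [
    ("article about tennis", 0.71),
    ("article about Man City", 0.32),
    ("article about Liverpool", 0.48),
]
best_first = rank_by_distance(results, max_distance=0.5)
print(best_first)
```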

**Scalability**: BHIVE indexes can scale to billions of vectors with optimized concurrent operations, making them suitable for large-scale production deployments.

For more information on GSI Vector Indexes, please follow the [documentation](https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/vector-search.html).


In [16]:
vector_store.create_index(
    index_type=IndexType.BHIVE,
    index_description="IVF,SQ8"
)
# Note: To create a COMPOSITE index instead, the code below can be used.
# vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="jina_composite_index", index_description="IVF,SQ8")

# Setting Up a Couchbase Cache
To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds frequently accessed data so that it does not have to be recomputed or fetched again. In our setup, `set_llm_cache` registers the cache with LangChain, so a repeated prompt can be answered from Couchbase instead of calling the language model again. This reduces latency and API usage when the same questions recur.

Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience.
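
The behavior can be pictured as an exact-match lookup keyed on the prompt and an identifier for the model configuration. The toy class below is an in-memory sketch of that idea, with `ToyLLMCache` and `fake_llm` as hypothetical stand-ins. It is not the `CouchbaseCache` implementation, which persists entries in a collection.

```python
class ToyLLMCache:
    """In-memory, exact-match cache keyed on (prompt, model_id)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prompt, model_id, compute):
        key = (prompt, model_id)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute(prompt)  # the expensive call happens only on a miss
        self._store[key] = value
        return value


def fake_llm(prompt):
    """Stand-in for a slow, billable model call."""
    return f"answer to: {prompt}"


cache_demo = ToyLLMCache()
first = cache_demo.get_or_compute("Who manages Man City?", "demo-model", fake_llm)
second = cache_demo.get_or_compute("Who manages Man City?", "demo-model", fake_llm)
print(first == second, cache_demo.hits, cache_demo.misses)
```

An exact-match cache helps only when the prompt text is identical, so a reworded question still reaches the model.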


In [17]:
try:
    cache = CouchbaseCache(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=CACHE_COLLECTION,
    )
    logging.info("Successfully created cache")
    set_llm_cache(cache)
except Exception as e:
    raise ValueError(f"Failed to create cache: {str(e)}")

2025-09-23 12:15:55,197 - INFO - Successfully created cache


# Creating the Jina Language Model (LLM)
Language models are AI systems that are trained to understand and generate human language. We'll be using Jina's language model to process user queries and generate meaningful responses. This model is a key component of our semantic search engine, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By creating this language model, we equip our search engine with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses.

The language model's ability to understand context and generate coherent responses is what makes our search engine truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user.



In [18]:
try:
    llm = JinaChat(temperature=0.1, jinachat_api_key=JINACHAT_API_KEY)
    logging.info("Successfully created JinaChat")
except Exception as e:
    logging.error(f"Error creating JinaChat: {str(e)}. Please check your API key and network connection.")
    raise

2025-09-23 12:15:56,444 - INFO - Successfully created JinaChat


## Perform GSI Vector Search
Semantic search with GSI Vector Indexes involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search using the Query Service and GSI Vector Indexes by comparing the query vector against the stored document vectors using SQL++ vector functions.

GSI Vector Indexes provide several advantages over traditional FTS approaches:

### Key Benefits of GSI Vector Search:
- **Superior Performance**: Optimized for vector-first workloads with better query performance
- **Scalability**: Can handle billions of vectors efficiently 
- **SQL++ Integration**: Leverages familiar SQL++ syntax with vector-specific functions
- **Index Types**: Choose between BHIVE (hyperscale) and Composite indexes based on use case

In the code below, the search begins by recording the start time and then calls the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. This method uses the SQL++ `APPROX_VECTOR_DISTANCE` function to search the GSI Vector Index for the documents closest to the query vector. Each result includes the document content and a distance score; a lower score means the document is closer to the query in the embedding space.

The GSI approach uses the Query Service instead of the Search Service, providing better performance for vector-focused workloads and enabling advanced filtering capabilities. Vector distances are calculated using the configured similarity metric (dot product, cosine, or Euclidean) directly within the GSI Vector Index.

In [19]:
def perform_semantic_search(query, vector_store):    
    try:
        start_time = time.time()
        search_results = vector_store.similarity_search_with_score(query, k=5)
        search_elapsed_time = time.time() - start_time
        
        logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds")
        return search_results, search_elapsed_time
        
    except Exception as e:
        error_str = str(e)
        logging.error(f"Search error: {error_str}")
        if "Payment Required" in error_str:
            raise RuntimeError("Payment required for Jina AI API. Please check your subscription status and API key.")
        else:
            raise RuntimeError(f"Search failed: {error_str}")

try:
    query = "What was manchester city manager pep guardiola's reaction to the team's current form?"
    search_results, search_elapsed_time = perform_semantic_search(query, vector_store)
    
    # Display search results
    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
    print("-"*80)
    for doc, score in search_results:
        print(f"Score: {score:.4f}, Text: {doc.page_content}")
        print("-"*80)
        
except RuntimeError as e:
    print(f"Error: {str(e)}")

2025-09-23 12:15:58,350 - INFO - Semantic search completed in 0.59 seconds



Semantic Search Results (completed in 0.59 seconds):
--------------------------------------------------------------------------------
Score: 0.3205, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City

Pep Guardiola has not been through a moment like this in his managerial career. Manchester City have lost nine matches in their past 12 - as many defeats as they had suffered in their previous 106 fixtures. At the end of October, City were still unbeaten at the top of the Premier League and favourites to win a fifth successive title. Now they are seventh, 12 points behind leaders Liverpool having played a game more. It has been an incredible fall from grace and left people trying to work out what has happened - and whether Guardiola can make it right. After discussing the situation with those who know him best, I have taken a closer look at the future - both short and long term - and how the current crisis at Man City is going to be solved.

Pep Guardiola's Man City

# Retrieval-Augmented Generation (RAG) with Couchbase and LangChain
Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain.

The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation.
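
When retrieved documents are placed into a prompt, it is common to join their text explicitly so that the model sees clean context without object wrappers or metadata. The sketch below shows such a helper on a stand-in class. `FakeDocument` and `format_docs` are names introduced for illustration, and the only assumption about real retrieved documents is that they expose a `page_content` attribute, as the search cell above uses.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Stand-in for a retrieved document; only page_content is used here."""
    page_content: str


def format_docs(docs, separator="\n\n"):
    """Join the text of retrieved documents into one context string."""
    return separator.join(doc.page_content for doc in docs)


retrieved = [
    FakeDocument("Guardiola admits self-doubt."),
    FakeDocument("City lost nine of twelve."),
]
context = format_docs(retrieved)
print(context)
```

In a chain, a helper like this would sit between the retriever and the prompt. Treat that placement as a suggestion to try, not something the notebook itself does.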

In [20]:
try:
    template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below:
    {context}

    Question: {question}"""
    prompt = ChatPromptTemplate.from_template(template)

    rag_chain = (
        {"context": vector_store.as_retriever(search_kwargs={"k": 2}), "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    logging.info("Successfully created RAG chain")
except Exception as e:
    raise ValueError(f"Error creating RAG chain: {str(e)}")

2025-09-23 12:15:59,465 - INFO - Successfully created RAG chain


In [21]:
try:
    # Build the chain with k=2 retrieved documents.
    # k was lowered step by step (k=4 -> k=3 -> k=2) in response to token limit warnings;
    # k=2 was the value that produced a valid response about Guardiola.
    current_chain = (
        {
            "context": vector_store.as_retriever(search_kwargs={"k": 2}),
            "question": RunnablePassthrough()
        }
        | prompt
        | llm
        | StrOutputParser()
    )
    
    # Try to get response
    start_time = time.time()
    rag_response = current_chain.invoke(query)
    elapsed_time = time.time() - start_time
    
    logging.info(f"RAG response generated in {elapsed_time:.2f} seconds using k=2")
    print(f"RAG Response: {rag_response}")
    print(f"Response generated in {elapsed_time:.2f} seconds")
    
except Exception as e:
    if "Payment Required" in str(e):
        logging.error("Payment required for Jina AI API. Please check your subscription status and API key.")
        print("To resolve this error:")
        print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options")
        print("2. Ensure your API key is valid and has sufficient credits")
        print("3. Consider upgrading your subscription plan if needed")
    else:
        raise RuntimeError(f"Unexpected error: {str(e)}")

2025-09-23 12:16:23,712 - INFO - RAG response generated in 5.36 seconds using k=2


RAG Response: Pep Guardiola has shown self-doubt and agitation in response to Manchester City's recent poor form, expressing concerns about his ability to address the team's crisis.
Response generated in 5.36 seconds
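The comments in the cell above mention lowering `k` from 4 to 2 when the token limit is exceeded, while the cell itself hard-codes `k=2`. The fallback could be automated roughly as follows. This is a sketch under stated assumptions: `invoke_with_fallback`, `build_chain`, and `is_token_limit_error` are hypothetical names, and the default check simply looks for the word "token" in the error message, which may not match the error your provider actually raises.

```python
def invoke_with_fallback(build_chain, query, k_values=(4, 3, 2),
                         is_token_limit_error=None):
    """Try each k in order and return (k, response) for the first that succeeds.

    build_chain: callable taking k and returning an object with .invoke(query).
    is_token_limit_error: callable deciding whether an exception means the
        context was too long (assumption: the message contains "token").
    """
    if is_token_limit_error is None:
        is_token_limit_error = lambda exc: "token" in str(exc).lower()

    last_error = None
    for k in k_values:
        try:
            return k, build_chain(k).invoke(query)
        except Exception as exc:
            if not is_token_limit_error(exc):
                raise  # unrelated failures are not retried
            last_error = exc
    raise RuntimeError(
        f"All k values {tuple(k_values)} exceeded the token limit"
    ) from last_error
```

With the notebook's objects, `build_chain` could be a small function that returns the same pipeline as above with `search_kwargs={"k": k}`.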


# Using Couchbase as a caching mechanism
Couchbase can be used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This improves efficiency and speed, particularly for repeated queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key.

For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is returned directly from Couchbase without re-running the RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role here is to provide fast, scalable storage for these cached responses so that frequently asked queries are answered more quickly.
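The check-then-compute flow described above can be sketched in a few lines. In this illustration a plain dictionary stands in for Couchbase, and `InMemoryCache`, `cached_answer`, and `slow_generate` are hypothetical names rather than Couchbase or LangChain APIs.

```python
class InMemoryCache:
    """Dictionary-backed stand-in for a Couchbase-backed cache (assumption)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def cached_answer(query, cache, generate):
    """Return (response, cache_hit), calling generate only on a cache miss."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True
    response = generate(query)
    cache.set(query, response)  # the query string is the cache key
    return response, False


calls = []


def slow_generate(query):
    """Placeholder for the full RAG chain; records each real invocation."""
    calls.append(query)
    return f"answer to: {query}"


cache = InMemoryCache()
first, first_hit = cached_answer("example query", cache, slow_generate)
second, second_hit = cached_answer("example query", cache, slow_generate)
print(first_hit, second_hit, len(calls))
```

The second call returns the stored response without invoking the generator again, which is why the repeated queries in the cell below complete faster than their first run.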

In [22]:
try:
    queries = [
        "What happened in the match between Fullham and Liverpool?",
        "What was manchester city manager pep guardiola's reaction to the team's current form?", # Repeated query
        "What happened in the match between Fullham and Liverpool?", # Repeated query
    ]

    for i, query in enumerate(queries, 1):
        print(f"\nQuery {i}: {query}")
        start_time = time.time()
        response = rag_chain.invoke(query)
        elapsed_time = time.time() - start_time
        print(f"Response: {response}")
        
        print(f"Time taken: {elapsed_time:.2f} seconds")
except Exception as e:
    if "Payment Required" in str(e):
        logging.error("Payment required for Jina AI API. Please check your subscription status and API key.")
        print("To resolve this error:")
        print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options")
        print("2. Ensure your API key is valid and has sufficient credits")
        print("3. Consider upgrading your subscription plan if needed")
    else:
        raise RuntimeError(f"Unexpected error: {str(e)}")


Query 1: What happened in the match between Fullham and Liverpool?
Response: I'm sorry, but the information provided does not mention any details about a match between Fullham and Liverpool.
Time taken: 2.25 seconds

Query 2: What was manchester city manager pep guardiola's reaction to the team's current form?
Response: Pep Guardiola has shown self-doubt and agitation in response to Manchester City's recent poor form, expressing concerns about his ability to address the team's crisis.
Time taken: 0.59 seconds

Query 3: What happened in the match between Fullham and Liverpool?
Response: I'm sorry, but the information provided does not mention any details about a match between Fullham and Liverpool.
Time taken: 0.53 seconds


# Conclusion
By following these steps, you now have a fully functional semantic search engine that leverages the strengths of Couchbase and Jina. This guide aims to show not only how to build the system but also why each step is necessary, so that you understand the principles behind semantic search and how to implement it effectively. Whether you’re new to software development or an experienced developer expanding your skills, you should now have the knowledge and tools to build an AI-driven search engine of your own.