## Introduction
In this guide, we walk through building a semantic search engine using Couchbase as the backend database and [Cohere](https://cohere.com/) as the embedding and language model provider. Semantic search goes beyond keyword matching by using the context and meaning of the words in a query, which makes it well suited to applications that need intelligent information retrieval. The tutorial is beginner-friendly and proceeds step by step, ending with a working semantic search system built from scratch on Couchbase Hyperscale and Composite Vector Indexes. Alternatively, if you want to perform semantic search using the Search Vector Index, see [this tutorial](https://developer.couchbase.com/tutorial-cohere-couchbase-rag-with-search-vector-index).

## How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/cohere/query_based/RAG_with_Couchbase_and_Cohere.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

## Before you start

## Get Credentials for Cohere

Please follow the [instructions](https://dashboard.cohere.com/welcome/register) to generate the Cohere credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

Note: To run this tutorial, you need Capella with Couchbase Server version 8.0 or above, because Hyperscale and Composite Vector Index search is supported only from version 8.0.

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

## Setting the Stage: Installing Necessary Libraries
To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks.

In [None]:
%pip install --quiet datasets==3.5.0 langchain-couchbase==1.0.1 langchain-cohere==0.5.0 python-dotenv==1.1.1

## Importing Necessary Libraries
The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.

In [None]:
import getpass
import json
import logging
import os
import time
from datetime import timedelta
from uuid import uuid4

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (CouchbaseException,
                                  InternalServerFailureException,
                                  QueryIndexAlreadyExistsException,
                                  ServiceUnavailableException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_cohere import ChatCohere, CohereEmbeddings
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy
from langchain_couchbase.vectorstores import IndexType

## Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.


In [None]:
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Suppress excessive logging
logging.getLogger('openai').setLevel(logging.WARNING)
logging.getLogger('httpx').setLevel(logging.WARNING)
logging.getLogger('langchain_cohere').setLevel(logging.ERROR)


## Loading Sensitive Information
In this section, we collect the configuration needed to integrate Couchbase with Cohere's API: the API key, the database credentials, and the bucket, scope, and collection names. Each value is read from an environment variable (or a `.env` file loaded by `load_dotenv()`) if present; otherwise the user is prompted at runtime, and a default is applied where one is shown. Keeping these details out of the source code avoids committing secrets and makes the notebook easy to reconfigure.

The script then checks that the Cohere API key was provided and raises an error if it is missing. The remaining settings fall back to their defaults.
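
The resolution order used in the next cell (environment variable, then interactive input, then a default) can be sketched as a small helper. `resolve_setting`, its `env` mapping, and its `prompt_fn` parameter are hypothetical names introduced only for illustration; the notebook itself inlines the same logic with `or` chains.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_setting(
    name: str,
    default: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    prompt_fn: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the first non-empty value from: environment, prompt, default."""
    value = env.get(name, "")
    if not value and prompt_fn is not None:
        # In a notebook, prompt_fn would be input or getpass.getpass.
        value = prompt_fn(f"Enter {name}: ")
    if not value and default is not None:
        value = default
    if not value:
        raise ValueError(f"{name} is not provided and is required.")
    return value


# Usage with an in-memory mapping instead of the real environment.
fake_env = {"CB_USERNAME": "demo_user"}
print(resolve_setting("CB_USERNAME", default="Administrator", env=fake_env))
print(resolve_setting("SCOPE_NAME", default="shared", env=fake_env))
```

Passing `getpass.getpass` as `prompt_fn` for secrets keeps the typed value from being echoed in the notebook output.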

In [None]:
load_dotenv()

COHERE_API_KEY = os.getenv('COHERE_API_KEY') or getpass.getpass('Enter your Cohere API key: ')
CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: cohere): ') or 'cohere'
CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache'

# Check if the variables are correctly loaded
if not COHERE_API_KEY:
    raise ValueError("COHERE_API_KEY is not provided and is required.")

## Connect to Couchbase
The script attempts to establish a connection to the Couchbase database using the credentials retrieved from the environment variables. Couchbase is a NoSQL database known for its flexibility, scalability, and support for various data models, including document-based storage. The connection is authenticated using a username and password, and the script waits until the connection is fully established before proceeding.




In [None]:
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")

## Setting Up Collections in Couchbase

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

1. Bucket Creation:
   - Checks if specified bucket exists, creates it if not
   - Sets bucket properties like RAM quota (1024MB) and replication (disabled)
   - Note: You will not be able to create a bucket on Capella

2. Scope Management:  
   - Verifies if requested scope exists within bucket
   - Creates new scope if needed (unless it's the default "_default" scope)

3. Collection Setup:
   - Checks for collection existence within scope
   - Creates collection if it doesn't exist
   - Waits 2 seconds for collection to be ready

Additional Tasks:
- Clears any existing documents for clean state
- Implements comprehensive error handling and logging

The function is called twice to set up:
1. Main collection for vector embeddings
2. Cache collection for storing results
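
The clean-up step builds a SQL++ statement from the bucket, scope, and collection names. Backtick quoting matters here because names such as `query-vector-search-testing` contain a hyphen. The sketch below shows one way to build that keyspace path defensively; `keyspace` and `delete_all_statement` are hypothetical helpers, not part of the Couchbase SDK, and rejecting names that contain a backtick is an assumption made to keep the example simple.

```python
def keyspace(bucket: str, scope: str, collection: str) -> str:
    """Build a backtick-quoted `bucket`.`scope`.`collection` path."""
    parts = (bucket, scope, collection)
    for part in parts:
        # Refuse empty names and names that would break out of the quoting.
        if not part or "`" in part:
            raise ValueError(f"Invalid identifier: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


def delete_all_statement(bucket: str, scope: str, collection: str) -> str:
    """Return the statement that removes every document in a collection."""
    return f"DELETE FROM {keyspace(bucket, scope, collection)}"


print(delete_all_statement("query-vector-search-testing", "shared", "cohere"))
```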


In [None]:
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)
        
        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in scopes
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")
    
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION)


## Create Embeddings
Embeddings are created using the Cohere API. Embeddings are vectors (arrays of numbers) that represent the meaning of text in a high-dimensional space. These embeddings are crucial for tasks like semantic search, where the goal is to find text that is semantically similar to a query. The script uses a pre-trained model provided by Cohere to generate embeddings for the text in the BBC News dataset.
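
To make "semantically similar" concrete, the sketch below compares hand-made vectors with cosine similarity. The three-dimensional vectors are invented for illustration and are not Cohere output; real embeddings have far more dimensions, but the arithmetic is the same.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors: two football-related texts and one finance-related text.
football_1 = [0.9, 0.1, 0.0]
football_2 = [0.8, 0.2, 0.1]
finance = [0.0, 0.2, 0.9]

print(f"football vs football: {cosine_similarity(football_1, football_2):.3f}")
print(f"football vs finance:  {cosine_similarity(football_1, finance):.3f}")
```

The two football vectors point in nearly the same direction and score close to 1, while the finance vector scores close to 0.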

In [None]:
try:
    embeddings = CohereEmbeddings(
        cohere_api_key=COHERE_API_KEY,
        model="embed-english-v3.0",
    )
    logging.info("Successfully created CohereEmbeddings")
except Exception as e:
    raise ValueError(f"Error creating CohereEmbeddings: {str(e)}")

## Set Up Vector Store
The vector store is set up to manage the embeddings created in the previous step. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store is built on top of Couchbase, allowing the script to store the embeddings in a way that can be efficiently searched.


In [None]:
try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DistanceStrategy.COSINE
    )
    logging.info("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")

## Load the BBC News Dataset
To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.

In [None]:
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")

## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.
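
The cell below deduplicates with a `set`, which does not preserve the original article order. If a stable order is preferred (for example, to make ingestion runs reproducible), an order-preserving variant is possible. `dedupe_keep_order` is a hypothetical helper, shown here as one option rather than the notebook's method.

```python
from typing import Iterable, List, Optional


def dedupe_keep_order(items: Iterable[Optional[str]]) -> List[str]:
    """Drop empty values and duplicates, keeping first-seen order."""
    # dict keys keep insertion order (Python 3.7+), so the keys are the
    # unique items in the order they first appeared.
    return list(dict.fromkeys(item for item in items if item))


sample = ["story A", "story B", "story A", "", None, "story C", "story B"]
print(dedupe_keep_order(sample))
```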

In [None]:
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")

## Saving Data to the Vector Store
To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.

This approach offers several benefits:
1. Memory Efficiency: Processing in smaller batches prevents memory overload
2. Progress Tracking: Easier to monitor and track the ingestion progress
3. Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 50 to ensure reliable operation.
The optimal batch size depends on many factors including:
- Document sizes being inserted
- Available system resources
- Network conditions
- Concurrent workload

Consider measuring performance with your specific workload before adjusting.
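
The filtering and batching described above can be sketched in plain Python. This is only a conceptual illustration of slicing a list into batches of 50; it is not the internal implementation of `add_texts`, and `filter_and_batch` is a hypothetical name.

```python
from typing import Iterator, List, Sequence


def filter_and_batch(
    texts: Sequence[str], batch_size: int = 50, max_chars: int = 50_000
) -> Iterator[List[str]]:
    """Drop empty or oversized texts, then yield lists of up to batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    kept = [text for text in texts if text and len(text) <= max_chars]
    for start in range(0, len(kept), batch_size):
        yield kept[start:start + batch_size]


# 120 short documents plus one that exceeds the 50,000-character limit.
docs = ["x" * 10] * 120 + ["y" * 60_000]
batch_sizes = [len(batch) for batch in filter_and_batch(docs)]
print(batch_sizes)
```

With 120 documents that pass the filter, the generator yields two full batches and one partial batch, so a failure in one batch affects at most 50 documents.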


In [None]:
batch_size = 50

# Automatic Batch Processing
articles = [article for article in unique_news_articles if article and len(article) <= 50000]

try:
    vector_store.add_texts(
        texts=articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully.")
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")


## Create Language Model (LLM)
The script initializes a Cohere language model (LLM) that will be used for generating responses to queries. LLMs are powerful tools for natural language understanding and generation, capable of producing human-like text based on input prompts. The model is configured with specific parameters, such as the temperature, which controls the randomness of its outputs.


In [None]:
try:
    llm = ChatCohere(
        cohere_api_key=COHERE_API_KEY,
        model="command-a-03-2025",
        temperature=0
    )
    logging.info("Successfully created Cohere LLM with model command-a-03-2025")
except Exception as e:
    raise ValueError(f"Error creating Cohere LLM: {str(e)}")

## Vector Search Performance Testing

Now let's demonstrate the performance benefits of Hyperscale Vector Index by testing pure vector search performance. We'll compare:

1. **Baseline Performance**: Vector search without Hyperscale index optimization
2. **Hyperscale-Optimized Performance**: Same search with Hyperscale index

### Test 1: Baseline Performance (No Hyperscale Index)

Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined.

In the code below, we perform a baseline semantic search before creating the Hyperscale index to establish a performance baseline.
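
Conceptually, a search with no vector index has to score the query against every stored vector and keep the closest matches. The sketch below shows that exhaustive scan on toy data, assuming cosine distance defined as one minus cosine similarity (a common definition). The document IDs and vectors are invented, and `brute_force_search` is a hypothetical helper, not how Couchbase executes the query.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; lower values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("Cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def brute_force_search(
    query: Sequence[float], docs: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the k closest."""
    scored = ((doc_id, cosine_distance(query, vec)) for doc_id, vec in docs.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


toy_docs = {
    "guardiola-press-conference": [0.9, 0.1, 0.0],
    "premier-league-table": [0.7, 0.3, 0.1],
    "interest-rate-decision": [0.0, 0.2, 0.9],
}
results = brute_force_search([1.0, 0.0, 0.0], toy_docs, k=2)
for doc_id, distance in results:
    print(f"Distance: {distance:.4f}, Doc: {doc_id}")
```

The cost of this scan grows linearly with the number of stored vectors, which is the overhead an index is meant to reduce.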

In [None]:
query = "What was manchester city manager pep guardiola's reaction to the team's current form?"

try:
    # Perform the semantic search
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    baseline_time = time.time() - start_time

    logging.info(f"Baseline search completed in {baseline_time:.2f} seconds")

    # Display search results
    print(f"\nBaseline Semantic Search Results (completed in {baseline_time:.2f} seconds):")
    print("-" * 80)
    for doc, score in search_results:
        print(f"Distance: {score:.4f}, Text: {doc.page_content[:200]}...")
        print("-" * 80)

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}")

## Optimizing Vector Search with Hyperscale and Composite Vector Index

While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Couchbase's query-based vector indexing.

Couchbase offers three types of vector indexes, but for Hyperscale and Composite Vector Index based search we focus on two main types:

Hyperscale Vector Indexes
- Best for pure vector searches - content discovery, recommendations, semantic search
- High performance with low memory footprint - designed to scale to billions of vectors
- Optimized for concurrent operations - supports simultaneous searches and inserts
- Use when: You primarily perform vector-only queries without complex scalar filtering
- Ideal for: Large-scale semantic search, recommendation systems, content discovery

Composite Vector Indexes 
- Best for filtered vector searches - combines vector search with scalar value filtering
- Efficient pre-filtering - scalar attributes reduce the vector comparison scope
- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data
- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries

Choosing the Right Index Type
- Start with Hyperscale Vector Index for pure vector searches and large datasets
- Use Composite Vector Index when scalar filters significantly reduce your search space
- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions

For more details, see the [Couchbase Vector Index documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).


## Understanding Index Configuration (Couchbase 8.0 Feature)

The index_description parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

Format: `'IVF[<centroids>],{PQ|SQ}<settings>'`

Centroids (IVF - Inverted File):
- Controls how the dataset is subdivided for faster searches
- More centroids = faster search, slower training  
- Fewer centroids = slower search, faster training
- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size

Quantization Options:
- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
- Higher values = better accuracy, larger index size
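
The accuracy versus size trade-off of scalar quantization can be seen in a toy example. The sketch below applies generic uniform (min-max) quantization to a single vector. It illustrates the general idea only and makes no claim about how Couchbase implements SQ internally; `quantize` and `dequantize` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def quantize(vector: Sequence[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Map each float to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(vector), max(vector)
    # Guard against a constant vector, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def dequantize(codes: Sequence[int], lo: float, scale: float) -> List[float]:
    """Reconstruct approximate floats from integer codes."""
    return [lo + code * scale for code in codes]


vec = [-0.42, 0.0, 0.13, 0.87]
for bits in (4, 8):
    codes, lo, scale = quantize(vec, bits)
    restored = dequantize(codes, lo, scale)
    max_error = max(abs(a - b) for a, b in zip(vec, restored))
    print(f"SQ{bits}: codes={codes}, max reconstruction error={max_error:.4f}")
```

Fewer bits mean coarser steps between representable values, so the reconstruction error grows while each stored dimension takes less space.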

Common Examples:
- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization  
- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).
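
To make the format concrete, the sketch below parses the description strings listed above into their parts. It covers only the forms shown in this section (optional centroid count, `SQ4`/`SQ6`/`SQ8`, or `PQ<subquantizers>x<bits>`); the server may accept variants that this hypothetical `parse_index_description` helper rejects.

```python
import re
from typing import NamedTuple, Optional

_PATTERN = re.compile(
    r"^IVF(?P<centroids>\d+)?,"
    r"(?:SQ(?P<sq_bits>4|6|8)|PQ(?P<subq>\d+)x(?P<pq_bits>\d+))$"
)


class IndexDescription(NamedTuple):
    centroids: Optional[int]      # None means "let Couchbase choose"
    quantizer: str                # "SQ" or "PQ"
    bits: int                     # bits per dimension (SQ) or per subquantizer (PQ)
    subquantizers: Optional[int]  # only set for PQ


def parse_index_description(text: str) -> IndexDescription:
    """Split an 'IVF[<centroids>],{PQ|SQ}<settings>' string into fields."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized index description: {text!r}")
    centroids = int(match["centroids"]) if match["centroids"] else None
    if match["sq_bits"]:
        return IndexDescription(centroids, "SQ", int(match["sq_bits"]), None)
    return IndexDescription(centroids, "PQ", int(match["pq_bits"]), int(match["subq"]))


for description in ("IVF,SQ8", "IVF1000,SQ6", "IVF,PQ32x8"):
    print(description, "->", parse_index_description(description))
```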

In the code below, we demonstrate creating a Hyperscale index. This method takes an index type (HYPERSCALE or COMPOSITE) and description parameter for optimization settings. Alternatively, Hyperscale and Composite Vector Indexes can be created manually from the Couchbase UI.

In [None]:
vector_store.create_index(index_type=IndexType.HYPERSCALE, index_name="cohere_hyperscale_index", index_description="IVF,SQ8")

### Test 2: Hyperscale-Optimized Performance

The example below shows running the same similarity search, but now using the Hyperscale index we created above. You'll notice improved performance as the index efficiently retrieves data.

**Important**: When using Composite indexes, scalar filters take precedence over vector similarity, which can improve performance for filtered searches but may miss some semantically relevant results that don't match the scalar criteria.

Note: In Hyperscale and Composite Vector Index search, the distance represents the vector distance between the query and document embeddings. Lower distance indicates higher similarity, while higher distance indicates lower similarity.

In [None]:
query = "What was manchester city manager pep guardiola's reaction to the team's current form?"

try:
    # Perform the semantic search with Hyperscale index
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    hyperscale_time = time.time() - start_time

    logging.info(f"Hyperscale search completed in {hyperscale_time:.2f} seconds")

    # Display search results
    print(f"\nHyperscale Semantic Search Results (completed in {hyperscale_time:.2f} seconds):")
    print("-" * 80)
    for doc, score in search_results:
        print(f"Distance: {score:.4f}, Text: {doc.page_content[:200]}...")
        print("-" * 80)

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}")

## Performance Summary

In [None]:
print("\n" + "="*60)
print("PERFORMANCE SUMMARY")
print("="*60)

print(f"Baseline Search Time:     {baseline_time:.4f} seconds")

if baseline_time and hyperscale_time:
    speedup = baseline_time / hyperscale_time if hyperscale_time > 0 else float('inf')
    percent_improvement = ((baseline_time - hyperscale_time) / baseline_time) * 100 if baseline_time > 0 else 0
    print(f"Hyperscale Search Time:   {hyperscale_time:.4f} seconds ({speedup:.2f}x faster, {percent_improvement:.1f}% improvement)")

print("\n" + "-"*60)
print("Index Recommendation:")
print("-"*60)
print("- Hyperscale: Best for pure vector searches, scales to billions of vectors")
print("- Composite: Best for filtered searches combining vector + scalar filters")
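
The summary above compares two single measurements, and a single timed call can be noisy because it also includes network latency and the embedding request for the query. A more stable comparison repeats each call and compares medians. The sketch below uses stand-in workloads; `time_call` and `summarize` are hypothetical helpers, and in the notebook the callable would wrap `vector_store.similarity_search_with_score(query, k=10)`.

```python
import statistics
import time
from typing import Callable, List


def time_call(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run fn several times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(baseline: List[float], optimized: List[float]) -> str:
    """Compare median durations and report the speedup factor."""
    base = statistics.median(baseline)
    opt = statistics.median(optimized)
    speedup = base / opt if opt > 0 else float("inf")
    return f"median {base:.4f}s -> {opt:.4f}s ({speedup:.2f}x)"


# Stand-in workloads that only simulate a slower and a faster operation.
slow_runs = time_call(lambda: sum(range(200_000)))
fast_runs = time_call(lambda: sum(range(20_000)))
print(summarize(slow_runs, fast_runs))
```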

### Alternative: Composite Index Configuration

If your use case requires complex filtering with scalar attributes, you can create a Composite index instead:

```python
from langchain_couchbase.vectorstores import IndexType
vector_store.create_index(
    index_type=IndexType.COMPOSITE,  # Instead of IndexType.HYPERSCALE
    index_name="cohere_composite_index",
    index_description="IVF,SQ8"
)
```

Choose based on your specific use case and query patterns. For this tutorial's news search scenario, either index type would work, but Hyperscale is more efficient for pure semantic search across news articles.

## Set Up Cache
A cache is set up using Couchbase to store intermediate results and frequently accessed data. Caching is important for improving performance, as it reduces the need to repeatedly calculate or retrieve the same data. The cache is linked to a specific collection in Couchbase, and it is used later in the script to store the results of language model queries.


In [None]:
try:
    cache = CouchbaseCache(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=CACHE_COLLECTION,
    )
    logging.info("Successfully created cache")
    set_llm_cache(cache)
except Exception as e:
    raise ValueError(f"Failed to create cache: {str(e)}")

## Retrieval-Augmented Generation (RAG) with Couchbase and LangChain
Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain.

The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation.
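
What "passing the retrieved documents to the language model" amounts to is string assembly: the retrieved text is placed in the prompt ahead of the question. The sketch below shows that step with plain strings standing in for LangChain `Document` objects. `TEMPLATE`, `format_docs`, and `build_prompt` are hypothetical names, the sample texts are invented, and joining the document text with blank lines is a common refinement rather than what the chain in this notebook does (it passes the retriever output to the prompt directly).

```python
from typing import List

TEMPLATE = (
    "Answer the question as truthfully as possible using the context below:\n"
    "{context}\n\n"
    "Question: {question}"
)


def format_docs(docs: List[str]) -> str:
    """Join retrieved passages into one context block."""
    return "\n\n".join(doc.strip() for doc in docs)


def build_prompt(question: str, retrieved: List[str]) -> str:
    """Fill the template with the retrieved context and the user question."""
    return TEMPLATE.format(context=format_docs(retrieved), question=question)


# Invented passages standing in for retrieved news articles.
retrieved_docs = [
    "Passage one about the team's recent results. ",
    " Passage two quoting the manager after the match.",
]
prompt_text = build_prompt("How did the manager react?", retrieved_docs)
print(prompt_text)
```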

In [None]:
try:
    template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below:
    {context}

    Question: {question}"""
    prompt = ChatPromptTemplate.from_template(template)

    rag_chain = (
        {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    logging.info("Successfully created RAG chain")
except Exception as e:
    raise ValueError(f"Error creating RAG chain: {str(e)}")

In [None]:
start_time = time.time()
try:
    rag_response = rag_chain.invoke(query)
    rag_elapsed_time = time.time() - start_time
    print(f"RAG Response: {rag_response}")
    print(f"RAG response generated in {rag_elapsed_time:.2f} seconds")
except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")

## Using Couchbase as a caching mechanism
Couchbase can be used as a cache for the language model calls in a RAG (Retrieval-Augmented Generation) chain. Because `set_llm_cache(cache)` was called earlier, LangChain stores each LLM response in the cache collection, keyed by the exact prompt and the model configuration. When a query is first processed, the RAG chain retrieves relevant documents, generates a response with the language model, and stores that response in Couchbase.

For a subsequent request with the same query, retrieval still runs, but if it returns the same documents the resulting prompt is identical, and the cached response is returned from Couchbase instead of calling the language model again. This reduces response time because generation, typically the most expensive step, is skipped. The cache is exact-match: a reworded query produces a different prompt and misses the cache.
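
The lookup logic behind an exact-match cache can be sketched with an in-memory dictionary. `PromptCache` and `fake_llm` are hypothetical stand-ins used only to show hits and misses; the notebook's `CouchbaseCache` persists entries in a Couchbase collection instead of a Python dict.

```python
from typing import Callable, Dict, List, Tuple


class PromptCache:
    """Exact-match cache keyed by (prompt, model settings)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_generate(
        self, prompt: str, model_key: str, generate: Callable[[str], str]
    ) -> str:
        key = (prompt, model_key)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: call the (expensive) generator and remember the result.
        self.misses += 1
        response = generate(prompt)
        self._store[key] = response
        return response


llm_calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; records how often it is invoked."""
    llm_calls.append(prompt)
    return f"answer to: {prompt}"


demo_cache = PromptCache()
for question in ["Q1", "Q2", "Q1"]:
    demo_cache.get_or_generate(question, "demo-model|temperature=0", fake_llm)

print(f"hits={demo_cache.hits}, misses={demo_cache.misses}, llm calls={len(llm_calls)}")
```

The repeated prompt is served from the cache, so the stand-in model is called only twice for three requests.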

In [None]:
try:
    queries = [
        "What happened in the match between Fulham and Liverpool?",
        "What was manchester city manager pep guardiola's reaction to the team's current form?", # Repeated query
        "What happened in the match between Fulham and Liverpool?", # Repeated query
    ]

    for i, query in enumerate(queries, 1):
        print(f"\nQuery {i}: {query}")
        start_time = time.time()
        response = rag_chain.invoke(query)
        elapsed_time = time.time() - start_time
        print(f"Response: {response}")
        print(f"Time taken: {elapsed_time:.2f} seconds")
except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")

## Conclusion

You've built a high-performance semantic search engine using Couchbase Hyperscale/Composite indexes with Cohere and LangChain. For the Search Vector Index alternative, see the [search_based tutorial](https://developer.couchbase.com/tutorial-cohere-couchbase-rag-with-search-vector-index).