# Introduction
In this guide, we walk through building a semantic search engine using Couchbase as the backend database, [Deepseek V3](https://deepseek.ai/) as the language model (via OpenRouter or the direct Deepseek API), and OpenAI for embeddings. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions for creating a fully functional semantic search system from scratch using the FTS service. If you would rather perform semantic search using the GSI index, see [this tutorial](https://developer.couchbase.com/tutorial-openrouter-deepseek-with-global-secondary-index/).

# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/openrouter-deepseek/RAG_with_Couchbase_and_Openrouter_Deepseek.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

# Before you start

## Get Credentials for OpenRouter and Deepseek
* Sign up for an account at [OpenRouter](https://openrouter.ai/) to get your API key.
* OpenRouter provides access to Deepseek models, so no separate Deepseek credentials are needed.
* Store your OpenRouter API key securely, as it will be used to access the models.
* For [Deepseek](https://deepseek.ai/) models, you can use the default models provided by OpenRouter.
* An OpenAI API key is also required, because the embeddings in this tutorial are generated with OpenAI.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

## Setting the Stage: Installing Necessary Libraries

To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks.

In [1]:
%pip install --quiet datasets==3.5.0 langchain-couchbase==0.3.0 langchain-deepseek==0.1.3 langchain-openai==0.3.13 python-dotenv==1.1.0

Note: you may need to restart the kernel to use updated packages.


## Importing Necessary Libraries

The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading.

In [2]:
import getpass
import json
import logging
import os
import time
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (CouchbaseException,
                                  InternalServerFailureException,
                                  QueryIndexAlreadyExistsException,
                                  ServiceUnavailableException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore
from langchain_openai import OpenAIEmbeddings

## Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings.

In [3]:
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Suppress httpx logging
logging.getLogger('httpx').setLevel(logging.CRITICAL)

## Environment Variables and Configuration

This section handles loading and validating environment variables and configuration settings:

1. API Keys:
   - Supports either direct Deepseek API or OpenRouter API access
   - Prompts for API key input if not found in environment
   - Requires OpenAI API key for embeddings

2. Couchbase Settings:
   - Connection details (host, username, password)
   - Bucket, scope and collection names
   - Vector search index configuration
   - Cache collection settings

The code validates that all required credentials are present before proceeding.
It allows flexible configuration through environment variables or interactive prompts,
with sensible defaults for local development.
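
The lookup order described above (environment variable first, then a default) can be factored into a small helper so it is easy to test without prompting. This is a minimal sketch: `get_setting` and `missing_settings` are hypothetical names that do not appear in the notebook, and the interactive prompt step is deliberately left out.

```python
import os


def get_setting(name, default=None, env=None):
    """Return env[name] if it is set and non-empty, otherwise the default."""
    env = os.environ if env is None else env
    value = env.get(name)
    return value if value else default


def missing_settings(settings):
    """Return the sorted names whose values are empty or None."""
    return sorted(name for name, value in settings.items() if not value)


# Example with an explicit mapping instead of the real environment
example_env = {"CB_HOST": "couchbase://example", "CB_USERNAME": ""}
host = get_setting("CB_HOST", "couchbase://localhost", example_env)
user = get_setting("CB_USERNAME", "Administrator", example_env)
print(host, user)  # couchbase://example Administrator
print(missing_settings({"OPENAI_API_KEY": None, "CB_HOST": host}))  # ['OPENAI_API_KEY']
```

Passing the mapping explicitly keeps the helper independent of the process environment, which makes the validation step straightforward to unit test.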


In [4]:
# Load environment variables from .env file if it exists
load_dotenv()

# API Keys
# Allow either Deepseek API directly or via OpenRouter
DEEPSEEK_API_KEY = os.getenv('DEEPSEEK_API_KEY')
OPENROUTER_API_KEY = os.getenv('OPENROUTER_API_KEY')

if not DEEPSEEK_API_KEY and not OPENROUTER_API_KEY:
    api_choice = input('Choose API (1 for Deepseek direct, 2 for OpenRouter): ')
    if api_choice == '1':
        DEEPSEEK_API_KEY = getpass.getpass('Enter your Deepseek API Key: ')
    else:
        OPENROUTER_API_KEY = getpass.getpass('Enter your OpenRouter API Key: ')

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API Key: ')

# Couchbase Settings
CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing'
INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_deepseek): ') or 'vector_search_deepseek'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: deepseek): ') or 'deepseek'
CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache'

# Check if required credentials are set
required_creds = {
    'OPENAI_API_KEY': OPENAI_API_KEY,
    'CB_HOST': CB_HOST,
    'CB_USERNAME': CB_USERNAME,
    'CB_PASSWORD': CB_PASSWORD,
    'CB_BUCKET_NAME': CB_BUCKET_NAME
}

# Add the API key that was chosen
if DEEPSEEK_API_KEY:
    required_creds['DEEPSEEK_API_KEY'] = DEEPSEEK_API_KEY
elif OPENROUTER_API_KEY:
    required_creds['OPENROUTER_API_KEY'] = OPENROUTER_API_KEY
else:
    raise ValueError("Either Deepseek API Key or OpenRouter API Key must be provided")

for cred_name, cred_value in required_creds.items():
    if not cred_value:
        raise ValueError(f"{cred_name} is not set")

# Connecting to the Couchbase Cluster
Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount.
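
One detail worth checking before connecting is the connection string scheme: Capella clusters are reached over TLS with a `couchbases://` endpoint, while the local default used in this tutorial is `couchbase://localhost`. The helper below is a small sketch for catching a mismatch early. `connection_uses_tls` is a hypothetical name and the Capella hostname is a made-up placeholder.

```python
from urllib.parse import urlparse


def connection_uses_tls(connection_string):
    """Return True when the connection string uses the TLS scheme."""
    return urlparse(connection_string).scheme == "couchbases"


# Placeholder hostnames for illustration only
print(connection_uses_tls("couchbases://cb.example.cloud.couchbase.com"))  # True
print(connection_uses_tls("couchbase://localhost"))  # False
```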



In [5]:
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")

2025-05-25 14:39:18,465 - INFO - Successfully connected to Couchbase


## Setting Up Collections in Couchbase

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

1. Bucket Creation:
   - Checks if specified bucket exists, creates it if not
   - Sets bucket properties like RAM quota (1024MB) and replication (disabled)
   - Note: If you are using Capella, manually create a bucket named `vector-search-testing` (or any name you prefer) with the same properties.

2. Scope Management:  
   - Verifies if requested scope exists within bucket
   - Creates new scope if needed (unless it's the default "_default" scope)

3. Collection Setup:
   - Checks for collection existence within scope
   - Creates collection if it doesn't exist
   - Waits 2 seconds for collection to be ready

Additional Tasks:
- Creates a primary index on the collection so that SQL++ queries, such as the cleanup `DELETE`, can run against it
- Clears any existing documents for clean state
- Implements comprehensive error handling and logging

The function is called twice to set up:
1. Main collection for vector embeddings
2. Cache collection for storing results
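
The function relies on fixed two-second sleeps after creating a bucket or collection. On a slower cluster a fixed delay may be too short, so one alternative is to poll for readiness with a timeout. The sketch below shows the polling idea only. `wait_until` is a hypothetical helper, and the predicate you pass in (for example, a check that the new collection shows up in `get_all_scopes()`) is left to the reader.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True on success and False if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demo with a fake readiness check that succeeds on the third call
attempts = {"count": 0}


def fake_ready():
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until(fake_ready, timeout=2.0, interval=0.01))  # True
```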


In [6]:
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)
        
        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Ensure primary index exists
        try:
            cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute()
            logging.info("Primary index present or created successfully.")
        except Exception as e:
            logging.warning(f"Error creating primary index: {str(e)}")

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")
    
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION)


2025-05-25 14:39:19,580 - INFO - Bucket 'vector-search-testing' exists.
2025-05-25 14:39:21,409 - INFO - Collection 'deepseek' already exists. Skipping creation.
2025-05-25 14:39:24,342 - INFO - Primary index present or created successfully.
2025-05-25 14:39:24,604 - INFO - All documents cleared from the collection.
2025-05-25 14:39:24,606 - INFO - Bucket 'vector-search-testing' exists.
2025-05-25 14:39:26,535 - INFO - Collection 'cache' already exists. Skipping creation.
2025-05-25 14:39:29,589 - INFO - Primary index present or created successfully.
2025-05-25 14:39:29,813 - INFO - All documents cleared from the collection.


<couchbase.collection.Collection at 0x12bc6b490>

# Loading Couchbase Vector Search Index

Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.

This OpenRouter Deepseek vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `deepseek`. The configuration is set up for vectors with exactly `1536 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly.

For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html).
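
Because the index definition must agree with your bucket and with the embedding dimension, it can help to sanity-check the JSON once it is loaded into a dict. The sketch below assumes the definition stores the bucket under `sourceName` and marks vector fields with `"type": "vector"` and a `dims` value. The `sample` dict is a heavily trimmed illustration, not the tutorial's actual `deepseek_index.json`, and both function names are hypothetical.

```python
def find_vector_fields(node):
    """Recursively collect dicts that look like vector field definitions."""
    found = []
    if isinstance(node, dict):
        if node.get("type") == "vector" and "dims" in node:
            found.append(node)
        for value in node.values():
            found.extend(find_vector_fields(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_vector_fields(item))
    return found


def check_index_definition(definition, expected_dims, bucket_name):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    source = definition.get("sourceName")
    if source != bucket_name:
        problems.append(f"sourceName is {source!r}, expected {bucket_name!r}")
    fields = find_vector_fields(definition)
    if not fields:
        problems.append("no vector field found")
    for field in fields:
        if field["dims"] != expected_dims:
            problems.append(f"vector field has dims={field['dims']}, expected {expected_dims}")
    return problems


# Illustrative, trimmed definition (assumed layout, for demonstration only)
sample = {
    "name": "vector_search_deepseek",
    "sourceName": "vector-search-testing",
    "params": {"mapping": {"types": {"shared.deepseek": {"properties": {
        "embedding": {"fields": [
            {"name": "embedding", "type": "vector", "dims": 1536, "similarity": "dot_product"}
        ]}
    }}}}},
}
print(check_index_definition(sample, 1536, "vector-search-testing"))  # []
```

The recursive search avoids hard-coding the nesting path, so the check keeps working if the mapping is organized differently from the sample.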


In [7]:
try:
    with open('deepseek_index.json', 'r') as file:
        index_definition = json.load(file)
except Exception as e:
    raise ValueError(f"Error loading index definition: {str(e)}")

# Creating or Updating Search Indexes

With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine.

In [8]:
try:
    scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes()

    # Check if index already exists
    existing_indexes = scope_index_manager.get_all_indexes()
    index_name = index_definition["name"]

    if index_name in [index.name for index in existing_indexes]:
        logging.info(f"Index '{index_name}' found")
    else:
        logging.info(f"Creating new index '{index_name}'...")

    # Create SearchIndex object from JSON definition
    search_index = SearchIndex.from_json(index_definition)

    # Upsert the index (create if not exists, update if exists)
    scope_index_manager.upsert_index(search_index)
    logging.info(f"Index '{index_name}' successfully created/updated.")

except QueryIndexAlreadyExistsException:
    logging.info(f"Index '{index_name}' already exists. Skipping creation/update.")
except ServiceUnavailableException:
    raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.")
except InternalServerFailureException as e:
    logging.error(f"Internal server error: {str(e)}")
    raise

2025-05-25 14:39:31,015 - INFO - Index 'vector_search_deepseek' found
2025-05-25 14:39:31,770 - INFO - Index 'vector_search_deepseek' already exists. Skipping creation/update.


## Creating the Embeddings client
This section creates an OpenAI embeddings client using the OpenAI API key.
The embeddings client is configured to use the "text-embedding-3-small" model,
which converts text into numerical vector representations.
These vector embeddings are essential for semantic search and similarity matching.
The client will be used by the vector store to generate embeddings for documents.

In [9]:
try:
    embeddings = OpenAIEmbeddings(
        api_key=OPENAI_API_KEY,
        model="text-embedding-3-small"
    )
    logging.info("Successfully created OpenAI embeddings client")
except Exception as e:
    raise ValueError(f"Error creating OpenAI embeddings client: {str(e)}")

2025-05-25 14:39:32,003 - INFO - Successfully created OpenAI embeddings client


## Setting Up the Couchbase Vector Store
A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used.

In [10]:
try:
    vector_store = CouchbaseSearchVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        index_name=INDEX_NAME,
    )
    logging.info("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")

2025-05-25 14:39:35,246 - INFO - Successfully created vector store


## Load the BBC News Dataset
To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.

In [11]:
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")

2025-05-25 14:39:41,364 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


Loaded the BBC News dataset with 2687 rows


## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.

In [12]:
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")

We have 1749 unique articles in our database.
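
A `set` does not preserve insertion order, so the order of `unique_news_articles` can differ between runs. If a stable ingestion order matters to you, one alternative is to deduplicate with `dict.fromkeys`, which keeps the first occurrence of each item in its original position. This is a small sketch with made-up sample strings, and `dedupe_preserving_order` is a hypothetical helper name.

```python
def dedupe_preserving_order(texts):
    """Drop empty entries and duplicates while keeping first-seen order."""
    return list(dict.fromkeys(text for text in texts if text))


sample_texts = ["b", "a", "", "b", None, "c", "a"]
print(dedupe_preserving_order(sample_texts))  # ['b', 'a', 'c']
```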


## Saving Data to the Vector Store
To handle the large number of articles efficiently, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.

This approach offers several benefits:
1. Memory Efficiency: Processing in smaller batches prevents memory overload
2. Progress Tracking: Easier to monitor and track the ingestion progress
3. Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 50 to ensure reliable operation.
The optimal batch size depends on many factors including:
- Document sizes being inserted
- Available system resources
- Network conditions
- Concurrent workload

Consider measuring performance with your specific workload before adjusting.
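
To make the two steps concrete, the sketch below filters out oversized texts and then splits a list into fixed-size batches. It illustrates the slicing idea only and is not a description of how `add_texts` works internally. `filter_by_length` and `batched` are hypothetical helper names.

```python
def filter_by_length(texts, max_chars=50_000):
    """Keep non-empty texts that are at most max_chars characters long."""
    return [text for text in texts if text and len(text) <= max_chars]


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


kept = filter_by_length(["short", "x" * 60_000, "", "medium text"])
sizes = [len(batch) for batch in batched(list(range(120)), 50)]
print(len(kept), sizes)  # 2 [50, 50, 20]
```

With 120 items and a batch size of 50, the last batch holds the remaining 20 items, which is why progress logging per batch gives a reasonable view of how far ingestion has advanced.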


In [13]:
batch_size = 50

# Automatic Batch Processing
articles = [article for article in unique_news_articles if article and len(article) <= 50000]

try:
    vector_store.add_texts(
        texts=articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully.")
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")

2025-05-25 14:41:37,848 - INFO - Document ingestion completed successfully.


## Setting Up a Couchbase Cache
To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly.

Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience.


In [14]:
try:
    cache = CouchbaseCache(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=CACHE_COLLECTION,
    )
    logging.info("Successfully created cache")
    set_llm_cache(cache)
except Exception as e:
    raise ValueError(f"Failed to create cache: {str(e)}")

2025-05-25 14:41:40,203 - INFO - Successfully created cache


## Setting Up the LLM Model
In this section, we set up the Large Language Model (LLM) for our RAG system. We're using the Deepseek model, which can be accessed through two different methods:

1. **Deepseek API Key**: This is obtained directly from Deepseek's platform (https://deepseek.ai) by creating an account and subscribing to their API services. With this key, you can access Deepseek's models directly using the `ChatDeepSeek` class from the `langchain_deepseek` package.

2. **OpenRouter API Key**: OpenRouter (https://openrouter.ai) is a service that provides unified access to multiple LLM providers, including Deepseek. You can obtain an API key by creating an account on OpenRouter's website. This approach uses the `ChatOpenAI` class from `langchain_openai` but with a custom base URL pointing to OpenRouter's API endpoint.

The key difference is that OpenRouter acts as an intermediary service that can route your requests to various LLM providers, while the Deepseek API gives you direct access to only Deepseek's models. OpenRouter can be useful if you want to switch between different LLM providers without changing your code significantly.

In our implementation, we check for both keys and prioritize using the Deepseek API directly if available, falling back to OpenRouter if not. The model is configured with `temperature=0` to make responses as consistent and focused as possible, which suits RAG applications.


In [15]:
from langchain_deepseek import ChatDeepSeek
from langchain_openai import ChatOpenAI

if DEEPSEEK_API_KEY:
    try:
        llm = ChatDeepSeek(
            api_key=DEEPSEEK_API_KEY,
            model_name="deepseek-chat",
            temperature=0
        )
        logging.info("Successfully created Deepseek LLM client")
    except Exception as e:
        raise ValueError(f"Error creating Deepseek LLM client: {str(e)}")
elif OPENROUTER_API_KEY:
    try:
        llm = ChatOpenAI(
            api_key=OPENROUTER_API_KEY,
            base_url="https://openrouter.ai/api/v1",
            model="deepseek/deepseek-chat-v3.1",
            temperature=0,
        )
        logging.info("Successfully created Deepseek LLM client through OpenRouter")
    except Exception as e:
        raise ValueError(f"Error creating Deepseek LLM client: {str(e)}")
else:
    raise ValueError("Either Deepseek API Key or OpenRouter API Key must be provided")

2025-05-25 14:41:40,237 - INFO - Successfully created Deepseek LLM client through OpenRouter


## Perform Semantic Search
Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined.

In the provided code, the search process begins by recording the start time, followed by executing the similarity_search_with_score method of the CouchbaseSearchVectorStore. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and a similarity score that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison.
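
The ranking idea behind a similarity search can be shown with plain Python. The sketch below scores a few toy documents against a query using the dot product and returns the top results. The three-dimensional vectors are invented for illustration (real embeddings in this tutorial have 1536 dimensions), and Couchbase performs this comparison inside the Search service rather than in client code.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector]


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs with the highest dot product."""
    scored = [(dot(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [(doc_id, round(score, 4)) for score, doc_id in scored[:k]]


# Toy vectors, made up for this example
toy_docs = {
    "darts": normalize([0.9, 0.1, 0.0]),
    "football": normalize([0.5, 0.5, 0.1]),
    "politics": normalize([0.0, 0.2, 0.9]),
}
toy_query = normalize([1.0, 0.2, 0.0])

for doc_id, score in top_k(toy_query, toy_docs, k=2):
    print(doc_id, score)  # "darts" ranks first, then "football"
```

Because the vectors are normalized to unit length, the dot product here equals the cosine similarity, so a higher score means the document points in a direction closer to the query.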

In [16]:
query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?"

try:
    # Perform the semantic search
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    search_elapsed_time = time.time() - start_time

    logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds")

    # Display search results
    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
    print("-" * 80)

    for doc, score in search_results:
        print(f"Score: {score:.4f}, Text: {doc.page_content}")
        print("-" * 80)

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}")

2025-05-25 14:41:41,802 - INFO - Semantic search completed in 1.56 seconds



Semantic Search Results (completed in 1.56 seconds):
--------------------------------------------------------------------------------
Score: 0.6303, Text: The Littler effect - how darts hit the bullseye

Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson.

One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double th

## Retrieval-Augmented Generation (RAG) with Couchbase and LangChain
Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain.

The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation.
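
The step of turning retrieved documents into prompt context can be sketched without any external services. The example below joins the text of each document into one string and fills the same "Context / Question" layout used by the prompt in this tutorial. `Doc` is a stand-in class for a retrieved document, and `format_docs`, `build_prompt`, and the `max_chars` limit are assumptions for illustration, not part of the notebook.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a retrieved document; only page_content is used here."""
    page_content: str


def format_docs(docs, max_chars=4000):
    """Join document texts into one context string, truncated to max_chars."""
    joined = "\n\n".join(doc.page_content for doc in docs)
    return joined[:max_chars]


def build_prompt(context, question):
    """Fill the human-message layout with context and question."""
    return f"Context: {context}\n\nQuestion: {question}"


retrieved = [Doc("Littler won 3-1."), Doc("He averaged 100.85.")]
print(build_prompt(format_docs(retrieved), "How did Littler play?"))
```

Joining only the text keeps document metadata out of the prompt, and the character cap is one simple way to bound prompt size when many documents are retrieved.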

In [17]:
# Create RAG prompt template
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that answers questions based on the provided context."),
    ("human", "Context: {context}\n\nQuestion: {question}")
])

# Create RAG chain
rag_chain = (
    {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
logging.info("Successfully created RAG chain")

2025-05-25 14:41:41,810 - INFO - Successfully created RAG chain


In [18]:
try:
    start_time = time.time()
    rag_response = rag_chain.invoke(query)
    rag_elapsed_time = time.time() - start_time

    print(f"RAG Response: {rag_response}")
    print(f"RAG response generated in {rag_elapsed_time:.2f} seconds")
except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")

RAG Response: In his recent 2025 PDC World Championship second-round match against Ryan Meikle, **Luke Littler** achieved several notable milestones and records:

1. **Tournament Record Set Average**:  
   - Littler hit a **140.91 set average** in the fourth set, the highest ever recorded in the tournament for a single set. This included three consecutive legs finished in 11, 10, and 11 darts.

2. **Near Nine-Darter**:  
   - He narrowly missed a nine-dart finish (the pinnacle of darts perfection) by millimeters when he failed to land double 12 in the fourth set.

3. **Overall Performance**:  
   - Despite a slow start and admitted nerves, he secured a **3-1 victory** with a dominant fourth set, hitting **four maximum 180s** and maintaining an overall match average of **100.85**.

4. **Emotional Impact**:  
   - The 17-year-old became emotional post-match, cutting short his on-stage interview due to the intensity of the moment, later calling it the "toughest game" he’d ever played.

Th

## Using Couchbase as a caching mechanism
Couchbase can also serve as a cache for RAG (Retrieval-Augmented Generation) responses by storing precomputed results for specific queries. This improves efficiency and speed when the same query is asked more than once. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key.

For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is returned directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role here is to provide fast, scalable storage for these cached responses so that frequently asked queries can be answered more quickly.
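The check-then-store behavior described above is often called the cache-aside pattern. The sketch below illustrates it with an in-memory dictionary standing in for Couchbase. `DictCache`, `cached_answer`, and `slow_generate` are hypothetical names used only for illustration, and the sketch does not show how LangChain's cache integration works internally.

```python
class DictCache:
    """Hypothetical in-memory stand-in for a Couchbase-backed cache."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def cached_answer(query, cache, generate):
    """Return (answer, was_cache_hit), calling generate only on a cache miss."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True
    answer = generate(query)  # expensive step: retrieval plus LLM generation
    cache.set(query, answer)  # the query string serves as the cache key
    return answer, False


calls = []  # records every time the expensive step actually runs

def slow_generate(query):
    """Placeholder for the full RAG chain; assumed to be slow in practice."""
    calls.append(query)
    return f"answer to: {query}"


demo_cache = DictCache()
first, first_hit = cached_answer("What happened?", demo_cache, slow_generate)
second, second_hit = cached_answer("What happened?", demo_cache, slow_generate)
print(first_hit, second_hit, len(calls))
```

The second call returns the stored answer without invoking `slow_generate` again. A repeated query skips the expensive step in the same way when the responses are stored in Couchbase.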


In [19]:
try:
    queries = [
        "What happened in the match between Fullham and Liverpool?",
        "What were Luke Littler's key achievements and records in his recent PDC World Championship match?", # Repeated query
        "What happened in the match between Fullham and Liverpool?", # Repeated query
    ]

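    # Use a separate loop variable so the earlier `query` is not overwritten
    for i, current_query in enumerate(queries, 1):
        print(f"\nQuery {i}: {current_query}")
        start_time = time.time()

        # Repeated queries should return faster because of the cache
        response = rag_chain.invoke(current_query)
        elapsed_time = time.time() - start_time
        print(f"Response: {response}")
        print(f"Time taken: {elapsed_time:.2f} seconds")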

except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")


Query 1: What happened in the match between Fullham and Liverpool?
Response: In the match between Fulham and Liverpool, the game ended in a 2-2 draw. Key highlights include:

1. **Red Card Incident**: Liverpool played most of the match with 10 men after Andy Robertson received a red card in the 17th minute for denying a goalscoring opportunity. He had earlier been injured in a tackle by Fulham's Issa Diop.

2. **Comeback Resilience**: Despite the numerical disadvantage, Liverpool twice came from behind. Diogo Jota scored an 86th-minute equalizer to secure a point. Fulham's Antonee Robinson praised Liverpool, noting it "didn’t feel like they had 10 men" due to their aggressive, high-pressing approach.

3. **Performance Metrics**: Liverpool dominated possession (over 60%) and led in key attacking stats (shots, big chances, touches in the opposition box), showcasing their determination even with a player deficit.

4. **Manager & Player Reactions**: 
   - Manager Arne Slot commended his t

## Conclusion
By following these steps, you will have a fully functional semantic search engine that leverages the strengths of Couchbase and Deepseek (via OpenRouter). This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you are new to software development or an experienced developer looking to expand your skills, this guide gives you the knowledge and tools you need to create an AI-driven search engine.