# Agent-Based RAG with Couchbase GSI Vector Search and CrewAI

## Overview

In this guide, we walk you through building a semantic search engine using [Couchbase](https://www.couchbase.com) as the backend database and [CrewAI](https://github.com/crewAIInc/crewAI) for agent-based RAG operations. CrewAI lets us create specialized agents that work together on different parts of the RAG workflow, from document retrieval to response generation. This tutorial uses Couchbase's **Global Secondary Index (GSI)** vector search, which is designed for high-performance vector search in large-scale applications. The tutorial is beginner-friendly, with step-by-step instructions for building a fully functional semantic search system from scratch. If you would rather perform semantic search using an FTS index, see [this tutorial](https://developer.couchbase.com/tutorial-crewai-couchbase-rag-using-fts).

## How to Run This Tutorial

This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run interactively. You can access the original notebook here.

You can either:
- Download the notebook file and run it on [Google Colab](https://colab.research.google.com)
- Run it on your system by setting up the Python environment

## Prerequisites

### Couchbase Requirements

1. Create and Deploy Your Free Tier Operational cluster on [Capella](https://cloud.couchbase.com/sign-up)
   - To get started with [Couchbase Capella](https://cloud.couchbase.com), create an account and use it to deploy a free tier operational cluster
   - This account provides you with an environment where you can explore and learn about Capella
   - To learn more, please follow the [Getting Started Guide](https://docs.couchbase.com/cloud/get-started/create-account.html)
   - **Important**: This tutorial requires Couchbase Server **8.0+** for GSI vector search capabilities

### Couchbase Capella Configuration

When running Couchbase using Capella, the following prerequisites need to be met:
- Create the database credentials to access the required bucket (Read and Write) used in the application
- Allow access to the Cluster from the IP on which the application is running by following the [Network Security documentation](https://docs.couchbase.com/cloud/security/security.html#public-access)

## Setup and Installation

### Installing Necessary Libraries

We'll install the following key libraries:
- `datasets`: For loading the BBC News dataset from Hugging Face
- `langchain-couchbase`: To integrate Couchbase with LangChain for GSI vector storage and caching
- `langchain-openai`: For accessing OpenAI's embedding and chat models
- `crewai`: To create and orchestrate our AI agents for RAG operations
- `python-dotenv`: For securely managing environment variables and API keys

These libraries provide the foundation for building a semantic search engine with GSI vector embeddings, database integration, and agent-based RAG capabilities.

In [1]:
%pip install --quiet datasets==4.1.0 langchain-couchbase==0.5.0rc1 langchain-openai==0.3.33 crewai==0.186.1 python-dotenv==1.1.1

Note: you may need to restart the kernel to use updated packages.


### Import Required Modules

The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading.

In [2]:
import getpass
import json
import logging
import os
import time
from datetime import timedelta
from uuid import uuid4

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.diagnostics import PingState, ServiceType
from couchbase.exceptions import (InternalServerFailureException,
                                  QueryIndexAlreadyExistsException,
                                  ServiceUnavailableException,
                                  CouchbaseException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from crewai.tools import tool
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy, IndexType
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from crewai import Agent, Crew, Process, Task

### Configure Logging

Logging is configured to track the progress of the script and capture any errors or warnings.

In [3]:
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(levelname)s] %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)

# Suppress httpx logging
logging.getLogger('httpx').setLevel(logging.CRITICAL)

### Load Environment Configuration

In this section, we load the configuration settings the notebook needs. These include sensitive values such as database credentials and the OpenAI API key, as well as the bucket, scope, and collection names.

Settings are read from environment variables (optionally loaded from a `.env` file) rather than hardcoded in the script, which keeps secrets out of the code. If `OPENAI_API_KEY` is not set, you are prompted for it at runtime; the Couchbase settings fall back to local defaults.

In [4]:
# Load environment variables
load_dotenv("./.env")

# Configuration
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass("Enter your OpenAI API key: ")  # getpass hides the key while typing
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set")

CB_HOST = os.getenv('CB_HOST') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or 'vector-search-testing'
SCOPE_NAME = os.getenv('SCOPE_NAME') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or 'crew'

print("Configuration loaded successfully")

Configuration loaded successfully


## Couchbase Connection Setup

### Connect to Cluster

Connecting to a Couchbase cluster is the foundation of this project. Couchbase serves as our primary data store, handling all storage and retrieval for the semantic search engine: storing embeddings, querying data, and managing collections. Every later step goes through this connection, so make sure it is set up correctly before continuing.

In [5]:
# Connect to Couchbase
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    print("Successfully connected to Couchbase")
except Exception as e:
    print(f"Failed to connect to Couchbase: {str(e)}")
    raise

Successfully connected to Couchbase


### Setup Collections

Create and configure Couchbase bucket, scope, and collection for storing our vector data.

1. **Bucket Creation:**
   - Checks if specified bucket exists, creates it if not
   - Sets bucket properties like RAM quota (1024MB) and replication (disabled)
   - Note: If you are using Capella, create a bucket manually called `vector-search-testing` (or any name you prefer, and set `CB_BUCKET_NAME` to match) with the same properties.

2. **Scope Management:**  
   - Verifies if requested scope exists within bucket
   - Creates new scope if needed (unless it's the default "_default" scope)

3. **Collection Setup:**
   - Checks for collection existence within scope
   - Creates collection if it doesn't exist
   - Waits 2 seconds for collection to be ready

**Additional Tasks:**
- Clears any existing documents to start from a clean state
- Implements error handling and logging

The function is called once, below, to set up the collection that stores the vector embeddings.


In [6]:
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)
        
        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")
    
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)

2025-09-24 13:31:23 [INFO] Bucket 'vector-search-testing' does not exist. Creating it...
2025-09-24 13:31:25 [INFO] Bucket 'vector-search-testing' created successfully.
2025-09-24 13:31:25 [INFO] Scope 'shared' does not exist. Creating it...
2025-09-24 13:31:26 [INFO] Scope 'shared' created successfully.
2025-09-24 13:31:26 [INFO] Collection 'crew' does not exist. Creating it...
2025-09-24 13:31:26 [INFO] Collection 'crew' created successfully.
2025-09-24 13:31:28 [INFO] All documents cleared from the collection.


<couchbase.collection.Collection at 0x31be03770>

## Understanding GSI Vector Search

### GSI Vector Index Configuration

Semantic search with GSI requires creating a Global Secondary Index optimized for vector operations. Unlike FTS-based vector search, GSI vector indexes offer two distinct types optimized for different use cases:

#### GSI Vector Index Types

##### Hyperscale Vector Indexes (BHIVE)

- **Best for**: Pure vector searches like content discovery, recommendations, and semantic search
- **Performance**: High performance with low memory footprint, optimized for concurrent operations
- **Scalability**: Designed to scale to billions of vectors
- **Use when**: You primarily perform vector-only queries without complex scalar filtering

##### Composite Vector Indexes

- **Best for**: Filtered vector searches that combine vector search with scalar value filtering
- **Performance**: Efficient pre-filtering where scalar attributes reduce the vector comparison scope
- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data
- **Note**: Scalar filters take precedence over vector similarity

#### Understanding Index Configuration

The `index_description` parameter controls how Couchbase optimizes vector storage and search through centroids and quantization:

**Format**: `'IVF[<centroids>],{PQ|SQ}<settings>'`

**Centroids (IVF - Inverted File):**
- Controls how the dataset is subdivided for faster searches
- More centroids = faster search, slower training  
- Fewer centroids = slower search, faster training
- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size

**Quantization Options:**
- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
- Higher values = better accuracy, larger index size

**Common Examples:**
- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization  
- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits

For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings).

For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html).
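As a rough illustration of "higher values = better accuracy, larger index size", the sketch below estimates the storage for one 1536-dimensional vector (the output size of `text-embedding-3-small`) under each example description. `estimate_code_bytes` is a hypothetical helper, not part of langchain-couchbase or the Couchbase SDK, and it counts only the quantized vector codes, ignoring centroids, document keys, and other index overhead.

```python
import re

# Hypothetical helper (not a Couchbase or LangChain API).
# Parses descriptions of the form 'IVF[<centroids>],{PQ|SQ}<settings>' and
# estimates the size of one quantized vector code in bytes. It ignores
# centroid storage, document keys, and any other per-index overhead.
DESCRIPTION_RE = re.compile(r"^IVF(\d*),(?:SQ(4|6|8)|PQ(\d+)x(\d+))$")


def estimate_code_bytes(description: str, dims: int) -> float:
    match = DESCRIPTION_RE.match(description)
    if match is None:
        raise ValueError(f"Unrecognized index description: {description!r}")
    sq_bits, pq_subquantizers, pq_bits = match.group(2), match.group(3), match.group(4)
    if sq_bits is not None:
        # Scalar quantization: every dimension is stored with sq_bits bits.
        return dims * int(sq_bits) / 8
    # Product quantization: one pq_bits-bit code per subquantizer.
    return int(pq_subquantizers) * int(pq_bits) / 8


DIMS = 1536  # output size of text-embedding-3-small
print("float32 (unquantized):", DIMS * 4, "bytes per vector")
for desc in ("IVF,SQ8", "IVF1000,SQ6", "IVF,PQ32x8"):
    print(desc, "->", estimate_code_bytes(desc, DIMS), "bytes per vector")
```

With `IVF,PQ32x8`, each vector shrinks to 32 one-byte codes, a far larger reduction than SQ8, which typically comes at some cost in accuracy.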

In [7]:
# GSI Vector Index Configuration
# In this tutorial, the GSI vector index is created programmatically through the vector store.
# Here we only define the parameters that will be used when the index is created.

# Vector configuration
DISTANCE_STRATEGY = DistanceStrategy.COSINE  # Cosine similarity
INDEX_TYPE = IndexType.BHIVE  # Hyperscale (BHIVE) index for high-performance vector search
INDEX_DESCRIPTION = "IVF,SQ8"  # Auto-selected centroids with 8-bit scalar quantization

# To create a Composite Index instead, use the following:
# INDEX_TYPE = IndexType.COMPOSITE  # Combines vector search with scalar filtering

print("GSI vector index configuration prepared")

GSI vector index configuration prepared
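`DistanceStrategy.COSINE` ranks documents by the angle between the query embedding and each document embedding. The sketch below computes cosine similarity in plain Python on made-up 3-dimensional vectors to show how that ranking works. The vectors and the `cosine_similarity` helper are illustrative only; in this tutorial, Couchbase computes distances on the server. Cosine distance is commonly defined as 1 minus cosine similarity, so a lower distance means a closer match.

```python
import math


# Plain-Python cosine similarity, for illustration only.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real ones from text-embedding-3-small have 1536).
toy_query = [0.9, 0.1, 0.0]
toy_docs = {
    "transfer news": [0.8, 0.2, 0.1],
    "weather report": [0.0, 0.3, 0.9],
}

# Most similar document first.
ranked = sorted(toy_docs, key=lambda name: cosine_similarity(toy_query, toy_docs[name]), reverse=True)
print(ranked)
```

OpenAI embeddings are normalized to length 1, so for them cosine similarity equals the plain dot product.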


## OpenAI Configuration

This section initializes two key OpenAI components needed for our RAG system:

1. **OpenAI Embeddings:**
   - Uses the 'text-embedding-3-small' model
   - Converts text into high-dimensional vector representations (embeddings)
   - These embeddings enable semantic search by capturing the meaning of text
   - Required for vector similarity search in Couchbase

2. **ChatOpenAI Language Model:**
   - Uses the 'gpt-4o' model
   - Temperature set to 0.2 for balanced creativity and focus
   - Serves as the cognitive engine for CrewAI agents
   - Powers agent reasoning, decision-making, and task execution
   - Enables agents to:
     - Process and understand retrieved context from vector search
     - Generate thoughtful responses based on that context
     - Follow instructions defined in agent roles and goals
     - Collaborate with other agents in the crew
   - The relatively low temperature (0.2) makes agent outputs more consistent and predictable while leaving some room for creative problem-solving

Both components require a valid OpenAI API key (OPENAI_API_KEY) for authentication.
In the CrewAI framework, the LLM acts as the "brain" for each agent, allowing them to interpret tasks, retrieve relevant information via the RAG system, and generate appropriate outputs based on their specialized roles and expertise.
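To make the effect of `temperature=0.2` concrete, the sketch below applies the textbook definition of sampling temperature: divide the model's logits by the temperature, then take a softmax. The logits are made up and `softmax_with_temperature` is a hypothetical helper; this is a conceptual model, not OpenAI's implementation.

```python
import math


# Textbook temperature scaling: logits / temperature, then softmax.
def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (1.0, 0.2):
    probs = softmax_with_temperature(example_logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At the lower temperature, almost all of the probability mass moves to the highest-scoring token, which is why low-temperature agents tend to give more repeatable answers.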

In [8]:
# Initialize OpenAI components
embeddings = OpenAIEmbeddings(
    openai_api_key=OPENAI_API_KEY,
    model="text-embedding-3-small"
)

llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model="gpt-4o",
    temperature=0.2
)

print("OpenAI components initialized")

OpenAI components initialized


## Document Processing and Vector Store Setup

### Create Couchbase GSI Vector Store

Set up the GSI vector store where we'll store document embeddings for high-performance semantic search.

In [9]:
# Setup GSI vector store with OpenAI embeddings
try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DISTANCE_STRATEGY
    )
    print("GSI Vector store initialized successfully")
    logging.info("GSI Vector store setup completed")
except Exception as e:
    logging.error(f"Failed to initialize GSI vector store: {str(e)}")
    raise RuntimeError(f"GSI Vector store initialization failed: {str(e)}")

2025-09-24 13:31:32 [INFO] GSI Vector store setup completed


GSI Vector store initialized successfully


### Load BBC News Dataset

To build a search engine, we need data to search through. We use the BBC News dataset published by RealTimeData on Hugging Face, which contains real-world BBC news articles covering a wide range of topics. Its topical diversity makes it a good test bed for a search engine that must handle real-world news content.

The dataset is loaded with the Hugging Face `datasets` library, using the `RealTimeData/bbc_news_alltime` dataset and its `2024-12` configuration.

In [10]:
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")

2025-09-24 13:31:39 [INFO] Successfully loaded the BBC News dataset with 2687 rows.


Loaded the BBC News dataset with 2687 rows


#### Data Cleaning

Remove duplicate articles for cleaner search results.

In [11]:
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")

We have 1749 unique articles in our database.
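The set-based loop above removes duplicates, but iteration order over a set of strings can change between interpreter sessions because string hashing is randomized by default. If you want the same article order on every run, a minimal order-preserving alternative (the `dedupe_articles` name is hypothetical) relies on `dict` keys keeping insertion order:

```python
# Order-preserving deduplication: dict keys keep insertion order (Python 3.7+),
# so the first occurrence of each article wins and empty/None entries are skipped.
def dedupe_articles(articles):
    return list(dict.fromkeys(article for article in articles if article))


sample = ["Match report", "", "Transfer news", "Match report", None]
print(dedupe_articles(sample))
```

In this notebook, `unique_news_articles = dedupe_articles(news_dataset["content"])` would replace the loop.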


#### Save Data to Vector Store

To handle the large number of articles efficiently, we process them in batches rather than all at once. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's `add_texts` method, we add the filtered articles to our vector database. The `batch_size` parameter controls how many articles are processed in each iteration.

This approach offers several benefits:
1. **Memory Efficiency**: Processing in smaller batches prevents memory overload
2. **Error Handling**: If an error occurs, only the current batch is affected
3. **Progress Tracking**: Easier to monitor and track the ingestion progress
4. **Resource Management**: Better control over CPU and network resource utilization

We use a conservative batch size of 50 to ensure reliable operation. The optimal batch size depends on many factors including document sizes, available system resources, network conditions, and concurrent workload.
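To show what a batch size of 50 means, the sketch below cuts a list into consecutive slices of at most `size` items. `batched` here is a hypothetical helper for illustration, not the code langchain-couchbase runs internally.

```python
from typing import Iterator, Sequence


# Yield consecutive slices of at most `size` items.
def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


example = [f"article {i}" for i in range(120)]
print([len(batch) for batch in batched(example, 50)])
```

On Python 3.12+, `itertools.batched` behaves similarly (it yields tuples). To log progress per batch, you could loop `for batch in batched(articles, batch_size)` and call `vector_store.add_texts(texts=batch)` for each one.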

In [12]:
batch_size = 50

# Automatic Batch Processing
articles = [article for article in unique_news_articles if article and len(article) <= 50000]

try:
    vector_store.add_texts(
        texts=articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully.")
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")

2025-09-24 13:32:55 [INFO] Document ingestion completed successfully.


## CrewAI Agent Setup for Performance Testing

### What is CrewAI?

CrewAI enables us to create specialized AI agents that collaborate to handle different aspects of the RAG workflow:

- **Research Agent**: Finds and analyzes relevant documents using vector search
- **Writer Agent**: Takes research findings and creates polished, structured responses
- **Collaborative Workflow**: Agents work together, with the writer building on the researcher's findings

Splitting research and writing between two specialized agents lets each one focus on a single job, which can produce better-structured answers than a single agent handling both.

### Create Vector Search Tool

In [13]:
# Create GSI vector retriever optimized for high-performance searches
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}  # Return top 4 most similar documents
)

# Define the GSI vector search tool using the @tool decorator
@tool("gsi_vector_search")
def search_tool(query: str) -> str:
    """Search for relevant documents using GSI vector similarity.
    Input should be a simple text query string.
    Returns the contents of the most relevant documents as a single formatted string.
    Use this tool to find detailed information about topics using high-performance GSI indexes."""
    
    # Invoke the GSI vector retriever
    docs = retriever.invoke(query)

    # Format the retrieved documents as numbered, separated text blocks
    formatted_docs = "\n\n".join([
        f"Document {i+1}:\n{'-'*40}\n{doc.page_content}"
        for i, doc in enumerate(docs)
    ])
    return formatted_docs
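Because `search_tool` combines retrieval and formatting, its output format can only be checked against a live database. A minimal sketch that pulls the same layout into a standalone function, assuming documents only need a `page_content` attribute (as LangChain's `Document` provides); `FakeDoc` and `format_docs` are hypothetical names used for illustration:

```python
from dataclasses import dataclass


# Stand-in for LangChain's Document, which also exposes `page_content`.
@dataclass
class FakeDoc:
    page_content: str


def format_docs(docs) -> str:
    # Same layout as search_tool: numbered header, 40-dash rule, then content.
    return "\n\n".join(
        f"Document {i + 1}:\n{'-' * 40}\n{doc.page_content}"
        for i, doc in enumerate(docs)
    )


print(format_docs([FakeDoc("First article text."), FakeDoc("Second article text.")]))
```

Inside the tool, `return format_docs(retriever.invoke(query))` would produce the same text as the original body.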

### Create CrewAI Agents

In [14]:
# Custom response template (not passed to either agent below, so it has no effect in this notebook)
response_template = """
Analysis Results
===============
{%- if .Response %}
{{ .Response }}
{%- endif %}

Sources
=======
{%- for tool in .Tools %}
* {{ tool.name }}
{%- endfor %}

Metadata
========
* Confidence: {{ .Confidence }}
* Analysis Time: {{ .ExecutionTime }}
"""

# Create research agent
researcher = Agent(
    role='Research Expert',
    goal='Find and analyze the most relevant documents to answer user queries accurately',
    backstory="""You are an expert researcher with deep knowledge in information retrieval 
    and analysis. Your expertise lies in finding, evaluating, and synthesizing information 
    from various sources. You have a keen eye for detail and can identify key insights 
    from complex documents. You always verify information across multiple sources and 
    provide comprehensive, accurate analyses.""",
    tools=[search_tool],
    llm=llm,
    verbose=False,  # Set to False for cleaner performance testing
    memory=True,
    allow_delegation=False
)

# Create writer agent
writer = Agent(
    role='Technical Writer',
    goal='Generate clear, accurate, and well-structured responses based on research findings',
    backstory="""You are a skilled technical writer with expertise in making complex 
    information accessible and engaging. You excel at organizing information logically, 
    explaining technical concepts clearly, and creating well-structured documents. You 
    ensure all information is properly cited, accurate, and presented in a user-friendly 
    manner. You have a talent for maintaining the reader's interest while conveying 
    detailed technical information.""",
    llm=llm,
    verbose=False,  # Set to False for cleaner performance testing
    memory=True,
    allow_delegation=False
)

print("CrewAI agents created successfully")

CrewAI agents created successfully


### How the RAG Workflow Works

The complete RAG process:
1. **User Query** → Research Agent
2. **Vector Search** → GSI BHIVE index finds similar documents
3. **Document Analysis** → Research Agent analyzes and synthesizes findings
4. **Response Writing** → Writer Agent creates polished, structured response
5. **Final Output** → User receives comprehensive, well-formatted answer
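With `Process.sequential` and `context=[research_task]`, the writing task receives the research task's output as context. The plain-Python sketch below models that data flow with hypothetical stub functions in place of the LLM-backed agents; it is a simplified mental model, not CrewAI's implementation. In this notebook the writing task's description does not include the query, so, as in the sketch, the only query-specific input the writer receives is the research output.

```python
from typing import Callable


# Simplified model of the sequential crew: research runs first and its
# output becomes the writer's only query-specific input.
def run_sequential(query: str,
                   research: Callable[[str], str],
                   write: Callable[[str], str]) -> str:
    findings = research(query)   # steps 1-3: retrieve and analyze
    return write(findings)       # steps 4-5: turn findings into an answer


# Hypothetical stand-ins for the researcher and writer agents.
def stub_research(query: str) -> str:
    return f"findings about {query}"


def stub_write(findings: str) -> str:
    return f"Report based on {findings}"


print(run_sequential("football transfers", stub_research, stub_write))
```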

## Performance Testing: RAG Workflow Before vs After GSI

Let's compare the complete end-to-end CrewAI RAG workflow performance before and after creating the BHIVE GSI index.

### Testing Function

In [15]:
def test_complete_rag_workflow(query_text, label="RAG Workflow"):
    """Test complete CrewAI RAG workflow and return timing metrics"""
    print(f"\n[{label}] Testing complete RAG workflow")
    print(f"[{label}] Query: '{query_text}'")
    print(f"[{label}] Starting CrewAI agents...")
    
    start_time = time.time()
    
    try:
        # Create tasks for the crew
        research_task = Task(
            description=f"Research and analyze information relevant to: {query_text}",
            agent=researcher,
            expected_output="A detailed analysis with key findings and supporting evidence"
        )
        
        writing_task = Task(
            description="Create a comprehensive and well-structured response",
            agent=writer,
            expected_output="A clear, comprehensive response that answers the query",
            context=[research_task]
        )
        
        # Create and execute crew
        crew = Crew(
            agents=[researcher, writer],
            tasks=[research_task, writing_task],
            process=Process.sequential,
            verbose=False,  # Disable verbose for cleaner performance testing
            cache=False,    # Disable cache for fair comparison
            planning=False  # Disable planning for faster execution
        )
        
        result = crew.kickoff()
        end_time = time.time()
        
        workflow_time = end_time - start_time
        print(f"[{label}] Complete RAG workflow completed in {workflow_time:.2f} seconds")
        print(f"[{label}] Response preview: {str(result)[:150]}...")
        
        return workflow_time
    except Exception as e:
        print(f"[{label}] RAG workflow failed: {str(e)}")
        return None

### Test 1: Baseline Performance (No GSI Index)

Test the complete RAG workflow without GSI optimization.

In [16]:
# Test baseline performance without GSI index
test_query = "What are the latest developments in football transfers?"
print("Testing baseline CrewAI RAG performance without GSI optimization...")
baseline_time = test_complete_rag_workflow(test_query, "Without GSI Index")
if baseline_time is not None:  # test_complete_rag_workflow returns None on failure
    print(f"\nBaseline RAG workflow time (without GSI): {baseline_time:.2f} seconds\n")

13:33:28 - LiteLLM:INFO: utils.py:3258 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-09-24 13:33:28 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai


Testing baseline CrewAI RAG performance without GSI optimization...

[Without GSI Index] Testing complete RAG workflow
[Without GSI Index] Query: 'What are the latest developments in football transfers?'
[Without GSI Index] Starting CrewAI agents...


13:33:31 - LiteLLM:INFO: utils.py:1260 - Wrapper: Completed Call, calling success_handler
2025-09-24 13:33:31 [INFO] Wrapper: Completed Call, calling success_handler
2025-09-24 13:33:31 [INFO] Retrying request to /embeddings in 0.445302 seconds
13:33:33 - LiteLLM:INFO: utils.py:3258 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-09-24 13:33:33 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai
13:35:32 - LiteLLM:INFO: utils.py:1260 - Wrapper: Completed Call, calling success_handler
2025-09-24 13:35:32 [INFO] Wrapper: Completed Call, calling success_handler
13:35:32 - LiteLLM:INFO: utils.py:3258 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-09-24 13:35:32 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai
2025-09-24 13:35:33 [INFO] Retrying request to /chat/completions in 0.437526 seconds
13:35:52 - LiteLLM:INFO: utils.py:1260 - Wrapper: Completed Call, calling success_handler
2025-09-24 13

[Without GSI Index] Complete RAG workflow completed in 143.46 seconds
[Without GSI Index] Response preview: The Friedkin Group's acquisition of Everton Football Club marks a significant turning point for the club, which has faced numerous challenges in recen...

Baseline RAG workflow time (without GSI): 143.46 seconds



### Create BHIVE GSI Index

Now let's create a BHIVE GSI vector index to enable high-performance vector searches. The index is created programmatically through the vector store, using the `INDEX_TYPE` and `INDEX_DESCRIPTION` values configured earlier. Because `IVF,SQ8` omits the centroid count, Couchbase selects it automatically based on the dataset size.

In [17]:
# Create GSI Vector Index for high-performance searches
print("Creating BHIVE GSI vector index...")
try:
    # Create a BHIVE index optimized for pure vector searches
    vector_store.create_index(
        index_type=INDEX_TYPE,  # BHIVE index type
        index_description=INDEX_DESCRIPTION  # IVF,SQ8 for optimized performance
    )
    print(f"GSI Vector index created successfully")
    logging.info(f"BHIVE index created with description '{INDEX_DESCRIPTION}'")
    
    # Wait a moment for index to be available
    print("Waiting for index to become available...")
    time.sleep(5)
    
except Exception as e:
    # Index might already exist, which is fine
    if "already exists" in str(e).lower():
        print(f"GSI Vector index already exists, proceeding...")
        logging.info(f"Index already exists")
    else:
        logging.error(f"Failed to create GSI index: {str(e)}")
        raise RuntimeError(f"GSI index creation failed: {str(e)}")

Creating BHIVE GSI vector index...


2025-09-24 13:36:35 [INFO] BHIVE index created with description 'IVF,SQ8'


GSI Vector index created successfully
Waiting for index to become available...
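The cell above waits a fixed five seconds after creating the index, but building an index over the full collection may take longer. A more robust pattern is to poll a readiness check until it passes or a timeout expires. A minimal sketch, with `wait_until` as a hypothetical helper:

```python
import time
from typing import Callable


# Poll `check` until it returns True or `timeout` seconds pass.
# Returns True if the check passed, False on timeout.
def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> bool:
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

`check` could, for example, run a SQL++ query against the `system:indexes` catalog and return `True` once the vector index reports a `state` of `online`; treat that query as an assumption and verify the catalog fields against your Couchbase version before relying on it.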


### Test 2: GSI-Optimized Performance

Test the same RAG workflow with BHIVE GSI optimization.

In [18]:
# Test complete RAG workflow with GSI index
print("Testing CrewAI RAG performance with BHIVE GSI optimization...")
gsi_rag_time = test_complete_rag_workflow(test_query, "With BHIVE GSI")

13:36:41 - LiteLLM:INFO: utils.py:3258 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-09-24 13:36:41 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai


Testing CrewAI RAG performance with BHIVE GSI optimization...

[With BHIVE GSI] Testing complete RAG workflow
[With BHIVE GSI] Query: 'What are the latest developments in football transfers?'
[With BHIVE GSI] Starting CrewAI agents...


13:36:43 - LiteLLM:INFO: utils.py:1260 - Wrapper: Completed Call, calling success_handler
2025-09-24 13:36:43 [INFO] Wrapper: Completed Call, calling success_handler
13:36:43 - LiteLLM:INFO: utils.py:3258 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-09-24 13:36:43 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai
13:38:34 - LiteLLM:INFO: utils.py:1260 - Wrapper: Completed Call, calling success_handler
2025-09-24 13:38:34 [INFO] Wrapper: Completed Call, calling success_handler
13:38:34 - LiteLLM:INFO: utils.py:3258 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-09-24 13:38:34 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai
13:38:55 - LiteLLM:INFO: utils.py:1260 - Wrapper: Completed Call, calling success_handler
2025-09-24 13:38:55 [INFO] Wrapper: Completed Call, calling success_handler


[With BHIVE GSI] Complete RAG workflow completed in 133.87 seconds
[With BHIVE GSI] Response preview: The recent acquisition of Everton Football Club by the Friedkin Group marks a pivotal moment for the club, which has been struggling both on and off t...


### Performance Results

Compare the end-to-end performance improvement:

```python
# Calculate performance improvement for complete RAG workflow
if baseline_time and gsi_rag_time:
    speedup = baseline_time / gsi_rag_time if gsi_rag_time > 0 else float('inf')
    time_saved = baseline_time - gsi_rag_time
    percent_improvement = (time_saved / baseline_time) * 100

    print("\n" + "="*70)
    print("COMPLETE RAG WORKFLOW PERFORMANCE COMPARISON")
    print("="*70)
    print(f"CrewAI RAG without GSI: {baseline_time:.2f} seconds")
    print(f"CrewAI RAG with GSI:    {gsi_rag_time:.2f} seconds")
    print(f"Performance improvement: {speedup:.2f}x faster")
    print(f"Time saved:              {time_saved:.2f} seconds ({percent_improvement:.1f}% improvement)")

else:
    print("\nUnable to calculate performance comparison due to workflow errors")
```


```text
COMPLETE RAG WORKFLOW PERFORMANCE COMPARISON
CrewAI RAG without GSI: 143.46 seconds
CrewAI RAG with GSI:    133.87 seconds
Performance improvement: 1.07x faster
Time saved:              9.59 seconds (6.7% improvement)
```
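
The comparison cell derives three figures from the two timings: speedup = baseline / optimized, time saved = baseline - optimized, and percent improvement = time saved / baseline × 100. The last two are tied to the speedup by percent improvement = (1 - 1/speedup) × 100, so a 1.07x speedup corresponds to roughly a 6.7% reduction in total time. The stand-alone function below (`compare_timings` is a hypothetical name) reproduces that arithmetic with the values measured in this run.

```python
def compare_timings(baseline_s: float, optimized_s: float) -> dict:
    """Hypothetical helper mirroring the arithmetic of the comparison cell."""
    if baseline_s <= 0 or optimized_s <= 0:
        raise ValueError("timings must be positive")
    speedup = baseline_s / optimized_s
    time_saved = baseline_s - optimized_s
    percent_improvement = time_saved / baseline_s * 100
    return {
        "speedup": speedup,
        "time_saved": time_saved,
        "percent_improvement": percent_improvement,
    }


# Values measured in this run (seconds)
stats = compare_timings(143.46, 133.87)
print(f"{stats['speedup']:.2f}x faster")  # 1.07x faster
print(f"{stats['time_saved']:.2f} s saved ({stats['percent_improvement']:.1f}%)")  # 9.59 s saved (6.7%)

# Percent improvement follows directly from the speedup
assert abs(stats["percent_improvement"] - (1 - 1 / stats["speedup"]) * 100) < 1e-9
```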


## Interactive Demo

Test the complete optimized RAG system with a real query.

### Demo Function

```python
import time

from crewai import Crew, Process, Task


def process_interactive_query(query, researcher, writer):
    """Run the complete RAG workflow with the researcher and writer CrewAI agents."""
    print(f"\nProcessing Query: {query}")
    print("=" * 80)

    # Create tasks: research first, then write using the research output as context
    research_task = Task(
        description=f"Research and analyze information relevant to: {query}",
        agent=researcher,
        expected_output="A detailed analysis with key findings"
    )

    writing_task = Task(
        description="Create a comprehensive response",
        agent=writer,
        expected_output="A clear, well-structured answer",
        context=[research_task]
    )

    # Execute crew sequentially; cache stores tool results, planning adds a planning step
    crew = Crew(
        agents=[researcher, writer],
        tasks=[research_task, writing_task],
        process=Process.sequential,
        verbose=True,
        cache=True,
        planning=True
    )

    try:
        start_time = time.time()
        result = crew.kickoff()
        elapsed_time = time.time() - start_time

        print(f"\nCompleted in {elapsed_time:.2f} seconds")
        print("=" * 80)
        print("RESPONSE")
        print("=" * 80)
        print(result)

        return elapsed_time
    except Exception as e:
        # Report the failure and signal it to the caller with None
        print(f"Error: {str(e)}")
        return None
```
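
`process_interactive_query` measures elapsed time with `time.time()`, which follows the system clock and can jump if the clock is adjusted; `time.perf_counter()` is the standard library's monotonic clock intended for measuring durations. The function also signals failure by returning `None`, so a caller should test `is not None` rather than truthiness. The sketch below is a hypothetical, framework-agnostic helper (`timed_kickoff` is not a CrewAI API): it accepts any zero-argument callable, such as a crew's `kickoff` method, and returns the result together with the elapsed time.

```python
import time
from typing import Any, Callable, Optional, Tuple


def timed_kickoff(run: Callable[[], Any]) -> Optional[Tuple[Any, float]]:
    """Hypothetical helper: run `run()` and return (result, elapsed_seconds), or None on failure."""
    start = time.perf_counter()
    try:
        result = run()
    except Exception as exc:  # mirror the tutorial: report the error instead of raising
        print(f"Error: {exc}")
        return None
    return result, time.perf_counter() - start


# Usage sketch with a stand-in callable; with a real crew: outcome = timed_kickoff(crew.kickoff)
outcome = timed_kickoff(lambda: "stub response")
if outcome is not None:
    result, elapsed = outcome
    print(f"Completed in {elapsed:.2f} seconds")
```

Returning the result alongside the timing keeps the measurement separate from printing, which makes the same helper reusable for both the benchmark and the interactive demo.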

### Run Demo

```python
import logging

# Disable logging for cleaner output
logging.disable(logging.CRITICAL)

# Run demo with a sample query
demo_query = "What are the key details about the FA Cup third round draw?"
final_time = process_interactive_query(demo_query, researcher, writer)

# process_interactive_query returns None on failure
if final_time is not None:
    print(f"\n\n✅ Demo completed successfully in {final_time:.2f} seconds")
    print("Your agent-based RAG system is ready for production! 🚀")
```
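
`logging.disable(logging.CRITICAL)` suppresses logging for the rest of the Python session, including any cells run afterwards. If you only want quiet output during the demo, a context manager can switch logging back on when the block exits by calling `logging.disable(logging.NOTSET)`. This is a minimal sketch: `quiet_logging` is a hypothetical name, and it assumes logging was not already disabled before entering the block, because it re-enables all levels on exit.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logging(level: int = logging.CRITICAL):
    """Hypothetical helper: disable logging at `level` and below while the block runs."""
    logging.disable(level)
    try:
        yield
    finally:
        # Re-enable all levels, even if the block raised an exception
        logging.disable(logging.NOTSET)


# Usage sketch:
# with quiet_logging():
#     final_time = process_interactive_query(demo_query, researcher, writer)
```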


```text
Processing Query: What are the key details about the FA Cup third round draw?

[2025-09-24 13:39:07][INFO]: Planning the crew execution
[EventBus Error] Handler 'on_task_started' failed for event 'TaskStartedEvent': 'NoneType' object has no attribute 'key'
```




```text
Completed in 46.47 seconds
RESPONSE
```
**The FA Cup Third Round Draw: An Overview**

The FA Cup, officially known as The Football Association Challenge Cup, is one of the oldest and most prestigious knockout football competitions in the world. Established in 1871, it is renowned for its rich history and tradition, offering a unique platform where lower-league teams can compete against top-tier clubs. The third round of the FA Cup is particularly significant as it marks the entry of Premier League and Championship clubs into the tournament, adding a new level of excitement and competition.

**Key Dates and Schedule**

The third-round ties of the FA Cup are scheduled to be played over the weekend of Saturday, 11 January. This stage of the competition is eagerly anticipated by fans and teams alike, as it often features thrilling matchups and the potential for "giant-killings," where lower-league teams defeat higher-ranked opponents.

**Competition Format**

The FA Cup third round introduces

## Conclusion

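You have built an agent-based RAG system that combines Couchbase's GSI vector search, backed by a BHIVE index, with CrewAI's multi-agent architecture. This tutorial walked through the complete pipeline from data ingestion to response generation and benchmarked the end-to-end workflow with and without the GSI index. In this run the GSI-backed workflow finished in 133.87 seconds versus 143.46 seconds without it (about 1.07x faster, a 6.7% reduction). The end-to-end gain is modest because most of the workflow time is spent waiting on LLM calls rather than on vector retrieval.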